Evaluation of louis

At ThreeDots we’re committed to delivering the best possible product to our customers. Besides the user experience research we conducted, we performed extensive evaluation testing to ensure the reliability and performance of the hardware.

User Research Evaluation

RNIB Research

In order to make the user experience as good as possible and to meet customer expectations, we performed user research. After being granted ethical approval by the School of Informatics, we arranged a group interview with six visually impaired people through the Royal National Institute of Blind People (RNIB) Lothian. The RNIB is a UK charity offering information, support and advice to almost two million people in the UK with sight loss. Five of the six participants could read braille, and all five used basic grade 1 braille daily for a variety of tasks: finding out the name of the medicine they were about to take, labelling objects and clothes, or writing down phone numbers. The participants agreed that knowing braille has been a huge help in their daily lives, with one of them stating, "[I] use [braille] for everything; I couldn't do without my dots". From this user research we incorporated the following five points into our design:
  • Existing refreshable braille devices are very expensive to purchase and thus not easily accessible. Furthermore, current devices are easily damaged (e.g. when dropped or stood on), and although people can work around a broken cell, they eventually have to send the device away for repairs, which is both inconvenient and very expensive. Our modular design removes this problem: with louis you can use as many cells as needed and remove or add them instantly. This limits repairs to individual modules instead of the whole device, which, together with the low cost of the cells themselves, makes louis very affordable.

  • Voice integration can be very helpful, especially for tutoring apps. It should be as simple as possible and not frustrating to use. We added voice integration to our system and made it intuitive enough for anyone to use. Our voice commands are based on words people would naturally use to perform certain tasks, such as ‘open’ to launch an app or ‘options’ to get a list of the possible voice commands.

  • Jumbo braille is ideal for learning; smaller sizes suit more advanced braille users. This is because beginners might not be sensitive enough to feel six or eight pins on just the tip of one finger. We took this into account when redesigning the disk and pin sizes.

  • There are not many technological tools available to help people learn braille, and braille is not something one can attempt on their own without help. We designed our product to be especially accessible to beginners, both in its size and in its variety of simple applications that make learning braille easier than ever before.

  • Due to the size of the cells, it might be difficult to find the exact position of the pins. We therefore added a guide rail that leads the user’s finger onto the pins.


We had already arranged a second meeting to receive feedback on whether we had indeed incorporated all the requirements into our design and whether the device was genuinely easy to use. The meeting was to take place before the start of the fourth and final iteration, which would have given us the opportunity to fix any mistakes in both the hardware and software design. The users were to run louis and report on the following:
  • Whether it met their previous requirements and expectations

  • Whether it was easy to use

  • Whether it was ergonomic

  • What they believe could be improved


Unfortunately this meeting was not possible due to the current situation with COVID-19. Instead, each member of our team individually tested the software using our interactive demo.

Hardware Evaluation

Braille Output Accuracy

One of the most important aspects of louis is accurate braille output. Without it, the main purpose of louis - teaching braille - could not be achieved. We took several steps to ensure this essential goal was met:

Since none of us was familiar with the braille alphabet, we first ensured that our braille resources were accurate. We discovered that there are many different flavours of braille and settled on the Unified English Braille Code (UEBC), an English-language braille standard.

The firmware must pass the motors the right rotation angles to arrive at the correct braille output. We ensured the correctness of the firmware logic by adding assertions to the code; both the small disk’s and the big disk’s assertions returned true for random sequences of 26,000 characters. We also run unit tests automatically, and none have failed for the current software iteration.
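As an illustration, below is a minimal sketch of this style of test. The angle lookup and the step sizes are hypothetical stand-ins; the real firmware derives its angles from the actual disk layout.

```python
import random
import string

BIG_STEP = 15    # hypothetical slot size on the big disk (degrees)
SMALL_STEP = 45  # hypothetical slot size on the small disk (degrees)

def angles_for_char(ch):
    """Stand-in for the firmware lookup: map a character to a
    (big-disk angle, small-disk angle) pair in degrees."""
    i = string.ascii_lowercase.index(ch)
    return (i * BIG_STEP) % 360, (i * SMALL_STEP) % 360

def check(ch):
    big, small = angles_for_char(ch)
    # Assertions in the style described above: every commanded angle
    # must land exactly on a valid slot and stay within one revolution.
    assert 0 <= big < 360 and big % BIG_STEP == 0
    assert 0 <= small < 360 and small % SMALL_STEP == 0

# A random sequence of 26,000 characters, as in our firmware tests.
for _ in range(26000):
    check(random.choice(string.ascii_lowercase))
print("all assertions passed")
```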

The motors must be calibrated carefully in order to give accurate output. We have established that once calibration is done precisely, the motors do not introduce any inaccuracies. In the rare event of a calibration error, the user can manually recalibrate using the Calibration app that comes with the device. Instructions on how to use the app are given in the User Guide.

The motors’ rotations need to be accurate. We ran into the following problems:
  • The motors were overshooting the rotation, as depicted in Figure 1:
    Graph showing a slowly flattening oscillation
    Figure 1: Iteration 1
    Our second iteration fixed this:
    Graph showing a slowly flattening curve
    Figure 2: Iteration 2
  • However, Figure 2 still exhibits a long settling time, which in turn compromised the accuracy of the motor, because the motor would get stuck trying to settle from a small displacement angle.

    We tuned the movement of the motor using a proportional-integral-derivative (PID) controller. The goal of the tuning was to maximise accuracy while also maximising the speed of the motor and achieving repeatability across motors; a sketch of such a control loop is given after this list.

    The final result is depicted in Figure 3:
    Graph showing a quickly flattening curve
    Figure 3: Iteration 3
    A diagram of all iterations together, to show our progress, can be found in Figure 4:
    Graph showing the previous three graphs together
    Figure 4: Motor control using PID control technique
  • Subsequent rotations were sometimes set in motion before the motor was in the correct position. All cells are now checked for their position, and the next rotation is started only once they have arrived at their destination.
  • The motor rotates in one direction to set the big disk in the correct position, and then moves in the opposite direction to set the small disk. When setting the big disk, we noticed it would be thrown slightly forward when the rotation finished. We experimentally found the right amount of friction - enough to prevent the throwing while also stopping the big disk from being dragged along by the small disk’s rotations. We perfected the solution by adding minute bumps to the big disk’s bottom at every 15°, which slot into corresponding bumps on the resting surface.
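For reference, the sketch below shows the shape of such a control loop. It is a minimal illustration, not our firmware: the gains, the motor interface (read_angle/set_power) and the settling tolerance are all hypothetical.

```python
import time

class PID:
    """Proportional-integral-derivative controller for one motor."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt):
        error = self.setpoint - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def rotate_to(motor, target_deg, tolerance=0.5, dt=0.01):
    """Drive the motor until it has settled at the target angle.
    Waiting for settling is what lets us start the next rotation
    only once the disk has truly arrived."""
    pid = PID(kp=1.2, ki=0.05, kd=0.3, setpoint=target_deg)  # hypothetical gains
    while abs(target_deg - motor.read_angle()) > tolerance:
        motor.set_power(pid.update(motor.read_angle(), dt))
        time.sleep(dt)
    motor.set_power(0)
```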

All these measures helped us achieve error-free output. We evaluated our hardware on random orderings of the alphabet; the following table summarises the tests we ran:

Test | Alphabet String            | Errors
1    | cntbrkyhwloevzgspafmdjuiqx | 0
2    | czwortdegjxqnmypfhvilsbuka | 0
3    | nwoyivqskjdhmlczpufxgrbeta | 0
4    | fbdmuistacywrlknzhqvxgeopj | 0
5    | lazgcdjokmwnufeprtivqhsybx | 0

Output was evaluated both by looking at the dots and by feeling them with the tip of the index finger. Members of our testing team learned the braille alphabet and kept the UEBC chart at hand to double-check the displayed characters.

Hardware Evaluation

Rendering Speed

Another important aspect of louis is the character rendering speed. A feature that supports speedy render times is that cells can move in parallel, so all characters of a line render at the same time. The render time of a line of text is therefore equal to the render time of a single character.
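The sketch below illustrates the idea, assuming a hypothetical cell.render method that blocks until that cell’s motor has settled.

```python
from concurrent.futures import ThreadPoolExecutor

def render_line(cells, text):
    """Start every cell's rotation at the same time, so a full line of
    text takes only as long as the slowest single character."""
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        jobs = [pool.submit(cell.render, ch) for cell, ch in zip(cells, text)]
        for job in jobs:
            job.result()  # wait until every cell reports it has settled
```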

The tuning we applied to the motor control ensures that movement is done as efficiently as possible. The added friction helps enable fast rotation. As described on the How it Works page, the small disk has all 8 dot combinations twice and the big disk three times. This in turn allows us to have the two catches only 90° apart. Combined, this gives us the optimal rotation angles when printing characters, thus significantly increasing rendering speed.

A catch spacing of 120° or 180° could possibly be more optimal, since it increases flexibility in the starting direction of rotations. We performed tests to compare the options. Each test evaluates 1000 random orderings of the full alphabet, 26,000 characters in total: it calculates the rotation angles for each character in each ordering, sums the angles per ordering, and returns the average over all 1000 orderings. The results are summarised in the following table:

Catch Spacing         | 90°                    | 120°                   | 180°
Test                  | Average (°) | Time (s) | Average (°) | Time (s) | Average (°) | Time (s)
1                     | 3994.037    | 0.5031   | 4079.551    | 0.5074   | 4146.914    | 0.5341
2                     | 3968.457    | 0.5146   | 4072.168    | 0.5121   | 4137.051    | 0.5379
3                     | 3975.298    | 0.5033   | 4076.869    | 0.5115   | 4133.797    | 0.5393
4                     | 3974.185    | 0.5079   | 4093.050    | 0.5378   | 4148.411    | 0.5696
5                     | 3975.256    | 0.5141   | 4066.755    | 0.5220   | 4146.529    | 0.5420
6                     | 3969.950    | 0.5162   | 4081.114    | 0.5158   | 4155.582    | 0.5432
7                     | 3958.991    | 0.5321   | 4073.897    | 0.5101   | 4160.323    | 0.5454
8                     | 3989.999    | 0.5072   | 4079.251    | 0.5105   | 4137.470    | 0.5532
9                     | 3983.616    | 0.5214   | 4093.560    | 0.5278   | 4155.187    | 0.5609
10                    | 3954.705    | 0.5015   | 4075.397    | 0.5204   | 4157.241    | 0.5511
Total Average         | 3974.449    | 0.5121   | 4079.161    | 0.5175   | 4147.851    | 0.5477
Average per Character | 152.863     | 0.000020 | 156.891     | 0.000020 | 159.533     | 0.000021

We concluded that 90° is indeed optimal. On further inspection we confirmed that there are cases where 120° or 180° results in a smaller rotation angle for an individual character, but on average 90° comes out on top.
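A minimal sketch of the kind of simulation behind these numbers is given below. The disk layout it uses (letters spread evenly around one disk) is purely illustrative, not our real geometry, but it shows how the candidate targets from the pattern copies and the two catch positions enter the minimisation.

```python
import random
import string

BIG_COPIES = 3  # each dot pattern appears three times on the big disk

def min_rotation(current, target, copies, catch_spacing):
    """Smallest rotation (degrees) needed to reach any copy of the
    target pattern, engaging from either of the two catch positions."""
    best = 360.0
    for k in range(copies):
        goal = (target + k * 360.0 / copies) % 360.0
        for start in (current % 360.0, (current + catch_spacing) % 360.0):
            d = abs(goal - start) % 360.0
            best = min(best, d, 360.0 - d)
    return best

def average_rotation(catch_spacing, orders=1000):
    # Hypothetical layout: spread the 26 letters evenly around the disk.
    angle_of = {ch: i * 360.0 / 26 for i, ch in enumerate(string.ascii_lowercase)}
    totals = []
    for _ in range(orders):
        order = random.sample(string.ascii_lowercase, 26)
        position, total = 0.0, 0.0
        for ch in order:
            total += min_rotation(position, angle_of[ch], BIG_COPIES, catch_spacing)
            position = angle_of[ch]
        totals.append(total)
    return sum(totals) / len(totals)

for spacing in (90, 120, 180):
    print(spacing, round(average_rotation(spacing), 3))
```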

The average rotation angle per character is 153°. The time it takes to calculate the optimal rotation angles and pass them on to the hardware is negligible (0.000020 s), so render times are determined by the motor. Figure 4 shows that we have optimised the movement speed as well as the acceleration and deceleration behaviour. Unfortunately, because we can no longer access our device due to the pandemic, we did not have the opportunity to record official character render times. We know, however, that our hardware supports the highest speed the motor supports. The official LEGO® site states that the motor can do 240-250 rotations per minute, i.e. about 0.25 seconds per 360°. We estimate that the two acceleration and two deceleration periods per character render add an additional 0.1 seconds each. The average of 153° then translates into roughly 4 * 0.1s + 153°/360° * 0.25s = 0.506 seconds per character render, which matches our observations when testing the hardware in general.
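For completeness, the back-of-the-envelope calculation as code; the 0.1 s per acceleration/deceleration phase is our assumption from above.

```python
RPM = 240                   # lower bound of the motor's 240-250 rpm
SECONDS_PER_REV = 60 / RPM  # 0.25 s per 360 degrees
ACCEL_PHASES = 4            # 2 accelerations + 2 decelerations per render
ACCEL_COST = 0.1            # assumed extra cost per phase (s)
AVG_ANGLE = 153             # average rotation per character (degrees)

render_time = ACCEL_PHASES * ACCEL_COST + AVG_ANGLE / 360 * SECONDS_PER_REV
print(f"estimated render time: {render_time:.3f} s per character")  # ~0.506 s
```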

Hardware Evaluation

Reliability

We have ensured the general reliability of all hardware and firmware. The final design eliminates fragile parts by printing the dots directly and securely onto the disks, unlike current braille devices, which work with pins that break easily. The disks are fastened tightly to the axle and the whole structure is supported so that any wobble is strictly limited.

It is straightforward to click cells together and attach them to the main controller. The cell discovery protocol has never failed and always returns the correct number of cells. Button presses are reliably registered and passed to the main controller.

Hardware Evaluation

User Experience

Since our target users are visually impaired, carrying out the UX tests after our RNIB research session posed an interesting challenge. To address it, we followed the methodology of a study (Law and Vanderheiden, 2000) which found that testing UX on blindfolded (sighted) subjects is a good way to gain useful insights into possible UX problems without compromising generalisability to a visually impaired audience:

"As part of an investigation aimed at reducing costs in user testing of people with disabilities, a user test was conducted to compare the differences between a group of 15 blind, and 15 blindfolded (sighted) subjects using a touchscreen public information kiosk that was intended for use by people who cannot see. The number and type of problems found by each group were compared, and it was found that the results between each group were mostly similar"

[Reference: Law, C. M., & Vanderheiden, G. C. (2000). Reducing Sample Sizes When User Testing with People Who Have, and Who are Simulating Disabilities - Experiences with Blindness and Public Information Kiosks. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 44(26), 157–160. https://doi.org/10.1177/154193120004402607]

When performing our own user experience evaluation of the hardware, we were satisfied with the design choices we made. When we were told in our interview session that the size of a regular 3×2 LEGO brick (24×16 mm) is ideal for learning braille, we matched our braille output to these dimensions. The dots are easily distinguishable and have a comfortable feel, and the small gap between the big and small disks is not noticeable to the touch. The guide rail, which we also implemented in response to the research session, is a good aid for moving from one character to the next.

Hardware Evaluation

Main Areas of Improvement

We are very satisfied with our hardware design and have optimised all main areas. A possible expansion is to also provide regular-sized braille cells to aid more advanced users, who could then also enjoy all the non-educational apps that our open SDK supports. A smaller cell can easily be supported, since the connector mechanism would remain the same and miniaturising the other parts is trivial.

Software Evaluation

Speech Recognition

It is essential that speech recognition is both fast and accurate, since this is the main way of communicating with louis. Even state-of-the-art speech recognisers can have a hard time discerning the intended phrases, so we spent a lot of effort optimising this feature. During development we noticed that the particular microphone used in testing has a big effect on the results; unfortunately, this is why we were not able to run final quantitative evaluation tests, as we no longer have access to the hardware due to the global pandemic. We have, of course, tested the speech recognition extensively on our laptops, and we present those findings here:
  • Having tested various open source speech recognition APIs, we settled on Google Speech Recognition. We tweaked parameters, such as adjusting for background noise, until we got the fewest errors.

  • We quickly discovered that we should keep the voice commands as simple as possible. The user can control all aspects of the device with just one- and two-word commands, which makes the speech recognition a lot easier and more reliable.

  • For one specific app, the Tutor app, we encountered a common problem: recognising spelled letters. The Tutor app tests the user’s knowledge of the individual braille alphabet characters, and it is very challenging to discern ‘b’ from ‘bee’, for example, even for humans. Our solution lets the user give their answers using any word starting with the perceived braille character, e.g. ‘book’ for ‘b’ or ‘cat’ for ‘c’.

  • Another useful piece of functionality we developed is the await_response function, which lets apps pass in a list of responses they expect to hear from the user. For example, if the app asks a simple question like “Do you want to do this?”, it can pass in the potential responses “yes” and “no”. The function keeps listening until the user gives one of the options (see the sketch after this list).

  • The user can ask for the possible responses by saying ‘options’. This means the user is not constantly interrupted by ‘Invalid option’, even if the speech recogniser fails to parse the audio input correctly. The user can conveniently try again, and receives feedback only when they appear to be stuck.

  • When commands like ‘exit’ are given, which break up the app flow, the user is asked for confirmation of the action. In case the speech recogniser was mistaken, the user is conveniently returned to their current activity.
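To make the flow above concrete, here is a simplified sketch of our listening loop using the Python speech_recognition package. It is an illustration rather than our exact code: the real await_response drives the device’s text-to-speech output, for which the print call stands in.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_once():
    """Capture one utterance and return the recognised text in lower
    case, or None if the audio could not be parsed."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # the background-noise tweak
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None

def await_response(options):
    """Keep listening until the user says one of `options`; saying
    'options' reads the valid responses back instead of an error."""
    while True:
        heard = listen_once()
        if heard in options:
            return heard
        if heard == "options":
            print("You can say: " + ", ".join(options))  # spoken via TTS on the device

# Tutor-app style answer checking: any word starting with the perceived
# letter counts as that letter, e.g. 'book' for 'b'.
def perceived_letter(heard):
    return heard[0] if heard else None
```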


We are pleased with the overall performance. The interaction flow is pleasant, despite the occasional misinterpreted audio input. louis is able to recover from such errors without being a bother to the user. We cannot rule out some minor frustrations in dealing with the device, but are confident that it is fully operational and usable.

Software Evaluation

Audio Output

When evaluating the speech synthesis, we were reasonably happy with how it sounded. We discovered some solvable errors in the audio output; for example, spelling out the letter ‘a’ would produce ‘ah’ instead of ‘ay’. We implemented our own pronunciation library to cover such cases.
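The fix amounts to a small override table consulted before synthesis. A sketch is shown below; the table entries are illustrative, and a hypothetical synthesiser call would consume the result.

```python
# Override table for tokens whose default synthesis sounded wrong;
# the entries here are illustrative examples.
PRONUNCIATIONS = {"a": "ay", "z": "zed"}

def spoken_form(text):
    """Replace problem tokens with explicit phonetic spellings before
    handing the text to the speech synthesiser."""
    return " ".join(PRONUNCIATIONS.get(tok, tok) for tok in text.split())
```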

Software Evaluation

Educational Apps

We extensively tested all our apps and made many improvements along the way. For our educational apps, we drew on established sources on how to teach braille:
  • The Learn app is our basic teacher and has been successful in teaching members of our team the braille alphabet.

  • The Tutor app evaluates the knowledge acquired from the Learn app and reinforces the learning of characters that the user gets wrong. We have noticed our own test results improving gradually.

  • The Riddles app was developed to stimulate learning by giving the user interesting and comical riddles whose answers they can only find out by reading braille. This motivational approach has previously been adopted by Royal Blind, Scotland’s largest vision impairment organisation.

  • The Memory app implements the traditional Memory card game, for one or two players, where the ‘cards’ are individual cells, each displaying a braille alphabet character. The team has had a lot of fun practising and improving our braille!

Software Evaluation

Open SDK

One of the applications we developed is the “Headlines” app, which displays BBC news headlines and an optional short summary. The main purpose of developing this app was to evaluate whether our SDK is user-friendly for third-party developers with little knowledge of our hardware, so the app’s developer pretended to be a developer from the BBC. This was a useful exercise: it let us pinpoint a few functions that are necessary from the perspective of a third-party developer. We believe this test has helped us make the open SDK developer experience as straightforward as possible, and it also provides our users with a valuable application!
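As an illustration of the developer experience, here is a sketch of what the Headlines flow could look like against the SDK. Every name below (device.display, device.say, device.await_response) is a hypothetical stand-in, not the real SDK surface.

```python
def headlines_app(device, headlines):
    """Walk the user through a list of (title, summary) headline pairs."""
    for title, summary in headlines:
        device.display(title)  # render the headline on the braille cells
        device.say("Say 'summary', 'next' or 'exit'.")
        choice = device.await_response(["summary", "next", "exit"])
        if choice == "summary":
            device.display(summary)
            choice = device.await_response(["next", "exit"])
        if choice == "exit":
            break
```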

Software Evaluation

Main Areas of Improvement

From our user research session, we received the feedback that a human voice would be preferred for audio output, as "synthetic voices can get tedious". We have recorded the user guide, but did not have enough time to record all the sentences louis says. Adding the remaining recordings will be easy, as the software already supports this feature; developers adding apps through our open SDK can likewise provide their own audio files if they wish.

© 2020 ThreeDots (Group 10)