Textaloud Software

Otilia Mojarro

Aug 4, 2024, 4:43:19 PM

to fracovalex

The use of computer-generated, synthetic speech is becoming more and more mainstream. Popular operating systems such as Windows and Mac OS X include built-in text-to-speech (TTS) capabilities, and synthetic speech is also used in automated weather reports and numerous telephone services, to name but a few applications. Text file readers are a common class of speech apps, and as a nifty extra many of them let you turn text files into spoken audio directly. The resulting audio files can then be listened to on the road with a hardware mp3 player, stored efficiently on the computer for later listening, or burned onto a set of regular audio CDs.

As to the usefulness of such applications, I'd roughly divide the user base in two. Firstly, there are ordinary people who want to save a bit of time and give their eyes a rest, learn how a foreign language should be pronounced, or simply think synthetic speech is cool. The other class are people who must rely on synthetic speech on a daily basis, such as sight-impaired screen reader users. I'm personally a sight-impaired power user who knows a lot about synths and audio and is also familiar with programming, so I'll be concentrating on aspects that usually get very little attention, such as keyboard usability, screen reader accessibility, batch conversion capabilities and transparent SAPI 4 and SAPI 5 support. This page is a review of Nextup TextAloud, a popular try-before-you-buy text reader for Windows. I'm only covering the features I found most useful, though I do attempt to describe my experiences in a detailed and objective manner. I'm using the TextAloud trial version, and the version number at the time of writing is 2.064.

Reading text files out loud and changing voices is very easy. The user interface has a familiar text box for editing and selecting text, from now on referred to as the text editor. Above this control is a toolbar-like portion of the window that lets you quickly select the desired voice and its parameters, as well as read text aloud or dump it directly to an audio file. The voice controls are surprisingly accessible; there are push buttons with labels for common actions, and all controls appear to be part of the tab order. Oddly enough, however, pressing Shift+Tab repeatedly only cycles through a few of the controls and returns to the text editor after the title text field. The correct way to tab around, though slightly counterintuitive, is to press Shift+Tab to get to the voice controls and then tab forward. The current voice can't be selected from the menus, but besides the list box in the voice controls it can also be set in the options dialog.

Including a voice panel right in the document window can save a great deal of time compared to some freeware programs that only offer the voice parameters in their preferences. This is because, despite Microsoft's standardization efforts, synths have differing notions of speaking rate, pitch and volume values, and you often need to tweak the speed a bit to get it just right.

In addition to reading the whole document, the speak menu offers other choices such as reading the selection or starting from the cursor. However, there's no read paragraph or read to cursor option, both of which are commonly seen in screen reading software. Another minor gripe is that while the voice is speaking, there is no means of quickly stepping back or forward in units of words, sentences, paragraphs and so on. Real-time navigation with hotkeys would be highly useful, because it is easy to miss a sentence or wonder what a badly pronounced word really is. Sure, you can pause the voice, even using the menus, manually go back and finally restart reading, but it just isn't very convenient.
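
To make that wish concrete, here is a minimal Python sketch of sentence-level stepping of the kind a hotkey could drive. It is not how TextAloud or any screen reader works internally; the naive punctuation-based sentence split and the SentenceCursor class are assumptions made up for illustration.

```python
import re


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class SentenceCursor:
    """Hypothetical helper: remembers which sentence is being spoken so hotkeys can step around."""

    def __init__(self, text):
        self.sentences = split_sentences(text)
        self.index = 0

    def current(self):
        return self.sentences[self.index]

    def forward(self):
        # Stay on the last sentence rather than running off the end.
        self.index = min(self.index + 1, len(self.sentences) - 1)
        return self.current()

    def back(self):
        # Stay on the first sentence rather than going negative.
        self.index = max(self.index - 1, 0)
        return self.current()


cursor = SentenceCursor("It was late. The voice kept going! Did I miss that?")
cursor.forward()
print(cursor.back())  # prints: It was late.
```

Wired to hotkeys, back() and forward() would hand the chosen sentence to the synth and restart speech from there; the same idea extends to words or paragraphs by changing the split.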

On the bright side, controlling voice parameters like speed and pitch is possible even during playback and works better than I anticipated. But there's a significant flaw in keyboard usability: when you change the speed slider from the keyboard, as soon as the change takes effect, keyboard focus jumps to the text editor instead of remaining on the slider being adjusted. This makes real-time speed adjustment from the keyboard a very slow and frustrating procedure. As a workaround, the voice parameters can be changed one step at a time from the program's Edit menu. Additionally, I wish the voice sliders had accompanying text fields for typing in the desired values directly, for those times when you do know the exact value you'd like.
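
Such a text field would only need to parse what is typed and clamp it to the range the synth accepts. The sketch below assumes the ranges of the SAPI 5 SpVoice automation object (Rate from -10 to 10, Volume from 0 to 100); parse_typed_value is a hypothetical helper, not anything TextAloud exposes.

```python
# Assumed ranges, taken from the SAPI 5 SpVoice automation object.
SAPI5_RANGES = {"rate": (-10, 10), "volume": (0, 100)}


def parse_typed_value(param, typed):
    """Hypothetical: turn text typed next to a slider into a valid setting, or None."""
    low, high = SAPI5_RANGES[param]
    try:
        value = int(typed.strip())
    except ValueError:
        return None  # not a number: leave the slider alone and let the user retype
    # Clamp out-of-range input instead of rejecting it outright.
    return max(low, min(high, value))


print(parse_typed_value("rate", " 3 "))
print(parse_typed_value("volume", "150"))
```

Clamping rather than rejecting keeps the user in the field, which matters for the same focus reasons the slider problem above illustrates.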

Typing in text is not the only way to get it read; you can also capture text from the clipboard automatically as well as open files in various formats. In addition to the obligatory plain text, clean plain-text conversions of RTF, DOC, PDF and HTML files are supported, and they work moderately well. Not all DOC or RTF files seem to be openable, however, and some of them throw a "class not registered" exception.

One major point in usability is speaking a language that the user understands and making error messages clear and supportive. Showing raw error text straight from Windows is a good example of how not to do things. Such error messages are not unique to unsuccessfully opening DOC and RTF files but plague many of the speech-related errors, too. Technical information can help in troubleshooting situations, and it should be included. However, it ought not to be the primary focus of the message as far as the average end user is concerned. Even as a programmer, the only significant detail I've learned from the errors is that TextAloud uses Microsoft's OLE Automation for controlling speech synths, the same API I've been using from Perl.
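
As a sketch of the alternative, an error handler can lead with plain-language advice and keep the raw Windows text as a secondary detail. The example assumes the failure arrives as a COM HRESULT; 0x80040154 is REGDB_E_CLASSNOTREG, the "class not registered" error mentioned above, while the FRIENDLY table, its wording and describe_error are hypothetical.

```python
# Hypothetical mapping from COM HRESULTs to plain-language advice.
# 0x80040154 is REGDB_E_CLASSNOTREG ("Class not registered").
FRIENDLY = {
    0x80040154: "The document could not be opened because a Windows component "
                "it needs is not installed. Try saving it as plain text in the "
                "program that created it.",
}


def describe_error(hresult, system_text):
    """Put plain-language advice first and keep the technical detail below it."""
    code = hresult & 0xFFFFFFFF  # COM errors often arrive as negative 32-bit ints
    advice = FRIENDLY.get(code, "Something went wrong while talking to Windows.")
    return f"{advice}\n\nTechnical details: {system_text} (0x{code:08X})"


print(describe_error(-2147221164, "Class not registered"))
```

The technical line is still there for troubleshooting, but the user reads the advice first.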

Another shortcoming of the import facility is that, even though we are dealing with a text editor, big- and little-endian Unicode doesn't seem to be properly importable. Mac, Unix and DOS line endings, on the other hand, are handled correctly without any user involvement, which is cool. By default, capturing the clipboard will also prompt the user, but this prompt can be disabled so that data is copied directly to TextAloud. The trouble with the clipboard capture confirmation dialog is that it jumps to the foreground with a very short timeout, yet keyboard focus is not moved to the dialog. This makes it very difficult for sight-impaired people to interact with the clipboard dialog, and the default timeout is far too short for screen reader users to get a picture of what's going on.
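
For comparison, detecting Unicode byte order and line-ending style takes only a few lines. This Python sketch checks for a UTF-8 or UTF-16 byte-order mark (BOM) and normalizes DOS (CR LF), classic Mac (CR) and Unix (LF) line endings to LF. Falling back to the Windows-1252 code page when no BOM is present is an assumption; files without a BOM would need smarter guessing in practice.

```python
import codecs


def decode_text_file(raw: bytes) -> str:
    """Decode bytes using a byte-order mark if present, then unify line endings."""
    # Check the UTF-8 BOM and both UTF-16 BOMs; fall back to cp1252 (an assumption).
    if raw.startswith(codecs.BOM_UTF8):
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    elif raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = raw.decode("cp1252", errors="replace")
    # DOS uses CR LF, classic Mac OS uses CR, Unix uses LF: normalize all to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


big_endian = codecs.BOM_UTF16_BE + "Hello\r\nworld".encode("utf-16-be")
print(decode_text_file(big_endian).splitlines())
```

Replacing CR LF before lone CR matters; doing it the other way round would turn every DOS line break into two.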

Saving text to an audio file in TextAloud is a breeze. There's a dedicated button for it and a matching command in the Speech menu. The only thing you need to provide is the folder in which the audio file should be saved, and even this prompt can be disabled in the options. The old DOS term "directory" is used needlessly, though, and the ability to specify a different audio file name would have been nice. Fortunately, the defaults are smart, using the same base name as an imported file or the first few words of the text. Another thing missing from the dialog that starts audio file writing is the choice of format. While you can usually go with the defaults, and the format can be changed in the options, a per-conversion override of the defaults would sometimes come in handy. Two examples are creating both a high- and a low-quality version for streaming, or previewing different amounts of compression. A dedicated preview function for compressed audio files would be a welcome addition, too, though sure, you can't have everything.
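
The default naming rule described above can be sketched as follows. The five-word limit, the stripping of characters Windows forbids in file names, the "untitled" fallback and the .mp3 extension are all assumptions; TextAloud's exact rules aren't documented in this review, and default_audio_name is a hypothetical function.

```python
import os
import re


def default_audio_name(text, source_path=None, words=5, ext=".mp3"):
    """Hypothetical: pick an output file name like the defaults described above."""
    if source_path:
        # Reuse the imported file's base name, e.g. report.doc -> report.mp3
        base = os.path.splitext(os.path.basename(source_path))[0]
    else:
        # Otherwise use the first few words, dropping characters Windows forbids.
        first = " ".join(text.split()[:words])
        base = re.sub(r'[\\/:*?"<>|]', "", first) or "untitled"
    return base + ext


print(default_audio_name("ignored", "docs/report.doc"))
print(default_audio_name("Chapter 1: The Storm? It begins here", words=4))
```

A per-conversion format override would then just mean passing a different ext (and encoder settings) for that one job.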

One thing I see as a definite advantage of TextAloud is its amazing audio writing speed. It can be adjusted between 1x and 150x, where 1x corresponds to the speed you'd get by recording the spoken audio directly in real time. Naturally, higher values take more CPU time. But even the highest setting takes only about 30 percent of the CPU on a fast machine, so chances are it could go even faster if the implementation permitted. Extremely fast reading to a wave file is a true time saver and is not found in any of the free text readers I've come across. A big plus for this feature alone.
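
Put in numbers, and assuming the speed factor simply divides the real-time duration and the CPU keeps up, the savings look like this:

```python
def conversion_time(audio_seconds: float, speed: float) -> float:
    """Wall-clock seconds needed to write audio_seconds of speech at a speed factor."""
    if not 1 <= speed <= 150:
        raise ValueError("TextAloud's speed setting runs from 1x to 150x")
    return audio_seconds / speed


# A 60-minute recording: 3600 s at 1x, 360 s at 10x, 24 s at 150x.
for speed in (1, 10, 150):
    print(f"{speed:>3}x: {conversion_time(3600, speed):g} s")
```

In other words, an hour of speech that would take an hour to record in real time comes out in well under half a minute at the top setting.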

As far as audio formats go, uncompressed wave files with different sampling rate and channel options, as well as mp3 and WMA files, are supported, unlike in most, if not all, of the free competitors. The choice of audio parameters is very wide, though oddly enough mp3 files can be written in stereo, which is rarely necessary for a single synthetic voice and most likely a user mistake. When you select the WMA option, a message box pops up leading you to the necessary WMA download. While this is nice, the way the information is presented goes against common usability principles. A keyboard user usually cursors through the format list, and on arriving at WMA, focus is suddenly taken away from the list, which prevents you from cursoring past the WMA choice should there be other formats after it. It would have been a lot subtler to handle this special case without stealing focus. One way is to show the message in a read-only text box and disable the OK button when WMA is selected.

Another minor gripe is that not all Windows-supported audio formats can be selected. You cannot choose which mp3 codec (compressor/decompressor) is used, and other wave variants like TrueSpeech or ADPCM are unsupported. While it is true that mp3 and WMA are the most popular formats, even Sound Recorder is able to interface with the rest of the codecs on offer. For an even wider palette of format support, running external encoder programs such as oggenc ought to be supported. However, it should be noted that one can convert to wave first and then do any desired post-processing afterwards, so the lack of sound formats is not a real issue. One small but elegant touch is that, if desired, TextAloud can make the sampling rate, bit depth and number of channels used in the audio file match those of the speech synth automatically. For example, AT&T Natural Voices use 16 kHz whereas Microsoft Voices use 22 kHz. Not requiring the user to know this kind of detailed info is a sign of a smart program.
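
Matching the output to the voice mostly comes down to writing the WAV header with the synth's own parameters. The sketch uses Python's standard wave module; the format table is an assumption based on the figures quoted above (22 kHz taken as 22050 Hz, 16-bit mono for both voices), whereas a real program would query the voice for its native format rather than hard-coding it.

```python
import io
import wave

# Assumed native formats; a real program would ask the synth instead.
VOICE_FORMATS = {
    "AT&T Natural Voices": {"rate": 16000, "channels": 1, "sampwidth": 2},
    "Microsoft Voices": {"rate": 22050, "channels": 1, "sampwidth": 2},
}


def write_matching_wav(voice, pcm: bytes, out) -> None:
    """Write raw PCM from the synth into a WAV container using the voice's own format."""
    fmt = VOICE_FORMATS[voice]
    with wave.open(out, "wb") as w:
        w.setnchannels(fmt["channels"])
        w.setsampwidth(fmt["sampwidth"])  # bytes per sample, 2 = 16-bit
        w.setframerate(fmt["rate"])
        w.writeframes(pcm)


buf = io.BytesIO()
write_matching_wav("Microsoft Voices", b"\x00\x00" * 22050, buf)  # one second of silence
```

Writing at the voice's native rate avoids a pointless resampling step and the file size that comes with upsampling.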

Another slight niggle of mine is that internally the audio is first written to a file called temp.wav, which is then converted to the desired format if necessary and deleted when processing has completed. While this isn't a problem for most people, it does mean you'll need enough disk space for the uncompressed audio, unlike with some CD rippers such as CDex. I see two ways of getting around the problem. The first is to keep small files in physical RAM, up to a user-configurable limit. The second is to feed small buffers of audio data through the codec in real time, like mp3 audio recorders do. Again, the lack of direct mp3 or WMA writing is no biggie unless you are converting large files and are tight on disk space.
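
The second approach can be sketched as a loop that pushes each chunk through an encoder as soon as the synth produces it, so only one small buffer is held at a time and nothing uncompressed touches the disk. zlib is used purely as a stand-in with the same compress/flush shape as a streaming codec; a real implementation would call an mp3 or WMA encoder, whose API is not assumed here, and stream_encode is a hypothetical name.

```python
import io
import zlib


def stream_encode(pcm_chunks, out, encoder=None):
    """Feed small buffers of audio through an encoder as they arrive.

    zlib stands in for a real audio encoder; the point is that no
    full-length temp.wav is ever written.
    """
    enc = encoder or zlib.compressobj()
    for chunk in pcm_chunks:  # e.g. a few KB handed over by the synth
        out.write(enc.compress(chunk))
    out.write(enc.flush())  # emit whatever the encoder still buffers


chunks = (bytes([i]) * 4096 for i in range(10))  # ten fake 4 KB audio buffers
result = io.BytesIO()
stream_encode(chunks, result)
```

Because the chunks come from a generator, peak memory use stays at roughly one buffer regardless of how long the text is.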