best practice for specifying pronunciation?


greg

Aug 17, 2010, 11:38:26 PM
to TTS-for-Android
I'm impressed with Android's default text to speech engine (i.e.,
com.svox.pico). As expected, it mispronounces some words (as do I) and
it therefore occasionally needs some pronunciation guidance. So I'm
wondering about best practices for phonetically spelling out those
words that the pico TTS engine mispronounces.

For example, the correct pronunciation of the bird Chachalaca is
CHAH-chah-LAH-kah. Here is what the TTS engine produces:

mTts.speak("Chachalaca", TextToSpeech.QUEUE_ADD, null); // output: chuh-KAL-uh-KUH
mTts.speak("CHAH-chah-LAH-kah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-EL-AY-AYCH-dash-kuh
mTts.speak("CHAHchahLAHkah", TextToSpeech.QUEUE_ADD, null); // output: CHA-chah-LAH-ka
mTts.speak("CHAH chah LOCKah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-LAH-kah
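
One way to keep respellings like these manageable is to hold them in a single lookup table and substitute them just before the text is passed to speak(). The following is only a sketch under that assumption: PronunciationOverrides and its methods are hypothetical names, not part of the Android API, and the one respelling used is simply the variant that happened to work with Pico in the tests above. A different engine or engine version may need a different table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: swaps known-mispronounced words for respellings. */
final class PronunciationOverrides {
    // A "word" here is any run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Keys are stored lower-cased so lookups are case-insensitive.
    private final Map<String, String> respellings = new LinkedHashMap<>();

    /** Registers a respelling for a word and returns this object for chaining. */
    PronunciationOverrides add(String word, String respelling) {
        respellings.put(word.toLowerCase(Locale.ROOT), respelling);
        return this;
    }

    /** Returns the text with every registered whole word replaced. */
    String apply(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hit = respellings.get(m.group().toLowerCase(Locale.ROOT));
            // Unregistered words are copied through unchanged.
            String replacement = (hit != null) ? hit : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}

// Possible use with the mTts object from this post (Android code, not run here):
//   PronunciationOverrides fixes =
//       new PronunciationOverrides().add("Chachalaca", "CHAH chah LOCKah");
//   mTts.speak(fixes.apply(text), TextToSpeech.QUEUE_ADD, null);
```

Keeping the table in one place means the respellings can be re-tested and adjusted together if the engine changes.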

Here are my questions.

1) Is there a standard phonetic spelling recognized by the Android TTS
engine?
2) If not, are there some general rules for making custom
pronunciation spellings that will make the spellings more likely to be
correct in future TTS engines/versions?
3) I tried passing an XML document to TextToSpeech.speak() as follows:

String text = "<?xml version=\"1.0\"?>" +
"<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
"xsi:schemaLocation=\"http://www.w3.org/2001/10/synthesis " +
"http://www.w3.org/TR/speech-synthesis/synthesis.xsd\" " +
"xml:lang=\"en-US\">" +

"That is a big car! " +
"That <emphasis>is</emphasis> a big car! " +
"That is a <emphasis>big</emphasis> car! " +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

Although the TTS engine correctly read only the XML body (i.e., the
comments about the big car), I did not hear any emphasis in the TTS
output. What is the best way to specify emphasis?


- Greg

Johan Wouters

Aug 31, 2010, 10:37:39 AM
to tts-for...@googlegroups.com
Hi Greg,

The Pico engine recognizes the <phoneme> tag with the XSAMPA alphabet.

There are no easy rules to derive a certain pronunciation from the
orthography, but you can use intuitive spellings and trial and error.
Capitalization and hyphens will introduce more problems than they solve.
Using different spellings and introducing extra word boundaries (spaces) can
work.

The emphasis tag and the exclamation mark will not change the synthesis
result. Use <pitch>, <rate>, and <volume> commands instead.

Johan


greg

Sep 3, 2010, 12:15:39 AM
to TTS-for-Android
Hi Johan.

I'm not getting pico to act upon the <phoneme> tag or any other SSML
tag. Perhaps the problem is due to an XML parsing error that is
reported in logcat. Unfortunately, I'm not seeing this parsing error
in the XML text. Here are some tests I've run. It appears that any
text with a tag produces the "Parser error at line 1: not well-formed
(invalid token)" error entry in logcat. Any tips or links to
documentation would be very much appreciated.

- - -

text = "first hello";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

says "first hello" and produces the following log entries:

V/TtsService( 516): TtsService.onCreate()
V/TtsService( 516): About to load /system/lib/libttspico.so, applyFilter=true
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): loaded en-US successfully
I/SynthProxy( 516): setting speech rate to 100
I/TTS received: ( 1281): first hello
V/TtsService( 516): TTS service received first hello
V/TtsService( 516): TTS processing: first hello
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)
I/SynthProxy( 516): setting speech rate to 100
I/SynthProxy( 516): setting pitch to 100
W/AudioFlinger( 34): write blocked for 1531 msecs, 44 delayed writes, thread 0xb3a0

- - -

text = "<speak xml:lang=\"en-US\">" +
" second hello " +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

says "second hello" and produces the following log entries:

V/TtsService( 516): TtsService.onCreate()
V/TtsService( 516): About to load /system/lib/libttspico.so, applyFilter=true
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): loaded en-US successfully
I/SynthProxy( 516): setting speech rate to 100
I/TTS received: ( 1324): <speak xml:lang="en-US"> second hello </speak>
V/TtsService( 516): TTS service received <speak xml:lang="en-US"> second hello </speak>
V/TtsService( 516): TTS processing: <speak xml:lang="en-US"> second hello </speak>
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)
I/SynthProxy( 516): setting speech rate to 100
I/SynthProxy( 516): setting pitch to 100
I/ ( 516): Parser error at line 1: not well-formed (invalid token)
I/SVOX Pico Engine( 516): Warning: SSML document parsed with errors
I/SVOX Pico Engine( 516): Found supported locale en-US
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)
W/AudioFlinger( 34): write blocked for 98 msecs, 45 delayed writes, thread 0xb3a0

- - -

text = "<?xml version=\"1.0\" ?>" +
"<speak xml:lang=\"en-US\">" +
" third hello " +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

says "third hello" and produces the following log entries:

V/TtsService( 516): TtsService.onCreate()
V/TtsService( 516): About to load /system/lib/libttspico.so, applyFilter=true
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): loaded en-US successfully
I/SynthProxy( 516): setting speech rate to 100
I/TTS received: ( 1409): <?xml version="1.0" ?><speak xml:lang="en-US"> third hello </speak>
V/TtsService( 516): TTS service received <?xml version="1.0" ?><speak xml:lang="en-US"> third hello </speak>
V/TtsService( 516): TTS processing: <?xml version="1.0" ?><speak xml:lang="en-US"> third hello </speak>
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)
I/SynthProxy( 516): setting speech rate to 100
I/SynthProxy( 516): setting pitch to 100
I/ ( 516): Parser error at line 1: not well-formed (invalid token)
I/SVOX Pico Engine( 516): Warning: SSML document parsed with errors
I/SVOX Pico Engine( 516): Found supported locale en-US
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)
W/AudioFlinger( 34): write blocked for 71 msecs, 47 delayed writes, thread 0xb3a0

- - -

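text = "<?xml version=\"1.0\" ?>" +
"<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
"xsi:schemaLocation=\"http://www.w3.org/2001/10/synthesis " +
"http://www.w3.org/TR/speech-synthesis/synthesis.xsd\" " +
"xml:lang=\"en-US\">" +
" fourth hello " +
"</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);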

says "fourth hello" and produces the following log entries:

V/TtsService( 516): TtsService.onCreate()
V/TtsService( 516): About to load /system/lib/libttspico.so, applyFilter=true
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): loaded en-US successfully
I/SynthProxy( 516): setting speech rate to 100
I/TTS received: ( 1452): <?xml version="1.0" ?><speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> fourth hello </speak>
V/TtsService( 516): TTS service received <?xml version="1.0" ?><speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> fourth hello </speak>
V/TtsService( 516): TTS processing: <?xml version="1.0" ?><speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> fourth hello </speak>
V/TtsService( 516): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)
I/SynthProxy( 516): setting speech rate to 100
I/SynthProxy( 516): setting pitch to 100
I/ ( 516): Parser error at line 1: not well-formed (invalid token)
I/SVOX Pico Engine( 516): Warning: SSML document parsed with errors
I/SVOX Pico Engine( 516): Found supported locale en-US
I/SVOX Pico Engine( 516): Language already loaded (en-US == en-US)



greg

Sep 5, 2010, 12:31:30 AM
to TTS-for-Android
Thanks for the tip Johan. There are some helpful examples of the
syntax of the phoneme tag at
https://android.git.kernel.org/?p=platform/external/svox.git;a=commitdiff;h=89292811b7fe82e5c14fa13942779763627e26db

// Testing actor
text = "<speak xml:lang=\"en-US\"> Testing " +
"<phoneme alphabet=\"xsampa\" ph=\"&#34;{k.t@`\"/>.</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

Oddly enough, even these examples are causing the following logcat
entries:

I/ ( 291): Parser error at line 1: not well-formed (invalid token)
I/SVOX Pico Engine( 291): Warning: SSML document parsed with errors

However, now that I have working examples of the phoneme tag, I am
comfortable ignoring the parser error warnings written to logcat.

- Greg
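
The X-SAMPA primary stress mark is a double quote, which is why the ph attribute in the example above carries it as &#34;. A small helper can do that escaping so the transcription can be written plainly in Java. This is only a sketch: PhonemeSsml, escape() and speakPhoneme() are hypothetical names, and the markup it emits simply mirrors the one form shown in this post rather than any documented list of what Pico accepts.

```java
/** Hypothetical helper: builds a <speak> string containing one <phoneme> tag. */
final class PhonemeSsml {

    /** Escapes the characters that would otherwise break the XML markup. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;"); break;
                case '<':  sb.append("&lt;");  break;
                case '>':  sb.append("&gt;");  break;
                case '"':  sb.append("&#34;"); break; // X-SAMPA primary stress
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns leadIn followed by a word given as an X-SAMPA transcription. */
    static String speakPhoneme(String lang, String leadIn, String xsampa) {
        return "<speak xml:lang=\"" + escape(lang) + "\"> "
                + escape(leadIn)
                + " <phoneme alphabet=\"xsampa\" ph=\"" + escape(xsampa) + "\"/>"
                + ".</speak>";
    }
}

// Possible use with the mTts object from this thread (Android code, not run here):
//   String text = PhonemeSsml.speakPhoneme("en-US", "Testing", "\"{k.t@`");
//   mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
```

With the arguments shown in the usage comment, the helper returns the same string as the hand-written "Testing actor" example.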

Johan Wouters

Sep 6, 2010, 11:21:07 AM
to tts-for...@googlegroups.com
Hi Greg,

This warning comes from the Expat XML parser integrated in the Pico Java
layer. The parser seems to always report "invalid token" even when the XML
document is well formed. Perhaps there are some stray characters at the
end of the input stream inside Expat? It does not affect the outcome of the
parsing, so for the time being you can ignore the warning.
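
To rule out a genuine markup mistake before ignoring the warning, the SSML string can be run through a standard XML parser on its own. The sketch below uses the JDK's javax.xml.parsers API; SsmlWellFormedness and isWellFormed() are hypothetical names, and the check says nothing about which tags Pico actually supports, only whether the text is well-formed XML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: checks an SSML string for XML well-formedness only. */
final class SsmlWellFormedness {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation surfaces as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for in-memory input; treated as a setup failure.
            throw new IllegalStateException(e);
        }
    }
}
```

If a string passes this check and logcat still reports "invalid token", that is consistent with the warning being spurious, as described above.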
