problem with zero duration audio during word-level alignment



Willem van der Walt


Mar 26, 2019, 7:33:38 AM

to Alberto Pettarin, aeneas-forc...@googlegroups.com

Good day,
I am using Speect as a custom TTS.
When Speect does not know how to say something, it returns no audio.
I now have some text containing Japanese characters, which our TTS cannot
pronounce.
If I return zero-duration audio from the custom script, the word-level
alignment fails, as shown in the attached log fragment.

It looks to me like audiofile.py produces the error when it tests for an
empty sample array in the returned values, but I am not sure.
In cases where I have no audio to return, what can I return that will keep
the rest of Aeneas happy?
I now have:
if not audio:
    return (True, (TimeValue("0.000"), 16000, "pcm16", numpy.array([])))

Something similar seems to work fine when the "word" is only
punctuation, e.g. an ellipsis symbol in the text; in that case, however,
I just return the following:
return (True, (TimeValue("0.000"), None, None, None))
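To decide inside the custom script which of the two return values applies, a check for punctuation-only fragments could look like the minimal sketch below. The helper name `is_punctuation_only` is hypothetical, and treating Unicode general categories P* (punctuation) and Z* (separators), plus whitespace, as "punctuation only" is an assumption rather than anything aeneas itself defines.

```python
import unicodedata

def is_punctuation_only(text):
    """Hypothetical helper: True if text contains no letters, digits or symbols.

    A character counts as "punctuation" here if its Unicode general category
    starts with "P" (punctuation) or "Z" (separators), or if it is whitespace.
    Empty text is treated as punctuation-only as well.
    """
    return all(
        unicodedata.category(ch)[0] in ("P", "Z") or ch.isspace()
        for ch in text
    )
```

With this check, an ellipsis ("\u2026", category Po) would take the punctuation branch, while Japanese characters (category Lo) would not, which is exactly the case that still needs some other return value.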

Although I am unlikely to encounter many Japanese characters, I will
probably encounter other text that our TTS produces no audio for and that
is not punctuation.
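For that non-punctuation case, one possible workaround, not verified against aeneas and purely an assumption, is to return a very short stretch of silence instead of an empty sample array, so that the sample-count check never sees zero samples. The sketch below mirrors the (success, (duration, sample_rate, codec, samples)) tuple shape from the snippets above; the helper name `silence_result` and the 10 ms default are hypothetical. It uses `Decimal` in place of aeneas' `TimeValue` (which subclasses `Decimal`) and a plain list in place of `numpy.array`, so it runs without aeneas or numpy installed.

```python
from decimal import Decimal

def silence_result(sample_rate=16000, seconds="0.010"):
    """Hypothetical fallback: describe `seconds` of digital silence.

    Returns (True, (duration, sample_rate, codec, samples)) where the
    duration and the number of zero-valued samples agree with each other.
    Decimal stands in for TimeValue and a list stands in for numpy.array.
    """
    duration = Decimal(seconds)
    n_samples = int(duration * sample_rate)  # e.g. 0.010 s * 16000 Hz = 160 samples
    samples = [0.0] * n_samples
    return (True, (duration, sample_rate, "pcm16", samples))
```

In the real custom script, the stand-ins would be swapped for `TimeValue(seconds)` and `numpy.zeros(n_samples)`. Whether audiofile.py and the word-level alignment accept such a short fragment would still need to be checked.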
Any ideas?
Kind regards, Willem