Word level time stamp creation or word level alignment.

bharathydasan m

Nov 3, 2015, 6:38:00 AM

to aeneas-forced-alignment

Hi,

I am trying to use aeneas to create a time stamp for each word in my audio files. It creates the time stamps, but the accuracy is very poor. Is there any special way to obtain time stamps with higher accuracy?

I exported the time stamps to Audacity to validate them.

Kindly help.

Thanks,
Bharathydasan.

Alberto Pettarin

Nov 3, 2015, 11:10:28 AM

to aeneas-forc...@googlegroups.com

On 11/03/2015 12:38 PM, bharathydasan m wrote:
> Hi,
>
> I am trying to use the aeneas to create time stamp for each words in the
> audio files, it creates the time stamps but the accuracy is very poor.
> Is there any special ways to find the time stamps with high accuracy level?
>
> i exported the time stamps to audacity to validate them.

Hi,

In what language are your materials? Does the spoken audio match the
written text? Is the audio sung, or does it contain noise, music, etc.? If
you align the same audio file and the same text, but not at word level,
does the alignment improve? Can you share the materials?

aeneas was not built with word-level alignment in mind, but I did a few
tests in a couple of European languages (English, Italian, French,
German, Spanish) with audiobook-like audio, and the alignment was decent.

Someone previously commented that for word-level alignment the quality
of the TTS plays a big role --- which is understandable --- and for sure
eSpeak is not the best available TTS out there, but it is free software.

AP

Alberto Pettarin

Nov 3, 2015, 11:32:12 AM

to aeneas-forc...@googlegroups.com

On 11/03/2015 05:13 PM, Alberto Pettarin wrote:
> Someone previously commented that for word-level alignment the quality
> of the TTS plays a big role --- which is understandable --- and for sure
> eSpeak is not the best available TTS out there, but it is free software.

The comment I was referring to is this one:

https://groups.google.com/d/msg/aeneas-forced-alignment/HqIqixxIc8s/gcibB7_rAwAJ

=== === ===

Assuming there is no evident problem with your audio or text files or
any setup problem (please run the built-in examples to check it!), you
might try setting the MFCC_FRAME_RATE in globalconstants.py to a higher
value, since it will increase the temporal resolution.

AP

bharathydasan m

Nov 4, 2015, 5:08:40 AM

to aeneas-forced-alignment

Hi,

I have attached the audio and text files for your reference. The language of the audio file is English. Kindly suggest some ways to improve the accuracy.

For now I use the following method to do the word alignment:

Step 1: Run the Python command
python -m aeneas.tools.execute_task sample1.mp3 text1.txt "task_language=en|os_task_file_format=tsv|is_text_type=plain" out.txt

Step 2: Open the audio file in Audacity and import out.txt as labels.

Step 3: In Audacity, correct the word-level alignment by checking each word.

Step 4: When done, export the labels to Excel.

Step 3 takes the most time, because that is where I verify the word alignment: about 1 hour for 1 minute of audio, at an average of 160 words per minute.
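
For readers reproducing Step 1 at word level, here is a minimal sketch of preparing the text file. It assumes that with `is_text_type=plain` each line of the text file is treated as one fragment, so a word-level run needs one word per line. The helper name `words_one_per_line` is hypothetical and not part of aeneas.

```python
def words_one_per_line(text):
    """Return the text with one whitespace-separated word per line.

    Punctuation stays attached to its word; empty input gives an empty string.
    """
    words = text.split()
    return ("\n".join(words) + "\n") if words else ""


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the attached materials.
    sample = "Hello world, this is\na test."
    print(words_one_per_line(sample), end="")
```

The result can be saved as the text file passed to the command in Step 1.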

I am sure that aeneas can produce 99% to 100% accuracy for sentence-level alignment. Word-level alignment, however, needs improvement. Even so, it already saves a lot of time compared to manual effort, so it is well worth using.

This is a great work.

Thanks,
Bharathydasan.

audio file with text.zip

Alberto Pettarin

Nov 4, 2015, 5:36:40 AM

to aeneas-forc...@googlegroups.com

Thank you for providing the sample, I will investigate the issue with
your specific materials.

A few notes:

1. I am not really interested in word-level synchronization, and I think
it is a bad idea to synchronize materials like audiobooks at word level
if the goal is synchronous highlighting of the text (while it is OK for,
e.g., children's books, where the audio is slow and the text is short).

2. Anyway, in a previous thread I stated that I am investigating the
possibility of using a commercial TTS like Nuance instead of eSpeak. I
did so in the context of supporting languages (like Arabic) that are not
available in eSpeak (the TTS aeneas uses):
https://groups.google.com/forum/#!topic/aeneas-forced-alignment/gC0k9q-PRpk

Of course, adding support for these commercial TTS should solve the
issue about word-level alignment in aeneas. The downside is that each
user will have to register an account with --- say, Nuance --- to get
the API key for accessing the cloud-based TTS. Luckily, they usually
provide a free tier (Nuance: 20,000 calls per month), and they are
pretty cheap afterwards.

Now, as I said above, word-level sync is not at the top of my list, but
if you or your company are interested in sponsoring the development of
this feature, I might move it to the top.

3. Listening to your audio, it looks like you are using it for a
commercial product. If so, you might want to save your time and use one
of the commercial (paid) services providing automated word-level
synchronization. By searching this Google Group, you will find a link to
one of them.

4. Not sure if relevant (it depends on whether you are fine working with
Audacity or not): since v1.3.1, you can use the finetuneas HTML tool to
fine-tune the timings.

AP

> --
> You received this message because you are subscribed to the Google
> Groups "aeneas-forced-alignment" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to aeneas-forced-ali...@googlegroups.com
> <mailto:aeneas-forced-ali...@googlegroups.com>.
> To post to this group, send email to
> aeneas-forc...@googlegroups.com
> <mailto:aeneas-forc...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/aeneas-forced-alignment/55c4c39a-019e-464e-99f0-1d3f3ac3a6e3%40googlegroups.com
> <https://groups.google.com/d/msgid/aeneas-forced-alignment/55c4c39a-019e-464e-99f0-1d3f3ac3a6e3%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
Alberto Pettarin

web: http://readbeyond.it/
web: http://www.albertopettarin.it/
twitter: http://twitter.com/acutebit/
skype: alberto_pettarin
mobile: +39 340 82 18 704

Alberto Pettarin

Nov 4, 2015, 5:55:33 AM

to aeneas-forc...@googlegroups.com

On 11/04/2015 11:39 AM, Alberto Pettarin wrote:
> Thank you for providing the sample, I will investigate the issue with
> your specific materials.

Please find in attachment the output of aeneas, at word-level, on your
sample (word.tsv).

I also attach the JSON file for finetuneas and the audacity.tsv where I
pasted the text, to help visualize the alignment.

To me, the alignment looks pretty good. Sure, it is not perfect, but I
think it is about as good as you can get for the money you paid for it.

(Furthermore, I am not sure how worthwhile it is to insist on
millisecond precision, given the temporal resolution of the human ear
and the fact that reading/displaying applications will probably
introduce unpredictable delays at run time.)

When I have some more time, I will try using the Nuance TTS and
see if it improves further.

AP

audacity.tsv

word.tsv

word.tsv.json

Alberto Pettarin

Nov 5, 2015, 8:13:26 AM

to aeneas-forc...@googlegroups.com

On 11/04/2015 11:58 AM, Alberto Pettarin wrote:
> On 11/04/2015 11:39 AM, Alberto Pettarin wrote:
>> Thank you for providing the sample, I will investigate the issue with
>> your specific materials.
>
> Please find in attachment the output of aeneas, at word-level, on your
> sample (word.tsv).

Attached is the output of aeneas, at word level, using Nuance TTS
instead of eSpeak. (I had to cut it a bit, since Nuance only allows 500
transactions/day in the free account.)

The output does not look much different to me.

(BTW, at the beginning, the "narrated by X Y" phrase is missing in the
text file, but it is spoken in the audio.)

AP

audacity375_espeak.tsv

audacity375_nuance.tsv

word375.json

word375.tsv

word375.txt

Xavier Anguera

Nov 5, 2015, 8:26:20 AM

to aeneas-forc...@googlegroups.com

Alberto,

In my research a while ago, using TTS to bridge the gap between text and speech (I shared the paper earlier), I noticed a great improvement when eliminating silence regions between words in both sources.

It should be easy for you to implement that in aeneas. I do not believe it would solve all the alignment problems that currently exist at word level (you know what I think about this technique; it is why I abandoned it), but it may make things a bit better.
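
As an illustration of the silence-elimination idea, here is a minimal sketch using a simple RMS energy threshold on fixed-size frames. This is an assumed, simplified technique, not the method from the paper mentioned above and not anything aeneas implements. The function names, frame length and threshold are all hypothetical.

```python
import math


def frame_rms(samples, frame_len):
    """Split samples into consecutive non-overlapping frames.

    Returns a list of (frame, rms) pairs. A trailing partial frame is dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        frames.append((frame, rms))
    return frames


def drop_silence(samples, frame_len, threshold):
    """Return the samples with every frame whose RMS is below threshold removed."""
    kept = []
    for frame, rms in frame_rms(samples, frame_len):
        if rms >= threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    # Toy signal: 8 "speech" samples, 8 silent samples, 8 "speech" samples.
    speech = [0.5, -0.5] * 4
    signal = speech + [0.0] * 8 + speech
    print(len(signal), len(drop_silence(signal, frame_len=4, threshold=0.1)))
```

The same filter would have to be applied to both the recorded and the synthesized audio. Because removing frames shifts the timeline, the resulting time stamps would need to be mapped back to the original audio, which this sketch does not attempt.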

Let me know how it goes in case you try it.

X. Anguera


Alberto Pettarin

Nov 5, 2015, 8:37:14 AM

to aeneas-forc...@googlegroups.com

Thank you for your remarks.

I forgot to add that the Nuance test of the previous post was done using
the default 40ms window. As I said, I think there is little point in
trying to separate contracted/short sounds like the "in" in the attached
screenshot.

I have not had time to test with smaller windows, but surely I will keep
in mind the observation about silences, thank you.

AP


short.png

bharathydasan m

Nov 6, 2015, 4:02:05 AM

to aeneas-forced-alignment

Hi,

I will check these and post the results later.

Thanks,
Bharathy Dasan M.

Alberto Pettarin

Nov 9, 2015, 12:32:56 PM

to aeneas-forc...@googlegroups.com

On 11/06/2015 10:02 AM, bharathydasan m wrote:
> Hi,
>
> I will check these and post you the results latter.

Have you tried increasing the parameter MFCC_FRAME_RATE in
globalconstants.py as I suggested in a previous email?

For example, setting it to 100 (instead of the default value 25) seems
to improve the alignment at word level a bit, even with the stock espeak
TTS, see the attached screenshot and output files.
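
To make the effect of this parameter concrete, here is a small arithmetic sketch. It assumes MFCC_FRAME_RATE is expressed in frames per second, which is consistent with the default of 25 corresponding to the 40 ms window mentioned earlier in the thread. The function names are hypothetical, and the 160 words per minute figure is the speaking rate reported for the sample audio.

```python
def frame_shift_ms(frame_rate):
    """Time step between consecutive frames, in milliseconds."""
    return 1000.0 / frame_rate


def frames_per_word(frame_rate, words_per_minute):
    """Average number of frames covering one word at the given speaking rate."""
    seconds_per_word = 60.0 / words_per_minute
    return seconds_per_word * frame_rate


if __name__ == "__main__":
    for rate in (25, 50, 75, 100):
        print(rate, frame_shift_ms(rate), frames_per_word(rate, 160))
```

Under these assumptions, a rate of 25 gives a 40 ms step and a rate of 100 gives a 10 ms step, so each word boundary can be placed on a four times finer grid.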

AP

audacity_mfcc100.tsv

screen.png

word_mfcc100.tsv

bharathydasan m

Nov 11, 2015, 10:10:53 AM

to aeneas-forced-alignment

I tried increasing MFCC_FRAME_RATE to 50 and 75, and there was some improvement as the value increased.

I have not tried 100 yet; I will try that one too.

bharathydasan m

Nov 11, 2015, 10:27:05 AM

to aeneas-forced-alignment

Hi,

I have attached the manually processed Audacity output for your reference. Comparing it with the recent output shows a good improvement in the results.

Manual Process.txt
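
A comparison like this can be quantified with a short script. The sketch below assumes both files contain tab-separated rows of start time, end time and label (the layout of an Audacity label export), with the same words in the same order. The function names and the sample values are hypothetical, not taken from the attachments.

```python
def parse_labels(text):
    """Parse 'start<TAB>end<TAB>label' lines into (start, end, label) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        rows.append((float(parts[0]), float(parts[1]), parts[2]))
    return rows


def mean_abs_start_error(auto, manual):
    """Mean absolute difference between start times, pairing rows by position."""
    if len(auto) != len(manual):
        raise ValueError("label lists must contain the same number of rows")
    if not auto:
        return 0.0
    return sum(abs(a[0] - m[0]) for a, m in zip(auto, manual)) / len(auto)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    auto_text = "0.000\t0.400\tHello\n0.400\t0.900\tworld\n"
    manual_text = "0.000\t0.350\tHello\n0.350\t0.900\tworld\n"
    error = mean_abs_start_error(parse_labels(auto_text), parse_labels(manual_text))
    print("mean start-time error (s):", round(error, 3))
```

Running it once per MFCC_FRAME_RATE setting against the same manual labels would show whether a higher value actually reduces the error on this sample.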

Alberto Pettarin

Nov 13, 2015, 11:50:09 AM

to aeneas-forc...@googlegroups.com

Thank you, it will be useful when investigating word granularity.

AP

