parameter suggestions for word alignment?

Willem van der Walt

Oct 20, 2016, 7:47:50 AM
to aeneas-forc...@googlegroups.com
Hi,
I am experimenting with the mplain format and would like to know whether
there are any suggested parameter settings for word alignment.
I realize that such a suggestion might just be a starting point.
TIA, Willem


--

This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

Please consider the environment before printing this email.

Alberto Pettarin

Oct 20, 2016, 3:46:35 PM
to aeneas-forc...@googlegroups.com
Dear Willem, dear all,

I have not experimented with word-level alignment as much as I wished.
However, the current version of aeneas supports the following runtime
parameters, which might prove useful when working with multilevel formats
(mplain or munparsed). Specify them with the -r switch of the command line
tools, or programmatically via a RuntimeConfiguration object.

A. To modify the granularity of time slices:

mfcc_window_length_l1 (default: 0.500)
mfcc_window_length_l2 (default: 0.100)
mfcc_window_length_l3 (default: 0.020)

mfcc_window_shift_l1 (default: 0.200)
mfcc_window_shift_l2 (default: 0.040)
mfcc_window_shift_l3 (default: 0.005)
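
To get a feel for what these defaults mean, the arithmetic below converts each level's window shift into frames per second and shows how much consecutive windows overlap. It is only a sketch: the dictionary and the function name are illustrative and not part of aeneas; the numbers are the defaults listed above.

```python
# Defaults quoted above: (window length, window shift) in seconds, per level.
DEFAULTS = {
    "l1": (0.500, 0.200),
    "l2": (0.100, 0.040),
    "l3": (0.020, 0.005),
}

def granularity(length, shift):
    """Return (frames per second, fraction of a window shared with the next)."""
    frames_per_second = 1.0 / shift
    overlap = 1.0 - shift / length
    return frames_per_second, overlap

for level, (length, shift) in sorted(DEFAULTS.items()):
    fps, overlap = granularity(length, shift)
    print("%s: %.0f frames/s, %.0f%% overlap" % (level, fps, overlap * 100))
```

At the default L3 settings this works out to about 200 frames per second, against 5 at L1, which is why the word level is the most expensive one to align.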

B. To change the TTS for each level:

tts_l1 and tts_path_l1
tts_l2 and tts_path_l2
tts_l3 and tts_path_l3

Someone reported that using festival instead of eSpeak improved the sync
quality. I have not done extensive testing to confirm this. As a rule of
thumb, I expect that the quality of the synchronization improves when
using a better TTS.

C. To speed the synthesis up, one can set to "True" the

tts_cache

rconf parameter, which will cache synthesized text fragments, avoiding
synthesizing the same text more than once. This is especially useful at
word-level on long texts, since many words appear several times. (This
rconf parameter is also useful when using paid SaaS TTS services, like
the built-in Nuance TTS API wrapper.)
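
The effect of caching can be illustrated with a toy memoizing wrapper. This is not the aeneas implementation: `fake_synthesize` is a hypothetical stand-in for a real TTS call, and a plain dictionary is simply the smallest structure that shows the idea.

```python
def fake_synthesize(text):
    """Hypothetical stand-in for an expensive TTS call."""
    return "<wave for %r>" % text

class CachingSynthesizer(object):
    """Synthesize each distinct text only once and reuse the result."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._cache = {}
        self.calls = 0  # number of real TTS calls made so far

    def get(self, text):
        if text not in self._cache:
            self.calls += 1
            self._cache[text] = self._synthesize(text)
        return self._cache[text]

words = "the cat and the dog and the bird".split()
tts = CachingSynthesizer(fake_synthesize)
waves = [tts.get(word) for word in words]
print("%d words, %d TTS calls" % (len(words), tts.calls))
```

The saving grows with the length of the text, since common words repeat more and more often.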

All these parameters are detailed in the documentation:
https://www.readbeyond.it/aeneas/docs/runtimeconfiguration.html

D. Finally, if you want to "collapse" a multilevel sync map into one of
its levels --- typically, you want either L1 (paragraph) or L3 (word)
--- you can use the task configuration parameter:

os_task_file_levels

See
https://www.readbeyond.it/aeneas/docs/globalconstants.html#aeneas.globalconstants.PPN_TASK_OS_FILE_LEVELS

There is also a built-in example in the execute_task CLI tool:

$ aeneas_execute_task --example-flatten-3
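
Conceptually, "collapsing" means keeping only the fragments found at one depth of the sync map tree. The sketch below shows that idea on a toy nested dictionary; the data layout and the `flatten` helper are assumptions made for illustration and do not reflect how aeneas stores or flattens its sync maps.

```python
def flatten(node, level, depth=0):
    """Return the fragments found at the given level (1 = paragraph, 3 = word)."""
    if depth == level:
        return [(node["begin"], node["end"], node["text"])]
    result = []
    for child in node.get("children", []):
        result.extend(flatten(child, level, depth + 1))
    return result

# Toy multilevel sync map; the root (depth 0) is just a container.
sync_map = {"children": [
    {"begin": 0.0, "end": 2.0, "text": "Hello world. Bye.", "children": [
        {"begin": 0.0, "end": 1.2, "text": "Hello world.", "children": [
            {"begin": 0.0, "end": 0.6, "text": "Hello", "children": []},
            {"begin": 0.6, "end": 1.2, "text": "world.", "children": []},
        ]},
        {"begin": 1.2, "end": 2.0, "text": "Bye.", "children": [
            {"begin": 1.2, "end": 2.0, "text": "Bye.", "children": []},
        ]},
    ]},
]}

words = flatten(sync_map, 3)
paragraphs = flatten(sync_map, 1)
print(words)
print(paragraphs)
```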

=== === ===

Actually, there is some room for improvement in the algorithmic core. In
fact, right now the alignment is computed considering all MFCC frames,
including those with low energy (non-speech). Especially at word-level,
one should "mask" the non-speech frames, align, and project back on the
full wave. The data structures to do this are in place, but I have not
had time to work on the function yet --- to be honest, I am not sure it
is even worth it, since a better strategy is probably to integrate
ASR-like functions instead.
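
The "mask, align, project back" idea can be sketched as follows. This is a minimal illustration under stated assumptions: the per-frame energies, the threshold and both helper names are made up here, and the actual data structures in aeneas may look quite different.

```python
def speech_frame_indices(energies, threshold):
    """Return the indices of frames whose energy reaches the threshold."""
    return [i for i, energy in enumerate(energies) if energy >= threshold]

def project_back(masked_positions, index_map):
    """Map positions in the masked sequence to positions in the full wave."""
    return [index_map[p] for p in masked_positions]

# Toy per-frame energies: two bursts of speech separated by silence.
energies = [0.0, 0.1, 0.9, 0.8, 0.0, 0.0, 0.7, 0.6, 0.05]
index_map = speech_frame_indices(energies, threshold=0.5)

# Suppose an aligner, run on the masked frames only, reports that two
# fragments start at masked positions 0 and 2.
full_positions = project_back([0, 2], index_map)
print(index_map, full_positions)
```

The index map is the only extra state needed: the aligner never sees the low-energy frames, and its output is translated to full-wave frame indices afterwards.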

=== === ===

HTH,

AP

Firat Özdemir

Oct 20, 2016, 3:47:48 PM
to aeneas-forc...@googlegroups.com
Hi Willem,

Default settings are good enough as a starting point. Then you can
play around with the window shift and window length runtime parameters
at word level and others as you wish:

-r="mfcc_window_shift_l1=0.100|mfcc_window_length_l1=0.250|mfcc_window_shift_l2=0.040|mfcc_window_length_l2=0.100|mfcc_window_shift_l3=0.005|mfcc_window_length_l3=0.020"
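
To see which of these values actually depart from the defaults, one can parse the pipe-separated string and compare. The parser below is a hypothetical helper written for this comparison, not an aeneas function; the default values are the ones listed earlier in this thread.

```python
# Defaults as listed earlier in this thread.
DEFAULTS = {
    "mfcc_window_length_l1": 0.500,
    "mfcc_window_shift_l1": 0.200,
    "mfcc_window_length_l2": 0.100,
    "mfcc_window_shift_l2": 0.040,
    "mfcc_window_length_l3": 0.020,
    "mfcc_window_shift_l3": 0.005,
}

def parse_rconf(value):
    """Split a 'key=value|key=value' string into a dict of floats."""
    pairs = (item.split("=", 1) for item in value.split("|") if item)
    return {key: float(number) for key, number in pairs}

suggested = parse_rconf(
    "mfcc_window_shift_l1=0.100|mfcc_window_length_l1=0.250|"
    "mfcc_window_shift_l2=0.040|mfcc_window_length_l2=0.100|"
    "mfcc_window_shift_l3=0.005|mfcc_window_length_l3=0.020"
)

# Keep only the settings that differ from the defaults.
changed = {k: v for k, v in suggested.items() if v != DEFAULTS[k]}
print(sorted(changed.items()))
```

Only the two L1 values differ; the L2 and L3 entries in the string simply restate the defaults.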

Firat Özdemir

Oct 20, 2016, 5:13:11 PM
to aeneas-forc...@googlegroups.com
Caching the words for repetitions is a smart idea. I hadn't understood
it when I read it on Github.
I was thinking of another improvement to tts. It's a very low priority
but still I want to give it a try when I find the time. The idea is to
synthesize the whole text together as a single wave file, then split
it into fragments based on some "marker"s, but not necessarily the
"mark" tags (http://espeak.sourceforge.net/ssml.html). The simplest
would be to mark the fragment borders with unusually long silences.
All TTS engines seem to have some markup for that:
eSpeak: http://espeak.sourceforge.net/ssml.html
Festival: http://www.festvox.org/docs/manual-2.4.0/festival_10.html
Nuance: https://developer.nuance.com/downloads/guidelines/Using_control_sequences.pdf

One can then find the long silences with VAD or other means (perhaps
even regular binary search could work, synthetic silences should be
all null streams right?) to assign the anchors. For eSpeak it may not
be as fast as the current route, but with Festival and Nuance it would
probably make a difference.
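
A rough sketch of the splitting step, assuming the synthetic silences really are runs of exactly zero samples (which is the open question above) and using a plain linear scan rather than VAD or binary search. The function names, the toy wave and the minimum run length are all illustrative.

```python
def find_silences(samples, min_length):
    """Return (start, end) pairs of zero-sample runs at least min_length long."""
    runs = []
    start = None
    for i, sample in enumerate(samples):
        if sample == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_length:
        runs.append((start, len(samples)))
    return runs

def split_at_silences(samples, min_length):
    """Return (start, end) pairs of the fragments between long silences."""
    fragments = []
    cursor = 0
    for begin, end in find_silences(samples, min_length):
        if begin > cursor:
            fragments.append((cursor, begin))
        cursor = end
    if cursor < len(samples):
        fragments.append((cursor, len(samples)))
    return fragments

# Toy wave: the single zero at index 2 is too short to count as a marker.
wave = [3, 5, 0, 4, 0, 0, 0, 0, 7, 2, 0, 0, 0, 0, 1]
fragments = split_at_silences(wave, min_length=4)
print(fragments)
```

Short pauses inside a fragment are left alone; only runs at least as long as the marker silence become fragment borders.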

If that works, one advantage will be calculating the best path only
once and using it for all three levels (when the same window shift and
length are used). The second advantage will be better TTS at word level
(because there will be better intonation and stress when the word is
synthesized as part of a sentence, I assume). And obviously there is the
time (and money) saved by running the TTS only once.

F.O.

P.S. Willem, just to clarify, I saw Alberto's reply after I sent mine.
You can safely ignore mine;)

Willem van der Walt

Oct 21, 2016, 1:50:10 AM
to aeneas-forc...@googlegroups.com
Hi,
Firat, just a comment regarding the prosody.
With eSpeak it will not make a big difference whether you synthesize a word
or a chapter, whereas with the newer type of synthesizers, synthesizing a
sentence at a time rather than a word makes a big difference.
From other experiments we did, going larger than a sentence does not
really contribute much to prosody.
Thanks for your tips on the parameters.
Regards, Willem
> --
> You received this message because you are subscribed to the Google Groups "aeneas-forced-alignment" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to aeneas-forced-ali...@googlegroups.com.
> To post to this group, send email to aeneas-forc...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/aeneas-forced-alignment/CAEs3jXu2dVpPyFeT4_r6K42djroFiXa9iW%3DsSjL4HGA8838ZEA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Willem van der Walt

Oct 21, 2016, 1:56:41 AM
to aeneas-forc...@googlegroups.com
Thanks for all this detail.
As I am using speect for some of the tests I am doing, I just want to know
if, when using mplain, the word-level is synthesized as single words or as
a sentence and then broken up.
For English, I have the choice with speect to use a voice for which single
word pronunciation is optimized, so I might use that voice for level 3
and another for levels 1 and 2.
Kind regards, Willem

Alberto Pettarin

Oct 21, 2016, 3:20:05 AM
to aeneas-forc...@googlegroups.com
You are welcome.

At the moment "word level" is done by synthesizing single words.

From the TTS point of view, you are right that synthesizing by sentence
and then splitting by word generally yields a better sounding wave.

I am not sure if this makes any difference for the aligner, though.

My current guess: it does not. But one would need to experiment to find
out. At the moment, I prefer putting my time elsewhere.

AP
--
Alberto Pettarin

web: http://readbeyond.it/
web: http://www.albertopettarin.it/
twitter: http://twitter.com/acutebit/
skype: alberto_pettarin
mobile: +39 340 82 18 704

Willem van der Walt

Oct 21, 2016, 3:39:07 AM
to aeneas-forc...@googlegroups.com
Hi,
Since, in the case of speect, at the moment, I have a choice of good
prosody when sending a sentence, but then will get bad pronunciation of
single words, or good single words but sub-optimal prosody when
synthesizing sentences, this confirmation is very relevant to me.
> At the moment "word level" is done by synthesizing single words.
I need to create aligned books for two native languages, Zulu and Tswana.
I do not speak either of them, so want to learn my way around Aeneas and
Speect/qfrency using English and Afrikaans.
By the way, when looking at the aeneas docs again, I saw a list of
languages known to work.
You can include Afrikaans, as the eSpeak Afrikaans voice, when used with
aeneas, works as well as the English one.
Thanks again, Willem

Xavier Anguera

Oct 21, 2016, 3:45:52 AM
to aeneas-forc...@googlegroups.com
I advise against using single-word TTS. It does not take into account coarticulation rules.



Alberto Pettarin

Oct 21, 2016, 3:47:21 AM
to aeneas-forc...@googlegroups.com
Hi,

Thanks for explaining your use case. I think your best option in aeneas
consists in specifying the "sentence-good" TTS for L1 and L2, and the
"word-good" TTS for L3.
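
As a sketch, the per-level parameters from earlier in the thread could be combined into a single -r value like this. The helper is hypothetical, and both the value `custom` and the wrapper paths are placeholders assumed for illustration; check the runtime configuration docs for the values your aeneas version actually accepts.

```python
def build_rconf(settings):
    """Join (key, value) pairs into the 'key=value|key=value' form used by -r."""
    return "|".join("%s=%s" % (key, value) for key, value in settings)

# Placeholder values: "custom" and the paths below are assumptions.
settings = [
    ("tts_l1", "custom"),
    ("tts_path_l1", "/path/to/sentence_voice_wrapper.py"),
    ("tts_l2", "custom"),
    ("tts_path_l2", "/path/to/sentence_voice_wrapper.py"),
    ("tts_l3", "custom"),
    ("tts_path_l3", "/path/to/word_voice_wrapper.py"),
    ("tts_cache", "True"),
]

rconf_string = build_rconf(settings)
print('-r="%s"' % rconf_string)
```

Adding tts_cache here is optional, but it fits the word-level case because the same words recur throughout a long text.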

I will mark Afrikaans as tested in the next release, thank you for
letting me know.

Best regards,

AP

Alberto Pettarin

Oct 21, 2016, 4:05:00 AM
to aeneas-forc...@googlegroups.com
For the synthesized wave: agreed.

But the point is: is there evidence that synthesizing at sentence level
and then chopping words up improves the alignment?

AP

Xavier Anguera

Oct 21, 2016, 4:09:51 AM
to aeneas-forc...@googlegroups.com
I do not see why you would want to do that at all. Using the timings at word boundaries should be all the info you need.

Alberto Pettarin

Oct 21, 2016, 4:19:20 AM
to aeneas-forc...@googlegroups.com
Exactly, that is my point: instead of tweaking the current (TTS+DTW)
approach, trying to improve the quality of TTS output, one should align
at sentence level, and then apply some technique for detecting the words
inside the sentence, whose time boundaries are known.

And this "technique" is probably something similar to what ASR systems
do, which is an entirely different beast.

AP

Xavier Anguera

Oct 21, 2016, 4:26:05 AM
to aeneas-forc...@googlegroups.com
On Fri, Oct 21, 2016 at 9:16 AM, Alberto Pettarin <alb...@readbeyond.it> wrote:
> Exactly, that is my point: instead of tweaking the current (TTS+DTW)
> approach, trying to improve the quality of TTS output, one should align
> at sentence level, and then apply some technique for detecting the words
> inside the sentence, whose time boundaries are known.

This would work, but you would probably introduce alignment errors that would transfer to the output. Why not perform ASR alignment all the way?

> And this "technique" is probably something similar to what ASR systems
> do, which is an entirely different beast.

As mentioned in another email: you will get good gains if you eliminate the silence from both the TTS and the input audio. It requires some bookkeeping of the boundaries, but the gains are significant. See my paper for a description.

Alberto Pettarin

Oct 21, 2016, 4:35:22 AM
to aeneas-forc...@googlegroups.com
On 10/21/2016 10:26 AM, Xavier Anguera wrote:
>
> This would work, but you would probably introduce alignment errors that
> would transfer to the output. Why not perform ASR alignment all the way?

Yes, there are great free sw/open source ASRs out there, some have a
forced aligner front-end, so people interested in word-level alignment
should probably use them instead.

The goals of aeneas are different (covering a large number of languages
and ease of installation/use, to start with) and realistically, only
academics or businesses can spend time working on ASRs and significantly
improve the state of the art.

> And this "technique" is probably something similar to what ASR
> systems do, which is an entirely different beast.
>
> Mentioned in another email: you will get good gains if you eliminate the
> silence from both the TTS and the input audio. It requires some
> book-keeping of the boundaries, but gains are significant. See my paper
> for a description.

I mentioned that in a previous email in this thread indeed.

Thanks,

AP

Firat Özdemir

Oct 21, 2016, 8:11:04 AM
to aeneas-forc...@googlegroups.com
The effect of prosody might indeed be small, because the algorithm
seems to be very tolerant of low-quality TTS. What can make more of a
difference is the extra 300 ms of silence that eSpeak adds at the end of
each file. Splitting isn't necessary to improve that; truncating the
end of each fragment will suffice. It will almost certainly improve
the results at word level and it will speed up the DTW too. No
'bookkeeping' will be necessary for that either.
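
Truncating the tail of a synthesized fragment could look like the sketch below. It assumes the fragment is available as a list of integer samples and that "silence" means samples at or below a small amplitude threshold; both assumptions, and the function name, are illustrative only.

```python
def trim_trailing_silence(samples, threshold=0):
    """Drop trailing samples whose absolute value is at or below threshold."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[:end]

fragment = [4, -2, 0, 3, 0, 0, 0]
trimmed = trim_trailing_silence(fragment)
print(len(fragment), len(trimmed))
```

Only the end of the fragment is touched, so zeros inside the word are kept and no index mapping back to the original is needed.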

F.O.

Firat Özdemir

Oct 21, 2016, 10:18:49 AM
to aeneas-forc...@googlegroups.com
Just noticed eSpeak has an option for that, and you're considering
letting the user choose it with the cew extension. OK, that sounds better
than truncation ;)

Alberto Pettarin

Nov 25, 2016, 9:31:50 AM
to aeneas-forc...@googlegroups.com
As I said in the previous email, I am investigating this "masking"
approach, to see if it can be integrated in the upcoming v1.7.0.

I have just done a quick test by monkey-patching the current devel/
code, and indeed the resulting word-level sync map looks better than
without the masking. I attach the two files in Audacity format, the
audio file is the usual aeneas/tools/res/audio.mp3 (sonnet I).

Now I just need to run more tests and to "engineer" the function into
the code.

AP



PS: If you examine audiofilemfcc.py, it already contained most of the
code needed to mask nonspeech out, and to map indices from the masked
wave to the full wave. If I had had an extended period of time to think
more about this issue, I could have implemented better word-level
alignment months ago. I guess this experience makes for a great point in
supporting aeneas (and FLOSS projects in general)...
sonnet.new.aud
sonnet.previous.aud