Version 1.7.0 available on devel branch

Alberto Pettarin

Nov 24, 2016, 3:11:55 AM
to aeneas-forc...@googlegroups.com
Dear all,

just to let you know that I pushed a release candidate v1.7.0 onto the
devel branch:

https://github.com/readbeyond/aeneas/tree/devel

Please test it if you can, especially on non-Linux OSes!

(Technical details are below.)

Next week I will spend some time checking whether the MFCC masking,
which might help with word-level alignment, can be implemented quickly
and elegantly. If so, I will implement it so that it ships with v1.7.0.
If not, the current commit will be released as v1.7.0. A new email next
week will announce the result of this investigation and the release date.

Best regards,

Alberto Pettarin



=== === === ACKNOWLEDGMENTS === === ===

I want to thank the Centro Internazionale del Libro Parlato "A.
Sernagiotto" (International Center for the Talking Book) of Feltre, Italy:

http://www.libroparlato.org/

since they are providing me with a sponsorship for working nearly
full-time for two months on EPUB3 Audio-eBooks and related technologies,
and part of that time has been devoted to aeneas.

=== === === TECHNICAL SUMMARY === === ===

You can see the detailed changes here:

https://github.com/readbeyond/aeneas/blob/devel/docs/source/changelog.rst

Comments:

1. now the code is much cleaner, especially the "adjust boundary" functions;

2. support for outputting in TextGrid format (both long and short);

3. more robust parsing (reading) of sync maps, for those using the
convert_syncmap tool;

4. more readable listing of language values and names with "python -m
aeneas.tools.execute_task --list-values=task_language" or
"--list-values=espeak" or "--list-values=nuance" etc.;

5. ability to "remove" from the sync map (or to assign them a specific
label like "<sil>") long nonspeech intervals;

6. more tests (now more than 1,000!) and Bash scripts to help manage
virtualenvs for testing.

IMPORTANT: breaking changes:

A. the only CLI-tool-level breaking change is the renaming of the
parameter "os_task_file_no_zero" to "task_adjust_boundary_no_zero". As
always, python -m aeneas.tools.execute_task --list-parameters will show
the current supported list.

B. additionally, if you use aeneas as a library, the sync map functions
have been moved to their own aeneas.syncmap subpackage, and a few
internal functions have been renamed (check the change log).
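
For anyone with old task configuration strings stored in scripts, the rename in point A could be applied automatically along these lines. This is only a sketch: the helper `migrate_config_string` and the `RENAMED_KEYS` table are hypothetical and not part of aeneas, and the only rename it knows about is the one announced above.

```python
# Hypothetical helper (not part of aeneas): rename legacy keys in a
# "key=value|key=value" task configuration string.
RENAMED_KEYS = {
    # old name -> new name, as announced for v1.7.0
    "os_task_file_no_zero": "task_adjust_boundary_no_zero",
}


def migrate_config_string(config_string):
    """Return config_string with renamed keys replaced by their new names."""
    migrated = []
    for pair in config_string.split("|"):
        # partition keeps entries without "=" untouched
        key, sep, value = pair.partition("=")
        key = RENAMED_KEYS.get(key, key)
        migrated.append(key + sep + value)
    return "|".join(migrated)


old = ("task_language=eng|is_text_type=plain|"
       "os_task_file_format=json|os_task_file_no_zero=True")
print(migrate_config_string(old))
```

Keys are matched exactly, so similarly named parameters such as os_task_file_format are left alone.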

Alberto Pettarin

Nov 27, 2016, 2:36:35 PM
to aeneas-forc...@googlegroups.com
Dear all,

fortunately enabling the MFCC nonspeech masking was neat enough, so I am
pleased to announce that v1.7.0 will have this feature.

As mentioned in a previous email, it seems to slightly improve the
word-level alignment, especially if you combine it with multilevel
alignment.

In the process, I added many new unit/integration/performance tests
(now totaling 1,180!) and fixed a couple of subtle bugs. The
v1.7.0 RC code can be found at:

https://github.com/readbeyond/aeneas/tree/devel

Please test it in the next few days if you have a chance.

I intend to merge it into the master branch and publish it on PyPI on
Wednesday 2016-12-07, unless serious bugs are reported.

Best regards,

Alberto Pettarin

Alberto Pettarin

Dec 1, 2016, 3:34:28 PM
to aeneas-forc...@googlegroups.com
Dear all,

a gentle reminder to please test the current v1.7.0 "RC" code and report
any bugs.

I also updated the aeneas Web App, which now runs the devel code:

https://aeneasweb.org

Thank you very much,

Alberto Pettarin

Willem van der Walt

Dec 2, 2016, 12:15:20 AM
to aeneas-forc...@googlegroups.com
Hi,
I did some testing.
The textgrid output format looks fine.
I played a little with the word-level alignment, but not enough to say
anything yet.
Thanks again for a great tool.
Regards, Willem

Alberto Pettarin

Dec 2, 2016, 3:48:45 AM
to aeneas-forc...@googlegroups.com
Excellent, thank you.

AP

Alberto Pettarin

Dec 2, 2016, 3:43:47 PM
to aeneas-forc...@googlegroups.com
Dear all,

just to inform you that I have merged into devel/ a new commit that
adds support for the recently launched Amazon AWS Polly TTS API, so
it will be included in v1.7.0 as well.

AWS Polly is included in the AWS Free Tier, so you can use it for free
(up to 5,000,000 characters/month) for the first year.

Even if you deplete the free tier quota, AWS Polly is much cheaper than
the other cloud-based TTS services (Nuance, Google, IBM, etc.), which
gives you more room to experiment with word-level sync in aeneas.

For further details: https://aws.amazon.com/polly/

To use it in aeneas:

- get the devel/ code (or wait for v1.7.0)
- have an AWS account (Access Key and Secret Key)
- install the "boto3" Python module
- select the AWS Polly TTS wrapper with:

-r="tts=aws|tts_cache=True"
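
If you drive the CLI tool from a Python script, the same call can be assembled as an argument list, which avoids having to quote the "|" characters for the shell. The sketch below is only illustrative: `join_config` and `build_execute_task_argv` are hypothetical helpers, and the file names and the "eng" language are placeholders.

```python
import sys


def join_config(pairs):
    """Join (key, value) pairs into the 'key=value|key=value' form."""
    return "|".join("{}={}".format(key, value) for key, value in pairs)


def build_execute_task_argv(audio, text, task_config, output, rconf):
    """Build an argv list for 'python -m aeneas.tools.execute_task'."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio, text, join_config(task_config), output,
        "-r=" + join_config(rconf),
    ]


argv = build_execute_task_argv(
    "audio.mp3", "plain.txt",  # placeholder input paths
    [("task_language", "eng"),
     ("is_text_type", "plain"),
     ("os_task_file_format", "json")],
    "syncmap.json",            # placeholder output path
    [("tts", "aws"), ("tts_cache", "True")],
)
print(argv[-1])
# To actually run it (aeneas must be installed):
#   import subprocess; subprocess.check_call(argv)
```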

Best regards,

Alberto Pettarin

Mohammed Ezzat

Sep 3, 2018, 9:12:54 AM
to aeneas-forced-alignment

Hello,
First, thank you very much for aeneas; it has been a great help so far.

I have a question and hope somebody can help.
I wanted to test AWS, so I installed awscli and boto3, then ran "aws configure" from the terminal to add my access key and secret key.
Then I added -r="tts=aws|tts_cache=True" and tried to execute a task.
Whenever I run it with a language like fra-CAN, which is not supported by eSpeak, it fails, which I believe means that the AWS TTS wrapper is not the one in use.
I am not sure what I am missing. Does anybody have an idea?

my regards,

Alberto Pettarin

Sep 4, 2018, 4:28:33 PM
to aeneas-forc...@googlegroups.com
Hi,

first of all, please note that the latest released version of aeneas is
1.7.3: if you are using an earlier version, it might not work properly.
If so, please update.

If you are in fact using aeneas 1.7.3, here:

https://github.com/readbeyond/aeneas/blob/master/aeneas/ttswrappers/awsttswrapper.py

I see that "fra-CAN" is listed among the languages supported by the
"aws" wrapper, so if you specify:

task_language=fra-CAN

as part of your task configuration string, it should work. Note that you
must write it exactly as above, including capitalization: "fra-can" or
"fra_can" will not work.

A small test I did, with -v to see debug messages:

$ python -m aeneas.tools.execute_task aeneas/tools/res/audio.mp3 \
    aeneas/tools/res/plain.txt \
    "task_language=fra-CAN|is_text_type=plain|os_task_file_format=json" \
    output/sonnet.json -r="tts=aws|tts_cache=True" -v

[DEBU] CLI: Running aeneas 1.7.4
[DEBU] CLI: Formal arguments:
[u'/home/alberto/projects/rb/aeneas/aeneas/tools/execute_task.py',
u'aeneas/tools/res/audio.mp3', u'aeneas/tools/res/plain.txt',
u'task_language=fra-CAN|is_text_type=plain|os_task_file_format=json',
u'output/sonnet.json', u'-r=tts=aws|tts_cache=True', u'-v']
[DEBU] CLI: Actual arguments: [u'aeneas/tools/res/audio.mp3',
u'aeneas/tools/res/plain.txt',
u'task_language=fra-CAN|is_text_type=plain|os_task_file_format=json',
u'output/sonnet.json']
[DEBU] CLI: Runtime configuration:
'aba_no_zero_duration=0.001|aba_nonspeech_tolerance=0.080|allow_unlisted_languages=False|c_extensions=True|cdtw=True|cew=True|cew_subprocess_enabled=False|cew_subprocess_path=python|cfw=True|cmfcc=True|downloader_retry_attempts=5|downloader_sleep=1.000|dtw_algorithm=stripe|dtw_margin=60.000|dtw_margin_l1=60.000|dtw_margin_l2=30.000|dtw_margin_l3=10.000|ffmpeg_path=ffmpeg|ffmpeg_sample_rate=16000|ffprobe_path=ffprobe|job_max_tasks=0|mfcc_emphasis_factor=0.97|mfcc_fft_order=512|mfcc_filters=40|mfcc_lower_frequency=133.3333|mfcc_mask_extend_speech_after=0|mfcc_mask_extend_speech_before=0|mfcc_mask_log_energy_threshold=0.699|mfcc_mask_min_nonspeech_length=1|mfcc_mask_nonspeech=False|mfcc_mask_nonspeech_l1=False|mfcc_mask_nonspeech_l2=False|mfcc_mask_nonspeech_l3=False|mfcc_size=13|mfcc_upper_frequency=6855.4976|mfcc_window_length=0.100|mfcc_window_length_l1=0.100|mfcc_window_length_l2=0.050|mfcc_window_length_l3=0.020|mfcc_window_shift=0.040|mfcc_window_shift_l1=0.040|mfcc_window_shift_l2=0.020|mfcc_window_shift_l3=0.005|safety_checks=True|task_max_audio_length=0|task_max_text_length=0|tts=aws|tts_api_retry_attempts=5|tts_api_sleep=1.000|tts_cache=True|tts_l1=espeak|tts_l2=espeak|tts_l3=espeak|vad_extend_speech_after=0.000|vad_extend_speech_before=0.000|vad_log_energy_threshold=0.699|vad_min_nonspeech_length=0.200'
...

[DEBU] ExecuteTask: Setting synthesizer...
[DEBU] Synthesizer: Selecting TTS engine...
[DEBU] Synthesizer: TTS engine: AWS Polly TTS API
[DEBU] AWSTTSWrapper: No tts_path specified in rconf, setting default TTS path
[DEBU] TTSCache: Cache initialized
[DEBU] AWSTTSWrapper: TTS path is None
[DEBU] AWSTTSWrapper: TTS cache? True
[DEBU] AWSTTSWrapper: Has Python call? True
[DEBU] AWSTTSWrapper: Has C extension call? False
[DEBU] AWSTTSWrapper: Has subprocess call? False
[DEBU] Synthesizer: Selecting TTS engine... done
[DEBU] ExecuteTask: Setting synthesizer... done
[DEBU] ExecuteTask: STEP 3 BEGIN (synthesize text)
[DEBU] Synthesizer: Synthesizing text...
[DEBU] AWSTTSWrapper: Calling TTS engine via Python
[DEBU] AWSTTSWrapper: Synthesizing multiple via a Python call...
[DEBU] AWSTTSWrapper: Calling TTS engine using multiple generic function...
[DEBU] AWSTTSWrapper: Determining codec and sample rate...
[DEBU] AWSTTSWrapper: Reading codec and sample rate from OUTPUT_AUDIO_FORMAT
[DEBU] AWSTTSWrapper: Determining codec and sample rate... done
[DEBU] AWSTTSWrapper: codec: pcm_s16le
[DEBU] AWSTTSWrapper: sample rate: 16000
[DEBU] AWSTTSWrapper: Examining fragment 0 (cache)...
[DEBU] AWSTTSWrapper: Fragment not cached: synthesizing and caching
[DEBU] AWSTTSWrapper: Synthesizing fragment to '/tmp/tmp4IoWaF.cache.wav'...
[DEBU] AWSTTSWrapper: Language to voice code: 'fra-CAN' => 'Chantal'
[DEBU] AWSTTSWrapper: Calling helper function
[DEBU] AWSTTSWrapper: Importing boto3...
[DEBU] AWSTTSWrapper: Importing boto3... done
[DEBU] AWSTTSWrapper: Sleep delay: 1.000
[DEBU] AWSTTSWrapper: Retry attempts: 5
[DEBU] AWSTTSWrapper: Sleeping to throttle API usage...
[DEBU] AWSTTSWrapper: Sleeping to throttle API usage... done
[DEBU] AWSTTSWrapper: Posting...
[CRIT] AWSTTSWrapper: Unexpected exception on HTTP POST. Are you offline?
[CRIT] AWSTTSWrapper: Unable to locate credentials
[WARN] AWSTTSWrapper: An unexpected error occurred while calling _synthesize_multiple_python
[WARN] AWSTTSWrapper: Unexpected exception on HTTP POST. Are you offline? : Unable to locate credentials


Note, however, that I do not have my AWS key set in my environment, so
the POST to the AWS Polly API failed. But the log above shows that the
voice "Chantal" was selected for Canadian French.
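
If you hit the same "Unable to locate credentials" message, a quick check like the one below may help narrow it down. It is a rough sketch, not what aeneas or boto3 do internally: the function `find_aws_credentials` is hypothetical, and it only looks at the two standard places for static keys (the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables and the shared credentials file, by default ~/.aws/credentials). boto3 can also obtain credentials from other sources, so a negative result here is a hint rather than proof.

```python
import os


def find_aws_credentials(env=None, home=None):
    """Report where static AWS credentials appear to be configured, if anywhere."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    # 1. Access key and secret key exported in the environment
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 2. Shared credentials file (the one written by "aws configure")
    shared = env.get("AWS_SHARED_CREDENTIALS_FILE") or os.path.join(
        home, ".aws", "credentials")
    if os.path.isfile(shared):
        return shared
    return None


location = find_aws_credentials()
if location:
    print("credentials found in: {}".format(location))
else:
    print("no credentials found")
```

Run it as the same user, and in the same shell or virtualenv, that runs aeneas, since both the environment and the home directory can differ between sessions.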

HTH,

Alberto Pettarin


Mohammed Ezzat

Sep 9, 2018, 7:12:14 AM
to aeneas-forced-alignment
Hi, thank you for your reply, and sorry it took me so long.
I am pretty sure I have the version you mentioned. I guess my issue is that aeneas cannot get the path to the AWS TTS.
I ran it again, and this is what I got:

[DEBU] SD: Synthesizing at least 2.500 seconds

[DEBU] Synthesizer: Selecting TTS engine...
[DEBU] Synthesizer: TTS engine: eSpeak
[DEBU] ESPEAKTTSWrapper: No tts_path specified in rconf, setting default TTS path
[DEBU] TTSCache: Cache initialized
[DEBU] ESPEAKTTSWrapper: TTS path is             espeak
[DEBU] ESPEAKTTSWrapper: TTS cache?              True
 

my command was:

python -m aeneas.tools.execute_task 184167047620.mp4  184167047620.txt "task_language=fra-CAN|os_task_file_format=srt|is_text_type=plain|task_adjust_boundary_nonspeech_min=0.100|PPV_TASK_ADJUST_BOUNDARY_NONSPEECH_REMOVE|is_audio_file_detect_head_max=2.5" 184167047620_fran.srt -r="tts=aws|tts_cache=True" -v


Alberto Pettarin

Sep 11, 2018, 4:51:08 PM
to aeneas-forc...@googlegroups.com
Please make sure you have the latest released version of aeneas (1.7.3).

If you run with the -v flag, one of the first lines should contain:

[DEBU] CLI: Running aeneas 1.7.3

or, alternatively, you can run:

$ python -c "import aeneas; print(aeneas.__version__)"
1.7.3

or (if installed via pip)

$ pip freeze | grep aeneas
aeneas==1.7.3

Other than that, it seems to work on my laptop, so I am not sure why it
does not work in your case. I would need to see the entire log produced
with -v. Can you upload it to a pastebin and post or email me the link?
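
Before uploading, you could also pull the two most relevant facts out of the log yourself. The sketch below is a hypothetical helper (not part of aeneas) that looks for the "Running aeneas" and "TTS engine" lines in the -v output. The sample text is stitched together from lines quoted in this thread, not a complete real log.

```python
import re

VERSION_RE = re.compile(r"\[DEBU\] CLI: Running aeneas (\d+(?:\.\d+)*)")
ENGINE_RE = re.compile(r"\[DEBU\] Synthesizer: TTS engine: (.+)")


def summarize_log(log_text):
    """Extract the aeneas version and the selected TTS engines from -v output."""
    version = VERSION_RE.search(log_text)
    # One entry per "TTS engine:" line, in order of appearance
    engines = [name.strip() for name in ENGINE_RE.findall(log_text)]
    return (version.group(1) if version else None), engines


sample = """[DEBU] CLI: Running aeneas 1.7.3
[DEBU] Synthesizer: Selecting TTS engine...
[DEBU] Synthesizer: TTS engine: eSpeak
"""
version, engines = summarize_log(sample)
print(version, engines)
```

If the engine list contains eSpeak where you expected AWS Polly, that is the part of the log worth looking at first.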

Best regards,

Alberto Pettarin


PS: note that "PPV_TASK_ADJUST_BOUNDARY_NONSPEECH_REMOVE" is not a valid
key for the config string: it is the internal symbol used when calling
aeneas as a Python library.
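
A stray entry like that is easy to spot mechanically, because it has no "=value" part. The following sketch uses a hypothetical helper, `split_config_string`, which is not part of aeneas. It only checks the shape of each entry and does not know which keys aeneas actually accepts.

```python
def split_config_string(config_string):
    """Split into a dict of key=value entries plus a list of malformed entries."""
    parsed, malformed = {}, []
    for entry in config_string.split("|"):
        key, sep, value = entry.partition("=")
        if sep and key and value:
            parsed[key] = value
        else:
            # No "=", or an empty key or value
            malformed.append(entry)
    return parsed, malformed


config = ("task_language=fra-CAN|os_task_file_format=srt|is_text_type=plain|"
          "task_adjust_boundary_nonspeech_min=0.100|"
          "PPV_TASK_ADJUST_BOUNDARY_NONSPEECH_REMOVE|"
          "is_audio_file_detect_head_max=2.5")
parsed, malformed = split_config_string(config)
print(malformed)
```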

