Hello Andrew,
welcome to the aeneas mailing list.
Thank you for providing the input files and parameters. I was indeed
able to reproduce your problem, so I can confirm there is a bug
somewhere in aeneas.
The segfault is generated in the cdtw C extension when it tries to align
a zero-length interval of real audio with a non-zero length synthetic
audio, and this happens at a leaf (in fact, the very first leaf) of the
"tree" representing the text, i.e. at level 3 (word level). [1]
While one can argue that it is not nice that the C code in a C extension
can segfault and crash the whole Python interpreter [0], I think that
the Python code should not have called the cdtw C code in the first
place in such a situation.
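To illustrate the kind of check meant here, the following is a minimal
Python sketch, not aeneas code: align_leaf and naive_dtw_path are
hypothetical names, and the pure-Python DTW merely stands in for the
cdtw call. The only point is that a zero-length interval is handled
before the DTW routine is invoked.

```python
def naive_dtw_path(a, b):
    """Plain O(len(a) * len(b)) DTW on 1-D sequences.

    Hypothetical stand-in for the real cdtw call; returns the warping
    path as a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to the first one.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def align_leaf(real_frames, synt_frames, dtw):
    """Hypothetical guard: never hand an empty interval to the DTW routine."""
    if len(real_frames) == 0 or len(synt_frames) == 0:
        # Nothing to align; the caller decides how to place the fragments.
        return []
    return dtw(real_frames, synt_frames)
```

With an empty real interval, align_leaf([], [0.1, 0.2], naive_dtw_path)
returns an empty path without calling the DTW function at all. What the
caller does next (for example, giving all child fragments the begin time
of the parent) is a separate design decision.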
After inspecting your WAV file, I tried adding the following parameters
to the task config string [2]:
is_audio_file_head_length=13.400|is_audio_file_tail_length=71.200
to exclude the head and the tail of your audio file. This happens to
prevent the bug from being triggered, so you eventually get an output
file.
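For anyone scripting many runs, the config string can be assembled
programmatically. A minimal sketch: the helper name build_config_string
is made up for illustration, while the parameter names and values are
the ones from footnote [2].

```python
def build_config_string(params):
    """Join (key, value) pairs into a "key=value|key=value" config string."""
    return "|".join("{}={}".format(key, value) for key, value in params)


# A list of pairs keeps the order stable on every Python version.
params = [
    ("task_language", "eng"),
    ("is_text_type", "mplain"),
    ("os_task_file_format", "json"),
    ("is_audio_file_head_length", "13.400"),
    ("is_audio_file_tail_length", "71.200"),
]

config_string = build_config_string(params)
print(config_string)
```

The resulting string is what goes between the double quotes on the
command line.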
Unfortunately, if you also add the "--presets-word" switch at runtime in
addition to the head/tail parameters [3], you will trigger the bug
again, but now at a later leaf (level 3, fragment 372), so it does not
help that much.
For the record, I also converted your input file to plain format, one
word per line, and ran it with the parameters in [4]. It completes,
although the alignment quality is probably worse than with mplain.
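The conversion to one word per line can be sketched in a few lines of
Python. This assumes the mplain input has one sentence per line with
blank lines between paragraphs, as described in the original report; the
function name mplain_to_words is made up for illustration, and splitting
on whitespace is a simplification.

```python
def mplain_to_words(text):
    """Flatten mplain-style text into plain format, one word per line."""
    words = []
    for line in text.splitlines():
        # Blank lines (paragraph separators) contribute no words.
        words.extend(line.split())
    if not words:
        return ""
    return "\n".join(words) + "\n"


sample = "Hello world\nsecond line\n\nNew paragraph\n"
plain = mplain_to_words(sample)
print(plain)
```

Note that the paragraph and sentence structure is lost in the process,
which is the price of going from mplain to plain.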
I think the fix for this bug should be rather simple: it is just a
matter of finding the place where the check for zero-length is missing
in the recursion tree. I will investigate this later tonight.
Finally, let me just comment that perhaps aeneas is not the right tool
if you need word-level, high-precision alignment while also feeding long
audio files with noise in them (clapping, laughing, overlapping speakers).
If you work with materials in English, you have plenty of alternatives;
I am not sure whether you have checked them out:
https://github.com/pettarin/forced-alignment-tools
Anyway, thank you for reporting this issue and providing the input
files/parameters to reproduce it. I will keep you posted.
Best regards,
Alberto Pettarin
For Willem: thank you for sharing your script. Just a note: if you want
to prevent zero-length fragments, the parameter in your config string
should be: task_adjust_boundary_no_zero=True as of v1.7.1. Note the
"=True" part, and the fact that it was renamed for consistency since
v1.7.0. Documentation is at:
https://www.readbeyond.it/aeneas/docs/globalconstants.html#aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_NO_ZERO
Also note that this parameter affects the "post-processing" of the sync
map tree computed via DTW, so it would help with this particular issue.
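Independently of that parameter, one can check an output sync map for
zero-length fragments after the fact. A minimal sketch, assuming the
JSON output has a top-level "fragments" list whose items carry "id",
"begin", "end" and nested "children"; the function name and the sample
data are made up for illustration.

```python
import json
from decimal import Decimal


def zero_length_fragments(fragments):
    """Return the ids of all fragments (at any level) whose begin equals end."""
    found = []
    for fragment in fragments:
        # Decimal avoids float rounding when comparing "2.680"-style strings.
        if Decimal(fragment["begin"]) == Decimal(fragment["end"]):
            found.append(fragment["id"])
        found.extend(zero_length_fragments(fragment.get("children", [])))
    return found


# Made-up sample data in the assumed layout.
sample = json.loads("""
{"fragments": [
  {"id": "f000001", "begin": "0.000", "end": "2.680", "children": [
    {"id": "f000001000001", "begin": "0.000", "end": "0.000", "children": []},
    {"id": "f000001000002", "begin": "0.000", "end": "2.680", "children": []}
  ]},
  {"id": "f000002", "begin": "2.680", "end": "2.680", "children": []}
]}
""")

zero_ids = zero_length_fragments(sample["fragments"])
print(zero_ids)
```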
Footnotes:
[0] TODO for me: add a safety net in cdtw to prevent the segfault.
[1] The fact that the execution tree has a node performing an alignment
between a zero-length interval in the real audio and a non-zero-length
synthetic interval is a symptom of a mis-alignment at the previous level
(level 2, sentence). However, this might happen, given how aeneas works.
[2] command line:
$ python -m aeneas.tools.execute_task qanda_2012_ep99_climate.wav \
    qanda_2012_ep99_climate.mp4.txt \
    "task_language=eng|is_text_type=mplain|os_task_file_format=json|is_audio_file_head_length=13.400|is_audio_file_tail_length=71.200" \
    qanda_2012_ep99_climate.mplain.json -v -l=mplain.log
[3] command line:
$ python -m aeneas.tools.execute_task qanda_2012_ep99_climate.wav \
    qanda_2012_ep99_climate.mp4.txt \
    "task_language=eng|is_text_type=mplain|os_task_file_format=json|is_audio_file_head_length=13.400|is_audio_file_tail_length=71.200" \
    qanda_2012_ep99_climate.mplain.json -v -l=mplain.log --presets-word
[4] command line:
$ python -m aeneas.tools.execute_task qanda_2012_ep99_climate.wav \
    qanda_2012_ep99_climate.plain.words.txt \
    "task_language=eng|is_text_type=plain|os_task_file_format=json|is_audio_file_head_length=13.400|is_audio_file_tail_length=71.200|task_adjust_boundary_nonspeech_string=REMOVE" \
    qanda_2012_ep99_climate.plain.removed.json -v -l=plain.log --presets-word
On 02/15/2017 02:45 AM, QA Collective wrote:
> Hello Alberto & All,
>
> First of all, congratulations on a really great piece of software. Even
> though I haven't quite got it working yet ... I can already see that it
> is a very professionally run project with attention to detail and
> excellent documentation and good old fashioned quality design! I'm
> anticipating that aeneas may be able to do a lot of useful work in my
> pursuit of training DNNs for speech recognition. Unfortunately, I've
> been getting a segfault while running Aeneas in multilevel mode. I'll
> try to be detailed without going too far...
>
> *What I'm doing: *trying to achieve word level alignment from a TV talk
> show which provides transcripts which are well written but do not
> include noises and misspeak from the speakers. I'm providing input
> transcript in MPLAIN format, requesting CSV or SMIL output and providing
> the '--presets-word' option.
>
> *What happens:* Aeneas appears to work as expected (from what I can see
> *What I have tried:* The one thing I have tried that works is running
> Aeneas on PLAIN formatted word level input (one word per line) and not
> passing the '--presets-word' parameter. Unfortunately the results
> returned by this have words being badly out of sync after about 60
> seconds. This is a complete list of the things I've tried to get
> multilevel alignment working before writing this email ... with no
> success thus far.
>
> * Ensure that all Aeneas diagnostics pass
> * Ensure the input file is clean - removed punctuation and extra \n so
> that there is only 1 x \n between sentences and \n\n between paragraphs
> * Install and run Aeneas on both Python 3.5 and 2.7
> * Pass the -r="c_extensions=False" parameter
> * Change requested output formats
> * Provide a WAV file (PCM 16 kHz 1 chan little endian) instead of the
> MP4 video file
>
> The more options I try, whenever I see the log above I now wonder if the
> problem is:
>
> * A bug in the C extensions
> * Bad wav file input
> * Possibly a combination of both the above?
> * I need to compile the C extensions manually (but I thought this was
> done inside the pip install process?)
> * Still bad TXT input?
>
>
> I'd appreciate any suggestions, guidance or help. I feel as though I'm
> quite close to getting it working for hundreds of hours of audio!
>
>
> Thanks in advance,
>
> Andrew
>