On 09/13/2016 06:46 AM, Zhang He wrote:
> Hi Alberto,
> This is He ZHANG from China. I'm trying to develop a read-along app for
> English-learning purposes, and I found aeneas extremely helpful. Thank
> you for your amazing work!
>
> *I have only one problem: if the contents of the text and audio files
> are only ~95% the same, e.g. some of the text is not spoken in the
> audio, aeneas still outputs a sync map, but the time marks are wrong.*
>
Hi,
welcome to the aeneas mailing list!
> *Is aeneas designed to handle imperfect audio-text alignment?*
No, aeneas is not designed to handle such a situation:
"Audio should match the text: large portions of spurious text or audio
might produce a wrong sync map"
(from
https://github.com/readbeyond/aeneas/#limitations-and-missing-features )
However, the exact answer actually depends on the structure of those
spurious portions, and on the granularity of the fragments (sentence- vs
word- level sync) you are using.
In general, if your spurious text comes as a large, contiguous chunk (for
example, you have a head before and/or a tail after the main, "correct"
text), then aeneas will trip over it. Fortunately, in that
case you can instruct aeneas to ignore X seconds from the start or Y
seconds from the end of the audio file for the purpose of computing the
alignment. To do so you can specify the following parameters in your
configuration string:
is_audio_file_head_length=10
and/or
is_audio_file_tail_length=20
(skip 10 seconds from the beginning, 20 seconds from the end)
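As a minimal sketch of how such a configuration string could be put
together in Python: aeneas configuration strings are key=value pairs
separated by "|". The helper name build_config_string and the values for
task_language, is_text_type and os_task_file_format are assumptions chosen
for illustration; only the two head/tail parameters come from the text
above.

```python
def build_config_string(params):
    """Join (key, value) pairs into a 'key=value|key=value' string."""
    return "|".join("{}={}".format(key, value) for key, value in params)


# The first three entries are illustrative assumptions; adapt them to your task.
params = [
    ("task_language", "eng"),
    ("is_text_type", "plain"),
    ("os_task_file_format", "json"),
    # Ignore 10 seconds at the start and 20 seconds at the end of the audio.
    ("is_audio_file_head_length", 10),
    ("is_audio_file_tail_length", 20),
]

config_string = build_config_string(params)
print(config_string)
```

The resulting string would then be passed as the configuration string
argument of aeneas.tools.execute_task, together with your own audio, text
and output file paths.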
For a live example, run:
$ python -m aeneas.tools.execute_task --example-head-tail
See also the documentation:
https://www.readbeyond.it/aeneas/docs/globalconstants.html#aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_HEAD_LENGTH
https://www.readbeyond.it/aeneas/docs/globalconstants.html#aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_TAIL_LENGTH
Clearly, this will require you to "manually" inspect each audio file, to
evaluate the length of the head/tail, which is inconvenient. However,
there are also options to specify a minimum and a maximum duration for
both the head and the tail, and aeneas will try to figure it out
automatically. Run:
$ python -m aeneas.tools.execute_task --example-sd
for an example.
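A hedged sketch of what such a configuration might look like follows. The
four is_audio_file_detect_* parameter names are written here as they
appear, to the best of my knowledge, in the aeneas globalconstants
documentation; please verify them there before relying on them. The
helper name detection_params, the base string and the numeric bounds are
assumptions made up for this illustration.

```python
def detection_params(head_min=None, head_max=None, tail_min=None, tail_max=None):
    """Return (key, value) pairs for head/tail detection, skipping unset bounds."""
    candidates = [
        # Parameter names assumed from the aeneas documentation; verify them.
        ("is_audio_file_detect_head_min", head_min),
        ("is_audio_file_detect_head_max", head_max),
        ("is_audio_file_detect_tail_min", tail_min),
        ("is_audio_file_detect_tail_max", tail_max),
    ]
    return [(key, value) for key, value in candidates if value is not None]


# Illustrative base configuration (assumed values, adapt to your task).
base = "task_language=eng|is_text_type=plain|os_task_file_format=json"

# Let aeneas look for a head of at most 15 s and a tail of at most 30 s.
extra = "|".join(
    "{}={}".format(key, value)
    for key, value in detection_params(head_max=15, tail_max=30)
)
config_string = base + "|" + extra
print(config_string)
```

Giving only upper bounds, as above, is one possible choice; tighter
minimum and maximum values should narrow the search if you already know
roughly how long the heads and tails are.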
If instead your spurious chunk is in the middle of the audio file,
aeneas will probably not be able to process it correctly.
On the other hand, if the spurious parts are scattered through the text
(e.g., sometimes the narrator skips or adds a word, or inverts a few
words, etc.), then aeneas should be able to deal with those.
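Since aeneas outputs a sync map even when the alignment went wrong, a
quick sanity check on the output can help you spot trouble. The sketch
below is not part of aeneas: it is a simple heuristic that flags
fragments whose characters-per-second rate looks implausible. It assumes
a JSON sync map with a "fragments" list whose items carry "id", "begin",
"end" and "lines"; the function name suspicious_fragments, the threshold
values and the sample data are all made up for illustration.

```python
import json


def suspicious_fragments(sync_map, min_cps=5.0, max_cps=30.0):
    """Return ids of fragments with an implausible characters-per-second rate.

    The thresholds are arbitrary assumptions; tune them on audio you trust.
    """
    flagged = []
    for fragment in sync_map["fragments"]:
        duration = float(fragment["end"]) - float(fragment["begin"])
        n_chars = sum(len(line) for line in fragment["lines"])
        if duration <= 0:
            # Zero-length (or negative) fragments cannot contain speech.
            flagged.append(fragment["id"])
            continue
        rate = n_chars / duration
        if rate < min_cps or rate > max_cps:
            flagged.append(fragment["id"])
    return flagged


# Made-up sample: the second sentence is squeezed into 0.1 seconds.
sample = json.loads("""
{"fragments": [
  {"id": "f000001", "begin": "0.000", "end": "2.000",
   "lines": ["Hello, how are you today?"]},
  {"id": "f000002", "begin": "2.000", "end": "2.100",
   "lines": ["This sentence was never spoken in the audio."]},
  {"id": "f000003", "begin": "2.100", "end": "4.500",
   "lines": ["I am fine, thank you."]}
]}
""")

print(suspicious_fragments(sample))
```

A flagged fragment does not prove the alignment is wrong; it only tells
you where to listen first.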
Finally, let me note that aeneas is known to work well at sentence or
sub-sentence granularity, while it is not perfect at word granularity.
There is another thread in this mailing list about the latter issue.
> If the answer is yes, can you show me how to configure aeneas to handle
> the situation?
> If not, can you point me to a possible solution?
Try experimenting with the different parameters mentioned above. If
aeneas does not work for you, you can try using another forced aligner.
You can find a list here:
https://github.com/pettarin/forced-alignment-tools
Best regards,
AP