Hi All! 👋
I'm new to this forum, but have been following progress on Aeneas for a while now.
I think it's a great tool and this looks like a great community :)
We're thinking about using Aeneas in a number of projects, which is exciting.
We need word-level timings, so I'm using version 1.7.3. I've split the text file I'm submitting so that each word is on its own line, and I'm passing the --presets-word flag to get more accurate timings.
If I'm understanding the docs correctly, adding --presets-word should handle the MFCC nonspeech masking for me. Perhaps I'm misunderstanding the definition of a multiline text format, or I need to fine-tune the parameters manually.
My text file looks like this:
I'm
used
to
the
idea
of
dying
while
I
have
no
desire
to
die
for
the
like's
of
you.
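For reference, a minimal sketch of one way a one-word-per-line file like the one above could be produced from a sentence. The helper name words_per_line is hypothetical, and splitting on whitespace is an assumption (punctuation stays attached to the word, as in the listing above):

```python
def words_per_line(text):
    """Return text with each whitespace-separated word on its own line."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return "\n".join(text.split())


sentence = "I'm used to the idea of dying while I have no desire to die for the like's of you."

# Print the result; redirect to a file (e.g. kirk07.txt) if needed.
print(words_per_line(sentence))
```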
Basically I run something like:
python -m aeneas.tools.execute_task \
kirk07.mp3 \
kirk07.txt \
"task_language=eng|os_task_file_format=json|is_text_type=plain" \
kirk07map.json --presets-word
The only issue is that I'm not seeing gaps between words in the output JSON timings, i.e. the end time of each word matches the start time of the next word, which is especially problematic when there are large pauses in the speech.
Here's an example of the output I get:
{
  "begin": "1.240",
  "children": [],
  "end": "1.320",
  "id": "f000007",
  "language": "eng",
  "lines": [
    "dying"
  ]
},
{
  "begin": "1.320",
  "children": [],
  "end": "1.800",
  "id": "f000008",
  "language": "eng",
  "lines": [
    "while"
  ]
},
There's a significant pause between "dying" and "while" that is not reflected in the output. In fact, "while" is actually spoken sometime after the 4-second mark, although I wonder if that is a separate issue.
I've looked back through some of the discussions in this forum, but to no avail. Any suggestions would be gratefully received.
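To make the symptom easy to verify, here is a minimal sketch of a check for gaps between adjacent fragments. It assumes the JSON sync map has a top-level "fragments" list with entries shaped like the ones quoted above; find_gaps and load_fragments are hypothetical helper names, not part of Aeneas:

```python
import json


def load_fragments(path):
    """Load the fragment list from a JSON sync map.

    Assumption: the file has a top-level "fragments" key.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)["fragments"]


def find_gaps(fragments, min_gap=0.001):
    """Return (prev_id, next_id, gap_seconds) for each pair of adjacent
    fragments separated by more than min_gap seconds."""
    gaps = []
    for prev, nxt in zip(fragments, fragments[1:]):
        # Times are stored as strings in the JSON, so convert first.
        gap = float(nxt["begin"]) - float(prev["end"])
        if gap > min_gap:
            gaps.append((prev["id"], nxt["id"], round(gap, 3)))
    return gaps


# The two fragments quoted above, trimmed to the relevant fields.
sample = [
    {"begin": "1.240", "end": "1.320", "id": "f000007", "lines": ["dying"]},
    {"begin": "1.320", "end": "1.800", "id": "f000008", "lines": ["while"]},
]
print(find_gaps(sample))
```

For the two fragments shown, the list comes back empty because "while" begins exactly where "dying" ends. To run the same check on the whole file, something like find_gaps(load_fragments("kirk07map.json")) should work under the assumption stated above.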
Thanks in advance!
Mark