Thanks for the clarification.
So basically you are saying that a human perceives:
0.000 1.200 What have you been doing all these years?
1.200 2.000 GAP (SILENCE/NONSPEECH)
2.000 3.600 I've been going to bed early.
and you would like to get:
0.000 1.200 What have you been doing all these years?
2.000 3.600 I've been going to bed early.
while aeneas produces:
0.000 1.600 What have you been doing all these years?
1.600 3.600 I've been going to bed early.
or
0.000 2.000 What have you been doing all these years?
2.000 3.600 I've been going to bed early.
=== === ===
There might be several reasons why aeneas is not giving you what you
expect.
The current algorithm to create the "gap" is the following:
1. determine all nonspeech intervals (NSI) using the built-in voice
activity detector (VAD)
2. align without gaps
3. for each pair of consecutive fragments, check if the transition point
between the two fragments occurs inside an NSI (and it is the only
transition point in that NSI), where the length of the NSI is >=
SPECIFIEDLENGTH: if so, create the "gap"
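
To make the three steps above concrete, here is a minimal Python sketch
of step 3. It is NOT the actual aeneas code: the function name add_gaps
and its arguments are made up for illustration, and treating a
transition point that falls exactly on the edge of an NSI as "inside"
is an assumption of this sketch.

```python
def add_gaps(boundaries, nonspeech, min_length):
    """Illustrative sketch of the gap-creation step (not the aeneas code).

    boundaries: sorted times [t0, t1, ..., tn]; fragment i spans t[i]..t[i+1]
    nonspeech:  list of (start, end) nonspeech intervals from the VAD
    min_length: minimum NSI length (seconds) required to create a gap
    """
    fragments = [[boundaries[i], boundaries[i + 1]]
                 for i in range(len(boundaries) - 1)]
    for nsi_start, nsi_end in nonspeech:
        if nsi_end - nsi_start < min_length:
            continue  # NSI too short: no gap
        # interior transition points falling inside this NSI
        # (edges counted as inside: an assumption of this sketch)
        inside = [i for i in range(1, len(boundaries) - 1)
                  if nsi_start <= boundaries[i] <= nsi_end]
        if len(inside) != 1:
            continue  # zero or several transition points: no gap
        i = inside[0]
        fragments[i - 1][1] = nsi_start  # previous fragment ends at NSI start
        fragments[i][0] = nsi_end        # next fragment begins at NSI end
    return [tuple(f) for f in fragments]


# The example from this thread: transition at 1.600, NSI from 1.200 to 2.000
print(add_gaps([0.0, 1.6, 3.6], [(1.2, 2.0)], 0.2))
```

With the transition point at 1.600 the sketch yields the two fragments
0.000-1.200 and 2.000-3.600. If the transition point lands just outside
the NSI (say at 2.040), no gap is created.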
A first reason for the "error" might be that, with default parameters,
the VAD reports a nonspeech interval only if its length is >= 0.200 s.
So, even if you set task_adjust_boundary_nonspeech_min=0.01, it is
"shadowed" by the VAD setting. You can try lowering the minimum
nonspeech length in the VAD ( -r="vad_min_nonspeech_length=0.040" ), but
note that it does not make sense to set it to a value smaller than the
MFCC shift (default: 0.040 s), so the MFCC shift is the ultimate lower
bound on any gap length.
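
The "shadowing" can be summarized with a tiny Python sketch. The
function effective_min_gap is hypothetical (it is not part of aeneas);
it only encodes the reasoning above, using the default values quoted in
this message:

```python
def effective_min_gap(task_min, vad_min_nonspeech=0.200, mfcc_shift=0.040):
    """Shortest pause (seconds) that can become a gap, per the reasoning above.

    task_min:          task_adjust_boundary_nonspeech_min
    vad_min_nonspeech: minimum NSI length used by the VAD (default 0.200 s)
    mfcc_shift:        MFCC shift (default 0.040 s)
    """
    # The largest of the three thresholds wins.
    return max(task_min, vad_min_nonspeech, mfcc_shift)


print(effective_min_gap(0.01))                           # 0.2
print(effective_min_gap(0.01, vad_min_nonspeech=0.040))  # 0.04
```

So with the defaults, asking for 0.01 s still means pauses shorter than
0.200 s are ignored; after lowering the VAD setting, the bound becomes
the MFCC shift.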
Another reason might be that the transition point is determined to be
outside a nonspeech interval, maybe at the MFCC frame just before or
after it. In that case, you might try enabling the MFCC nonspeech
masking and see if it helps ( -r="mfcc_mask_nonspeech=True" ).
The VAD might not label the nonspeech interval correctly; you can try
increasing the vad_log_energy_threshold rconf parameter and see if it helps.
Finally, a bug is ALWAYS an option. ;)
HTH,
AP
Note: since you say that it works for longer pauses, I guess you are
also specifying "task_adjust_boundary_nonspeech_string=REMOVE" in the
task config string, but I just want to mention it for other users who
might not know.
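
For those users, here is a hedged Python sketch that assembles a
command line combining the settings mentioned in this message. The
helper build_command and the file names (audio.mp3, text.txt, out.srt)
are placeholders, the language/text/format values are just an example,
and joining several rconf parameters with "|" inside a single -r option
is my assumption here: please check it against the documentation of
your aeneas version.

```python
def build_command(audio, text, output, task_params, rconf_params):
    """Assemble an argument list for aeneas.tools.execute_task (sketch)."""
    # key=value pairs joined by "|", as in the task config string
    task_config = "|".join(f"{k}={v}" for k, v in task_params.items())
    # same format assumed for the runtime configuration string
    rconf = "|".join(f"{k}={v}" for k, v in rconf_params.items())
    return ["python", "-m", "aeneas.tools.execute_task",
            audio, text, task_config, output, f"-r={rconf}"]


cmd = build_command(
    "audio.mp3", "text.txt", "out.srt",       # placeholder file names
    {
        "task_language": "eng",               # example values
        "is_text_type": "plain",
        "os_task_file_format": "srt",
        "task_adjust_boundary_nonspeech_min": "0.040",
        "task_adjust_boundary_nonspeech_string": "REMOVE",
    },
    {
        "vad_min_nonspeech_length": "0.040",
        "mfcc_mask_nonspeech": "True",
    },
)
print(" ".join(cmd))
```

Passing the list to subprocess.run avoids shell quoting problems with
the "|" characters.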
On 05/19/2017 09:23 PM, Cristian Gradisteanu wrote:
> Sorry for the misunderstanding.
>
> If I want the caption to completely disappear between sentences (when
> there is a small pause in narration) I can do that by
> using: task_adjust_boundary_nonspeech_min=0.01 but even there is a