incorrect time stamp while decoding with chain model

crpatel

Mar 27, 2018, 7:12:40 AM

to kaldi-help

I have trained a chain model on my own data. The model performs well in terms of WER. However, the time stamps it produces are not accurate. They seem accurate at the beginning of the audio but start shifting towards the end. Our audio files are between 2 and 8 minutes long, and the shift is particularly visible in long files. The offset in the word time stamps appears to grow towards the end of the audio.

My decoding pipeline is as follows:
lattice-push | lattice-align-words | lattice-to-ctm-conf

Could you please explain why this is happening?

Jan Trmal

Mar 27, 2018, 10:37:30 AM

to kaldi-help

Use the correct --frame-shift parameter (chain models usually run on factor 3 subsampling of the original audio parametrization rate).
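To make the arithmetic concrete, here is a minimal sketch of how a decoder output frame index maps to seconds. It assumes the common 10 ms feature frame shift and a frame subsampling factor of 3; both values and the function names are illustrative assumptions, so check them against your own feature config and model.

```python
def ctm_frame_shift(feature_frame_shift=0.01, subsampling_factor=3):
    """Frame shift in seconds to pass as --frame-shift.

    feature_frame_shift: shift used at feature extraction (assumed 10 ms).
    subsampling_factor: frame subsampling factor of the chain model (assumed 3).
    """
    return feature_frame_shift * subsampling_factor


def frames_to_seconds(frame_index, feature_frame_shift=0.01, subsampling_factor=3):
    """Convert a decoder output frame index to a time in seconds."""
    return frame_index * ctm_frame_shift(feature_frame_shift, subsampling_factor)
```

With these assumed values, `ctm_frame_shift()` gives 0.03. A wrong frame shift scales every time stamp by a constant factor (here 3), so the error it causes is large and proportional rather than a small drift.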

y.

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/31be4565-9890-408f-afff-193ec36dda40%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

crpatel

Mar 28, 2018, 2:19:03 AM

to kaldi-help

I have already multiplied the --frame-shift by 3. The time shift still increases gradually towards the end of the audio. For example, near the end of the audio I observe a difference of around 1 second between the time shown in the CTM and the actual time in the audio.
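One way to quantify a growing offset like this is to hand-label a few word start times and fit a straight line against the CTM times. The sketch below is a plain least-squares fit; the function name is hypothetical, and the reference times have to come from your own manual annotation.

```python
def fit_drift(ctm_times, true_times):
    """Least-squares fit of true_time ~= scale * ctm_time + offset.

    A scale different from 1.0 means the error grows with position in
    the file; a non-zero offset alone means a constant shift.
    """
    if len(ctm_times) != len(true_times) or len(ctm_times) < 2:
        raise ValueError("need at least two paired time stamps")
    n = len(ctm_times)
    mean_x = sum(ctm_times) / n
    mean_y = sum(true_times) / n
    sxx = sum((x - mean_x) ** 2 for x in ctm_times)
    if sxx == 0.0:
        raise ValueError("ctm_times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ctm_times, true_times))
    scale = sxy / sxx
    offset = mean_y - scale * mean_x
    return scale, offset
```

A scale that differs from 1.0 suggests the effective frame shift or sample rate is not what was assumed, while a scale near 1.0 with a non-zero offset points to a constant shift instead.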

Daniel Povey

Mar 28, 2018, 1:51:54 PM

to kaldi-help

Kaldi isn't really designed to operate on such long files. It would normally be better to break it up into smaller chunks and decode those chunks.
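As a rough illustration of the chunking idea, the sketch below cuts a recording into fixed-length pieces and formats them as lines in the Kaldi segments-file layout (`<segment-id> <recording-id> <start> <end>`). The 30-second chunk length, the segment naming scheme, and the function names are assumptions made for the example; in practice you would probably prefer to cut at silences so that words are not split.

```python
def chunk_boundaries(duration, chunk_len=30.0, overlap=0.0):
    """Return (start, end) pairs in seconds covering [0, duration]."""
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        chunks.append((start, end))
        if end >= duration:
            break
        start += step
    return chunks


def segments_lines(recording_id, duration, chunk_len=30.0):
    """Format chunks as segments-file lines (naming scheme is illustrative)."""
    return [
        f"{recording_id}-{i:04d} {recording_id} {start:.2f} {end:.2f}"
        for i, (start, end) in enumerate(chunk_boundaries(duration, chunk_len))
    ]
```

When decoding per chunk, the times in the resulting CTM are relative to each chunk, so the chunk start has to be added back to get times relative to the original recording.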

It's hard to diagnose this without knowing more about your decoding pipeline, e.g. is it using the program online2-nnet3-wav-latgen-faster or an offline decoding setup? Does lattice-align-words print any warnings? If offline, check that the number of frames in the file is what you thought it should be based on the duration in seconds.
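For the frame-count check, a sketch like the following computes how many feature frames to expect from a given duration, which can then be compared with what a tool such as feat-to-len reports for the utterance. The defaults (16 kHz audio, 25 ms window, 10 ms shift, subsampling factor 3) and the function names are assumptions; substitute the values from your own feature config.

```python
def expected_num_frames(duration_sec, sample_rate=16000,
                        frame_length_ms=25.0, frame_shift_ms=10.0,
                        snip_edges=True):
    """Expected number of feature frames for a recording.

    With snip_edges=True only windows that fit entirely inside the
    signal are counted; otherwise the count is roughly
    num_samples / shift, rounded to the nearest integer.
    """
    num_samples = int(round(duration_sec * sample_rate))
    window = int(sample_rate * frame_length_ms / 1000)
    shift = int(sample_rate * frame_shift_ms / 1000)
    if snip_edges:
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    return (num_samples + shift // 2) // shift


def subsampled_frames(num_frames, factor=3):
    """Number of decoder output frames after frame subsampling (rounded up)."""
    return (num_frames + factor - 1) // factor
```

If the actual frame count differs noticeably from the expected one, the audio duration, sample rate, or frame shift is probably not what the time-stamp conversion assumes.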

Dan
