Mismatch between audio duration and number of frames returned by "ali-to-phones --write-lengths=true ...."


Sergei Tushev

Sep 2, 2024, 10:09:03 AM
to kaldi-help
Hello.  

I use the default MFCC setting --frame-length=25, so for 0.75 sec of audio I expect to get 30 frames, with no offset.
When I align this audio with nnet3-align-compiled, I get only 26 frames, e.g. "utt-1  SIL 10 ; AA 6 ; SIL 10".
What could be the reason?

Daniel Povey

Sep 2, 2024, 10:46:27 AM
to kaldi...@googlegroups.com
Probably end effects. I think one of the MFCC options relates to that (whether the number of frames is reduced by end effects), but Kaldi models also have end effects, as you would get from convolution without padding.
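
The feature-side end effect mentioned above can be illustrated with a small sketch. The option most likely meant is --snip-edges of compute-mfcc-feats, and the frame count is driven by the frame shift (10 ms by default), not only by the frame length. The helper name num_frames and the 16 kHz sample rate are assumptions made for illustration; the rule itself is a paraphrase of how Kaldi counts frames, not a copy of its code. It covers only the feature extraction, so it does not by itself reproduce the 26 frames reported above, which also depend on the model.

```python
def num_frames(num_samples: int,
               sample_rate: int = 16000,   # assumed sample rate, not given in the thread
               frame_length_ms: int = 25,
               frame_shift_ms: int = 10,
               snip_edges: bool = True) -> int:
    """Number of feature frames for a waveform of num_samples samples."""
    frame_length = sample_rate * frame_length_ms // 1000  # window size in samples
    frame_shift = sample_rate * frame_shift_ms // 1000    # hop size in samples
    if snip_edges:
        # Only windows that fit entirely inside the waveform are emitted,
        # so the count depends on the frame length as well as the shift.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # Without snipping, the count depends only on the frame shift
    # (the signal is reflected at the edges to fill the outer windows).
    return (num_samples + frame_shift // 2) // frame_shift


samples = int(0.75 * 16000)  # 0.75 sec at the assumed 16 kHz
print(num_frames(samples, snip_edges=True))
print(num_frames(samples, snip_edges=False))
```

Comparing the two settings shows how many frames are lost at the edges during feature extraction alone; any further reduction comes from the model's own context and frame rate.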
--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/213c101a-380c-41d1-b56f-c0d0791e9ecfn%40googlegroups.com.

Sergei Tushev

Sep 2, 2024, 11:04:41 AM
to kaldi-help
Thank you. 
If I understand correctly, there is no way to fix this quickly without changing the model?


Daniel Povey

Sep 3, 2024, 3:48:19 AM
to kaldi...@googlegroups.com
Not really, but the end effect should be fixed and symmetric, so you can compensate for it in post-processing.
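
One possible form of that post-processing is sketched here, under the assumption that the missing frames are split evenly between the start and the end of the utterance and are credited to the first and last phone segments. The function names parse_lengths and pad_symmetric are hypothetical, the input line is the example from the original question, and the expected frame count has to be supplied by the caller. When the number of missing frames is odd, this sketch arbitrarily gives the extra frame to the end.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (phone, length in frames)


def parse_lengths(line: str) -> Tuple[str, List[Segment]]:
    """Parse one line such as 'utt-1  SIL 10 ; AA 6 ; SIL 10'."""
    utt, rest = line.split(None, 1)
    segments = []
    for item in rest.split(";"):
        phone, length = item.split()
        segments.append((phone, int(length)))
    return utt, segments


def pad_symmetric(segments: List[Segment], expected_frames: int) -> List[Segment]:
    """Extend the first and last segments so lengths sum to expected_frames."""
    total = sum(length for _, length in segments)
    missing = expected_frames - total
    if missing < 0:
        raise ValueError("alignment is longer than the expected frame count")
    if not segments:
        return []
    head = missing // 2       # frames assumed lost at the start
    tail = missing - head     # frames assumed lost at the end
    padded = list(segments)
    padded[0] = (padded[0][0], padded[0][1] + head)
    padded[-1] = (padded[-1][0], padded[-1][1] + tail)
    return padded


utt, segs = parse_lengths("utt-1  SIL 10 ; AA 6 ; SIL 10")
print(utt, pad_symmetric(segs, expected_frames=30))
```

Because the offset is fixed for a given model and feature configuration, it only needs to be measured once and can then be applied to every utterance.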

