Decode Aspire without segments


Ernst Nusterer

Feb 12, 2019, 3:30:12 PM
to kaldi-help
Hello,

I managed to get nnet3 running (thanks a lot, guys!) and now I want to set up decoding.
I am using mostly M-AILABS data in German, without segments, together with the Aspire nnet3 recipe.

When I run

local/nnet3/decode.sh dev exp/tdnn_lstm_1a_chain_online/graph_pp data/dev exp/tdnn_lstm_1a_chain_online/

(The data is in data/dev, the utterances are not segmented, and dev_hires does not exist.)

I get an error:

....
Succeeded creating CMVN stats for dev_hires
+ utils/fix_data_dir.sh data/dev_hires
fix_data_dir.sh: kept all 5000 utterances.
+ utils/validate_data_dir.sh --no-text data/dev_hires
utils/validate_data_dir.sh: Successfully validated data-directory data/dev_hires
+ '[' 1 -le 3 ']'
+ echo 'local/generate_uniformly_segmented_data_dir.sh: Generating uniform segments with length 10 and overlap 5.'
local/generate_uniformly_segmented_data_dir.sh: Generating uniform segments with length 10 and overlap 5.
+ '[' -d data/dev_uniformsegmented_hires ']'
+ '[' '!' -f data/dev_hires/segments ']'
+ utils/data/get_segments_for_data.sh data/dev_hires
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
run.pl: 4 / 4 failed, log is in data/dev_hires/log/get_durations.*.log
utils/data/get_utt2dur.sh: there was a problem getting the durations


In data/dev_hires/log/get_durations.1.log there is:

# wav-to-duration --read-entire-file=false scp:data/dev_hires/split4utt/1/wav.scp ark,t:data/dev_hires/split4utt/1/utt2dur
# Started at Tue Feb 12 21:26:20 CET 2019
#
wav-to-duration --read-entire-file=false scp:data/dev_hires/split4utt/1/wav.scp ark,t:data/dev_hires/split4utt/1/utt2dur
LOG (wav-to-duration[5.5.195~1-6f565]:main():wav-to-duration.cc:92) Printed duration for 0 audio files.
# Accounting: time=0 threads=1
# Ended (code 1) at Tue Feb 12 21:26:20 CET 2019, elapsed time 0 seconds

In data/dev_hires there are two empty files: segments and utt2dur. I assume that the empty segments file is the problem, but I cannot find where it comes from.

Thanks a lot for your help,
Ernst

Daniel Povey

Feb 12, 2019, 3:54:03 PM
to kaldi-help
You have to find out why wav-to-duration failed, e.g. was data/dev_hires/split4utt/1/wav.scp empty, and if so, why?
Maybe your original wav.scp was empty or malformed?
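
A minimal sketch of that check, assuming wav.scp follows the usual "<id> <path-or-command>" layout. The function name check_wav_scp is made up for illustration and is not part of Kaldi; it only tests that the file is non-empty and that each line has at least two fields, not that the audio itself is readable.

```shell
# check_wav_scp is a hypothetical helper, not a Kaldi script.
# It fails if the file is missing or empty, or if any line lacks the
# "<id> <path-or-command>" shape that wav.scp entries use.
check_wav_scp() {
  scp_file="$1"
  if [ ! -s "$scp_file" ]; then
    echo "check_wav_scp: $scp_file is missing or empty" >&2
    return 1
  fi
  # Find the first line with fewer than two whitespace-separated fields.
  bad_line=$(awk 'NF < 2 { print NR; exit }' "$scp_file")
  if [ -n "$bad_line" ]; then
    echo "check_wav_scp: $scp_file line $bad_line has fewer than two fields" >&2
    return 1
  fi
  echo "check_wav_scp: $scp_file looks OK ($(awk 'END { print NR }' "$scp_file") entries)"
}
```

It could be run on both the original file and the split copy, e.g. check_wav_scp data/dev/wav.scp and check_wav_scp data/dev_hires/split4utt/1/wav.scp, to see at which point the list becomes empty.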

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/7660ff58-1340-4776-a7c5-7875f17ebcce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ernst Nusterer

Feb 12, 2019, 5:24:21 PM
to kaldi-help

Shujian Liu

Jul 3, 2019, 8:20:44 PM
to kaldi-help
Hi,
I ran into the same issue recently. An easy fix would be to skip the split-and-merge step for utt2dur and just process the whole file here: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/data/get_utt2dur.sh#L97. I think this commit broke the Aspire decoding code: https://github.com/kaldi-asr/kaldi/pull/2326. I am still digging into this.

Daniel Povey

Jul 3, 2019, 8:45:51 PM
to kaldi-help
I don't think it's about that, because all 4 of the jobs failed, and anyway there would be at least one wav file if there was at least one utterance.
I suspect the wav.scp may have been absent or empty to start with, or the data dir would have failed validation.  Check that you can validate the data dir before calling that.




Daniel Povey

Jul 3, 2019, 8:49:31 PM
to kaldi-help
Anyway, having empty segments and utt2dur would have broken validation.  In all cases you should figure out why the data-dir was invalid initially.
Likely some previous step went wrong.  You could delete the directory, go through all the stages that created it, and see which one makes it fail to validate; see utils/data/validate_data_dir.sh.
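
One quick way to spot the symptom described in this thread (zero-length segments and utt2dur) after each stage is to list the empty files in the data directory. This is only a sketch: list_empty_files is a made-up name, not a Kaldi utility, and it complements the validation script rather than replacing it.

```shell
# list_empty_files is a hypothetical helper, not a Kaldi script.
# It prints every regular file directly inside the given directory
# that has zero length, one path per line.
list_empty_files() {
  for f in "$1"/*; do
    if [ -f "$f" ] && [ ! -s "$f" ]; then
      echo "$f"
    fi
  done
}
```

Running list_empty_files data/dev_hires after each stage that touches the directory should show which stage first leaves an empty file behind.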

Shujian Liu

Jul 4, 2019, 12:48:21 AM
to kaldi-help
Hi Dan, I finally found the cause. It is due to the fast write speed of SSDs. The full explanation is complicated, but people using mechanical disks will not see this.

Shujian Liu

Aug 6, 2019, 12:41:14 PM
to kaldi-help
Short answer: in https://github.com/kaldi-asr/kaldi/blob/master/egs/aspire/s5/local/generate_uniformly_segmented_data_dir.sh#L65, you can write to a temporary file and then copy it back to data/${data_set}_hires/segments.

Long answer: ">" works by first creating an empty file and then adding the output line by line. This is not a problem on a mechanical disk, since it is slow, but on an SSD it causes an error. generate_uniformly_segmented_data_dir.sh calls get_segments_for_data.sh, which calls get_utt2dur.sh, which calls utils/data/split_data.sh. At this line (https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/split_data.sh#L119), $data/segments is not supposed to exist yet, but with an SSD there will be an empty segments file, which then causes all the other files (such as utt2dur) to be empty.
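
The temporary-file workaround can be sketched as below. This is an illustration, not the actual patch: write_via_tmp is a made-up helper name, and the sketch assumes the command being run reads from the same directory (or file) that its output is redirected into. The shell opens and truncates a ">" target itself, independently of whatever the command reads, so anything that checks for or reads the target while the command is running can see an empty file.

```shell
# write_via_tmp is a hypothetical helper, not a Kaldi script.
# Usage: write_via_tmp <target> <command> [args...]
# Runs the command with stdout going to <target>.tmp, and only replaces
# <target> after the command has finished successfully.
write_via_tmp() {
  target="$1"
  shift
  if "$@" > "$target.tmp"; then
    mv "$target.tmp" "$target"
  else
    rm -f "$target.tmp"
    return 1
  fi
}
```

As a small demonstration of the effect, sort f > f leaves f empty because the file is truncated before sort reads it, whereas write_via_tmp f sort f keeps the contents intact. With this pattern the target never exists as an empty file while the command runs.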

Hope this helps.

- Shujian




Daniel Povey

Aug 6, 2019, 1:14:15 PM
to kaldi-help
Thanks a lot.  Can you please create a PR to fix it?  Let me know if you can't, so I can ask someone else.
Write to segments.tmp... mktemp tends to give a lot of system-compatibility hassles.


Shujian Liu

Aug 6, 2019, 1:31:54 PM
to kaldi-help

