WaveData: can read only PCM data, format id in file is: 3

mkp...@umich.edu

Apr 23, 2018, 5:32:30 PM

to kaldi-help

Hey Dan,

I'm receiving this error message in the log files when running make_mfcc.sh. What is PCM data? And what does it mean that the format id in the file is 3?

# compute-mfcc-feats --verbose=2 --config=/z/mkperez/Replication/Scripts/config/mfcc_config scp,p:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/log/wav_test_adults_man.1.scp ark:- | copy-feats --compress=true ark:- ark,scp:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.ark,/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.scp

# Started at Mon Apr 23 17:21:46 EDT 2018

#

compute-mfcc-feats --verbose=2 --config=/z/mkperez/Replication/Scripts/config/mfcc_config scp,p:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/log/wav_test_adults_man.1.scp ark:-

copy-feats --compress=true ark:- ark,scp:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.ark,/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.scp

ERROR (compute-mfcc-feats[5.0.23~1-f7b2f]:Read():wave-reader.cc:170) WaveData: can read only PCM data, format id in file is: 3

Thanks,

MP

Daniel Povey

Apr 23, 2018, 5:41:36 PM

to kaldi-help

It means your wav files have some form of compression. Reading those is not supported; you'll have to convert them (e.g. using sox) to PCM-format wav, which is just 2-byte integers with linear sampling.

Dan
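To see which encoding a given file declares, one can read the format tag from the header directly. The sketch below assumes the "format id" in the error message is the format tag stored in the file's 'fmt ' chunk; in the standard WAVE tag list, 1 is linear PCM and 3 is IEEE floating-point samples. The helper make_header is hypothetical and exists only to build a tiny in-memory test header, so no real audio file is needed.

```python
import io
import struct

# A few common WAVE format tags (standard registry values).
FORMAT_NAMES = {
    1: "PCM (linear integer)",
    3: "IEEE float",
    6: "A-law",
    7: "mu-law",
}

def wav_format_tag(stream):
    """Return the format tag stored in the 'fmt ' chunk of a RIFF/WAVE stream."""
    riff, _size, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack("<H", stream.read(2))
            return tag
        # Skip any other chunk; chunks are padded to an even byte count.
        stream.seek(chunk_size + (chunk_size & 1), 1)

def make_header(format_tag):
    """Build a minimal WAVE header with no samples (hypothetical demo helper)."""
    fmt = struct.pack("<HHIIHH", format_tag, 1, 16000, 16000 * 4, 4, 32)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", 0))
    return b"RIFF" + struct.pack("<I", len(body)) + body

tag = wav_format_tag(io.BytesIO(make_header(3)))
print(tag, FORMAT_NAMES.get(tag, "unknown"))
```

For a real file, pass open(path, "rb") instead of the in-memory stream.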

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/766b576c-a31d-40a4-b135-64abf542a9a2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jan Trmal

Apr 23, 2018, 5:43:22 PM

to kaldi-help

wav is essentially just a container that can hold audio data encoded in many different ways. Kaldi supports only linear PCM coding; your wav has audio stored in a different encoding. You could use sox to convert it. You can even do it on the fly, using this format of wav.scp:

audio sox input.wav -t wav -r 16000 -b 16 - |

(This is just an example; you will have to figure out the right switches from the man page, since sox changes them once in a while.)

y.
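A minimal sketch of generating such wav.scp lines in bulk, under a few assumptions: the recording ids and paths are made up, and the extra `-e signed-integer` switch (which asks sox for signed-integer output) is an addition not present in the example above. Check the switches against the man page of the installed sox version.

```python
import shlex

def wav_scp_line(recording_id, wav_path, rate=16000, bits=16):
    """Build one wav.scp line that converts the file to linear PCM on the fly."""
    if not recording_id or any(ch.isspace() for ch in recording_id):
        raise ValueError("recording id must be a single whitespace-free token")
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    cmd = (f"sox {shlex.quote(wav_path)} -t wav -r {rate} -b {bits} "
           f"-e signed-integer -")
    # The trailing '|' tells Kaldi to run the command and read its stdout.
    return f"{recording_id} {cmd} |"

# Hypothetical recordings, for illustration only.
recordings = [("utt001", "/data/utt001.wav"), ("utt002", "/data/my audio.wav")]
for rec_id, path in recordings:
    print(wav_scp_line(rec_id, path))
```

The first field of each line is the recording id; everything after it, up to the final pipe character, is the command.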


mkp...@umich.edu

Apr 23, 2018, 7:23:55 PM

to kaldi-help

Thank you Dan and Yenda,

I'm using sox as you've said (piping the command directly into the wav.scp file) and am now getting a 4-byte chunk name error. Is this because I am using sox wrong? Or could this be indicating a larger underlying problem with the wav files?

Best,

MP

# compute-mfcc-feats --verbose=2 --config=/z/mkperez/Replication/Scripts/config/mfcc_config scp,p:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/log/wav_test_adults_man.1.scp ark:- | copy-feats --compress=true ark:- ark,scp:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.ark,/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.scp

# Started at Mon Apr 23 19:12:55 EDT 2018

#

compute-mfcc-feats --verbose=2 --config=/z/mkperez/Replication/Scripts/config/mfcc_config scp,p:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/log/wav_test_adults_man.1.scp ark:-

copy-feats --compress=true ark:- ark,scp:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.ark,/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.scp

sh: audio: command not found

ERROR (compute-mfcc-feats[5.0.23~1-f7b2f]:Read4ByteTag():wave-reader.cc:75) WaveData: expected 4-byte chunk-name, got read errror


Daniel Povey

Apr 23, 2018, 7:25:46 PM

to kaldi-help

"audio" was not supposed to be part of the command; I think Yenda was using it as an example of a recording-id.
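To illustrate how such a line is split, here is a small sketch: the first whitespace-separated token is taken as the recording-id and the rest, when it ends with a pipe character, is handed to the shell as a command. The function name and the sample lines are hypothetical; this mimics the behaviour rather than reproducing Kaldi's own parser.

```python
def split_wav_scp_line(line):
    """Split a wav.scp line into (recording_id, command or None, plain path or None)."""
    recording_id, rest = line.strip().split(None, 1)
    rest = rest.strip()
    if rest.endswith("|"):
        # Piped input: everything before the final '|' is run by the shell.
        return recording_id, rest[:-1].strip(), None
    return recording_id, None, rest

# "audio" left in after the real id: the shell tries to run a program called "audio".
bad = "utt001 audio sox input.wav -t wav -r 16000 -b 16 - |"
# "audio" replaced by the real id: the shell runs sox.
good = "utt001 sox input.wav -t wav -r 16000 -b 16 - |"

for line in (bad, good):
    rec_id, command, path = split_wav_scp_line(line)
    program = command.split()[0] if command else None
    print(f"id={rec_id!r} program={program!r}")
```

With the first line the shell looks for a program named audio, which matches the "sh: audio: command not found" message in the log.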


mkp...@umich.edu

Apr 24, 2018, 12:48:14 PM

to kaldi-help

Thank you so much for all the help. I was able to run decoding just fine; however, I am getting a WER of 100% (all deletions).

I'm using the TIDIGITS dataset, but I am investigating whether ASR would be able to detect any phones/words that were said when the dataset is recorded with a sampling rate of 420Hz (hardware limited). For this reason my new_TIDIGITS dataset is recorded at 420Hz, but I've re-recorded this 420Hz audio using a separate device at an 8k sampling rate, because other ASR toolkits don't go as low as 420Hz. For the results above I was using the 8k sampling rate data, so I am now thinking of using the original 420Hz data samples. However, I am now getting an error:

# compute-mfcc-feats --verbose=2 --config=/z/mkperez/imuphone/kaldi/scripts//mfcc_config scp,p:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/log/wav_test_adults_man.1.scp ark:- | copy-feats --compress=true ark:- ark,scp:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.ark,/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/raw_mfcc_test_adults_man.1.scp

# Started at Tue Apr 24 12:45:35 EDT 2018

#

compute-mfcc-feats --verbose=2 --config=/z/mkperez/imuphone/kaldi/scripts//mfcc_config scp,p:/z/mkperez/imuphone/kaldi/data/full_bassAlsoFull/test_adults_man/val_mfcc/log/wav_test_adults_man.1.scp ark:-

ASSERTION_FAILED (compute-mfcc-feats[5.0.23~1-f7b2f]:MelBanks():mel-computations.cc:126) : 'first_index != -1 && last_index >= first_index && "You may have set --num-mel-bins too large."'

From a high-level view, would a Kaldi-trained acoustic model be able to decode 420Hz-sampled data using MFCC features?
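One plausible reading of the assertion above is that, at 420Hz, some triangular mel filters end up covering no FFT bin at all. The sketch below is a simplified re-implementation of that idea, not Kaldi's actual code. It assumes a 25 ms frame, 23 mel bins, a 20 Hz low cutoff, the Nyquist frequency as the high cutoff, a window padded to the next power of two, and the mel formula 1127 * ln(1 + f/700). All of these are assumptions and should be checked against the real mfcc_config.

```python
import math

def mel(freq_hz):
    """Map a frequency in Hz to the mel scale (assumed formula)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)

def empty_mel_bins(sample_rate, num_bins=23, low_freq=20.0, frame_ms=25.0):
    """Return indices of mel filters that contain no FFT bin centre."""
    window = int(sample_rate * 0.001 * frame_ms)  # samples per frame
    padded = 1
    while padded < window:                        # round up to a power of two
        padded *= 2
    fft_bin_width = sample_rate / padded
    mel_low, mel_high = mel(low_freq), mel(sample_rate / 2.0)
    delta = (mel_high - mel_low) / (num_bins + 1)
    fft_mels = [mel(fft_bin_width * i) for i in range(padded // 2)]
    empty = []
    for b in range(num_bins):
        left = mel_low + b * delta
        right = left + 2.0 * delta                # each filter spans two deltas
        if not any(left < m < right for m in fft_mels):
            empty.append(b)
    return empty

for rate in (420, 8000, 16000):
    print(rate, "Hz:", len(empty_mel_bins(rate)), "empty mel bins")
```

Under these assumptions the 420Hz case has only eight FFT bins to share among 23 filters, so some filters are necessarily empty, while 8 kHz and 16 kHz leave none empty.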

Daniel Povey

Apr 24, 2018, 1:53:46 PM

to kaldi-help

You won't be able to do ASR with 420Hz speech. Even 8kHz sampling noticeably degrades the intelligibility. You won't usually even be able to hear the first formant if the Nyquist is 210Hz. In fact, the so-called "voiceband" that regular telephones accept is 300 to 3400 Hz, which starts *above* the highest frequency that your system can hear.

Dan
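The arithmetic behind this can be sketched in a few lines: the Nyquist frequency is half the sampling rate, so 420Hz sampling captures nothing above 210Hz. The 300 to 3400 Hz voiceband figures come from the message above; the function names are illustrative.

```python
VOICEBAND_HZ = (300.0, 3400.0)  # telephone voiceband quoted in the thread

def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0

def voiceband_coverage(sample_rate_hz):
    """Fraction of the voiceband that lies below the Nyquist frequency."""
    low, high = VOICEBAND_HZ
    covered = max(0.0, min(nyquist_hz(sample_rate_hz), high) - low)
    return covered / (high - low)

for rate in (420, 8000, 16000):
    print(f"{rate} Hz sampling: Nyquist {nyquist_hz(rate):.0f} Hz, "
          f"voiceband covered {voiceband_coverage(rate):.0%}")
```

At 420Hz the coverage is zero, because the whole voiceband lies above the 210Hz Nyquist limit.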
