Problem using an ffmpeg call in wav.scp

nicolas

Nov 11, 2015, 9:04:49 AM

to kaldi-help

I have been trying to use ffmpeg in Kaldi so that I can work directly with any kind of file that ffmpeg can convert to WAV. So I tried to set it up in the wav.scp file, e.g.:

SAMPLE ffmpeg -y -i SAMPLE.mp4 -ar 16000 -ac 1 -f wav - |

However this does not work. The log says:

Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
ERROR (extract-segments:Read():wave-reader.cc:224) Expected 4294967295 bytes in RIFF chunk, but after first data block there will be 70 + 4294967295 bytes (we do not support reading multiple data chunks).

From what I understand, when ffmpeg writes to a pipe it works in "streaming" mode: the WAV file is written as it is generated, so the length fields in the header, which come at the very beginning, cannot contain the real size (the 4294967295 in the error is 0xFFFFFFFF, the largest 32-bit value). This seems to confuse Kaldi.

Has anyone managed to get this to work? I could work around it with a temporary file, but is there a cleaner way?
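To illustrate what "streaming" means for the header, here is a minimal C++ sketch (hypothetical helper names, not ffmpeg's actual muxer code) of the canonical 44-byte header a writer has to emit when it cannot seek back to patch in the sizes. Both the RIFF chunk size and the data chunk size carry the placeholder 0xFFFFFFFF, i.e. the 4294967295 seen in the error. Real ffmpeg output may contain additional chunks (e.g. a LIST metadata chunk) between "fmt " and "data", which this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Placeholder written when the final length is not known yet.
constexpr uint32_t kUnknownSize = 0xFFFFFFFFu;  // == 4294967295

// Appends a little-endian 32-bit field (WAV header fields are little-endian).
void PutLE32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Appends a little-endian 16-bit field.
void PutLE16(std::vector<uint8_t>& out, uint16_t v) {
  out.push_back(static_cast<uint8_t>(v & 0xFF));
  out.push_back(static_cast<uint8_t>(v >> 8));
}

// Appends a 4-character chunk ID such as "RIFF" or "data".
void PutTag(std::vector<uint8_t>& out, const std::string& tag) {
  out.insert(out.end(), tag.begin(), tag.end());
}

// Hypothetical helper: the header for 16-bit PCM that a writer emits
// before it knows how much audio will follow.
std::vector<uint8_t> StreamedWavHeader(uint32_t sample_rate, uint16_t channels) {
  assert(channels > 0);
  const uint16_t bits = 16;
  const auto block_align = static_cast<uint16_t>(channels * bits / 8);
  std::vector<uint8_t> h;
  PutTag(h, "RIFF");
  PutLE32(h, kUnknownSize);               // RIFF size: unknown, cannot seek back
  PutTag(h, "WAVE");
  PutTag(h, "fmt ");
  PutLE32(h, 16);                         // size of a plain PCM "fmt " body
  PutLE16(h, 1);                          // format tag 1 = integer PCM
  PutLE16(h, channels);
  PutLE32(h, sample_rate);
  PutLE32(h, sample_rate * block_align);  // byte rate
  PutLE16(h, block_align);
  PutLE16(h, bits);
  PutTag(h, "data");
  PutLE32(h, kUnknownSize);               // data size: unknown as well
  return h;                               // PCM samples follow directly
}
```

With both sizes set to the placeholder, a reader that cross-checks the RIFF size against the data size, as the wave-reader.cc error message suggests Kaldi's did, cannot make the two agree.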

Jan Trmal

Nov 11, 2015, 10:01:37 AM

to kaldi-help

You can try piping it through sox,

something like:
SAMPLE ffmpeg -y -i SAMPLE.mp4 -ar 16000 -ac 1 -f wav - | sox -t wav - -t wav - |

We have had good experience with sox "normalizing" the format.

Or something to that effect (apparently you will need a recent sox, 14.4?, for this).

The WAV file structure can be quite complex, and it's not really worth the time to implement a "full" WAV reader (if that's even possible).

y.

nise...@gmail.com

Nov 11, 2015, 11:07:56 AM

to kaldi-help, nise...@gmail.com

I did try the sox normalisation before and it didn't work. This is the output using the sox normalisation:

ERROR (extract-segments:Read():wave-reader.cc:258) WaveData: failed to read data chunk (read no bytes)

WARNING (extract-segments:Read():feat/wave-reader.h:149) Exception caught in WaveHolder object (reading).

Jan Trmal

Nov 11, 2015, 11:09:13 AM

to kaldi-help, nise...@gmail.com

Yeah, but this error looks more like the command line is not correctly formed. Can you please send what the wav.scp line looks like?

y.

nise...@gmail.com

Nov 11, 2015, 11:35:28 AM

to kaldi-help, nise...@gmail.com

Of course

SAMPLE_B01 ffmpeg -y -i db/train/SAMPLES_B01.flv -ar 16000 -ac 1 -f wav - | sox -t wav - -t wav - |

On Wednesday, November 11, 2015, 14:04:49 (UTC), nicolas wrote:

Daniel Povey

Nov 11, 2015, 1:44:28 PM

to kaldi-help, nise...@gmail.com

Try doing the same thing to a temporary file and seeing if the same error happens. It's a little confusing. Perhaps sox is not properly normalizing the format? Check that sox itself can read the file again, and use 'soxi' to print info on the file.
Dan

nise...@gmail.com

Nov 12, 2015, 4:09:11 AM

to kaldi-help, nise...@gmail.com

So, running the command on the command line:

ffmpeg -y -i SAMPLE.mp4 -ar 16000 -ac 1 -f wav - | sox -t wav - -t wav - | soxi -

Input File     : '-' (wav)
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 37:16:57.73 = 2147483647 samples ~ 1.00663e+07 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM

Here you can see that something odd is going on: ffmpeg does not write the correct duration of the sample into the header.

Using a temporary file works fine:

SAMPLE ffmpeg -y -i SAMPLE.mp4 -ar 16000 -ac 1 -f wav /tmp/sample.wav && cat /tmp/sample.wav |

However, for parallel execution you would have to prepare a different wav.scp for each job, since each job needs its own temporary file, and I don't think you can use variables (or a shell command such as $(mktemp)) inside an scp file.

On Wednesday, November 11, 2015, 14:04:49 (UTC), nicolas wrote:
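The strange duration in the soxi output follows directly from a placeholder size. The C++ sketch below (hypothetical names) takes the data-size field literally, assuming it holds 0xFFFFFFFF and the stream is 16-bit mono at 16 kHz: 4294967295 / 2 = 2147483647 samples, and 2147483647 / 16000 ≈ 134217.73 s, i.e. 37:16:57.73. Whether sox computes it in exactly this way is an assumption, but the numbers match what it printed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: what a reader concludes if it trusts the header.
struct ApparentLength {
  uint64_t samples;  // samples per channel
  double seconds;    // duration implied by the header
  int hours;         // 'seconds' split into h:m:s, for comparison with soxi
  int minutes;
  double secs;
};

// Interprets a WAV data-chunk size field literally.
ApparentLength FromDataSize(uint32_t data_size, uint32_t sample_rate,
                            uint16_t channels, uint16_t bytes_per_sample) {
  assert(sample_rate > 0 && channels > 0 && bytes_per_sample > 0);
  ApparentLength a{};
  // Integer division: an odd trailing byte does not make a sample.
  a.samples = data_size / (uint64_t{channels} * bytes_per_sample);
  a.seconds = static_cast<double>(a.samples) / sample_rate;
  a.hours = static_cast<int>(a.seconds / 3600);
  a.minutes = static_cast<int>((a.seconds - a.hours * 3600.0) / 60);
  a.secs = a.seconds - a.hours * 3600.0 - a.minutes * 60.0;
  return a;
}
```

For example, `FromDataSize(0xFFFFFFFFu, 16000, 1, 2)` gives 2147483647 samples and 37 h 16 min 57.73 s, the same values soxi reported.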

Daniel Povey

Nov 12, 2015, 1:48:18 PM

to kaldi-help, Vassil Panayotov, nise...@gmail.com

It looks like both ffmpeg and sox have decided to act in 'streaming mode' where they do not put the correct size information in the header. I think Vassil has solved similar problems in the past.

Dan

Vassil Panayotov

Nov 13, 2015, 4:13:45 AM

to kaldi...@googlegroups.com, Vassil Panayotov, nise...@gmail.com

Yes, I encountered a similar problem some time ago. The easy fix that Dan proposed back then looks something like:

ID ffmpeg -y -i ID.mp4 -ar 16000 -ac 1 -f wav /tmp/tmp.$$; cat /tmp/tmp.$$; rm /tmp/tmp.$$ |

I haven't tried the actual command above, but you get the idea. In my case I ended up not using this method (instead I just converted the input to WAVs beforehand), but using bash variables in Kaldi's "rxfilenames" should be possible (if needed), or at least it was ~3 years ago.

Vassil

Daniel Povey

Nov 13, 2015, 12:15:30 PM

to kaldi-help, Vassil Panayotov, nise...@gmail.com

BTW, if someone wants to take the trouble to extend the wav-reading code to support reading data with multiple chunks and with inaccurate length information, I would be open to merging it.

Dan
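A reader along the lines Dan describes mostly has to stop trusting sizes it cannot verify. The following is a rough C++ sketch with hypothetical names, not Kaldi's WaveData code: it ignores the RIFF size, skips chunks other than "data" (such as LIST), concatenates multiple "data" chunks, and treats a data size of 0xFFFFFFFF as "read until EOF". For brevity it assumes 16-bit little-endian PCM and does not parse "fmt "; a real reader would have to check the format and channel count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace {

constexpr uint32_t kUnknownSize = 0xFFFFFFFFu;  // "length unknown" placeholder

// Reads exactly n bytes; returns false on a short read (e.g. EOF).
bool ReadExact(std::istream& is, char* buf, std::size_t n) {
  is.read(buf, static_cast<std::streamsize>(n));
  return static_cast<std::size_t>(is.gcount()) == n;
}

// Decodes a little-endian 32-bit field.
uint32_t LE32(const char* p) {
  const auto* u = reinterpret_cast<const unsigned char*>(p);
  return uint32_t{u[0]} | uint32_t{u[1]} << 8 | uint32_t{u[2]} << 16 |
         uint32_t{u[3]} << 24;
}

// Appends little-endian 16-bit samples; a trailing odd byte is dropped.
void AppendPcm16(const std::vector<char>& bytes, std::vector<int16_t>* out) {
  for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
    const auto lo = static_cast<unsigned char>(bytes[i]);
    const auto hi = static_cast<unsigned char>(bytes[i + 1]);
    out->push_back(static_cast<int16_t>(static_cast<uint16_t>(lo | (hi << 8))));
  }
}

}  // namespace

// Hypothetical tolerant reader: returns all samples (interleaved if stereo).
std::vector<int16_t> ReadPcm16Tolerant(std::istream& is) {
  char hdr[12];
  if (!ReadExact(is, hdr, 12) || std::string(hdr, 4) != "RIFF" ||
      std::string(hdr + 8, 4) != "WAVE")
    throw std::runtime_error("not a RIFF/WAVE stream");
  // hdr[4..7] (the RIFF size) is deliberately ignored: it may be a placeholder.
  std::vector<int16_t> samples;
  char chunk[8];
  while (ReadExact(is, chunk, 8)) {
    const std::string id(chunk, 4);
    const uint32_t size = LE32(chunk + 4);
    if (id == "data" && size == kUnknownSize) {
      // Streamed output: the audio runs to the end of the input.
      std::vector<char> rest{std::istreambuf_iterator<char>(is),
                             std::istreambuf_iterator<char>()};
      AppendPcm16(rest, &samples);
      break;
    }
    if (id == "data") {
      std::vector<char> bytes(size);
      is.read(bytes.data(), static_cast<std::streamsize>(size));
      bytes.resize(static_cast<std::size_t>(is.gcount()));  // tolerate a short chunk
      AppendPcm16(bytes, &samples);
    } else {
      is.ignore(static_cast<std::streamsize>(size));  // skip "fmt ", LIST, ...
    }
    if (size % 2 == 1) is.ignore(1);  // RIFF chunks are padded to an even length
  }
  return samples;
}

// Convenience overload for in-memory data.
std::vector<int16_t> ReadPcm16Tolerant(const std::string& bytes) {
  std::istringstream is(bytes);
  return ReadPcm16Tolerant(is);
}
```

When reading from a file rather than a pipe, the stream must be opened in binary mode (`std::ios::binary`) for the byte-level parsing to be correct on all platforms.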

giuli...@gmail.com

Apr 11, 2016, 8:11:34 PM

to kaldi-help, vassil.p...@gmail.com, nise...@gmail.com, dpo...@gmail.com

Just to let you know: a few days ago I opened an issue (https://trac.ffmpeg.org/ticket/5409) with ffmpeg about the NIST SPHERE + shorten format. The ffmpeg developers reacted quickly, and now I am able to correctly decode all the SPHERE files that I have access to.
The only unsupported files seem to be NIST SPHERE files containing shortpack or wavpack streams.

So I think it would be great to have complete ffmpeg support in Kaldi, as it is now a viable free alternative to sph2pipe.

Bests,
Giulio

Daniel Povey

Apr 11, 2016, 9:49:31 PM

to Giulio Paci, Kirill Katsnelson, kaldi-help, Vassil Panayotov, nise...@gmail.com

We had started work on modifying the wav-reading code to support reading multiple chunks, which I think was the pain point with ffmpeg, but Kirill (cc'd) is supposed to be testing it and this is taking a while.

Dan

Kirill Katsnelson

Apr 22, 2016, 1:16:04 PM

to dpo...@gmail.com, Giulio Paci, Kirill Katsnelson, kaldi-help, Vassil Panayotov, nise...@gmail.com

I am really sorry about sitting on this for such a long time. I think I should just release that patch with a couple of "golden" wave files that it should read. Holding off because there are no more thorough tests just does not work, time-wise.

Daniel Povey

Apr 22, 2016, 1:25:01 PM

to Kirill Katsnelson, Giulio Paci, Kirill Katsnelson, kaldi-help, Vassil Panayotov, nise...@gmail.com

OK but if so, please make sure the file size is *extremely* small. We don't want to load up the git repo with tons of binary crap.

Dan

Kirill Katsnelson

Apr 22, 2016, 1:39:00 PM

to dpo...@gmail.com, Giulio Paci, Kirill Katsnelson, kaldi-help, Vassil Panayotov, nise...@gmail.com

I was thinking of synthesizing a 2-3 sample long file in a binary editor, so the output can be verified in test code. How else would I confirm the test? I believe that should fit the definition of extremely small.

I still agree this is "binary crap", and so would be files for testing e.g. little/big endian etc. I want to frame a better testing facility, but that applies to many areas in Kaldi, and there is only one of me.

-kkm

Daniel Povey

Apr 22, 2016, 1:39:44 PM

to Kirill Katsnelson, Giulio Paci, Kirill Katsnelson, kaldi-help, Vassil Panayotov, nise...@gmail.com

Sounds fine.

peter.b...@playfultechnology.co.uk

Sep 1, 2017, 9:34:24 AM

to kaldi-help, k...@smartaction.com, giuli...@gmail.com, kirill.k...@smartaction.com, vassil.p...@gmail.com, nise...@gmail.com, dpo...@gmail.com

I think that Kaldi needs a StreamReader class, that can be used instead of the Wave* classes when the input is a stream of raw PCM audio. Such a class would have the audio format parameters set at initialization, expect no RIFF header, and read from the stream until EOF was encountered.

Daniel Povey

Sep 1, 2017, 2:45:58 PM

to peter.b...@playfultechnology.co.uk, kaldi-help, Kirill Katsnelson, Giulio Paci, Kirill Katsnelson, Vassil Panayotov, nise...@gmail.com

I can see that that would be useful.
The main piece of code that that would directly interact with is class
OnlineFeaturePipeline (and wrappers thereof), which has the following
interfaces that are relevant:

  void AcceptWaveform(BaseFloat sampling_rate,
                      const VectorBase<BaseFloat> &waveform);
  // InputFinished() tells the class you won't be providing any
  // more waveform.
  void InputFinished();

Probably a suitable interface would be as follows:
struct PcmStreamReaderOptions {
  // format options here.
  // Also the chunk size (number of seconds of data to read
  // per chunk) could be set here.
};

class PcmStreamReader {
 public:
  // Constructor. Does not block.
  PcmStreamReader(const PcmStreamReaderOptions &opts,
                  std::istream &is);

  // Returns the sampling rate in Hz, as specified in 'opts'.
  BaseFloat SamplingRate() const;

  // Attempts to read the next chunk of data from the stream.
  // May block while reading it. The chunk of data will represent
  // a signal of duration not exceeding the chunk size specified
  // in the options. It may only be less than the specified chunk
  // size if we have reached the end of the input.
  // If after reading 'chunk' we have reached the end of the input
  // (EOF), 'finished' will be set to true; otherwise it will be set to false.
  void ReadNextChunk(Vector<BaseFloat> *chunk,
                     bool *finished);
 private:
  ...

};

But in general I don't like to check in code that's not used anywhere in Kaldi.
It's not clear to me where we could actually use this. It would be possible
to write a simple decoder that read from just one pcm stream and
decoded online, but that would be very wasteful if you had to run the
program again each time. Writing a server that accepts some kind of
protocol request seems to start to get beyond the core scope of Kaldi,
and I'm concerned about how we would maintain such a thing.

Dan
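Dan's interface sketch could be filled in roughly as below. This is a standalone C++ sketch, not Kaldi code: `std::vector<float>` stands in for Kaldi's `Vector<BaseFloat>`, the option fields `sampling_rate` and `chunk_seconds` are hypothetical, and the format is assumed fixed to 16-bit signed little-endian mono, which is what ffmpeg emits with `-f s16le -ac 1`. Samples are kept in the int16 range without scaling.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for feeding in-memory test data
#include <vector>

using BaseFloat = float;  // stand-in for Kaldi's BaseFloat typedef

struct PcmStreamReaderOptions {
  BaseFloat sampling_rate = 16000.0f;  // Hz; raw PCM has no header to read it from
  BaseFloat chunk_seconds = 0.1f;      // maximum duration returned per chunk
  // Hypothetical fixed format: 16-bit signed little-endian, mono.
};

class PcmStreamReader {
 public:
  // Constructor. Does not block.
  PcmStreamReader(const PcmStreamReaderOptions& opts, std::istream& is)
      : opts_(opts), is_(is) {}

  BaseFloat SamplingRate() const { return opts_.sampling_rate; }

  // Reads up to chunk_seconds of audio; may block. 'finished' is set to
  // true once the end of the input has been reached.
  void ReadNextChunk(std::vector<BaseFloat>* chunk, bool* finished) {
    const std::size_t max_samples = static_cast<std::size_t>(std::max(
        1L, std::lround(opts_.sampling_rate * opts_.chunk_seconds)));
    std::vector<char> buf(2 * max_samples);
    is_.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    const std::size_t got = static_cast<std::size_t>(is_.gcount()) / 2;
    chunk->resize(got);
    for (std::size_t i = 0; i < got; ++i) {
      const auto lo = static_cast<unsigned char>(buf[2 * i]);
      const auto hi = static_cast<unsigned char>(buf[2 * i + 1]);
      (*chunk)[i] = static_cast<BaseFloat>(
          static_cast<int16_t>(static_cast<uint16_t>(lo | (hi << 8))));
    }
    // istream::read only comes back short at EOF; after a full read, peek()
    // checks for EOF without consuming data (it may block on a pipe).
    *finished = got < max_samples ||
                is_.peek() == std::istream::traits_type::eof();
  }

 private:
  PcmStreamReaderOptions opts_;
  std::istream& is_;
};
```

A hypothetical consumer would copy each chunk into a Kaldi vector, pass it to `AcceptWaveform(reader.SamplingRate(), ...)`, and call `InputFinished()` once `finished` is true, with the input coming from something like `ffmpeg -i input.mp3 -ar 16000 -ac 1 -f s16le - | ./your-decoder` (the program name is a placeholder).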

peter.b...@playfultechnology.co.uk

Sep 1, 2017, 8:06:35 PM

to kaldi-help, peter.b...@playfultechnology.co.uk, k...@smartaction.com, giuli...@gmail.com, kirill.k...@smartaction.com, vassil.p...@gmail.com, nise...@gmail.com, dpo...@gmail.com

Here's some background. As mentioned before, I'm trying to transcribe a podcast. To that end, I decode the audio with ffmpeg, pipe the PCM to a speech recognition component, and then pipe the transcribed text to a script that uploads it to a Wiki.

My speech recognition component was originally built with CMU Sphinx 4. It wasn't difficult to set that up to read raw PCM from stdin and write text to stdout, but the accuracy was terrible, so I'm trying to substitute Kaldi. Decoding to a file and then transcribing that shows that Kaldi's accuracy is likely to be good enough for manual editing, which is my usability threshold. However, Kaldi's IO assumes the presence of a RIFF header, which isn't present when ffmpeg decodes to a raw PCM stream.
So I'll create a stream reader class, and an example program that uses it, and then post a pull request so that you can include it in Kaldi's codebase going forward if you like it.

Ewald

Sep 1, 2017, 8:17:47 PM

to kaldi...@googlegroups.com

Jan Trmal

Sep 1, 2017, 8:57:18 PM

to kaldi-help, Kirill Katsnelson, nise...@gmail.com, giuli...@gmail.com, dpo...@gmail.com, peter.b...@playfultechnology.co.uk, vassil.p...@gmail.com, kirill.k...@smartaction.com

And why don't you just set the ffmpeg output format to wav?

Y.

Kirill Katsnelson

Sep 1, 2017, 9:13:27 PM

to peter.b...@playfultechnology.co.uk, Kirill Katsnelson, nise...@gmail.com, giuli...@gmail.com, dpo...@gmail.com, kaldi-help, vassil.p...@gmail.com

I thought ffmpeg could stream WAV to stdout with "-f wav". I vaguely remember adding support for streamed WAV files (they have a special value in the length field). Make sure to use 16-bit signed integer PCM ("-acodec pcm_s16le"). Did you try that, and if so, what exactly did not work?

-kkm

peter.b...@playfultechnology.co.uk

Sep 2, 2017, 2:18:48 PM

to kaldi-help, peter.b...@playfultechnology.co.uk, k...@smartaction.com, nise...@gmail.com, giuli...@gmail.com, dpo...@gmail.com, vassil.p...@gmail.com

OK, that works, but I'm now getting an unrelated error, which I'll post in another thread.
