--snip-edges=false policy

Daniel Povey

Mar 17, 2016, 6:39:25 PM

to kaldi-developers

Everyone,

Whenever we start a new recipe (e.g. using new data, or a new version of an existing recipe, like s5c->s5d), let's make it a practice to add the config variable

--snip-edges=false

in all feature-extraction config files in conf/, such as mfcc.conf and mfcc_hires.conf.

This will ensure that the number of frames is related to the length of the file in a consistent and obvious way.

The original Kaldi feature-extraction code aimed to be compatible with HTK, which truncates two or three frames so that each frame fits entirely in the file; but this is a hassle and has led to all kinds of inconsistency regarding the meaning of 'segments' files. It's been hard to switch away from that, because if you change the config in any existing recipe, the alignments would no longer match the existing extracted features if people re-ran the MFCC feature extraction.
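To make the "two or three frames" concrete, here is a minimal sketch of the frame-count arithmetic as I understand it from Kaldi's feature-window code. The function name, the 16 kHz sampling rate, the 25 ms window and the 10 ms shift are assumptions chosen for illustration; they are not stated in this thread.

```python
def num_frames(num_samples, frame_length, frame_shift, snip_edges):
    """Frame count for a waveform; all arguments are in samples."""
    if snip_edges:
        # HTK-style: every window must fit entirely inside the file.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # Otherwise the count is num_samples / frame_shift, rounded to nearest.
    return (num_samples + frame_shift // 2) // frame_shift


sample_rate = 16000                      # assumed sampling rate (Hz)
frame_length = 25 * sample_rate // 1000  # assumed 25 ms window = 400 samples
frame_shift = 10 * sample_rate // 1000   # assumed 10 ms shift  = 160 samples

# One second of audio under each setting.
print(num_frames(sample_rate, frame_length, frame_shift, snip_edges=True))   # 98
print(num_frames(sample_rate, frame_length, frame_shift, snip_edges=False))  # 100
```

Under these assumed settings, one second of audio gives 98 frames with snipping and 100 without, so only the second count follows directly from the file length.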

I think the only way to make the change is to use --snip-edges=false whenever we write recipes from scratch.

Dan

Daniel Povey

Apr 1, 2016, 3:00:00 PM

to kaldi-developers

Everyone,
For now this recommendation to add '--snip-edges=false' to new recipes is on hold; just leave it as it was. A user (Nicolas Serrano) found a problem regarding the interaction between this option and online decoding that is very hard to fix, i.e. it would require a lot of code changes. Since the problem of the num-frames being a little less than you'd expect from the length of the wave file was not that severe in the first place, I'm inclined to just leave things as they were.

Dan

Tony Robinson

Apr 2, 2016, 1:38:35 PM

to kaldi-de...@googlegroups.com

I'd like to restate my position from a couple of years ago in favour of --snip-edges=false (I'll provide a ref to this mailing list if anyone needs it).

In the '90s we tackled all of these issues. Assuming that the frame rate is 10ms, it's so much more intuitive that the first frame is from 0ms to 10ms and the next one from 10ms to 20ms than any other frame shift that depends on models, decoder and everything else. Long term I believe it helps development, as well as providing some sort of recognition of the frames at the very start and end of files.
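As a rough illustration of the timing point, the sketch below compares where the analysis window of frame i is centred under the two conventions. It assumes a 10 ms shift and a 25 ms window, and it assumes (as I understand Kaldi's behaviour) that snipped windows start at i * shift while unsnipped windows are centred on the interval from i * shift to (i + 1) * shift. The function name is hypothetical.

```python
def window_center_ms(frame_index, snip_edges, frame_shift_ms=10.0, frame_length_ms=25.0):
    """Centre of the analysis window for a frame, in ms from the start of the file."""
    if snip_edges:
        # Window starts at i * shift, so its centre is half a window later.
        return frame_index * frame_shift_ms + frame_length_ms / 2
    # Window is centred on the middle of [i * shift, (i + 1) * shift).
    return frame_index * frame_shift_ms + frame_shift_ms / 2


for i in range(3):
    print(i, window_center_ms(i, snip_edges=True), window_center_ms(i, snip_edges=False))
```

With these assumed values, frame 0 is centred at 12.5 ms when edges are snipped and at 5 ms otherwise, so only in the second case does frame i sit in the middle of the 10 ms interval it is naturally labelled with.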

The problem exists at two levels. At the signal processing level I reversed the speech so that there was something to fill up the Hamming window. At the NN input level you can assume that frames at t<0 are the same as t = 0, i.e. just replicate the first and last acoustic features. At the frame level we are only losing one or two frames and the main argument here is that it's easier to keep track of recognition times if --snip-edges=false. The acoustic feature level is much more serious as it depends on the model as well - I assume that we are not talking about that as num-frames is only "a little less than you'd expect".
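The two kinds of edge handling described here can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the mirroring rule is one common way to "reverse the speech" at the boundaries, and the context sizes are arbitrary.

```python
def reflect_index(i, n):
    """Map a sample index outside [0, n) back inside by mirroring at the edges (n > 0)."""
    while i < 0 or i >= n:
        if i < 0:
            i = -i - 1
        else:
            i = 2 * n - 1 - i
    return i


def extract_window(samples, start, length):
    """Signal level: fill a window that overhangs the file with reflected samples."""
    n = len(samples)
    return [samples[reflect_index(start + k, n)] for k in range(length)]


def splice(features, t, left, right):
    """NN-input level: context around frame t, repeating the first/last frame at the edges."""
    last = len(features) - 1
    return [features[min(max(t + d, 0), last)] for d in range(-left, right + 1)]


print(extract_window([1, 2, 3, 4], -2, 4))   # window overhanging the start of the file
print(splice(["f0", "f1", "f2"], 0, 2, 1))   # context for the very first frame
```

The first call returns [2, 1, 1, 2], where the two leading values are mirrored copies of the first two samples; the second returns ['f0', 'f0', 'f0', 'f1'], where the missing left context is filled with copies of the first frame.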

I lost my argument a couple of years ago and Kaldi has excelled since then so it's clearly not the biggest issue.

Tony

To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-developers/CAEWAuySkQjgd6N_TdSp3QfUH9h2VUGwKYG0%2Bzn0oSOzcwVC7bA%40mail.gmail.com.

--
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK

Daniel Povey

Apr 2, 2016, 1:40:06 PM

to kaldi-developers

I'll try to think about how the online-decoding code could be modified to support --snip-edges=false.

Dan

To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-developers/57000391.3000306%40speechmatics.com.

Matthew Aylett

Apr 2, 2016, 5:13:23 PM

to kaldi-de...@googlegroups.com

Hi

For speech synthesis (I know - not a big concern for Kaldi at the moment), being able to easily align features with original speech (or synthesised speech) is fundamental for us poor TTS people who have to listen to output speech and wonder why it sounds bad.

So I am also very much in favour of --snip-edges=false as a working option.

best

Matthew
