nnet2 SMBR step makes the decoder generate long silences


nise...@gmail.com

Jan 22, 2016, 11:56:44 AM
to kaldi-help
Hi all,

I am performing a standard nnet2 system training on a particularly noisy database. With nnet1, the final system (with SMBR) gets around 34% WER, while the nnet2 final system gets 40%. I realised that prior to SMBR both systems were around 37%, so in nnet2, SMBR degraded the results.

The reason is that in test samples decoded with nnet2 after SMBR, some (less than 10% of the test) contain long silences and only a few words are decoded, so a lot of deletions occur. In those same samples, before SMBR the expected number of words was correctly decoded. For instance, a sample containing 20 words gets all 20 recognised before SMBR, but only 4 after SMBR is performed.

In an old post (http://sourceforge.net/p/kaldi/discussion/1355348/thread/e25f4d44/?limit=25), someone had a similar problem and it seemed to be related to the iVector estimation, when there is a big difference between the amount of silence in train and test. I tried the proposed solution (setting --max-count to less than 100); now the long silences are gone, but the results on the samples where this did not occur are worse.

Has any of you experienced a similar problem? Could this also be happening in the lattice generation during SMBR training?

Thanks in advance,

N

Daniel Povey

Jan 22, 2016, 1:43:35 PM
to kaldi-help
I think the problem is likely either with the iVectors or with the SMBR, as separate issues.  Try the silence-weighting option (down-weighting silence counts in iVector estimation, e.g. to 0.0001) to fix the iVector issue.
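As a concrete sketch of the silence-weighting suggestion (the option name and the directory layout below are assumptions based on the standard online-nnet2 scripts; check steps/online/nnet2/decode.sh in your Kaldi checkout for the exact flag and defaults):

```shell
# Down-weight silence frames when accumulating iVector stats at decode time.
# --silence-weight and all paths here are illustrative placeholders; verify
# the option name and arguments in steps/online/nnet2/decode.sh.
steps/online/nnet2/decode.sh --cmd "$decode_cmd" --nj 8 \
  --silence-weight 0.0001 \
  exp/tri3/graph data/test exp/nnet2_online/nnet_ms_sp_online/decode_test
```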

Are you sure that in the nnet2 recipe you actually ran SMBR and not BMMI?  I think the default example scripts run BMMI; it's an option to the discriminative training script.  They behave differently.  Possibly there was divergence; check the objective functions.  But discriminative training is always a little sensitive.
Dan




mne...@semanticmachines.com

Jan 22, 2016, 2:25:10 PM
to kaldi-help, dpo...@gmail.com
I have a similar question. I was trying models built from Fisher and (downsampled) Librispeech, and the GMM portion looked fine, but the nnet2 models were a disaster. The error rate was around 90%, and it appeared that the main reason was a deletion rate of around 80%. (The recognition output appeared to be regions of moderately accurate text surrounded by vast regions of emptiness).

Because the GMM model gave about the same performance as the pure Fisher GMM, I was wondering if this was an iVector issue. But are the parameter changes you are suggesting below for run-time or for training-time? I will try them anyway to see if they make a difference.

And more generally, are there any other gotchas that might have caused this problem during training? (FYI I am only using the 460 hour clean portion of Librispeech, so far)

Daniel Povey

Jan 22, 2016, 2:29:30 PM
to mne...@semanticmachines.com, kaldi-help
> I have a similar question. I was trying models built from Fisher and (downsampled) Librispeech, and the GMM portion looked fine, but the nnet2 models were a disaster. The error rate was around 90%, and it appeared that the main reason was a deletion rate of around 80%. (The recognition output appeared to be regions of moderately accurate text surrounded by vast regions of emptiness).

That's a higher error rate than I would expect even if you had a data mismatch.  Did you try it on Fisher or Librispeech data?  One possible cause is that you rebuilt the iVector extractor at some point and re-extracted iVectors for either training or test but not both-- check the file timestamps. When you rebuild the extractor, you get a different, incompatible iVector space.


> Because the GMM model gave about the same performance as the pure Fisher GMM, I was wondering if this was an iVector issue. But are the parameter changes you are suggesting below for run-time or for training-time? I will try them anyway to see if they make a difference.

Run-time is the most important here.  It's not clear whether that change (re. downweighting silence) is helpful in training.

Dan

Vimal Manohar

Jan 22, 2016, 4:59:30 PM
to kaldi...@googlegroups.com, mne...@semanticmachines.com
There's a parameter --adjust-priors in steps/nnet2/train_discriminative.sh that you can set to true. This is used by default in nnet1. This might give a little improvement wrt deletions, if any.
You can first check if the iVectors are an issue by training a system without iVectors.
iVectors need to be estimated only on the speech regions of the utterances. There is a two-pass decoding approach used in the script egs/aspire/s5/local/multi_condition/prep_test_aspire.sh to downweight silence parts. You can try to look into that.
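For reference, a hedged sketch of turning on prior adjustment (as noted later in the thread, the option exists in the newer train_discriminative2.sh, which takes a degs directory and an output directory; all directory names here are placeholders for illustration):

```shell
# Sketch: enable prior adjustment in nnet2 discriminative (sMBR) training.
# Uses the <degs-dir> <exp-dir> form of steps/nnet2/train_discriminative2.sh;
# check the script's usage message for your Kaldi version before copying.
steps/nnet2/train_discriminative2.sh --cmd "$train_cmd" \
  --criterion smbr --adjust-priors true \
  exp/nnet2_online/nnet_ms_sp_degs exp/nnet2_smbr_adjusted
```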

Vimal
 
--
Vimal Manohar
PhD Student
Electrical & Computer Engineering
Johns Hopkins University

nise...@gmail.com

Jan 25, 2016, 12:35:18 PM
to kaldi-help, dpo...@gmail.com
I just tried the silence-weighting option as you suggested and it did help the results, but the improvement from SMBR in nnet2 is still far from that in nnet1. I still have to tune this parameter better, though.

As has been suggested, do you think this option may also help if applied during the CE training, or in the denominator-lattice generation for SMBR training? The training data I am using is quite noisy and contains long silences.

Also, could you tell me where I could look (apart from reading the source code) to better understand the interaction between silences and iVectors?

Daniel Povey

Jan 25, 2016, 3:57:14 PM
to nise...@gmail.com, kaldi-help
Vimal, the --adjust-priors option seems to only exist in the newer version of the script, train_discriminative2.sh.  Using that might be helpful.
Since the deletions only appear after sMBR training, it's unlikely to be related to the iVectors.  (However, make sure that you are not making a silly mistake like forgetting to supply the iVector-related options when you do discriminative training.)
It would be useful to know what command line you used for your nnet2 training and for your discriminative training.
Dan


nise...@gmail.com

Jan 26, 2016, 6:09:16 AM
to kaldi-help, nise...@gmail.com, dpo...@gmail.com
The recipe I am using is https://github.com/kaldi-asr/kaldi/blob/master/egs/tedlium/s5/local/online/run_nnet2_ms_disc.sh


and the command is:

steps/nnet2/train_discriminative2.sh --cmd "$train_cmd" --stage -10 \
  --effective-lrate 0.000005 --criterion smbr --drop-frames false \
  --num-epochs 4 --parallel-opts "" --num-jobs-nnet 6 --num-threads 1 \
  --remove-egs true \
  exp/nnet2_online/nnet_ms_sp_degs exp/nnet2_online/nnet_ms_sp_smbr_0.000005

I will try out the option you suggested. Also, I double-checked the iVector options and everything is fine; thanks for the advice though, it's easy to mess up.

nise...@gmail.com

Jan 26, 2016, 12:35:49 PM
to kaldi-help, nise...@gmail.com, dpo...@gmail.com
I have tested a few things. 

1. Using --adjust-priors in SMBR training makes no improvement.
2. Using --sil-weight=0.01 in decoding helps, but it is still far from the nnet1 SMBR improvement.
3. I tested an option I found in train_discriminative2.sh: setting one_silence_class=false in SMBR training helps as much as option 2.
4. Using --sil-weight=0.01 in decoding on top of 3 degrades the results.

It seems to be a silence-related issue with SMBR; why is silence so tricky in this step?

Daniel Povey

Jan 26, 2016, 2:02:23 PM
to nise...@gmail.com, kaldi-help
Hm.  Both sMBR and iVectors can sometimes cause weirdness with silence and noise.  It could be a combination of the two, but this will tend to be hard to debug.  Try decoding intermediate epochs (e.g. decode epoch1.mdl and so on).
With AsPIRE we found that sMBR helped on one test set but decreased performance on another.  Ultimately this is going to be hard to debug.  I am trying to find alternatives to iVectors for low-cost adaptation, as they don't seem to be very robust to unseen noise etc.
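Decoding the intermediate epochs could look something like the sketch below (assumptions: train_discriminative2.sh left epoch1.mdl, epoch2.mdl, ... in the output directory, and the --iter option of steps/nnet2/decode.sh selects which model file to use; all paths are placeholders):

```shell
# Decode each intermediate epoch model to see where the degradation starts.
# --iter and the directory names are illustrative; verify against the
# decode script options in your Kaldi checkout.
dir=exp/nnet2_online/nnet_ms_sp_smbr_0.000005
for epoch in 1 2 3 4; do
  steps/nnet2/decode.sh --cmd "$decode_cmd" --nj 8 --iter epoch$epoch \
    --online-ivector-dir exp/nnet2_online/ivectors_test \
    exp/tri3/graph data/test $dir/decode_test_epoch$epoch
done
```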

Make sure that you didn't overwrite the iVector-extractor directory and then fail to regenerate the iVectors and the models.  However, that should make the results very bad; I doubt this is your problem.

Dan

nise...@gmail.com

Jan 29, 2016, 12:50:05 PM
to kaldi-help, nise...@gmail.com, dpo...@gmail.com
I just revised my tests and found an error in my previous report. The options that helped the most were --adjust-priors in SMBR training and --sil-weight in decoding. These two together give nnet2 SMBR around half of the improvement that SMBR gives with nnet1.

I also tried changing --online to false while decoding, and it substantially degraded the results, so I think the problem is mostly caused by a test set with long chunks of silence and noise. I should choose a more representative test set, or segment my current one.

I will report back if I get any further improvements on this.

Thanks for your assistance!

Daniel Povey

Jan 29, 2016, 2:35:52 PM
to nise...@gmail.com, kaldi-help
OK.  SMBR helps less in nnet2 even when it's working.  I think the reason is that nnet1 uses the newbob learning-rate schedule (which is similar to early stopping, but implemented via decreasing the learning rate), which gives you a less aggressively trained model to start discriminative training from.  At some point we are going to implement newbob in nnet3 and maybe nnet2.
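For readers unfamiliar with newbob, a minimal sketch of the idea in Python (the thresholds and initial rate here are illustrative, not the nnet1 defaults): hold the learning rate fixed while the validation loss is still improving by more than some relative threshold, then halve it every epoch once improvement stalls.

```python
def newbob_schedule(val_losses, lr0=0.008, ramp_threshold=0.005,
                    halving_factor=0.5):
    """Given per-epoch validation losses, return the learning rate used
    for each epoch under a newbob-style schedule: keep the initial rate
    until the relative improvement drops below ramp_threshold, then
    halve the rate every epoch thereafter."""
    lrs = [lr0]          # learning rate for epoch 0
    lr = lr0
    halving = False
    for prev, cur in zip(val_losses, val_losses[1:]):
        rel_improvement = (prev - cur) / abs(prev)
        # once halving starts, it continues for all remaining epochs
        if halving or rel_improvement < ramp_threshold:
            halving = True
            lr *= halving_factor
        lrs.append(lr)
    return lrs
```

The effect is that the model near convergence is trained with a much smaller step size, leaving it less aggressively fit before discriminative training begins.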

Dan
