Hi all,
I am training a standard nnet2 system on a particularly noisy database. With nnet1, the final system (after sMBR) gets around 34% WER, while the final nnet2 system gets 40%. I realised that before sMBR both systems were at around 37%, so in nnet2 the sMBR stage actually degraded the results.
The reason is that in test samples decoded with nnet2 after sMBR, some (less than 10% of the test set) contain long silences and only a few words are decoded, so a lot of deletions occur. In those same samples, before sMBR the expected number of words was correctly decoded. For instance, a sample containing 20 words gets all 20 words recognised before sMBR, but only 4 after sMBR.
In an old thread (http://sourceforge.net/p/kaldi/discussion/1355348/thread/e25f4d44/?limit=25), someone had a similar problem, and it seemed to be related to the iVector estimation when there is a big difference between the amount of silence in the training and test data. I tried the proposed solution (setting --max-count to less than 100): the long silences are now gone, but the results on the samples that did not have this problem got worse.
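For reference, this is the kind of change I tried, sketched for a standard online-nnet2 setup where the iVector extractor options live in a conf file (the exact path and the value of 50 are just my assumptions here, not from the thread):

```
# conf/ivector_extractor.conf (path assumed; adjust to your setup)
# Cap the count of stats used per utterance so the estimated iVector
# stays closer to the prior; this should make it less sensitive to
# utterances with much more (or less) silence than the training data.
--max-count=50
```

After changing this I re-ran the iVector extraction and decoding from scratch, since the option affects how iVectors are estimated, not just decoding.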
Has any of you had experience with a similar problem? Could this also be happening in the lattice generation during sMBR training?
Thanks in advance,
N