Weird performance degradation when resampling with sox instead of kaldi


Guanbo Wang

Dec 9, 2021, 9:27:23 PM
to kaldi-help
Hi all,

I trained a 16 kHz ASR model and want to test it on some commonly used test sets, such as eval2000 and rt03, which are sampled at 8 kHz.
So here is my `conf/mfcc_hires.conf`:

```
--use-energy=false
--num-mel-bins=40  
--num-ceps=40  
--low-freq=20    
--high-freq=-400
--allow-upsample=true
```

and a line in `wav.scp`:

```
en_4156-A sph2pipe -f wav -p -c 1 /path/to/LDC/LDC2002S09/hub5e_00/english/en_4156.sph |
```

With this setup, Kaldi does the resampling while extracting MFCCs. The WER is 28.17.


But Kaldi's resampling is not straightforward to use elsewhere, so I modified the `wav.scp` line to:

```
en_4156-A sph2pipe -f wav -p -c 1 /path/to/LDC/LDC2002S09/hub5e_00/english/en_4156.sph | sox - -t wav -r 16000 - |
```

or explicitly

```
en_4156-A sph2pipe -f wav -p -c 1 /path/to/LDC/LDC2002S09/hub5e_00/english/en_4156.sph | sox -t wav -e signed-integer -r 8k -b 16 -c 1  - -t wav -e signed-integer -r 16k -b 16 -c 1 -G - | 
```

With this setup, sox does the resampling and Kaldi extracts features directly from the 16 kHz audio. I expected the two approaches to be equivalent.
However, this time the WER is 35.13, much worse than with Kaldi's resampling.

These results are on eval2000, and I believe they are easy to reproduce with any 16 kHz model.

Any insight? I would much appreciate any help.
Thanks in advance.

Best,
Guanbo

Daniel Povey

Dec 9, 2021, 11:31:09 PM
to kaldi-help
I'm not sure; it might have something to do with signal energy levels. Perhaps sox is renormalizing somehow, and that has a different effect from what Kaldi does?
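
One way to test the energy-level idea is to write the output of each pipeline to a file and compare the overall levels. The following is only a minimal sketch, assuming 16-bit PCM WAV files on disk; the function name `rms_dbfs` and the file names in the usage comment are hypothetical and not part of Kaldi or sox.

```python
import array
import math
import sys
import wave


def rms_dbfs(path):
    """Return the RMS level of a 16-bit PCM WAV file in dB relative to full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 32768 is full scale for signed 16-bit samples.
    return 10.0 * math.log10(mean_square / 32768.0 ** 2)


if __name__ == "__main__":
    # Hypothetical usage: python rms_level.py original_8k.wav sox_16k.wav
    for path in sys.argv[1:]:
        print(f"{path}: {rms_dbfs(path):.2f} dBFS")
```

If the level of the sox output differs clearly from that of the original 8 kHz audio, that would point towards a gain change somewhere in the pipeline; if the levels match, the cause is more likely elsewhere.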



Jan Yenda Trmal

Dec 10, 2021, 9:45:33 AM
to kaldi-help
Some resampler filters can affect the phase of the signal (or add weirdness at higher frequencies).
Sox has a "rate" effect that could give better results, but I don't know if it will really help.
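
To look for high-frequency differences between two resamplers, one option is to measure how much energy a frame has above the original Nyquist frequency (4 kHz for 8 kHz audio). The following is only a rough sketch using a naive DFT from the standard library; `band_energy_fraction` is a hypothetical helper, and for real use an FFT library would be much faster.

```python
import cmath
import math


def band_energy_fraction(frame, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz, via a naive O(n^2) DFT."""
    n = len(frame)
    total = 0.0
    high = 0.0
    # Skip the DC bin (k = 0) and go up to the Nyquist bin.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    # Synthetic 512-sample frame at 16 kHz: a 1 kHz tone plus a weaker 6 kHz tone.
    rate, n = 16000, 512
    frame = [
        math.sin(2 * math.pi * 1000 * i / rate)
        + 0.1 * math.sin(2 * math.pi * 6000 * i / rate)
        for i in range(n)
    ]
    print(band_energy_fraction(frame, rate, 4000.0))
```

Speech upsampled from 8 kHz has its original content below 4 kHz, so a clearly different fraction above 4 kHz between the Kaldi and sox versions of the same frame would suggest that the two resamplers treat that band differently.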

Vijayaditya Peddinti

Dec 10, 2021, 10:56:27 AM
to kaldi...@googlegroups.com
I have seen similar behavior when energy levels mismatch significantly between training and test data. The energy levels can “vary” when the modules do not convert formats consistently. Given that the WAV format flags are being modified, I would recommend extracting the audio and examining it to confirm that everything is as expected numerically.

I spent a few days on similar issues when processing one of the aspire test sets in kaldi.
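
A check of this kind could look roughly like the sketch below, which reads a WAV file that has been written to disk and reports its header fields together with the peak sample value. It assumes PCM WAV input; `wav_summary` and the file names in the usage comment are hypothetical, not an existing Kaldi or sox tool.

```python
import array
import sys
import wave


def wav_summary(path):
    """Collect header fields and simple sample statistics for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }
        raw = wav.readframes(info["frames"])
    info["duration_s"] = info["frames"] / info["sample_rate"]
    # Sample statistics are only computed for 16-bit data in this sketch.
    if info["sample_width_bytes"] == 2:
        samples = array.array("h", raw)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        info["peak"] = max((abs(s) for s in samples), default=0)
        info["clipped"] = sum(1 for s in samples if s in (-32768, 32767))
    return info


if __name__ == "__main__":
    # Hypothetical usage: python wav_summary.py original_8k.wav sox_16k.wav
    for path in sys.argv[1:]:
        print(path, wav_summary(path))
```

For the same utterance, the two files should report the same channel count, sample width, and duration, with only the sample rate and frame count differing; an unexpected peak value or a nonzero clipped count would be worth a closer look.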


Vijay
