I trained a 16kHz ASR model and also want to test its performance on some commonly used test sets, like eval2000 and rt03, whose sampling rate is 8kHz.
So here is my `conf/mfcc_hires.conf`:
```
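# High-resolution MFCC config. --high-freq=-400 means 400 Hz below Nyquist;
# --allow-upsample lets Kaldi upsample audio whose rate is below the
# configured --sample-frequency (default 16 kHz) during feature extraction.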
--use-energy=false
--num-mel-bins=40
--num-ceps=40
--low-freq=20
--high-freq=-400
--allow-upsample=true
```
and a line in `wav.scp`:
```
en_4156-A sph2pipe -f wav -p -c 1 /path/to/LDC/LDC2002S09/hub5e_00/english/en_4156.sph |
```
In this way, resampling is done by Kaldi while extracting the MFCCs. The WER is 28.17.
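For reference, the features are extracted with the standard hires MFCC pass; a minimal sketch (the data/log/feature directory names and job count here are illustrative, not the exact ones I used):
```
steps/make_mfcc.sh --nj 20 --mfcc-config conf/mfcc_hires.conf \
  data/eval2000_hires exp/make_hires/eval2000 mfcc
steps/compute_cmvn_stats.sh data/eval2000_hires exp/make_hires/eval2000 mfcc
```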
But Kaldi's resampling is not straightforward to use elsewhere, so I modified the `wav.scp` to:
```
en_4156-A sph2pipe -f wav -p -c 1 /path/to/LDC/LDC2002S09/hub5e_00/english/en_4156.sph | sox - -t wav -r 16000 - |
```
or, more explicitly,
```
en_4156-A sph2pipe -f wav -p -c 1 /path/to/LDC/LDC2002S09/hub5e_00/english/en_4156.sph | sox -t wav -e signed-integer -r 8k -b 16 -c 1 - -t wav -e signed-integer -r 16k -b 16 -c 1 -G - |
```
In this way, resampling is done by sox, and Kaldi extracts the features directly. I believe there should be no difference between these two approaches.
However, this time the WER is 35.13, much worse than with Kaldi's resampling.
These results are on eval2000, and I believe they are quite simple to reproduce with any 16kHz model.
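One way to narrow this down might be to extract features for a single utterance through both pipelines and compare them directly; a rough sketch (the .scp and output file names are made up, and it assumes the Kaldi binaries are on the PATH):
```
# wav_kaldi.scp: one utterance with the plain sph2pipe pipeline (Kaldi upsamples);
# wav_sox.scp:   the same utterance resampled to 16 kHz by sox.
compute-mfcc-feats --config=conf/mfcc_hires.conf scp:wav_kaldi.scp ark,t:feats_kaldi.txt
compute-mfcc-feats --config=conf/mfcc_hires.conf scp:wav_sox.scp ark,t:feats_sox.txt
# Quick sanity checks: same number of frames? grossly different values?
feat-to-len ark:feats_kaldi.txt ark,t:-
feat-to-len ark:feats_sox.txt ark,t:-
diff feats_kaldi.txt feats_sox.txt | head
```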
Any insight? I would much appreciate any help.
Thanks in advance.
Best,
Guanbo