Telephone line filter (or impulse response)

Jan Trmal

Dec 1, 2015, 7:41:42 PM

to kaldi-help

Hi,

Does anyone have experience with doing something smarter than plain downsampling (or band-limiting to 80-3700 Hz) when processing hi-res audio with CTS (conversational telephone speech) models?

Or when the training set is not homogeneous: in Babel, there is both 48 kHz audio (a desk microphone, I assume) and 8 kHz telephone speech. I've heard some people mention that they used a telephone line filter to preprocess the audio. I've heard this on several occasions in different contexts, and it never occurred to me to ask for specifics. I wasn't able to google anything specific.
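
For reference, the plain baseline mentioned above (band-limit to 80-3700 Hz, then downsample 48 kHz to 8 kHz) might look roughly like the sketch below. It uses only the Python standard library. The function names, the Hamming-windowed sinc design, and the 1001-tap length are illustrative assumptions, not something from this thread, and this is not a model of a real telephone channel, just the band-limiting step.

```python
import math


def bandpass_taps(fs, f_lo, f_hi, num_taps):
    """Hamming-windowed sinc band-pass FIR taps (hypothetical helper)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter is symmetric")
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            # Limit of the ideal band-pass impulse response at k = 0.
            h = 2.0 * (f_hi - f_lo) / fs
        else:
            h = (math.sin(2.0 * math.pi * f_hi * k / fs)
                 - math.sin(2.0 * math.pi * f_lo * k / fs)) / (math.pi * k)
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    return taps


def telephone_band_downsample(samples, fs_in=48000, fs_out=8000,
                              f_lo=80.0, f_hi=3700.0, num_taps=1001):
    """Band-limit to [f_lo, f_hi] Hz and decimate by fs_in / fs_out.

    The tap count (1001) is an assumed value: long enough to give a
    transition band of roughly 160 Hz at 48 kHz, so the 80 Hz edge is soft.
    """
    if fs_in % fs_out != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = fs_in // fs_out
    taps = bandpass_taps(fs_in, f_lo, f_hi, num_taps)
    mid = (num_taps - 1) // 2
    out = []
    # Only every `factor`-th filtered sample is computed, since the rest
    # would be thrown away by the decimation anyway.
    for center in range(0, len(samples), factor):
        lo = max(0, center - mid)
        hi = min(len(samples), center + mid + 1)
        acc = 0.0
        for i in range(lo, hi):
            acc += samples[i] * taps[mid + center - i]
        out.append(acc)
    return out
```

Calling `telephone_band_downsample(samples)` on a list of 48 kHz float samples returns a list one sixth as long at 8 kHz. The pure-Python loops are slow and only meant to show the arithmetic; a DSP library would be the practical choice for real corpora.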

Thanks,

y.

Jan Trmal

Dec 1, 2015, 8:06:08 PM

to kaldi-help

Apologies for not proofreading my message before clicking the send button.

What I'm asking is: is it indeed important to do this for training, especially when we can assume that the audio channel is fixed for a given speaker? Or will speaker normalization techniques (such as fMLLR), or DNNs in general, be able to cancel this out?

And if it is indeed still important, what would be a reasonable-effort solution?

y.

Daniel Povey

Dec 2, 2015, 6:31:52 PM

to kaldi-help

Talk to Vijay about this; he is looking into something like this for ASpIRE. Apparently a Singapore team got about 1% absolute improvement from implementing some solution for this problem.

Dan
