Noisy data issue

pavitra kulkarni

Sep 18, 2024, 1:00:28 AM

to kaldi...@googlegroups.com

Dear All,

We have trained a Kaldi TDNN model on a mixed Hindi + English dataset of around 4700 hours. The training set is more or less clean. However, our test cases are short, single sentences, sometimes with background noise or noisy audio (IVR use cases), and sometimes with unintelligible speech due to noise, channel issues, fast speech, etc.

We have also included a few noisy audios (only around 25000 files) in training. For these we used a tag (silence) for the noisy data and mapped it to sil. However, in our dataset we don't have any tags, just plain sentences.

We are using vosk server for ASR deployment. A few audios are attached for the issues below.

The issues we are facing are as follows:

1) In case of background noise/speech (which is difficult to understand), the ASR detects meaningful words. We want the ASR to return an empty string/sil.
2) Sometimes the audio is very unintelligible. We want the ASR not to recognize anything (is that possible?).
3) In some cases, even though the audio is clearly heard as "can't", the ASR decodes "can't" as "can". This is just one example; the same happens for "yes/no" too.

These kinds of scenarios, where it decodes the exact opposite word, are a bit concerning for us.

Can you suggest a few things we can do, apart from training the model again (maybe with more tags related to noise)? Can we somehow use the confidence score to decide whether to accept the decoded output? Any other ideas?

Training took more than a week on an NVIDIA A6000 48GB GPU. Any suggestions on pre-processing, post-processing, or LM changes would be of great help to us.

Thanks in advance.

--

Regards

pavitra kulkarni

test_2.wav

test_1.wav

test_3.wav
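
On the confidence-score question above, here is a minimal sketch of one possible post-processing step, not a tested recipe for this model. It assumes word-level output is enabled in Vosk (for example via `KaldiRecognizer.SetWords(True)` in the Python API), so that each final result carries a `result` list with a `conf` value per word. The function name `filter_by_confidence` and both thresholds are hypothetical and would need tuning on held-out noisy audio.

```python
import json


def filter_by_confidence(result_json, min_mean_conf=0.6, min_word_conf=0.3):
    """Return the decoded text, or "" if the word confidences look unreliable.

    result_json: a Vosk final-result JSON string with word-level details.
    min_mean_conf, min_word_conf: hypothetical thresholds, to be tuned.
    """
    result = json.loads(result_json)
    words = result.get("result", [])
    if not words:
        # No word-level entries: treat as silence / nothing recognized.
        return ""

    confs = [w["conf"] for w in words]
    mean_conf = sum(confs) / len(confs)

    # Reject the whole utterance if it is weak on average,
    # or if any single word is very uncertain.
    if mean_conf < min_mean_conf or min(confs) < min_word_conf:
        return ""
    return result.get("text", "")
```

For short IVR utterances, rejecting on the weakest word may matter more than the mean, since a single wrong word ("can" vs. "can't") flips the meaning. Decoder confidences can be over-optimistic on noise, so the thresholds are best chosen by checking accept/reject rates on real noisy recordings.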

1060147127

Sep 18, 2024, 1:00:43 AM

to pavitra kulkarni

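This is an automatic vacation reply from QQ Mail.

Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends.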
