Noisy data issue


pavitra kulkarni

Sep 18, 2024, 1:00:28 AM
to kaldi...@googlegroups.com
Dear All,

We have trained a Kaldi TDNN model on a mixed Hindi + English dataset of around 4700 hours; the training set is more or less clean. However, our test scenarios are short, single sentences, sometimes with background noise or noisy audio (IVR use cases), and sometimes with unintelligible speech due to noise, channel issues, fast speech, etc.

We also included a few noisy audios (only around 25000 files) in training; for the noisy data we used a tag (silence) and mapped it to sil. However, our dataset does not have any tags, just plain sentences. We are using Vosk server for ASR deployment. A few audio files illustrating the issues below are attached.
The issues we are facing are as follows:
1) When there is background noise/speech (which is difficult to understand), the ASR still recognizes meaningful words. We want the ASR to return an empty string/sil instead.
2) Sometimes the audio is largely unintelligible; we want the ASR not to recognize anything (is this possible?).
3) In some cases, even though "can't" is heard clearly in the audio, the ASR decodes it as "can" (this is just one example; the same happens with "yes"/"no"). Scenarios where the decoder outputs the opposite word are a bit concerning for us.

Can you suggest a few things we can do, apart from training the model again (perhaps with more noise-related tags)? Can we somehow use the confidence score to decide whether to accept the decoded output? Any other ideas? Training took more than a week on an NVIDIA A6000 48GB GPU. Any suggestions on pre-processing, post-processing, or LM changes would be of great help to us. Thanks in advance.
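A rough sketch of the confidence-based post-processing asked about above, written as a Python filter over Vosk server final results. It assumes word-level output is enabled, so each final result is JSON with a `text` field and a `result` list of per-word entries carrying `word` and `conf`. The function name `filter_result`, the thresholds, and the `CRITICAL_WORDS` set are hypothetical placeholders, not values from Kaldi or Vosk, and would need tuning on held-out IVR audio. Utterances with low mean confidence are returned as an empty string (issues 1 and 2), individual low-confidence words are dropped, and critical words such as "can"/"yes"/"no" below a stricter threshold are flagged for confirmation rather than trusted (issue 3).

```python
import json
from statistics import mean

# Hypothetical thresholds: placeholders to tune on held-out IVR audio,
# not values recommended by Kaldi or Vosk.
MIN_MEAN_CONF = 0.75      # below this, treat the whole utterance as noise
MIN_WORD_CONF = 0.5       # below this, drop the individual word
CRITICAL_MIN_CONF = 0.9   # stricter bar for words that flip meaning

# Hypothetical set of words where a misrecognition is costly.
# Spelling must match the model's lexicon (e.g. "can't" vs "cant").
CRITICAL_WORDS = {"yes", "no", "can", "can't", "cant"}


def filter_result(result_json: str) -> dict:
    """Apply confidence gating to one Vosk-style final result.

    Assumes the JSON has a "text" field and, when word output is enabled,
    a "result" list of {"word", "conf", ...} entries.
    Returns {"text": str, "needs_confirmation": list of words}.
    """
    data = json.loads(result_json)
    words = data.get("result", [])
    if not words:
        # No word-level info (or nothing decoded): pass "text" through as-is.
        return {"text": data.get("text", ""), "needs_confirmation": []}

    # Utterance-level rejection: low average confidence -> return empty string.
    if mean(w["conf"] for w in words) < MIN_MEAN_CONF:
        return {"text": "", "needs_confirmation": []}

    # Drop individual low-confidence words (likely insertions from noise).
    kept = [w for w in words if w["conf"] >= MIN_WORD_CONF]

    # Flag surviving critical words that are not confident enough to act on.
    flagged = [w["word"] for w in kept
               if w["word"] in CRITICAL_WORDS and w["conf"] < CRITICAL_MIN_CONF]

    return {"text": " ".join(w["word"] for w in kept),
            "needs_confirmation": flagged}
```

For example, a result with words `hello` (conf 0.41) and `yes` (conf 0.55) has a mean confidence of 0.48, so `filter_result` returns an empty `text`. Whether Vosk's word confidences separate noise from real speech well enough on this data would need to be checked empirically before relying on such gating.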

--
Regards
pavitra kulkarni
test_2.wav
test_1.wav
test_3.wav

1060147127

Sep 18, 2024, 1:00:43 AM
to pavitra kulkarni
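This is an automatic vacation reply from QQ Mail.

Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends.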