I'm planning to augment my training data with various noise types, like synthetic white noise and recorded background noise. A few questions:
- What's the best way to achieve this? Are there options to augment on the fly during nnet training, or should all the augmentations be stored first and included in wav.scp, segments, etc. as usual?
- In general, what's the established/expected gain on clean speech when training with noisy data?
Thanks for the feedback.
--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/863647bb-1f52-4de7-9420-fb57e7f2db1f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Adding to Vimal's email, since I had already started.

We have scripts reverberate_data_dir.py and augment_data_dir.py which can do these kinds of things. You might want to look for examples of their use.

From what I can tell from glancing at the script, it probably doesn't actually create new wav files, but instead creates a wav.scp that generates them on the fly. David, is that right? It might be nice if we had a script that could dump such a data-dir into actual wav files, for cases where you'll be accessing it multiple times (or for when the isotropic-noise files are long).

I notice the scripts aren't very detailed about the inputs; in particular, for augment_data_dir.py, bg_noise_dir and fg_noise_dir aren't explained. I'm not sure if those are documented anywhere?

In general, as Vimal says, we'd count it as a win if training on noisy data didn't degrade results on clean speech.
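
For readers who want to see what additive-noise augmentation amounts to, here is a minimal sketch in plain Python. It is not taken from augment_data_dir.py and does not claim to match what that script does internally; it only illustrates scaling a noise signal to a chosen signal-to-noise ratio and adding it to speech. The names mix_at_snr, white_noise and mean_power are hypothetical helpers made up for this example, and samples are assumed to be plain lists of floats.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db, rng=random):
    """Return speech plus noise, with the noise scaled to give snr_db.

    Hypothetical helper for illustration; not part of Kaldi.
    """
    if not speech or not noise:
        return list(speech)

    # Pick a random starting offset and loop the noise to cover the speech.
    start = rng.randrange(len(noise))
    segment = [noise[(start + i) % len(noise)] for i in range(len(speech))]

    speech_power = mean_power(speech)
    noise_power = mean_power(segment)
    if speech_power == 0.0 or noise_power == 0.0:
        # Nothing sensible to scale against; return the speech unchanged.
        return list(speech)

    # Choose scale so that
    # 10 * log10(speech_power / (scale**2 * noise_power)) == snr_db.
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, segment)]


def white_noise(num_samples, rng=random):
    """Synthetic Gaussian white noise with unit variance."""
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz stands in for a speech signal.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = mix_at_snr(speech, white_noise(8000, rng), snr_db=10.0, rng=rng)
```

Recorded background noise would be passed in place of the white_noise output. Because the mix is cheap to compute, it can be done either once up front (writing new wav files) or each time the audio is read, which is the trade-off discussed above.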
On Tue, Feb 13, 2018 at 3:34 PM, Armin Oliya <armin...@gmail.com> wrote:
I'm planning to augment my training data with various noise types, like synthetic white noise and recorded background noise. A few questions:
- What's the best way to achieve this? Are there options to augment on the fly during nnet training, or should all the augmentations be stored first and included in wav.scp, segments, etc. as usual?
- In general, what's the established/expected gain on clean speech when training with noisy data?
Thanks for the feedback.