Data augmentation with background noise


Armin Oliya

Feb 13, 2018, 3:34:47 PM
to kaldi-help

I'm planning to augment my training data with various noise types, like synthetic white noise and recorded background noise. A few questions:

- What's the best way to achieve this? Are there options to augment on the fly during nnet training, or should all the augmentations be stored first and included in wav.scp, segments, etc. as usual?
- In general, what's the established/expected gain on clean speech when training with noisy data?


Thanks for the feedback.

Vimal Manohar

Feb 13, 2018, 5:20:14 PM
to kaldi...@googlegroups.com
On Tue, Feb 13, 2018 at 3:34 PM Armin Oliya <armin...@gmail.com> wrote:

- in general, what's the established/expected gain on clean speech when training with noisy data?
It might not help if the test data is "really clean". The performance might even degrade.

Vimal
 



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/863647bb-1f52-4de7-9420-fb57e7f2db1f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Vimal Manohar
PhD Student
Electrical & Computer Engineering
Johns Hopkins University

Daniel Povey

Feb 13, 2018, 5:21:04 PM
to kaldi-help, David Snyder
adding to Vimal's email since I had already started..

We have scripts reverberate_data_dir.py and augment_data_dir.py which can do these kinds of things.
You might want to look for examples of these.
From what I can tell from glancing at the script, it probably doesn't actually create new wav files, but instead creates a wav.scp that creates them on the fly. David, is that right?
It might be nice if we had a script that could dump a data-dir into actual wav files, for cases where you'll be accessing it multiple times (or for when the isotropic-noise files are long).
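As an illustration of what such a dump step could look like, here is a minimal Python sketch. It assumes the usual wav.scp convention that each line is "<recording-id> <rxfilename>" and that an rxfilename ending in "|" is a shell command whose stdout is the audio. materialize_wav_scp is a hypothetical helper, not an existing Kaldi script: it runs each piped command once, saves the output as <recording-id>.wav, and leaves non-piped entries untouched. It does not update segments, reco2dur or any other file in the data directory.

```python
import os
import subprocess


def materialize_wav_scp(scp_lines, out_dir):
    """Hypothetical helper: run each piped wav.scp command once and point the
    entry at the resulting file instead of the pipe.

    Assumes the wav.scp convention "<recording-id> <rxfilename>", where an
    rxfilename ending in "|" is a shell command writing audio to stdout.
    """
    os.makedirs(out_dir, exist_ok=True)
    new_lines = []
    for line in scp_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec_id, rxfilename = line.split(maxsplit=1)
        if rxfilename.endswith("|"):
            command = rxfilename[:-1].strip()
            out_path = os.path.join(out_dir, rec_id + ".wav")
            with open(out_path, "wb") as out:
                # Execute the pipeline once and capture its stdout on disk.
                subprocess.run(command, shell=True, stdout=out, check=True)
            new_lines.append(f"{rec_id} {out_path}")
        else:
            # Plain file (or other non-piped rxfilename): keep it unchanged.
            new_lines.append(line)
    return new_lines
```

Each piped command then runs only once, so later passes over the data read plain files instead of re-running the noise mixing; the trade-off is the extra disk space.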

I notice the scripts aren't very detailed about the inputs; in particular (for augment_data_dir.py), bg_noise_dir and fg_noise_dir aren't explained. I'm not sure if those are documented anywhere?

In general, as Vimal says, we'd count it as a win if it didn't degrade on clean speech.




David Snyder

Feb 13, 2018, 5:37:13 PM
to kaldi-help
but instead creates a wav.scp that creates them on the fly. David, is that right?

Yes, that's correct.

 I notice the scripts aren't very detailed about the inputs, particularly (for augment_data_dir.py), bg_noise_dir and fg_noise_dir aren't explained.. I'm not sure if those are documented anywhere?

Some detail was provided in the usage message (although maybe it's not clear enough):

"Noises are separated into background and foreground noises which are added together or separately. Background noises are added to the entire recording, and repeated as necessary to cover the full length. Multiple overlapping background noises can be added, to simulate babble, for example. Foreground noises are added sequentially, according to a specified interval."

There's not necessarily any difference between the background and foreground noises, but the way they are handled by the script is different (as described above). The bg-noise-dir and fg-noise-dir are just data directories containing wav.scp files. If you want to see an example of this script's usage, take a look at something like https://github.com/kaldi-asr/kaldi/blob/master/egs/sre16/v2/run.sh#L146
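The behaviour described in that usage message can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the script's actual code: augment_data_dir.py writes commands into wav.scp rather than mixing arrays in Python, and the SNR reference, noise selection and interval units it uses may differ. Here the SNR is measured against the power of the whole clean signal, foreground noises are cycled in order instead of drawn at random, and interval is in samples; all function names are hypothetical.

```python
import numpy as np


def scale_to_snr(speech, noise, snr_db):
    """Return noise scaled so that 10*log10(P_speech / P_noise) == snr_db,
    where P is the mean power over the given samples (an assumption)."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        return noise  # silent noise: nothing to scale
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return noise * gain


def add_background(speech, noises, snrs_db):
    """Tile each background noise to the full recording length and add it.
    Passing several noises overlaps them, e.g. to simulate babble."""
    out = speech.astype(np.float64)  # astype returns a copy
    for noise, snr in zip(noises, snrs_db):
        tiled = np.resize(noise, speech.shape)  # repeats noise cyclically
        out += scale_to_snr(speech, tiled, snr)
    return out


def add_foreground(speech, noises, snrs_db, interval):
    """Place foreground noises (a list of arrays) one after another, each
    followed by `interval` samples without added noise, until the recording
    ends. The last noise is truncated at the end of the recording."""
    out = speech.astype(np.float64)
    pos, i = 0, 0
    while noises and pos < len(out):
        noise = noises[i % len(noises)]
        snr = snrs_db[i % len(snrs_db)]
        seg = noise[: len(out) - pos]
        out[pos : pos + len(seg)] += scale_to_snr(speech, seg, snr)
        pos += max(len(noise) + interval, 1)  # always advance
        i += 1
    return out
```

For babble-like background, several speech recordings can be passed to add_background so they overlap; for sporadic events, short clips can be passed to add_foreground.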



Armin Oliya

Feb 22, 2018, 4:52:30 PM
to kaldi-help
Thank you all :) 

xiaofeng wu

Mar 11, 2019, 10:14:41 AM
to kaldi-help
Hi,

A very nice question!

A follow-up question.

From the code I can see that Kaldi does the noise augmentation once and does not change it during the whole training (please correct me if I am wrong). My concern is: in this way, won't the network easily associate a specific noise with a specific utterance?
Maybe doing the noise augmentation on the fly during each epoch would be more appropriate? After all, noise augmentation is much faster than training, so with a proper arrangement there would be almost no loss in training speed.

Thanks!

Vimal Manohar

Mar 11, 2019, 11:17:56 AM
to kaldi-help
We do data augmentation multiple times for each utterance (3x in https://github.com/kaldi-asr/kaldi/blob/master/egs/aspire/s5/local/nnet3/run_ivector_common.sh). You can increase the number of copies even more and decrease the number of epochs by the same factor, so effectively it is like doing "augmentation on the fly" during each old epoch.
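The arithmetic behind this suggestion can be sketched in Python. The function names, the "augN-" id prefix and the uniform random choice of noise and SNR are illustrative assumptions, not the recipe's actual code.

```python
import random


def make_augmented_copies(utt_ids, noise_ids, snrs_db, num_copies, seed=0):
    """Hypothetical sketch: create num_copies noisy versions of every
    utterance, each with an independently drawn noise and SNR. The "augN-"
    prefix keeps the new utterance ids unique."""
    rng = random.Random(seed)
    copies = []
    for i in range(1, num_copies + 1):
        for utt in utt_ids:
            copies.append(
                (f"aug{i}-{utt}", utt, rng.choice(noise_ids), rng.choice(snrs_db))
            )
    return copies


def epochs_for_copies(base_epochs, num_copies):
    """Divide the epoch count by the replication factor so the total number
    of presentations of each original utterance stays the same."""
    return base_epochs / num_copies
```

With 3 copies and 6 epochs reduced to 2, each original utterance is still presented 6 times in total, but under up to 3 independently drawn noise conditions instead of one fixed condition, which is the sense in which this approximates per-epoch augmentation.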

Vimal




--
Vimal Manohar
PhD Student
Center for Language and Speech Processing
Johns Hopkins University
Baltimore, MD

xiaofeng wu

Mar 11, 2019, 12:04:19 PM
to kaldi-help
got it, thanks a million!