Data augmentation strategy

Dayalu

Jul 22, 2019, 12:50:24 AM7/22/19
to kaldi-help
Hi all,
I'm using about 3kh of training data (fisher+librispeech+swbd+ami...) to train an English ASR model, which will be used for recognizing telephone audio.
I want to ask for the best data augmentation strategy.
1. As far as I know, there are speed augmentation, volume augmentation, and the "reverb babble music noise clean" augmentation options (in swbd/s5c/local/chain/multi_condition/run_tdnn_aug_1a.sh). Which kinds of augmentation should I use?
2. Which augmentation method should be applied to which kind of corpus? My training data contains book-reading, telephone, and room-conversation data. Maybe noise and reverberation augmentation on the relatively clean book-reading data, and speed augmentation on the matched telephone data?
3. Is there any general rule for a good augmentation strategy?
Dayalu

Daniel Povey

Jul 22, 2019, 1:06:00 AM7/22/19
to kaldi-help
I would use the noise+reverb on all the data.
It helps with out-of-domain test data more than speed perturbation does, according to our experiments.
Dan
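As a conceptual illustration of what the noise part of this augmentation does at the signal level, here is a minimal Python sketch of mixing noise into speech at a chosen SNR. This is not how Kaldi implements it: the recipes operate on data directories (steps/data/reverberate_data_dir.py and the wav-reverberate binary), and the signals below are synthetic.

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    # Conceptual sketch only: the Kaldi recipes do reverb/noise augmentation
    # at the data-dir level (steps/data/reverberate_data_dir.py plus the
    # wav-reverberate binary), not with Python like this.
    p_speech = sum(x * x for x in speech) / len(speech)   # mean power
    p_noise = sum(x * x for x in noise) / len(noise)
    # Choose a noise scale so that p_speech / (scale^2 * p_noise) = 10^(snr_db/10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    # Tile or truncate the noise to the speech length, then add it.
    return [s + scale * noise[i % len(noise)] for i, s in enumerate(speech)]

random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(8000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

The real recipes draw foreground/background SNRs from a list (e.g. the --fg-snrs / --bg-snrs options of steps/data/augment_data_dir.py) rather than using a single fixed value.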


--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/7a069f62-716d-4de0-9260-2a08db3ff9fd%40googlegroups.com.

Dayalu

Jul 22, 2019, 4:23:10 AM7/22/19
to kaldi-help
Thank you!
And regarding SpecAugment: I know there is a recipe in mini_librispeech/s5/local/chain/tuning/run_tdnn_1i.sh, but in another thread it was said that improvements were obtained on a small dataset, not yet on bigger datasets. So does it make sense to spend time trying SpecAugment for my training?

Daniel Povey

Jul 22, 2019, 3:51:31 PM7/22/19
to kaldi-help
With that much data, I wouldn't bother with SpecAugment unless you want a huge model.
I believe we have managed to get some improvements from SpecAugment by using larger-than-normal models.
For that amount of data (3kh), the model would have to be quite enormous, I think, for SpecAugment to really matter.
(e.g. 100M parameters).
Dan
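For reference, the frequency/time masking that SpecAugment performs can be sketched in a few lines of Python. This is only an illustration: the Kaldi recipe mentioned above applies the masking inside nnet3 training itself, and the mask sizes below are arbitrary illustrative values.

```python
import random

def spec_augment(spec, num_freq_masks=1, max_f=5, num_time_masks=1, max_t=10):
    # Conceptual sketch of SpecAugment's frequency and time masking; the
    # mask-size parameters here are arbitrary, not the recipe's settings.
    n_frames, n_bins = len(spec), len(spec[0])
    out = [frame[:] for frame in spec]          # copy; leave the input intact
    for _ in range(num_freq_masks):             # zero a random band of bins
        f = random.randint(0, max_f)
        f0 = random.randint(0, n_bins - f)
        for frame in out:
            for b in range(f0, f0 + f):
                frame[b] = 0.0
    for _ in range(num_time_masks):             # zero a random span of frames
        t = random.randint(0, max_t)
        t0 = random.randint(0, n_frames - t)
        for i in range(t0, t0 + t):
            out[i] = [0.0] * n_bins
    return out

random.seed(1)
spec = [[1.0] * 40 for _ in range(100)]   # dummy 100-frame, 40-bin spectrogram
masked = spec_augment(spec)
```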



reza ali

Aug 10, 2019, 6:03:35 AM8/10/19
to kaldi-help
Hi
1- The data augmentation example in swbd doesn't use speed perturbation. Why?

2- With data augmentation (e.g. noise + reverb + speed perturbation), should the network be bigger than with speed perturbation alone, or the same size? What do you suggest?

3- I have about a 70-hour clean dataset. Is it a good idea to enlarge the data with SpecAugment? Which is better: SpecAugment alone, SpecAugment + reverb augmentation, or something else?

best regards

Daniel Povey

Aug 10, 2019, 10:40:31 AM8/10/19
to kaldi-help
Hi
1- The data augmentation example in swbd doesn't use speed perturbation. Why?

I think we found that combining the two methods wasn't really more helpful than just using one or the other.
(Of course, this might be different if there were less data to start with.)
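The speed perturbation being discussed amounts to resampling each utterance by a small factor (0.9 / 1.0 / 1.1 in the standard 3-way setup). A rough Python sketch, assuming a waveform given as a list of samples; the actual recipes do this with sox via utils/data/perturb_data_dir_speed_3way.sh:

```python
def speed_perturb(samples, factor):
    # Resample so the waveform plays `factor` times faster. Sketch of the idea
    # behind Kaldi's 3-way speed perturbation; the real recipes use sox, and
    # the pitch also shifts because the sample rate is reinterpreted.
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor                        # fractional read position
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append((1 - frac) * samples[j] + frac * nxt)  # linear interpolation
    return out

wave = [float(i) for i in range(100)]
slow = speed_perturb(wave, 0.9)   # longer output: slower, lower-pitched copy
fast = speed_perturb(wave, 1.1)   # shorter output: faster, higher-pitched copy
```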

2- With data augmentation (e.g. noise + reverb + speed perturbation), should the network be bigger than with speed perturbation alone, or the same size? What do you suggest?

I'd say about the same size. 

3- I have about a 70-hour clean dataset. Is it a good idea to enlarge the data with SpecAugment? Which is better: SpecAugment alone, SpecAugment + reverb augmentation, or something else?

I'd use the reverb + noise augmentation for now.  We are still experimenting with SpecAugment.  It works on mini_librispeech, but not on larger datasets, and we're not sure why.

Dan


Ray

Feb 26, 2020, 2:02:14 AM2/26/20
to kaldi-help
Hi all,

I have a few questions about data augmentation on the Fisher corpus for a telephone-speech scenario:
1. For volume perturbation, why not combine the clean data with the perturbed data as the training set?

2. I see in Dan's paper that when reverberated data is combined with clean data, it benefits the AMI and eval2000 test sets but not the ASpIRE dev set.
   So should I combine clean and reverberated data for Fisher? And does that still apply when the data set is Fisher + Librispeech?

3. If the previous two hypotheses are correct, it looks like the best augmentation pipeline for Fisher would be:
    (3-fold reverberation with simulated RIRs + point-source noise -> volume perturbation) + (clean data). Is this correct? (Not considering run_aug_common.sh for now.)

Thanks,
Ray


Daniel Povey

Feb 26, 2020, 2:05:33 AM2/26/20
to kaldi-help
It all depends on the test data.  If the test data is very dirty, as it was in ASpIRE, it's probably not a good idea to include the clean data.  If the test set contains clean data, which it normally will, you want to include the clean data.
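In list terms, the choice Dan describes is just whether the clean originals are appended to the augmented copies when building the final training set. A sketch in Python, with hypothetical utterance ids; in Kaldi this bookkeeping is done at the data-directory level (reverberate_data_dir.py's --num-replications and utils/combine_data.sh), and the "rev1-" prefix below only mirrors the convention used in the multi-condition recipes:

```python
def combine_clean_and_augmented(utts, num_reverb_copies=3, include_clean=True):
    # Sketch of the data-dir bookkeeping; the "rev1-" utterance-id prefix
    # is an assumption based on the multi-condition recipes' convention.
    out = []
    for n in range(1, num_reverb_copies + 1):   # N reverberated copies
        out.extend("rev%d-%s" % (n, u) for u in utts)
    if include_clean:   # keep the clean originals when the test set is clean
        out.extend(utts)
    return out

utts = ["sw02001-A_000098-001156", "sw02001-B_000098-001156"]
train = combine_clean_and_augmented(utts)                       # clean test set
dirty_only = combine_clean_and_augmented(utts, include_clean=False)  # dirty test set
```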


jason

Mar 1, 2020, 10:33:46 AM3/1/20
to kaldi-help
Hi Dan, 

For in-domain test data, if we have 3kh of training data, which of the data augmentation strategies mentioned above should we try?

Thanks.

On Wednesday, February 26, 2020 at 3:05:33 PM UTC+8, Dan Povey wrote:

Daniel Povey

Mar 1, 2020, 10:01:23 PM3/1/20
to kaldi-help
With 3kh training data and in-domain test data, I don't think augmentation is necessary.
