SpecAugment

Wonkyum Lee

Apr 22, 2019, 6:54:45 PM
to kaldi-help
https://arxiv.org/pdf/1904.08779.pdf

It's probably worth replicating this augmentation with the Kaldi chain model. Has anyone tried this approach before?
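
The core of the method is simple to state: randomly zero out a few frequency bands and time spans of the log-mel spectrogram (plus a time warp, which the paper says matters least). Roughly, in numpy (the parameter values below are just illustrative, not the paper's tuned policies):

    import numpy as np

    def spec_augment(spec, num_freq_masks=2, F=10, num_time_masks=2, T=50,
                     rng=np.random):
        """Zero out random frequency bands and time spans of a log-mel
        spectrogram `spec` of shape (num_frames, num_mel_bins)."""
        spec = spec.copy()
        num_frames, num_bins = spec.shape
        for _ in range(num_freq_masks):
            f = rng.randint(0, F + 1)                  # band width: 0..F bins
            f0 = rng.randint(0, max(1, num_bins - f))  # band start
            spec[:, f0:f0 + f] = 0.0
        for _ in range(num_time_masks):
            t = rng.randint(0, T + 1)                  # span length: 0..T frames
            t0 = rng.randint(0, max(1, num_frames - t))
            spec[t0:t0 + t, :] = 0.0
        return spec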

Daniel Povey

Apr 22, 2019, 6:58:28 PM
to kaldi-help
Yes, we are aware of this and hope to do some work on it within the next couple of weeks.



jianwei zhou

May 13, 2019, 3:09:55 AM
to kaldi-help
Any update?


shaheen kader

May 13, 2019, 3:53:49 AM
to kaldi-help
@Prof Dan, Eagerly looking forward to this. 

Daniel Povey

May 13, 2019, 1:52:03 PM
to kaldi-help
I believe Phani Nidadavolu might have some preliminary results on this. He saw some improvements but they were very small, nothing like what the Google people saw. But he's looking into it more, I think.

Dan




Jeff Brower

May 13, 2019, 2:52:59 PM
to kaldi-help
Dan-

The paper does mention "handcrafted", so warping, masking, etc. might need a methodology of their own. I wonder if they ran through a series of augmentation combinations automatically, over time, to find which ones work best.

-Jeff

Daniel Povey

May 13, 2019, 3:24:31 PM
to kaldi-help
Yeah, they must have done some kind of tuning.
With regard to the time-warping part of what they did: I don't believe it should be fundamentally different from the speed augmentation that we routinely do. Our original speed-perturbation paper compared the simple speed modification against a reconstruction that makes the audio faster but spectrally the same, and the simple one worked better. The reconstruction version (in our old experiments) should be quite similar to their time-warping, I would think.

Dan



Rudolf A. Braun

May 17, 2019, 2:04:43 PM
to kaldi-help
Apparently time warping is not important: https://github.com/Kyubyong/specAugment

I got an okay improvement with the very first implementation I tried, though I only apply the augmentation once, before training. I think it would be better to do the augmentation on the fly during training, so the model never sees the same augmentation twice, but I'm not sure how one would do that.
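
Conceptually the on-the-fly version would just draw a fresh mask each time an utterance comes around, something like the sketch below (where "augment" is any masking function of features and a random state); what I don't see is where to hook this into the chain training:

    import numpy as np

    def augmented_epochs(utterances, augment, num_epochs, seed=0):
        """Yield (utt_id, feats) pairs with a freshly drawn mask each
        epoch, so no masked version is ever repeated."""
        rng = np.random.RandomState(seed)
        for epoch in range(num_epochs):
            for utt_id, feats in utterances:
                yield utt_id, augment(feats, rng)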

Daniel Povey

May 17, 2019, 2:26:56 PM
to kaldi-help
You're right, it would be better to do it on the fly.
It just occurred to me that possibly the easiest way would be to do it as an nnet layer. It would be a "non-simple" one, so it might be easier if I do it myself.
Of course you'd have to transform from MFCCs back to log-mels first, but that's something we already do in some recipes (e.g. CNN).
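
(The transform back is just an inverse DCT, since MFCCs are, up to liftering and coefficient truncation, a DCT-II of the log-mel energies. With all coefficients kept and no liftering the round trip is exact; with the usual truncation it is a smoothed approximation. In numpy/scipy terms:)

    import numpy as np
    from scipy.fftpack import dct, idct

    log_mels = np.random.randn(300, 40)  # stand-in (num_frames, num_bins) features

    # MFCCs as a DCT-II over the mel axis (no liftering, all 40 coefficients):
    mfccs = dct(log_mels, type=2, norm='ortho', axis=-1)

    # The inverse DCT recovers the log-mels exactly in this idealized case:
    recovered = idct(mfccs, type=2, norm='ortho', axis=-1)
    assert np.allclose(recovered, log_mels)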

Dan



fei

Jun 18, 2019, 5:19:04 AM
to kaldi-help
Hi Dan,
    Is there an implementation of SpecAugment in Kaldi?

Thank you 


reza ali

Jun 18, 2019, 8:56:59 AM
to kaldi-help
Hi,
I have some experiments using data augmentation for ASR.
I don't know whether to follow that paper or not.

Daniel Povey

Jun 18, 2019, 10:30:52 AM
to kaldi-help
We (well, mostly Phani Sankar and Ashish Arora) have been doing tons of experiments trying to reproduce SpecAugment, but we have not had any success. I have heard of experiments at IDIAP and at Facebook that have been done, with similar results. People have had improvements versus a non-augmented baseline, but not against standard augmentation methods. We are still trying various things, but at this point it doesn't look promising.

Dan



Daniel Povey

Jun 19, 2019, 1:10:38 PM
to kaldi-help
After hearing more information, I think I may have mis-stated the results.  There are some results which are somewhat encouraging after all, I think, from at least one of these places.  We're continuing to work on it.
I don't think the results will be anything like as dramatic as what the Google people saw, but it looks like there may be some improvement over our standard augmentation setup.

Dan

Rémi Francis

Jun 20, 2019, 5:24:42 AM
to kaldi-help
I talked to the first author of SpecAugment; he said that they used 900 epochs on LibriSpeech for their best results, but started noticing improvements over the baseline after 200 epochs.



Armando

Jun 20, 2019, 6:11:00 AM
to kaldi-help
900 epochs? on a 1000 h corpus?

Rémi Francis

Jun 20, 2019, 6:23:24 AM
to kaldi-help
The paper states 32 TPUs for 7 days; 900 epochs over 7 days on 32 TPUs works out to about 4 epochs per TPU per day, so it seems in the right range.

Daniel Povey

Jun 20, 2019, 12:26:55 PM
to kaldi-help
OK. Here at Hopkins we initially weren't getting any improvement at all from SpecAugment-style frequency masking, but did start to see some improvement after increasing the num-epochs. So there might be some hope, but we have to confirm.

Dan



Shujian Liu

Jun 27, 2019, 7:12:11 PM
to kaldi-help
"The paper states 32 TPUs for 7 days"
That's 32 × 7 × 24 ≈ 5,400 TPU-hours; at roughly $8 per TPU-hour (the list price at the time), that is about 43,000 dollars for each run!

Jeff Brower

Jun 28, 2019, 5:58:23 PM
to kaldi-help
Shujian-

Not if you're running on TPUs at a data center on the Columbia River.

-Jeff

Daniel Povey

Jul 8, 2019, 12:51:17 AM
to kaldi-help

Update:

We have managed to get some improvements out of SpecAugment, at least on a small dataset (which is where we'd expect the most improvement). See here: there is about 10% relative improvement on mini-Librispeech, using both CNN+TDNN-F and TDNN-F-only topologies.

(Note: on this setup, CNN+TDNN-F is about 10% or more (relative) better than TDNN-F only, but this is atypical; on larger data they are much closer, and I'd normally use TDNN-F only, for speed.)

Dan




Daniel Povey

Jul 8, 2019, 11:10:36 AM
to kaldi-help
Not sure what you're doing there, but all my changes are checked into that branch, I just confirmed, and it is working.
You may have mixed versions.

On Mon, Jul 8, 2019 at 6:03 AM Armando <armando.m...@gmail.com> wrote:
do we have to add
spec-augment-layer
to config_to_layer in parser.py?


Armando

Jul 8, 2019, 12:18:54 PM
to kaldi-help
Yes, I thought I had deleted my previous message; it's OK.

Armando

Jul 9, 2019, 2:29:27 AM
to kaldi-help
I see that 1i and 1h (with and without SpecAugment) have the same number of epochs. Didn't you notice improvements by increasing the number of epochs?



Daniel Povey

Jul 9, 2019, 1:06:33 PM
to kaldi-help
I didn't try increasing the number of epochs. Go ahead and try (also increasing the number of parameters); if it gives further improvements we can update the example.



Rémi Francis

Jul 10, 2019, 7:37:57 AM
to kaldi-help
Have you got results on a bigger dataset?

Daniel Povey

Jul 10, 2019, 10:52:59 AM
to kaldi-help
Not yet, but we're working on it.


Armando

Jul 11, 2019, 9:13:36 AM
to kaldi-help
I'm training on a 600 h data set, using the swbd setup with SpecAugment, but so far it's degrading performance relative to the same iteration of the SpecAugment-less model (about 15% relative worse). I've not yet done even 2 epochs, though; maybe I'll see something better as training goes on.

Rudolf A. Braun

Jul 11, 2019, 10:41:15 AM
to kaldi-help
Is that without dropout (as in, did you remember to turn it off)?

Daniel Povey

Jul 11, 2019, 11:46:16 AM
to kaldi-help
Turning off dropout may not be important; it didn't affect WER in mini-librispeech and I removed it to simplify the setup.
Ashish was also doing experiments on Switchboard with this SpecAugment setup, and also didn't see improvements.
Bear in mind that the Kaldi recipes use unusually small models and, of course, already use augmentation, so overfitting may be less of an issue than for more typical setups.
Ashish is trying larger models and more epochs to see if it starts helping then.

Dan



orum farhang

Sep 8, 2019, 3:38:26 AM
to kaldi-help
Hi,

Does anyone have good news on SpecAugment using Kaldi recipes? Do you have any setup suggestions (model size, parameters, model type, etc.) for trying this?

Many Thanks

ashish arora

Sep 8, 2019, 11:47:01 PM
to kaldi...@googlegroups.com

I did experiments with Switchboard as the training set and eval2000 as the test set. The aim of these experiments was to make the frequency-masking part of SpecAugment work for the Switchboard setup. The setup was similar to the SpecAugment implementation (using the utterance-level mean for masking). These experiments were done without i-vectors and used clean data for tree building. I tried 80-dimensional features, more epochs, and 13 and 27 coefficients for masking. It improved the results from 14.1% WER to 13.4% WER. The results were as follows:

Modification                                                             eval2000 WER
Speed perturbation (40 dim)                                              14.1
Frequency masking (40 dim) (13 coeff) (550 iter)                         13.7
Frequency masking (80 dim) (27 coeff) (550 iter)                         13.6
Frequency masking + speed perturbation (80 dim) (27 coeff) (550 iter)    13.4
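
(Concretely, "utterance-level mean for masking" just means filling the masked bins with each bin's mean over the utterance rather than with zeros; a rough numpy sketch of a single frequency mask:)

    import numpy as np

    def freq_mask_with_utt_mean(spec, F=27, rng=np.random):
        """Fill one random band of mel bins with the utterance-level
        per-bin mean instead of zeros (one mask, for illustration)."""
        spec = spec.copy()
        bin_means = spec.mean(axis=0)                  # per-bin mean over frames
        f = rng.randint(0, F + 1)                      # band width: 0..F bins
        f0 = rng.randint(0, max(1, spec.shape[1] - f))
        spec[:, f0:f0 + f] = bin_means[f0:f0 + f]
        return spec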


After the 10% relative improvement on mini-Librispeech with SpecAugment, I started doing experiments with the new online implementation of SpecAugment, with the default parameters, on the Switchboard and AMI datasets. In addition to using 80-dimensional features and more epochs as above, I did experiments with larger models. I tried both recurrent networks and wider TDNN networks; the TDNN model performed better than the LSTM model. For Swbd the results improved from 12.9% WER to 12.6% WER, and for AMI from 35.1% WER to 34.9% WER. I am not able to get a significant improvement on Switchboard or AMI yet.

But during the experiments for the AMI setup, it helped in finding a better-tuned TDNN model, which improved the results from 40.5% WER on test to 35.1% WER. Also, 80-dimensional features worked better than 40-dimensional features for the AMI dataset: 35.1% WER on test with 80-dimensional features versus 35.7% WER with 40-dimensional features. The results are as follows:

EXP   WER dev   WER eval   Remark
0     36.6      40.5       Default model, 8M parameters (40 dim)
1     32.5      36.0       Baseline (34M parameters) (80 dim)
2     -         35.7       Baseline + dropout (40 dim)
3     31.7      35.1       Baseline + dropout (80 dim)
4     31.3      34.9       Baseline + SpecAugment (80 dim)


Thanks,
Ashish



Itai Peer

Sep 9, 2019, 3:21:53 AM
to kaldi-help
Thanks, Ashish, for the answer and the results.

I looked in the RESULTS file for the kaldi-swbd recipes and saw better results with several recipes, for example with rescoring and i-vectors (assuming, of course, that I did not mix up training and testing sets; there are several results over there with Fisher mixed in, which I ignored).

Do you think SpecAugment can improve things further even when the heaviest guns in the arsenal are used as well? I'm not too familiar with SWB, so I'm not sure whether i-vectors, for example, can help much here, since it is quite a "clean" corpus: all telephone speech, two speakers (I don't remember if the recording was stereo or mono, so was diarization an issue there?).


ashish arora

Sep 9, 2019, 2:04:13 PM
to kaldi...@googlegroups.com
The results shared for the Switchboard experiments were without i-vectors; they are based on the Switchboard run_tdnn_7q recipe, but without i-vectors. On the baseline, i-vectors help by around 10%. I haven't done the experiment including i-vectors for Switchboard, but the AMI results do include i-vectors. Yes, it is quite "clean" stereo recording, but I guess diarization is not a major issue here, as the speakers are mostly taking turns while speaking (there are few overlapping regions).


Daniel Povey

Sep 9, 2019, 8:18:26 PM
to kaldi-help
Ashish: maybe part of the issue was that my implementation masks with zeros, not with the utterance-level mean. It's after batchnorm, so it's like using the global mean. But perhaps if it were combined with online CMVN, like what we recently merged from Karel, it would be better than the current best baseline? (Because then the utterance-level mean would be essentially zero.) Perhaps you could try that? (I don't recall whether our current Swbd example uses online CMVN.) If not, I hope you'll find out what differences between my nnet3 implementation and their paper might be responsible for the difference.
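
(In other words, after per-utterance mean normalization the two fill values coincide, which is easy to check:)

    import numpy as np

    feats = np.random.randn(300, 40) + 5.0   # un-normalized features, nonzero mean
    normalized = feats - feats.mean(axis=0)  # per-utterance cepstral mean normalization

    # After CMN the utterance-level mean is essentially zero, so masking
    # with zeros and masking with the utterance mean are the same thing:
    assert np.allclose(normalized.mean(axis=0), 0.0)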

ashish arora

Sep 9, 2019, 8:29:39 PM
to kaldi...@googlegroups.com
Ok, thanks. I will do the experiments combining with online CMVN and will compare the implementation with the paper.

reza ali

Oct 30, 2019, 3:41:53 AM
to kaldi-help
This is an implementation of SpecAugment using PyTorch; is it useful?
----
Do the results improve?

mili lali

May 21, 2020, 2:04:54 PM
to kaldi-help
New paper: "SPECAUGMENT ON LARGE SCALE DATASETS"