Hi, I have 2 questions about spec-augment.

  idct-layer name=idct input=input dim=40 cepstral-lifter=22 affine-transform-file=$dir/configs/idct.mat
  batchnorm-component name=batchnorm0 input=idct
  spec-augment-layer name=spec-augment freq-max-proportion=0.5 time-zeroed-proportion=0.2 time-mask-max-frames=20

About the idct-layer:
I understand that Kaldi performs an IDCT (Inverse Discrete Cosine Transform) to get the filterbanks out of the MFCCs. But why does one apply spec-augment on the filterbanks rather than on the MFCCs?
I also do not understand what the cepstral-lifter parameter does. Could somebody please explain it to me?
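To make my current understanding of the idct step concrete, here is a minimal sketch. It assumes an orthonormal DCT-II over 40 mel bins and the same number of cepstra, so the inverse is just the transpose. The helper names and the input frame are made up for illustration, and liftering is deliberately left out because that is the part I do not understand. This is not Kaldi's actual code.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (row k is the k-th cosine basis)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi / n * (j + 0.5) * k) for j in range(n)])
    return rows

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

dim = 40                     # matches dim=40 in the config above
dct = dct_matrix(dim)
idct = transpose(dct)        # an orthonormal matrix is inverted by its transpose

fbank = [math.log(1.0 + i) for i in range(dim)]  # made-up log-mel frame
mfcc = matvec(dct, fbank)        # filterbank -> cepstra (no liftering here)
recovered = matvec(idct, mfcc)   # cepstra -> filterbank

max_err = max(abs(a - b) for a, b in zip(fbank, recovered))
print("max reconstruction error:", max_err)
```

If this picture is right, the idct layer is a fixed linear transform, and the only piece I cannot place is how cepstral-lifter=22 enters it.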
About the spec-augment layer:
I understand that, according to the SpecAugment paper (https://arxiv.org/pdf/1904.08779.pdf), time-mask-max-frames is the parameter controlling the time masking that makes the network robust to the loss of short speech segments. But what is freq-max-proportion=0.5? (If it were an integer, I would assume it was the number of consecutive mel frequency channels, as in the paper, but since this number is a decimal I have no clue.)
I also do not understand what time-zeroed-proportion=0.2 means.
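For reference, this is how I read the time masking in the paper, as a minimal sketch. The function name and the toy input are my own, it applies a single mask only, and it is not meant to describe what Kaldi's spec-augment-layer actually does. The two proportion parameters are not modelled at all since I do not know their meaning.

```python
import random

def time_mask(frames, max_frames, rng=random):
    """Zero one random run of at most max_frames consecutive frames.

    frames is a list of per-frame feature vectors; a masked copy is returned
    together with the start index and width of the mask.
    """
    num_frames = len(frames)
    width = rng.randint(0, min(max_frames, num_frames))  # inclusive bounds
    start = rng.randint(0, num_frames - width)           # mask stays in range
    masked = [list(f) for f in frames]                   # copy, input untouched
    for t in range(start, start + width):
        masked[t] = [0.0] * len(masked[t])
    return masked, start, width

rng = random.Random(0)                       # fixed seed for repeatability
frames = [[1.0] * 40 for _ in range(100)]    # toy input: 100 frames, 40 bins
masked, start, width = time_mask(frames, max_frames=20, rng=rng)
print("masked frames", start, "to", start + width - 1, "width", width)
```

With time-mask-max-frames=20 I would expect the width here to never exceed 20 frames.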
Thanks, and Merry Xmas and Happy New Year
--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/6896a28e-5464-4314-aa95-f830710130dbn%40googlegroups.com.