In the CNN-TDNN training portion, I found that the MFCCs are converted to Mel filter banks. The docs say the same. The CNN-TDNN diagram also shows the input features as 200 i-vectors and 40 Mel filter banks. Is there a particular reason why Mel filter banks are used here, whereas the previous steps use MFCCs?
My understanding is that MFCCs are still used because, being decorrelated, they compress more easily: we dump them to disk with compression to 1 byte per coefficient. Since we dump all the coefficients, the MFCCs are equivalent to the filter banks multiplied by a full-rank matrix, so no information is lost. MFCCs are also the most familiar features in the field, so they are preferred.
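To make the "filter banks times a full-rank matrix" point concrete, here is a minimal sketch of how I picture it. It assumes 40 log-Mel bins with all 40 cepstra kept, and uses a plain orthonormal DCT-II. Kaldi's actual scaling and cepstral liftering are not modelled, and the names `dct_matrix`, `fbank_to_mfcc` and `mfcc_to_fbank` are mine, not Kaldi's.

```python
import math
import random

NUM_BINS = 40  # assumption: 40 Mel bins and all 40 cepstral coefficients kept


def dct_matrix(n):
    """Build an n x n orthonormal DCT-II matrix (a full-rank, invertible transform)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n)]
        )
    return rows


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


DCT = dct_matrix(NUM_BINS)
IDCT = transpose(DCT)  # for an orthonormal matrix the inverse is the transpose


def fbank_to_mfcc(log_mel):
    """Hypothetical helper: log-Mel filter bank energies -> cepstral coefficients."""
    return matvec(DCT, log_mel)


def mfcc_to_fbank(mfcc):
    """Hypothetical helper: cepstral coefficients -> log-Mel filter bank energies."""
    return matvec(IDCT, mfcc)


# Made-up log-Mel frame, only for illustration.
random.seed(0)
log_mel = [random.uniform(-5.0, 15.0) for _ in range(NUM_BINS)]

mfcc = fbank_to_mfcc(log_mel)
recovered = mfcc_to_fbank(mfcc)

# The round trip should match up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(log_mel, recovered))
print("max round-trip error:", max_err)
```

If fewer cepstra than Mel bins were kept, the matrix would no longer be square and the round trip would lose information, which is why dumping all the coefficients matters here.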
But is there a reason why we need to convert them to filter banks in the CNN-TDNN?
Script for Reference:
https://github.com/anish9208/gramvaani_hindi_asr/blob/main/kaldi/asr/Run_cnn-tdnn.sh