make_mfcc.sh uses the default settings of the compute-mfcc-feats program. It also automatically reads the conf/mfcc.conf file if it is present (check whether you have that file and what it contains).
compute-mfcc-feats
Create MFCC feature files.
Usage: compute-mfcc-feats [options...] <wav-rspecifier> <feats-wspecifier>
Options:
--blackman-coeff : Constant coefficient for generalized Blackman window. (float, default = 0.42)
--cepstral-lifter : Constant that controls scaling of MFCCs (float, default = 22)
--channel : Channel to extract (-1 -> expect mono, 0 -> left, 1 -> right) (int, default = -1)
--debug-mel : Print out debugging information for mel bin computation (bool, default = false)
--dither : Dithering constant (0.0 means no dither) (float, default = 1)
--energy-floor : Floor on energy (absolute, not relative) in MFCC computation (float, default = 0)
--frame-length : Frame length in milliseconds (float, default = 25)
--frame-shift : Frame shift in milliseconds (float, default = 10)
--high-freq : High cutoff frequency for mel bins (if < 0, offset from Nyquist) (float, default = 0)
--htk-compat : If true, put energy or C0 last and use a factor of sqrt(2) on C0. Warning: not sufficient to get HTK compatible features (need to change other parameters). (bool, default = false)
--low-freq : Low cutoff frequency for mel bins (float, default = 20)
--min-duration : Minimum duration of segments to process (in seconds). (float, default = 0)
--num-ceps : Number of cepstra in MFCC computation (including C0) (int, default = 13)
--num-mel-bins : Number of triangular mel-frequency bins (int, default = 23)
--output-format : Format of the output files [kaldi, htk] (string, default = "kaldi")
--preemphasis-coefficient : Coefficient for use in signal preemphasis (float, default = 0.97)
--raw-energy : If true, compute energy before preemphasis and windowing (bool, default = true)
--remove-dc-offset : Subtract mean from waveform on each frame (bool, default = true)
--round-to-power-of-two : If true, round window size to power of two. (bool, default = true)
--sample-frequency : Waveform data sample frequency (must match the waveform file, if specified there) (float, default = 16000)
--snip-edges : If true, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame-length. If false, the number of frames depends only on the frame-shift, and we reflect the data at the ends. (bool, default = true)
--subtract-mean : Subtract mean of each feature file [CMS]; not recommended to do it this way. (bool, default = false)
--use-energy : Use energy (not C0) in MFCC computation (bool, default = true)
--utt2spk : Utterance to speaker-id map rspecifier (if doing VTLN and you have warps per speaker) (string, default = "")
--vtln-high : High inflection point in piecewise linear VTLN warping function (if negative, offset from high-mel-freq (float, default = -500)
--vtln-low : Low inflection point in piecewise linear VTLN warping function (float, default = 100)
--vtln-map : Map from utterance or speaker-id to vtln warp factor (rspecifier) (string, default = "")
--vtln-warp : Vtln warp factor (only applicable if vtln-map not specified) (float, default = 1)
--window-type : Type of window ("hamming"|"hanning"|"povey"|"rectangular"|"blackmann") (string, default = "povey")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
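The frame-length, frame-shift and snip-edges options listed above together decide how many feature vectors you get per utterance. The following is a minimal Python sketch of that rule as I understand it from the option descriptions; the function name num_frames is made up for illustration, and the rounding of the window and shift sizes is an assumption rather than a copy of Kaldi's code.

```python
def num_frames(num_samples, sample_freq=16000.0, frame_length_ms=25.0,
               frame_shift_ms=10.0, snip_edges=True):
    """Approximate number of frames produced for a waveform of num_samples samples."""
    # Window and shift sizes in samples (rounding here is an assumption).
    window = int(round(sample_freq * 0.001 * frame_length_ms))
    shift = int(round(sample_freq * 0.001 * frame_shift_ms))
    if snip_edges:
        # Only frames that fit completely inside the file are kept,
        # so the count depends on the frame length.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    # Otherwise the count depends only on the frame shift.
    return (num_samples + shift // 2) // shift


if __name__ == "__main__":
    one_second = 16000
    print(num_frames(one_second))                    # snip-edges=true
    print(num_frames(one_second, snip_edges=False))  # snip-edges=false
```

With the defaults, one second of 16 kHz audio gives slightly fewer than 100 frames when snip-edges is true, because the last partial windows are dropped.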
With these defaults, compute-mfcc-feats generates a feature vector of length 13 per frame (the num-ceps option).
The deltas and accelerations (delta-deltas) are usually computed on the fly, and how this is done depends on the setup. For example, train_deltas.sh passes the features through the add-deltas program, which has these default settings:
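Since an mfcc.conf file can override any of the defaults above, it helps to see which options actually end up in effect. Here is a rough Python sketch that parses a Kaldi-style config (one --name=value per line, with # comments) and merges it over a few of the defaults from the help text. The helper names and the example file contents are hypothetical; your own mfcc.conf may contain something different, and treating a bare --name as "true" is an assumption.

```python
# A few defaults copied from the compute-mfcc-feats help text above.
DEFAULTS = {
    "num-ceps": "13",
    "num-mel-bins": "23",
    "use-energy": "true",
    "sample-frequency": "16000",
}

# Hypothetical file contents, for illustration only.
EXAMPLE_CONF = """
--use-energy=false   # use C0 instead of energy
--sample-frequency=16000
"""


def parse_kaldi_conf(text):
    """Parse '--name=value' lines, ignoring blank lines and '#' comments."""
    opts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("--"):
            raise ValueError("unexpected config line: %r" % raw)
        name, _, value = line[2:].partition("=")
        # Assumption: a bare '--name' means a boolean set to true.
        opts[name.strip()] = value.strip() if value else "true"
    return opts


def effective_options(conf_text):
    """Defaults overridden by whatever the config file sets."""
    merged = dict(DEFAULTS)
    merged.update(parse_kaldi_conf(conf_text))
    return merged


if __name__ == "__main__":
    for name, value in sorted(effective_options(EXAMPLE_CONF).items()):
        print(name, "=", value)
```

In this made-up example the feature dimension stays at 13, because num-ceps is not overridden; only the first coefficient changes from energy to C0.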
add-deltas
Add deltas (typically to raw mfcc or plp features
Usage: add-deltas [options] in-rspecifier out-wspecifier
Options:
--delta-order : Order of delta computation (int, default = 2)
--delta-window : Parameter controlling window for delta computation (actual window size for each delta order is 1 + 2*delta-window-size) (int, default = 2)
--truncate : If nonzero, first truncate features to this dimension. (int, default = 0)
That means the final feature vector in that setup has a length of 39: 13 MFCCs plus 13 deltas plus 13 accelerations.
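To make the 13 -> 39 step concrete, here is a small pure-Python sketch of regression-style deltas using the delta-order and delta-window defaults shown above. It follows my understanding of how add-deltas works (higher orders are deltas of deltas, and frames beyond the edges are replaced by the first or last frame), but it is an illustration under those assumptions, not Kaldi's actual code; the function names are my own.

```python
def delta_scales(order, window=2):
    """Filter weights for each order; order 0 is the identity [1.0]."""
    scales = [[1.0]]
    norm = float(sum(j * j for j in range(-window, window + 1)))
    for _ in range(order):
        prev = scales[-1]
        cur = [0.0] * (len(prev) + 2 * window)
        # Convolve the previous filter with the ramp -window..window.
        for j in range(-window, window + 1):
            for k, p in enumerate(prev):
                cur[j + window + k] += j * p / norm
        scales.append(cur)
    return scales


def add_deltas(frames, order=2, window=2):
    """Append deltas up to the given order to each frame (list of lists)."""
    n = len(frames)
    out = []
    for t in range(n):
        row = []
        for weights in delta_scales(order, window):
            half = (len(weights) - 1) // 2
            acc = [0.0] * len(frames[t])
            for off, w in enumerate(weights):
                # Clamp to the first/last frame at the edges.
                src = min(max(t + off - half, 0), n - 1)
                for d, value in enumerate(frames[src]):
                    acc[d] += w * value
            row.extend(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Ten fake 13-dimensional frames that increase by 1 per frame.
    mfcc = [[float(t)] * 13 for t in range(10)]
    feats = add_deltas(mfcc)
    print(len(mfcc[0]), "->", len(feats[0]))
```

For a feature that rises linearly over time, the delta in the middle of the utterance comes out as the slope and the acceleration as zero, which is a quick sanity check on the weights.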
train_lda_mllt.sh, on the other hand, takes the 13 MFCC features, splices each frame with 4 frames on the left and 4 on the right (so 9*13 = 117), and uses an LDA transform to reduce this to 40 features (as set in the script's config).
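The splicing step can be sketched in a few lines of Python. This only shows how a context of 4 frames on each side turns 13-dimensional frames into 117-dimensional ones; the LDA projection down to 40 is estimated from training data and is not reproduced here. The function name splice and the edge handling (repeating the first or last frame) are assumptions for illustration.

```python
def splice(frames, left=4, right=4):
    """Concatenate each frame with its left/right context frames."""
    n = len(frames)
    out = []
    for t in range(n):
        row = []
        for off in range(-left, right + 1):
            # Repeat the first/last frame when the context runs off the edge.
            src = min(max(t + off, 0), n - 1)
            row.extend(frames[src])
        out.append(row)
    return out


if __name__ == "__main__":
    # Twenty fake 13-dimensional frames; frame t holds the value t everywhere.
    mfcc = [[float(t)] * 13 for t in range(20)]
    spliced = splice(mfcc)
    print(len(mfcc[0]), "->", len(spliced[0]))  # 13 -> 117
```

The number of frames stays the same; only the dimension per frame grows by a factor of left + 1 + right.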
DNN setups like nnet3/run_tdnn.sh usually use the hires setup (conf/mfcc_hires.conf), which has 40 mel filters converted into 40 MFCCs, splice frames (9 frames, as above), and append an iVector (usually dimension 100). With those numbers the input has 9*40 + 100 = 460 features; 217 would be the figure for 13-dimensional MFCCs (9*13 + 100).
I don't know of any setup that uses 29 features.
Some papers on TIMIT (e.g. Alex Graves' thesis) use 26 features, which is 13 MFCCs plus deltas, with no accelerations.
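As a quick check on the dimensions mentioned in this answer, the arithmetic can be written out in Python. The two helper names are made up for this sketch, and the context and iVector sizes are simply the ones quoted above, not values read from any particular recipe.

```python
def delta_dim(base_dim, delta_order):
    """Static features plus one extra copy per delta order."""
    return base_dim * (delta_order + 1)


def spliced_dim(feat_dim, left, right, ivector_dim=0):
    """Spliced context frames plus an optional appended iVector."""
    return (left + 1 + right) * feat_dim + ivector_dim


if __name__ == "__main__":
    print("MFCC + delta + acc:", delta_dim(13, 2))                  # 39
    print("MFCC + delta:", delta_dim(13, 1))                        # 26
    print("13 MFCC spliced +-4:", spliced_dim(13, 4, 4))            # 117
    print("13 MFCC spliced + iVector:", spliced_dim(13, 4, 4, 100)) # 217
    print("40 MFCC spliced + iVector:", spliced_dim(40, 4, 4, 100)) # 460
```

None of these combinations gives 29, which is consistent with not finding a standard setup of that size.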