make_mfcc.sh and mfcc code produce different results


Patrick Lange

Jun 8, 2016, 5:44:20 PM
to kaldi-help
Hi everybody,

I am trying to write a standalone executable that creates i-vectors from a wav file. This works, but I noticed that the MFCC features computed by the code snippet posted below differ from those produced by the make_mfcc.sh script with the config also posted below.
I converted the make_mfcc.sh features to ASCII using 'featbin/copy-feats'.

Both computations were performed on the same machine. I am using a Mac and kaldi version: commit e69198c3dc5633f98eb88e1cdf20b2521a598f21 Date:   Tue May 24 18:49:48 2016 -0400

Any form of help is appreciated.

MFCC code snippet

    /* Read in wav file from disk */
    std::ifstream is(argv[1]);
    WaveData wave;    
    wave.Read(is);
    
    /* Pretend data is single channel - even if it is not */ 
    SubVector<BaseFloat> waveform(wave.Data(), 0);

    /* Set frame rate and other windowing options */
    FrameExtractionOptions frameOptions;
    frameOptions.samp_freq = 16000.0f;
    frameOptions.frame_length_ms = 20.0f;
   
    /* Setting lower and upper frequency bounds for mel filter banks */
    MelBanksOptions melOptions;
    melOptions.low_freq = 20.0f;
    melOptions.high_freq = 3700.0f;

    /* Use above options and set feature size to 20 */
    MfccOptions mfccOptions;
    mfccOptions.frame_opts = frameOptions;
    mfccOptions.mel_opts = melOptions;
    mfccOptions.num_ceps = 20;

    /* Compute raw mfcc features from mono channel wav data */ 
    Mfcc mfcc (mfccOptions);
    Matrix<BaseFloat> rawFeats;
    mfcc.Compute(waveform, 
                 1,               // no vtln
                 &rawFeats,
                 NULL);
    cout << rawFeats;


mfcc.conf
--sample-frequency=16000 
--frame-length=20
--low-freq=20
--high-freq=3700
--num-ceps=20


Results from code:
  13.9194 -6.20482 5.36231 0.432148 8.21561 -6.46539 16.7616 4.21117 14.4411 21.9639 26.3801 6.92541 -5.15593 -1.33545 3.00117 -0.716762 4.21981 -3.06601 -2.80127 -0.302359 
  15.6087 -11.5753 12.3931 1.5197 -3.94153 5.30744 15.4031 -9.58937 9.16496 12.6909 15.1715 13.0469 -5.50145 -10.9331 -3.20312 -4.64419 5.79768 3.57596 -2.49785 1.71942 
...

Results from script:
  13.95804 -5.767715 4.978804 0.3954878 7.523161 -6.437352 15.82921 2.985139 14.08624 20.9063 24.92505 7.740386 -4.701997 -0.2935853 2.572047 -0.8026302 4.727565 -4.034141 -3.06127 -2.037492 
  15.59722 -11.24418 12.0876 1.127604 -4.225174 5.707622 14.96908 -8.872856 8.297761 11.07793 12.72406 13.63766 -6.131134 -7.948906 -3.914194 -5.713324 4.174091 2.842557 -0.8066573 2.018597 





Daniel Povey

Jun 8, 2016, 5:52:29 PM
to kaldi-help
The MfccOptions class initializes the MelBanksOptions member variable
with 23 as the number of mel-banks; your code doesn't reproduce this.
You don't need to use those extra variables for the other configs.
Just set them like
mfcc_options.mel_opts.low_freq = 20.0;
etc.

Dan
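
The effect Dan describes can be reproduced without Kaldi. The sketch below uses hypothetical stand-in structs (`MelOpts`, `MfccOpts`), not the real Kaldi classes. The default of 23 bins inside the MFCC options comes from the message above; the standalone default of 25 is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the mel-bank options. The standalone default
// of 25 bins is an assumption for this sketch.
struct MelOpts {
  int num_bins;
  float low_freq;
  float high_freq;
  explicit MelOpts(int bins = 25)
      : num_bins(bins), low_freq(20.0f), high_freq(0.0f) {}
};

// Hypothetical stand-in for the MFCC options. It constructs its mel-bank
// member with 23 bins, as described in the reply above.
struct MfccOpts {
  MelOpts mel_opts;
  int num_ceps;
  MfccOpts() : mel_opts(23), num_ceps(13) {}
};

// Pattern from the first snippet: fill a separate struct, then assign it.
MfccOpts ConfigureByAssignment() {
  MelOpts mel;  // starts from the standalone default bin count
  mel.low_freq = 20.0f;
  mel.high_freq = 3700.0f;
  MfccOpts opts;
  opts.mel_opts = mel;  // replaces the whole member, including num_bins
  return opts;
}

// Pattern suggested in the reply: set fields on the member in place.
MfccOpts ConfigureInPlace() {
  MfccOpts opts;
  opts.mel_opts.low_freq = 20.0f;
  opts.mel_opts.high_freq = 3700.0f;
  assert(opts.mel_opts.num_bins == 23);  // the MFCC-specific default survives
  return opts;
}
```

Calling `ConfigureByAssignment()` yields the standalone bin count, while `ConfigureInPlace()` keeps 23, which is why setting fields directly on the member avoids the mismatch.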

Patrick Lange

Jun 8, 2016, 6:49:58 PM
to kaldi-help, dpo...@gmail.com
Thank you for the hint. I changed my code as follows. Is there some setting that make_mfcc.sh uses which I am still forgetting to set in my code? I have a feeling that the difference is still too big to result from the binary -> ASCII conversion alone.

    /* Read in wav file from disk */
    std::ifstream is(argv[1]);
    WaveData wave;    
    wave.Read(is);
    
    /* Pretend data is single channel - even if it is not */ 
    SubVector<BaseFloat> waveform(wave.Data(), 0);

    MfccOptions mfccOptions;
    mfccOptions.frame_opts.samp_freq = 16000.0f;
    mfccOptions.frame_opts.frame_length_ms = 20.0f;
    mfccOptions.mel_opts.num_bins = 23;
    mfccOptions.mel_opts.low_freq = 20.0f;
    mfccOptions.mel_opts.high_freq = 3700.0f;
    mfccOptions.num_ceps = 20;

    /* Compute raw mfcc features from mono channel wav data */ 
    Mfcc mfcc (mfccOptions);
    Matrix<BaseFloat> rawFeats;
    mfcc.Compute(waveform, 
                 1,               // vtln_warp
                 &rawFeats,
                 NULL);
    cout << rawFeats;


New results from code:
  13.9194 -5.84237 4.92261 0.431565 7.71649 -6.48692 15.7738 2.94408 14.0069 20.8271 25.1748 7.55862 -4.68174 -0.273268 2.58557 -0.774122 4.77482 -4.10706 -3.00028 -2.01903 
  15.6087 -11.2865 12.0478 1.09525 -4.16659 5.60072 14.6016 -8.94944 8.54475 11.0587 13.0309 13.7087 -6.10499 -8.04935 -3.89641 -5.7781 4.0427 2.82611 -0.810036 2.02079 

Daniel Povey

Jun 8, 2016, 7:09:38 PM
to Patrick Lange, kaldi-help
I don't think there is such a setting, but you could try running
make_mfcc.sh and isolating the baseline to a single command, which
should help you debug it. Then make your code as similar to the
reference code as possible until you find the difference.
Incidentally, your config does not make a lot of sense because you
have --sample-frequency=16000 but --high-freq=3700; it should be
closer to 8kHz than 4kHz if the sample frequency is 16kHz.
Dan


> mfcc.conf
> --sample-frequency=16000
> --frame-length=20
> --low-freq=20
> --high-freq=3700
> --num-ceps=20
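
The remark about --high-freq can be checked with a little arithmetic: at a 16 kHz sampling rate the Nyquist frequency is 8 kHz, so a cutoff of 3700 Hz uses less than half of the available band. A minimal sketch, with hypothetical helper names that are not part of Kaldi:

```cpp
#include <cassert>

// Nyquist frequency: the highest frequency representable at a given
// sampling rate.
float NyquistHz(float samp_freq_hz) {
  assert(samp_freq_hz > 0.0f);
  return 0.5f * samp_freq_hz;
}

// Fraction of the band from 0 Hz to Nyquist that lies at or below the
// chosen upper cutoff.
float BandFractionUsed(float samp_freq_hz, float high_freq_hz) {
  return high_freq_hz / NyquistHz(samp_freq_hz);
}
```

For this config, `BandFractionUsed(16000.0f, 3700.0f)` is 3700 / 8000, so the mel filters ignore everything above roughly the lower 46% of the spectrum.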

Patrick Lange

Aug 8, 2016, 7:19:16 PM
to kaldi-help, patrick...@googlemail.com, dpo...@gmail.com
A colleague figured out why the results were still different. It was because the scripts stored the features compressed (probably using copy-feats with compression) while my code does not compress the features.

Yours,
Patrick
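
To see why stored-then-reloaded features need not match freshly computed ones, here is a deliberately simplified sketch of lossy storage: each value is quantized to one byte on a uniform grid between the minimum and maximum of the input and then expanded back to float. This is an assumption-laden illustration, not Kaldi's actual compressed format, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize each value to 8 bits on a uniform grid spanning [min, max] of
// the input, then map the codes back to floats. Simplified illustration
// only; Kaldi's own scheme is more elaborate.
std::vector<float> RoundTrip8Bit(const std::vector<float> &values) {
  assert(!values.empty());
  const auto mm = std::minmax_element(values.begin(), values.end());
  const float lo = *mm.first;
  const float hi = *mm.second;
  const float step = (hi - lo) / 255.0f;
  std::vector<float> out;
  out.reserve(values.size());
  for (float v : values) {
    // A constant input has step == 0; every value then maps to code 0.
    const std::uint8_t code =
        step > 0.0f
            ? static_cast<std::uint8_t>(std::lround((v - lo) / step))
            : static_cast<std::uint8_t>(0);
    out.push_back(lo + step * static_cast<float>(code));
  }
  return out;
}

// Largest absolute difference between two equally sized vectors.
float MaxAbsDiff(const std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  float worst = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    worst = std::max(worst, std::fabs(a[i] - b[i]));
  }
  return worst;
}
```

Comparing a vector with its `RoundTrip8Bit` result via `MaxAbsDiff` gives an error of up to about half a quantization step, so an exact match with uncompressed output should not be expected. If I recall correctly, make_mfcc.sh exposes a compress option that it forwards to copy-feats, so disabling it on the script side would be the natural way to get a like-for-like comparison.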