What is the difference between the main mfcc_op, the lite mfcc kernel and experimental microfrontend mfcc op?

Michael O'Cleirigh

unread,

Jan 31, 2021, 10:29:43 PM1/31/21

to SIG Micro

Hello,

I'm working on a micropython implementation of the micro_speech example.

The out of the box example requires the wav to mfcc conversion to be done externally to the tensor layers.

The micro_speech demo makes use of the experimental microfrontend which is defined in: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/microfrontend

Within that code there is an op called 'AudioMicrofrontend':

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/microfrontend/ops/audio_microfrontend_op.cc

There is also a c++ mfcc op defined in tensorflow lite:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/mfcc.cc

There is also a c++ implementation in main tensorflow:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/mfcc_op.cc

Could someone comment on what the differences are for the 3 different tensorflow implementations?

As they appear to exist at the regular, lite and micro level why are they not in the out of the box supported set of convertible ops?

At the moment as a first step I want to just passthrough to the same call that the micro_speech example uses but I thought longer term it could make sense to figure out how to support having the mfcc tensor op work directly.

Thanks for any help on this,

Michael

Pete Warden

unread,

Feb 1, 2021, 5:27:06 PM2/1/21

to Michael O'Cleirigh, SIG Micro

Hi Michael,

those are very good questions! I've been involved with a lot of these implementations, so I'll answer inline as best I can.

The out of the box example requires the wav to mfcc conversion to be done externally to the tensor layers.

We're considering moving the feature generation into the graph as ops in the future, to try to make it easier to deploy audio models without having to this work in the application layer.

Incidentally, the process we're using is a bit different than classic MFCC (though it is still using FFTs to generate spectrograms). If you have access to the TinyML book, pages 214 to 217 have a lot of detail (and let me know if you need a copy).

The micro_speech demo makes use of the experimental microfrontend which is defined in: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/microfrontend
Within that code there is an op called 'AudioMicrofrontend':
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/microfrontend/ops/audio_microfrontend_op.cc

This code is used for training the model used in the micro_speech demo. You can see the training script here:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb

That notebook calls into the speech training library, where it uses the op here:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/speech_commands/input_data.py#L45

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/speech_commands/input_data.py#L473

There is also a c++ mfcc op defined in tensorflow lite:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/mfcc.cc
There is also a c++ implementation in main tensorflow:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/mfcc_op.cc

These implementations are of a regular MFCC op, that proved to not be as accurate as the microfrontend version that we moved to. They're purely legacy and no longer used anywhere I'm aware of, at least in Micro.

Could someone comment on what the differences are for the 3 different tensorflow implementations?
As they appear to exist at the regular, lite and micro level why are they not in the out of the box supported set of convertible ops?
At the moment as a first step I want to just passthrough to the same call that the micro_speech example uses but I thought longer term it could make sense to figure out how to support having the mfcc tensor op work directly.

Does my explanation of the differences between the microfrontend and the MFCC op help? As I mention at the start, we are starting to think about rolling this kind of feature generation into the model eventually, but for now it's an application-level responsibility to preprocess the data correctly.

--
You received this message because you are subscribed to the Google Groups "SIG Micro" group.
To unsubscribe from this group and stop receiving emails from it, send an email to micro+un...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/micro/b82b79bf-8e62-46bc-aacc-45b133a94f28n%40tensorflow.org.

Michael O'Cleirigh

unread,

Feb 2, 2021, 10:10:07 PM2/2/21

to SIG Micro, Pete Warden, SIG Micro, Michael O'Cleirigh

Hi Pete,

Thanks your explanation is perfect and gives me the context I need.

I didn't realize how much detail the TinyML book had for the micro examples so your references there have been very useful aswell.

I thought this might be the example to use to work on externalizing the ops in micropython but now I think I should really implement the base examples first and then figure out how to move the ops from the firmware into the application file system from the working examples in a later phase.

I'm going to make an audio_frontend module for my micropython firmware and have it thinly wrap the microfrontend so that the wav bytearray to tensor input conversion. Skipping over the different mfcc'ism op's for now.

Regards,

Michael

Pete Warden

unread,

Feb 3, 2021, 7:32:45 PM2/3/21

to Michael O'Cleirigh, SIG Micro

Thanks for the update Michael, that plan sounds good to me! Let me know how you get on, and if there's anything we can do to help.

Reply all

Reply to author

Forward