Onsets and Frames with TFLite - Size and latency reduction of the model

Joris Astier

unread,
Jun 23, 2022, 12:47:20 PM
to Magenta Discuss
Hello everyone!

We are currently using the Onsets and Frames model with TFLite in production in our Unity mobile app to detect, in real time, the notes our users play on the piano. The results are quite promising, but we have two issues:

1. The lightest model is 75 MB, and a published Android app can't exceed 150 MB. Since the model is embedded in the app, that leaves little room for everything else, and the size limit is now holding back further development.

2. If possible, we would also like to reduce the model's inference time (currently about 110 ms on Android and 80 ms on iOS).

Here is how the model currently works in production: we feed it 50 ms of audio as input, and we simply need it to return note numbers (from 1 to 88, corresponding to the keys of a piano), keeping in mind that several notes can be played at the same time (piano chords).

It seems to me that Onsets and Frames does a lot more than just predict which notes have been played: it also predicts the tempo, the duration of each note, and so on. But we only need the notes being played at a given moment.

This brings me to my question: is it possible to modify some internal parts of the model that are not useful for our use case (computing note durations, tempo, ...) to reduce the model's size and latency?

If you need more details, I'd be happy to provide them!

Thanks in advance for your help!

Joris

Václav Volhejn

unread,
Jun 24, 2022, 5:00:52 AM
to Joris Astier, Magenta Discuss
Hi Joris,
Here are two general tips that aren't specific to the model:
- try model quantization if you haven't; TFLite has good support for it. This reduces model size by about 4x because the weights go from float32 to int8, and it should also help with inference time (though probably not by 4x). A rough sketch follows after these two tips.
- another big factor in latency is which framework/runtime you use. You could try e.g. ONNX Runtime and compare the performance. Here is a paper that compares TFLite with some other runtimes as well (though it is just an arXiv preprint).
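
For the quantization point, here is a minimal sketch of post-training dynamic-range quantization. The SavedModel path and output file name are just placeholders, and I haven't tried this on the Onsets and Frames export specifically; the LSTM may need Select TF ops to convert:

import tensorflow as tf

# Load your exported SavedModel -- the path is a placeholder.
converter = tf.lite.TFLiteConverter.from_saved_model("onsets_frames_savedmodel/")

# Dynamic-range quantization: weights are stored as int8 (~4x smaller),
# activations stay in float at runtime, so no calibration data is needed.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# If the LSTM ops don't convert cleanly, allowing Select TF ops may help
# (at the cost of a slightly larger runtime):
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
#                                        tf.lite.OpsSet.SELECT_TF_OPS]

tflite_model = converter.convert()
with open("onsets_frames_dynamic_quant.tflite", "wb") as f:
    f.write(tflite_model)

# For full int8 (weights *and* activations) you would also set
# converter.representative_dataset to a generator yielding sample inputs.

Dynamic-range quantization is the simplest starting point; full int8 usually gives the biggest latency win on ARM CPUs, but it requires a small representative dataset for calibration.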

Good luck!
Václav


Sayak Paul

unread,
Jun 24, 2022, 5:08:34 AM
to Václav Volhejn, Joris Astier, Magenta Discuss
Recent work on FRILL from Google Brain provides a simple framework for obtaining small yet performant models for mobile:

* Distillation
* Quantization-aware training so that quantization-induced errors are minimized (a rough sketch follows this list)
* Factorization of the bottleneck to keep it less memory-heavy 
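
As a rough illustration of the quantization-aware training step, here is a sketch using the TensorFlow Model Optimization Toolkit. The tiny stand-in network and random data are placeholders, not the actual Onsets and Frames architecture or dataset:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy stand-in: one spectrogram frame in, 88 piano-key probabilities out.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(229,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(88, activation="sigmoid"),
])

# Insert fake-quantization ops so the weights learn to tolerate int8 precision.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer="adam", loss="binary_crossentropy")

# Placeholder random data standing in for real (spectrogram, piano-roll) pairs.
x = tf.random.normal((256, 229))
y = tf.cast(tf.random.uniform((256, 88)) > 0.9, tf.float32)
q_aware_model.fit(x, y, epochs=1, batch_size=32)

# Convert the fine-tuned model to a quantized TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

Compared with plain post-training quantization, QAT lets the weights adapt to the reduced precision during fine-tuning, which usually recovers most of the accuracy lost to int8.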
Sayak Paul | sayak.dev


Curtis "Fjord" Hawthorne

unread,
Jun 29, 2022, 2:03:19 PM
to Sayak Paul, Mike Tyka, Václav Volhejn, Joris Astier, Magenta Discuss
Hi Joris,

+Mike Tyka, who created those TFLite demos. Glad to hear those models have been useful for you!

What I'd recommend is training new versions of the models tailored for your use case. So, definitely remove the velocity prediction head, and play around with removing the LSTM layer or onset head. Then convert the trained model to TFLite, possibly using the quantization techniques others mentioned. Unfortunately, we don't have the time to support this effort, but everything you need to get started should be in the open source code.
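
While you iterate, something like this rough Python sketch can help you sanity-check the latency of each converted variant on a desktop before profiling on-device (the file name and input are placeholders; absolute numbers won't match mobile, but relative differences between variants are usually informative):

import time
import numpy as np
import tensorflow as tf

# Load a converted model -- the file name is a placeholder.
interpreter = tf.lite.Interpreter(model_path="model_quant.tflite", num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input shaped like whatever your export expects for a 50 ms window.
dummy = np.random.randn(*input_details[0]["shape"]).astype(np.float32)

latencies = []
for _ in range(100):
    interpreter.set_tensor(input_details[0]["index"], dummy)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append(time.perf_counter() - start)
    _ = interpreter.get_tensor(output_details[0]["index"])

print("median inference time: %.1f ms" % (1000 * np.median(latencies)))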

Best of luck with your project!

-Fjord