Real-time Onsets and Frames


Ryan Kelln

Oct 8, 2019, 10:52:33 PM
to Magenta Discuss
Hi Curtis and Adam (and other Magenteers?),

I'm working on a project to do onset and frame detection from real-time audio. I'm just getting started, but I'd love some feedback, as I'm already running into various issues.

I am working in Python but need the unidirectional model. I'm fairly new to TensorFlow / Magenta, so there are a bunch of things I may not be understanding correctly.

I was hoping to build a quick test where I load a pretrained unidirectional model and start figuring out how to feed it data from pyaudio. I have the pyaudio code up and running and can get audio data in a variety of formats, but loading and connecting the model is eluding me.
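
(For context, the capture side looks roughly like this: a minimal sketch using pyaudio's callback API. The sample rate and buffer size here are placeholders, not necessarily what the model expects.)

# Rough sketch of my pyaudio capture: the callback fills a queue with
# float32 frames that the model side can consume at its own pace.
import queue

import numpy as np
import pyaudio

SAMPLE_RATE = 16000       # placeholder; depends on what the model wants
FRAMES_PER_BUFFER = 2048  # placeholder

audio_queue = queue.Queue()

def on_audio(in_data, frame_count, time_info, status):
    # pyaudio hands us raw bytes; reinterpret them as float32 samples.
    audio_queue.put(np.frombuffer(in_data, dtype=np.float32))
    return (None, pyaudio.paContinue)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32, channels=1, rate=SAMPLE_RATE,
                 input=True, frames_per_buffer=FRAMES_PER_BUFFER,
                 stream_callback=on_audio)
stream.start_stream()
# ...consume audio_queue from the inference side...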

The checkpoint mentioned in this issue seems to be what I need:
https://github.com/tensorflow/magenta-js/issues/265
(This is still missing the .meta file, btw.)

Is there a SavedModel version somewhere? I was trying to make my own from that checkpoint but ran into a few issues:
  1) bidirectional=False only seems to work with use_cudnn=True
  2) bidirectional=False, use_cudnn=True hits an issue that's more opaque to me:

  W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key offsets/conv0/BatchNorm/beta not found in checkpoint

I'd probably prefer a TFLite or other frozen pretrained model, at least for my initial tests.

I don't know much about saving or loading models yet; I'm doing a lot of reading, but I thought I'd check whether something is already available, or whether there are examples of using Onsets and Frames in Python with existing checkpoints to do inference (not training).

Thanks,
Ryan

Michael Tyka

Oct 10, 2019, 2:30:27 PM
to Magenta Discuss
Hi Ryan,

I've spent some time this year working on real-time on-device execution of Magenta models, with some success. Unfortunately, as you found, this is not trivial, and conversion of complex RNN models like Onsets and Frames can require a bunch of graph surgery.
For now, one avenue is to use TensorFlow.js, as that comes with a pretrained frozen model.
Alternatively, the bidirectional model can be applied in overlapping chunks to create pseudo-continuous transcription; a rough sketch of that idea is below.
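
Conceptually, the chunking looks something like this (a sketch only: transcribe_chunk is a stand-in for whatever inference call you end up with, and the window/hop sizes are arbitrary):

# Run the model over overlapping windows and keep only events from the
# center of each window, where the bidirectional model has context on
# both sides. The very start and end of the recording need extra handling.
def pseudo_streaming(audio, sr, transcribe_chunk, window_s=8.0, hop_s=4.0):
    window, hop = int(window_s * sr), int(hop_s * sr)
    lo = (window_s - hop_s) / 2   # keep events in [lo, lo + hop_s)
    results = []
    for start in range(0, max(1, len(audio) - window + 1), hop):
        # transcribe_chunk returns (onset_seconds, pitch) pairs for the chunk.
        for onset, pitch in transcribe_chunk(audio[start:start + window]):
            if lo <= onset < lo + hop_s:
                results.append((start / sr + onset, pitch))
    return results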

We're working on porting models to TFLite, and my hope is that we can release something in the coming months.

Mike

Ryan Kelln

Oct 11, 2019, 11:05:33 AM
to Magenta Discuss
Thanks Michael. My timeline is the next few months too; I'd be happy to work with you or anyone else on this. (My goal is performance-quality, open-source, real-time piano transcription software.)

I'll see what sort of real-time audio stuff I can work up in JS for the first tests, but I'm skeptical of the performance.
Is there any way to load the JS models back into Python? That was the first thing I looked into, but I didn't find any info about it.

I thought I'd do a quick test with my own (barely) trained uni model, but it turns out my GPU doesn't have enough memory to train with the defaults. Ha. Sigh.

Cheers,
R

Adam Roberts

Oct 13, 2019, 1:12:10 PM
to Ryan Kelln, Magenta Discuss
The JS model was trained using the Python code, so it certainly should be possible to do the inference in Python.


Ryan Kelln

Oct 14, 2019, 6:11:43 PM
to Magenta Discuss, ryan...@gmail.com
Thanks Adam, any hints on how I'd go about loading the model in Python?

Using this as a reference, I figured I could turn it into some Python-loadable form (converting from the sharded weights_manifest.json format):
https://stackoverflow.com/questions/51948810/how-to-load-tfjs-model-into-python-using-keras-tensorflow/51958601

I ended up with a command like this:

tensorflowjs_converter \
   --input_format=tfjs_layers_model \
   --output_format=keras \
   /path/to/models/onsets_frames_uni/weights_manifest.json \
   /path/to/models/onsets_frames_uni_keras/onsets_frames_uni.h5

But I get an error: TypeError: The JSON content is required to be a `dict`, but found <type 'list'>.

Which led me to this:
https://github.com/tensorflow/tfjs/issues/1280

Which makes me think I don't quite understand how all this works. :)

Thanks,
R

Zhiguang Eric Zhang

Oct 14, 2019, 7:58:45 PM
to Ryan Kelln, Magenta Discuss
Hi Ryan,

I've done onset extraction, but I haven't worked with it in Python, much less in Magenta. When you mention real-time, I have to bring up C or C++ for performance reasons. I've done real-time FFT-based algorithms, but not onset extraction in particular.

best wishes,
Eric


Zhiguang Eric Zhang

Oct 14, 2019, 8:06:36 PM
to Ryan Kelln, Magenta Discuss
As a matter of fact, I don't think you need to train any deep learner to do real-time piano transcription. You can do both pitch tracking and onset detection without machine learning, and then have some music-theory model isolate the frames once you nail down the key, etc.
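
For example, a rough sketch with librosa (0.8+; aubio would work just as well), using spectral-flux onsets plus YIN pitch tracking, no ML anywhere:

# Non-ML baseline: onset times from the spectral-flux onset strength
# envelope, plus a YIN fundamental-frequency track over the piano range.
# Offline here; a real-time version does the same math per audio block.
import librosa

y, sr = librosa.load('piano.wav', sr=None, mono=True)

onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')
f0 = librosa.yin(y, fmin=librosa.note_to_hz('A0'),
                 fmax=librosa.note_to_hz('C8'), sr=sr)

print(onsets[:10])  # onset times in seconds
print(f0[:10])      # per-frame f0 estimates in Hz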

Adam Roberts

Oct 14, 2019, 10:08:27 PM
to Zhiguang Eric Zhang, Ryan Kelln, Magenta Discuss
Ryan,

You should be able to run it with this Python implementation: https://github.com/tensorflow/magenta/tree/master/magenta/models/onsets_frames_transcription
The TFJS converter will not work, since we had to implement the model by hand in TFJS.

Also, you do not need to re-implement it in C++, since TF runs C++ under the hood and this model is fast enough to run in real time. You could use the bidirectional model and chunk it, but the best way is to use a unidirectional model (like the one in TFJS) and run inference intermittently, saving the LSTM states as you go (the general pattern is sketched below). It will take a bit of work, but it shouldn't be too bad!
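
The state-passing pattern is roughly this (a toy Keras sketch of the idea, not the actual Onsets and Frames code; the layer size and feature count are placeholders):

# Run short segments through a unidirectional LSTM and carry the (h, c)
# state across calls, so the stream is processed as one logical sequence.
import numpy as np
import tensorflow as tf

NUM_FEATURES = 229  # placeholder, e.g. mel bins
lstm = tf.keras.layers.LSTM(128, return_sequences=True, return_state=True)
state = None

def process_segment(frames):
    """frames: float32 array of shape [time, NUM_FEATURES]."""
    global state
    x = frames[np.newaxis, ...]  # add batch dimension
    outputs, h, c = lstm(x) if state is None else lstm(x, initial_state=state)
    state = [h, c]               # saved for the next segment
    return outputs               # would feed the per-frame prediction heads

# Call process_segment() once per chunk as audio arrives.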

-Adam

Ryan Kelln

Oct 14, 2019, 10:41:36 PM
to Adam Roberts, Zhiguang Eric Zhang, Magenta Discuss
Thanks Eric. I have been doing some research into other options and figured I may eventually need to write the audio-input portion of the app in C/C++ for speed, but for now I was just looking for a proof of concept in whatever is easiest to put together (and to see just how bad the performance is).

If you have some links to open source software that does onset extraction, I'm all ears. Stuff I've discovered so far:
https://aubio.org/
http://kichiki.github.io/waon/

re: Adam: ok great, thanks. Yes, I tried setting that model for onsets_frames_transcription_transcribe and it worked! I think I was having some issues using it in my custom code, but that was likely something else I didn't understand; let me go back to the Python tests and I'll give you an update with more details.

Cheers,
R

Ryan Kelln

Oct 30, 2019, 10:23:31 PM
to Magenta Discuss, ada...@google.com
OK, I thought I had the tfjs model working, but I think I may have just missed an error message and didn't actually check that the MIDI wasn't garbage. I had something else go wrong and had to rebuild my env and GPU drivers (sigh), and now I think I have the various errors sorted out, matching my original ones:

When I use the tfjs model, I get:
Could not find trained model in model_dir: /path/to/model/tfjs, running initialization to predict.

This still produces a MIDI file, but it's just noise.
Is there anything special I need to do to get it to recognize the tfjs model format? The model_dir is the directory containing the shards and the weights_manifest.json file, right?

So if the tfjs model won't work, I thought I'd go back to the unidirectional checkpoint from the magenta-js issue linked above. I get different errors depending on hparams:

$ onsets_frames_transcription_transcribe \
        --model_dir=/path/onsets_uni_model \
        --checkpoint_path=/path/onsets_uni_model/model.ckpt-583632 \
        --hparams="bidirectional=False,use_cudnn=True" \
        piano.wav

Results in:
    (0) Not found: Key offsets/conv0/BatchNorm/beta not found in checkpoint
        [[node save/RestoreV2 (defined at /lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    (1) Not found: Key offsets/conv0/BatchNorm/beta not found in checkpoint
        [[node save/RestoreV2 (defined at /lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
        [[save/RestoreV2/_3]]

If I change to use_cudnn=False:

    (0) Not found: Key frame/cudnn_lstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/cudnn_compatible_lstm_cell/bias not found in checkpoint
        [[node save/RestoreV2 (defined at /lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    (1) Not found: Key frame/cudnn_lstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/cudnn_compatible_lstm_cell/bias not found in checkpoint
        [[node save/RestoreV2 (defined at /lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
        [[save/RestoreV2/_149]]


I believe this last error comes from lstm_layer() in model.py:

if use_cudnn:
  ...
else:
  ...
  with tf.variable_scope('cudnn_lstm'):
      (outputs, unused_state_f,
       unused_state_b) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(...)


It seems odd to use a bidirectional RNN when use_cudnn is False? (I can write up a GitHub issue for this if you like.)

So, assuming use_cudnn=True is correct: if I look at the uni model using inspect_checkpoint.py, I don't see any variables under offsets (frame, onsets, and velocity only).
Was this model made before the commit that added offsets? It seems strange that the model wouldn't predict offsets.
So should I forget about this model, or is there some way to easily rescue it?
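
(For reference, I'm listing the variables like this; tf.train.list_variables reports the same thing as inspect_checkpoint.py:)

# Print every variable name/shape in the checkpoint to see which scopes
# (onsets/frame/velocity/offsets) are actually present.
import tensorflow as tf

ckpt = '/path/onsets_uni_model/model.ckpt-583632'
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)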

Sorry for all the questions, but hopefully solving this will help others too.

Thanks again!
Ryan

Adam Roberts

Oct 31, 2019, 1:42:38 PM
to Ryan Kelln, Curtis Hawthorne, Michael Tyka, Magenta Discuss
It's possible the model code has diverged since that checkpoint was made. +Curtis Hawthorne +Michael Tyka, is there a Python-compatible unidirectional checkpoint available?


Erez Makavy

Nov 23, 2019, 4:07:18 PM
to Magenta Discuss
Hi Michael,

Is there any estimate for the release of the TFLite models? Is this still in progress?

I am interested in doing real-time transcription on iOS.

+Michael Tyka

Zhiguang Eric Zhang

Nov 23, 2019, 4:15:21 PM
to Ryan Kelln, Magenta Discuss, Adam Roberts
hi again,

I'm not exactly following this thread closely, but you mentioned MIDI? I thought you were talking about audio, because you have to get onsets from audio before any transcription can begin.

-ez
