Handling invalid performance events


Robby Nevels

Dec 3, 2020, 1:48:40 PM
to Magenta Discuss
Hello,

The representation that PerformanceRNN and Music Transformer use allows for some invalid sequences, like a note_off event for a pitch that was never turned on, or a note_on event before any velocity event. Are these events ignored when sampling (by setting the probability of invalid events to zero and renormalizing the distribution over only the valid events), or just ignored when converting to MIDI?

One benefit of renormalizing over only valid events is that when the output is fed back into the network to generate subsequent events, the network only ever sees valid sequences, like those it was trained on. The downside, I suppose, is that sampling might be more complex and slower.
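For concreteness, here's a rough sketch of what I mean by renormalizing, using a made-up four-event vocabulary (not the actual PerformanceRNN event set):

```python
import numpy as np

def renormalize_over_valid(probs, valid):
    """Zero out the probability of invalid events and renormalize
    over the remaining valid ones."""
    masked = np.where(valid, probs, 0.0)
    return masked / masked.sum()

# Hypothetical 4-event vocabulary: [note_on, note_off, velocity, time_shift].
# Here no note is currently sounding, so note_off is invalid.
probs = np.array([0.4, 0.3, 0.2, 0.1])
valid = np.array([True, False, True, True])
resampled = renormalize_over_valid(probs, valid)
```

The note_off mass gets redistributed proportionally across the valid events before sampling.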

I'm curious if y'all have considered this, or if there's something else about these models or representations that prevents invalid sequences which I've missed.

Thanks!

Robby

Curtis "Fjord" Hawthorne

Dec 3, 2020, 4:37:28 PM
to Robby Nevels, Magenta Discuss
We currently just have code to handle these kinds of invalid sequences: https://github.com/magenta/note-seq/blob/master/note_seq/performance_lib.py#L414

There's handling in there for cases like notes with zero duration or notes that never end.
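Roughly, that kind of cleanup does something like the following (purely illustrative, not the actual performance_lib code): track which pitches are sounding, drop note_offs with no matching note_on and zero-duration notes, and close out any notes still sounding at the end.

```python
def clean_performance(events, total_time):
    """Illustrative cleanup pass over (time, kind, pitch) events.
    Returns (pitch, start, end) notes."""
    active = {}   # pitch -> onset time
    notes = []
    for time, kind, pitch in events:
        if kind == 'note_on':
            active.setdefault(pitch, time)   # ignore re-triggered notes
        elif kind == 'note_off':
            start = active.pop(pitch, None)
            if start is not None and time > start:  # skip zero-duration notes
                notes.append((pitch, start, time))
    for pitch, start in active.items():      # close notes that never ended
        notes.append((pitch, start, total_time))
    return notes
```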

Renormalizing over only valid events is definitely interesting! It would require a little extra work to incorporate semantic knowledge of the output representation into the decoder setup, and some things would be difficult to verify with a forward-only pass (like notes that never end), but some version of this is definitely possible. Let us know if you do any experiments with that!

It would be fun to see some experiments that show what kinds of errors this corrects for and how the model reacts to being pushed in a direction other than what it was originally going to do.

-Fjord

--
Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org
To unsubscribe from this group, send email to magenta-discu...@tensorflow.org

Robby Nevels

Dec 3, 2020, 6:50:47 PM
to Magenta Discuss, Curtis Hawthorne, Robby Nevels
Fjord,

I agree that handling the notes that never end would be difficult; I think that only makes sense to do after sampling is over. But cases like turning on a note that's already on, or turning off a note that isn't on, should be doable. One way to do it could be to add a mask with -Inf weights on the invalid events to the logits before running softmax, updating the mask after each sample. That way, the semantic knowledge could be abstracted as a function that just updates the mask.
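Concretely, I'm imagining something like this (the event layout is made up for illustration: ids 0-127 are note_on(pitch), ids 128-255 are note_off(pitch), other event types omitted):

```python
import numpy as np

VOCAB = 256  # hypothetical vocabulary size, not Magenta's actual one

def masked_softmax(logits, mask):
    """Softmax over logits after adding a 0/-inf mask."""
    z = logits + mask
    e = np.exp(z - z.max())  # -inf entries become exactly 0
    return e / e.sum()

def update_mask(event, active):
    """Apply the sampled `event` to the set of sounding pitches, then
    rebuild the mask: note_on is invalid for sounding pitches,
    note_off is invalid for silent ones."""
    if event < 128:
        active.add(event)            # note_on
    else:
        active.discard(event - 128)  # note_off
    mask = np.zeros(VOCAB)
    for pitch in range(128):
        if pitch in active:
            mask[pitch] = -np.inf        # can't re-trigger a sounding note
        else:
            mask[128 + pitch] = -np.inf  # can't release a silent note
    return mask
```

After each sample, the mask-update function is the only place that needs to know the semantics of the representation.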

Thanks for the code pointer. So far, I've been playing around with ideas from papers mentioned in the magenta blog on my own, not using the magenta repository. If I end up exploring your code in depth, I can make a github issue describing the solution more precisely.

By the way, I wonder how the number of invalid events changes as training progresses. I imagine they decrease as the model improves, but never entirely disappear. If they do mostly disappear, though, then this idea is probably not worth pursuing.

Robby

Ian Simon

Dec 4, 2020, 1:24:05 PM
to Robby Nevels, Magenta Discuss, Curtis Hawthorne
A related idea is to pass the model a conditioning vector at each step indicating which notes are currently "on".  I suspect adding this conditioning signal would effectively eliminate all invalid events after sufficient training, and it might help with overall likelihood as well.

-Ian

Robby Nevels

Dec 6, 2020, 2:01:00 AM
to Magenta Discuss, Ian Simon, Curtis Hawthorne, Robby Nevels

Ian,

I like the idea of using a conditioning vector! I'm imagining the conditioning signal would be a 128-dim vector that has 1s for the pitches which are on, and 0s for the pitches that are off. Then the vector is concatenated to the one-hot event vector as an input to the network. Is that what you were imagining?
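In code, I'm picturing the input construction like this (sizes are illustrative, not the real PerformanceRNN vocabulary):

```python
import numpy as np

VOCAB = 256  # illustrative event vocabulary size

def model_input(event_id, active_pitches):
    """One-hot event vector concatenated with a 128-dim multi-hot
    vector of the pitches currently on."""
    one_hot = np.zeros(VOCAB)
    one_hot[event_id] = 1.0
    conditioning = np.zeros(128)
    conditioning[list(active_pitches)] = 1.0
    return np.concatenate([one_hot, conditioning])

x = model_input(60, {60, 64})  # event 60 while pitches 60 and 64 sound
```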

Robby

Ian Simon

Dec 7, 2020, 12:13:25 PM
to Robby Nevels, Magenta Discuss, Curtis Hawthorne
Hi Robby, yes that is what I was imagining.  Seems like this should make the model's job easier in general, as otherwise it somehow has to try to "remember" the on/off state for all pitches.

-Ian

Robby Nevels

Dec 18, 2020, 1:09:17 PM
to Magenta Discuss, Ian Simon, Curtis Hawthorne, Robby Nevels
I ran some initial experiments with this idea, and the results look promising! Attached is a graph of validation NLL loss and erroneous events per epoch during training. The errors were computed each epoch by sampling the model for 1000 sequential steps and counting how many times "note-on-while-note-already-on" and "note-off-before-note-on" occurred.
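The error counting is essentially something like this (simplified event representation, not my exact code):

```python
def count_errors(events):
    """Count the two error types in a sampled sequence of
    (kind, pitch) pairs: note-on while the note is already on,
    and note-off before any note-on."""
    active = set()
    errors = 0
    for kind, pitch in events:
        if kind == 'note_on':
            if pitch in active:
                errors += 1
            active.add(pitch)
        else:  # 'note_off'
            if pitch not in active:
                errors += 1
            active.discard(pitch)
    return errors
```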

Teal is the model passed just the one-hot event vector. Blue is the model passed the one-hot event concatenated with a 128-element vector indicating which notes are currently on. With this conditioning vector, loss drops faster and errors disappear earlier! I also preprocessed the conditioning vector in a way that makes it fast to retrieve and to apply data augmentation, so training didn't take any longer.

[Attached: validation over epoch.PNG, errors over epoch.PNG]

The models themselves aren't based on PerformanceRNN/LSTMs, so the results might be less dramatic when I apply this to them. But I noticed that even a large PerformanceRNN still makes these errors after training for a long time, so I expect to see some benefit there too. There are a few other things I want to check and test, and then I'll write something up with more details. Let me know if there's anything you'd suggest including.

Thanks for the great idea, Ian!

Robby

Ian Simon

Dec 18, 2020, 1:26:40 PM
to Robby Nevels, Magenta Discuss, Curtis Hawthorne
This is great!  Thanks for trying it out!  The results do indeed look promising.

Another question I would try to answer is: does this conditioning signal help the model apart from reducing the number of invalid events?  Because the invalid events are easy to deal with as a post-processing step (or by renormalizing over valid events).

-Ian

Robby Nevels

Dec 26, 2020, 9:46:48 PM
to Magenta Discuss, Ian Simon, Curtis Hawthorne, Robby Nevels
I wrote up what I did in some detail here: https://medium.com/@robzz/performancernn-with-note-on-conditioning-fac981f82d10
It does appear that the conditioning helps the model improve loss as well, though it's more apparent in smaller models.

Ian Simon

Dec 26, 2020, 10:11:56 PM
to Robby Nevels, Magenta Discuss, Curtis Hawthorne
This is so cool!  It seems like larger models implicitly learn the conditioning signal eventually, but smaller models benefit a lot from having it provided.

Just to clarify, in this image it seems like the conditioning vector for event i is the set of notes that are active *after* i.  I'm guessing that's not what you did, otherwise the model gets to see note on and off events before they happen.

-Ian

Robby Nevels

Dec 26, 2020, 10:49:39 PM
to Ian Simon, Curtis Hawthorne, Magenta Discuss
> This is so cool!  It seems like larger models implicitly learn the conditioning signal eventually, but smaller models benefit a lot from having it provided.

Yep, that’s what I suspect is happening! 

> Just to clarify, in this image it seems like the conditioning vector for event i is the set of notes that are active *after* i.

That’s right. The conditioning vector is concatenated to the event on the same row in that image when being passed into the model. The vector does contain the notes that are active after event i (so if event i turns off a note, then conditioning vector i does not have that note on). The model then predicts the event on the next row i+1. It is not given the conditioning vector for row i+1 ahead of time, so it doesn’t cheat. Does that make sense?

This is a bit hard to describe, so I think I’ll add a diagram to the post as well.
--
Robby Nevels

Robby Nevels

Dec 27, 2020, 12:58:32 AM
to Ian Simon, Curtis Hawthorne, Magenta Discuss
Here's the diagram I added:

[Attached: image.png]

With note-on conditioning, the conditioning vector c_i is computed from each input x_i and the previous conditioning vector c_{i-1}. Then c_i is concatenated with x_i and passed into the LSTM, which outputs a distribution over events y_i; this is compared with the next event x_{i+1} to compute the loss.
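In code, the conditioning update is a simple recurrence (again assuming the illustrative 0-127 note_on / 128-255 note_off event layout, not the real vocabulary):

```python
import numpy as np

def next_conditioning(c_prev, event_id):
    """c_i = f(c_{i-1}, x_i): copy the previous 128-dim active-note
    vector and flip the bit for the pitch the current event turns
    on or off."""
    c = c_prev.copy()
    if event_id < 128:        # note_on(pitch)
        c[event_id] = 1.0
    elif event_id < 256:      # note_off(pitch)
        c[event_id - 128] = 0.0
    return c                  # other event types leave c unchanged
```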

Robby

Ian Simon

Dec 29, 2020, 1:38:57 PM
to Robby Nevels, Curtis Hawthorne, Magenta Discuss
Got it, I was just checking up on the alignment of conditioning with input.  This looks exactly correct.

-Ian