Hi Joris,
You can see the comparisons on the MAESTRO (all piano) dataset in our ISMIR and ICLR papers about the architecture:
Depending on the metric used, the models built on that architecture score better than the Onsets and Frames model.
However, I've heard anecdotally that Onsets and Frames might generalize better outside of the MAESTRO dataset, so depending on your exact use case, it's probably still worth doing some comparisons. I suspect Onsets and Frames is also faster to run because it's a smaller model, but I haven't actually done the benchmark and it may depend on the hardware you're using.
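If you want to check the speed question on your own hardware, a rough timing harness along these lines is one way to do it. This is only a sketch: `benchmark` and `dummy_transcribe` are made-up names for illustration, not part of either model's code, and you would swap in whatever function actually runs each model on your audio.

```python
import statistics
import time


def benchmark(transcribe, audio, warmup=1, repeats=5):
    """Return the median wall-clock seconds for one call to transcribe(audio)."""
    # Untimed warm-up calls keep one-time costs (model loading, device
    # initialization) out of the measurement.
    for _ in range(warmup):
        transcribe(audio)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        transcribe(audio)
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to an occasional slow run than the mean.
    return statistics.median(timings)


def dummy_transcribe(audio):
    # Hypothetical stand-in: replace with a real model's inference call.
    return [sample for sample in audio if sample > 0.5]


if __name__ == "__main__":
    fake_audio = [(i % 100) / 100 for i in range(16000)]
    seconds = benchmark(dummy_transcribe, fake_audio)
    print(f"median seconds per call: {seconds:.6f}")
```

Running both models through the same harness, on the same clips and the same machine, should give a fair relative comparison even if the absolute numbers vary from run to run.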
Sorry there's not a clear answer to your question, but hopefully those details help some.
I'd love to hear how the project goes!
-Fjord