Yes, I think the separation into two representations would offer benefits
later on. In particular, there could be an "articulation" control to add gaps
between successive notes, ranging from "tenuto", with just enough
separation so that repeating the same note sounds distinct, up through
"staccato" to "marcato" or some fun extreme term.
Having a conversion step makes a natural place to add effects like
this or even more mundane stuff like globally transposing all the notes.
It also helps (I hope) encapsulate the "master data structure" and
abstract away other data that can be easily generated as needed.
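As a sketch of what that conversion step might look like: the note tuples, the function name, and the articulation fractions below are all hypothetical, just to show how articulation and transposition fall out naturally once there's a single place where notes become events.

```python
# Hypothetical note shape: (pitch, start_beats, duration_beats).
# The conversion step is a natural home for global effects.

ARTICULATION = {          # fraction of the written duration actually sounded
    "tenuto": 0.95,       # just enough gap to re-articulate a repeated note
    "staccato": 0.5,
    "marcato": 0.25,      # some fun extreme
}

def notes_to_events(notes, articulation="tenuto", transpose=0):
    """Turn notes into a time-ordered list of (time, kind, pitch) events."""
    scale = ARTICULATION[articulation]
    events = []
    for pitch, start, duration in notes:
        p = pitch + transpose
        events.append((start, "on", p))
        events.append((start + duration * scale, "off", p))
    events.sort(key=lambda e: e[0])
    return events
```

So two repeated middle Cs, played staccato, come out as on/off pairs that no longer touch: `notes_to_events([(60, 0.0, 1.0), (60, 1.0, 1.0)], "staccato")` leaves a half-beat gap before each repeat.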
That said, OP already has code, a mostly working program AIUI.
To be enticing to him, it should ... hmm. Maybe I shouldn't pretend
to know the answer to that.
But yes, the midi stream is organized as a stream of events that can
be "Note On", "Note Off", or "Control Change ..." (for stuff like the
position of the bender if it's moving, or selecting a GM sound on a
channel). All of these are prefixed with a relative time-offset IIRC
in a variable-length integer format where each byte carries 7 bits of
the value and the high bit set means another byte follows, so the
last byte is the one with the high bit clear. I'm not sure how much
of this is needed by the midi player. It probably can process the
integer for you.
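For reference, the variable-length quantity the time-offsets use can be coded up in a few lines (this is the standard MIDI VLQ scheme; the function names here are just mine):

```python
def encode_vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity.

    Each output byte carries 7 bits of the value, most significant
    first; the high bit (0x80) is set on every byte except the last.
    """
    if value < 0:
        raise ValueError("VLQ values must be non-negative")
    chunks = [value & 0x7F]          # final byte: high bit clear
    value >>= 7
    while value:
        chunks.append((value & 0x7F) | 0x80)  # continuation bytes
        value >>= 7
    return bytes(reversed(chunks))

def decode_vlq(data, offset=0):
    """Decode a VLQ starting at `offset`; return (value, next_offset)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):        # high bit clear: that was the last byte
            return value, offset
```

So a delta of 0 is the single byte 0x00, while 128 needs two bytes, 0x81 0x00.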
I wrote a program years ago that created midi files using a
line oriented input format (it even required an "END" line at the
end or it wouldn't work). It read in data for several channels
separately and then had to sort the list of events, because even
though each channel was sequential, the several channels
together were supposed to be interleaved.
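Since each channel's list is already in time order, a full sort isn't strictly necessary; a k-way merge does the interleaving, and the delta conversion can follow. A minimal sketch, assuming events are (absolute_time, channel, message) tuples (that shape and these names are mine, not from any MIDI library):

```python
import heapq

def interleave(channels):
    """Merge per-channel, time-ordered event lists into one stream.

    heapq.merge does a k-way merge of already-sorted inputs, so the
    combined stream comes out in absolute-time order without re-sorting.
    """
    return list(heapq.merge(*channels, key=lambda e: e[0]))

def to_deltas(events):
    """Replace absolute times with the relative offsets MIDI expects."""
    out, prev = [], 0
    for t, ch, msg in events:
        out.append((t - prev, ch, msg))
        prev = t
    return out
```

With two channels whose events overlap in time, `to_deltas(interleave([a, b]))` yields one stream whose first field is the gap since the previous event, ready to be VLQ-encoded.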