I like the simplicity of the Markov Chain learning algorithm. Yet I suspect memory may be necessary for good music. Though if instead of counting the current note as a state, the last so many ordered notes could be counted as the current state and that would have the effect as memory.
Also, for something like radio head, the state of the current note is more complex because several notes can play at once, with different overtones and durations. Yet, from my naive perspective (the meager extent to which a followed Bozonier's chain algorithm) it seems that each level of complexity could be handled by adding another dimension to the matrix. The data would have an added degree of complexity while the program would simply add one more comparison such as how often each current not is played with one or the other current notes.
And then there's the separate issue of converting the sound file into a sequence of notes, note qualities, and timings.
The problem reminds me of Dirk Gentley's Holistic Detective Agency where Richard McDuff, a programmer, is developing a program to compose music.
I think designing a program to learn what music is is a good place to start.