Thanks Ian, I will play with it today and report back if I run into anything I can't figure out.
Jeremy, I do research in cognitive neuroscience, and I'm interested in comparing the output of this analysis with behavioral and neural data to study the neural mechanisms of prediction. By extracting entropy and surprisal from the output layer, I would interpret them in the cognitive sense as prediction uncertainty and prediction error, both of which are very relevant to how neuroscientists think about perception these days.

I'm also interested in the question of learning, i.e., how much we (as humans) update our internal models of music after hearing a surprising note. I could imagine that for some listeners a given note is simply too surprising for the context, so they never learn how it relates to the previous notes, while for others the same surprise lets them re-interpret the earlier music in a new light, or expand their understanding so that the event seems more probable in the future (e.g., maybe something like a Picardy third). It would be interesting to see if we could model this with the RNN. I would like to break the training algorithm down to go note by note and see how much each note would change the weights. I'm not sure yet what the best metric would be on that front, but I'm open to suggestions!
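To make the idea a bit more concrete, here is a rough sketch of what I have in mind. I don't know what framework the tool actually uses, so this is written against a toy PyTorch next-note RNN (the model, vocabulary size, and learning rate are all placeholders, not the real thing): per-note entropy and surprisal from the softmax output, plus one candidate "learning" metric, the L2 norm of the weight change after training on a single note transition.

```python
# Illustrative sketch only -- ToyNoteRNN, vocab_size, and the optimizer settings
# are hypothetical stand-ins for whatever model the analysis actually uses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyNoteRNN(nn.Module):
    """Tiny stand-in next-note predictor with a softmax output layer."""
    def __init__(self, vocab_size=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, notes):                     # notes: (batch, time)
        h, _ = self.rnn(self.embed(notes))
        return self.out(h)                        # logits: (batch, time, vocab)

model = ToyNoteRNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
sequence = torch.randint(0, 128, (1, 16))         # a fake 16-note melody

# Per-note prediction uncertainty (entropy) and prediction error (surprisal).
with torch.no_grad():
    logits = model(sequence[:, :-1])              # predict note t+1 from notes 0..t
    log_probs = F.log_softmax(logits, dim=-1)
    targets = sequence[:, 1:]
    surprisal = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)

def note_update_magnitude(t):
    """Train on the single transition (note t -> note t+1) and return the
    L2 norm of the resulting weight change, as one possible learning metric."""
    before = [p.detach().clone() for p in model.parameters()]
    optimizer.zero_grad()
    step_logits = model(sequence[:, :t + 1])      # fresh forward pass up to note t
    loss = F.cross_entropy(step_logits[:, -1], sequence[:, t + 1])
    loss.backward()
    optimizer.step()
    delta_sq = sum(((p - b) ** 2).sum() for p, b in zip(model.parameters(), before))
    return delta_sq.sqrt().item()

per_note_updates = [note_update_magnitude(t) for t in range(sequence.shape[1] - 1)]
```

The weight-change norm is just one option; gradient norms, KL divergence between the model's predictions before and after the update, or the change in surprisal for the same note would all be alternatives worth comparing.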
I imagine that if each piece you run through this tool also has a functional harmonic analysis completed, you could use the output to see which harmonic relationships the RNN has learned over the course of training, and perhaps which ones it is most ready to learn next. Is this the sort of idea you were thinking of?