Repetitions in the output are an extremely common 'failure mode' of RNN
models run for generation. See for example this Twitter discussion
about the best RNNLM in the game today (courtesy of Brain!):
https://twitter.com/tallinzen/status/776406902867578884 - it has
similar issues even though it is *by far* the best LM out there with
respect to perplexity!
Usually you get around this in NLP with beam search or some kind of
fanciness in training or decoding (such as sequence-level training,
https://research.facebook.com/publications/sequence-level-training-with-recurrent-neural-networks/),
but in the case of Magenta that would be pretty difficult, I think.
One common trick used in other areas is to keep track of what has been
output over the last n timesteps (think moving windows of 1-, 2-, 3-,
4-, and 5-grams over the last T timesteps), and if a candidate sample
repeats something that happened recently, resample until something new
comes out, turn up the temperature, or apply any number of other
heuristics - there is a rough sketch of this below. I think this kind
of thing can be done fairly easily for specific applications, but doing
it in a general enough way for Magenta users might be very tough.
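To make that concrete, here is a minimal sketch of the resample-or-reheat heuristic in plain numpy. The function name and the exact blocking/temperature choices are just illustrative assumptions (nothing Magenta actually ships), and it only blocks 2-grams and up, since blocking every recently seen unigram is usually too aggressive:

```python
import numpy as np

def sample_no_repeat(logits, history, max_n=5, window=32,
                     temperature=1.0, max_tries=10, rng=None):
    """Sample the next symbol from `logits` (1-D array of unnormalized
    scores), resampling with rising temperature whenever the candidate
    would complete an n-gram already present in the recent history."""
    rng = rng if rng is not None else np.random.default_rng()
    recent = list(history[-window:])

    # Collect every 2..max_n-gram seen in the recent window.
    seen = set()
    for n in range(2, max_n + 1):
        for i in range(len(recent) - n + 1):
            seen.add(tuple(recent[i:i + n]))

    temp = temperature
    tok = None
    for _ in range(max_tries):
        scaled = logits / temp
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        tok = int(rng.choice(len(probs), p=probs))

        # Would emitting `tok` complete an n-gram we just saw?
        repeats = any(
            tuple(recent[-(n - 1):] + [tok]) in seen
            for n in range(2, max_n + 1)
            if len(recent) >= n - 1
        )
        if not repeats:
            return tok
        temp *= 1.5  # turn up the temperature and resample

    return tok  # give up after max_tries and accept the repetition
```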
Another simple method that can also work is class reweighting during
training and/or generation - unfortunately this too is fraught with
peril, since you are effectively changing the importance of different
data. Sometimes it is used in conjunction with the above heuristics
to avoid outputs that have already happened.
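As a rough illustration of the generation-time half of that, you could scale the model's output distribution by a per-class weight vector, e.g. downweighting classes that were just emitted. The names and defaults below are made up for the sketch:

```python
import numpy as np

def reweighted_probs(logits, class_weights):
    """Scale the output distribution by per-class weights.
    `class_weights` has one entry per vocabulary class; values < 1
    suppress a class, values > 1 boost it. Returns a renormalized
    distribution to sample from."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs = probs * class_weights
    return probs / probs.sum()

def recency_weights(history, vocab_size, window=16, penalty=0.5):
    """Build weights that downweight classes emitted in the last
    `window` steps - one way to combine reweighting with the
    repetition heuristics above."""
    weights = np.ones(vocab_size)
    for tok in history[-window:]:
        weights[tok] = penalty
    return weights
```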
One way I explored a bit this summer for my polyphonic work was NPAD
by Kyunghyun Cho
https://arxiv.org/abs/1605.03835 - but an issue in
music is that we have no follow-up metric for choosing the best "beam"
from the group without some kind of hand-coded heuristic (NLP has
BLEU, METEOR, and so on). I also wonder if adding time-wise skip
connections (cf. http://arxiv.org/abs/1602.08210) could help avoid
some of this issue, or adding noise/dropout on specific parts of the
recurrent connection.
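On that last point, one trivial (and totally unproven) way to try noise on the recurrent path at generation time would be to perturb the hidden state between steps. `step_fn` here is an assumed `(token, next_state)` interface for illustration only, not an actual Magenta API:

```python
import numpy as np

def generate_with_state_noise(step_fn, initial_state, n_steps,
                              noise_std=0.05, rng=None):
    """Roll out a recurrent sampler, adding Gaussian noise to the hidden
    state between steps so the generator is less likely to settle into a
    fixed loop. step_fn(state) -> (token, next_state) is an assumed
    interface for this sketch."""
    rng = rng if rng is not None else np.random.default_rng()
    state = np.asarray(initial_state, dtype=float)
    tokens = []
    for _ in range(n_steps):
        token, state = step_fn(state)
        state = np.asarray(state, dtype=float)
        state = state + rng.normal(0.0, noise_std, size=state.shape)
        tokens.append(token)
    return tokens
```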
No answers really, but some stuff to think about!