So the model (which at the beginning is presumably pretty dumb, since it's still learning the task) makes some prediction (which may well be random), then uses that prediction to modify the actual true answer, and then uses the result as its training signal. I don't fully understand how exactly the GT mask is transformed, but unless you're doing something really clever, this doesn't seem to make sense. After all, imagine a student trying to learn some material from a book (the GT) who, during revision, modifies the actual content of the book based on their own imperfect answers - and then tries to learn from the book they modified. IMHO, either this has no chance of working, or we're not understanding it correctly.
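To make my worry concrete, here's a minimal sketch of the loop as I understand it. The blend `alpha * gt + (1 - alpha) * pred` is purely a hypothetical stand-in - I don't know what transform the paper actually applies - but any modification that mixes the prediction into the target has this flavor:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(pred, gt_mask, alpha=0.5):
    """One step of the loop as I read it: the (possibly random)
    prediction is mixed into the ground-truth mask, and the loss is
    computed against that modified target instead of the real GT.
    The linear blend is a hypothetical placeholder for whatever
    transform the paper actually uses."""
    # The target now depends on the model's own guess
    modified_gt = alpha * gt_mask + (1 - alpha) * pred
    loss = np.mean((pred - modified_gt) ** 2)
    return modified_gt, loss

gt = np.ones((4, 4))
pred = rng.random((4, 4))  # an untrained model's essentially random output
target, loss = train_step(pred, gt)
```

Note what happens even in this toy version: with `alpha=0.5` the loss against the modified target is exactly a quarter of the loss against the true GT, because the model is partly rewarded for agreeing with itself - which is exactly the "student editing the book" problem.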