Hi Emily,
Thanks so much for your answer. I'm actually already doing all of the above (apart from higher capacity - that's what I started out with first, but the loss went NaN immediately ...)
But just now, I wanted to start another run so that afterwards I could inspect the generated pixel values more closely - and with the exact same parameters as yesterday, I now get a NaN loss as well, though not right from the outset. So instead, I'm letting it run with higher capacity again, but a lower learning rate ... we'll see what happens ...
BTW I'm using CIFAR as it comes with tfds, so these are integers in the range 0-255, and all the preprocessing does is cast to float - exactly as in the docstring example that uses MNIST.
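For reference, this is roughly what my input pipeline looks like - a minimal sketch modeled on that docstring example (batch size and shuffle buffer are placeholders, not my exact settings):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# CIFAR-10 as it comes from tfds: uint8 images with values in 0-255
data = tfds.load('cifar10')
train_data = data['train']

def image_preprocess(x):
    # the only preprocessing step: cast to float32 (values stay in 0-255)
    x['image'] = tf.cast(x['image'], tf.float32)
    return (x['image'],)

batch_size = 16  # placeholder, not my actual setting
train_it = train_data.map(image_preprocess).batch(batch_size).shuffle(1000)
```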
Speaking of MNIST, and also of the black-and-white dataset I'm using for the post - there I ran into NaN issues when bumping num_logistic_mix up to 10; but since the paper says 5 should be enough, I just went back to that.
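In code, it's just that one argument to the distribution - a minimal sketch, with the other hyperparameters as placeholders rather than my actual settings:

```python
import tensorflow_probability as tfp
tfd = tfp.distributions

dist = tfd.PixelCNN(
    image_shape=(28, 28, 1),  # the black-and-white case
    num_resnet=1,             # placeholder
    num_hierarchies=2,        # placeholder
    num_filters=32,           # placeholder
    num_logistic_mix=5,       # back from 10, which is where the NaNs appeared
    dropout_p=0.3,
)
```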
Just wanted to let you know :-)
I think I can publish this without CIFAR (just using QuickDraw, which looks a bit weird, but then the actual samples are weird as well ;-)) - I was thinking of Friday - but of course it would be nice to have color examples too, so if you happen to have another idea, please let me know :-)
If I may ask yet another question - I tried various ways of saving the model/model weights, but got various errors ... what would be the preferred way to save the weights here?
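For context, these are roughly the variants I tried (just a sketch - `model` stands for the Keras model wrapping the PixelCNN log_prob, and the paths are placeholders):

```python
# assuming `model` is the compiled tf.keras.Model built around dist.log_prob
model.save('pixelcnn_model')               # full SavedModel - one of the attempts that errored
model.save_weights('pixelcnn_weights.h5')  # weights only - ran into errors here as well
```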
Many thanks again!