Hi Emily,
Thanks so much for your answer. I'm actually already doing all of the above (apart from higher capacity - that's what I started out with first, but the loss went NaN immediately ...)
But just now, I wanted to start another run so that afterwards I could inspect the generated pixel values more closely - and with the exact same parameters as yesterday, I now get a NaN loss as well, though not right from the outset. So instead, I'm letting it run with higher capacity again, but a lower learning rate ... we'll see what happens ...
BTW I'm using CIFAR as it comes with tfds, so these are integers in the range 0-255, and all the preprocessing does is cast to float - exactly as in the docstring example that uses MNIST.
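For reference, this is roughly what my input pipeline looks like - a minimal sketch modeled on that docstring example (batch size and shuffle buffer are placeholders, not my exact settings):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# CIFAR-10 as it comes from tfds: uint8 images with values in 0-255
data = tfds.load('cifar10')
train_data = data['train']

def image_preprocess(x):
    # the only preprocessing step: cast to float32 (values stay in 0-255)
    x['image'] = tf.cast(x['image'], tf.float32)
    return (x['image'],)

batch_size = 16  # placeholder, not my actual setting
train_it = train_data.map(image_preprocess).batch(batch_size).shuffle(1000)
```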
Speaking of MNIST, and also of the black-and-white dataset I'm using for the post - there I ran into NaN issues when bumping num_logistic_mix up to 10; but since the paper says 5 should be enough, I just went back to that.
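In code, it's just that one argument to the distribution - a minimal sketch, with the other hyperparameters as placeholders rather than my actual settings:

```python
import tensorflow_probability as tfp
tfd = tfp.distributions

dist = tfd.PixelCNN(
    image_shape=(28, 28, 1),  # the black-and-white case
    num_resnet=1,             # placeholder
    num_hierarchies=2,        # placeholder
    num_filters=32,           # placeholder
    num_logistic_mix=5,       # back from 10, which is where the NaNs appeared
    dropout_p=0.3,
)
```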
Just wanted to let you know :-)
I think I can publish this without CIFAR (just using QuickDraw, which looks a bit weird, but then the actual samples are weird as well ;-)) - I was thinking of Friday - but of course it would be nice to have color examples too, so if you happen to have another idea, please let me know :-)
If I may ask yet another question - I tried various ways of saving the model/model weights, but got various errors ... what would be the preferred way to save the weights here?
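For context, these are roughly the variants I tried (just a sketch - `model` stands for the Keras model wrapping the PixelCNN log_prob, and the paths are placeholders):

```python
# assuming `model` is the compiled tf.keras.Model built around dist.log_prob
model.save('pixelcnn_model')               # full SavedModel - one of the attempts that errored
model.save_weights('pixelcnn_weights.h5')  # weights only - ran into errors here as well
```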
Many thanks again!