Awkward GAN example not converging


Ayan Biswas

Jan 22, 2020, 1:46:41 AM
to deep-learning-illustrated
I was trying to run the GAN example notebook and it appears as if the system is not learning anything. Has that happened to anyone? I was running the code on my MacBook Pro, and even after 2000 epochs the output is random.

Here is the training loss figure:

[Attachment: Screenshot 2020-01-21 23.45.47.png]

I did not use Docker for this; I was just running the notebook from my conda environment.

Thanks a lot for any help/suggestions.

Ayan

Ayan Biswas

Jan 22, 2020, 1:48:02 AM
to deep-learning-illustrated
I forgot to mention that I really liked the book. I thoroughly enjoyed reading it and this book explained everything very clearly. Thank you for that.

Ayan

Grant Beyleveld

Jan 22, 2020, 10:08:14 AM
to deep-learning-illustrated
Hi Ayan,

Thanks for the email, and we're really glad you liked the book!

There's a note in the generative_adversarial_network.ipynb notebook that I think applies directly to your problem. I've pasted the note below for you, marking the relevant parts in bold:

In order to efficiently carry out the training in this notebook, we recommend using a GPU. Most readers don't have a GPU suitable for TensorFlow operations (i.e., an Nvidia GPU with CUDA and cuDNN drivers installed) available on their local machine; however, you can easily access one for free via Colab.
 
Separately, for reasons that escape us, the discriminator in this notebook nearly always fails to learn if you train on a CPU only. Because of this failure, the GAN will seldom learn how to generate sketches -- i.e., it will output images that are merely random noise. There are two ways we've identified to remedy this situation: 
 
Use a GPU. If you don't have one, use Colab as suggested above. While in Colab, you can select "Change runtime type" from the "Runtime" item in the menu bar, and choose "GPU" as your hardware accelerator. This hardware accelerator trains the GAN orders of magnitude more rapidly than the "None" or "TPU" options, and the discriminator (we have no idea why!) will train properly. 
 
Change the discriminator's optimizer. As noted by a comment in this notebook's discriminator compilation step, switching from the default RMSprop optimizer to another (e.g., Adam or AdaDelta) enables the discriminator to learn effectively, so the GAN generates sketches. Whether you use a CPU only, a GPU, or a TPU, this solution is effective. (That said, training the GAN with a GPU is still far faster than with a CPU only or a TPU.)
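For anyone following along outside the notebook, the optimizer swap amounts to a one-line change at the discriminator's compile step. Here's a minimal sketch; the architecture below is illustrative only (a simple MNIST-shaped classifier), not the book's actual discriminator:

```python
# Sketch of the fix: compile the discriminator with Adam instead of the
# default RMSprop. The layers here are a stand-in for the notebook's
# discriminator architecture.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.optimizers import Adam

discriminator = Sequential([
    Input(shape=(28, 28, 1)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid'),  # probability that the input is real
])

# The key change: Adam (or AdaDelta) in place of RMSprop
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=0.0002),
                      metrics=['accuracy'])
```

Everything else in the training loop stays the same; only the optimizer passed to `compile` changes.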
 
Give this a try and let us know if it solves your problem too.

I will add, however, that it's relatively difficult for us to diagnose a problem when the code is being run outside of the Docker environment we provided (a key reason for that environment was to ensure that everyone who runs the examples would be using the exact same versions of all of the packages involved). While I don't think the environment is your problem here, it's something to keep in mind going forward. If you're using the Docker environment you'll at least be sure that there are no version issues that are hampering your progress.

Thanks for the feedback!
Grant

Grant Beyleveld

Jan 22, 2020, 10:15:07 AM
to deep-learning-illustrated
I've also added the note to the awkward-GAN-with-no-warning.ipynb notebook on GitHub to clear up any confusion for others!

Ayan Biswas

Jan 23, 2020, 7:17:47 PM
to deep-learning-illustrated
It worked. I changed the optimizer to Adam and ran it on the CPU. Thanks for the solution.

Ayan Biswas
Scientist
Los Alamos National Lab

Jon Krohn

Jan 24, 2020, 12:38:36 PM
to deep-learning-illustrated
Awesome, Ayan! We had a feeling that would work -- it's a perplexing bug that has cost us days of confusion :)