Why is Keras running so slow on GPU?

cki...@gmail.com

Sep 1, 2015, 3:41:29 AM
to Keras-users
Hi all,

I am trying to run the cifar10_cnn.py example, but it is very slow. The GPU is a GeForce GTX 780, as the debug output shows, so I think it should be much faster than this. What could be wrong, and how should I debug it? Thanks!

################
Log output

Using gpu device 0: GeForce GTX 780
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real time data augmentation
----------------------------------------
Epoch 0
----------------------------------------
Training...
50000/50000 [==============================] - 274s - train loss: 1.5861     
Testing...
10000/10000 [==============================] - 25s - test loss: 1.2306     
----------------------------------------
Epoch 1
----------------------------------------
Training...
50000/50000 [==============================] - 272s - train loss: 1.2637     
Testing...
10000/10000 [==============================] - 25s - test loss: 1.1018     
----------------------------------------
Epoch 2
----------------------------------------
Training...
50000/50000 [==============================] - 275s - train loss: 1.1315     
Testing...
10000/10000 [==============================] - 25s - test loss: 0.9809     
----------------------------------------
Epoch 3
----------------------------------------
Training...
50000/50000 [==============================] - 274s - train loss: 1.0477     
Testing...
10000/10000 [==============================] - 25s - test loss: 0.9126    

....
Interrupted with CTRL+C here.

########
There is one slight change in the source code: I compiled against numpy 1.11.0.dev0+e4d4b45, which now applies a strict check to numpy.ndarray type conversions, so I added two extra lines. See the git diff:

$ git diff
diff --git a/examples/cifar10_cnn.py b/examples/cifar10_cnn.py
index b495248..dfb1743 100644
--- a/examples/cifar10_cnn.py
+++ b/examples/cifar10_cnn.py
@@ -93,6 +93,9 @@ else:
         horizontal_flip=True,  # randomly flip images
         vertical_flip=False)  # randomly flip images
 
+    X_train = X_train.astype("float32")
+    X_test = X_test.astype("float32")
+
     # compute quantities required for featurewise normalization
     # (std, mean, and principal components if ZCA whitening is applied)
     datagen.fit(X_train)

Amit Beka

Sep 1, 2015, 6:21:08 AM
to cki...@gmail.com, Keras-users
I'm not a GPU expert, but check that your CPU isn't throttled -- maybe it limits GPU utilization somehow. I'm not sure what the numpy check tells you, but you should use theano.config.floatX as the dtype for all arrays: it ensures that Theano uses 32-bit precision when running on a GPU and 64-bit when running on a CPU.
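
For example, here is a minimal sketch of that cast, reusing the variable names from the example script:

import theano
from keras.datasets import cifar10

# Cast the inputs to Theano's configured float type: float32 under
# floatX=float32 (the usual GPU setting), float64 otherwise.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype(theano.config.floatX)
X_test = X_test.astype(theano.config.floatX)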

François Chollet

Sep 1, 2015, 10:34:51 AM
to Amit Beka, cki...@gmail.com, Keras-users
Also make sure that you have cuDNN installed. The basic implementations of convolution in Theano are significantly slower.
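
A quick way to check whether Theano actually found it (this uses the Theano 0.7-era sandbox module; treat the import path as an assumption for other versions):

# Prints True when Theano has a usable cuDNN installation.
from theano.sandbox.cuda import dnn
print(dnn.dnn_available())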

Eric Chio

Sep 1, 2015, 12:23:10 PM
to François Chollet, Amit Beka, Keras-users
Thanks everyone and fchollet. I confirm I don't have cuDNN, just CUDA. I'm waiting to get approved to download cuDNN; in the meantime, could it really be this slow without it? Thanks.

I know that typical numpy installs without a BLAS library can be quite slow, but not this slow.
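
For reference, numpy can report which BLAS/LAPACK it was built against:

import numpy as np
np.show_config()  # lists the BLAS/LAPACK libraries numpy was linked with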

How long does cifar10_cnn.py training take to complete in general? Would anyone mind giving me a ballpark?


--

Best Regards
Eric Chio

François Chollet

Sep 1, 2015, 1:29:26 PM
to Eric Chio, Amit Beka, Keras-users
You can also try turning off image augmentation in the CIFAR example. It takes quite a bit of time on its own (and it's running entirely on CPU).
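
In the example script that's a one-line change near the top (flag name as I recall it in cifar10_cnn.py; check your copy):

data_augmentation = False  # skips the real-time ImageDataGenerator branch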

Eric Chio

Sep 2, 2015, 3:05:27 AM
to François Chollet, Amit Beka, Keras-users
I installed cuDNN. Here are the stats, with and without image augmentation. With nb_epoch = 200, at ~25s per epoch (without augmentation) and ~128s per epoch (with augmentation), a full run would take 83 minutes and 426 minutes respectively. Is it normal for it to take that long, or in practice would I run only 1-2 epochs?
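
As a sanity check on those totals:

# 200 epochs at the measured per-epoch times, in minutes:
print(200 * 25 / 60.0)    # 83.3 minutes without augmentation
print(200 * 128 / 60.0)   # 426.7 minutes with augmentation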

###

Without image augmentation: ~25s per epoch (with cuDNN).

Using gpu device 0: GeForce GTX 780
dnn_available(): True
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Not using data augmentation or normalization
Epoch 0
50000/50000 [==============================] - 25s - loss: 1.6664     
Epoch 1
50000/50000 [==============================] - 25s - loss: 1.2372     

###

With image augmentation: ~128s per epoch (with cuDNN) vs ~270s (without cuDNN).

Using gpu device 0: GeForce GTX 780
dnn_available(): True
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real time data augmentation
----------------------------------------
Epoch 0
----------------------------------------
Training...
 3424/50000 [=>............................] - ETA: 128s - train loss: 2.1665


--

Best Regards
Eric Chio
