Why is Keras running so slow on GPU?

cki...@gmail.com

Sep 1, 2015, 3:41:29 AM
to Keras-users
Hi all,

I am trying to run the cifar10_cnn.py example, but it is very slow. The GPU is a GeForce GTX 780, as the debug output shows, so I think it should be much faster than this. What could be wrong, and how should I debug it? Thanks!

################
Log output

Using gpu device 0: GeForce GTX 780
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real time data augmentation
----------------------------------------
Epoch 0
----------------------------------------
Training...
50000/50000 [==============================] - 274s - train loss: 1.5861     
Testing...
10000/10000 [==============================] - 25s - test loss: 1.2306     
----------------------------------------
Epoch 1
----------------------------------------
Training...
50000/50000 [==============================] - 272s - train loss: 1.2637     
Testing...
10000/10000 [==============================] - 25s - test loss: 1.1018     
----------------------------------------
Epoch 2
----------------------------------------
Training...
50000/50000 [==============================] - 275s - train loss: 1.1315     
Testing...
10000/10000 [==============================] - 25s - test loss: 0.9809     
----------------------------------------
Epoch 3
----------------------------------------
Training...
50000/50000 [==============================] - 274s - train loss: 1.0477     
Testing...
10000/10000 [==============================] - 25s - test loss: 0.9126    

....
Interrupted with CTRL+C here.

########
There is one slight change in the source code: I compiled against numpy 1.11.0.dev0+e4d4b45, which now applies a strict check to numpy.ndarray type conversions, so I added two extra lines. See the git diff:

$ git diff
diff --git a/examples/cifar10_cnn.py b/examples/cifar10_cnn.py
index b495248..dfb1743 100644
--- a/examples/cifar10_cnn.py
+++ b/examples/cifar10_cnn.py
@@ -93,6 +93,9 @@ else:
         horizontal_flip=True,  # randomly flip images
         vertical_flip=False)  # randomly flip images
 
+    X_train = X_train.astype("float32")
+    X_test = X_test.astype("float32")
+
     # compute quantities required for featurewise normalization
     # (std, mean, and principal components if ZCA whitening is applied)
     datagen.fit(X_train)

Amit Beka

Sep 1, 2015, 6:21:08 AM
to cki...@gmail.com, Keras-users
I'm not a GPU expert, but check that your CPU isn't throttled -- maybe it limits GPU utilization somehow. I'm not sure what the numpy check tells you, but you should use theano.config.floatX as the dtype for all arrays: it ensures that Theano uses 32-bit precision when running on a GPU and 64-bit when running on a CPU.
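
For example, here is a minimal sketch of that cast, reusing the variable names from the example script:

import theano
from keras.datasets import cifar10

# Cast the inputs to Theano's configured float type: float32 under
# floatX=float32 (the usual GPU setting), float64 otherwise.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype(theano.config.floatX)
X_test = X_test.astype(theano.config.floatX)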

François Chollet

Sep 1, 2015, 10:34:51 AM
to Amit Beka, cki...@gmail.com, Keras-users
Also make sure that you have cuDNN installed. The basic implementations of convolution in Theano are significantly slower.
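
A quick way to check whether Theano actually found it (this uses the Theano 0.7-era sandbox module; treat the import path as an assumption for other versions):

# Prints True when Theano has a usable cuDNN installation.
from theano.sandbox.cuda import dnn
print(dnn.dnn_available())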

Eric Chio

Sep 1, 2015, 12:23:10 PM
to François Chollet, Amit Beka, Keras-users
Thanks everyone and fchollet. I confirm I don't have cuDNN, just CUDA. I'm waiting to get approved to download cuDNN; in the meantime, could it really be this slow without it? Thanks.

I know that typical numpy installs without a BLAS library can be quite slow, but not this slow.
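
For reference, numpy can report which BLAS/LAPACK it was built against:

import numpy as np
np.show_config()  # lists the BLAS/LAPACK libraries numpy was linked with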

How long does cifar10_cnn.py training take to complete in general? Would anyone mind giving me a ballpark?


--

Best Regards
Eric Chio

François Chollet

Sep 1, 2015, 1:29:26 PM
to Eric Chio, Amit Beka, Keras-users
You can also try turning off image augmentation in the CIFAR example. It takes quite a bit of time on its own (and it's running entirely on CPU).
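
In the example script that's a one-line change near the top (flag name as I recall it in cifar10_cnn.py; check your copy):

data_augmentation = False  # skips the real-time ImageDataGenerator branch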

Eric Chio

Sep 2, 2015, 3:05:27 AM
to François Chollet, Amit Beka, Keras-users
I installed cuDNN. Here are the stats, with and without image augmentation. With nb_epoch = 200, at ~25s per epoch (without augmentation) and ~128s per epoch (with augmentation), a full run would take 83 minutes and 426 minutes respectively. Is it normal for it to take that long, or in practice would I run only 1-2 epochs?
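
As a sanity check on those totals:

# 200 epochs at the measured per-epoch times, in minutes:
print(200 * 25 / 60.0)    # 83.3 minutes without augmentation
print(200 * 128 / 60.0)   # 426.7 minutes with augmentation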

###

Without image augmentation: ~25s per epoch (with cuDNN).

Using gpu device 0: GeForce GTX 780
dnn_available(): True
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Not using data augmentation or normalization
Epoch 0
50000/50000 [==============================] - 25s - loss: 1.6664     
Epoch 1
50000/50000 [==============================] - 25s - loss: 1.2372     

###

With image augmentation: ~128s per epoch (with cuDNN) vs ~270s (without cuDNN).

Using gpu device 0: GeForce GTX 780
dnn_available(): True
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real time data augmentation
----------------------------------------
Epoch 0
----------------------------------------
Training...
 3424/50000 [=>............................] - ETA: 128s - train loss: 2.1665


--

Best Regards
Eric Chio
