CUDA 6.5 + Python + Theano + Lasagne on Jetson TK1

Jason Parham

Jun 3, 2015, 4:14:02 PM6/3/15
to lasagn...@googlegroups.com
I wanted to give a brief check-in on this google group as I have seen very few people actually post online about their success running CUDA 6.5 with Python + Theano + Lasagne on an NVIDIA Jetson TK1.  For those of you who are unfamiliar: https://developer.nvidia.com/jetson-tk1

"The NVIDIA Jetson TK1 development kit is a full-featured platform for Tegra K1 embedded applications. It allows you to unleash the power of 192 CUDA cores to develop solutions in computer vision, robotics, medicine, security, and automotive."

The Jetson is a SoC sporting a quad-core ARM processor and 2GB of shared system and graphics memory.  It runs CUDA 6.5 on 192 Kepler-generation cores.  Overall, a solid little machine that can run a small convolutional neural network.

I've been working on a remote-sensing project that requires some form of scene classification and/or object detection on an embedded platform.  Our prototype system uses a DCNN architecture, and we've successfully taken production code running on (and models trained on) a Tesla K20m and very simply installed it (not ported, hallelujah Python) on the TK1.  Obviously there are compilation issues going from x86 to ARM, but the platform is mature enough that most required packages can be apt-get installed or installed as Python modules and compiled with gcc or gfortran.  Luckily, NVIDIA has done some of the grunt work by providing instructions for installing Ubuntu with the correct CUDA and NVIDIA drivers.

Now, the bread and butter.  Using the stereotypical MNIST example (28x28-pixel patches; 10,000 in the test set), we were able to get the following performance:
  1. The Tesla K20m can train the model in a very short amount of time and by the first epoch achieves >98% accuracy on the 60,000 MNIST training images.
  2. The final model (without data augmentation) achieves 99.38% accuracy on the validation set within 5 minutes, averaging 17 seconds per epoch.  This is rather slow, but training includes a lot of debugging output (like drawing the convolutional filters), so the times should only serve as a relative benchmark.
  3. The final trained model achieves 99.44% accuracy on the MNIST test data and classifies all 10,000 patches in 0.7406 seconds -- getting only 56 examples incorrect.  State-of-the-art performance on MNIST is around 20-23 errors, depending on the architecture.
  4. The Jetson TK1 can go through one full training epoch in roughly 190 seconds.  We opted to simply transfer the pre-trained K20m model onto the Jetson.
  5. The Jetson, using the same model as the K20m, achieves an accuracy of 98.48% in 4.857 seconds.
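For a quick sanity check, the throughput implied by the figures above can be derived directly (these per-patch rates are computed from the quoted numbers, not separately measured):

```python
# Derived throughput from the figures quoted above (not separately measured).
test_patches = 10000

# K20m: 56 errors out of 10,000 patches in 0.7406 s
k20m_errors = 56
k20m_time = 0.7406
k20m_accuracy = 100.0 * (test_patches - k20m_errors) / test_patches
k20m_rate = test_patches / k20m_time  # patches per second

# Jetson TK1: 98.48% accuracy on the same 10,000 patches in 4.857 s
tk1_time = 4.857
tk1_rate = test_patches / tk1_time

print("K20m: %.2f%% accurate, ~%.0f patches/s" % (k20m_accuracy, k20m_rate))
print("TK1:  98.48%% accurate, ~%.0f patches/s" % tk1_rate)
```

Even with the accuracy drop, the TK1 still classifies roughly 2,000 patches per second, which is plenty for many embedded workloads.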

Note the drop in accuracy from the K20m to the TK1.  We do not see this drop when we run the same model on other x86-supported, non-K20, desktop GPUs.  One explanation we can offer is that the ARM-compiled libraries or the Theano compiler are losing bits of precision somewhere, and this error accumulates as the signal propagates deeper into the network.  This is pure guess and intuition and should be taken with a grain of salt.
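As a toy illustration of the kind of accumulation being hypothesized (a pure-Python sketch, not a measurement of the TK1 or of Theano's actual code paths), repeatedly rounding intermediate results to single precision drifts away from the double-precision result:

```python
import struct

def to_float32(x):
    """Round a Python float (IEEE double) to single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Simulate a signal passing through many "layers", each applying a small
# multiply-accumulate: once kept in double precision, and once rounded
# to float32 after every step.
x64 = 1.0
x32 = 1.0
for _ in range(1000):
    x64 = x64 * 1.0001 + 1e-7
    x32 = to_float32(x32 * 1.0001 + 1e-7)

drift = abs(x64 - x32)
print("double: %.10f  float32: %.10f  drift: %.2e" % (x64, x32, drift))
```

The drift is tiny here, but in a deep network with millions of multiply-accumulates feeding a softmax, small per-step errors could plausibly move borderline examples across a decision boundary.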

Nevertheless, hopefully this post serves as a verifiable proof-of-concept and as at least one benchmark for the DCNN performance on a Jetson TK1.

Frédéric Bastien

Jun 3, 2015, 4:27:25 PM6/3/15
to Jason Parham, lasagn...@googlegroups.com
Did you use cuDNN? If so, it isn't 100% deterministic during training, but it should be during testing; that is a speed-versus-determinism trade-off they made. Can you redo the test without cuDNN to be sure this isn't the cause?

Otherwise, do you know whether ARM uses full IEEE floats? If not, that would explain small epsilons that could accumulate.
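One quick check along these lines (a hedged sketch; what it reveals depends on which code path the ARM build actually takes, since some NEON paths run in flush-to-zero mode while VFP paths keep full IEEE semantics) is whether the smallest subnormal float32 survives a round trip:

```python
import struct

# The smallest positive subnormal float32 is 2**-149.  On a fully
# IEEE-754-compliant float32 path it survives a pack/unpack round trip;
# a flush-to-zero path would collapse it to 0.0.
tiny = 2.0 ** -149
roundtrip = struct.unpack('f', struct.pack('f', tiny))[0]
print("subnormal survives round trip:", roundtrip != 0.0)
```

Running something like this on both the K20m host and the TK1 would at least rule subnormal handling in or out as a suspect.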

Thanks for sharing.

Fred

