Building aarch64 toolchain with CUDA support for Nvidia Jetson

Brad Larson

May 5, 2019, 10:35:32 AM
to Swift for TensorFlow
Over in the Fast.ai forums, I posted a quick listing of a couple of aarch64 Swift for TensorFlow toolchains (CPU-only, CUDA-enabled) I was able to build on an Nvidia Jetson Xavier and get to work on Jetson devices in general (Nano, TX2, Xavier). In case it would help others, and in case there are improvements people can suggest in my build process, I wanted to post the steps I used here.

First, I set the Jetson Xavier into its 30W, 8-core mode to enable faster compilation (described here):

sudo nvpmodel -m 3
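If you want to double-check that the mode took effect before starting a long build, you can query the active power model (a quick sketch assuming the stock JetPack nvpmodel tool):

```shell
# Print the currently active power model; after the command above
# it should report the 30W, 8-core MAXN/mode-3 configuration.
sudo nvpmodel -q
```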

I followed the basic Swift for TensorFlow build instructions on the Jetson Xavier to get the repository and get basic packages set up.

From there, I needed to compile Bazel 0.22 for ARM64, since no binary existed for that platform. I grabbed the dist source bundle from here. Before building, I needed to install the following packages:

sudo apt-get install build-essential openjdk-8-jdk python zip unzip

Then I could build Bazel and copy the finished binaries from output/bazel to /usr/local/bin/
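For reference, bootstrapping Bazel from the dist bundle only takes its compile script; a sketch of the steps I mean (the zip file name is an assumption based on the 0.22 release):

```shell
# Bootstrap Bazel from the distribution archive (no pre-existing Bazel needed).
unzip -d bazel-dist bazel-0.22.0-dist.zip
cd bazel-dist
bash ./compile.sh
# The finished binary lands in output/bazel; copy it onto the PATH.
sudo cp output/bazel /usr/local/bin/
```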

I applied the patches from Neil Jones' repository, per the instructions there. That will allow you to build a CPU-only aarch64 Swift for TensorFlow toolchain.

The Jetson devices come with CUDA 10.0 and cuDNN 7.3.1 as of their latest Jetpack OS / tool image, so they can support GPU-enabled TensorFlow operation. To make this work, you first need to add the following to /tensorflow/third_party/gpus/crosstool/CROSSTOOL.tpl:

default_toolchain {
  cpu: "aarch64"
  toolchain_identifier: "local_linux"
}


Then, within /tensorflow/third_party/gpus/crosstool/BUILD.tpl, find the section

cc_toolchain_suite(
    name = "toolchain",
    toolchains = {
        ...
    },
)

and add an "aarch64" entry so that the suite reads:

cc_toolchain_suite(
    name = "toolchain",
    toolchains = {
        ...
        "aarch64": ":cc-compiler-local",
    },
)

You then need to modify the swift/utils/build-presets.ini section that was inserted by one of Neil Jones' patches to instead read:

# Ubuntu 16.04 preset for Tensorflow on AArch64.
[preset: buildbot_linux_1604_tensorflow]
mixin-preset=buildbot_linux,no_test
enable-tensorflow-gpu
tensorflow_bazel_options=--define=tensorflow_mkldnn_contraction_kernel=0
install-tensorflow
release

Before compilation, I wanted to make sure all 8 cores of the Jetson Xavier were being used, so I added the following to ~/.bazelrc:

build --jobs 8 --local_resources 13000,8.0,1.0

(The --local_resources values are the available RAM in MB, the number of CPU cores, and the available I/O capability, respectively.)

Then you should be able to build using that configuration to produce a CUDA-enabled aarch64 toolchain:

./swift/utils/build-script --preset=buildbot_linux_1604_tensorflow install_destdir=/home/[USER]/swift-source/install installable_package=/home/[USER]/swift-source/install/swift-tensorflow-aarch64.tar.gz
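Once the package is built, a quick sanity check is to unpack it and run a tiny TensorFlow program. This is just a sketch, with paths assumed from the install_destdir / installable_package values above:

```shell
# Unpack the freshly built toolchain and run a small smoke test.
mkdir -p ~/swift-tf
tar -xzf ~/swift-source/install/swift-tensorflow-aarch64.tar.gz -C ~/swift-tf
export PATH="$HOME/swift-tf/usr/bin:$PATH"

cat > /tmp/check.swift <<'EOF'
import TensorFlow
// A trivial tensor op; on the CUDA-enabled toolchain this can dispatch to the GPU.
let x = Tensor<Float>([[1, 2], [3, 4]])
print(x + x)
EOF

swift /tmp/check.swift
```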

Again, I wanted to write this down in case it was helpful for others, and so that I'll remember the steps. I did have a couple of questions after this:

This is producing some fairly massive toolchains (530 MB for the CUDA-enabled one), so am I missing steps for stripping symbols or other debug information? Is there anything I can do to produce further optimized binaries for these devices?

Brennan Saeta

May 5, 2019, 1:29:54 PM
to Brad Larson, Swift for TensorFlow
Hey Brad!

That's totally awesome to see! As for the toolchain size, because we're linking in TensorFlow, I'm not too surprised they're that large. The Python TensorFlow GPU packages are ~350MB (and the Python source doesn't add that much). TF instantiates a lot of specialized C++ templates, which results in a very large binary. :-/

All the best,
-Brennan

