Over in the Fast.ai forums, I posted a quick listing of a couple of aarch64 Swift for TensorFlow toolchains (one CPU-only, one CUDA-enabled) that I was able to build on an Nvidia Jetson Xavier and get working on Jetson devices in general (Nano, TX2, Xavier). In case it helps others, and in case anyone can suggest improvements to my build process, I wanted to post the steps I used here.
First, I set the Jetson Xavier into its 30W, 8-core mode (described here) to enable faster compilation.
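For reference, on my JetPack install this came down to selecting the right nvpmodel profile. Mode 3 was the 30W all-core profile on my Xavier, but the mode numbering can vary between JetPack releases, so it's worth confirming against your own /etc/nvpmodel.conf first:

```shell
# Switch to the 30W, 8-core power profile (verify the mode number
# for your JetPack release with `sudo nvpmodel -q --verbose`).
sudo nvpmodel -m 3

# Optionally pin clocks to the maximum allowed by the current profile.
sudo jetson_clocks
```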
I followed the basic Swift for TensorFlow build instructions on the Jetson Xavier to check out the repository and set up the basic packages.
From there, I needed to compile Bazel 0.22 for ARM64, since no binary existed for that platform. I grabbed the dist source bundle from here. Before building, I needed to install the following packages:
sudo apt-get install build-essential openjdk-8-jdk python zip unzip
Then I could build Bazel and copy the finished binary from output/bazel to /usr/local/bin/.
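For reference, the dist archive bootstraps itself without needing an existing Bazel binary; the build looked roughly like this (the archive name may differ for your exact 0.22.x release):

```shell
# The dist archive extracts into the current directory rather than a
# subdirectory, so unpack it into its own folder.
mkdir bazel-0.22-dist && cd bazel-0.22-dist
unzip ../bazel-0.22.0-dist.zip

# Bootstrap build; this takes a while on the Jetson.
bash ./compile.sh

# Install the resulting binary.
sudo cp output/bazel /usr/local/bin/
```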
I applied the patches from Neil Jones' repository, per the instructions there; these allow you to build a CPU-only aarch64 Swift for TensorFlow toolchain.
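If you're applying the patches by hand rather than via a script from that repository, it's just a matter of applying each one against the source tree it targets; the patch file names below are placeholders, not the real ones:

```shell
# Hypothetical example -- substitute the actual patch files from the
# repository and apply each in the checkout it modifies.
cd ~/swift-source/tensorflow
git apply ~/patches/tensorflow-aarch64.patch   # placeholder name
cd ~/swift-source/swift
git apply ~/patches/swift-aarch64.patch        # placeholder name
```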
The Jetson devices ship with CUDA 10.0 and cuDNN 7.3.1 as of the latest JetPack OS image, so they can support GPU-accelerated TensorFlow operation. To make this work, you first need to add the following to /tensorflow/third_party/gpus/crosstool/CROSSTOOL.tpl:
default_toolchain {
cpu: "aarch64"
toolchain_identifier: "local_linux"
}
then, within /tensorflow/third_party/gpus/crosstool/BUILD.tpl, find the section
cc_toolchain_suite(
name = "toolchain",
toolchains = {
...
},
)
and add an "aarch64" entry so that it reads:
cc_toolchain_suite(
name = "toolchain",
toolchains = {
...
"aarch64": ":cc-compiler-local",
},
)
You then need to modify the swift/utils/build-presets.ini section that was inserted by one of Neil Jones' patches to instead read:
# Ubuntu 16.04 preset for Tensorflow on AArch64.
[preset: buildbot_linux_1604_tensorflow]
mixin-preset=buildbot_linux,no_test
enable-tensorflow-gpu
tensorflow_bazel_options=--define=tensorflow_mkldnn_contraction_kernel=0
install-tensorflow
release
Before compiling, I wanted to make sure all 8 cores of the Jetson Xavier were being used, so I added the following to ~/.bazelrc (for this Bazel version, --local_resources takes available RAM in MB, CPU cores, and I/O capacity):
build --jobs 8 --local_resources 13000,8.0,1.0
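The right numbers differ per device (a Nano has far less RAM than a Xavier), so as a sketch, the values can also be derived on the device itself; the 3000 MB of headroom left for the OS and compiler is just my own rule of thumb:

```shell
# Derive the .bazelrc values from the system instead of hard-coding them:
# total RAM in MB (minus ~3 GB of headroom), core count, full I/O capacity.
mem_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
cores=$(nproc)
echo "build --jobs ${cores} --local_resources $((mem_mb - 3000)),${cores}.0,1.0"
```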
Then you should be able to build using that configuration to produce a CUDA-enabled aarch64 toolchain:
./swift/utils/build-script --preset=buildbot_linux_1604_tensorflow install_destdir=/home/[USER]/swift-source/install installable_package=/home/[USER]/swift-source/install/swift-tensorflow-aarch64.tar.gz
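Once the build finishes, the resulting tarball unpacks like any other Swift toolchain; the install location below is only an example:

```shell
# Unpack the toolchain and put its binaries on the PATH.
mkdir -p ~/swift-tensorflow-toolchain
tar xzf swift-tensorflow-aarch64.tar.gz -C ~/swift-tensorflow-toolchain
export PATH=~/swift-tensorflow-toolchain/usr/bin:"$PATH"
swift --version
```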
Again, I wanted to write this down in case it's helpful to others, and so that I'll remember the steps myself. I did have a couple of questions after all this:
This is producing some fairly massive toolchains (530 MB for the CUDA-enabled one), so am I missing steps for stripping symbols or other debug information? Is there anything I can do to produce further optimized binaries for these devices?