We now have a new experimental Swift for TensorFlow toolchain that targets the Nvidia Jetson single-board computers. It can be downloaded here, and requires JetPack 4.4 (CUDA 10.2) to be installed on the Jetson device. It aligns roughly with our latest 0.11 toolchain release. This toolchain includes full CUDA support for all Jetson devices, including, for the first time, X10 running on ARM64 hardware.
For those who might not be familiar with the Jetson series of devices, they are Nvidia's single-board computers that combine an ARM64 processor with a CUDA-capable GPU in a relatively inexpensive and low-power package. The Jetson Nano is only $99, and with the above-linked toolchain you can now do GPU-accelerated training or inference of Swift for TensorFlow models on that computer (as well as other Swift differentiable programming). The big limitation of these boards is memory: the Jetson Nano has only 4 GB of RAM, shared between the GPU and CPU. The new support for X10's XLA JIT compilation on these devices helps to squeeze models into the available memory, but I'm only just starting to explore what works and what doesn't there.
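For context, X10 is opted into per-device rather than globally. A minimal sketch of moving a model and its inputs onto the XLA backend (assuming the LeNet type from tensorflow/swift-models, and written against the 0.11-era API) might look like:

```swift
import TensorFlow
import ImageClassificationModels  // provides LeNet (from tensorflow/swift-models)

// Select the XLA (X10) backend on the default accelerator.
let device = Device.defaultXLA

// Model parameters and inputs must live on the same device.
var model = LeNet()
model.move(to: device)

let input = Tensor<Float>(randomNormal: [1, 28, 28, 1], on: device)
let output = model(input)

// X10 builds up traces lazily; the barrier forces compilation and execution
// of the pending trace.
LazyTensorBarrier()
print(output.shape)
```

The same model code runs unmodified in eager mode by using `Device.default` instead, which is how the paired eager/X10 benchmark variants below are set up.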
For comparison, here are some initial benchmarks using the LeNet-MNIST example, which you can run yourself from tensorflow/swift-models via
swift run -c release Benchmarks --filter LeNetMNIST
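As a note on the power settings referenced below: on Jetson boards these are switched with Nvidia's nvpmodel utility. A sketch (mode numbers vary by board, so check /etc/nvpmodel.conf on your device):

```shell
# Query the currently active power mode.
sudo nvpmodel -q

# Select a power mode by number; on the Nano, mode 1 is the 5W profile
# and mode 0 is the default MAXN profile (numbering differs on the Xavier).
sudo nvpmodel -m 1

# Optionally pin clocks to their maximum for the current mode.
sudo jetson_clocks
```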
Jetson Nano in the 5W power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      6.839 s        329          31503.989 /s
LeNetMNIST.inference_x10  0.682 s        307          34961.313 /s
LeNetMNIST.training       3.050 s        2            251.163 /s
LeNetMNIST.training_x10   16.537 s       6            690.047 /s
Jetson Nano in the MAXN power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      8.380 s        375          36048.707 /s
LeNetMNIST.inference_x10  0.453 s        791          75658.721 /s
LeNetMNIST.training       2.623 s        10           1122.343 /s
LeNetMNIST.training_x10   12.096 s       12           1428.267 /s
Jetson Xavier in the 15W power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      4.729 s        397          40392.597 /s
LeNetMNIST.inference_x10  0.652 s        367          32665.995 /s
LeNetMNIST.training       1.138 s        17           1929.219 /s
LeNetMNIST.training_x10   7.943 s        45           4704.442 /s
Jetson Xavier in the 30W (all cores) power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      4.787 s        492          46484.627 /s
LeNetMNIST.inference_x10  0.560 s        529          47347.466 /s
LeNetMNIST.training       1.126 s        18           2041.205 /s
LeNetMNIST.training_x10   7.859 s        36           3845.427 /s
Finally, as a reference, an i7 desktop with a GTX 1080 (which uses ~300W during training):
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      0.659 s        3396         309180.435 /s
LeNetMNIST.inference_x10  0.133 s        3024         279131.259 /s
LeNetMNIST.training       0.259 s        86           10759.409 /s
LeNetMNIST.training_x10   1.455 s        279          30163.080 /s
This benchmark isn't the best for showing off GPU utilization or XLA, because of the tiny model and images. I just wanted a quick benchmark that would fit in memory on all devices.
I apologize for the delay in getting new toolchains out for this platform (it's been almost a year since the last build, and that one wasn't particularly stable). Advances in the toolchain and the JetPack images have made building this much easier and more stable, so with this new build system in place we should be able to do new builds regularly if there's continued interest in the platform. Let us know if you'd like us to do these alongside our numbered stable toolchain releases.
Also, I've seen some possible issues with certain models on the Nano, and I'm not sure whether they're simply due to exceeding the available memory or to problems with specific operators. Consider this toolchain fairly experimental. If you notice consistent problems with models using this toolchain, let us know and we can see if there are any common elements to them.