We now have a new experimental Swift for TensorFlow toolchain that targets the Nvidia Jetson single-board computers. It can be downloaded here, and requires JetPack 4.4 (CUDA 10.2) to be installed on the Jetson device. It aligns roughly with our latest 0.11 toolchain release. This toolchain includes full CUDA support for all Jetson devices, including, for the first time, X10 running on ARM64 hardware.
For those who might not be familiar with the Jetson series of devices, they are Nvidia's single-board computers that combine an ARM64 processor with a CUDA-capable GPU in a relatively inexpensive and low-power package. The Jetson Nano is only $99, and with the above-linked toolchain you can now do GPU-accelerated training or inference of Swift for TensorFlow models on that computer (as well as other Swift differentiable programming). The big limitation of these boards is memory: the Jetson Nano has only 4 GB of RAM, shared between the GPU and CPU. The new support for X10's XLA JIT compilation on these devices helps to squeeze models into the available memory, but I'm only just starting to explore what works and what doesn't there.
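For context, X10 is opted into per-device rather than globally. A minimal sketch of moving a model and its inputs onto the XLA backend (assuming the LeNet type from tensorflow/swift-models, and written against the 0.11-era API) might look like:

```swift
import TensorFlow
import ImageClassificationModels  // provides LeNet (from tensorflow/swift-models)

// Select the XLA (X10) backend on the default accelerator.
let device = Device.defaultXLA

// Model parameters and inputs must live on the same device.
var model = LeNet()
model.move(to: device)

let input = Tensor<Float>(randomNormal: [1, 28, 28, 1], on: device)
let output = model(input)

// X10 builds up traces lazily; the barrier forces compilation and execution
// of the pending trace.
LazyTensorBarrier()
print(output.shape)
```

The same model code runs unmodified in eager mode by using `Device.default` instead, which is how the paired eager/X10 benchmark variants below are set up.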
For comparison, here are some initial benchmarks using the LeNet-MNIST example, which you can run yourself from tensorflow/swift-models via
swift run -c release Benchmarks --filter LeNetMNIST
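As a note on the power settings referenced below: on Jetson boards these are switched with Nvidia's nvpmodel utility. A sketch (mode numbers vary by board, so check /etc/nvpmodel.conf on your device):

```shell
# Query the currently active power mode.
sudo nvpmodel -q

# Select a power mode by number; on the Nano, mode 1 is the 5W profile
# and mode 0 is the default MAXN profile (numbering differs on the Xavier).
sudo nvpmodel -m 1

# Optionally pin clocks to their maximum for the current mode.
sudo jetson_clocks
```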
Jetson Nano in the 5W power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      6.839 s        329          31503.989 /s
LeNetMNIST.inference_x10  0.682 s        307          34961.313 /s
LeNetMNIST.training       3.050 s        2            251.163 /s
LeNetMNIST.training_x10   16.537 s       6            690.047 /s
Jetson Nano in the MAXN power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      8.380 s        375          36048.707 /s
LeNetMNIST.inference_x10  0.453 s        791          75658.721 /s
LeNetMNIST.training       2.623 s        10           1122.343 /s
LeNetMNIST.training_x10   12.096 s       12           1428.267 /s
Jetson Xavier in the 15W power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      4.729 s        397          40392.597 /s
LeNetMNIST.inference_x10  0.652 s        367          32665.995 /s
LeNetMNIST.training       1.138 s        17           1929.219 /s
LeNetMNIST.training_x10   7.943 s        45           4704.442 /s
Jetson Xavier in the 30W (all cores) power setting:
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      4.787 s        492          46484.627 /s
LeNetMNIST.inference_x10  0.560 s        529          47347.466 /s
LeNetMNIST.training       1.126 s        18           2041.205 /s
LeNetMNIST.training_x10   7.859 s        36           3845.427 /s
Finally, as a reference, an i7 desktop with a GTX 1080 (which uses ~300W during training):
name                      startup_time   iterations   exp_per_second
--------------------------------------------------------------------
LeNetMNIST.inference      0.659 s        3396         309180.435 /s
LeNetMNIST.inference_x10  0.133 s        3024         279131.259 /s
LeNetMNIST.training       0.259 s        86           10759.409 /s
LeNetMNIST.training_x10   1.455 s        279          30163.080 /s
This benchmark isn't the best for showing off GPU utilization or XLA, because of the tiny model and images. I just wanted a quick benchmark that would fit in memory on all devices.
I apologize for the delay in getting new toolchains out for this platform (it's been almost a year since the last build, and that one wasn't particularly stable). Advances in the toolchain and the JetPack images have made building this much easier and more stable, so with this new build system in place we should be able to do new builds regularly if there's continued interest in the platform. Let us know if you'd like us to do these alongside our numbered stable toolchain releases.
Also, I've seen some possible issues with certain models on the Nano, and I'm not sure whether they're simply due to exceeding the available memory or to problems with specific operators. Consider this toolchain fairly experimental. If you notice consistent problems with models using this toolchain, let us know and we can see if there are any common elements to them.