Announcing the release of the Subzero fast translator for PNaCl


Jim Stichnoth

May 2, 2016, 9:25:36 AM
to Native-Client-Discuss
Since its launch, Portable Native Client (PNaCl) has enabled developers to bring platform-independent native code to the Chrome browser, with near-native performance that includes multi-threading and SIMD vectorization.

The tradeoff inherent in platform-independent native code is the amount of time it takes the PNaCl in-browser translator to compile the pexe into architecture-specific native code—upwards of a minute for larger applications.  Mitigations have always been possible, such as using "optlevel":0 in the .nmf manifest or hiding the translation time behind asset download or other user interaction, but these measures only go so far.

Today, we are excited to announce the release of the Subzero fast translator.  Subzero is a PNaCl bitcode compiler designed from the ground up to be a blazingly fast translator that still produces good code quality.  Our tests show that Subzero translates at about 15 times the speed of the default PNaCl translator and generally produces code quality about 80-90% that of the default translator, while also using less memory during translation.  For example, a pexe that currently takes two minutes to translate may now take 10 seconds or less with Subzero.

Subzero is shipping for x86 platforms as of Chrome M50, and ARM support is enabled as of Chrome M51.  On these platforms, Subzero is activated when the manifest specifies "optlevel":0 (see https://developer.chrome.com/native-client/reference/nacl-manifest-format).  Subzero can also be tested in M50 for all PNaCl apps (regardless of manifest optlevel) by launching Chrome with the hidden command-line flag --force-pnacl-subzero, and in M51 by setting the “Force PNaCl Subzero” option in chrome://flags.
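
For reference, a minimal pnacl-translate entry in the .nmf that selects this path looks roughly like the following (my_app.pexe is just a placeholder name):

    {
      "program": {
        "portable": {
          "pnacl-translate": {
            "url": "my_app.pexe",
            "optlevel": 0
          }
        }
      }
    }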

With any new compiler, there is a chance of performance surprises, as well as outright bugs.  As such, we invite and encourage PNaCl developers to test their applications with Subzero, as we would like to get more feedback, including any bug reports.  It’s easy to test without any server-side changes—simply use Chrome’s --force-pnacl-subzero flag as described above.  (NOTE: Be sure to clear the browser cache between tests to make sure you’re seeing the effect of first-time translation.)
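
For example, on Linux a test run against a fresh profile (so that no previously cached translation is reused) can be started with something like the following; the binary name and profile path are placeholders and vary by platform and channel:

    google-chrome --force-pnacl-subzero --user-data-dir=/tmp/subzero-test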

The PNaCl Subzero Team

Dennis Kane

May 2, 2016, 11:54:13 AM
to Native-Client-Discuss
On Monday, May 2, 2016 at 6:25:36 AM UTC-7, Jim Stichnoth wrote:
Subzero is shipping for x86 platforms as of Chrome M50, and ARM support is enabled as of Chrome M51.  On these platforms, Subzero is activated when the manifest specifies "optlevel":0 (see https://developer.chrome.com/native-client/reference/nacl-manifest-format). 

From that page: "If compute speed is not as important as first load speed, an application could specify an optlevel of 0."

I assume that the same tradeoff still holds true... that the actual speed of execution will still be slower.

Floh

May 2, 2016, 12:14:11 PM
to Native-Client-Discuss
Very cool! A blog post or some such would be nice with details about the differences from the previous backend (e.g. why is it so fast compared to the old one, and are there any performance trade-offs?).  I would have expected that most optimizations already happen offline; what kinds of optimizations are performed in the browser during loading?

And how is this related to WebAssembly (see here: https://chromium.googlesource.com/native_client/pnacl-subzero/+/master)?

Thanks & Cheers,
-Floh.

Dennis Kane

May 2, 2016, 1:18:40 PM
to Native-Client-Discuss
My feeling is that this is a very good thing for basic, interactive utilities like nano and vim (you can see demonstrations of these on my site at http://lotw.co).  vim can indeed take around a minute or so to translate, and since it really just sits and waits for keystrokes the vast majority of the time, there does not seem to be much need to have a highly optimized version of vim sitting in the browser cache.  Of course, if you are talking about having very tight loops for things like real-time signal processing -- say, for an embedded voice-recognition system -- then you probably will always want to force the highest optimization level possible.

Jim Stichnoth

May 2, 2016, 1:54:05 PM
to Native-Client-Discuss
On Mon, May 2, 2016 at 8:54 AM, Dennis Kane <dka...@gmail.com> wrote:
From that page: "If compute speed is not as important as first load speed, an application could specify an optlevel of 0."

I assume that the same tradeoff still holds true... that the actual speed of execution will still be slower.
 
That is true.  Our general experience with optlevel:2 versus optlevel:0 has been that optlevel:0 translates at about 3x the speed of optlevel:2 and produces code quality about 50% that of optlevel:2.  With Subzero, we translate at about 15x the speed of optlevel:2 and produce code quality about 80-90% that of optlevel:2.
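
As a rough back-of-the-envelope illustration, take a pexe that needs 120 seconds to translate at optlevel:2 (actual numbers will vary by application):

    optlevel:2 (LLVM)       ~120 s to translate     baseline code quality
    optlevel:0 (old LLVM)   ~40 s  (about 3x)       ~50% of baseline
    optlevel:0 (Subzero)    ~8 s   (about 15x)      ~80-90% of baseline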

So if your application requires the utmost performance, definitely continue using optlevel:2; otherwise, Subzero (optlevel:0) is a pretty good alternative.

Dennis Kane

May 2, 2016, 2:15:56 PM
to Native-Client-Discuss


On Monday, May 2, 2016 at 10:54:05 AM UTC-7, Jim Stichnoth wrote:

That is true.  Our general experience with optlevel:2 versus optlevel:0 has been that optlevel:0 translates at about 3x the speed of optlevel:2 and produces code quality about 50% that of optlevel:2.  With Subzero, we translate at about 15x the speed of optlevel:2 and produce code quality about 80-90% that of optlevel:2.

So if your application requires the utmost performance, definitely continue using optlevel:2; otherwise, Subzero (optlevel:0) is a pretty good alternative.


I think the thing about all of this that initially tripped me up was the phrase "code quality".  But I assume that this is really just a rewording of the concept of "speed of execution".  My first reaction was that "low code quality" is somehow a VERY BAD THING, whereas simply having "slower, less optimized code" is just an inconvenience.

Jim Stichnoth

May 2, 2016, 4:03:50 PM
to Native-Client-Discuss
On Mon, May 2, 2016 at 9:14 AM, Floh <flo...@gmail.com> wrote:
Very cool! A blog post or some such would be nice with details about the differences from the previous backend (e.g. why is it so fast compared to the old one, and are there any performance trade-offs?).  I would have expected that most optimizations already happen offline; what kinds of optimizations are performed in the browser during loading?

Just to briefly answer the specific questions here...  While LLVM is an awesome compiler toolchain, its code generator is just "too slow" with no easy fix.  See for example WebKit's B3 JIT (https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/), where they came to basically the same conclusion.

You're right that we try to arrange for most of the optimizations to be done in the developer-side PNaCl toolchain.  As such, the most important task left to the translator is register allocation.  The second most important is to take advantage of the available address modes.  Most of the Subzero passes are aimed at improving register allocation.
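
As a purely hypothetical illustration of the address-mode point (not actual Subzero output), a load like a[i] can be lowered either as separate shift/add/load instructions or folded into a single x86 scaled-index load, assuming the operands already live in registers:

    int load_elem(const int *a, int i) {
      /* Naive lowering (assuming a is in %ebx and i is in %ecx):
       *   shll $2, %ecx              # i * 4
       *   addl %ebx, %ecx            # a + i*4
       *   movl (%ecx), %eax          # load
       * The same load folded into one scaled-index address mode:
       *   movl (%ebx,%ecx,4), %eax
       */
      return a[i];
    }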

There are bound to be isolated exceptions, but on the whole, Subzero is uniformly and substantially better than the original optlevel:0 translator, both in translation speed and resulting code quality.
 

And how is this related to WebAssembly (see here: https://chromium.googlesource.com/native_client/pnacl-subzero/+/master)?

We recently started trying to hook up a WebAssembly parser to Subzero as a research tool, not strictly related to PNaCl.  The idea is to try to create equivalent pexe and wasm applications and compare Subzero performance on the two versions.  This might give some insight into whether wasm structural changes or additional developer-side optimizations could help improve the speed or code quality of a wasm code generator.

ad...@tftlabs.com

May 3, 2016, 2:47:22 AM
to Native-Client-Discuss
When saying "ARM support is enabled", you're talking about Chrome OS, right?
What about Native Client on Android some day?

Jim Stichnoth

May 3, 2016, 9:31:22 AM
to Native-Client-Discuss
On Mon, May 2, 2016 at 11:47 PM, <ad...@tftlabs.com> wrote:
When saying "ARM support is enabled", you're talking about Chrome OS, right?

ARM support includes ARM Chrome OS devices, and it also includes Chromium builds for 32-bit ARM.  For example, we tested on a Raspberry Pi 2 system, and Subzero is especially fast and impressive there.
 
What about Native Client on Android some day?

Unfortunately, that is still not on the roadmap, for the same technical reasons that have been there all along.