Let's optimize zlib for ARM


Adenilson Cavalcanti

Aug 14, 2017, 3:40:30 PM
to blink-dev
Zlib is a compression library used by the Chromium code base and its dependencies (skia, libpng, pdfium, freetype, etc.) for quite a few tasks, ranging from image handling and loading extensions to accessing compressed content (e.g. Content-Encoding: gzip).

It is an impressive feat of engineering, considering that it is 22 years old and is used all over the place (e.g. the Linux kernel). Due to its history and the need to support long-gone compilers and operating systems, its main focus has been portability rather than performance.

Since Chromium developers and users care about performance, it makes sense to have optimizations in the zlib used by Chromium. Since 2014, Chromium's zlib has featured Intel-specific optimizations (e.g. optimized CRC, optimized hash function and fill_window), but to this day it still has no optimizations targeting the ARM processors used in mobile devices.

In January this year I noticed this issue and started working to address it, with an initial focus on optimizing PNG image decoding.

Since maintaining a forked zlib isn't ideal, I did some research on zlib alternatives (https://goo.gl/ZUoy96) and tried, with some degree of success, to upstream the ARM-specific optimizations (zlib-ng accepted the patches; canonical zlib still hasn't reviewed them after 4 months). The initial goal was to reach a scenario where Chromium wouldn't need to keep a forked zlib.

The first obstacle that had to be solved was the presence of multiple copies of zlib in the Chromium code base (e.g. PDFium had its own zlib with patches applied on top of it). Fortunately, PDFium has migrated to Chromium's zlib (i.e. third_party/zlib), so this is no longer an issue.

That being said, given that zlib-ng hasn't yet made an official release and its security status is unknown (i.e. new bugs?), it seems a bit too risky to migrate Chromium to it. On the other hand, canonical zlib doesn't seem interested in either performance or security patches.

Until these external factors change (so we can revisit the issue), my intent is to land the ARM-specific optimizations in Chromium's zlib.

The patches are:
a) NEON implementation of Adler32 checksum:
https://chromium-review.googlesource.com/c/611492

It should be about 2x to 3x faster than the C implementation featured in zlib today. This should help with PNG image decoding.
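For readers unfamiliar with the checksum, here is a minimal scalar sketch of Adler-32 (the check value that ends every zlib stream, including the ones inside PNG files). It is not the code from the patch and not zlib's own implementation; the name adler32_scalar is made up for illustration. The two running sums are what a NEON version would accumulate several bytes at a time.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Hypothetical reference routine: byte-at-a-time Adler-32.
 * 'a' is 1 plus the sum of all bytes, 'b' is the sum of every
 * intermediate value of 'a', both reduced modulo 65521. */
uint32_t adler32_scalar(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    /* The checksum packs b into the high 16 bits and a into the low 16. */
    return (b << 16) | a;
}
```

Taking the modulo on every byte keeps the sketch obviously correct; production code usually defers the reduction over a block of bytes, which is also what makes the loop a good fit for SIMD.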

b) Using the ARMv8 CRC32 instruction:
https://chromium-review.googlesource.com/c/612629

It should be between 6x and 10x faster. This one should help both with image decoding and with other areas (e.g. gzipped content).
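As a point of reference, the sketch below is a portable bit-at-a-time CRC-32 using the same polynomial as gzip and PNG (0xEDB88320 in reflected form). It is only an illustration of the work being replaced, not the patch itself; crc32_bitwise is a hypothetical name. The ARMv8 CRC32 instructions fold up to 8 bytes per instruction with this polynomial, instead of looping over each bit as done here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine: bitwise CRC-32 (reflected polynomial
 * 0xEDB88320). Pass 0 as 'crc' for the first chunk and the previous
 * return value for later chunks. */
uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc; /* undo the final inversion of the previous call */

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

Because the inversion is applied on entry and on exit, data can be fed in several chunks and still produce the same result as a single call.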

Since not all ARMv8 SoCs feature this instruction, I would love to hear from people who are familiar with how the Android apk is generated and distributed how we could enable the feature (i.e. this option has to be enabled at build time). Devices like Nexus 5x and Google Pixel will benefit from this change as they have the CRC32 instruction.

There are still other areas in zlib that we can optimize for ARM with good potential performance gains (e.g. fill_window).

Best regards


Adenilson Cavalcanti

Torne (Richard Coles)

Aug 14, 2017, 5:24:47 PM
to Adenilson Cavalcanti, blink-dev
On Mon, 14 Aug 2017 at 15:40 Adenilson Cavalcanti <caval...@chromium.org> wrote:
> Since not all ARMv8 SoCs feature this instruction, I would love to hear from people who are familiar with how the Android apk is generated and distributed how we could enable the feature (i.e. this option has to be enabled at build time). Devices like Nexus 5x and Google Pixel will benefit from this change as they have the CRC32 instruction.

We don't ship 64-bit Chrome binaries to the Android stable/beta channels at all, so no matter what, ARMv8-specific code will only be used by dev/canary builds (on pre-N OS versions; on N and up the shipping configuration runs chrome in 32-bit for all channels), and by 64-bit apps running inside WebView. So, this type of optimisation right now has very limited scope even if you could easily ship it.

However: you also can't easily ship it if it's just a build time flag and the instruction isn't available on all ARMv8 chips. We don't ship different binaries to different CPU variants. You would have to do what we used to do with NEON, and have both versions of the code compiled into the binary (which means that you can't use a global compiler flag; it must be translation-unit-specific or only used from assembly), and then runtime-detect whether the CPU supports it or not.
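A rough sketch of that "compile both, pick at runtime" pattern, assuming Linux/Android where getauxval() reports CPU features. All function names here are hypothetical, and crc32_armv8 is only a stand-in: in a real build it would live in its own translation unit, compiled with the CRC extension enabled, and use the CRC32 instruction. In this sketch it simply forwards to the portable code so the example builds on any platform.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && (defined(__aarch64__) || defined(__arm__))
#include <sys/auxv.h>   /* getauxval() */
#include <asm/hwcap.h>  /* HWCAP_CRC32 / HWCAP2_CRC32 */
#endif

typedef uint32_t (*crc32_fn)(uint32_t, const unsigned char *, size_t);

/* Portable fallback: bitwise CRC-32, reflected polynomial 0xEDB88320. */
static uint32_t crc32_portable(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Placeholder (assumption): stands in for the separately compiled
 * version that would use the ARMv8 CRC32 instruction. */
static uint32_t crc32_armv8(uint32_t crc, const unsigned char *buf, size_t len)
{
    return crc32_portable(crc, buf, len);
}

/* Ask the kernel whether the CPU advertises the CRC32 extension.
 * Returns 0 on platforms where we do not know how to ask. */
static int cpu_has_arm_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#elif defined(__linux__) && defined(__arm__) && defined(AT_HWCAP2) && defined(HWCAP2_CRC32)
    return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0;
#else
    return 0;
#endif
}

/* Pick an implementation; callers would cache the returned pointer. */
crc32_fn select_crc32(void)
{
    return cpu_has_arm_crc32() ? crc32_armv8 : crc32_portable;
}
```

Both paths must return identical results for the same input, so the choice is invisible to callers; only the speed differs.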

caval...@chromium.org

Aug 14, 2017, 5:51:34 PM
to blink-dev
Richard

Thanks for the feedback, I will comment inline.
 
> We don't ship 64-bit Chrome binaries to the Android stable/beta channels at all, so no matter what, ARMv8-specific code will only be used by dev/canary builds (on pre-N OS versions; on N and up the shipping configuration runs chrome in 32-bit for all channels), and by 64-bit apps running inside WebView. So, this type of optimisation right now has very limited scope even if you could easily ship it.

The CRC32 instruction is optional in ARMv8-A and, IIRC, mandatory from ARMv8.1 onwards.

You can use it in an app running in either 32-bit or 64-bit (AArch64) mode. Provided the SoC supports it, it doesn't matter whether Chrome is a 64-bit or a 32-bit binary.

Last time I checked, all flagship devices (Samsung/LG/etc.) supported this instruction, and even cheap but new SoCs should feature it.

Also, this is only one of the patches optimizing zlib; we have more coming.
:-)
 

> However: you also can't easily ship it if it's just a build time flag and the instruction isn't available on all ARMv8 chips. We don't ship different binaries to different CPU variants. You would have to do what we used to do with NEON, and have both versions of the code compiled into the binary (which means that you can't use a global compiler flag; it must be translation-unit-specific or only used from assembly), and then runtime-detect whether the CPU supports it or not.

That is a shame, but if I understood correctly, runtime detection would be ok?

Nico Weber

Aug 14, 2017, 6:06:45 PM
to Adenilson Cavalcanti, blink-dev
FYI, there's some other potentially related zlib work at https://chromium-review.googlesource.com/c/601694

I hear zlib upstream does accept pull requests sometimes; it just takes a while for them to see movement.


caval...@chromium.org

Aug 14, 2017, 6:53:59 PM
to blink-dev
Nico

Thanks for pointing it out (https://chromium-review.googlesource.com/c/601694). Inflate fast is one of the candidates for optimization (the patch needs to be rebased onto current master):
https://codereview.chromium.org/2722063002/

Concerning canonical zlib, I tried pretty much everything:
https://github.com/madler/zlib/issues/216
https://github.com/madler/zlib/pull/251

I can see the issue is not exclusive to performance patches; it also affects security patches:
https://github.com/madler/zlib/issues/245
