Checking for CRC32C instructions on ARM64?


Victor Costan

Aug 26, 2017, 6:48:19 PM
to Chromium-dev
Dear Chromium developers,

TL;DR: What's the strategy behind using ARM64's CRC32C instructions in Chromium for Android? (Skia appears to be using them on Android without checking for their support, so we seem to assume that all ARM64 chips support them.)

I'm trying to optimize the code we use to compute CRC32C checksums to take advantage of ARM64's hardware-accelerated instructions. I'm following the approach in a LevelDB patch submitted by an ARM engineer.

The relevant part of the patch is that it gates the use of CRC32C instructions behind a getauxval(AT_HWCAP) runtime check, which requires the <sys/auxv.h> header. I was able to build and run this code on Android in a standalone repository using the latest Android NDK (r15c). However, I was not able to build the code in Chromium, as the <sys/auxv.h> header appears to be missing there.
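A minimal sketch of that gating pattern, assuming Linux/Android, a little-endian AArch64 target, and a translation unit compiled with -march=armv8-a+crc (so that __ARM_FEATURE_CRC32 and the <arm_acle.h> intrinsics are available). The function names are hypothetical, not LevelDB's; on any other target only the portable bitwise fallback is compiled. The HWCAP_CRC32 value (bit 7 of AT_HWCAP on arm64) matches the Linux kernel's uapi header.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>  /* __crc32cb, __crc32cd */
#include <string.h>    /* memcpy */
#include <sys/auxv.h>  /* getauxval, AT_HWCAP */
#define HAVE_CRC32C_HW 1
#endif

/* Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_sw(uint32_t crc, const uint8_t *p, size_t n) {
  crc = ~crc;
  while (n--) {
    crc ^= *p++;
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

#ifdef HAVE_CRC32C_HW
/* Hardware path: one CRC32CX per 8 bytes, then CRC32CB for the tail. */
static uint32_t crc32c_hw(uint32_t crc, const uint8_t *p, size_t n) {
  crc = ~crc;
  for (; n >= 8; p += 8, n -= 8) {
    uint64_t v;
    memcpy(&v, p, sizeof v);  /* no alignment assumption */
    crc = __crc32cd(crc, v);
  }
  while (n--) crc = __crc32cb(crc, *p++);
  return ~crc;
}
#endif

typedef uint32_t (*crc32c_fn)(uint32_t, const uint8_t *, size_t);

/* Defaults to the fallback until crc32c_init() has run. */
static crc32c_fn g_crc32c = crc32c_sw;

/* Call once at startup, before other threads exist. The compile-time
 * macro only says the compiler may emit the instruction; the HWCAP bit
 * says this particular CPU actually implements it. */
void crc32c_init(void) {
#ifdef HAVE_CRC32C_HW
  const unsigned long kHwcapCrc32 = 1ul << 7; /* HWCAP_CRC32 on arm64 */
  if (getauxval(AT_HWCAP) & kHwcapCrc32) g_crc32c = crc32c_hw;
#endif
}

/* zlib-style API: pass 0 to start, or a previous result to continue. */
uint32_t crc32c(uint32_t crc, const void *data, size_t n) {
  return g_crc32c(crc, (const uint8_t *)data, n);
}
```

Either path should return the standard CRC-32C check value 0xE3069283 for crc32c(0, "123456789", 9), which makes a cheap self-test when comparing the two implementations.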

While looking for CRC32C consumers in Chrome, I found out that Skia uses ARM64's CRC32C accelerated instructions for its own hashing function. Assuming I understand the code correctly, Skia uses the instructions without any runtime check.

Does this mean I should follow suit and unconditionally use CRC32C instructions on ARM64 builds?

Here are the code fragments that led me to my conclusion:


Thank you in advance for your help!
    Victor

caval...@chromium.org

Aug 27, 2017, 3:30:04 AM
to Chromium-dev
Hey Victor

It is nice to see more people looking for optimizations on ARM. 
:-)

Since it is the dominant embedded/mobile platform, it makes sense to improve Chromium's performance on it. I will comment inline.
 
> TL;DR: What's the strategy behind using ARM64's CRC32C instructions in Chromium for Android? (Skia appears to be using them on Android without checking for their support, so we seem to assume that all ARM64 chips support them.)


This is something I would like to know too.

From my post on blink-dev (https://goo.gl/pDGXHL), I understood that the Chrome APK distributed through the Google Play Store is an ARMv7 build (and chrome://version shows "Official 32-bit build"). Can anyone confirm this?

Most flagship devices today have an ARMv8 SoC (Google Pixel, Galaxy S8, LG G6, etc.). Even the older Nexus 5X has one, and newer low-cost devices will too (e.g. devices with an ARM Cortex-A53, such as the Nokia 6 and Nokia 3).

This raises the question: has anyone considered distributing an optimized build for those devices (e.g. -march=armv8-a)? I don't have numbers, but it is not hard to see some possible performance benefits.


> The relevant part of the patch is that it gates the use of CRC32C instructions behind a getauxval(AT_HWCAP) runtime check, which requires the <sys/auxv.h> header. I was able to build and run this code on Android in a standalone repository using the latest Android NDK (r15c). However, I was not able to build the code in Chromium, as the <sys/auxv.h> header appears to be missing there.


This seems to point to proper support for the syscall in the latest Android NDK. What I'm unsure about is whether the toolchain Chromium uses for Android is the latest.
 
> While looking for CRC32C consumers in Chrome, I found out that Skia uses ARM64's CRC32C accelerated instructions for its own hashing function. Assuming I understand the code correctly, Skia uses the instructions without any runtime check.

 
That is interesting. I did some investigation into hash functions (https://bugs.chromium.org/p/chromium/issues/detail?id=735674#c8), and using the crc32 instruction was indeed faster on ARM, even though it could produce slightly more collisions than other hashes (e.g. CityHash, HighwayHash) for the specific test case I studied (i.e. ShapeCache & HashMap).

A runtime check should be performed, as the instruction is optional in ARMv8-A and mandatory in ARMv8.1 (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801g/awi1476352818103.html). As an example, IIRC the first iPhone featuring an ARMv8 SoC didn't have the crc32 instruction (I'm unsure whether newer ones do).
 
> Does this mean I should follow suit and unconditionally use CRC32C instructions on ARM64 builds?


Nope, as explained above.

Another important detail is that you can use the instruction even in 32-bit mode (as long as the SoC supports it). As an example, this was fixed in Skia by a teammate at ARM (https://skia-review.googlesource.com/c/skia/+/15480).

Concerning runtime detection: assuming the syscall is available, the same approach as in the aforementioned LevelDB patch could be used. On the other hand, what happens if detection has to be done at a less privileged level (e.g. inside the Renderer process), for instance for image decoding by a dependency (libpng uses zlib to decompress IDAT chunks)? That, by the way, is a case I'm interested in: https://chromium-review.googlesource.com/c/chromium/src/+/612629

One alternative to the syscall (if that is indeed a limitation in Chromium's case) would be to check /proc/cpuinfo, as the information should be there. On the device I'm using, it returns:
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm aes pmull sha1 sha2 crc32
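A sketch of the /proc/cpuinfo approach (the helper names are hypothetical). It looks for "crc32" as a whole token on the Features line, so that, for example, "sha" does not match "sha1". As later replies in this thread point out, /proc may not be readable inside a sandboxed renderer on some devices, so a check like this would have to run in a privileged process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the start of the first line beginning with "Features", or NULL. */
const char *cpuinfo_find_features(const char *text) {
  for (const char *line = text; line != NULL && *line != '\0';) {
    if (strncmp(line, "Features", 8) == 0) return line;
    line = strchr(line, '\n');
    if (line != NULL) line++;
  }
  return NULL;
}

/* True if `name` is a whole whitespace-separated token after the ':'
 * on the given Features line (scanning stops at the end of that line). */
bool cpuinfo_has_feature(const char *line, const char *name) {
  const char *p = strchr(line, ':');
  size_t len = strlen(name);
  if (p == NULL || len == 0) return false;
  for (p++; *p != '\0' && *p != '\n';) {
    while (*p == ' ' || *p == '\t') p++;  /* skip separators */
    const char *start = p;
    while (*p != '\0' && *p != '\n' && *p != ' ' && *p != '\t') p++;
    if ((size_t)(p - start) == len && strncmp(start, name, len) == 0)
      return true;
  }
  return false;
}
```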

I can imagine that maybe we could hook into base/cpu.cc and, in the worst case, have an IPC call from the Renderer to the Browser process. Any thoughts?

Cheers,
Adenilson

David Benjamin

Aug 27, 2017, 4:34:58 PM
to caval...@chromium.org, Chromium-dev
Runtime detection of ARM features is a little hairy. This is how BoringSSL detects ARMv8 features:

On AArch64, it's relatively straightforward, though you do need to grab the constants from somewhere. On AArch32, it's a mess because of the various Android versions we have to work around. If you only care about ARMv8 features, it's mostly fine, but you may care about this mess:
(I'll see about getting metrics for whether it's still an issue.)

This indeed can't all be done in the sandbox, so we make sure it's all computed before entering it. I want to say getauxval is actually fine (?), but any workarounds with /proc won't work.
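The "constants" here are the HWCAP bit values. The sketch below (the function and FEAT_* names are hypothetical) decodes them into one portable mask, using the bit positions from the Linux kernel's arm64 and arm uapi hwcap.h headers, and reads the live value only on ARM Linux. Per the advice above, such a mask would be computed before entering the sandbox. It deliberately omits BoringSSL's AArch32 workarounds.

```c
#include <stdint.h>

#if defined(__linux__) && (defined(__aarch64__) || defined(__arm__))
#include <sys/auxv.h>  /* getauxval, AT_HWCAP, AT_HWCAP2 */
#endif

/* Hypothetical portable feature bits for this sketch. */
#define FEAT_AES   (1u << 0)
#define FEAT_PMULL (1u << 1)
#define FEAT_SHA1  (1u << 2)
#define FEAT_SHA2  (1u << 3)
#define FEAT_CRC32 (1u << 4)

/* AArch64 processes: AT_HWCAP, bits from arch/arm64 asm/hwcap.h. */
uint32_t features_from_arm64_hwcap(unsigned long hwcap) {
  uint32_t f = 0;
  if (hwcap & (1ul << 3)) f |= FEAT_AES;    /* HWCAP_AES   */
  if (hwcap & (1ul << 4)) f |= FEAT_PMULL;  /* HWCAP_PMULL */
  if (hwcap & (1ul << 5)) f |= FEAT_SHA1;   /* HWCAP_SHA1  */
  if (hwcap & (1ul << 6)) f |= FEAT_SHA2;   /* HWCAP_SHA2  */
  if (hwcap & (1ul << 7)) f |= FEAT_CRC32;  /* HWCAP_CRC32 */
  return f;
}

/* 32-bit processes on ARMv8 CPUs: the same extensions are reported in
 * AT_HWCAP2 at different positions, bits from arch/arm asm/hwcap.h. */
uint32_t features_from_arm32_hwcap2(unsigned long hwcap2) {
  uint32_t f = 0;
  if (hwcap2 & (1ul << 0)) f |= FEAT_AES;    /* HWCAP2_AES   */
  if (hwcap2 & (1ul << 1)) f |= FEAT_PMULL;  /* HWCAP2_PMULL */
  if (hwcap2 & (1ul << 2)) f |= FEAT_SHA1;   /* HWCAP2_SHA1  */
  if (hwcap2 & (1ul << 3)) f |= FEAT_SHA2;   /* HWCAP2_SHA2  */
  if (hwcap2 & (1ul << 4)) f |= FEAT_CRC32;  /* HWCAP2_CRC32 */
  return f;
}

#if defined(__linux__) && defined(__aarch64__)
uint32_t read_arm_features(void) {
  return features_from_arm64_hwcap(getauxval(AT_HWCAP));
}
#elif defined(__linux__) && defined(__arm__)
/* No workaround for kernels that report AT_HWCAP2 as 0. */
uint32_t read_arm_features(void) {
  return features_from_arm32_hwcap2(getauxval(AT_HWCAP2));
}
#endif
```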

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/eba5997b-2cc6-4572-833f-6596ab3b4b25%40chromium.org.

David Turner

Aug 27, 2017, 7:38:42 PM
to caval...@chromium.org, Chromium-dev
On Sun, Aug 27, 2017 at 9:30 AM, <caval...@chromium.org> wrote:
> From my post on blink-dev (https://goo.gl/pDGXHL), I understood that the Chrome APK distributed through the Google Play Store is an ARMv7 build (and chrome://version shows "Official 32-bit build"). Can anyone confirm this?
>
> Most flagship devices today have an ARMv8 SoC (Google Pixel, Galaxy S8, LG G6, etc.). Even the older Nexus 5X has one, and newer low-cost devices will too (e.g. devices with an ARM Cortex-A53, such as the Nokia 6 and Nokia 3).
>
> This raises the question: has anyone considered distributing an optimized build for those devices (e.g. -march=armv8-a)? I don't have numbers, but it is not hard to see some possible performance benefits.

I confirm that only a 32-bit version is officially released. This is mainly to save on disk size, and much more importantly in RAM usage.
IIRC, we also build with -Os instead of -O2, and disable certain RAM-hungry features. We also limit the number of locales we support due to disk size.
There are also many people working on reducing things even further (which is, of course, getting harder and harder).

And generally speaking, adding a new official build variant is very painful, so slight performance benefits might not be sufficient to justify increased RAM usage / storage size.
I'm not saying it will never happen, just that it is a very hard sell, and priorities are currently elsewhere :)


> Concerning runtime detection: assuming the syscall is available, the same approach as in the aforementioned LevelDB patch could be used. On the other hand, what happens if detection has to be done at a less privileged level (e.g. inside the Renderer process), for instance for image decoding by a dependency (libpng uses zlib to decompress IDAT chunks)? That, by the way, is a case I'm interested in: https://chromium-review.googlesource.com/c/chromium/src/+/612629
> One alternative to the syscall (if that is indeed a limitation in Chromium's case) would be to check /proc/cpuinfo, as the information should be there. On the device I'm using, it returns:
> Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm aes pmull sha1 sha2 crc32
>
> I can imagine that maybe we could hook into base/cpu.cc and, in the worst case, have an IPC call from the Renderer to the Browser process. Any thoughts?

getauxval() doesn't use a syscall; it reads information the kernel provides to the C library at process startup. It can be used in a renderer process.
It is always available on Android/arm64, but not on Android/arm32 (only since Android M, IIRC). Reading /proc/ will not work in renderer processes on certain devices, due to different kernel and SELinux configurations.

The NDK comes with a handy companion library called cpu-features that will do all the work for you though. See this for example.

For Chrome on Android, we get the CPU features mask in the browser process and pass it to the renderer process to ensure it gets the right value on all systems.
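A sketch of that hand-off, assuming the mask travels on the child's command line. The --cpu-features= switch name and both helpers are hypothetical, not Chromium's actual mechanism: the browser formats the mask it detected, and the renderer parses it instead of probing the hardware itself.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical switch name; not an actual Chromium flag. */
#define CPU_FEATURES_SWITCH "--cpu-features="

/* Browser side: format the mask detected before any sandboxing.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_cpu_features_switch(uint32_t mask, char *buf, size_t len) {
  int n = snprintf(buf, len, CPU_FEATURES_SWITCH "0x%08" PRIx32, mask);
  return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Renderer side: parse the switch instead of probing the hardware.
 * Returns 0 on success, -1 if the argument is not a valid switch. */
int parse_cpu_features_switch(const char *arg, uint32_t *mask) {
  size_t prefix = strlen(CPU_FEATURES_SWITCH);
  if (strncmp(arg, CPU_FEATURES_SWITCH, prefix) != 0) return -1;
  char *end = NULL;
  unsigned long v = strtoul(arg + prefix, &end, 16);  /* accepts "0x" */
  if (end == arg + prefix || *end != '\0' || v > UINT32_MAX) return -1;
  *mask = (uint32_t)v;
  return 0;
}
```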

I assume the Skia code that assumes a 32-bit CRC32 instruction is only built for ChromiumOS. That would explain the lack of runtime probing.

Hope this helps,

- Digit

Victor Costan

Aug 28, 2017, 3:24:05 AM
to davi...@chromium.org, caval...@chromium.org, Chromium-dev
Thank you very much for the very helpful answer, David! Learning about BoringSSL's approach gave me the background I needed to make an informed decision.

For anyone else interested -- I adopted the weakly-linked getauxval solution. This keeps the code small (no parsing), and I won't have to worry about renderer permissions. If https://developer.android.com/about/dashboards/index.html is correct, it seems like I'm missing out on 20-25% of the users. That seems acceptable in my case, as the baseline is that nobody gets any optimization.
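A minimal sketch of a weakly linked getauxval(), in the spirit of (not copied from) BoringSSL. The weak declaration lets the binary load against a C library that lacks the function; in that case the symbol's address is NULL and the code reports no optional features. It assumes GCC or Clang (for __attribute__((weak))) and hard-codes AT_HWCAP = 16, the Linux auxiliary-vector tag, so it does not depend on <sys/auxv.h>. The helper name is hypothetical.

```c
#include <stddef.h>

#ifndef AT_HWCAP
#define AT_HWCAP 16  /* Linux auxiliary-vector tag, as in <elf.h> */
#endif

/* Weak declaration: resolves to NULL instead of failing to load when the
 * C library has no getauxval(). <sys/auxv.h> is deliberately not included,
 * since that header may be missing from the toolchain. */
extern unsigned long getauxval(unsigned long type) __attribute__((weak));

/* Returns AT_HWCAP, or 0 ("no optional features") if getauxval is absent. */
unsigned long read_hwcap_or_zero(void) {
  if (getauxval == NULL) return 0;
  return getauxval(AT_HWCAP);
}
```

Treating a missing getauxval() as "no features" keeps the failure mode safe: affected users just get the portable code path.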

    Victor


Victor Costan

Aug 28, 2017, 3:31:32 AM
to di...@google.com, caval...@chromium.org, Chromium-dev
Thank you very much for the extra context, Digit!

On Sun, Aug 27, 2017 at 4:37 PM, David Turner <di...@chromium.org> wrote:
> getauxval() doesn't use a syscall; it reads information the kernel provides to the C library at process startup. It can be used in a renderer process.
> It is always available on Android/arm64, but not on Android/arm32 (only since Android M, IIRC). Reading /proc/ will not work in renderer processes on certain devices, due to different kernel and SELinux configurations.

It seems like the weakly linked getauxval() is the right solution for my specific issue, as the code I have is only targeting ARM64. It's nice when the right answer is the alternative with less code! :D

    Victor

PhistucK

Aug 28, 2017, 4:07:54 AM
to Victor Costan, David Benjamin, caval...@chromium.org, Chromium-dev
> it seems like I'm missing out on 20-25% of the users.
More like ~50%, because Lollipop does not have it, right (still something)?


PhistucK

Victor Costan

Aug 28, 2017, 6:16:46 AM
to Adenilson Cavalcanti, Chromium-dev
Hi, Adenilson!

On Sun, Aug 27, 2017 at 12:30 AM, <caval...@chromium.org> wrote:
> Most flagship devices today have an ARMv8 SoC (Google Pixel, Galaxy S8, LG G6, etc.). Even the older Nexus 5X has one, and newer low-cost devices will too (e.g. devices with an ARM Cortex-A53, such as the Nokia 6 and Nokia 3).
 
> A runtime check should be performed, as the instruction is optional in ARMv8-A and mandatory in ARMv8.1 (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801g/awi1476352818103.html). As an example, IIRC the first iPhone featuring an ARMv8 SoC didn't have the crc32 instruction (I'm unsure whether newer ones do).

Thank you very much for the background!
 
> Another important detail is that you can use the instruction even in 32-bit mode (as long as the SoC supports it). As an example, this was fixed in Skia by a teammate at ARM (https://skia-review.googlesource.com/c/skia/+/15480).

This is particularly useful, as it seems we don't distribute an ARM64 build on Android.

Do you happen to know if the LevelDB patch would work on 32-bit ARM? If so, would you happen to know what -march= flag should be used for the intrinsics the patch uses (crc32c{b,h,w,d} and vmull_p64)?

Also, in general, thank you very much for your help on improving Chromium's performance on ARM! As an Android phone owner, I am really grateful to you and your colleagues!
    Victor

mtk...@chromium.org

Aug 28, 2017, 8:14:18 AM
to Chromium-dev
Skia does use CRC32 instructions, but only after checking for runtime support.  Here's how we do it (from SkCpu.cpp):

#elif defined(SK_CPU_ARM64) && __has_include(<sys/auxv.h>)
    #include <sys/auxv.h>

    static uint32_t read_cpu_features() {
        const uint32_t kHWCAP_CRC32 = (1<<7);

        uint32_t features = 0;
        uint32_t hwcaps = getauxval(AT_HWCAP);
        if (hwcaps & kHWCAP_CRC32) { features |= SkCpu::CRC32; }
        return features;
    }

It's a pretty bad idea to use them without checking.  Notably, no iDevices have CRC32 as far as I know.  Frustratingly, Apple's Clang #defines the guard that indicates they do by default!  We're forced to ignore them:

// Really this __APPLE__ check shouldn't be necessary, but it seems that Apple's Clang defines
// __ARM_FEATURE_CRC32 for -arch arm64, even though their chips don't support those instructions!
#if defined(__ARM_FEATURE_CRC32) && !defined(__APPLE__)
    #define SK_ARM_HAS_CRC32
#endif

Nico Weber

Aug 28, 2017, 11:12:47 AM
to Mike, Chromium-dev
On Mon, Aug 28, 2017 at 8:14 AM, <mtk...@chromium.org> wrote:
> It's a pretty bad idea to use them without checking. Notably, no iDevices have CRC32 as far as I know. Frustratingly, Apple's Clang #defines the guard that indicates they do by default! We're forced to ignore them:
>
> // Really this __APPLE__ check shouldn't be necessary, but it seems that Apple's Clang defines
> // __ARM_FEATURE_CRC32 for -arch arm64, even though their chips don't support those instructions!
> #if defined(__ARM_FEATURE_CRC32) && !defined(__APPLE__)
>     #define SK_ARM_HAS_CRC32
> #endif

Word on the street is that this got fixed in http://llvm.org/viewvc/llvm-project?view=revision&revision=302077 and Xcode 9 should have this fix at least. So in a year or so this might no longer be needed. (But sounds like for now, it is :-/)
 


David Benjamin

Aug 28, 2017, 11:53:55 AM
to Victor Costan, caval...@chromium.org, Chromium-dev
You're probably missing out on less than that, FWIW. My understanding is that you can assume that devices with ARMv8 instructions are new enough to have getauxval. It's just that, for AArch32 specifically, some kernel configurations are missing the AT_HWCAP2 entry. getauxval works, but getauxval(AT_HWCAP2) returns zero when it shouldn't. The two instances in https://crbug.com/boringssl/46 were fixed in an update a while ago, so maybe it's not needed anymore?

I've filed https://crbug.com/boringssl/203 to measure this. Losing the workaround would be great.
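A sketch of the kind of AArch32 fallback under discussion (hypothetical names, not BoringSSL's code): trust AT_HWCAP2 when it is nonzero, and only when it reads zero consult a /proc/cpuinfo Features line, which would have to be read before sandboxing. HWCAP2_CRC32 is bit 4 on 32-bit ARM Linux.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HWCAP2_CRC32_ARM32 (1ul << 4) /* AT_HWCAP2 bit on 32-bit ARM Linux */

/* True if `tok` is a whole space-separated word after the ':' of `line`. */
static bool has_token(const char *line, const char *tok) {
  const char *p = strchr(line, ':');
  size_t n = strlen(tok);
  if (p == NULL || n == 0) return false;
  for (p++; (p = strstr(p, tok)) != NULL; p += n) {
    bool starts = p[-1] == ' ' || p[-1] == '\t';
    bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
    if (starts && ends) return true;
  }
  return false;
}

/* hwcap2: value of getauxval(AT_HWCAP2). features_line: the cpuinfo
 * "Features" line if it could be read (privileged process), else NULL. */
bool arm32_has_crc32(unsigned long hwcap2, const char *features_line) {
  if (hwcap2 != 0) return (hwcap2 & HWCAP2_CRC32_ARM32) != 0;
  /* Some kernels leave AT_HWCAP2 at zero on ARMv8 CPUs; try cpuinfo. */
  if (features_line != NULL) return has_token(features_line, "crc32");
  return false;
}
```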


caval...@chromium.org

Aug 28, 2017, 1:49:48 PM
to Chromium-dev, pwn...@chromium.org, caval...@chromium.org
@David: thanks a lot for the explanation; the example you provided is quite helpful.

It is interesting to learn what the primary concerns (e.g. binary size) are for the Chrome APK.

Assuming we were able to keep the same APK/binary size (assuming target_os = "android", target_cpu = "arm", arm_version = 8, arm_arch = "armv8-a+crc"), what speedup threshold (%) would justify the effort, in the *future*, of providing a specialized build for new mobile devices?

Using the CRC32 specific instruction on ARMv8 yielded a boost of about 7% in the time for decoding PNGs. I haven't measured it yet, but it should also help with loading gzipped webpages (e.g. google, gmail, gcalendar, engadget, etc).
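One detail worth keeping apart: zlib (and therefore libpng's chunk checksums) uses CRC-32 with the IEEE polynomial, while LevelDB uses CRC-32C (Castagnoli). The ARMv8 CRC extension provides instructions for both families (CRC32B/H/W/X and CRC32CB/CH/CW/CX). This bitwise reference sketch (hypothetical function name) shows that they differ only in the reflected polynomial; for the input "123456789" the standard check values are 0xCBF43926 for CRC-32 and 0xE3069283 for CRC-32C.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected bitwise CRC with initial value and final XOR of 0xFFFFFFFF.
 * poly = 0xEDB88320: CRC-32  (zlib, gzip, PNG; ARM CRC32B/H/W/X)
 * poly = 0x82F63B78: CRC-32C (LevelDB;        ARM CRC32CB/CH/CW/CX) */
uint32_t crc32_reflected(uint32_t poly, const void *data, size_t n) {
  const uint8_t *p = data;
  uint32_t crc = 0xFFFFFFFFu;
  while (n--) {
    crc ^= *p++;
    for (int k = 0; k < 8; k++)
      crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
  }
  return ~crc;
}
```

So a speedup measured for zlib's crc32() relies on the non-C instructions, while a CRC32C consumer such as LevelDB needs the C variants; both are covered by the same "crc32" HWCAP bit.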

@Victor: Thanks for the kind words, I feel everyone wants a better and faster Chrome on their mobile devices.

Please see the comments inline:


>Do you happen to know if the LevelDB patch would work on a 32-bit ARM?

I'm unfamiliar with the details of that specific patch (besides the fact that it is a hybrid approach using NEON instructions for loading data and the scalar crc32 instruction). I would have to look at it to see whether anything there relies on AArch64 behavior (e.g. support for unaligned memory access). At first glance it looks fine (I can also ask my colleague about it).

That being said, I personally tested the crc32 instruction in both 32-bit and 64-bit mode in Chromium running on a Google Pixel (Qualcomm Snapdragon 820), and it works fine (as in https://chromium-review.googlesource.com/c/chromium/src/+/612629).

So I don't see that as a major problem. I can give the LevelDB patch a try and report back to you.


>If so, would you happen to know what -march= flag should be used for the intrinsics the patch uses (crc32c{b,h,w,d} and vmull_p64)?

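Let's break down the instructions: the first is a scalar instruction (optional in ARMv8.0, mandatory from ARMv8.1), while the second is a SIMD (NEON) instruction (its 64-bit polynomial form, PMULL, belongs to the optional Cryptography extension, listed as "pmull" in /proc/cpuinfo).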

A bit of history: back when ARMv7 was released, NEON support was optional (and IIRC there was a Tegra SoC that didn't have it). Later on, pretty much all SoCs gained NEON support, but it is still not mandatory.

Therefore, you need to pass a flag (e.g. -mfpu=neon) to tell the compiler that your target has support for it.

In ARMv8 things are different: NEON support is mandatory. As a result, you don't need to pass any flag to the compiler to enable NEON if your target is ARMv8 (e.g. -march=armv8-a). For Chromium, I think NEON support is enabled by default in an arm build (e.g. target_cpu = "arm").

The issue is that if you want to enable the crc32 instruction, you have to tell the compiler about it (e.g. -march=armv8-a+crc). Depending on the compiler (GCC, Clang) and its version, the flags can vary.

Maybe an example can help: in the zlib upstream pull request (https://github.com/madler/zlib/pull/251/files#diff-af3b638bc2a3e6c650974192a53c7291R156), the CMake file detects the compiler version and then passes the proper flag (it was tested with GCC 5.4 and 6.3, but not with Clang).

Just keep in mind that the official Chrome APK is not an ARMv8-specific target. At least for third_party/zlib, I didn't have to supply any specific flag to use NEON instructions (https://chromium-review.googlesource.com/c/chromium/src/+/611492/5/third_party/zlib/BUILD.gn), but I had to detect whether the architecture was ARMv8 with crc32 support (https://chromium-review.googlesource.com/c/chromium/src/+/612629/3/third_party/zlib/BUILD.gn#69).

So, back to the Chromium case: you can enable it by passing arm_arch = "armv8-a+crc" to 'gn args' (https://gist.github.com/Adenilson/29974397cea0ff159eb89f8fe2d1ddca).

Not sure if this is the 'recommended' way, though.

caval...@chromium.org

Aug 28, 2017, 1:53:52 PM
to Chromium-dev, pwn...@chromium.org, caval...@chromium.org
 
> I'm unfamiliar about the details of that specific patch (besides the fact that is a hybrid approach using NEON instructions for loading data and the scalar crc32 instruction), I would have to have a look on it and understand if there

A correction: there are no vld (NEON load) instructions there.

Torne (Richard Coles)

Aug 28, 2017, 2:38:29 PM
to caval...@chromium.org, Chromium-dev, pwn...@chromium.org
On Mon, 28 Aug 2017 at 13:50 <caval...@chromium.org> wrote:
> Assuming we were able to keep the same APK/binary size (assuming target_os = "android", target_cpu = "arm", arm_version = 8, arm_arch = "armv8-a+crc"), what speedup threshold (%) would justify the effort, in the *future*, of providing a specialized build for new mobile devices?

It's not possible to have a specialized build for ARMv8 devices in 32-bit. Android does not define a separate ABI category for this, and so the Play Store doesn't have any way to target it. The only options are armeabi-v7a (which is what our current 32-bit APK already targets) and arm64-v8a (which our 64-bit APKs target, but we don't release those to stable as discussed).
 


mtk...@chromium.org

Aug 28, 2017, 2:47:37 PM
to Chromium-dev, caval...@chromium.org, pwn...@chromium.org
On Monday, August 28, 2017 at 2:38:29 PM UTC-4, Torne (Richard Coles) wrote:
> The only options are armeabi-v7a (which is what our current 32-bit APK already targets) and arm64-v8a (which our 64-bit APKs target, but we don't release those to stable as discussed).

Has anyone reconsidered shipping 64-bit APKs again since switching Clank builds to Clang?  I remember that GCC 4.9 was pretty garbage at 64-bit ARM code generation, and so it may have given arm64-v8a an undeserved bad reputation in terms of performance per code size.

Torne (Richard Coles)

Aug 28, 2017, 2:53:06 PM
to mtk...@chromium.org, Chromium-dev, caval...@chromium.org, pwn...@chromium.org
Performance wasn't the problem; the issue is significantly increased memory usage, a lot of which is due to larger pointers.
