1) Does v8's codegen emit 64-bit machine instructions for 64-bit wasm instructions on 64-bit architectures (specifically Android)? I imagine the speedup from SWAR techniques will be significantly reduced with only 32-bit registers, perhaps to the point where it is no longer worthwhile.
2) Does v8's codegen currently do any vectorization? If not, are there plans to add it? If vectorization is available or planned, the plain C version might be the best one to stick with, as it would be easier for auto-vectorization to detect and optimize.
3) Can anyone provide tips or links to help with investigating and optimizing this kind of thing? Is there any way of flagging wasm functions for maximum optimization for benchmarking purposes?
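For readers unfamiliar with the SWAR idea raised in question 1, here is a minimal sketch in C of one well-known packed trick: a per-byte floor average of two words. It is not taken from the code discussed in this thread; the names avg_bytes_u32, avg_bytes_u64 and avg_rows are hypothetical and chosen only for illustration. It shows why register width matters: the 64-bit variant handles eight pixels per operation instead of four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Floor-average each of the four byte lanes of a and b.
   Uses a + b == 2 * (a & b) + (a ^ b): the shared bits are kept as-is and
   the differing bits are halved. Masking with 0xFE before the shift stops
   the low bit of one lane from leaking into the lane below it. */
static inline uint32_t avg_bytes_u32(uint32_t a, uint32_t b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Same trick on eight byte lanes at once. */
static inline uint64_t avg_bytes_u64(uint64_t a, uint64_t b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFEFEFEFEFEull) >> 1);
}

/* Average two rows of n 8-bit pixels into out (hypothetical helper).
   memcpy is used for the loads and stores to avoid unaligned access;
   the lane-wise operation is independent of byte order. */
static void avg_rows(const uint8_t *row0, const uint8_t *row1,
                     uint8_t *out, size_t n) {
    assert(row0 != NULL && row1 != NULL && out != NULL);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t a, b, r;
        memcpy(&a, row0 + i, 8);
        memcpy(&b, row1 + i, 8);
        r = avg_bytes_u64(a, b);
        memcpy(out + i, &r, 8);
    }
    /* Scalar tail for the remaining pixels. */
    for (; i < n; ++i) {
        out[i] = (uint8_t)((row0[i] + row1[i]) >> 1);
    }
}
```

On an engine that lowers each i64 operation to a pair of 32-bit operations, avg_bytes_u64 would be expected to lose most of its advantage over avg_bytes_u32, which is the concern behind question 1.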
Clemens Hammacher
Software Engineer
Google Germany GmbH
--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/09853b85-6d5d-4726-ad90-c11d566396d1%40googlegroups.com.
Hi Clemens,
Thanks for those really helpful pointers.
I have continued digging into this for the last week or so, and have put my current code up on GitHub for anyone interested:
Results are that the bit-twiddling approaches do offer a pretty decent speed-up on mobile platforms (and 64-bit desktop), so this is a promising route :)
On Android it seems the version of Chrome distributed through Google Play is still the 32-bit one, even on 64-bit devices (tested on a Google Pixel 2; chrome://version states 32-bit).
Despite that, there is still a speedup using the packed implementations. The fastest on the Pixel 2 is the half_sample_uint32x2_blocks implementation, which gives something like a 2.4x speedup (2.04ms to 0.84ms for half-sampling a 720p image).
Safari in iOS 12.4 is 64-bit, and there the half_sample_uint64_blocks implementation is fastest and gives more than a 3x speedup on an iPod Touch 7 (1.0ms to 0.3ms for 720p input).
Each benchmark run does 10 iterations with different but overlapping input data (so both input and output are likely to be at least in L2 cache, which is the case I expect in practice). The timing numbers printed by the test code show the total time for all 10 iterations, so the numbers above are typical outputs over multiple runs divided by 10. Safari only offers 1ms resolution on performance.now(), so accurate measurements are harder to get there, but the numbers above look pretty consistent over multiple runs.
Out of interest I've also tried to write an implementation targeting WebAssembly SIMD. I was able to get it to compile with emcc from latest-upstream, but it doesn't run in my self-built d8 7.7 with the --experimental-wasm-simd flag. More details are in the README of the repo linked above. I'd appreciate any help to get that one running for a further comparison datapoint.
So the questions that arise:
1) Is there any way for a page to detect if the browser is 32-bit or 64-bit? navigator.platform reports Linux armv8l on the Pixel 2, so that doesn't help unfortunately. I have 32-bit and 64-bit "busy loops" that report approximately equal counts on 64-bit platforms (and not on 32-bit), but it would be nice if there was a more direct way to determine this!
2) Are there plans to transition Play Store Chrome releases to 64-bit for 64-bit Android? I did some searching but couldn't find any official information about the reasons for sticking with 32-bit there (though I assume memory usage, apk size, or both).
3) Any hints on how I can get the SIMD version to run, or is this likely just a bug / spec instability between emcc and d8?
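The "busy loops" mentioned in question 1 are not shown in the thread. The following is only a rough sketch, in C, of how such a probe might look, assuming it is compiled to wasm and run in the browser under test. The function names, the batch size of 1000 and the use of clock() are all assumptions made for illustration. The idea is to count how many batches of 32-bit and of 64-bit arithmetic complete in the same time budget and then compare the two counts.

```c
#include <stdint.h>
#include <time.h>

/* Count batches of 1000 32-bit multiply-adds completed within `seconds`
   of CPU time. `volatile` keeps the compiler from removing the loop. */
static uint64_t count_batches_u32(double seconds) {
    volatile uint32_t x = 1;
    uint64_t batches = 0;
    clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
    while (clock() < end) {
        for (int i = 0; i < 1000; ++i) {
            x = x * 1664525u + 1013904223u;
        }
        ++batches;
    }
    return batches;
}

/* Same loop with 64-bit arithmetic. */
static uint64_t count_batches_u64(double seconds) {
    volatile uint64_t x = 1;
    uint64_t batches = 0;
    clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
    while (clock() < end) {
        for (int i = 0; i < 1000; ++i) {
            x = x * 6364136223846793005ull + 1442695040888963407ull;
        }
        ++batches;
    }
    return batches;
}

/* Ratio of 64-bit to 32-bit throughput. A value near 1 suggests native
   64-bit arithmetic; a clearly lower value suggests 64-bit operations
   are being split into 32-bit pieces. Any cut-off is a judgment call. */
static double i64_to_i32_throughput_ratio(double seconds) {
    double c32 = (double)count_batches_u32(seconds);
    double c64 = (double)count_batches_u64(seconds);
    return c64 / c32;
}
```

A native build of this sketch only reflects the host CPU and compiler. To say anything about a browser, it would have to run as wasm inside that browser, and the result is a timing heuristic rather than a direct answer.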
To answer one of the questions here (inline):
[...]
2) Are there plans to transition Play Store Chrome releases to 64-bit for 64-bit Android? I did some searching but couldn't find any official information about the reasons for sticking with 32-bit there (though I assume memory usage, apk size, or both).
There is a plan to transition Play Store Chrome releases to 64-bit for some 64-bit Android devices (those over a certain memory threshold, yet to be determined). You are right that the reason for sticking with 32-bit is memory usage. We don't have a timeline for this yet, but it is unlikely to hit the stable channel until sometime early next year.
On Tue, Sep 10, 2019 at 10:20 AM <si...@zappar.com> wrote:
[...]
3) Any hints on how I can get the SIMD version to run or is this likely just a bug / spec instability between emcc and d8?
This indeed sounds like a bug, probably on the emscripten side. Can you open an emscripten bug about this?
Chrome Canary for Android on Pixel 2 unfortunately just gets an "Aw, Snap" page when trying to load the test (with simd enabled in chrome://flags).
Simon
On Mon, Sep 16, 2019 at 12:41 PM <si...@zappar.com> wrote:
Chrome Canary for Android on Pixel 2 unfortunately just gets an "Aw, Snap" page when trying to load the test (with simd enabled in chrome://flags).
Oh, that's unfortunate. Can you send a crash ID (from chrome://crashes)?
If you have a local reproducer, it would also be really helpful to open a v8 bug.
                       | plain | uint32_blocks | uint32x2_blocks | uint64_blocks | wasm_simd
-----------------------------------------------------------------------------------------------
MacBook Pro, x64       |  3.16 | 2.37 (1.33x)  | 2.06 (1.54x)    | 1.60 (1.98x)  | 0.64 (4.91x)
Pixel 2, armv7, 32 bit | 20.25 | 9.95 (2.04x)  | 8.31 (2.44x)    | 9.68 (2.09x)  | ---
Pixel 2, arm64, 64 bit | 16.95 | 9.08 (1.87x)  | 7.08 (2.39x)    | 4.98 (3.40x)  | 2.72 (6.23x)