LLVM V8 Code Generator for WebAssembly


manji...@gmail.com

Nov 20, 2019, 7:02:34 PM
to v8-dev
Hi,
    I am working on a new code generator that is based on LLVM and generates code for both V8 built-ins and WebAssembly.
    I tried compiling libffmpeg for a video decoder and found a time reduction of 50% or more, mainly because LLVM supports auto-vectorization.
    This project only supports arm32 Android for now.
    Anyone interested can check out the source from https://github.com/linzj/llvm-toy and switch to the branch named arm-tf-7_8_279_17_725bcedcb69.
    The hash at the end is the git commit hash.

manji...@gmail.com

Nov 21, 2019, 1:35:34 AM
to v8-dev

llvm.png

nollvm.png

2019-11-21 14-27-06 的屏幕截图.png

Octane benchmark improvement, with no-opt manually.




Yang Guo

Nov 21, 2019, 2:46:19 AM
to v8-...@googlegroups.com
Interesting. Thanks for sharing!

I noticed that the source code you linked to has .gyp files. Which V8 version does it target? Do you have steps to build it and reproduce your benchmark results?

Cheers,

Yang

--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/13052418-21de-4311-8244-008ee0a1db2f%40googlegroups.com.

manji...@gmail.com

Nov 21, 2019, 6:48:29 PM
to v8-dev
The latest version is for 7.8. The last field of the branch name is the git hash.
You can check out that version using this hash, e.g. git checkout 725bcedcb69.
You should check out LLVM version 8 (the final release), patch it with llvm-patch-by-far.patch, and build it into libLLVM-8.so. For the build script, refer to config32.sh.
Then patch v8 with v8.patch.
Then rsync src/ to v8/src/.
Then copy libLLVM-8.so to v8/lib/.
Then you can build it with Chrome. Oh, don't forget to add "UC_BUILD_TF_LLVM_BACKEND" to the defines.


manji...@gmail.com

Nov 25, 2019, 8:35:39 PM
to v8-dev

You may want to test the arm-6-9 branch first. The performance data shown above comes from this branch, and it is stable enough to have survived Alibaba's 11.11 shopping festival.

mvst...@google.com

Nov 27, 2019, 9:53:13 AM
to v8-dev
Hi, I've created a tracking bug on which we'll apply some more investigation (https://bugs.chromium.org/p/v8/issues/detail?id=10023). We'll have a detailed look in January. Are you interested in upstreaming your work? We're interested in using this for the built-ins and interpreter bytecode handlers, as part of the mksnapshot build step. We wouldn't try to do this at runtime.

All the best,
--Michael Stanton
--V8 Compiler Team, Manager

manji...@gmail.com

Dec 2, 2019, 1:10:33 AM
to v8-dev
Yes, thanks. I am maintaining this feature on my own, and it is tiring to fix all kinds of bugs whenever new features arrive upstream, such as ephemerons and the removal of the isolate parameter from RecordWrite.

manji...@gmail.com

Dec 2, 2019, 1:12:32 AM
to v8-dev

I didn't check my mail in time; sorry for the late response.

manji...@gmail.com

Dec 2, 2019, 1:31:26 AM
to v8-dev
We don't use this at runtime either: the compile time is horrible, and V8's binary size would grow dramatically.
But we did try it for wasm, and it brought the performance of the wasm binary close to that of the Clang version.

manji...@gmail.com

Dec 4, 2019, 3:44:54 PM
to v8-dev
V8 now records safepoints for C-call functions. I had been tracking down a GC crash for days and finally found this change, made since version 6.9.

manji...@gmail.com

Jan 7, 2020, 5:44:21 AM
to v8-dev
Hi,
Any progress? I am looking forward to upstreaming my compiler. Also, please grant me access to the bug tracking system.

Michael Stanton

Jan 7, 2020, 6:56:44 AM
to v8-...@googlegroups.com
I've added your email on cc to the tracking bug. We are scheduled to have a look in mid-January, and we'll update the tracking bug when we have more information.



manji...@gmail.com

Jan 9, 2020, 7:21:04 PM
to v8-dev

2020-01-09 17-56-56 的屏幕截图.png

2020-01-09 17-59-12 的屏幕截图.png

Here's the profiler output from running the following test on a Huawei Honor 9 (荣耀9) phone; the V8 version is 7.8:
<body>
<script>
(function() {
  let a = [];
  let i;
  let t0 = performance.now();
  for (i = 0; i < 0x1000000; ++i) {
    a.push(i);
  }
  let t1 = performance.now();
  console.log('time: ' + (t1 - t0));
})()
</script>
</body>

I am glad to see that LdaNamedProperty shows a remarkable improvement; it was my original optimization target.
The compiler for version 7.8 is pretty stable now; I have fixed many bugs recently.

Dan Elphick

Jan 10, 2020, 5:54:27 AM
to v8-...@googlegroups.com
These results look very impressive. Do you think you could post the generated assembly code for something simple like StackCheck, with and without LLVM generation? I wonder if it's improving the stack check itself or the dispatch.

Thanks,
Dan


manji...@gmail.com

Jan 12, 2020, 6:51:12 PM
to v8-dev
Hi Dan,
Here it is.
Both builds have the untrusted code mitigations disabled.
code.7z

Dan Elphick

Jan 13, 2020, 6:27:04 AM
to v8-...@googlegroups.com
Thanks for sending this. There are some really interesting results here.

For instance in StackCheck, TF generates this sequence:
0x4560805c    1c  e30f903d       movw r9, #61501
0x45608060    20  e34f9fff       movt r9, #65535
0x45608064    24  e08a9009       add r9, r10, r9
0x45608068    28  e5999000       ldr r9, [r9, #+0]

where LLVM generates:
0x455f61fc    1c  e51a0fc3       ldr r0, [r10, #-4035]

This is definitely something we could do better in TF.
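
To see why the two sequences are equivalent, note that the movw/movt pair builds the 32-bit constant 0xFFFFF03D, which is -4035 as a signed value, so a single ldr with an immediate offset can reach the same address. The following is only an illustrative JavaScript sketch: the helper names are made up for this example, and the range check assumes the ARM A32 LDR (immediate) encoding, whose 12-bit offset plus add/subtract bit covers -4095 to +4095.

```javascript
// Illustrative helpers (hypothetical names, not part of V8 or LLVM).

// Combine the movw (low half) and movt (high half) immediates into one
// signed 32-bit value, as the TF sequence does in r9.
function combineMovwMovt(low16, high16) {
  return ((high16 << 16) | low16) | 0; // "| 0" reinterprets as signed 32-bit
}

// Assumption: A32 LDR (immediate) takes a 12-bit offset plus a sign bit.
function fitsLdrImmediate(offset) {
  return Number.isInteger(offset) && Math.abs(offset) <= 4095;
}

const offset = combineMovwMovt(61501, 65535); // movw #61501, movt #65535
console.log(offset);                   // -4035
console.log(fitsLdrImmediate(offset)); // true, so ldr r0, [r10, #-4035] suffices
```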

We did notice some problems though in the generated code. For SetPendingMessage, TF generates:
0x45608140     0  e30015f1       movw r1, #1521
0x45608144     4  e08a1001       add r1, r10, r1
0x45608148     8  e5913000       ldr r3, [r1, #+0]
0x4560814c     c  e5810000       str r0, [r1, #+0]
0x45608150    10  e2855001       add r5, r5, #1
0x45608154    14  e7d61005       ldrb r1, [r6, +r5]
0x45608158    18  e7982101       ldr r2, [r8, +r1, lsl #2]
0x4560815c    1c  e1a00003       mov r0, r3
0x45608160    20  e12fff12       bx r2
0x45608164    24  e1a00000       mov r0, r0

whereas LLVM generates:

0x455f62c0     0  e2855001       add r5, r5, #1
0x455f62c4     4  e58a05f1       str r0, [r10, #+1521]
0x455f62c8     8  e7d60005       ldrb r0, [r6, +r5]
0x455f62cc     c  e7981100       ldr r1, [r8, +r0, lsl #2]
0x455f62d0    10  e59a05f1       ldr r0, [r10, #+1521]
0x455f62d4    14  e12fff11       bx r1

Again, this highlights how TF is not generating the most efficient instruction sequences for loads, but it looks like there's a problem in the LLVM code. It loads into the accumulator (r0) the value it has just stored, which means the accumulator is left unchanged by this bytecode. If you look at the original source, however, SetPendingMessage sets the accumulator to the previous pending message, which is what happens in the TF case.
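
The difference can be modelled in a few lines. This is a hypothetical JavaScript sketch, not V8 code: the state object, its field names, and both function names are invented for illustration, with pendingMessage standing in for the slot at [r10, #+1521] and accumulator standing in for r0.

```javascript
// Model of the TF sequence: read the old slot value first, then store.
function setPendingMessageTF(state) {
  const previous = state.pendingMessage;    // ldr r3, [r1, #+0]
  state.pendingMessage = state.accumulator; // str r0, [r1, #+0]
  state.accumulator = previous;             // mov r0, r3
  return state;
}

// Model of the LLVM sequence as posted: the load happens after the store.
function setPendingMessageLLVMBuggy(state) {
  state.pendingMessage = state.accumulator; // str r0, [r10, #+1521]
  state.accumulator = state.pendingMessage; // ldr r0, [r10, #+1521]
  return state;
}

const tf = setPendingMessageTF({ pendingMessage: "old", accumulator: "new" });
const llvm = setPendingMessageLLVMBuggy({ pendingMessage: "old", accumulator: "new" });
console.log(tf.accumulator);   // prints: old (the previous pending message)
console.log(llvm.accumulator); // prints: new (the accumulator is unchanged)
```

In both models the slot ends up holding the new message; only the value left in the accumulator differs.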

Dan


manji...@gmail.com

Jan 13, 2020, 8:40:50 AM
to v8-dev
Oh, you found a bug!
I will fix it tomorrow.
It is caused by a recent patch that makes LLVM think the root register points to a read-only memory region.

Also, a safepoint recording bug was fixed just today.

manji...@gmail.com

Jan 13, 2020, 8:49:16 AM
to v8-dev
LLVM originally generated the same worse code as V8. That was a bug; I made it recognize that a GetElementPtr in the IR can use a constant offset in (-4096, 4096) at zero cost.

manji...@gmail.com

Jan 13, 2020, 9:27:50 AM
to v8-dev
How did you find this bug so quickly?
Also note that the LLVM LdaNamedProperty is 1/3 smaller than the TF one, which should explain why it takes 1/3 less time.

Dan Elphick

Jan 13, 2020, 10:35:17 AM
to v8-...@googlegroups.com
It was in the builtin next to StackCheck and was quite short and so seemed worth looking at + the original interpreter authors were standing next to me.


manji...@gmail.com

Jan 13, 2020, 9:21:29 PM
to v8-dev
Here's the fixed code.
SetPendingMessage breaks the assumption that the root pointer refers to a read-only region.
The screen captures compare TurboFan and LLVM; LLVM still does better in most builtin functions.
LLVM's LdaNamedProperty still uses 31.79% less time than TurboFan's, and although the fixed version is 20 bytes larger than the old one, it is still 24.70% smaller than the TurboFan version.

I think V8 should make the memory region that the root pointer refers to read-only. That would enable LLVM to generate better code.
code_llvm.7z
2020-01-14 10-02-11 的屏幕截图.png
2020-01-14 10-02-10 的屏幕截图.png

manji...@gmail.com

Jan 14, 2020, 4:49:05 AM
to v8-dev
This code re-enables root region rematerialization; the previous version had a problem and disabled rematerialization.

I have not gotten my phone back yet, so it has not been tested.
code_llvm.7z

Dan Elphick

Jan 14, 2020, 5:28:27 AM
to v8-...@googlegroups.com
In both LLVM builtin dumps you posted, I noticed that EphemeronKeyBarrier is significantly shorter for LLVM: 80 bytes vs 348 for the original.

It looks like LLVM has optimised away several conditionals, including the check for fp_mode, instead assuming it's kSaveFPRegs. Since fp_mode is a parameter to the builtin, this does not seem like a safe optimisation.
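
For scale, the size difference can be expressed in instructions and as a percentage. This is a small JavaScript sketch, assuming fixed 4-byte A32 instructions (consistent with the 4-byte address steps in the listings earlier in this thread); the function name is hypothetical.

```javascript
const A32_INSTRUCTION_BYTES = 4; // assumption: ARM (A32) mode, not Thumb

// Summarize how much smaller one builtin is than another.
function compareBuiltinSizes(llvmBytes, originalBytes) {
  return {
    llvmInstructions: llvmBytes / A32_INSTRUCTION_BYTES,
    originalInstructions: originalBytes / A32_INSTRUCTION_BYTES,
    reductionPercent: ((originalBytes - llvmBytes) / originalBytes) * 100,
  };
}

const result = compareBuiltinSizes(80, 348);
console.log(result.llvmInstructions);            // 20
console.log(result.originalInstructions);        // 87
console.log(result.reductionPercent.toFixed(1)); // 77.0
```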


manji...@gmail.com

Jan 14, 2020, 10:24:34 AM
to v8-dev
I made the prologue save all floating-point registers that are not callee-saved under the C calling convention, so that all floating-point registers are preserved no matter what fp_mode is. It is definitely not the most efficient approach; it is a conservative solution. We could optimize the calls to RecordWrite in the code assembly phase, when floating-point usage can be determined.

Kurten Chan

Mar 5, 2020, 12:45:16 AM
to v8-dev
Hi,
Any progress on this investigation?

Our fast-start project may need this feature.

Dan Elphick

Mar 6, 2020, 4:29:34 AM
to v8-...@googlegroups.com
Hi there,

The tracking bug for this effort is https://bugs.chromium.org/p/v8/issues/detail?id=10023, which has some further discussion adding to what’s in this thread. We’ve made several changes to TurboFan generation in response to this and we’re always happy to accept patches or look at specific cases where our pipeline is deficient.

However at this point, we don't have the resources to commit to a project of this size. We're still open to the concept of using LLVM for builtin generation and would accept compelling patches, but to put this in context, we expect a change of this nature would incur significant costs in terms of maintaining patches to LLVM as well as greatly complicating the build processes for V8 and Chrome.

Thanks again for your impressive work and proving that this is possible!


manji...@gmail.com

Mar 8, 2020, 10:02:29 PM
to v8-dev
I think this message is meant for me.
OK, I understand. It is too complicated to involve another big project like LLVM. I have made a lot of modifications to it, including codegen/arch/analysis..., so the many people who own these different modules would have to get involved.
It is much simpler for a third party to handle this work.
I will continue this project in UC Browser, however. It has already shipped (RTM) in UC Browser and Taobao Mobile/Alipay Mobile with version 6.9, and I will continue to improve it in the upcoming version 7.8.
