[LLVMdev] [RFC] WebAssembly Backend

291 views
Skip to first unread message

Dan Gohman

unread,
Jun 17, 2015, 12:21:41 PM6/17/15
to llv...@cs.uiuc.edu
Hello all,

WebAssembly [0] its a new virtual ISA being designed to efficiently
run compiled code in web browsers and other things, starting with
C/C++, and eventually many other languages [1]. WebAssembly
distinguishes itself from other virtual ISAs with optimizations to
reduce download size and decode time, strong portability and
predictability invariants (for example, the base has no undefined
behavior in the C/C++ sense), and participation from several browser
vendors.

We're interested in developing and contributing an LLVM backend to
target this new ISA. There are many interesting technical aspects that
we’re excited to discuss with the LLVM community. Before we get
started though, we need to figure out how to do our development. Most
backends in LLVM were initially submitted in monolithic form and
developed incrementally thereafter. However, we have contributors from
multiple organizations, and a monolithic patch wouldn’t accurately
reflect the separate contributions.

Would the LLVM community be willing to let us start a new target from
scratch within the LLVM tree, following normal LLVM
incremental-development practices? The target would naturally start as
"experimental", excluded from the default build. The code organization
would look like any other backend, with everything under
lib/Target/WebAssembly except for various bits of configury that any
backend needs. We have need of the functionality provided by
SelectionDAG, MI and others, so this will pretty clearly be a backend,
rather than a specialized serialization. Also, the people leading the
project are JF Bastien and Dan Gohman, existing LLVM contributors
familiar with various relevant areas of LLVM.

Additionally, there are opportunities to refactor generic
infrastructure in LLVM to better support the needs of virtual ISAs,
including those in LLVM already and possibly more in the future.
Working in LLVM from the start would make collaboration with the rest
of the community easier.

We look forward to your feedback and questions. Thanks!

- The WebAssembly Community Group

[0] https://github.com/WebAssembly/design/blob/master/README.md
[1] https://github.com/WebAssembly/design/blob/master/HighLevelGoals.md

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Hal Finkel

unread,
Jun 17, 2015, 3:22:59 PM6/17/15
to Dan Gohman, llv...@cs.uiuc.edu

I agree, developing the backend in tree is the best path. Specifically, it is the best way for the community to monitor the development and ensure proper test coverage. Obviously, there are some number of files needed for a backend to be minimally functional, and I think that putting those together and submitting them for review as a new experimental backend is perfectly appropriate.

-Hal

>
> We look forward to your feedback and questions. Thanks!
>
> - The WebAssembly Community Group
>
> [0] https://github.com/WebAssembly/design/blob/master/README.md
> [1]
> https://github.com/WebAssembly/design/blob/master/HighLevelGoals.md
>
> _______________________________________________
> LLVM Developers mailing list
> LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chandler Carruth

unread,
Jun 17, 2015, 3:41:05 PM6/17/15
to Hal Finkel, Dan Gohman, llv...@cs.uiuc.edu
FWIW, I agree.

However, I also suggested this approach, so I'm probably biased. ;] I just wanted to explicitly say that I think this is a great path forward and I'm cautiously optimistic about the entire effort.

Chris Lattner

unread,
Jun 17, 2015, 6:00:00 PM6/17/15
to Dan Gohman, llv...@cs.uiuc.edu
On Jun 17, 2015, at 9:16 AM, Dan Gohman <dan4...@gmail.com> wrote:
> Would the LLVM community be willing to let us start a new target from
> scratch within the LLVM tree, following normal LLVM
> incremental-development practices? The target would naturally start as
> "experimental", excluded from the default build. The code organization
> would look like any other backend, with everything under
> lib/Target/WebAssembly except for various bits of configury that any
> backend needs. We have need of the functionality provided by
> SelectionDAG, MI and others, so this will pretty clearly be a backend,
> rather than a specialized serialization. Also, the people leading the
> project are JF Bastien and Dan Gohman, existing LLVM contributors
> familiar with various relevant areas of LLVM.

Sounds great to me,

-Chris

Philip Reames

unread,
Jun 17, 2015, 7:14:09 PM6/17/15
to Dan Gohman, llv...@cs.uiuc.edu
On 06/17/2015 09:16 AM, Dan Gohman wrote:
> Would the LLVM community be willing to let us start a new target from
> scratch within the LLVM tree, following normal LLVM
> incremental-development practices? The target would naturally start as
> "experimental", excluded from the default build. The code organization
> would look like any other backend, with everything under
> lib/Target/WebAssembly except for various bits of configury that any
> backend needs. We have need of the functionality provided by
> SelectionDAG, MI and others, so this will pretty clearly be a backend,
> rather than a specialized serialization. Also, the people leading the
> project are JF Bastien and Dan Gohman, existing LLVM contributors
> familiar with various relevant areas of LLVM.
+1. I see no real downside to this proposal.

> Additionally, there are opportunities to refactor generic
> infrastructure in LLVM to better support the needs of virtual ISAs,
> including those in LLVM already and possibly more in the future.
> Working in LLVM from the start would make collaboration with the rest
> of the community easier.
Can you list some of the areas you know are going to need attention?
What problems are you seeing? (It might be best to move this to it's
own thread or keep this very high level for now.)

Philip

Dan Gohman

unread,
Jun 17, 2015, 9:28:28 PM6/17/15
to Philip Reames, llv...@cs.uiuc.edu
On Wed, Jun 17, 2015 at 4:12 PM, Philip Reames
<list...@philipreames.com> wrote:
> On 06/17/2015 09:16 AM, Dan Gohman wrote:
>> Additionally, there are opportunities to refactor generic
>> infrastructure in LLVM to better support the needs of virtual ISAs,
>> including those in LLVM already and possibly more in the future.
>> Working in LLVM from the start would make collaboration with the rest
>> of the community easier.
>
> Can you list some of the areas you know are going to need attention? What
> problems are you seeing? (It might be best to move this to it's own thread
> or keep this very high level for now.)

We'll definitely be starting separate threads when we're ready, but
briefly, one of the things we'll need to look into is MC. Right now,
MC's interfaces appear to be tailored for native-binary-style
encodings, so if we're going to use MC, it'll likely need some
refactoring to generalize it.

Dan

Tom Stellard

unread,
Jun 18, 2015, 9:33:06 AM6/18/15
to Dan Gohman, llv...@cs.uiuc.edu

Hi,

This seems interesting, I have a few questions:


Has the ISA been finalized yet or is it still a work in progress? Will
there be a fixed number of registers?

How will the ISA be transformed to machine code?

Why do you want to develop a full backend as opposed to a simple LLVM
IR translation pass that converts IR directly to WebAssembly?

-Tom

JF Bastien

unread,
Jun 18, 2015, 9:49:32 AM6/18/15
to Tom Stellard, llv...@cs.uiuc.edu
This seems interesting, I have a few questions:


Has the ISA been finalized yet or is it still a work in progress?  Will
there be a fixed number of registers?

The design document has a high-level idea of the ISA, or rather of the AST we're thinking of going with:
The final encoding isn't decided on, we're still missing experiments to figure out precise details.

We foresee having an infinite number of locals per function, but we plan to pre-color them so that locals whose lifetimes don't interfere can be merged. We can get clever and do this in an interesting order.


How will the ISA be transformed to machine code?

That's implementation dependent. Initially, a polyfill to JavaScript because that's what exists today. We'll also implement a reference interpreter to validate the spec. Each browser can do what it wants to have fast and secure native support.


Why do you want to develop a full backend as opposed to a simple LLVM
IR translation pass that converts IR directly to WebAssembly?

Because that's proven to have unfortunate shortcomings in both PNaCl and Emscripten. Doing a true backend has significant advantages including in amount of code needed, and in what e.g. ISel can do for us for legalization and clever instruction selection.

Dan Gohman

unread,
Jun 19, 2015, 10:36:50 AM6/19/15
to Hal Finkel, llv...@cs.uiuc.edu
On Wed, Jun 17, 2015 at 12:20 PM, Hal Finkel <hfi...@anl.gov> wrote:
> I agree, developing the backend in tree is the best path. Specifically, it is the best way for the community to monitor the development and ensure proper test coverage. Obviously, there are some number of files needed for a backend to be minimally functional, and I think that putting those together and submitting them for review as a new experimental backend is perfectly appropriate.

For people following this thread, I've now posted such a patch here:

http://reviews.llvm.org/D10569

Dan

Jakob Stoklund Olesen

unread,
Jun 19, 2015, 10:53:31 AM6/19/15
to JF Bastien, LLVM Developers Mailing List

On Jun 18, 2015, at 6:47 AM, JF Bastien <j...@google.com> wrote:

We foresee having an infinite number of locals per function, but we plan to pre-color them so that locals whose lifetimes don't interfere can be merged. We can get clever and do this in an interesting order.

The NVPTX target does this with the existing register allocator by defining 800 physical registers of each type. It means you have to deal with unwanted spill code for those programs that exhaust the 800 registers.

The greedy register allocator doesn’t scale too well along that axis IMO. I think you get something like O(#vregs x #colors-used) behavior. It’s really designed to handle a small, fixed number of physical registers.

Matthias, Quentin: How well do the SSA-based register allocator algorithms work with infinite colors available?

Thanks,
/jakob

Quentin Colombet

unread,
Jun 19, 2015, 12:56:13 PM6/19/15
to Jakob Stoklund Olesen, LLVM Developers Mailing List
The SSA-based register allocators work in linear time of the number of instructions.

Cheers,
Q.


Thanks,
/jakob

Owen Anderson

unread,
Jun 19, 2015, 12:59:03 PM6/19/15
to Jakob Stoklund Olesen, LLVM Developers Mailing List
Hi Jakob,

On Jun 19, 2015, at 7:49 AM, Jakob Stoklund Olesen <stok...@2pi.dk> wrote:

Matthias, Quentin: How well do the SSA-based register allocator algorithms work with infinite colors available?

The chordal coloring algorithm, at least, would be a perfect fit here.  It finds the minimal coloring without pre-determined bound.

—Owen

Quentin Colombet

unread,
Jun 19, 2015, 1:07:41 PM6/19/15
to Jakob Stoklund Olesen, LLVM Developers Mailing List
For those interested in figures, check out:
http://q-colo01.appspot.com/thesis_slides.pdf slides 33/40 for high level information.
http://q-colo01.appspot.com/thesis.pdf 5.5.2.1 page 109, for more details.

Q.


Cheers,
Q.


Thanks,
/jakob

Matthias Braun

unread,
Jun 19, 2015, 1:45:44 PM6/19/15
to Quentin Colombet, LLVM Developers Mailing List
If you don't have any register constraints SSA coloring boils down to: "Walk the program in dominance order and assign a fresh register to each def and release the register on a last use". However you need a way to implement register swaps for phi instructions afterwards although with infinite temp registers this is trivial.
I'd definitely recommend implementing a simple allocator from scratch, maybe reusing the liveness calculation, as you don't need all the tricks necessary to deal with register constraints, two-address instructions, precolored registers, spill code placement, ...

- Matthias

On Jun 19, 2015, at 9:50 AM, Quentin Colombet <qcol...@apple.com> wrote:

Stephen Cross

unread,
Jun 19, 2015, 6:04:14 PM6/19/15
to JF Bastien, llv...@cs.uiuc.edu
Hi,

I've read the documentation (looks very interesting!) and have some
questions; apologies if these are answered elsewhere.

> That's implementation dependent. Initially, a polyfill to JavaScript because
> that's what exists today. We'll also implement a reference interpreter to
> validate the spec. Each browser can do what it wants to have fast and secure
> native support.

Is there likely to be a desire at some point to produce an LLVM
frontend (i.e. to convert from WebAssembly to LLVM IR), for example
for compiling WebAssembly to native binaries? Furthermore, does it
seem likely that any of the browsers would use LLVM on the client side
for optimisation or JIT execution? Based on the comparison against
LLVM IR, I assume the model is to use WebAssembly as a portable means
of communicating programs but LLVM IR (or some other IR) for
performing optimisations and other transformations (reminds me of this
discussion: http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043719.html
).

I'm also wondering about the C ABI for WebAssembly. It sounds like
this currently isn't specified because dynamic linking isn't supported
yet, but presumably front-ends should be working on some kind of
placeholder standard to avoid later incompatibility? And won't the Web
APIs need to have some sort of ABI?

I noticed the C and C++ page says pointers are 32-bits, but there will
also ultimately be a 64-bit variant. Is there any reason to not just
start off with 64-bit sized pointers and initially restrict usage to a
32-bit sized range?

In the post-MVP page it mentions 128-bit vectors; why not allow
arbitrary (fixed-size) vectors and then have these lowered in an
appropriate way on the target?

Thanks,
Stephen

Dan Gohman

unread,
Jun 19, 2015, 7:05:56 PM6/19/15
to Stephen Cross, llv...@cs.uiuc.edu
On Fri, Jun 19, 2015 at 2:57 PM, Stephen Cross <scr...@scross.co.uk> wrote:
> Hi,
>
> I've read the documentation (looks very interesting!) and have some
> questions; apologies if these are answered elsewhere.
>
>> That's implementation dependent. Initially, a polyfill to JavaScript because
>> that's what exists today. We'll also implement a reference interpreter to
>> validate the spec. Each browser can do what it wants to have fast and secure
>> native support.
>
> Is there likely to be a desire at some point to produce an LLVM
> frontend (i.e. to convert from WebAssembly to LLVM IR), for example
> for compiling WebAssembly to native binaries? Furthermore, does it
> seem likely that any of the browsers would use LLVM on the client side
> for optimisation or JIT execution? Based on the comparison against
> LLVM IR, I assume the model is to use WebAssembly as a portable means
> of communicating programs but LLVM IR (or some other IR) for
> performing optimisations and other transformations (reminds me of this
> discussion: http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043719.html
> ).

Yes, it is likely someone will have this desire at some point. Such an
effort would be an entirely separate project from this WebAssembly
backend project though. To put things in perspective though,
WebAssembly is lower-level than LLVM IR. Round-tripping from LLVM IR
down to WebAssembly and back to LLVM IR would be lossy.

>
> I'm also wondering about the C ABI for WebAssembly. It sounds like
> this currently isn't specified because dynamic linking isn't supported
> yet, but presumably front-ends should be working on some kind of
> placeholder standard to avoid later incompatibility? And won't the Web
> APIs need to have some sort of ABI?

In the MVP, WebAssembly won't be able to talk to the Web APIs
directly, and applications will include a layer of JS which the
WebAssembly code can call into to do the work for it, similar to how
asm.js works today. In the future, the plan is for that layer to
become unnecessary. ABIs will be very important -- so important, that
we want to avoid getting stuck too early :-).

> I noticed the C and C++ page says pointers are 32-bits, but there will
> also ultimately be a 64-bit variant. Is there any reason to not just
> start off with 64-bit sized pointers and initially restrict usage to a
> 32-bit sized range?

Requiring 64-bit pointers when only 32 bits of them are used would
waste a lot of memory, which is not a cheap commodity in some of the
environments WebAssembly is targeting.

> In the post-MVP page it mentions 128-bit vectors; why not allow
> arbitrary (fixed-size) vectors and then have these lowered in an
> appropriate way on the target?

128 bits is a good place to start, and it leaves room for adding more
things in the future.

Dan

Ronan KERYELL

unread,
Jun 19, 2015, 9:36:05 PM6/19/15
to Dan Gohman, llv...@cs.uiuc.edu
>>>>> On Fri, 19 Jun 2015 15:57:26 -0700, Dan Gohman <dan4...@gmail.com> said:

Dan> Yes, it is likely someone will have this desire at some
Dan> point. Such an effort would be an entirely separate project
Dan> from this WebAssembly backend project though. To put things in
Dan> perspective though, WebAssembly is lower-level than LLVM
Dan> IR. Round-tripping from LLVM IR down to WebAssembly and back to
Dan> LLVM IR would be lossy.

This looks like déjà vu with the current discussion on SPIR-V in another
thread. :-)

At some point any virtual IR (pick yours: SPIR, SPIR-V, PTX, AIR,
HSA...) could benefit from having some generic support for this in
LLVM...
--
Ronan KERYELL

Reply all
Reply to author
Forward
0 new messages