Emscripten Architecture for emitting Wasm


René N.

Jun 2, 2023, 12:25:54 PM
to emscripten-discuss
Hello guys,

For a uni paper I need to explain (at a high level) how Emscripten works to emit a WebAssembly binary in the end.
I'm new to compilers and toolchains, and I'm not sure I correctly understand how Emscripten converts compatible code (like C) to a WebAssembly binary.

If I got it right, Emscripten uses these tools to create .wasm:
  • emsdk as configurator for the whole toolchain
  • emcc (which includes Clang+LLVM)
  • the (upstream) LLVM WebAssembly Backend
  • Binaryen
So, if I got it right, the compilation to .wasm works like this:
  1. C-Code -> LLVM IR
  2. LLVM IR -> LLVM IR (optimized)
  3. LLVM IR (optimized) -> Wasm Binary
  4. Wasm Binary -> Wasm Binary (optimized)
  5. (Wasm Binary (optimized) -> JS) [optional]
Did I forget something?

What I am especially unsure of, is, which tool is doing what:
So emcc uses Clang+LLVM. Now I'm not sure whether emcc only emits LLVM IR (since Clang creates LLVM IR), or whether it also converts that IR into a Wasm binary (which would mean that the upstream LLVM Wasm backend lives inside emcc).
What I'm quite sure of, is that step 1 is done by Clang and steps 4 & 5 are done by Binaryen.

Also, it seems to me that 'emcc' has two different meanings: 1. a part(!!) of the compilation process, and 2. the command that represents the whole toolchain.

Also, which component/tool of Emscripten creates the JS and HTML glue code?

I'd be happy if someone could help me out!

Sam Clegg

Jun 2, 2023, 1:00:21 PM
to emscripte...@googlegroups.com
Hi René,

Great questions. It sounds like you have a pretty good understanding of the various phases. I will reiterate your list, filling in a few details for you. At a high level, emcc is the compiler driver rather than the compiler itself. gcc and clang both take this role too, and under the hood both clang and gcc fork separate processes for the actual compiling and linking.

emcc ->
   1. clang.exe: C-Code -> LLVM IR 
   2. clang.exe: LLVM IR -> LLVM IR (optimized) (clang/llvm) (this really happens as part of step 1 when you build with optimizations enabled)
   3. clang.exe: LLVM IR -> wasm object file
   4. wasm-ld.exe: combine files -> Wasm Binary
   5. wasm-opt.exe: Wasm Binary -> Optimized Wasm Binary (optional)
   6. emcc.py: Generate JS wrapper code (optional)
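To make the division of labor concrete, here is a minimal Python sketch of the six steps above, written as the list of invocations a driver would issue. It is an illustration under simplifying assumptions, not emcc's actual logic: `driver_steps` and `Step` are hypothetical names, the flags are pared down, and the real emcc.py passes many more arguments (sysroot, system libraries, linker settings). Running `emcc -v` prints the sub-commands it really executes.

```python
from typing import List, Tuple

# (tool that performs the step, arguments passed to it)
Step = Tuple[str, List[str]]

TARGET = "--target=wasm32-unknown-emscripten"


def driver_steps(sources: List[str], output: str, optimize: bool) -> List[Step]:
    """List, in order, the simplified steps a driver like emcc performs.

    Hypothetical illustration: it only builds argument lists and runs nothing.
    """
    opt_flag = "-O2" if optimize else "-O0"
    steps: List[Step] = []
    objects: List[str] = []
    # Steps 1-3: one clang process per source file parses C into LLVM IR,
    # optimizes the IR, and runs the LLVM WebAssembly backend to write a
    # wasm object file.
    for src in sources:
        obj = src.rsplit(".", 1)[0] + ".o"
        steps.append(("clang", [TARGET, opt_flag, "-c", src, "-o", obj]))
        objects.append(obj)
    # Step 4: wasm-ld (an LLVM tool) links the objects into one Wasm binary.
    wasm = output.rsplit(".", 1)[0] + ".wasm"
    steps.append(("wasm-ld", [*objects, "-o", wasm]))
    # Step 5: Binaryen's wasm-opt optimizes the linked binary in place.
    if optimize:
        steps.append(("wasm-opt", [opt_flag, wasm, "-o", wasm]))
    # Step 6: emcc.py writes the JS glue itself; no separate tool is involved.
    steps.append(("emcc.py", ["write JS glue to", output]))
    return steps


for tool, args in driver_steps(["hello.c"], "hello.js", optimize=True):
    print(tool, " ".join(args))
```

The point of the sketch is the shape: only steps 1 to 5 spawn external tools, and step 5 is skipped for an unoptimized build.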



René N.

Jun 3, 2023, 2:40:13 AM
to emscripten-discuss
Thank you for answering, Sam! I'm very glad to get feedback this quickly.

So if I understand it correctly, emcc is doing the whole C -> Wasm Binary since it's the driver of the whole toolchain, right?
I think I am confused because emcc is called "Emscripten Compiler Frontend". So I thought that emcc is just one part of the compilation process, since to me a "frontend" is the component of a compiler that takes source code and generates the IR, while the backend takes the IR and generates the target language.

Also I would like to ask: where does the "(upstream) LLVM WebAssembly Backend" come into play? It should be in step 3 (LLVM IR -> wasm object file), right?

Regarding steps 4 & 5: while wasm-opt is a Binaryen tool, wasm-ld is an LLVM tool, right? I'm asking because Emscripten's website says that "Emscripten's WebAssembly support depends on Binaryen". This statement seems wrong, since a (non-optimized) Wasm binary is generated without Binaryen.

Regarding step 6: emcc.py creates JS wrapper code if needed; does it also (optionally) create the HTML code?


Thanks again for helping me understand Emscripten and toolchains in general!

Sam Clegg

Jun 4, 2023, 10:46:48 PM
to emscripte...@googlegroups.com
On Fri, Jun 2, 2023 at 11:40 PM René N. <withe...@gmail.com> wrote:
> Thank you for answering, Sam! I'm very glad to get feedback this quickly.
>
> So if I understand it correctly, emcc is doing the whole C -> Wasm Binary since it's the driver of the whole toolchain, right?
> I think I am confused because emcc is called "Emscripten Compiler Frontend". So I thought that emcc is just one part of the compilation process, since to me a "frontend" is the component of a compiler that takes source code and generates the IR, while the backend takes the IR and generates the target language.

You are right, it is not quite correct to call emcc a frontend, since it doesn't actually parse any code. clang is both a compiler driver and a C frontend.

> Also I would like to ask: where does the "(upstream) LLVM WebAssembly Backend" come into play? It should be in step 3 (LLVM IR -> wasm object file), right?

Yes.
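One way to see where the backend sits is to stop clang at different points. The Python sketch below only builds three clang command lines for the wasm32 Emscripten target: with `-S -emit-llvm`, clang stops after producing LLVM IR and the backend never runs, while `-S` alone and `-c` both run the LLVM WebAssembly backend (step 3), printing wasm assembly text or writing a wasm object file respectively. `clang_args` is a hypothetical helper; run directly like this, clang would also need Emscripten's sysroot for standard headers, which emcc normally supplies.

```python
from typing import List

TARGET = "--target=wasm32-unknown-emscripten"


def clang_args(src: str, stage: str) -> List[str]:
    """Build a clang command that stops after a given stage (hypothetical helper)."""
    base = src.rsplit(".", 1)[0]
    if stage == "ir":
        # Frontend (plus optimizer if -O is given) only: textual LLVM IR,
        # the WebAssembly backend is not run.
        return ["clang", TARGET, "-S", "-emit-llvm", src, "-o", base + ".ll"]
    if stage == "asm":
        # The WebAssembly backend runs and prints wasm assembly text.
        return ["clang", TARGET, "-S", src, "-o", base + ".s"]
    if stage == "object":
        # The WebAssembly backend runs and writes a wasm object file (step 3).
        return ["clang", TARGET, "-c", src, "-o", base + ".o"]
    raise ValueError(f"unknown stage: {stage!r}")


print(" ".join(clang_args("hello.c", "ir")))
print(" ".join(clang_args("hello.c", "object")))
```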
 

> Regarding steps 4 & 5: while wasm-opt is a Binaryen tool, wasm-ld is an LLVM tool, right? I'm asking because Emscripten's website says that "Emscripten's WebAssembly support depends on Binaryen". This statement seems wrong, since a (non-optimized) Wasm binary is generated without Binaryen.

Historically it has been true that binaryen was always required, but fairly recently we have made it possible to perform complete debug builds without binaryen (by moving a lot of things that binaryen used to do into Python code in emscripten). The documentation is a little inaccurate there. However, to build anything for production (i.e., any release build) binaryen is required, so I think we can still think of it as a required dependency of emscripten. Binaryen is also required in some cases even for debug builds (for example, if you use asyncify there is no way to avoid it).
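Sam's rule of thumb can be written as a tiny predicate. This is a simplified model of the statement above, not emscripten's actual decision logic: `needs_binaryen` is a hypothetical name, and treating every optimized (`-O1` or higher) build as a "release build" is an assumption. emcc's real conditions are more numerous and vary between versions.

```python
def needs_binaryen(opt_level: int, asyncify: bool = False) -> bool:
    """Rough model of when a build needs Binaryen (hypothetical helper).

    Assumes any optimized build (-O1 or higher) counts as a release build.
    """
    # Asyncify is implemented as a Binaryen pass, so it always needs wasm-opt.
    if asyncify:
        return True
    # Release builds run wasm-opt; plain debug (-O0) builds can skip it.
    return opt_level >= 1


print(needs_binaryen(0), needs_binaryen(2), needs_binaryen(0, asyncify=True))
```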


> Regarding step 6: emcc.py creates JS wrapper code if needed; does it also (optionally) create the HTML code?

Yes it can optionally create both JS and HTML.
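Which wrapper files emcc writes is driven mainly by the extension of the `-o` argument. The Python sketch below summarizes the common default behavior: `-o hello.html` produces an HTML page plus JS glue plus the `.wasm` file, `-o hello.js` produces JS glue plus `.wasm`, and `-o hello.wasm` produces a standalone Wasm binary without JS. `emitted_files` is a hypothetical helper, and options such as `-sSINGLE_FILE` (which embeds the Wasm in the JS) or `--shell-file` (a custom HTML template) change the details.

```python
import os
from typing import List


def emitted_files(output: str) -> List[str]:
    """Files emcc typically writes for a given -o target (hypothetical helper)."""
    stem, ext = os.path.splitext(output)
    if ext == ".html":
        # HTML shell page, plus the JS glue it loads, plus the Wasm binary.
        return [output, stem + ".js", stem + ".wasm"]
    if ext == ".js":
        # JS glue plus the Wasm binary it loads; no HTML page.
        return [output, stem + ".wasm"]
    if ext == ".wasm":
        # Standalone Wasm binary with no JS glue.
        return [output]
    raise ValueError(f"unsupported output extension: {ext!r}")


print(emitted_files("hello.html"))
```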
 

René N.

Jun 26, 2023, 8:05:54 AM
to emscripten-discuss
Hi Sam,
please excuse my late response. Thank you for all your answers! This helped me so much.

From all the information you gave me, I tried to make a diagram summarizing how Emscripten turns C into Wasm. See the picture below. Did I get it right now?
[Image: Emscripten C-to-Wasm Sketch.png]

I am wondering whether this information is documented somewhere. Since I need it for my uni paper, I have to list my sources and refer to "official" documents.

Sam Clegg

Jun 26, 2023, 2:12:29 PM
to emscripte...@googlegroups.com
On Mon, Jun 26, 2023 at 5:05 AM René N. <withe...@gmail.com> wrote:
> Hi Sam,
> please excuse my late response. Thank you for all your answers! This helped me so much.
>
> From all the information you gave me, I tried to make a diagram summarizing how Emscripten turns C into Wasm. See the picture below. Did I get it right now?

Your picture looks about right to me, yes. I assume the first diagram is the expanded form of the second box of the second diagram? I don't know that we have any specific documentation about these internals, but I could be wrong.

 

René N.

Jun 27, 2023, 12:55:29 PM
to emscripten-discuss
Kind of. In the first diagram I tried to show what happens in "Clang+LLVM", which is shown as the first box (named "Clang+LLVM") in the second diagram. Does it show anything incorrectly?
It would be a pity if the toolchain were not clearly documented in this form. It would bring clarity to the tools used by Emscripten and their interrelationships.

Do you work "for" Emscripten (I know, it's open source)? Maybe I'll just refer to this discussion as an "expert interview" to prove the source lol.

Sam Clegg

Jun 27, 2023, 1:20:04 PM
to emscripte...@googlegroups.com
On Tue, Jun 27, 2023 at 9:55 AM René N. <withe...@gmail.com> wrote:
> Kind of. In the first diagram I tried to show what happens in "Clang+LLVM", which is shown as the first box (named "Clang+LLVM") in the second diagram. Does it show anything incorrectly?

That sounds right, yes. That is what I meant by "I assume the first diagram is the expanded form of the second box of the second diagram". I was counting the upper-left "emcc-compatible source language" box as the first box.

In other words, your diagram looks correct to me.


> It would be a pity if the toolchain were not clearly documented in this form. It would bring clarity to the tools used by Emscripten and their interrelationships.
>
> Do you work "for" Emscripten (I know, it's open source)? Maybe I'll just refer to this discussion as an "expert interview" to prove the source lol.

I work for Google, but a lot of the work I do is on emscripten (and llvm and other tools).
 