Generic LLVM toolchain, PNaCl and NaCl ABI

Andrew Parker

Jun 18, 2015, 5:51:50 AM
to native-cli...@googlegroups.com
I've not spent much time with the PNaCl tooling, so I only understand PNaCl and NaCl at a high(ish) level. I thus don't have a good working knowledge of the toolchain, so some of my questions may have obvious, often-repeated answers. Apologies in advance if this is the case, and please redirect me with extreme prejudice (links are very welcome, of course).

I'd like to know whether the PNaCl toolchain is now an entirely separate fork from the LLVM mainline or whether it is compatible with it (or perhaps already integrated).  I'm aware that PNaCl uses a frozen version of LLVM IR which is undoubtedly different from today's mainline version. Furthermore, I'm also aware of the significant ABI differences between native code compiled by a "standard" (read LLVM mainline) native compiler and the PNaCl backend for emitting native code.  What I'd like to understand is a couple of things:

- Does the PNaCl IR encode the sandbox ABI differences or are these determined entirely by the native compiler (this is more for personal interest/understanding)?
- Is it possible/feasible to convert mainline LLVM IR to NaCl native code?
- Is it possible/feasible to convert mainline LLVM IR to PNaCl IR?

Feasible in the last two questions means that with a "sensible" amount of effort I could compile an LLVM toolchain to do this.

My guess for the 2nd and 3rd questions is no, but I wanted to confirm with experts.

Hope that's not too vague!
Thanks

Victor Khimenko

Jun 18, 2015, 10:02:45 AM
to Native Client Discuss
On Thu, Jun 18, 2015 at 12:51 PM, Andrew Parker <andrew.j...@gmail.com> wrote:

I'd like to know whether the PNaCl toolchain is now an entirely separate fork from the LLVM mainline or whether it is compatible with it (or perhaps already integrated).  I'm aware that PNaCl uses a frozen version of LLVM IR which is undoubtedly different from today's mainline version. Furthermore, I'm also aware of the significant ABI differences between native code compiled by a "standard" (read LLVM mainline) native compiler and the PNaCl backend for emitting native code.  What I'd like to understand is a couple of things:

- Does the PNaCl IR encode the sandbox ABI differences or are these determined entirely by the native compiler (this is more for personal interest/understanding)?

I'm not sure what you are asking about, sorry. The whole idea behind PNaCl is to provide a portable ABI.
 
- Is it possible/feasible to convert mainline LLVM IR to NaCl native code?
- Is it possible/feasible to convert mainline LLVM IR to PNaCl IR?

No for both. LLVM uses some knowledge of the target platform at pretty early stages of the compilation process. This makes the IR platform-dependent, which means it is no longer portable. I guess some limited forms could be converted (no "struct" function parameters, etc.).
 

--
You received this message because you are subscribed to the Google Groups "Native-Client-Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to native-client-di...@googlegroups.com.
To post to this group, send email to native-cli...@googlegroups.com.
Visit this group at http://groups.google.com/group/native-client-discuss.
For more options, visit https://groups.google.com/d/optout.

Mark Seaborn

Jun 18, 2015, 11:48:40 AM
to Native Client Discuss
On 18 June 2015 at 07:02, Victor Khimenko <kh...@chromium.org> wrote:
On Thu, Jun 18, 2015 at 12:51 PM, Andrew Parker <andrew.j...@gmail.com> wrote:

I'd like to know whether the PNaCl toolchain is now an entirely separate fork from the LLVM mainline or whether it is compatible with it (or perhaps already integrated).

It is a branch, not a fork.  We merge upstream LLVM into the PNaCl branch fairly regularly.

You can inspect the current merge state by looking at the Git history:

Are you asking what parts of (P)NaCl's functionality are in upstream LLVM and which aren't?

 
I'm aware that PNaCl uses a frozen version of LLVM IR which is undoubtedly different from today's mainline version.

It's important to distinguish between:
 * IR -- the language (which can be represented in memory or serialised to a text or binary file)
 * bitcode format -- one way of serialising IR to a file

PNaCl's bitcode format is a frozen, simplified version of mainline LLVM's bitcode format.  PNaCl's bitcode format represents a subset of LLVM IR.

Note that the PNaCl toolchain is capable of reading and writing both mainline LLVM bitcode and PNaCl bitcode, and it can convert between the two.

 
Furthermore, I'm also aware of the significant ABI differences between native code compiled by a "standard" (read LLVM mainline) native compiler and the PNaCl backend for emitting native code.  What I'd like to understand is a couple of things:

- Does the PNaCl IR encode the sandbox ABI differences or are these determined entirely by the native compiler (this is more for personal interest/understanding)?

By "sandbox ABI" do you mean "how jumps, returns and memory accesses are sandboxed"?  Those details are handled by the backend, and the IR is agnostic about those details.
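To make "handled by the backend" a bit more concrete, here is a minimal sketch that models, as plain C arithmetic, the kind of masking a NaCl x86 backend places in front of an indirect jump. It assumes NaCl's 32-byte instruction bundles on x86 and models only the alignment and base-relative parts; the real sandbox also depends on segment limits (x86-32), a reserved base register (x86-64) and a code validator. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: NaCl on x86 groups code into 32-byte bundles. */
#define NACL_BUNDLE_SIZE 32u

/* x86-32 style: clear the low bits so an indirect jump can only land on a
 * bundle boundary (bounds are enforced separately by segment limits). */
uint32_t sandbox_jump_x86_32(uint32_t target) {
    return target & ~(NACL_BUNDLE_SIZE - 1u);
}

/* x86-64 style: truncate the target to 32 bits, align it to a bundle, then
 * add the sandbox base (kept in a reserved register by the real toolchain),
 * so the result always falls inside the 4 GiB region starting at 'base'. */
uint64_t sandbox_jump_x86_64(uint64_t base, uint64_t target) {
    uint32_t offset = (uint32_t)target & ~(NACL_BUNDLE_SIZE - 1u);
    return base + offset;
}
```

None of this needs to appear in the IR: the same IR function simply gets whichever masking sequence the selected backend emits.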
 
 
- Is it possible/feasible to convert mainline LLVM IR to NaCl native code?

Yes.  The PNaCl toolchain can read mainline LLVM IR.

 
- Is it possible/feasible to convert mainline LLVM IR to PNaCl IR?

Yes.  The PNaCl toolchain does this at link time by running a set of IR simplification passes.
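As a rough, source-level analogy for what "simplifying" IR toward a portable ABI can mean, the C below contrasts a function that takes a struct by value (whose lowering depends on each target's calling convention) with a hand-lowered equivalent that takes a pointer and makes the copy explicit. This is a hypothetical illustration with made-up names, not the output of any actual PNaCl pass.

```c
#include <stdint.h>

struct pair {
    int64_t a;
    int64_t b;
};

/* By-value form: how 'p' travels (registers, stack, or a hidden copy)
 * is decided by each target's C calling convention. */
int64_t pair_sum_byval(struct pair p) {
    return p.a + p.b;
}

/* Lowered form: a single pointer argument plus an explicit local copy,
 * which is far simpler for every target to agree on. */
int64_t pair_sum_lowered(const struct pair *p) {
    struct pair local = *p; /* explicit copy replaces the implicit by-value copy */
    return local.a + local.b;
}
```

Both functions compute the same result; only the second avoids relying on target-specific rules for passing aggregates.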

 
No for both. LLVM uses some knowledge of the target platform at pretty early stages of the compilation process. This makes the IR platform-dependent, which means it is no longer portable. I guess some limited forms could be converted (no "struct" function parameters, etc.).

You're referring to how the more complex arch-specific C/C++ calling conventions -- such as for passing structs by value -- are handled in LLVM.  If you want ABI compatibility with those calling conventions, then you have to tell the front end (Clang) to target specific architectures such as x86-32 or ARM, and generate arch-specific IR.  That is independent of how NaCl sandboxing is done, though.  NaCl sandboxing does not require arch-specific IR.

Cheers,
Mark

Andrew Parker

Jul 2, 2015, 3:51:31 AM
to native-cli...@googlegroups.com, msea...@chromium.org
Thanks for the replies and sorry about the slow reply on my behalf.  I've been caught up with a bunch of things and my attention was drawn away from this.  I appreciate the quick responses.  Some thoughts below.


On Thursday, 18 June 2015 23:48:40 UTC+8, Mark Seaborn wrote:
On 18 June 2015 at 07:02, Victor Khimenko <kh...@chromium.org> wrote:
On Thu, Jun 18, 2015 at 12:51 PM, Andrew Parker <andrew.j...@gmail.com> wrote:

I'd like to know whether the PNaCl toolchain is now an entirely separate fork from the LLVM mainline or whether it is compatible with it (or perhaps already integrated).

It is a branch, not a fork.  We merge upstream LLVM into the PNaCl branch fairly regularly.

You can inspect the current merge state by looking at the Git history:

Are you asking what parts of (P)NaCl's functionality are in upstream LLVM and which aren't?

Yes, that was essentially my question. I think it's probably best to leave this point for now, as I don't have a deep enough understanding of the toolchain to fully appreciate what changes have been made on PNaCl's branch.
 
I'm aware that PNaCl uses a frozen version of LLVM IR which is undoubtedly different from today's mainline version.

It's important to distinguish between:
 * IR -- the language (which can be represented in memory or serialised to a text or binary file)
 * bitcode format -- one way of serialising IR to a file

PNaCl's bitcode format is a frozen, simplified version of mainline LLVM's bitcode format.  PNaCl's bitcode format represents a subset of LLVM IR.

This is slightly confusing. You make a point of distinguishing between IR and bitcode, then compare PNaCl's bitcode format with IR. Can I assume you mean that PNaCl's IR language is a subset of LLVM's IR language? Also, is it genuinely a subset, i.e. are there no additions in the PNaCl language that aren't in LLVM IR?
 
Note that the PNaCl toolchain is capable of reading and writing both mainline LLVM bitcode and PNaCl bitcode, and it can convert between the two.

 
Furthermore, I'm also aware of the significant ABI differences between native code compiled by a "standard" (read LLVM mainline) native compiler and the PNaCl backend for emitting native code.  What I'd like to understand is a couple of things:

- Does the PNaCl IR encode the sandbox ABI differences or are these determined entirely by the native compiler (this is more for personal interest/understanding)?

By "sandbox ABI" do you mean "how jumps, returns and memory accesses are sandboxed"?  Those details are handled by the backend, and the IR is agnostic about those details.

Yes, precisely those sorts of things.  I was curious to know whether any mods had been made to the IR to make the sandboxing easier.   
 
 
- Is it possible/feasible to convert mainline LLVM IR to NaCl native code?

Yes.  The PNaCl toolchain can read mainline LLVM IR.

This seems to contradict Victor's response in which he states:

No for both. LLVM uses some knowledge of the target platform at pretty early stages of the compilation process. This makes the IR platform-dependent, which means it is no longer portable. I guess some limited forms could be converted (no "struct" function parameters, etc.).

Can you clarify? I suspect that maybe both of you are correct. Perhaps PNaCl is capable of understanding the full LLVM language, but platform dependencies mean the code would simply be incompatible. For example, sizeof(void*) in C++ code compiled with a generic clang would produce a value that depends on the target's bitness. There are similar problems with endianness and lord knows what else. Is this the crux of the matter? If so, how does PNaCl handle this sort of thing? Using the example, how would PNaCl handle sizeof(void*)? Actually, having just read your final point below, I think my understanding here is probably correct. I would guess this makes it pretty difficult to reuse code generated by a different front end.

Andrew Parker

Jul 2, 2015, 3:54:51 AM
to native-cli...@googlegroups.com
Thanks for the response.  I think everything is covered in my response to Mark so I'll save confusion and not answer any more inline here :)

--
You received this message because you are subscribed to a topic in the Google Groups "Native-Client-Discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/native-client-discuss/IYwj5eECFVw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to native-client-di...@googlegroups.com.

Victor Khimenko

Jul 2, 2015, 5:07:11 AM
to Native Client Discuss, Mark Seaborn
On Thu, Jul 2, 2015 at 10:51 AM, Andrew Parker <andrew.j...@gmail.com> wrote:
This seems to contradict Victor's response in which he states:

No for both. LLVM uses some knowledge of the target platform at pretty early stages of the compilation process. This makes the IR platform-dependent, which means it is no longer portable. I guess some limited forms could be converted (no "struct" function parameters, etc.).

Can you clarify? I suspect that maybe both of you are correct. Perhaps PNaCl is capable of understanding the full LLVM language, but platform dependencies mean the code would simply be incompatible.

As was explained, the IR used by PNaCl is a subset of the IR used by LLVM, which means that some constructs used by upstream LLVM can't be understood by PNaCl. Also, it's a frozen subset: LLVM developers are not obliged to keep all the bits and pieces of IR used by PNaCl in LLVM. That means that, at least potentially, LLVM could lose the ability to understand the PNaCl format. It's not known whether that has already happened, because there is a more serious issue (see below).

The most important fact lies not in the minutiae of the IR format but in the fact that PNaCl uses a different ABI: if you are lucky, you could link functions compiled by the upstream arch-dependent LLVM compiler and by PNaCl's branch together, and if you are really lucky it'll even work. But the typical result is something that links just fine and then crashes at runtime.
 
For example, sizeof(void*) in C++ code compiled with a generic clang would produce a value that depends on the target's bitness. There are similar problems with endianness and lord knows what else. Is this the crux of the matter?

Not really. PNaCl declares that sizeof(void*) is 4, and it only supports little-endian platforms.
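Because PNaCl fixes these properties, portable code can rely on them where ordinary C code would have to check. A minimal sketch: a runtime endianness probe, a byte-order-explicit load that is correct on any host, and a compile-time pointer-size check guarded by `__pnacl__`, which is assumed here to be the macro PNaCl's clang predefines. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Runtime probe: returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Reads a 32-bit little-endian value from four bytes; correct on any host. */
uint32_t load_le32(const unsigned char *bytes) {
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

#if defined(__pnacl__)
/* Assumption: under PNaCl this holds on every target. */
_Static_assert(sizeof(void *) == 4, "PNaCl pointers are 32-bit");
#endif
```

On a little-endian host, copying the same four bytes straight into a `uint32_t` gives the same value as `load_le32`.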

But you are thinking in the right direction: x86, e.g., has 80-bit floats, which are not supported by PNaCl.
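One way to see the 80-bit float issue from C is to compare mantissa widths from `<float.h>`: the x87 extended format has a 64-bit mantissa, while a target where `long double` is just `double` reports 53. This is a small sketch with hypothetical function names; that PNaCl's `long double` has the same format as `double` is an assumption here.

```c
#include <float.h>

/* Returns 1 if long double has more mantissa bits than double
 * (e.g. the 80-bit x87 format), 0 if it is no wider than double. */
int long_double_is_extended(void) {
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Mantissa bits of long double on the current target: 64 for x87
 * extended precision, 113 for IEEE quad, 53 when long double == double. */
int long_double_mantissa_bits(void) {
    return LDBL_MANT_DIG;
}
```

Code compiled on a host where `long_double_is_extended()` returns 1 expects a wider type than a target where `long double` is a plain `double`, so the two cannot share `long double` values across a call boundary.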

As I've already noted, the biggest problem is calling conventions: many architectures pass arguments in registers. E.g., a 64-bit int could be passed in the single register %rdi on x86-64 but in a register pair (r0/r1 for the first argument) on 32-bit ARM. And structs are even worse.
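To illustrate the register-pair point, the sketch below models how a 32-bit calling convention might split a 64-bit integer into low and high 32-bit words (one per register) and how the callee reassembles it. It is a plain C model with hypothetical names, not an ABI implementation; which registers are used and which word goes where is exactly the per-architecture detail that differs.

```c
#include <stdint.h>

/* Two 32-bit "registers" holding one 64-bit argument. */
struct reg_pair {
    uint32_t lo; /* low word, e.g. the first register of the pair on little-endian ARM */
    uint32_t hi; /* high word */
};

/* Caller side: split the 64-bit value into a register pair. */
struct reg_pair split_u64(uint64_t value) {
    struct reg_pair regs;
    regs.lo = (uint32_t)(value & 0xffffffffu);
    regs.hi = (uint32_t)(value >> 32);
    return regs;
}

/* Callee side: rebuild the 64-bit value from the pair. */
uint64_t join_u64(struct reg_pair regs) {
    return ((uint64_t)regs.hi << 32) | regs.lo;
}
```

If the caller and callee disagree about which register holds which word (or about the registers themselves), the program still links but receives garbage arguments at runtime.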

To facilitate such things, LLVM uses system-dependent extensions (mostly architecture-dependent, but some architectures have many different calling conventions), and they are used early in the process, when the IR is initially constructed. If you strip these things, upstream LLVM will not be able to build a working program, and PNaCl does not support these extensions.

THAT is what I meant when I said that upstream LLVM couldn't read PNaCl bitcode: technically it could, but in practice it's a useless ability, because you couldn't use it to create a working program.
 
If so, then how does PNaCl handle this sort of thing?  Using the example, how would PNaCl handle sizeof(void*)?

sizeof(void*) is always 4 (which means that PNaCl-compiled code is not compatible with x86-64), types use natural alignment (and since x86 aligns doubles and long longs at 4 bytes, not 8, it's incompatible with x86, too), etc.
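The alignment difference can be modelled with the usual round-up formula for field offsets. The sketch below computes where a `double` that follows a `char` lands for a given alignment (8 under natural alignment, 4 under the x86-32 rule described above) and reports what the host compiler actually chooses. Names are hypothetical.

```c
#include <stddef.h>

/* Round 'offset' up to the next multiple of 'align' (a power of two). */
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Layout used for the comparison: one byte, then a double. */
struct char_then_double {
    char c;
    double d;
};

/* Offset of 'd' chosen by the compiler building this file. */
size_t host_double_offset(void) {
    return offsetof(struct char_then_double, d);
}
```

`align_up(1, 8)` gives 8 and `align_up(1, 4)` gives 4. A 32-bit x86 Linux build typically reports 4 from `host_double_offset()` while a naturally aligned target reports 8, so code on the two sides of a link disagrees about where `d` lives.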

Perhaps there exists an architecture that does exactly what PNaCl does (x86 with -malign-double comes close if you are not using 80-bit floats), but it differs from what's used by most architectures. The differences are not big, but they are just big enough to make it impossible to link PNaCl code and non-PNaCl code together in most cases.

But then, when we were developing NaCl, we built specs by linking x86-64 code with NaCl-produced x32 code ( https://en.wikipedia.org/wiki/X32_ABI ) and were even able to run some (not all) tests that way, which means that sometimes it can work. Just not always.
 
Actually, having just read your final point below I think I'm probably correct in my understanding here.  I would guess this makes it pretty difficult to reuse code generated by a different front end.

Yup. That was my point. The difference in IR is very small, actually, but the difference in ABI means that it just does not matter.

Andrew Parker

Jul 7, 2015, 10:35:50 PM
to native-cli...@googlegroups.com
Thanks for the details (again I'm seemingly slow at responding :) ).  I think this gives me enough information for what I'm after.  The discussion is also interesting just from a learning POV.  I have a reasonable understanding of IR and the like, but far from deep.  Hopefully I'll get some time to pursue this stuff in more detail as it's pretty interesting.

FYI, roughly speaking, I just wanted to get an idea of whether it would be possible to build a "binary" (loose use of the term here; I effectively mean a collection of IR object files) with a "generic" build of clang and deploy it (read: convert it to native format) to many different systems. I already know this isn't possible when the set of target systems includes both 32- and 64-bit machines, so at least 2 front ends and builds are necessary. It seems that targeting PNaCl would add a third. The likelihood is that, with the set of systems I have in mind, even more would be needed!


