[LLVMdev] LLVM as a shared library

Chris Bieneman

Aug 5, 2014, 3:41:46 PM
to LLVM
Hello LLVM community,

Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.

Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.

Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.

The list of the problems we’re currently planning to tackle is:

(1) Reduce or eliminate static initializers, global constructors, and global destructors
(2) Clean up cross compiling in the CMake build system
(3) Update LLVM debugging mechanisms for being part of a dynamic library
(4) Move overridden sys calls (like abort) into the tools, rather than the libraries
(5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)
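
To illustrate item (1): one common way to avoid a static initializer or global constructor is to build the state lazily on first use, so that nothing runs when the shared library is loaded. The following is only a minimal sketch under stated assumptions. `OptionRegistry`, `get_registry`, and the option names are hypothetical and are not LLVM code, and the version shown is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPTIONS 4

/* Hypothetical registry that a global constructor would otherwise fill. */
typedef struct {
    const char *names[MAX_OPTIONS];
    size_t count;
    int initialized;
} OptionRegistry;

/* Zero-initialized static storage: no code runs at library load time. */
static OptionRegistry registry;
static int init_calls; /* how many times the real initialization ran */

/* Returns the registry, populating it on first use.
   Not thread-safe: a real library would guard this (e.g. with pthread_once). */
OptionRegistry *get_registry(void) {
    if (!registry.initialized) {
        registry.names[registry.count++] = "opt-level";
        registry.names[registry.count++] = "debug";
        assert(registry.count <= MAX_OPTIONS);
        registry.initialized = 1;
        ++init_calls;
    }
    return &registry;
}

/* Exposed only so the sketch can show that initialization happens once. */
int registry_init_calls(void) { return init_calls; }
```

A client that never calls `get_registry` pays no startup cost, which is the property an embedder cares about.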

We will be sending more specific proposals and patches for each of the changes listed above starting this week. If you’re interested in these problems and their solutions, please speak up and help us develop a solution that will work for your needs and ours.

Thanks,
-Chris
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Reid Kleckner

Aug 5, 2014, 3:59:13 PM
to Chris Bieneman, LLVM
Sounds reasonable.

Do you have any plans or interest in adding visibility / export annotations to the API? I'm trying to gauge demand for them.

Chris Bieneman

Aug 5, 2014, 4:08:30 PM
to Reid Kleckner, LLVM
(adding Juergen and Pete who will be working on this with me)

We haven’t fully fleshed out the exact implementation yet. Our target user for the initial work is WebKit. For WebKit we want to generate a shared library which only exports the C API. We were discussing doing this with an exports list for the linker, but visibility annotations are another option.
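
As a rough illustration of the visibility-annotation option, the sketch below marks one function as exported and one as hidden. The macro names `EMBED_API` and `EMBED_LOCAL` and both functions are hypothetical, not anything LLVM defines. The attributes themselves are the GCC/Clang `visibility` attribute and MSVC `__declspec(dllexport)`. Such a library would typically also be built with `-fvisibility=hidden` so that unannotated symbols are not exported.

```c
#include <assert.h>

/* Hypothetical export macros; these names are assumptions for illustration. */
#if defined(_WIN32)
#  define EMBED_API __declspec(dllexport)
#  define EMBED_LOCAL
#elif defined(__GNUC__)
#  define EMBED_API __attribute__((visibility("default")))
#  define EMBED_LOCAL __attribute__((visibility("hidden")))
#else
#  define EMBED_API
#  define EMBED_LOCAL
#endif

/* Internal helper: kept out of the shared library's exported symbols. */
EMBED_LOCAL int embed_internal_double(int x) {
    return 2 * x;
}

/* Public C entry point: the only symbol clients are meant to link against. */
EMBED_API int embed_public_quadruple(int x) {
    return embed_internal_double(embed_internal_double(x));
}
```

An exports list handed to the linker achieves the same result without touching the headers, which is the trade-off being weighed here.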

-Chris

Filip Pizlo

Aug 5, 2014, 4:21:53 PM
to Chris Bieneman, LLVM
This is exciting!

I would be happy to help.


> On Aug 5, 2014, at 12:38 PM, Chris Bieneman <be...@apple.com> wrote:
>
> Hello LLVM community,
>
> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>
> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>
> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>
> The list of the problems we’re currently planning to tackle is:
>
> (1) Reduce or eliminate static initializers, global constructors, and global destructors
> (2) Clean up cross compiling in the CMake build system
> (3) Update LLVM debugging mechanisms for being part of a dynamic library
> (4) Move overridden sys calls (like abort) into the tools, rather than the libraries
> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)

Also:

(6) Determine if command line options are the best way of passing configuration settings into LLVM.

It’s an awkward abstraction when LLVM is embedded. I suspect (6) will be closely related to (1) since command line option parsing was the hardest impediment to getting rid of static initializers.

My understanding of the shared library proposal is that the library only exposes the C API since the C++ API is not intended to allow for binary compatibility. So, I think we need to add the following either as an explicit goal of the shared library work or as a closely related project:

(7) Make the C API truly great.

I think it’s harmful to LLVM in the long run if external embedders use the C++ API. I think that one way of ensuring that they don’t have an excuse to do it is to flesh out some things:

- Add more tests of the C API to ensure that people don’t break it accidentally and to give more gravitas to the C API backwards compatibility claims.
- Increase C API coverage.
  - For example, WebKit currently sidesteps the C API to pass some commandline options to LLVM. We don’t want that.
  - Add more support for reasoning about targets and triples. WebKit still has to hardcode triples in some places even though it only ever does in-process JITing where host==target. That’s weird.
- Expose debugging and runtime stuff and make sure that there’s a coherent integration story with the MCJIT C API.
  - Currently it’s difficult to round-trip debug info: creating it in C is awkward and parsing DWARF sections that MCJIT generates involves lots of weirdness. WebKit has its own DWARF parser for this, which shouldn’t be necessary.
  - WebKit is about to have its own copies of both a compactunwind and EH frame parser. The contributor who “wrote” the EH frame parser actually just took it from LLVM. The licenses are compatible, but nonetheless, copy-paste from LLVM into WebKit should be discouraged.
- Engage with non-WebKit embedders that currently use the C++ API to figure out what it would take to get them to switch to the C API.

I think that a lot of the time when C API discussions arise, lots of embedders give excuses for using the C++ API. WebKit used the C API for generating IR and even doing some IR manipulation, and for driving the MCJIT. It’s been a positive experience and we enjoy the binary compatibility that it gives us. I think it would be great to see if other embedders can do the same.

-Filip

Renato Golin

Aug 5, 2014, 4:53:44 PM
to Chris Bieneman, LLVM
On 5 August 2014 20:38, Chris Bieneman <be...@apple.com> wrote:
> (1) Reduce or eliminate static initializers, global constructors, and global destructors
> (2) Clean up cross compiling in the CMake build system
> (3) Update LLVM debugging mechanisms for being part of a dynamic library
> (4) Move overridden sys calls (like abort) into the tools, rather than the libraries
> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)

One other thing that I'd like to see is a common framework for
defining, describing and inferring architectural support.

Tools use a lot of string parsing and, as Filip said, command line
options don't normally mean the exact same thing across tools.

To ensure that the same arch/cpu/fpu/abi/target options are parsed
identically across all tools (and external users) and mean exactly the
same thing to the back-end, building a new sub-target should only
accept a TargetDescription object, or whatever holds all the options.
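
A minimal sketch of that idea, assuming a hypothetical `TargetDescription` struct and parser (neither exists in LLVM under these names): the string is parsed exactly once, and everything downstream receives the structured result instead of re-parsing strings. For brevity it only handles three-component `arch-vendor-os` strings and ignores anything after the third component.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structured description that every tool would share. */
typedef struct {
    char arch[16];
    char vendor[16];
    char os[16];
} TargetDescription;

/* Parses "arch-vendor-os" into *out.
   Returns 0 on success and -1 if fewer than three components were found. */
int parse_target_description(const char *triple, TargetDescription *out) {
    int fields;
    assert(triple != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* Each %15[^-] reads at most 15 non-dash characters into one field. */
    fields = sscanf(triple, "%15[^-]-%15[^-]-%15[^-]",
                    out->arch, out->vendor, out->os);
    return fields == 3 ? 0 : -1;
}
```

With a single parser like this, two tools handed the same string cannot disagree about what it means, because neither of them interprets the string itself.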

cheers,
--renato

Tom Stellard

Aug 5, 2014, 4:55:22 PM
to Chris Bieneman, LLVM
On Tue, Aug 05, 2014 at 12:38:49PM -0700, Chris Bieneman wrote:
> Hello LLVM community,
>
> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>
> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>
> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>

This sounds great.

> The list of the problems we’re currently planning to tackle is:
>
> (1) Reduce or eliminate static initializers, global constructors, and global destructors
> (2) Clean up cross compiling in the CMake build system

One problem we have with the Mesa project is that the automake and CMake build
systems produce different shared libraries. Automake builds libLLVM-major.minor.so,
while CMake builds a separate shared library for each component, e.g. libLLVMSupport.so.

To cope with this, Mesa's build system has to try to guess which build system
was used in order to find the libraries.

Do you have plans to standardize the shared libraries produced by LLVM's build systems?

Even better, will improving cross compiling in the CMake build system make
it possible to completely drop automake?

-Tom

Eric Christopher

Aug 5, 2014, 4:56:57 PM
to Filip Pizlo, LLVM
> (7) Make the C API truly great.
>
> I think it’s harmful to LLVM in the long run if external embedders use the C++ API. I think that one way of ensuring that they don’t have an excuse to do it is to flesh out some things:
>
> - Add more tests of the C API to ensure that people don’t break it accidentally and to give more gravitas to the C API backwards compatibility claims.
> - Increase C API coverage.
> - For example, WebKit currently sidesteps the C API to pass some commandline options to LLVM. We don’t want that.
> - Add more support for reasoning about targets and triples. WebKit still has to hardcode triples in some places even though it only ever does in-process JITing where host==target. That’s weird.
> - Expose debugging and runtime stuff and make sure that there’s a coherent integration story with the MCJIT C API.
> - Currently it’s difficult to round-trip debug info: creating it in C is awkward and parsing DWARF sections that MCJIT generates involves lots of weirdness. WebKit has its own DWARF parser for this, which shouldn’t be necessary.
> - WebKit is about to have its own copies of both a compactunwind and EH frame parser. The contributor who “wrote” the EH frame parser actually just took it from LLVM. The licenses are compatible, but nonetheless, copy-paste from LLVM into WebKit should be discouraged.
> - Engage with non-WebKit embedders that currently use the C++ API to figure out what it would take to get them to switch to the C API.
>
> I think that a lot of time when C API discussions arise, lots of embedders give excuses for using the C++ API. WebKit used the C API for generating IR and even doing some IR manipulation, and for driving the MCJIT. It’s been a positive experience and we enjoy the binary compatibility that it gives us. I think it would be great to see if other embedders can do the same.
>

Honestly I think if you want to make the C API great we should burn it
to the ground and come up with another one - and one that can be
versioned as well so we don't have the problems of being limited in
what we can do to llvm by needing compatibility with the C API.

-eric

Chris Bieneman

Aug 5, 2014, 4:58:13 PM
to Tom Stellard, LLVM

> On Aug 5, 2014, at 1:33 PM, Tom Stellard <t...@stellard.net> wrote:
>
> On Tue, Aug 05, 2014 at 12:38:49PM -0700, Chris Bieneman wrote:
>> Hello LLVM community,
>>
>> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>>
>> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>>
>> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>>
>
> This sounds great.
>
>> The list of the problems we’re currently planning to tackle is:
>>
>> (1) Reduce or eliminate static initializers, global constructors, and global destructors
>> (2) Clean up cross compiling in the CMake build system
>
> One problem we have with the Mesa project is that the automake and CMake build
> system produce different shared libraries. Automake builds libLLVM-major.minor.so
> while CMake builds a different shared library for each component: e.g. libLLVMSupport.so
>
> To cope with this, Mesa's build system has to try to guess which build system
> was used in order to find the libraries.
>
> Do you have plans to standardize the shared libraries produced by LLVM's build systems?

We had not planned to standardize the build systems, but it is interesting to consider. IMHO, maintaining two build systems is a royal pain.

>
> Even better, will improving cross compiling in the CMake build system make
> it possible to completely drop automake?

I would really like to think so, but there will be quite a bit of work involved in getting everyone using the Automake build system to migrate off. One snag with getting off automake that I’m aware of is compiler-rt. I haven’t been able to make sense of the compiler-rt CMake configs well enough to come up with a good solution for cross-compiling.

-Chris

Rafael Espíndola

Aug 5, 2014, 5:17:09 PM
to Eric Christopher, Duncan Exon Smith, LLVM
> Honestly I think if you want to make the C API great we should burn it
> to the ground and come up with another one - and one that can be
> versioned as well so we don't have the problems of being limited in
> what we can do to llvm by needing compatibility with the C API.

Or at least document what our backwards compatibility promises are and
how we transition away from old APIs.

Two examples where we do break C APIs:

* A hypothetical off-by-one source range bug in clang. It will break
a user of libclang that might have been compensating for the bug. In
cases like this we seem to just assume there is a low risk and just
fix the bug.

* Dropping features like the old JIT. It will break users of the C API
that depend on the old JIT. In cases like this we provide an upgrade
path (MCJIT) and a deprecation period.

Cheers,
Rafael

Peter Collingbourne

Aug 5, 2014, 5:20:09 PM
to Filip Pizlo, axw...@gmail.com, LLVM

Just to give a bit of perspective from another external LLVM client:

GoLLVM [1], the LLVM bindings for Go used by the llgo compiler [2], mostly
uses the C bindings, but it does need to resort to the C++ API (with its
own set of C bindings) for a few things:

Exporting bitcode to memory buffer:
https://github.com/go-llvm/llvm/blob/master/bitwriter.cpp

Use of attribute masks above 1 << 31:
https://github.com/go-llvm/llvm/blob/master/core.cpp

Debug info generation:
https://github.com/go-llvm/llvm/blob/master/dibuilder.cpp

Loading plugins and setting flags:
https://github.com/go-llvm/llvm/blob/master/support.cpp

Adding instrumentation passes:
https://github.com/go-llvm/llvm/blob/master/transforms_instrumentation.cpp

I think most of this could be upstreamable in some shape or form, but I heard
from debug info experts a few months ago that the IR format was unstable,
so the solution we went with was to wrap the C++ API so that we would be
notified (by the compiler) when the format changes, rather than creating
debug info directly and having it potentially silently discarded. I'm not
sure if the debug info situation has changed since then.

The plugin/flags stuff is valuable to external projects for exactly the same
reasons that Clang supports plugins and LLVM flags. I don't see any reason
to make the specific flag semantics stable, and we can document this as such.

Thanks,
--
Peter

[1] https://github.com/go-llvm/llvm
[2] https://github.com/go-llvm/llgo

Dan Liew

Aug 5, 2014, 5:28:43 PM
to Chris Bieneman, LLVM
Hi Chris,

> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.

That sounds great. Just as a note, using LLVM libraries for external
projects using CMake was recently improved [1] (mostly by Brad King).
I've never built LLVM as a single shared library (I'm not aware of
there being a CMake option to do so) using CMake but it would be great
if doing this didn't create any problems for users of this interface.

If you'd like me to take a look at anything related to this, please let me know.

> (2) Clean up cross compiling in the CMake build system

Brad King might be interested in this so it might be a good idea to CC
him in any patches related to this.

[1] http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project

Thanks,
Dan.

Filip Pizlo

Aug 5, 2014, 5:51:34 PM
to Eric Christopher, LLVM

Can you come up with specific reasons why building a new API would be better for the community than maintaining the one we’ve got?

-Filip

Eric Christopher

Aug 5, 2014, 5:54:38 PM
to Filip Pizlo, LLVM

Rafael came up with a few in his reply, but also having an API that lightly
wraps the C++ api is hard if we want to change a major C++ interface
or completely remove a class, etc. There's no existing way in the API
to either version or remove an interface given our current promises.
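
For illustration, one simple form of versioning is a runtime query that lets a client check compatibility before calling anything else. This is only a sketch of the general technique. The `EMBED_API_VERSION_*` macros, `EmbedApiVersion`, and both functions are hypothetical and do not exist in the LLVM C API.

```c
#include <assert.h>

/* Hypothetical version of the library being built. */
#define EMBED_API_VERSION_MAJOR 2
#define EMBED_API_VERSION_MINOR 1

typedef struct {
    int api_major; /* bumped when an interface is removed or changed */
    int api_minor; /* bumped when an interface is added */
} EmbedApiVersion;

/* Reports the version the library was built as. */
EmbedApiVersion embed_api_version(void) {
    EmbedApiVersion v;
    v.api_major = EMBED_API_VERSION_MAJOR;
    v.api_minor = EMBED_API_VERSION_MINOR;
    return v;
}

/* Returns 1 if a client built against (api_major, api_minor) can use this
   library: same major version, and no newer minor than the library has. */
int embed_api_supports(int api_major, int api_minor) {
    assert(api_major >= 0 && api_minor >= 0);
    return api_major == EMBED_API_VERSION_MAJOR &&
           api_minor <= EMBED_API_VERSION_MINOR;
}
```

Under a scheme like this, removing an interface becomes a major-version bump that clients can detect, rather than a silent break.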

-eric

Filip Pizlo

Aug 5, 2014, 5:57:13 PM
to Rafael Espíndola, LLVM

> On Aug 5, 2014, at 2:13 PM, Rafael Espíndola <rafael.e...@gmail.com> wrote:
>
>> Honestly I think if you want to make the C API great we should burn it
>> to the ground and come up with another one - and one that can be
>> versioned as well so we don't have the problems of being limited in
>> what we can do to llvm by needing compatibility with the C API.
>
> Or at least document what our backwards compatibility promises are and
> how we transition away from old APIs.

Right. I believe this is the right approach.

>
> Two examples where we do break C APIs:
>
> * An hypothetical off by one source range bug in clang. It will break
> a user of libclang that might have been compensating for the bug. In
> cases like this we seem to just assume there is a low risk and just
> fix the bug.

Yup.

Speaking for WebKit: we would be happy to get rid of workarounds even if it meant a brief period of breakage. We would handle that breakage on our end by ensuring that we don’t build the new (sans workaround) version of WebKit against the old (pre bugfix) version of LLVM or vice-versa. We’re OK with short-term pain for long-term gain.

>
> * Dropping features like the old JIT. It will break users of the C API
> that depend on the old JIT. In cases like this we provide an upgrade
> path (MCJIT) and a deprecation period.

Yup. I wonder how many people still use the old JIT via the C API. I know of old JIT users but I thought that many (most? all?) were using the C++ API.

-Filip

Eric Christopher

Aug 5, 2014, 5:59:56 PM
to Filip Pizlo, LLVM
On Tue, Aug 5, 2014 at 2:54 PM, Filip Pizlo <fpi...@apple.com> wrote:
>
>> On Aug 5, 2014, at 2:13 PM, Rafael Espíndola <rafael.e...@gmail.com> wrote:
>>
>>> Honestly I think if you want to make the C API great we should burn it
>>> to the ground and come up with another one - and one that can be
>>> versioned as well so we don't have the problems of being limited in
>>> what we can do to llvm by needing compatibility with the C API.
>>
>> Or at least document what our backwards compatibility promises are and
>> how we transition away from old APIs.
>
> Right. I believe this is the right approach.
>
>>
>> Two examples where we do break C APIs:
>>
>> * An hypothetical off by one source range bug in clang. It will break
>> a user of libclang that might have been compensating for the bug. In
>> cases like this we seem to just assume there is a low risk and just
>> fix the bug.
>
> Yup.
>
> Speaking for WebKit: we would be happy to get rid of workarounds even if it meant a brief period of breakage. We would handle that breakage on our end by ensuring that we don’t build the new (sans workaround) version of WebKit against the old (pre bugfix) version of LLVM or vice-versa. We’re OK with short-term pain for long-term gain.
>

This is a really good point, Filip. I'm totally down with being able to
do the occasional API migration etc. Also, since we've got a release
branch now, I wonder what the odds are that a "we won't break it between
releases" policy would be accepted? At any rate, being able to migrate
API versions in some clean way would be nice.

I'm almost tempted to want a SWIG wrapped API but I'm pretty sure
that's actually worse than what we've got and still wouldn't solve the
migration issues. ;)

>>
>> * Dropping features like the old JIT. It will break users of the C API
>> that depend on the old JIT. In cases like this we provide an upgrade
>> path (MCJIT) and a deprecation period.
>
> Yup. I wonder how many people still use the old JIT via the C API. I know of old JIT users but I thought that many (most? all?) were using the C++ API.
>

Not sure. I seem to recall a few, but they may have moved off when I
wasn't looking.

-eric

Filip Pizlo

Aug 5, 2014, 6:02:06 PM
to Eric Christopher, LLVM
Can you give a specific example of an intended C++ API change that wasn’t possible because of a C API?

Just because you have an API doesn’t mean that things can’t be deprecated, or that the API layer can’t be hacked to give the illusion of old behavior.  Can you give an example of a C API deprecation proposal that was intended to make some C++ change possible, that was rejected on the grounds that it would break the C API?

I can only recall cases where the C API was broken by accident because of lack of testing, and in all of those cases, the issue was either resolved, or there is a plan to resolve it and a workaround was made available.

-Filip

Eric Christopher

Aug 5, 2014, 6:24:26 PM
to Filip Pizlo, LLVM

Sure, these are going to be a bit vague because I'm a bit busy at the
moment, but I recall a couple of times during the year that we've had
API up for review (or even committed temporarily) that exposed
internal constants via enums, and I think Rafael had some issues with
visibility changes for the same reasons.

In a more recent case here's a thread:

[LLVMdev] Inconsistent third field in global_ctors (was Re: [llvm]
r214321 - UseListOrder: Visit global values)

and

[PATCH] Add return value attribute to C interface

also I think the conversation we were having in here:

[PATCH] Expose MCInst in C Disassembler API

is somewhat relevant :)

Just a couple of quick things I could find with a search. I could
probably dig up more given some more time.

Josh Klontz

Aug 5, 2014, 7:04:55 PM
to Filip Pizlo, LLVM
Filip,

As a non-WebKit embedder currently using the C++ API (www.liblikely.org), here are my thoughts:
- Perhaps the only reason I'm using the C++ API instead of the C API is that the Kaleidoscope tutorial is written against the C++ API. That's where I started, and momentum has prevented me from switching. It may make sense to re-write this tutorial using the C API if we want to encourage new developers to default to this interface.
- Based on the (excellent) documentation online and a close following of this mailing list, I haven't been convinced that the C API is really given first class support in LLVM. Perhaps this is just an issue of advertising better, but it makes me hesitant to change.
- I run into enough minor bugs that my project ends up tracking the master branch pretty closely. As such, I don't think I'll get away from static builds in the near future. Being unable to switch to shared-library-tagged-releases disincentivizes the API switch.
- A barebones transition guide hosted with the rest of the LLVM docs could lower the activation energy needed to switch.

Hope that helps!

v/r,
Josh

Filip Pizlo

Aug 5, 2014, 7:45:40 PM
to Eric Christopher, LLVM
There appears to be a patch up for review that takes care of this with a slightly careful dance and there is a PR tracking deprecating the bad construct (two-field version) eventually.  So, I don’t think this qualifies as a change that was made impossible by the C API, since the patch demonstrates that it *is* possible.


> and
>
> [PATCH] Add return value attribute to C interface

This appears to be an observation that we should extend the API to better handle attributes.  I agree with that observation and with the general sentiment that strings are better than bits.  This isn’t a reason to burn the API to the ground.  Someone should just make the change, which would probably involve allowing C API clients to use either bits or strings for the time being.
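
A small sketch of why string-keyed attributes age better than exposed bits: when clients name attributes by string, the bit layout stays private to the library and can grow or be reordered without breaking anyone. The table contents, `AttrSet`, and the function names below are assumptions made for illustration and are not the LLVM C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute table; the index doubles as a private bit position. */
static const char *const attr_names[] = {
    "noreturn", "nounwind", "readonly", "returned"
};
#define ATTR_COUNT (sizeof attr_names / sizeof attr_names[0])

typedef struct {
    uint64_t bits; /* private representation, never shown to clients */
} AttrSet;

/* Looks up an attribute by name; returns its index or -1 if unknown. */
static int attr_index(const char *name) {
    size_t i;
    for (i = 0; i < ATTR_COUNT; ++i)
        if (strcmp(attr_names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Adds the named attribute; returns 0 on success, -1 for an unknown name. */
int attr_add(AttrSet *set, const char *name) {
    int idx = attr_index(name);
    assert(ATTR_COUNT <= 64);
    if (idx < 0)
        return -1;
    set->bits |= UINT64_C(1) << idx;
    return 0;
}

/* Returns nonzero if the named attribute is present in the set. */
int attr_has(const AttrSet *set, const char *name) {
    int idx = attr_index(name);
    return idx >= 0 && ((set->bits >> idx) & 1u) != 0;
}
```

A bits-based entry point could sit beside this one for existing clients during a transition period.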


> also I think the conversation we were having in here:
>
> [PATCH] Expose MCInst in C Disassembler API

The arguments against exposing MCInst were pretty vague.  I never agreed with any of them.  It was an obviously useful addition and someone should still do it.  That being said, this thread just covered adding more stuff to the C API; it’s not an example of a C++ change that couldn’t be made because of the C API.


> is somewhat relevant :)
>
> Just a couple of quick things I could find with a search. I could
> probably dig up more given some more time.

Your current examples are just small bugs that can be fixed - and in two out of the three examples, there are sensible patches up for review already.

-Filip

Filip Pizlo

Aug 5, 2014, 7:51:03 PM
to Josh Klontz, LLVM
On Aug 5, 2014, at 4:02 PM, Josh Klontz <josh....@gmail.com> wrote:

> Filip,
>
> As a non-WebKit embedder currently using the C++ API (www.liblikely.org), here are my thoughts:
> - Perhaps the only reason I'm using the C++ API instead of the C API is that the Kaleidoscope tutorial is written against the C++ API. That's where I started, and momentum has prevented me from switching. It may make sense to re-write this tutorial using the C API if we want to encourage new developers to default to this interface.

That’s a great point!  It would be awesome to have this.  I remember that when I started working on WebKit’s FTL JIT I had to sort of mentally translate a lot of examples written against the C++ API.  It wasn’t easy at first.

> - Based on the (excellent) documentation online and a close following of this mailing list, I haven't been convinced that the C API is really given first class support in LLVM. Perhaps this is just an issue of advertising better, but it makes me hesitant to change.

It’s first-class enough that WebKit has been using it for months now.  WebKit is a fairly big project and there is overlap between LLVM and WebKit contributors.  In practice this means that the C API doesn’t break often and when it does, it gets fixed.

> - I run into enough minor bugs that my project ends up tracking the master branch pretty closely. As such, I don't think I'll get away from static builds in the near future. Being unable to switch to shared-library-tagged-releases disincentivizes the API switch.

We track master as well, because we also often find bugs.  Using the C API makes this easier.  With the C++ API you risk having to change stuff on your end because something in the C++ API changed to support some LLVM refactoring.  With the C API we almost never have this problem.

> - A barebones transition guide hosted with the rest of the LLVM docs could lower the activation energy needed to switch.

Good point!


> Hope that helps!

It does, thanks!

Eric Christopher

Aug 5, 2014, 8:33:40 PM
to Filip Pizlo, LLVM

As I said, I haven't spent a lot of time on it. It's friction etc.

How about a theoretical?

Let's say we write a decent PRE pass that encompasses these two passes:

/** See llvm::createMergedLoadStoreMotionPass function. */
void LLVMAddMergedLoadStoreMotionPass(LLVMPassManagerRef PM);

/** See llvm::createGVNPass function. */
void LLVMAddGVNPass(LLVMPassManagerRef PM);

then, why not delete the existing passes? No need keeping dead code
around right? Except we can't because the passes are in the C API and
someone might be using them.
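
For reference, the kind of compatibility layer that has been suggested elsewhere in this thread could look roughly like the sketch below: the old entry points stay in the C API but simply forward to the new pass, so the old implementations can still be deleted. Everything here is hypothetical. `ShimPassManager` and the `Shim*` functions are stand-ins written for this illustration, not LLVM functions, and whether forwarding preserves behavior well enough is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PASSES 8

/* Hypothetical stand-in for a pass manager: just a list of pass names. */
typedef struct {
    const char *passes[MAX_PASSES];
    size_t count;
} ShimPassManager;

/* Appends a pass unless one with the same name is already scheduled. */
static void add_pass_once(ShimPassManager *pm, const char *name) {
    size_t i;
    for (i = 0; i < pm->count; ++i)
        if (strcmp(pm->passes[i], name) == 0)
            return;
    assert(pm->count < MAX_PASSES);
    pm->passes[pm->count++] = name;
}

/* The new, combined pass. */
void ShimAddPREPass(ShimPassManager *pm) { add_pass_once(pm, "pre"); }

/* Deprecated entry points: kept for binary compatibility and forwarded
   to the new pass, so the old pass implementations can be removed. */
void ShimAddGVNPass(ShimPassManager *pm) { ShimAddPREPass(pm); }
void ShimAddMergedLoadStoreMotionPass(ShimPassManager *pm) { ShimAddPREPass(pm); }
```

A client that calls both deprecated functions ends up with the combined pass scheduled once.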


Another example: We've already had multiple C lto APIs, moving those
has only been possible because we just change everyone and there are
only 2 users.


Here's a different theoretical that I'm considering - the
TargetMachine interface:

LLVMTargetMachineRef LLVMCreateTargetMachine(LLVMTargetRef T,
    const char *Triple, const char *CPU, const char *Features,
    LLVMCodeGenOptLevel Level, LLVMRelocMode Reloc, LLVMCodeModel CodeModel);

The last 3 arguments here are a little weird.

Optimization doesn't necessarily make sense within the target machine
and should probably be pulled out somewhere else - perhaps the
PassManagerBuilder? Perhaps somewhere else? The code model and reloc
model make a bit more sense, but are more of an overarching object
file matter for the first. The second is definitely something that
deals with a particular machine. Now, let's say we come up with
extensions to allow multiple architectures to generate code within a
single object file. The relocation model will need to be part of some
overarching class on top of the TargetMachine that'll need the
relocation model.

This probably won't be necessary until we hit something like ARM/X86
in the same module. Though with the existing ARM/Thumb support and the
way those subtargets are actually held as target machines we'll
actually need this if we want to handle emitting arm and thumb into
the same module. Now we can try rewriting that support ala mips16 for
the mips port and probably should, but with the increasing relevance
of accelerator computing I see this happening sooner rather than
later. I don't want to close the door on it by having a C API that
can't be revved or changed and is even more extensive than this
relatively odd problem I mention here.

I realize it seems mostly like "aaa! change! aaa! fear!" but we've
been able to do pretty well over the years with the C API by keeping
it very general and having a lot of forethought with what we allow
into it. If we're going to expand the C API greatly (as it sounds like
you want) then I feel we're going to run into issues with wanting to
change the C API and not being able to - hence the request for things
like versioning etc.

Nick Lewycky

Aug 6, 2014, 3:02:01 AM
to Filip Pizlo, LLVM
Filip Pizlo wrote:
> This is exciting!
>
> I would be happy to help.
>
>
>> On Aug 5, 2014, at 12:38 PM, Chris Bieneman <be...@apple.com> wrote:
>>
>> Hello LLVM community,
>>
>> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>>
>> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>>
>> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>>
>> The list of the problems we’re currently planning to tackle is:
>>
>> (1) Reduce or eliminate static initializers, global constructors, and global destructors
>> (2) Clean up cross compiling in the CMake build system
>> (3) Update LLVM debugging mechanisms for being part of a dynamic library
>> (4) Move overridden sys calls (like abort) into the tools, rather than the libraries
>> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)
>
> Also:
>
> (6) Determine if command line options are the best way of passing configuration settings into LLVM.

They're already banned, so there isn't anything left to determine here,
just code to fix.

> It’s an awkward abstraction when LLVM is embedded. I suspect (6) will be closely related to (1) since command line option parsing was the hardest impediment to getting rid of static initializers.

Yes, for all these reasons. Two libraries may be using LLVM under the
hood unaware of each other; they can't both share global state. Command
line flags block that. Our command-line tools should be parsing their
own flags and setting state through some other mechanism, and that state
mustn't be more global than an LLVMContext.

> My understanding of the shared library proposal is that the library only exposes the C API since the C++ API is not intended to allow for binary compatibility. So, I think we need to either add the following as either an explicit goal of the shared library work, or as a closely related project:
>
> (7) Make the C API truly great.
>
> I think it’s harmful to LLVM in the long run if external embedders use the C++ API.

The quality with which we maintain the C API today suggests that we
collectively think of it as an albatross to be suffered. There is work
necessary to change that perception too.

> I think that one way of ensuring that they don’t have an excuse to do
> it is to flesh out some things:
>
> - Add more tests of the C API to ensure that people don’t break it accidentally and to give more gravitas to the C API backwards compatibility claims.

Yes, for well-designed high level APIs like libLTO and libIndex. For
other APIs, we should remove the backwards compatibility guarantees ...

> - Increase C API coverage.

... which in turn allows us to do this.

Designing a good high-level API is hard (even libLTO has very ugly
cracks in its API surface) and that makes it hard to do. What actually
happens is that people write C APIs that closely match the C++ APIs in
order to access them through other languages, but there's no way we can
guarantee compatibility without freezing the C++ API too. Which we never
will. This isn't a theoretical problem either, look at this case:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/229354.html
where we made a straightforward update to the LLVM IR, but in theory a
user of the C API would be able to observe the difference, and that
could in turn break a C API user that was relying on the way old LLVM
worked.

The solution is to offer two levels of C API, one intended for people to
use to bind to their own language. This matches the C++ API closely and
changes when the C++ API changes. (It could even be partially/wholly
auto-generated via a clang tool?) Users of it will be broken with newer
versions.

Secondly, some people really want a stable interface, so we give them an
API expressed in higher-level tasks they want to achieve, so that we can
change the underlying workings of how LLVM works without disturbing the
API. That can be made ABI stable.

> - For example, WebKit currently sidesteps the C API to pass some commandline options to LLVM. We don’t want that.

Seconded!

> - Add more support for reasoning about targets and triples. WebKit still has to hardcode triples in some places even though it only ever does in-process JITing where host==target. That’s weird.

Sounds good.

> - Expose debugging and runtime stuff and make sure that there’s a coherent integration story with the MCJIT C API.
> - Currently it’s difficult to round-trip debug info: creating it in C is awkward and parsing DWARF sections that MCJIT generates involves lots of weirdness. WebKit has its own DWARF parser for this, which shouldn’t be necessary.
> - WebKit is about to have its own copies of both a compactunwind and EH frame parser. The contributor who “wrote” the EH frame parser actually just took it from LLVM. The licenses are compatible, but nonetheless, copy-paste from LLVM into WebKit should be discouraged.

I am not familiar with the MCJIT C API, but this sounds reasonable. I'll
trust that you know what you're doing.

> - Engage with non-WebKit embedders that currently use the C++ API to figure out what it would take to get them to switch to the C API.

Engage with our users? That's crazy talk! ;)

Nick

David Chisnall

Aug 6, 2014, 4:38:08 AM
to Filip Pizlo, LLVM
On 5 Aug 2014, at 21:17, Filip Pizlo <fpi...@apple.com> wrote:

> - Engage with non-WebKit embedders that currently use the C++ API to figure out what it would take to get them to switch to the C API.

I maintain a reasonable amount of out-of-tree code that embeds LLVM in various things, including a couple of language front ends, an out-of-tree back end, and some tools for interfacing with experimental hardware, and so suffer the pain of C++ API changes on a fairly frequent basis. The C APIs generally feel clunky to use. If we have a stable C API that is useful, I'd love to see C++ wrappers so that we don't have to suffer things like iterators.

Currently, we conflate 'C' and 'stable'. In many cases, I'd prefer a C++ API to a C one, or would be happy to write (or even use automatically generated) thin C wrappers around C++. The requirement is not the language, it's the stability. I presume this also applies to WebKit: there's no reason why a C++ library should prefer a C API for talking to a C++ library. Stability isn't just a binary thing. We care about several different definitions:

- ABI doesn't change. I can keep using the same binary with new LLVM shared libraries. Symbols are versioned and new code can just use the new version.

- API doesn't change. I have to recompile, but there are SOVERSION bumps whenever I need to and so the version that I need can easily coexist on the same system with newer ones until I get around to recompiling.

- API doesn't change gratuitously. Public APIs change, but only after a deprecation period and not simply to please some developer's aesthetic. We don't randomly rename classes or change capitalisation of functions without at least shipping one release with both the new and old versions working and being marked as deprecated.
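
As a rough illustration of the third definition, here is a minimal sketch of a rename that ships with a deprecation period. The function names are hypothetical and are not LLVM APIs; the only real feature used is the standard C++14 `[[deprecated]]` attribute.

```cpp
#include <string>

// Hypothetical library function (not an LLVM API): the new spelling.
inline std::string createWidget(const std::string &Name) {
  return "widget:" + Name;
}

// The old spelling is kept for one release. It forwards to the new function,
// and the attribute makes the compiler warn at every remaining call site.
[[deprecated("use createWidget instead")]]
inline std::string CreateWidget(const std::string &Name) {
  return createWidget(Name);
}
```

Out-of-tree code that still calls `CreateWidget` keeps compiling for that release, with a warning that names the replacement.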

Most of LLVM fails even to meet the third requirement. For most of the code that I maintain, I have no strong requirements for the first, would be very happy with the second, and would find the third acceptable.

Some things clearly can't be supported by a set-in-stone interface. Much as I'd love for out-of-tree back ends to be something that people could just ship as plugins, it's not really feasible. There are lots of things in the back end interface that need fixing, and having to support an interface that's defined before they're fixed would be a lot of pain. I also don't see a benefit in exporting these

A few of the things that I maintain out of tree are optimisations. We added some infrastructure a few releases ago for plugging optimisations into the pipeline, but the APIs required to write optimisations change a lot. They can't be C APIs, because they require inheriting from FunctionPass or similar (we could, perhaps, have a CFunctionPass class that had callbacks for the virtual functions, but it would be quite clunky).
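
To make the CFunctionPass idea concrete, a minimal sketch follows. It is self-contained, so `sketch::Function` and `sketch::FunctionPass` are simplified stand-ins rather than the real LLVM classes, and the callback signature is an assumption made for illustration only.

```cpp
#include <string>

namespace sketch {
// Simplified stand-ins for the real classes, so the example compiles alone.
struct Function {
  std::string Name;
};

class FunctionPass {
public:
  virtual ~FunctionPass() = default;
  // Returns true if the function was modified.
  virtual bool runOnFunction(Function &F) = 0;
};

// C-compatible callback (assumed shape): returns nonzero if F was modified.
using RunOnFunctionFn = int (*)(void *UserData, Function *F);

// Adapter that forwards the virtual call to a plain function pointer.
class CFunctionPass : public FunctionPass {
public:
  CFunctionPass(RunOnFunctionFn Run, void *UserData)
      : Run(Run), UserData(UserData) {}

  bool runOnFunction(Function &F) override {
    return Run(UserData, &F) != 0;
  }

private:
  RunOnFunctionFn Run;
  void *UserData;
};
} // namespace sketch

// Example callback: counts the functions it visits and modifies nothing.
static int countFunctions(void *UserData, sketch::Function *F) {
  (void)F;
  ++*static_cast<int *>(UserData);
  return 0;
}
```

The clunkiness is visible even at this size: every virtual function needs its own callback slot, and all state has to travel through an untyped `void *`.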

One of the goals of LLVM was that you'd be able to write optimisations that made use of the LLVM infrastructure but only made sense for a particular source language, or even set of idioms used with a particular library. I'd love to see, for example, Qt ship with a plugin that adds optimisations for their slots and signals mechanism. I wouldn't want this in the LLVM tree, because it's completely useless to anyone not using Qt, but currently the only way of guaranteeing that it will work one svn revision into the future is to put it in the LLVM tree.

I currently have a GSoC student working on using LLVM for high-performance packet filtering. The front-end code could probably use the existing C APIs, but then there will be optimisations that are unlikely to make sense for any code that doesn't have this particular structure (e.g. prefetching the next packet based on knowledge of the structure of the network stack's ring buffers). Having an API that was at least useable for two releases for doing this, even if it spat out a lot of deprecated warnings in the second release, would be immensely helpful.

David

Rafael Espíndola

Aug 6, 2014, 11:07:58 AM
to David Chisnall, LLVM
> Currently, we conflate 'C' and 'stable'. In many cases, I'd prefer a C++ API to a C one, or would be happy to write (or even use automatically generated) thin C wrappers around C++. The requirement is not the language, it's the stability. I presume this also applies to WebKit: there's no reason why a C++ library should prefer a C API for talking to a C++ library. Stability isn't just a binary thing. We care about several different definitions:

The additional effort of providing a stable C++ interface is not one I
would like to undertake. Are you offering to write such an interface
and keep it up to date with the underlying LLVM (which is sure to keep
changing at a fast pace)?

Cheers,
Rafael

David Chisnall

Aug 6, 2014, 11:10:20 AM
to Rafael Espíndola, LLVM
On 6 Aug 2014, at 16:04, Rafael Espíndola <rafael.e...@gmail.com> wrote:

>> Currently, we conflate 'C' and 'stable'. In many cases, I'd prefer a C++ API to a C one, or would be happy to write (or even use automatically generated) thin C wrappers around C++. The requirement is not the language, it's the stability. I presume this also applies to WebKit: there's no reason why a C++ library should prefer a C API for talking to a C++ library. Stability isn't just a binary thing. We care about several different definitions:
>
> The additional effort of providing a stable C++ interface is not one I
> would like to undertake. Are you offering to write such an interface
> and keep it up to date with the underlying LLVM (which is sure to keep
> changing at a fast pace)?

Why do you assume that providing a stable C++ interface to a C++ codebase is more effort than providing a stable C interface to a C++ codebase?

David

Rafael Espíndola

Aug 6, 2014, 11:21:38 AM
to David Chisnall, LLVM
> Why do you assume that providing a stable C++ interface to a C++ codebase is more effort than providing a stable C interface to a C++ codebase?

I personally find that the richer the language, the harder it is to
remember all the ways you can break something. But that is really
just a personal opinion or even deficiency. The point is that I am one
of the developers writing LLVM and I wouldn't want to put the effort
into developing and maintaining a stable C++ interface.

This is an open source project, so if you want a stable C++ interface
you will have to find enough developers with a good track record of
contributions to LLVM who think it is worth it to develop and maintain
such an interface.

Cheers,
Rafael

Pete Cooper

Aug 6, 2014, 12:18:54 PM
to Nick Lewycky, LLVM
On Aug 6, 2014, at 12:00 AM, Nick Lewycky <nich...@mxc.ca> wrote:

> Filip Pizlo wrote:
>> This is exciting!
>>
>> I would be happy to help.
>>
>>
>>> On Aug 5, 2014, at 12:38 PM, Chris Bieneman<be...@apple.com> wrote:
>>>
>>> Hello LLVM community,
>>>
>>> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>>>
>>> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>>>
>>> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>>>
>>> The list of the problems we’re currently planning to tackle is:
>>>
>>> (1) Reduce or eliminate static initializers, global constructors, and global destructors
>>> (2) Clean up cross compiling in the CMake build system
>>> (3) Update LLVM debugging mechanisms for being part of a dynamic library
>>> (4) Move overridden sys calls (like abort) into the tools, rather than the libraries
>>> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)
>>
>> Also:
>>
>> (6) Determine if command line options are the best way of passing configuration settings into LLVM.
>
> They're already banned, so there isn't anything left to determine here, just code to fix.

Right, we’re *very* interested in fixing this code now, we just need to agree on the best solution before we start committing.

>> It’s an awkward abstraction when LLVM is embedded. I suspect (6) will be closely related to (1) since command line option parsing was the hardest impediment to getting rid of static initializers.
>
> Yes, for all these reasons. Two libraries may be using llvm under the hood unaware of each other, they can't both share global state. Command line flags block that. Our command-line tools should be parsing their own flags and setting state through some other mechanism, and that state mustn't be more global than an LLVMContext.

In terms of “(1) Reduce or eliminate static initializers, global constructors, and global destructors”, the command line is the single biggest contributor to these.  I’d really like to get a conversation going about this specific issue and how we’ll solve it.

Just to kick things off, most command line options (not all, but the majority) live in the files that implement passes.  I think we should move the command line options into the passes themselves.  When the initialize method is called on a pass, it currently adds that pass’s name to the command line anyway (for opt to use).  I propose that passes be given the ability to register their command line options in initialize.

I haven’t yet worked out the cleanest way to do this.  I think there are two options: either make some kind of command line registration available via the PassRegistry already passed to initialize, or add another parameter to initialize which is the command line registry.

What do you think?
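
A minimal sketch of the second option, where initialize takes a separate command line registry, might look like the following. `OptionRegistry`, `initializeExampleUnrollPass`, and the option name are all hypothetical and invented for illustration; nothing here is an existing LLVM interface.

```cpp
#include <map>
#include <string>

// Hypothetical registry owned by the tool or embedder instead of being a
// process-wide global. Only integer options are modelled here.
class OptionRegistry {
public:
  // Called by a pass while it initializes. An option that is already
  // registered keeps its current value.
  void registerInt(const std::string &Name, int Default) {
    Values.emplace(Name, Default);
  }

  // Returns false for options that no pass has registered.
  bool set(const std::string &Name, int Value) {
    auto It = Values.find(Name);
    if (It == Values.end())
      return false;
    It->second = Value;
    return true;
  }

  // Throws std::out_of_range if the option was never registered.
  int get(const std::string &Name) const { return Values.at(Name); }

private:
  std::map<std::string, int> Values;
};

// Stand-in for a pass's initialize function: the option is registered here
// rather than through a file-scope static option object.
void initializeExampleUnrollPass(OptionRegistry &Options) {
  Options.registerInt("example-unroll-threshold", 150);
}
```

Because each client owns its registry, two embedders in one process can initialize the same pass and set different values without any static initializer running at load time.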

Thanks,
Pete

Eric Christopher

Aug 6, 2014, 12:31:21 PM
to Pete Cooper, LLVM
> I haven’t yet worked out the cleanest way to do this. There’s 2 options I
> think. Either make some kind of command line registration available via the
> PassRegistry already passed to initialize, or we add another parameter to
> initialize which is the command line registry.
>
> What do you think?
>

I was thinking about this a while ago w.r.t. the backend options
coming from clang as well. One thought would be to add
elements/structures onto something like the LLVMContext and have them
be able to be set via populating a structure while we take the command
lines out/separated/etc that can also fill in the structure.

I think this would work for most of the command line options within
the passes as well, just give each one their own options structure?
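
Roughly, the structure-on-the-context idea could be sketched like this. `SketchContext`, `ExamplePassOptions`, and the flag spellings are hypothetical names chosen for illustration, and the context here is a stand-in rather than the real LLVMContext.

```cpp
#include <string>

// Hypothetical per-pass options structure with plain default values.
struct ExamplePassOptions {
  unsigned Threshold = 150;
  bool Verbose = false;
};

// Stand-in for a context object that carries one structure per pass.
struct SketchContext {
  ExamplePassOptions ExampleOpts;
};

// Tool-side helper: the command line is parsed outside the library and only
// fills in the structure. Unknown arguments are ignored in this sketch, and
// std::stoul throws if the threshold value is not a number.
void parseToolFlags(int Argc, const char *const *Argv,
                    ExamplePassOptions &Opts) {
  const std::string Prefix = "-example-threshold=";
  for (int I = 1; I < Argc; ++I) {
    std::string Arg = Argv[I];
    if (Arg == "-example-verbose")
      Opts.Verbose = true;
    else if (Arg.compare(0, Prefix.size(), Prefix) == 0)
      Opts.Threshold =
          static_cast<unsigned>(std::stoul(Arg.substr(Prefix.size())));
  }
}
```

An embedder would skip `parseToolFlags` entirely and assign the fields directly on its own context, which is what keeps the state no more global than the context itself.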

*shrug*

-eric

Filip Pizlo

Aug 6, 2014, 1:33:37 PM
to Nick Lewycky, LLVM
On Aug 6, 2014, at 12:00 AM, Nick Lewycky <nich...@mxc.ca> wrote:


I want to understand what it is specifically that you’re proposing and how it would differ from the current C API.

To me, there are two separate concerns here: the stability of the C API itself and the stability of the IR and other data formats which the C API necessarily exposes.

I agree that C APIs that directly match some underlying C++ APIs are a bad idea.  This is avoidable.  I think the C API usually does a good job of avoiding it.  I don’t think this was the direct problem in the global_ctors case that you just brought up.

The deeper problem is that the C API currently reveals the full power of the LLVM language, and also details of other data formats - for example, MCJIT clients are now encouraged to parse sections and those sections may contain things formatted in tricky ways deep inside LLVM.  Much of the LLVM IR is essentially baked into a bunch of C function signatures.  I see how you may have been referring to this by saying that “people write C APIs that closely match the C++ APIs” - but I just want to emphasize that the problem isn’t with matching a C++ API as much as it is that both the C API and the C++ API are closely matching the LLVM language.  Also, any non-syntactic invariant of the language is effectively revealed through the API.  If the LLVM language changes - for example global_ctors should now be used in a different way - then C API clients might be broken because of it.  Similar things could happen if the stackmap, dwarf, EH frame, or compactunwind formats change.

I believe that the latter problem is very real and I don’t believe that a solution exists that is both practical and absolute.  An absolute solution would surely involve inventing a whole new IR that is meant to be stable forever, and any C API client that generates IR will use this IR instead of the real LLVM IR, and then internally when you create this IR then it is converted into LLVM IR behind the scenes.  You could alternatively view this dystopian future as being equivalent to forever supporting auto-upgrade from all prior versions of IR for clients of the C API.  That seems really dumb to me, because I believe that such a solution would be more expensive than the price we pay right now for the slight instability - bugs like global_ctors are not super common, have limited fallout, and can be worked around by clients if they are given notice.  So, it would be great to come up with a middle ground: we don’t want to throw C API stability out the window because of a few bugs that sometimes require breaking changes but we also don’t want to carve the API out of stone and never leave wiggle room.  I believe that this “super stable except when it isn’t” philosophy is consistent with what most people mean by “stable API” in the sense that well-maintained APIs end up deprecating things and eventually removing them.  I’ve also seen breaking changes get made to stable API on the grounds that all major clients were in the loop and none of them objected.

WebKit has already in the past cooperated through C API changes and will probably continue to do so in the future.  Of course these happened to involve C APIs that didn’t yet fall under the stability rule because they hadn’t shipped yet - but that doesn’t make much of a difference to us.  When we cut a WebKit release branch we lock it against some LLVM branch, so C API stability is only an issue for active development on trunk, and we already know from experience that there exists some amount of wiggle room that we can cope with.  I don’t yet know how to define what that is other than “if you ask us nicely about an API change then we’ll probably say okay”.

Eric Christopher

Aug 6, 2014, 1:48:14 PM
to Filip Pizlo, LLVM
I think you've got some good points here and I think getting the right
balance will be hard, but if it seems that there's community demand
for this then some concrete proposals sound like a good thing here.
I'm also of the opinion that one of the reasons we don't run into this
much is that we're careful about what we open up in the C API. I could
be wrong here, but am cautious about screwing it up and then learning
:)

-eric

Sean Silva

Aug 6, 2014, 7:29:40 PM
to Eric Christopher, LLVM
On Wed, Aug 6, 2014 at 9:28 AM, Eric Christopher <echr...@gmail.com> wrote:
>> I haven’t yet worked out the cleanest way to do this.  There’s 2 options I
>> think.  Either make some kind of command line registration available via the
>> PassRegistry already passed to initialize, or we add another parameter to
>> initialize which is the command line registry.
>>
>> What do you think?
>>
>
> I was thinking about this a while ago w.r.t. the backend options
> coming from clang as well. One thought would be to add
> elements/structures onto something like the LLVMContext and have them
> be able to be set via populating a structure while we take the command
> lines out/separated/etc that can also fill in the structure.
>
> I think this would work for most of the command line options within
> the passes as well, just give each one their own options structure?

I once thought a bit about this situation, and the (probably hopelessly naive) thing that I came up with is something like adding roughly a std::map<StringRef,StringRef> to the LLVMContext, which is essentially just a "string'ly-typed" key-value store. Then we have a couple helper functions that deserialize their respective values, e.g.

int NumTimes = getParameter<int>(KVs,"foo-num-times");
StringRef CFGRoot = getParameter<StringRef>(KVs,"foo-cfgroot");
std::vector<StringRef> UsedFuncs = getParameter<std::vector<StringRef>>(KVs,"foo-used-funcs");
(a layer of caching of the deserialized values might be useful).

The idea is to avoid having a large impedance mismatch between command line options (which are essentially string'ly typed) and the in-memory configuration storage. 
For tools, we have some sort of translation layer that pulls command line options into the KV structure. For more modular needs, the KV structure can be maintained manually and passed in at the desired granularity.

We can also have pass creation functions accept such a KV structure (if relevant) for configuration stuff (or maybe have each pass have an associated helper function that accepts a KV structure and turns it into the actual configuration struct that the pass accepts).
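
For concreteness, the helper described above could be sketched as follows. This is only one possible shape: it uses `std::string` in place of StringRef so that it compiles without LLVM headers, and the `Default` parameter and the comma-separated list encoding are assumptions added for the example.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stringly-typed key-value store; std::string stands in for StringRef.
using KVStore = std::map<std::string, std::string>;

// Generic case: deserialize with stream extraction. A missing key or a
// value that fails to parse yields Default.
template <typename T>
T getParameter(const KVStore &KVs, const std::string &Key, T Default = T()) {
  auto It = KVs.find(Key);
  if (It == KVs.end())
    return Default;
  std::istringstream In(It->second);
  T Value;
  if (!(In >> Value))
    return Default;
  return Value;
}

// Strings are returned verbatim, including any embedded spaces.
template <>
std::string getParameter<std::string>(const KVStore &KVs,
                                      const std::string &Key,
                                      std::string Default) {
  auto It = KVs.find(Key);
  return It == KVs.end() ? Default : It->second;
}

// Lists are assumed to be stored as comma-separated values.
template <>
std::vector<std::string>
getParameter<std::vector<std::string>>(const KVStore &KVs,
                                       const std::string &Key,
                                       std::vector<std::string> Default) {
  auto It = KVs.find(Key);
  if (It == KVs.end())
    return Default;
  std::vector<std::string> Parts;
  std::istringstream In(It->second);
  std::string Part;
  while (std::getline(In, Part, ','))
    Parts.push_back(Part);
  return Parts;
}
```

A tool would fill the `KVStore` from its argv, while an embedder would populate it directly; a pass then reads something like `getParameter<int>(KVs, "foo-num-times")` at construction time.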

-- Sean Silva

Philip Reames

Aug 6, 2014, 7:36:21 PM
to llv...@cs.uiuc.edu

On 08/05/2014 12:38 PM, Chris Bieneman wrote:
> Hello LLVM community,
>
> Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.
>
> Over the next few months, a few of us at Apple are going to be working on tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.
>
> Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.
>
> The list of the problems we’re currently planning to tackle is:
>
> (1) Reduce or eliminate static initializers, global constructors, and global destructors
> (2) Clean up cross compiling in the CMake build system
> (3) Update LLVM debugging mechanisms for being part of a dynamic library
> (4) Move overridden sys calls (like abort) into the tools, rather than the libraries
> (5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)
>
> We will be sending more specific proposals and patches for each of the changes listed above starting this week. If you’re interested in these problems and their solutions, please speak up and help us develop a solution that will work for your needs and ours.
>
> Thanks,
> -Chris
I'm very happy to see this effort happening. I also maintain an
out-of-tree frontend and deal with these issues on a semi-regular basis.

Worth noting is that we do not use the C interface and have little
interest in ever doing so. We've chosen to accept the version lock
as a necessary evil - mostly because we need enough internal changes to
LLVM we'd be stuck with it anyway. I'm mostly just pointing this out
for the purpose of contrast with other responders on the thread.

I would honestly love to see a slightly more stable C++ interface, but
I've mostly accepted that's not going to happen. I'm not even talking
stability from release to release; I'd be thrilled with deprecation
periods in terms of days or weeks. We're generally fairly in sync with
ToT, but even with that, we face a lot of breaking changes. There's a
general assumption that only llvm subprojects need to be migrated, and
as soon as that's complete, old APIs are removed. Simply having a
slightly longer period to do our own migrations would be immensely helpful.

Philip
