Replacing GCC-XML with Clang

João Matos

May 10, 2012, 11:30:03 PM
to mono...@googlegroups.com
Since the GCC-XML project seems mostly dead, I started playing with the Clang compiler. It has a nice library that can be called from C (libclang).

I started working on an alternative code generator based on it. There are still a lot of features missing, and testing is minimal, but I'm liking it. It's in C++ at the moment, because it's easier for me to debug into LLVM/Clang. It's pretty small anyway, so it can later be ported to C# with itself :)

Another nice thing that Clang provides is name mangling, so we could just embed both the Itanium and Microsoft mangled names as metadata and there would be no need for all the complexity of mangling the names at runtime. There are still some bugs in Clang with regard to Microsoft name mangling, so this needs some more work before it is a viable alternative. I'm working on a few details at the moment, as are other people.
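
A minimal sketch of the embedding idea, written in Python for brevity (the generator discussed here is C++ and the bindings are C#). The `MANGLED_NAMES` table and the `symbol_for` helper are hypothetical names, not part of cxxi. The two strings are the manglings of `void Foo::bar(int)` under the Itanium ABI and under 32-bit MSVC.

```python
# Hypothetical metadata a generator could emit per method: one mangled
# symbol per ABI, computed ahead of time by the compiler.
MANGLED_NAMES = {
    "Foo::bar(int)": {
        "itanium": "_ZN3Foo3barEi",    # void Foo::bar(int), Itanium C++ ABI
        "msvc": "?bar@Foo@@QAEXH@Z",   # same method, 32-bit MSVC, __thiscall
    },
}


def symbol_for(method: str, abi: str) -> str:
    """Look up the precomputed symbol instead of mangling at runtime."""
    try:
        return MANGLED_NAMES[method][abi]
    except KeyError:
        raise LookupError(
            f"no cached mangled name for {method!r} under {abi!r}"
        ) from None
```

With a table like this, the runtime only needs a lookup for the ABIs the generator knew about; anything else still needs another source for the symbol.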



John Knipper

May 14, 2012, 1:21:26 AM
to mono...@googlegroups.com
GCC-XML is not dead; they just have limited resources and do not provide builds for Windows. You have to check out their code and build it yourself:

It's not very user-friendly, but it worked for me.

Nonetheless, I will be interested in seeing how far you can go with that approach :)

John.

On Fri, May 11, 2012 at 5:30 AM, João Matos <ripzon...@gmail.com> wrote:
GCC-XML

Andreia Gaita

May 14, 2012, 8:00:06 AM
to mono...@googlegroups.com
Hi!

We definitely had plans to use clang; the only reason we didn't
initially was that the gcc-xml output was very handy for quick
prototyping and for validating the binding output against the xml.
Clang is the preferred solution, definitely.

Name mangling at runtime is done because the first pass of the binding
generation creates generic bindings that can later be targeted at any
compiler-specific version of the library. If the mangled names can be
cached earlier, that's cool, it saves a step on the runtime
generation. But still, it's important to keep that separation in mind,
because someone doing a binding for a compiler that clang doesn't
support is going to need that step.
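
The separation described above can be sketched as follows, assuming a Python stand-in for the runtime (all names here are hypothetical, not cxxi API). A cached name is used when the generator provided one for the target ABI; otherwise a pluggable runtime mangler is called. The Itanium mangler shown is a toy that only handles non-template, non-const functions taking a few builtin types by value.

```python
from typing import Callable, Dict, Optional, Sequence

# Itanium codes for a few builtin types (a tiny subset of the ABI).
ITANIUM_BUILTINS = {"void": "v", "bool": "b", "char": "c",
                    "int": "i", "float": "f", "double": "d"}


def mangle_itanium(qualified_name: str, params: Sequence[str]) -> str:
    """Toy Itanium mangler for simple functions with builtin parameters."""
    parts = qualified_name.split("::")
    encoded = "".join(f"{len(p)}{p}" for p in parts)
    if len(parts) > 1:
        encoded = f"N{encoded}E"          # nested names are wrapped in N...E
    args = "".join(ITANIUM_BUILTINS[p] for p in params) or "v"
    return f"_Z{encoded}{args}"


# Runtime mangling modules, keyed by ABI; more can be plugged in later.
RUNTIME_MANGLERS: Dict[str, Callable[[str, Sequence[str]], str]] = {
    "itanium": mangle_itanium,
}


def resolve_symbol(qualified_name: str, params: Sequence[str], abi: str,
                   cached: Optional[Dict[str, str]] = None) -> str:
    """Prefer a name cached by the generator, else mangle at runtime."""
    if cached and abi in cached:
        return cached[abi]
    return RUNTIME_MANGLERS[abi](qualified_name, params)
```

The point of the design is that the cache is an optimization only: removing it changes where the symbol comes from, not what the runtime is able to target.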

One important optimization, though I'm not sure what state it's in, is
running the pinvoke generation step manually, for when you don't want
to do it at runtime (especially on small devices and on jit-less
ones). Doing this, the separation between the static binding
generation step and the runtime pinvoke generation step becomes much
more blurred, and it's easy to forget that they should be kept
separate.

BTW, what's your repo url?

~~ andreia gaita ~~
scribled on my iPaddle

João Matos

May 14, 2012, 1:27:21 PM
to mono...@googlegroups.com
Yes, I understand why it's done this way, but in practice pretty much
everyone uses the Itanium or the Microsoft one. It's just another bit
of code we have to keep tested, and that the compiler / generator
could already do for us.

I have also thought about the AOT support, and I'm not sure how that
is gonna work. I've been reading the Mono runtime to see how easy it
would be to add the C++ ABI support in the runtime, instead of an
extra C# layer, but I'm not sure what the benefits would be yet. Can
we just force the IL code to be generated, emit it as a DLL, then
redirect it to the AOT engine?

I don't mind working on this over the summer, because I will need it
for my projects on iOS, though I need some direction on what approach
would be more desirable.

I have a fork on GitHub, though I still have not pushed anything.
There are still patches I'm trying to get into Clang; otherwise everyone
will need to get my fork of it, which is undesirable. I'll try and
push something in the next few days.

On a related note, and I don't think it's worth a different email,
there are a couple of pull requests on GitHub that should be merged soon.

https://github.com/mono/cxxi/pull/2

That one has major MSVC improvements in name mangling.

https://github.com/mono/cxxi/pull/4

There is also this one with some minor improvements.

--
João Matos

Andreia Gaita

May 14, 2012, 4:56:16 PM
to mono...@googlegroups.com
On Mon, May 14, 2012 at 6:27 PM, João Matos <ripzon...@gmail.com> wrote:
> Yes, I understand why it's done this way, but in practice pretty much
> everyone uses the Itanium or the Microsoft one. It's just another bit
> of code we have to keep tested, and that the compiler / generator
> could already do for us.

Sure, if the generator can output those values, that's fine, but at
runtime the pinvoke generator should still be able to call into a name
mangling module if it needs to target something clang can't support.
"In practice pretty much everyone uses X" is an assumption, and
assumptions are the mother of all fuckups.

> I have also thought about the AOT support, and I'm not sure how that
> is gonna work. I've been reading the Mono runtime to see how easy it
> would be to add the C++ ABI support in the runtime, instead of an
> extra C# layer, but I'm not sure what the benefits would be yet. Can
> we just force the IL code to be generated, emit it as a DLL, then
> redirect it to the AOT engine?

I'm talking about emitting the DLL. I know me and corrado talked about
that and there was some work done, but I have no idea if it ever got
in. That's the most important optimization, since once you have that,
you're pretty much set.

> I don't mind working on this over the summer, because I will need it
> for my projects on iOS, though I need some direction on what approach
> would be more desirable.

I'll look into what was done and what's missing, I may have some
forgotten code around here for that.

> On a related note, and I don't think it's worth a different email,
> there are a couple of pull requests on GitHub that should be merged soon.
>
> https://github.com/mono/cxxi/pull/2
>
> That one has major MSVC improvements in name mangling.
>
> https://github.com/mono/cxxi/pull/4
>
> There is also this one with some minor improvements.

Ah, yes, these are still hanging. :P I'll work on merging all that stuff in.


shana
--------
blog.worldofcoding.com
github.com/andreiagaita

Alex Corrado

May 14, 2012, 6:18:49 PM
to mono...@googlegroups.com
Hey,

>> I started playing with the Clang compiler. It has a nice library that can be called from C (libclang).

This is excellent! Though I am curious if the C interface is rich
enough for our needs. Another possibility would be binding clang's C++
API with cxxi using the old generator. It would be more work, but it
would help dogfood cxxi. Having rich managed bindings to clang's
parser could also help other projects, like MonoDevelop's C/C++
support for example.

>> Another nice thing that Clang provides is name mangling, so we could just embed both the Itanium and Microsoft mangled names as metadata and there would be no need for all the complexity of mangling the names at runtime.

Most of the complexity is not actually in the name mangling
implementations. Yes, we need a lot of metadata to enable the name
mangling, but that metadata is also needed to inform the class
layouts, vtable layouts, calling conventions and a myriad of other
things that can differ between C++ ABIs.

> I'm talking about emitting the DLL. I know me and corrado talked about
> that and there was some work done, but I have no idea if it ever got
> in. That's the most important optimization, since once you have that,
> you're pretty much set.

I called this "static mode," and did devise a plan for it a while back.
Here is my proposal:

In static mode, you would run the generator and create your bindings
like before. When you compiled it, you would get your bindings DLL
that would be linked with Mono.Cxxi.dll and use Reflection.Emit at
runtime. This is no different than before. However, we would add an
additional tool that would load the assembly and inspect the IL for
interfaces that extend ICppClass. You would also pass to this tool a
specific ABI implementation, and for each ICppClass interface that we
find, we would generate the implementation class for that interface
using the given ABI. A registrar would then keep track of the
implementations, either added statically by this tool, or at runtime
by the old Ref.Emit codepath. If, at runtime, the registrar didn't
find an impl for a call to CppLibrary.GetClass, we could either
generate it like we currently do, or throw an exception (for platforms
that don't allow JITing). This last part has already been started on
the "static-mode" branch
(https://github.com/mono/cxxi/tree/static-mode).
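
The registrar part of the proposal can be sketched as follows. This is a Python illustration under assumed names (`ImplRegistrar`, `register`, `get_class`), not the code on the static-mode branch. Passing no runtime generator models a platform that does not allow JITing.

```python
from typing import Callable, Dict, Optional


class ImplRegistrar:
    """Tracks one implementation class per interface name (hypothetical)."""

    def __init__(self,
                 runtime_generator: Optional[Callable[[str], type]] = None):
        self._impls: Dict[str, type] = {}
        # None models a platform where runtime code generation is forbidden.
        self._runtime_generator = runtime_generator

    def register(self, interface_name: str, impl: type) -> None:
        """Called for implementations produced ahead of time by the tool."""
        self._impls[interface_name] = impl

    def get_class(self, interface_name: str) -> type:
        """Return a registered impl, else generate one or fail."""
        impl = self._impls.get(interface_name)
        if impl is not None:
            return impl
        if self._runtime_generator is None:
            raise RuntimeError(
                f"no static implementation for {interface_name} "
                "and runtime generation is unavailable"
            )
        impl = self._runtime_generator(interface_name)
        self._impls[interface_name] = impl   # reuse on later lookups
        return impl
```

Statically generated classes and runtime-generated ones end up in the same table, so callers of `get_class` do not need to know which path produced the implementation.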


-Alex

João Matos

May 14, 2012, 7:31:12 PM
to mono...@googlegroups.com
> This is excellent! Though I am curious if the C interface is rich
> enough for our needs. Another possibility would be binding clang's C++
> API with cxxi using the old generator. It would be more work, but it
> would help dogfood cxxi. Having rich managed bindings to clang's
> parser could also help other projects, like MonoDevelop's C/C++
> support for example.

I had to extend it to support getting the mangling of the names, but
it's just a new call on the API. The only problem I faced was that
Clang expects you to tell it the ABI when creating the initial parsing
context. So for instance, if it parses the C++ code with the Itanium
ABI (which is the default), you can still get the MSVC mangling, but
it won't assume certain things like thiscall on constructors, so the
mangling will be slightly wrong.
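
To illustrate why the calling convention matters, here is a small Python sketch of how 32-bit MSVC decorates a public default constructor of a class at global scope. The helper name is hypothetical, it covers only this one x86 case, and it is not how Clang computes manglings. The convention is a single letter in the symbol, so if the parse does not assume `__thiscall` (the sketch uses `__cdecl` as the assumed alternative), the result differs by one character and will not match the exported symbol.

```python
# Calling-convention letters used in 32-bit MSVC decorated names.
MSVC_CALLCONV_CODES = {
    "cdecl": "A",
    "thiscall": "E",
    "stdcall": "G",
    "fastcall": "I",
}


def msvc_public_default_ctor(class_name: str, callconv: str) -> str:
    """Decorated name of `class_name::class_name()` on 32-bit MSVC.

    Layout: ??0 (constructor) + name + @@ + Q (public) + A (no cv on this)
    + calling-convention letter + @ (no return type) + X (void params) + Z.
    Hypothetical helper for illustration only.
    """
    return f"??0{class_name}@@QA{MSVC_CALLCONV_CODES[callconv]}@XZ"


right = msvc_public_default_ctor("Foo", "thiscall")   # "??0Foo@@QAE@XZ"
wrong = msvc_public_default_ctor("Foo", "cdecl")      # "??0Foo@@QAA@XZ"
```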

I'm still figuring out if it's possible to get both manglings without
parsing the headers twice, but this will probably need some patches.
But this only matters if we want to cache the mangling; if we use the
calculation in Cxxi, this is not a problem.

I'll probably switch to the C++ API soon, because it's easier to get
to the advanced details not exposed by the C API. Also the C++ API
allows us to use the plugin API, so you could do the generation as
part of the regular build process if you are already using Clang,
without parsing the code twice.

> I called this "static mode," and did devise a plan for it a while back.
> Here is my proposal:
>
> In static mode, you would run the generator and create your bindings
> like before. When you compiled it, you would get your bindings DLL
> that would be linked with Mono.Cxxi.dll and use Reflection.Emit at
> runtime. This is no different than before. However, we would add an
> additional tool that would load the assembly and inspect the IL for
> interfaces that extend ICppClass. You would also pass to this tool a
> specific ABI implementation, and for each ICppClass interface that we
> find, we would generate the implementation class for that interface
> using the given ABI. A registrar would then keep track of the
> implementations, either added statically by this tool, or at runtime
> by the old Ref.Emit codepath. If, at runtime, the registrar didn't
> find an impl for a call to CppLibrary.GetClass, we could either
> generate it like we currently do, or throw an exception (for platforms
> that don't allow JITing). This last part has already been started on
> the "static-mode" branch
> (https://github.com/mono/cxxi/tree/static-mode).

Alright, this is how I was expecting it to work. I'll research that
branch once I get the generator done.

--
João Matos