[LLVMdev] [RFC] Resurrecting the C back-end

Roel Jordans

Aug 27, 2012, 9:56:26 AM
to llv...@cs.uiuc.edu
Hello all,

I am in need of a working C back-end for LLVM for my current research.
I know that the previous incarnation of this back-end has been kicked
out of the tree since the 3.1 release and I have gone through the
archives to restore it to its previous 'glory'.

So far, I have restored most of the previous version (excluding some of
the parts that needed changes outside of the lib/Target/CBackend
directory) and I have made the necessary changes to get it back in
'working' state.

I have already had a short discussion on the IRC channel (with
baldric IIRC), and he suggested including type legalization in the list
of passes run before generating the output in order to support
arbitrarily sized types.
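To illustrate why this matters: LLVM IR allows integer types such as `i17`, which have no direct C counterpart. Type legalization would rewrite them into legal types before printing; without it, the back-end has to emulate them itself, for instance by keeping the value in the next wider standard type and masking after every operation. The sketch below assumes that strategy; the helper names are hypothetical and not part of any LLVM API.

```c
#include <stdint.h>

/* Hypothetical emulation of the IR type i17: keep the value in the low
 * 17 bits of a uint32_t and mask after every operation so that results
 * wrap modulo 2^17, as the IR semantics require. */
#define I17_MASK ((uint32_t)0x1FFFFu)

typedef uint32_t cbe_i17;

/* 'add i17': operands are < 2^17, so the sum cannot overflow uint32_t. */
static cbe_i17 cbe_add_i17(cbe_i17 a, cbe_i17 b) {
    return (a + b) & I17_MASK;
}

/* 'mul i17': any wraparound of the 32-bit product does not affect the
 * low 17 bits, which are the only ones kept. */
static cbe_i17 cbe_mul_i17(cbe_i17 a, cbe_i17 b) {
    return (uint32_t)(a * b) & I17_MASK;
}
```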

Some other things I am considering for inclusion as improvements to the
new CBackend include the following:

* Simplification of the output
o Printing only the required set of headers/defines for a specific module
o Reducing the number of explicit type casts in the generated code
o Optionally removing the current prefix 'llvm_cbe_' from named variables
o Only printing full definitions of structs when their internal fields
are actually referred to within the module (e.g. when using library
calls like fopen, a complete description of struct FILE is generated,
whereas a simple 'struct FILE;' declaration should be sufficient)

* An option to insert software floating-point calls and/or library calls
for things like division (I have an embedded processor as the target system
in my research, which cannot always support costly operations)
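On the struct point: C only needs a complete struct definition when the code accesses fields or takes `sizeof`; passing pointers around works with an incomplete type. A minimal sketch, with a hypothetical struct name standing in for whatever the back-end would print for `FILE`:

```c
#include <stddef.h>

/* Incomplete (forward) declaration: enough as long as the module only
 * passes pointers around and never touches the fields. The struct name
 * is hypothetical. */
struct cbe_opaque_file;

/* Pointer parameters and return values of an incomplete type are valid C. */
static struct cbe_opaque_file *cbe_keep_handle(struct cbe_opaque_file *f) {
    return f;
}

static int cbe_handle_is_null(const struct cbe_opaque_file *f) {
    return f == NULL;
}
```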

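For the library-call option, the idea would be that the back-end prints a call instead of the `/` operator, so the target never needs a hardware divider. A minimal sketch of such a routine (restoring shift-and-subtract division) follows; the name `cbe_udiv32` is hypothetical, and a real flow would more likely call the target toolchain's own runtime routine.

```c
#include <stdint.h>

/* Hypothetical software replacement for unsigned 32-bit division using
 * restoring shift-and-subtract: one quotient bit per iteration, using
 * only shifts, compares and subtractions. Callers must not pass d == 0
 * (the IR 'udiv' leaves that case undefined as well). */
static uint32_t cbe_udiv32(uint32_t n, uint32_t d) {
    uint32_t q = 0;
    uint32_t r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next bit of n */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1u << i;      /* this quotient bit is 1 */
        }
    }
    return q;
}
```

A back-end using this would print `cbe_udiv32(a, b)` where it would otherwise print `a / b` for an unsigned 32-bit division.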

My hope is that, by simplifying the output, it is possible to produce
output that is more readable yet still portable.

Furthermore, some of the current features are outside of the scope of my
current work and could make it more difficult for me to maintain the code.

For example, the previous back-end seems to put considerable emphasis on
the different linkage types and on the properties of various C compilers
that are required to represent them correctly. My guess is that
this is irrelevant for most use cases of the C back-end, while it
could take me quite some time to support.
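To give an idea of the compiler-specific handling involved: internal linkage maps cleanly onto `static`, but weak linkage has no ISO C spelling and needs a compiler extension. A minimal sketch, assuming a GCC-compatible compiler for the weak case; the macro and function names are hypothetical.

```c
/* IR internal linkage maps onto plain C 'static'. */
static int cbe_internal_counter = 0;

/* Weak linkage has no ISO C spelling; GCC and Clang accept
 * __attribute__((weak)). Other compilers may need different syntax or
 * may not support it at all, in which case an ordinary definition is
 * the fallback (with different semantics). */
#if defined(__GNUC__)
#define CBE_WEAK __attribute__((weak))
#else
#define CBE_WEAK /* no portable equivalent */
#endif

CBE_WEAK int cbe_default_hook(void) {
    return ++cbe_internal_counter;
}
```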

A similar example is the handling of inline assembler statements, which
required per-target support for, e.g., the translation of register
names. For now, this is not something I need (my target architecture is
not supported by LLVM anyway) and I consider myself not yet familiar
enough with the LLVM internals to offer support for this feature.


Anyway, that brings me to my final question: Which features are
critical/important/wanted/unwanted for a C back-end?

Cheers,
Roel
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Dmitry N. Mikushin

Aug 27, 2012, 11:37:03 AM
to Roel Jordans, llv...@cs.uiuc.edu
Dear Roel,

Thank you for working on this!
The C backend definitely deserves to be resurrected.

Good luck with your effort,
- D.

2012/8/27 Roel Jordans <r.jo...@tue.nl>:

Philipp Klaus Krause

Aug 27, 2012, 10:23:15 PM
to llv...@cs.uiuc.edu
On 27.08.2012 22:56, Roel Jordans wrote:
> * An option to insert software floating-point calls and/or library calls
> for things like division (I have an embedded processor as the target system
> in my research, which cannot always support costly operations)

Shouldn't this be done by the C compiler instead?

Philipp

Philipp Klaus Krause

Aug 27, 2012, 10:30:33 PM
to llv...@cs.uiuc.edu
Will this allow users to compile C++ (or some other language that LLVM
has a frontend for) to C, which then can be compiled using a C compiler
for a target architecture, for which only a C compiler exists?
Which use-cases do you have in mind for this backend?

Philipp

Hongbin Zheng

Aug 27, 2012, 10:57:51 PM
to Philipp Klaus Krause, llv...@cs.uiuc.edu
I think the C backend also allows people to perform source-to-source
transformations with LLVM (instead of Clang).

ether

陳韋任 (Wei-Ren Chen)

Aug 27, 2012, 11:10:03 PM
to Roel Jordans, llv...@cs.uiuc.edu
Hi Roel,

It's good to know that you're working on the C backend. But IMO, the reason the
C backend was removed in LLVM 3.1 is that no one maintained it. If you bring
it back, would you be willing to take responsibility for maintaining it?

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

Joshua Cranmer

Aug 28, 2012, 1:08:03 AM
to llv...@cs.uiuc.edu
On 8/27/2012 9:57 PM, Hongbin Zheng wrote:
> I think the C backend also allows people to perform source-to-source
> transformations with LLVM (instead of Clang).

I do not believe that this would be the case nor that it should be a
goal. Source-to-source transformation requires a lot of accurate
information about the AST, and conversion to LLVM IR is way too lossy.
Signedness, for example, is lost at IR generation, as is any pretense of
machine independence.
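A small illustration of what losing signedness means for a C back-end: an IR `i32` carries no sign, and it is the instruction (`sdiv` vs. `udiv`, `ashr` vs. `lshr`, ...) that decides how the bits are interpreted. A back-end therefore has to pick a signed or unsigned C type per operation, which is one reason it ends up emitting explicit casts; whether the original source used `int` or `unsigned` is not recoverable. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Lowering of 'sdiv i32': interpret the operand bits as signed. */
static int32_t cbe_sdiv_i32(int32_t a, int32_t b) {
    return a / b;
}

/* Lowering of 'udiv i32': same width, unsigned interpretation. */
static uint32_t cbe_udiv_i32(uint32_t a, uint32_t b) {
    return a / b;
}
```

The same 32-bit pattern `0xFFFFFFF8` yields -4 through the first helper (as -8 / 2) and `0x7FFFFFFC` through the second.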

--
Joshua Cranmer
News submodule owner
DXR coauthor

Philipp Klaus Krause

Aug 28, 2012, 1:39:58 AM
to llv...@cs.uiuc.edu
On 28.08.2012 14:08, Joshua Cranmer wrote:
> On 8/27/2012 9:57 PM, Hongbin Zheng wrote:
>> I think the C backend also allows people to perform source-to-source
>> transformations with LLVM (instead of Clang).
>
> I do not believe that this would be the case nor that it should be a
> goal. Source-to-source transformation requires a lot of accurate
> information about the AST, and conversion to LLVM IR is way too lossy.
> Signedness, for example, is lost at IR generation, as is any pretense of
> machine independence.
>

Why is it not possible to have the C backend emit machine-independent
code (i.e. C code that does not rely on implementation-defined behaviour)?

Philipp

Cameron Zwarich

Aug 28, 2012, 1:47:52 AM
to Philipp Klaus Krause, llv...@cs.uiuc.edu
On Aug 27, 2012, at 10:39 PM, Philipp Klaus Krause <p...@spth.de> wrote:

> On 28.08.2012 14:08, Joshua Cranmer wrote:
>> On 8/27/2012 9:57 PM, Hongbin Zheng wrote:
>>> I think the C backend also allows people to perform source-to-source
>>> transformations with LLVM (instead of Clang).
>>
>> I do not believe that this would be the case nor that it should be a
>> goal. Source-to-source transformation requires a lot of accurate
>> information about the AST, and conversion to LLVM IR is way too lossy.
>> Signedness, for example, is lost at IR generation, as is any pretense of
>> machine independence.
>>
>
> Why is it not possible to have the C backend emit machine-independent
> code (i.e. C code that does not rely on implementation-defined behaviour)?

Because LLVM IR already includes that implementation-defined behaviour.

Cameron

Philipp Klaus Krause

Aug 28, 2012, 2:20:57 AM
to llv...@cs.uiuc.edu
On 27.08.2012 22:56, Roel Jordans wrote:

>
> Anyway, that brings to my final question: Which features are
> critical/important/wanted/unwanted for a C back-end?
>

I'd like it to be easy to configure (e.g. to specify what size int is
assumed to have).

I'd prefer the resulting code to not rely on implementation-defined
behaviour (e.g. not make any assumptions about the size of int).

I'd like the resulting code to contain a lot of information (through the
use of data types and keywords such as const and restrict) that the C
compiler can use to generate optimized code.

As an example, assume I feed some code that uses an int variable into
LLVM. LLVM finds that the value of the variable is assigned only once,
and has a value in the range between 4 and 212. Then the corresponding
variable in the output could be a const uint_fast8_t.
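Philipp's example could look roughly like this in the generated code. It only illustrates the idea (the old back-end did not do this); the range 4 to 212 is taken from his message, and the function names are hypothetical.

```c
#include <stdint.h>

/* What a straightforward translation might print: the source type (int),
 * even though the value never leaves [4, 212]. */
static int scale_plain(int x) {
    int k = x > 100 ? 212 : 4;   /* assigned exactly once */
    return k * 2;
}

/* What range information would allow: the single assignment becomes
 * 'const', and [4, 212] fits in uint_fast8_t, which a C compiler for an
 * 8-bit target may be able to keep in a single register. */
static int scale_narrow(int x) {
    const uint_fast8_t k = x > 100 ? 212 : 4;
    return k * 2;
}
```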

Philipp

Philipp Klaus Krause

Aug 28, 2012, 2:26:32 AM
to Cameron Zwarich, llv...@cs.uiuc.edu
On 28.08.2012 14:47, Cameron Zwarich wrote:
> On Aug 27, 2012, at 10:39 PM, Philipp Klaus Krause <p...@spth.de> wrote:
>
>> On 28.08.2012 14:08, Joshua Cranmer wrote:
>>> On 8/27/2012 9:57 PM, Hongbin Zheng wrote:
>>>> I think the C backend also allows people to perform source-to-source
>>>> transformations with LLVM (instead of Clang).
>>>
>>> I do not believe that this would be the case nor that it should be a
>>> goal. Source-to-source transformation requires a lot of accurate
>>> information about the AST, and conversion to LLVM IR is way too lossy.
>>> Signedness, for example, is lost at IR generation, as is any pretense of
>>> machine independence.
>>>
>>
>> Why is it not possible to have the C backend emit machine-independent
>> code (i.e. C code that does not rely on implementation-defined behaviour)?
>
> Because LLVM IR already includes that implementation-defined behaviour.
>
> Cameron
>

Sorry, it seems I misunderstood Joshua's mail the first time I read
it. While the question I asked is one I want to ask, the context may
give a false impression.

Shouldn't LLVM work with the C backend this way:

* The original input is read, and implementation-defined behaviour in
it is assumed to have a meaning based on some extra information
supplied (e.g. signedness of char, size of an int, etc.)
* LLVM does transformations
* The C backend generates C code, which is machine-independent (i.e.
will behave the same no matter which C compiler it is compiled with).

The last point is what I meant by "have the C backend emit
machine-independent code (i.e. C code that does not rely on
implementation-defined behaviour)". And I do not see how
implementation-defined behaviour included in LLVM-IR would prevent that.
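One place where generated C can at least avoid depending on the C compiler is integer overflow. An IR `add i32` wraps modulo 2^32, while signed overflow in C is undefined, so printing a plain `int` addition would change the meaning. A common workaround, sketched here with a hypothetical helper name, is to do the arithmetic on fixed-width unsigned types. Even this is not fully implementation-independent: `uint32_t` is optional in C99, and converting a result above `INT32_MAX` back to `int32_t` is itself implementation-defined.

```c
#include <stdint.h>

/* Lowering of IR 'add i32': wraparound is the defined result, so compute
 * on uint32_t, where C also guarantees modulo-2^32 behaviour, instead of
 * on int, where overflow is undefined. The cast matters if int is wider
 * than 32 bits, because uint32_t operands are then promoted to int. */
static uint32_t cbe_add_i32(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b);
}
```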

Philipp

Cameron Zwarich

Aug 28, 2012, 2:50:45 AM
to Philipp Klaus Krause, llv...@cs.uiuc.edu
On Aug 27, 2012, at 11:26 PM, Philipp Klaus Krause <p...@spth.de> wrote:

> The last point is what I meant by "have the C backend emit
> machine-independent code (i.e. C code that does not rely on
> implementation-defined behaviour)". And I do not see how
> implementation-defined behaviour included in LLVM-IR would prevent that.


Without even getting too interesting:

- If your original C program uses uintptr_t (even within the bounds allowed by the standard), that will get turned into some concrete integer type in LLVM IR. But that type might not be large enough to fit a pointer on your target, e.g. if your implementation of C uses fat pointers.

- LLVM assumes that all pointers have the same width (inside of the same address space), but C does not require this.

- LLVM assumes that null pointers are represented with a zero bit pattern, but C does not require this.
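To make the first point concrete: once a front-end has lowered `uintptr_t` for a particular target, the IR only knows a fixed integer width (say 64 bits). A C back-end can print that width but cannot turn it back into `uintptr_t`; at best the generated code could check the assumption at compile time. A sketch, assuming a C11 compiler (for `_Static_assert`) and a module lowered for 64-bit pointers; the helper names are hypothetical.

```c
#include <stdint.h>

/* The IR said 'i64' where the source said uintptr_t; the back-end can
 * only print the concrete width. Fail the build if the C compiler's
 * pointers are wider, instead of silently truncating them. */
_Static_assert(sizeof(void *) <= sizeof(uint64_t),
               "module was lowered assuming pointers fit in 64 bits");

/* Lowering of 'ptrtoint ptr to i64'. */
static uint64_t cbe_ptrtoint(void *p) {
    return (uint64_t)(uintptr_t)p;
}

/* Lowering of 'inttoptr i64 to ptr'. */
static void *cbe_inttoptr(uint64_t v) {
    return (void *)(uintptr_t)v;
}
```

Note that the check itself still relies on `uintptr_t` existing, which C does not guarantee either.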

Cameron

Roel Jordans

Aug 28, 2012, 5:14:12 AM
to llv...@cs.uiuc.edu


On 28/08/12 04:30, Philipp Klaus Krause wrote:
> Will this allow users to compile C++ (or some other language that LLVM
> has a frontend for) to C, which then can be compiled using a C compiler
> for a target architecture, for which only a C compiler exists?
> Which use-cases do you have in mind for this backend?
>

Possibly yes, compiling C++ to C would require support for things like
exception handling which require more work to be represented in C. I
expect that LLVM has routines to translate exception handling to more C
compatible structures for usage in the other backends. However, this
approach would probably limit the exception handling to work in a
specific way when translated to C which might not be what the user of a
C++ to C compilation flow would like.

In short, I'd need to think about how this should work and how much
would need to be configurable for the user.

My current goal is to be able to use the C backend for my research. I
work within the ASAM project [1] on datapath synthesis for application
specific processors. I have created some application analysis methods
within the LLVM framework and I want to compare their predictions with
real-life results on our target architecture. It is difficult for me to
implement my analysis within our target compiler as it is closed source
but I still want to be sure that the application code has been optimized
in the same way. Therefore I would like to be able to translate the
optimized IR back to C and compile it using the target compiler without
further optimizations. That way I can also better control some
optimizations that are difficult to control in the target
compiler. (Writing a complete backend for my target architecture is 'a
bit' too much work for me.)


Roel

> Philipp

[1] https://www.asam-project.org/

Roel Jordans

Aug 28, 2012, 5:19:02 AM
to llv...@cs.uiuc.edu
Hi chenwj,

I am aware of this and willing to support, at least, the basic operation
of the C backend. I will need to do so for myself anyway and I see that
there are many others which might benefit from it as well.

I am not sure, though, how much time I will have to support the larger,
more complex features that are outside of my use case, as some of those
might be difficult for me to replicate or outside of my current knowledge.

Anyway, I'd be interested to have the C backend back in LLVM and am
willing to cooperate on that part for as much as I am able.

Regards,
Roel

On 28/08/12 05:10, 陳韋任 (Wei-Ren Chen) wrote:
> Hi Roel,
>
> It's good to know that you're working on the C backend. But IMO, the reason the
> C backend was removed in LLVM 3.1 is that no one maintained it. If you bring
> it back, would you be willing to take responsibility for maintaining it?
>
> Regards,
> chenwj
>

Sebastian Redl

Aug 28, 2012, 5:34:51 AM
to llv...@cs.uiuc.edu
On 28.08.2012 11:14, Roel Jordans wrote:
> Possibly yes, compiling C++ to C would require support for things like
> exception handling which require more work to be represented in C. I
> expect that LLVM has routines to translate exception handling to more
> C compatible structures for usage in the other backends.
Not really. We have IR representation of stack unwinding, which doesn't
translate well to C. You could probably implement a very expensive SJLJ
mechanism.
> However, this approach would probably limit the exception handling to
> work in a specific way when translated to C which might not be what
> the user of a C++ to C compilation flow would like.
A combination of setjmp/longjmp might work, but would be costly on the
non-exceptional path. Anyway, this was a big limitation of the old C
backend.
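For reference, the setjmp/longjmp approach mentioned above could look roughly like this in generated C. It is a minimal sketch with a hypothetical runtime (`cbe_handler`, `cbe_throw`), handles only an `int` payload, and ignores destructors and other cleanups. It shows why the non-exceptional path pays: every entry into a try region runs `setjmp()` and pushes and pops a handler.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical SJLJ runtime: a stack of active handlers, each holding
 * the jmp_buf captured when its try region was entered. */
struct cbe_handler {
    jmp_buf env;
    struct cbe_handler *prev;
};

static struct cbe_handler *cbe_top = NULL;
static int cbe_exc_value;   /* payload of the in-flight exception */

static void cbe_throw(int value) {
    struct cbe_handler *h = cbe_top;
    if (h == NULL)
        abort();            /* no handler: roughly std::terminate() */
    cbe_top = h->prev;      /* pop the handler we are jumping to */
    cbe_exc_value = value;  /* global, so it survives the longjmp */
    longjmp(h->env, 1);
}

static int may_throw(int x) {
    if (x < 0)
        cbe_throw(x);
    return 2 * x;
}

/* Roughly what 'try { r = may_throw(x); } catch (int e) { r = e; }'
 * could be lowered to. */
static int call_with_catch(int x, int *caught) {
    struct cbe_handler h;
    h.prev = cbe_top;
    cbe_top = &h;
    if (setjmp(h.env) == 0) {
        int r = may_throw(x);
        cbe_top = h.prev;   /* normal exit from the try region */
        *caught = 0;
        return r;
    }
    *caught = 1;            /* landing pad; cbe_throw already popped h */
    return cbe_exc_value;
}
```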

Sebastian

Philipp Klaus Krause

Aug 28, 2012, 9:29:15 PM
to llv...@cs.uiuc.edu
On 28.08.2012 15:20, Philipp Klaus Krause wrote:
> On 27.08.2012 22:56, Roel Jordans wrote:
>
>>
>> Anyway, that brings to my final question: Which features are
>> critical/important/wanted/unwanted for a C back-end?
>>
>
> I'd like it to be easy to configure (e.g. to specify what size int is
> assumed to have).
>
> I'd prefer the resulting code to not rely on implementation-defined
> behaviour (e.g. not make any assumptions about the size of int).
>

Ok, Cameron showed me that this one isn't possible with LLVM.

> I'd like the resulting code to contain a lot of information (through the
> use of data types and keywords such as const and restrict) that the C
> compiler can use to generate optimized code.

Here's my use-case: I would use LLVM as a kind of language and
optimization frontend, and use the free sdcc compiler (which has
excellent machine-specific optimization, but is somewhat lacking in
machine-independent optimization) as a backend.