[LLVMdev] MCStreamer interface


Nathan Jeffords

May 4, 2010, 2:03:58 PM
to llv...@cs.uiuc.edu

This is a brain-dump of my thoughts on the MCStreamer interface after several
days of digging around trying to get a COFF writer working.

All fragments should be associated with a symbol. For assembler components, an
unnamed "virtual" symbol can be used when there is no explicit label defined.

Section assignment should be the responsibility of the object implementing the
MCStreamer interface, with the caller given the ability to give hints as to
which section to place the symbol into.

Instead of SwitchSection, there would be BeginSymbol and EndSymbol; it would
be illegal to call any EmitXXX function outside of these two calls.

BeginSymbol(Symbol, SectionHint)
  EmitAttribute(...)
  EmitAttribute(...)
  ...
StartFragmentEmission()
  EmitFragment(...)
  EmitFragment(...)
  ...
EndSymbol()

Object file writers would typically start recording fragments and attributes for
a symbol on the BeginSymbol, then at EndSymbol they would evaluate what was
streamed, and decide what section the symbol should be placed in.

Assembly writers could, with some state data, emit assembly as emission calls
are made. Assembler parsers could use 'section symbols' to provide
section-level attributes.
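
As a rough illustration of the protocol described above, here is a minimal C++ sketch of a recording object writer. Everything in it is hypothetical: RecordingStreamer, SymbolRecord, and the section-selection rule (all-zero data goes to .bss, anything else to .data, unless a hint is given) are assumptions made for the example and are not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of everything streamed for one symbol.
struct SymbolRecord {
  std::string Name;
  std::string SectionHint;             // empty means "no preference"
  std::vector<std::string> Attributes;
  std::vector<std::string> Fragments;  // raw bytes, one entry per fragment
  std::string Section;                 // decided in EndSymbol()
};

// Hypothetical streamer enforcing the Begin/Attribute/Fragment/End order.
class RecordingStreamer {
  enum State { Idle, InAttributes, InFragments };
  State St = Idle;
  SymbolRecord Cur;
  std::vector<SymbolRecord> Done;

public:
  void BeginSymbol(const std::string &Name, const std::string &Hint) {
    assert(St == Idle && "symbols may not nest");
    Cur = SymbolRecord();
    Cur.Name = Name;
    Cur.SectionHint = Hint;
    St = InAttributes;
  }
  void EmitAttribute(const std::string &Attr) {
    assert(St == InAttributes && "attributes come before fragments");
    Cur.Attributes.push_back(Attr);
  }
  void StartFragmentEmission() {
    assert(St == InAttributes);
    St = InFragments;
  }
  void EmitFragment(const std::string &Bytes) {
    assert(St == InFragments && "fragments only inside a symbol");
    Cur.Fragments.push_back(Bytes);
  }
  void EndSymbol() {
    assert(St == InFragments);
    if (!Cur.SectionHint.empty()) {
      Cur.Section = Cur.SectionHint;  // the caller's hint wins
    } else {
      // Assumed rule for this sketch: all-zero content goes to .bss.
      bool AllZero = true;
      for (const std::string &F : Cur.Fragments)
        for (char C : F)
          if (C != 0)
            AllZero = false;
      Cur.Section = AllZero ? ".bss" : ".data";
    }
    Done.push_back(Cur);
    St = Idle;
  }
  const std::vector<SymbolRecord> &symbols() const { return Done; }
};
```

The point of the sketch is only that the section decision is deferred until EndSymbol, when the writer has seen all of the symbol's content.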

Nathan Jeffords

May 4, 2010, 3:53:50 PM
to llv...@cs.uiuc.edu
I should probably elaborate on why I feel the interface should be as such.

It seems to me that the common case is a compiler outputting to object files. In this case, all fragments are associated with symbols. Which section the fragments go into is generally irrelevant to the compiler, except in special cases like global variable constructor/destructor lists and the like.

At code generation time, the compiler would, for all normal symbols, not specify any grouping information, and would allow the streamer to decide where to place data based on its content. For special scenarios, like constructor/destructor lists, a set of hints could be defined to produce the expected behavior.

For example:

#include <string>

struct SectionHint {};

// Used by assemblers to put symbols into the
// section specified by a section directive.
struct NamedSectionHint : SectionHint {
  // The name of the section to put the symbol in.
  std::string Name;
};

struct OrderedNamedHint : NamedSectionHint {
  // The order in which to place the symbol's section when finally linked.
  // Each different "ordered section" should produce a separate section
  // in the object file.
  // Positive means: place in ascending order before any other sections
  // with the same name (1 is first, 2 is second, and so on).
  // Negative means: place in descending order after any other sections
  // with the same name (-1 is last, -2 is second to last, and so on).
  int Order;
};

Any NamedSectionHints would show up in between the positive and negative ordered hints with the same name.
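
One way to read the ordering rule above is as a comparison function over the final layout. The following is only a sketch of that reading: PlacedSection, placedBefore, and layout are hypothetical names, and an Order of 0 is assumed to stand for a plain named hint with no ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flattened form of a hint: Order == 0 stands for a plain
// named hint, > 0 counts from the front, < 0 counts from the back.
struct PlacedSection {
  std::string Name;
  int Order;
  std::string Symbol;
};

// Positive orders come first, then unordered entries, then negative orders.
static int bucket(int Order) { return Order > 0 ? 0 : (Order == 0 ? 1 : 2); }

bool placedBefore(const PlacedSection &A, const PlacedSection &B) {
  if (A.Name != B.Name)
    return A.Name < B.Name;
  if (bucket(A.Order) != bucket(B.Order))
    return bucket(A.Order) < bucket(B.Order);
  // Within a bucket, ascending numeric order gives 1, 2, ... and ..., -2, -1.
  return A.Order < B.Order;
}

// A stable sort keeps unordered entries in the order they were emitted.
std::vector<PlacedSection> layout(std::vector<PlacedSection> In) {
  std::stable_sort(In.begin(), In.end(), placedBefore);
  return In;
}
```

With entries ordered 1, 2, unordered, -2, -1 for the same name, the unordered hints land in the middle, which matches the sentence above about named hints showing up between the positive and negative ones.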

Chris Lattner

May 5, 2010, 2:15:10 PM
to Nathan Jeffords, Daniel Dunbar, LLVM Developers Mailing List
On May 4, 2010, at 11:03 AM, Nathan Jeffords wrote:
> This is a brain-dump of my thoughts on the MCStreamer interface after several
> days of digging around trying to get a COFF writer working.

Great! Something that is worth pointing out is that the MCStreamer API is intended to directly reflect what is happening in .s files. We basically want one MCStreamer callback to correspond to one statement in the .s file. This makes it easier to handle from the compiler standpoint, but is also very important for the llvm-mc assembly parser itself.
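
To make the "one callback per .s statement" idea concrete, here is a small self-contained sketch. MiniStreamer and MiniAsmStreamer are simplified stand-ins invented for this example; the real MCStreamer has more callbacks and different signatures.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-in for a streamer interface (not the real MCStreamer).
class MiniStreamer {
public:
  virtual ~MiniStreamer() = default;
  virtual void SwitchSection(const std::string &Name) = 0; // .section <name>
  virtual void EmitGlobal(const std::string &Sym) = 0;     // .globl <sym>
  virtual void EmitLabel(const std::string &Sym) = 0;      // <sym>:
  virtual void EmitAscii(const std::string &Text) = 0;     // .ascii "<text>"
};

// Textual implementation: each callback prints exactly one statement.
class MiniAsmStreamer : public MiniStreamer {
  std::ostringstream OS;

public:
  void SwitchSection(const std::string &Name) override {
    OS << "\t.section\t" << Name << "\n";
  }
  void EmitGlobal(const std::string &Sym) override {
    OS << "\t.globl\t" << Sym << "\n";
  }
  void EmitLabel(const std::string &Sym) override { OS << Sym << ":\n"; }
  void EmitAscii(const std::string &Text) override {
    // Assumes Text needs no escaping (no quotes, backslashes, control bytes).
    OS << "\t.ascii\t\"" << Text << "\"\n";
  }
  std::string str() const { return OS.str(); }
};

// A producer (compiler or assembly parser) only sees the abstract interface.
void emitGreeting(MiniStreamer &S) {
  S.SwitchSection(".rdata");
  S.EmitGlobal("greeting");
  S.EmitLabel("greeting");
  S.EmitAscii("hello");
}
```

Because the producer is written against the abstract interface, the same emitGreeting call could drive a different implementation that builds object-file data instead of text.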

> All fragments should be associated with a symbol. For assembler components, an
> unnamed "virtual" symbol can be used when there is no explicit label defined.

What do you mean by fragment? Can you give me an analogy with what the syntax looks like in a .s file, I'm not sure exactly what you mean here.

> Section assignment should be the responsibility of the object implementing the
> MCStreamer interface, with the caller given the ability to give hints as to
> which section to place the symbol into.

Section assignment really needs to happen at a higher level. The TargetLoweringObjectFile interfaces are the ones responsible for mapping a global/function -> section. This interface (not mcstreamer) should handle this.

The important point here is that the COFF MCSection needs to have the right level of semantic information. In fact, MCSection is the place that I'd start for COFF bringup.

> Instead of SwitchSection, there would be BeginSymbol and EndSymbol; it would
> be illegal to call any EmitXXX function outside of these two calls.
>
> BeginSymbol(Symbol, SectionHint)
> EmitAttribute(...)
> EmitAttribute(...)
> ...
> StartFragmentEmission()
> EmitFragment(...)
> EmitFragment(...)
> ...
> EndSymbol()
>
> Object file writers would typically start recording fragments and attributes for
> a symbol on the BeginSymbol, then at EndSymbol they would evaluate what was
> streamed, and decide what section the symbol should be placed in.

Why do you need this? This concept doesn't exist in the .s file, so I don't think that MCStreamer is the right level for this.

-Chris


_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nathan Jeffords

May 5, 2010, 4:22:19 PM
to Chris Lattner, llv...@cs.uiuc.edu
On Wed, May 5, 2010 at 11:15 AM, Chris Lattner <clat...@apple.com> wrote:
On May 4, 2010, at 11:03 AM, Nathan Jeffords wrote:
...  We basically want one MCStreamer callback to correspond to one statement in the .s file.  This makes it easier to handle from the compiler standpoint, but is also very important for the llvm-mc assembly parser itself.

This is an assumption I question. From an evolutionary perspective I agree; given the existing code base, I do see this as a logical transformation. As far as the assembly parser/streamer is concerned, it certainly simplifies their implementations. But I also think that this interface could evolve in a direction that simplifies the common case (compiler -> object file) at a small expense to handling assembly language files.

> All fragments should be associated with a symbol. For assembler components, an
> unnamed "virtual" symbol can be used when there is no explicit label defined.

What do you mean by fragment?  Can you give me an analogy with what the syntax looks like in a .s file, I'm not sure exactly what you mean here.

I use the term fragment to refer to the MCFragment class and its derivatives. I understand that to mean any entity representing data in the final linked and loaded form. (something with an address)

> Section assignment should be the responsibility of the object implementing the
> MCStreamer interface, with the caller given the ability to give hints as to
> which section to place the symbol into.

Section assignment really needs to happen at a higher level.  The TargetLoweringObjectFile interfaces are the ones responsible for mapping a global/function -> section.  This interface (not mcstreamer) should handle this.

The important point here is that the COFF MCSection needs to have the right level of semantic information.  In fact, MCSection is the place that I'd start for COFF bringup.

OK, I see that now. The current isolation between TargetLoweringObjectFile -> MCStreamer -> MCObjectWriter has proven somewhat problematic, mostly due to my lack of understanding. I guess MCSectionXXX was meant to provide communication between them. Should the same be true of MCSymbol, and their data counterparts?
 
> Instead of SwitchSection, there would be BeginSymbol and EndSymbol; it would
> be illegal to call any EmitXXX function outside of these two calls.
>
> BeginSymbol(Symbol, SectionHint)
>   EmitAttribute(...)
>   EmitAttribute(...)
>   ...
> StartFragmentEmission()
>   EmitFragment(...)
>   EmitFragment(...)
>   ...
> EndSymbol()
>
> Object file writers would typically start recording fragments and attributes for
> a symbol on the BeginSymbol, then at EndSymbol they would evaluate what was
> streamed, and decide what section the symbol should be placed in.

Why do you need this?  This concept doesn't exist in the .s file, so I don't think that MCStreamer is the right level for this.
 
I realize that my expectations of the MCStreamer interface are not quite the same as the intentions behind its design. It seemed to me that having these calls, with restrictions on when they are allowed, would simplify an object file writer's job of sifting through the incoming data and organizing it into what will become the output file.

I had a problem with MCStreamer::EmitCommonSymbol & MCStreamer::EmitLocalCommonSymbol. When I implemented them, I assumed this meant to put those symbols into the .bss segment. This required me to get hold of the TLOF from the streamer. I now realize this is wrong after re-reading the description of the '.comm' directive a few times. I am not sure why an uninitialized global variable was being emitted using this; that seems wrong, since global variables in different compilation units with the same name would get merged together at link time. (This is using clang on a C source file.)

Thanks for taking the time to read and respond to my post. I think I need to get better acquainted with these components.

- Nathan

Eli Friedman

May 5, 2010, 5:11:53 PM
to Nathan Jeffords, llv...@cs.uiuc.edu
On Wed, May 5, 2010 at 1:22 PM, Nathan Jeffords <blunte...@gmail.com> wrote:
> I had a problem with MCStreamer::EmitCommonSymbol
> & MCStreamer::EmitLocalCommonSymbol. When I implemented them I assumed this
> meant to put those symbols into the .bss segment. This required me to get a
> hold of the TLOF from the streamer. I now realize this is wrong after
> re-reading the description of the '.comm' directive a few times.  I am not
> sure why an uninitialized global variable was being emitted using this, that
> seems wrong since global variables in different compilation units with the
> same name would get merged together at link time. (this is using clang on a
> C source file)

Global definitions like "int x;" are treated as common to allow
linking buggy programs that forget to use "extern" on declarations.

-Eli

Nathan Jeffords

May 5, 2010, 5:32:58 PM
to Eli Friedman, llv...@cs.uiuc.edu

Global definitions like "int x;" are treated as common to allow
linking buggy programs that forget to use "extern" on declarations.

Is this always the behavior, or only when certain options are set? This seems like a violation of the language standard.

Dale Johannesen

May 5, 2010, 5:45:35 PM
to Nathan Jeffords, llv...@cs.uiuc.edu
Technically yes; the original K&R C book had the one-definition rule in it.  Early C compilers did not work this way, however, and by the time that book was published (1978) there was already a large body of code that assumed the "common" model (also Ritchie's preference, I believe).  In practice most compilers still default to this model because a lot of widely used stuff will break if they don't, and the behavior is given in J.5.11 of C99 as a common extension.

Use -fno-common to turn it off.

Nathan Jeffords

May 5, 2010, 5:48:44 PM
to Dale Johannesen, llv...@cs.uiuc.edu

Somewhere I had got it in my head that global variables had static storage class by default. I guess I was wrong.

On May 5, 2010 2:45 PM, "Dale Johannesen" <da...@apple.com> wrote:


On May 5, 2010, at 2:32 PM PDT, Nathan Jeffords wrote:

>>

>> Global definitions like "int x;" are t...

Eugene Toder

May 5, 2010, 5:52:52 PM
to Nathan Jeffords, llv...@cs.uiuc.edu
Re having all fragments associated with some symbol -- this makes
sense if you think in high level terms and assume all symbols to be
some "objects". All data (fragments) you want to output is associated
with some "object" (symbol). However, that's probably too high level
thinking for MC interface. High level objects might not directly
correspond to object-file level symbols. For example, module level
inline assembler does not correspond to any symbol, or function may
have more than one symbol when aliases are used.

Common is not .bss, it's an archaic concept inherited from Fortran. C
language specifies that global uninitialized variables are put into
common. This isn't for "programs that forget to use extern" -- you
can't get the same behaviour with extern: common variables are glued
together with each other and with "normal" variables, so no object
exclusively owns the variable. There's also some subtle difference when
linking archives.
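
The "glued together" behaviour can be sketched as a toy symbol resolver. This is an illustration of the traditional rule as commonly described (common symbols of the same name merge to the largest size, and a real definition takes over from a common one); MiniLinker and its types are hypothetical and no particular linker's implementation is implied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

enum class Kind { Undefined, Common, Defined };

struct Resolved {
  Kind K = Kind::Undefined;
  std::size_t Size = 0;
};

// Toy resolver: feed it one (name, kind, size) triple per object-file symbol.
class MiniLinker {
  std::map<std::string, Resolved> Table;

public:
  // Returns false only for a second real definition of the same name.
  bool add(const std::string &Name, Kind K, std::size_t Size) {
    Resolved &R = Table[Name];
    if (K == Kind::Defined) {
      if (R.K == Kind::Defined)
        return false;            // duplicate definition is an error
      R.K = Kind::Defined;       // a real definition replaces any common
      R.Size = Size;
      return true;
    }
    if (K == Kind::Common && R.K != Kind::Defined) {
      R.K = Kind::Common;        // commons merge; the largest size wins
      R.Size = std::max(R.Size, Size);
    }
    return true;                 // undefined references change nothing
  }
  const Resolved &lookup(const std::string &Name) const {
    return Table.at(Name);
  }
};
```

In this model no single object file "owns" a common symbol; the storage only comes into existence once all contributions have been merged.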

Eugene

Nathan Jeffords

May 5, 2010, 7:38:35 PM
to Eugene Toder, llv...@cs.uiuc.edu
On Wed, May 5, 2010 at 2:52 PM, Eugene Toder <elt...@gmail.com> wrote:
Re having all fragments associated with some symbol -- this makes
sense if you think in high level terms and assume all symbols to be
some "objects". All data (fragments) you want to output is associated
with some "object" (symbol). However, that's probably too high level
thinking for MC interface. High level objects might not directly
correspond to object-file level symbols.

I agree with you on this; to me it's a question of whose responsibility it is to determine the mapping: the compiler's or the object file format's. Another example would be labels: they are not high-level objects in and of themselves, but must be represented as symbols in current object file formats.

For example, module level
inline assembler does not correspond to any symbol, or function may
have more than one symbol when aliases are used.

Module-level assembly could easily be dealt with as an unnamed symbol for which the object file writer is free not to create an object-level symbol (though in the only context where I can see a use for this, I think it would still be useful to identify such code).

When you say aliases, I assume you mean embedding object-level symbol records pointing to the same high-level symbol. A high-level symbol could easily report multiple names through its interface.


Common is not .bss, it's an archaic concept inherited from Fortran. C
language specifies that global uninitialized variables are put into
common. This isn't for "programs that forget to use extern" -- you
can't get the same behaviour with extern: common variables are glued
together with each other and with "normal" variables, so no object
exclusively owns the variable. There's also some subtle difference when
linking archives.

Thanks for the details of this, I will have to update my COFF writer to properly put these symbols into a COMDAT section, as I currently put them into .bss section with static linkage.

- Nathan

Chris Lattner

May 5, 2010, 7:37:09 PM
to Nathan Jeffords, llv...@cs.uiuc.edu
On May 5, 2010, at 1:22 PM, Nathan Jeffords wrote:
On Wed, May 5, 2010 at 11:15 AM, Chris Lattner <clat...@apple.com> wrote:
On May 4, 2010, at 11:03 AM, Nathan Jeffords wrote:
...  We basically want one MCStreamer callback to correspond to one statement in the .s file.  This makes it easier to handle from the compiler standpoint, but is also very important for the llvm-mc assembly parser itself.

This is an assumption I question. From an evolutionary perspective I agree; Given the existing code base I do see this as a logical transformation. As far as the assembly parser/streamer is concerned it certainly simplifies their implementations. But I also think that this interface could evolve in a direction that simplifies the common case (compiler -> object file) at a small expense to handling assembly language files.

The logic to handle this has to go somewhere, putting it in the MCStreamer *implementation* that needs it is the most logical place.  We also aim to implement an assembler, it doesn't make sense to duplicate this logic in the compiler and the assembler parser.

> All fragments should be associated with a symbol. For assembler components, an
> unnamed "virtual" symbol can be used when there is no explicit label defined.

What do you mean by fragment?  Can you give me an analogy with what the syntax looks like in a .s file, I'm not sure exactly what you mean here.

I use the term fragment to refer to the MCFragment class and its derivatives. I understand that to mean any entity representing data in the final linked and loaded form. (something with an address)

Ok, MCFragment should definitely be formed behind the MCStreamer implementation.  The .s printing implementation of MCStreamer, for example, has no use for it.  With the current design, it would be a layering violation to make it earlier.


> Section assignment should be the responsibility of the object implementing the
> MCStreamer interface, with the caller given the ability to give hints as to
> which section to place the symbol into.

Section assignment really needs to happen at a higher level.  The TargetLoweringObjectFile interfaces are the ones responsible for mapping a global/function -> section.  This interface (not mcstreamer) should handle this.

The important point here is that the COFF MCSection needs to have the right level of semantic information.  In fact, MCSection is the place that I'd start for COFF bringup.

OK, I see that now. The current isolation between TargetLoweringObjectFile -> MCStreamer -> MCObjectWriter has proven somewhat problematic, mostly due to my lack of understanding. I guess MCSectionXXX was meant to provide communication between them. Should the same be true of MCSymbol, and their data counterparts?

Yes somewhat.  Currently, the COFF implementation of the assembler backend should maintain a DenseMap from MCSymbol* to whatever data you need to associate with a symbol.  This is equivalent to embedding per-symbol stuff in the MCSymbol itself.  MCSection should be subclassed and you should put COFF specific stuff in MCSectionCOFF.
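
The side-table approach described above might look roughly like the sketch below. To keep the example self-contained it uses std::unordered_map where the message suggests llvm::DenseMap, and Symbol, COFFSymbolInfo, and COFFSymbolTable are hypothetical stand-ins rather than LLVM classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for MCSymbol; the real class lives in LLVM.
struct Symbol {
  std::string Name;
};

// Hypothetical COFF-specific data that the generic symbol knows nothing about.
struct COFFSymbolInfo {
  std::uint8_t StorageClass = 0;
  std::int16_t SectionNumber = 0;
};

// Side table owned by the COFF backend, keyed by symbol identity (pointer).
class COFFSymbolTable {
  std::unordered_map<const Symbol *, COFFSymbolInfo> Info;

public:
  // Creates a default-initialized entry the first time a symbol is seen.
  COFFSymbolInfo &getOrCreate(const Symbol *S) {
    assert(S && "null symbol");
    return Info[S];
  }
  // Returns nullptr if nothing has been recorded for the symbol.
  const COFFSymbolInfo *find(const Symbol *S) const {
    auto It = Info.find(S);
    return It == Info.end() ? nullptr : &It->second;
  }
};
```

Keying on the pointer works because each symbol object is unique for the lifetime of the table, so the format-specific data stays out of the shared symbol class.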

I had a problem with MCStreamer::EmitCommonSymbol & MCStreamer::EmitLocalCommonSymbol. When I implemented them I assumed this meant to put those symbols into the .bss segment. This required me to get a hold of the TLOF from the streamer. I now realize this is wrong after re-reading the description of the '.comm' directive a few times.  I am not sure why an uninitialized global variable was being emitted using this, that seems wrong since global variables in different compilation units with the same name would get merged together at link time. (this is using clang on a C source file)

As others have pointed out, this is one of the many horrors of C :)

-Chris

Nathan Jeffords

May 5, 2010, 8:22:38 PM
to Chris Lattner, llv...@cs.uiuc.edu

The logic to handle this has to go somewhere, putting it in the MCStreamer *implementation* that needs it is the most logical place.  We also aim to implement an assembler, it doesn't make sense to duplicate this logic in the compiler and the assembler parser.


Assembly language has often been *the* intermediate form between compilers and object files/executables, but I don't think it's the most effective form. That said, I have limited experience writing code generators, so my opinions do not carry the wisdom that you and the other developers of this library bring to this topic.

> All fragments should be associated with a symbol. For assembler components, an
> unnamed "virtual" symbol can be used when there is no explicit label defined.

What do you mean by fragment?  Can you give me an analogy with what the syntax looks like in a .s file, I'm not sure exactly what you mean here.

I use the term fragment to refer to the MCFragment class and its derivatives. I understand that to mean any entity representing data in the final linked and loaded form. (something with an address)

Ok, MCFragment should definitely be formed behind the MCStreamer implementation.  The .s printing implementation of MCStreamer, for example, has no use for it.  With the current design, it would be a layering violation to make it earlier.


I agree with this completely. I quite like that aspect of the design: the streamer puts fragments into sections and allows the assembler to combine it all, resolving fix-ups when it can and letting the writer deal with those it can't.
 
Yes somewhat.  Currently, the COFF implementation of the assembler backend should maintain a DenseMap from MCSymbol* to whatever data you need to associate with a symbol.  This is equivalent to embedding per-symbol stuff in the MCSymbol itself.  MCSection should be subclassed and you should put COFF specific stuff in MCSectionCOFF.

I think this is an important detail I was missing. I can already see how this will help with COMDAT sections. Is there any reason for the difference between symbol and section in this respect?

As others have pointed out, this is one of the many horrors of C :)


Another reason why I am attempting to develop my own language. :)

P.S. I posted my COFF backend patch to llvm-commits, but that appears to be the wrong place. Where should I have posted it?

- Nathan

Chris Lattner

May 5, 2010, 8:53:11 PM
to Nathan Jeffords, Daniel Dunbar, LLVM Dev
On May 5, 2010, at 5:22 PM, Nathan Jeffords wrote:


The logic to handle this has to go somewhere, putting it in the MCStreamer *implementation* that needs it is the most logical place.  We also aim to implement an assembler, it doesn't make sense to duplicate this logic in the compiler and the assembler parser.


Assembly language has often been *the* intermediate form between compilers and object files/executables, but I don't think it's the most effective form. That said, I have limited experience writing code generators, so my opinions do not carry the wisdom that you and the other developers of this library bring to this topic.

I completely agree, but it is a very important and effective form of communication :)

One nice fallout of the MCStreamer design is that once the COFF writer is available, we'll have a stand-alone coff assembler mostly "for free".  In fact, developing this as a coff assembler (which can be accessed with 'llvm-mc foo.s -o foo.obj -filetype=obj') is easier in a lot of ways than dealing with the compiler!


Yes somewhat.  Currently, the COFF implementation of the assembler backend should maintain a DenseMap from MCSymbol* to whatever data you need to associate with a symbol.  This is equivalent to embedding per-symbol stuff in the MCSymbol itself.  MCSection should be subclassed and you should put COFF specific stuff in MCSectionCOFF.

I think this is an important detail I was missing. I can already see how this will help with COMDAT sections. Is there any reason for the difference between symbol and section in this respect?

You'd have to ask Daniel about this.  I don't recall if this is a short term thing that he'd like to fix or if this is an important design decision.


As others have pointed out, this is one of the many horrors of C :)


Another reason why I am attempting to develop my own language. :)

P.S. I posted my COFF backend patch to llvm-commits, but that appears to be the wrong place. Where should I have posted it?

llvm-commits is a great place for it!

-Chris

Peter S. Housel

May 7, 2010, 2:12:13 AM
to Nathan Jeffords, llv...@cs.uiuc.edu
On Wed, 2010-05-05 at 13:22 -0700, Nathan Jeffords wrote:

>
> The important point here is that the COFF MCSection needs to
> have the right level of semantic information. In fact,
> MCSection is the place that I'd start for COFF bringup.
>
> OK, I see that now. The current isolation
> between TargetLoweringObjectFile -> MCStreamer -> MCObjectWriter has
> proven somewhat problematic, mostly due to my lack of understanding.
> I guess MCSectionXXX was meant to provide communication between them.
> Should the same be true of MCSymbol, and their data counterparts?

I'm enclosing my patch for reforming MCSectionCOFF to match the
implementation strategy of the other two MCSection classes. You may find
it useful as a starting point. It seems to be complete and correct, and
worked for what I tried with it, but I didn't find time to test it fully
(e.g., by bootstrapping clang under Cygwin).

Cheers,
-Peter-

mcsectioncoff.diff

Nathan Jeffords

May 7, 2010, 2:22:15 AM
to Peter S. Housel, llv...@cs.uiuc.edu
Thanks! Funny, I was just preparing a patch to submit for my changes to MCSectionCOFF. My changes look to be fairly independent of yours; mine deal with COMDATs. I had dealt with the characteristics flags in the object writer, but I like this. If you don't mind, I would like to merge my changes into this patch and submit it. I was just pondering how to deal with the PrintSwitchToSection function without needing the IsDirective flag.

Chris Lattner

May 7, 2010, 3:05:03 AM
to Nathan Jeffords, llv...@cs.uiuc.edu
On May 6, 2010, at 11:22 PM, Nathan Jeffords wrote:

Thanks! Funny, I was just preparing a patch to submit for my changes to MCSectionCOFF. My changes look to be fairly independent of yours; mine deal with COMDATs. I had dealt with the characteristics flags in the object writer, but I like this. If you don't mind, I would like to merge my changes into this patch and submit it. I was just pondering how to deal with the PrintSwitchToSection function without needing the IsDirective flag.

I prefer to merge in small independent patches as they are built.  Please review Peter's patch (since you know COFF :).  I'll take a look tomorrow and apply it if you think it is forward progress, and if there aren't other issues.

Thanks!

-Chris




Nathan Jeffords

May 7, 2010, 3:15:30 AM
to Chris Lattner, llv...@cs.uiuc.edu
I have looked over this patch, and I do think it's forward progress. I was planning additional changes, but I can wait to submit them until this is committed.

Aaron Gray

May 7, 2010, 6:10:55 AM
to Nathan Jeffords, Peter S. Housel, llv...@cs.uiuc.edu
Looks fine except it's 'Emit' not 'Omit'.

Aaron

Peter S. Housel

May 7, 2010, 10:24:01 AM
to Aaron Gray, llv...@cs.uiuc.edu
On Fri, 2010-05-07 at 11:10 +0100, Aaron Gray wrote:

> >
> Looks fine except its 'Emit' not 'Omit'.

'Omit' is what is intended here; it refers to omitting the '.section'
keyword when switching to '.text'/'.data'/'.bss'. Similar code appears
in MCSectionELF.cpp.
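
A sketch of the idea, for readers who have not seen the patch: the function names below are hypothetical and the output format is simplified (the real code also prints section flags), but it shows what "omitting" the directive means.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: these well-known sections have their own directives,
// so the '.section' keyword can be omitted when switching to them.
bool shouldOmitSectionDirective(const std::string &Name) {
  return Name == ".text" || Name == ".data" || Name == ".bss";
}

// Returns the assembly text used to switch to the named section.
std::string printSwitchToSection(const std::string &Name) {
  assert(!Name.empty() && "section must have a name");
  if (shouldOmitSectionDirective(Name))
    return "\t" + Name + "\n";            // e.g. "\t.text"
  return "\t.section\t" + Name + "\n";    // e.g. "\t.section\t.drectve"
}
```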

Daniel Dunbar

May 7, 2010, 11:52:28 AM
to Chris Lattner, Daniel Dunbar, LLVM Dev
The reason for MCSectionCOFF etc., is that they are shared between the
MC and CodeGen interfaces. They have semantics that apply to both .s
files and object files, and even the frontend has some interest in
them.

OTOH, things like MCSymbolData, MCSectionData, are private to the
assembler backend, and so only the assembler and object writer need to
know about them (they are unused when writing to a .s file, for
example).

MCAssembler already maintains its own association of these data
structures, and there are a few bits available for the object file
backends inside MCSymbolData. I would be fine adding a few more for
use by specific object writers, if it simplifies your implementation.

I'm sorry I haven't had time to be very present on this thread, but
please feel free to mail me / ping me if there is something about
the assembler backend you have questions on. I'm very excited to see
COFF support coming up!

- Daniel

>> As others have pointed out, this is one of the many horrors of C :)
>
> Another reason why I am attempting to develop my own language. :)
> P.S. I posted my COFF backend patch to llvm-commits, but that appears to be
> the wrong place. Where should I have posted it?
>
> llvm-commits is a great place for it!
> -Chris
>

Nathan Jeffords

May 7, 2010, 1:19:06 PM
to Daniel Dunbar, Daniel Dunbar, LLVM Dev

The reason for MCSectionCOFF etc., is that they are shared between the
MC and CodeGen interfaces. They have semantics that apply to both .s
files and object files, and even the frontend has some interest in
them.

OTOH, things like MCSymbolData, MCSectionData, are private to the
assembler backend, and so only the assembler and object writer need to
know about them (they are unused when writing to a .s file, for
example).

MCAssembler already maintains its own association of these data
structures, and there are a few bits available for the object file
backends inside MCSymbolData. I would be fine adding a few more for
use by specific object writers, if it simplifies your implementation.


Would it make sense to allow the object file streamer/writer code to provide custom derivations of MCSymbolData/MCSectionData, so that it is free to define and interpret that data without other object file formats being aware of it? Something like:

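struct MCAssemblyDataFactory {
  // Creates the backend-specific data attached to a section.
  virtual MCSectionData *createSectionData(MCSection *Section) = 0;
  // Creates the backend-specific data attached to a symbol.
  virtual MCSymbolData *createSymbolData(MCSymbol *Symbol) = 0;
};

which would then be given to the assembler when it is created.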

I had started down a different road that, if acceptable, might provide a more general solution. I have added some APIs to MCContext that allow me to associate arbitrary data with sections and symbols.

template <typename value_type> void setSectionData(MCSection const *Section, value_type const &Value);
template <typename value_type> bool getSectionData(MCSection const *Section, value_type &Value) const;

template <typename value_type> void setSymbolData(MCSymbol const *Symbol, value_type const &Value);
template <typename value_type> bool getSymbolData(MCSymbol const *Symbol, value_type &Value) const;
 
I can implement these in a completely type-safe manner, without virtual functions or virtual function tables and without dynamic runtime type information. With some extra work they can use a single allocation per data item; currently there are two. I do rely on typeid to return a type_info object for value_type, but I do this at compile time, not at run time.
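
For illustration only, here is one way such typed accessors could be built. This is not the implementation described above: instead of typeid it uses the address of a per-type static variable as the type key, and SideData, typeKey, and Symbol are hypothetical names. Only the symbol variants are shown; section variants would follow the same pattern.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for MCSymbol.
struct Symbol {
  std::string Name;
};

// Each instantiation has its own static variable, so its address serves as
// a unique key per value_type without using typeid or RTTI.
template <typename value_type> const void *typeKey() {
  static char Tag;
  return &Tag;
}

class SideData {
  // symbol -> (type key -> type-erased value)
  std::map<const Symbol *, std::map<const void *, std::shared_ptr<void>>> Store;

public:
  template <typename value_type>
  void setSymbolData(const Symbol *S, const value_type &Value) {
    assert(S && "null symbol");
    Store[S][typeKey<value_type>()] = std::make_shared<value_type>(Value);
  }

  // Returns false if no value of this exact type was stored for the symbol.
  template <typename value_type>
  bool getSymbolData(const Symbol *S, value_type &Value) const {
    auto SymIt = Store.find(S);
    if (SymIt == Store.end())
      return false;
    auto It = SymIt->second.find(typeKey<value_type>());
    if (It == SymIt->second.end())
      return false;
    // Safe: the key guarantees the stored object is a value_type.
    Value = *static_cast<const value_type *>(It->second.get());
    return true;
  }
};
```

The cast is only ever applied to a value stored under the same type key, which is what keeps the interface type-safe from the caller's point of view.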

I'm sorry I haven't had time to be very present on this thread, but
please feel free to mail me / ping me if there is something about
the assembler backend you have questions on. I'm very excited to see
COFF support coming up!
 
Hopefully, I will have something people can look at pretty soon here.

Chris Lattner

May 7, 2010, 1:19:11 PM
to Peter S. Housel, llv...@cs.uiuc.edu
Looks really great to me, applied in r103267, thanks!

One thing:

+++ include/llvm/CodeGen/TargetLoweringObjectFileImpl.h (working copy)
@@ -161,13 +161,15 @@
...
+ virtual const MCSection *getDrectveSection() const { return DrectveSection; }

This shouldn't need to be virtual?