[llvm-dev] Emiting linkage names for Types to Debuginfo (C++ RTTI support in GDB/LLDB)

Roman Popov via llvm-dev

Mar 2, 2018, 6:59:25 PM

to llvm...@lists.llvm.org, Clang Dev

Hi all,

As you may know, modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects by using C++ RTTI.

Unfortunately, this feature does not work with Clang and GDB >= 7.x. The last compiler that worked well was G++ 6.x.

I've asked about this issue on both the GDB and LLDB mailing lists. Unfortunately, it is hard or impossible to fix on the debugger side.

The problem is that compilers do not emit the linkage name of a type into the debug information, so debuggers can't reliably link RTTI with the DW_TAG_*_type entries.

Consider example:

////////////////////////////////////////////////////////////////
enum class EN{ONE,TWO};

template<auto x>
struct foo { virtual ~foo() {} };

foo<11u> fu;
foo<11> fi;
foo<EN::ONE> fe;
////////////////////////////////////////////////////////////////

Clang will put the following names into RTTI:

3fooILi11EE -> foo<11>
3fooILj11EE -> foo<11u>
3fooIL2EN0EE -> foo<(EN)0>

In the debug info it will emit three DW_TAG_structure_type entries with the following DW_AT_name values:

DW_AT_name: foo<11>
DW_AT_name: foo<11>
DW_AT_name: foo<EN::ONE>

Currently, the debugger has to demangle the RTTI name and try to match it against the DW_AT_name attribute to find the type. As you can see, this does not work reliably for any of the three examples: foo<11> is ambiguous because two entries carry that name, foo<11u> has no entry with that name, and foo<(EN)0> does not match foo<EN::ONE>.
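The mismatch can be reproduced outside a debugger. The following is a minimal sketch, assuming a GCC- or Clang-based toolchain that provides abi::__cxa_demangle in <cxxabi.h>. The helper names demangle and rttiMatchesDwarfName are made up for illustration and are not GDB or LLDB code. It demangles an RTTI name and compares the result with a DW_AT_name string, which is the comparison described above.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Demangle an Itanium-ABI name; returns an empty string on failure.
// RTTI name strings are bare type encodings (no "_Z" prefix), which
// __cxa_demangle also accepts.
std::string demangle(const char *mangled) {
    int status = 0;
    char *raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && raw) ? raw : "";
    std::free(raw);  // the buffer is malloc'ed by __cxa_demangle
    return result;
}

// Hypothetical helper: the plain string comparison a debugger is reduced to
// when the type entry carries only a pretty-printed DW_AT_name.
bool rttiMatchesDwarfName(const char *rttiName, const std::string &dwAtName) {
    return demangle(rttiName) == dwAtName;
}
```

With the names from the example, rttiMatchesDwarfName("3fooILj11EE", "foo<11>") should return false, since the demangled RTTI name keeps the unsigned suffix that the DW_AT_name lacks.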

I've asked about the problem on the G++ mailing list, and one of the proposed solutions is to emit DW_AT_linkage_name for types.

Can this solution also be implemented in LLVM?

I've checked the LLVM docs and found that LLVM generates DWARF from LLVM metadata. The LLVM metadata for types already contains the linkage name in the "identifier" field: https://llvm.org/docs/LangRef.html#dicompositetype

So LLVM itself can identify types by name; the only remaining issue is to emit that name into the debug info. That should be two lines of code in DwarfUnit::constructTypeDIE, something like:

StringRef LinkageName = CTy->getIdentifier();
addString(Buffer, dwarf::DW_AT_linkage_name, LinkageName);

Thanks,

Roman

Robinson, Paul via llvm-dev

Mar 2, 2018, 8:43:15 PM

to rip...@gmail.com, llvm...@lists.llvm.org

> Currently what debugger has to do is to demangle RTTI name and try to
> match it to DW_AT_name attribute to find type. As you can see it does
> not work for any of 3 examples.
>
> I've asked about the problem on G++ maillist, and one of the proposed
> solutions is to emit DW_AT_linkage_name for types.
>
> Can this solution be also implemented in LLVM?

It could, but mangled names can be very long and we need to consider
whether the additional size cost is worth it under various conditions.
For example, does this type matching work when a program is compiled
with `-fno-rtti`? (Clang itself is compiled this way by default.)
Thanks,
--paulr

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Roman Popov via llvm-dev

Mar 2, 2018, 9:12:27 PM

to Robinson, Paul, llvm...@lists.llvm.org

Mangled names can indeed be long, but pretty-printed type names are also long. I can evaluate the effect on size using the Clang codebase itself.

If you disable RTTI, then obviously you can't use it. So if RTTI is disabled, we can disable mangled names in DWARF. Clang is compiled without standard C++ RTTI because it has its own RTTI. In general, however, many libraries use standard RTTI.

Roman Popov via llvm-dev

Mar 2, 2018, 10:55:16 PM

to Robinson, Paul, llvm...@lists.llvm.org

Here is the result of the experiment:

Binary                        Original size   With DW_AT_linkage_name for composites   % increase
clang-7.0                     1926574256      1952846192                               1.4%
clang-tidy                    1220980360      1238498112                               1.4%
llvm-mt                       7404728         7525328                                  1.6%
std::cout << "hello world!"   21552           22080                                    2.4%
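For anyone who wants to repeat the measurement on another binary, the percentage column is just the relative growth of the file size. A minimal sketch follows; the function name percentIncrease is made up for illustration.

```cpp
#include <cstdint>

// Relative size increase in percent: (after - before) / before * 100.
// Sizes are converted to double before subtracting so that large values
// do not wrap around in unsigned arithmetic.
double percentIncrease(std::uint64_t before, std::uint64_t after) {
    return (static_cast<double>(after) - static_cast<double>(before)) /
           static_cast<double>(before) * 100.0;
}
```

For example, percentIncrease(1926574256, 1952846192) gives the clang-7.0 figure before rounding to one decimal place.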

IMO, that is not a big price for reliable dynamic type identification. (Full disclosure: I depend on this feature, since I'm writing Python pretty-printers for GDB.)

-Roman

David Blaikie via llvm-dev

Mar 3, 2018, 11:16:25 PM

to Roman Popov, llvm...@lists.llvm.org

Is that putting linkage name on every class that has an identifier in the IR?

I'm guessing that's not necessary/excessive - only RTTI types would need it? Perhaps, without changing Clang, LLVM could be changed to attach the linkage name only to a type for which it is emitting the virtual parts of the class?

Also: the original example with "template<auto T>" (I don't know what that feature is called, sorry) does present some problems that probably need to be fixed regardless. Two different instantiations having the same name is clearly a problem, for example, and I might be convinced that the enum case should change so it can produce a consistent type even in the face of a forward-declared enum (with an explicit underlying type specified). But even with those things fixed, I agree it's still not a great idea to have debuggers need to match mangled names to pretty-printed names.

Daniel Berlin via llvm-dev

Mar 3, 2018, 11:20:48 PM

to Roman Popov, llvm-dev, Clang Dev

On Fri, Mar 2, 2018 at 3:58 PM, Roman Popov via llvm-dev <llvm...@lists.llvm.org> wrote:

Hi all,

As you may know modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects, by utilizing C++ RTTI.
Unfortunately this feature does not work with Clang and GDB >= 7.x . The last compiler that worked well was G++ 6.x

I've asked about this issue both on GDB and LLDB maillists. Unfortunately it's hard or impossible to fix it on debugger side.

Errr, i posited a solution on the gdb mailing list that i haven't seen shot down so far, that doesn't require linkage names, it only requires one new attribute that is a DW_FORM_ref, and very cheap.

I also wrote the RTTI code for GDB :)

Currently what debugger has to do is to demangle RTTI name and try to match it to DW_AT_name attribute to find type. As you can see it does not work for any of 3 examples.

I've asked about the problem on G++ maillist, and one of the proposed solutions is to emit DW_AT_linkage_name for types.

Can this solution be also implemented in LLVM?

Please, no.

This is completely unneeded and wastes a huge amount of space.

As you can see from the replies to my solution on the gdb mailing list, it is used by other languages (rust, for example) *anyway*, so we might as well use it for C++ too.

Daniel Berlin via llvm-dev

Mar 3, 2018, 11:30:49 PM

to Roman Popov, llvm-dev, Clang Dev

To explain to others who didn't follow that thread:

GDB currently does something amazingly stupid (and has since i wrote it) to find the RTTI type. There were no other good options at the time.

What it does is find the vtable for the object, find the symbol that represents the vtable, demangle it, chop off "vtable for", and try to find the symbol for the string that results.

If you don't emit the linkage name, there are cases where it won't find it, because this is a really dumb way of trying to find the answer :)

It also won't find it depending on what demangler you use, etc.
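To make the string handling concrete, here is a rough sketch of the steps just described. It is not GDB's actual code: the function name typeNameFromVtableSymbol is hypothetical, and it assumes a GCC- or Clang-based toolchain whose <cxxabi.h> provides abi::__cxa_demangle.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper mirroring the described steps: demangle the vtable
// symbol, strip the "vtable for " prefix, and return what is left as the
// type name to look up. Returns "" if any step fails.
std::string typeNameFromVtableSymbol(const char *vtableSymbol) {
    int status = 0;
    char *raw = abi::__cxa_demangle(vtableSymbol, nullptr, nullptr, &status);
    std::string demangled = (status == 0 && raw) ? raw : "";
    std::free(raw);

    const std::string prefix = "vtable for ";
    if (demangled.compare(0, prefix.size(), prefix) != 0)
        return "";  // not a vtable symbol, or demangling failed
    return demangled.substr(prefix.size());
}
```

The returned string then has to match a type name in the debug info exactly, which is where differences between demanglers and between the demangled name and DW_AT_name show up.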

Here's a more direct way:

For each vtable DIE, link to the concrete type it represents.

Now you just go from vtable object to concrete type with no string lookup, which is faster, doesn't require linkage names, doesn't depend on demanglers matching, etc.
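As an illustration of what the debugger side of that could look like, here is a sketch of an address-keyed lookup. Everything in it is an assumption made for the example: VtableDie and VtableIndex are hypothetical types, and a plain string stands in for the DIE reference to the concrete type.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a vtable DIE: where the vtable lives and a
// reference to the concrete type it belongs to.
struct VtableDie {
    std::uint64_t address;     // start address of the vtable object
    std::uint64_t size;        // size of the vtable in bytes
    std::string concreteType;  // stands in for a DIE reference to the type
};

class VtableIndex {
public:
    void add(const VtableDie &die) { byStart_[die.address] = die; }

    // Map a vtable pointer read from an object to its vtable entry.
    // In the Itanium ABI the pointer refers to a slot inside the vtable
    // rather than its first byte, so this is a range lookup.
    const VtableDie *lookup(std::uint64_t vptr) const {
        auto it = byStart_.upper_bound(vptr);  // first entry starting after vptr
        if (it == byStart_.begin())
            return nullptr;
        --it;                                  // last entry starting at or before vptr
        if (vptr - it->second.address >= it->second.size)
            return nullptr;                    // vptr falls in a gap between vtables
        return &it->second;
    }

private:
    std::map<std::uint64_t, VtableDie> byStart_;
};
```

No demangling and no name comparison is involved: the only key is the address already stored in the object.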

As an added bonus: This is what Tom Tromey already added to Rust to do this. So it's even been implemented before.

John McCall via llvm-dev

Mar 4, 2018, 3:33:50 AM

to Daniel Berlin, llvm-dev, Clang Dev

On Mar 3, 2018, at 11:30 PM, Daniel Berlin via cfe-dev <cfe...@lists.llvm.org> wrote:

To explain to others who didn't follow that thread:

GDB currently does something amazingly stupid (and has since i wrote it) to find the RTTI type. There were no other good options at the time.

What it does is find the vtable for the object, find the symbol that represents the vtable, demangle it, chop off "vtable for", and try to find the symbol for the string that results.

Glorious. :)

Do any of the common C++ demangler implementations provide any sort of API to get at the demangler tree? We did this in Swift, and even though our tree design isn't real great, it's been a huge help for implementing various reflection / debugging features.

John.

If you don't emit the linkage name, there are cases it won't find it, because this is a really dumb way of trying to find the answer :)

It also won't find it depending on what demangler you use, etc.

Here's a more direct way:
For each vtable DIE, link to the concrete type it represents.

Now you just go from vtable object to concrete type with no string lookup, which is faster, doesn't require linkage names, doesn't depend on demanglers matching, etc.

As an added bonus: This is what Tom Tromey already added to Rust to do this. So it's even been implemented before.

On Sat, Mar 3, 2018 at 8:20 PM, Daniel Berlin <dbe...@dberlin.org> wrote:

On Fri, Mar 2, 2018 at 3:58 PM, Roman Popov via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi all,

As you may know modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects, by utilizing C++ RTTI.
Unfortunately this feature does not work with Clang and GDB >= 7.x . The last compiler that worked well was G++ 6.x

I've asked about this issue both on GDB and LLDB maillists. Unfortunately it's hard or impossible to fix it on debugger side.

Errr, i posited a solution on the gdb mailing list that i haven't seen shot down so far, that doesn't require linkage names, it only requires one new attribute that is a DW_FORM_ref, and very cheap.

I also wrote the RTTI code for GDB :)

Currently what debugger has to do is to demangle RTTI name and try to match it to DW_AT_name attribute to find type. As you can see it does not work for any of 3 examples.

I've asked about the problem on G++ maillist, and one of the proposed solutions is to emit DW_AT_linkage_name for types.

Can this solution be also implemented in LLVM?

Please, no.

This is completely unneeded and wastes a huge amount of space.

As you can see from the replies to my solution on the gdb mailing list, it is used by other languages (rust, for example) *anyway*, so we might as well use it for C++ too.

_______________________________________________
cfe-dev mailing list
cfe...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Daniel Berlin via llvm-dev

Mar 4, 2018, 10:20:37 AM

to John McCall, llvm-dev, Clang Dev

On Sun, Mar 4, 2018 at 12:33 AM, John McCall <rjmc...@apple.com> wrote:

On Mar 3, 2018, at 11:30 PM, Daniel Berlin via cfe-dev <cfe...@lists.llvm.org> wrote:

To explain to others who didn't follow that thread:

GDB currently does something amazingly stupid (and has since i wrote it) to find the RTTI type. There were no other good options at the time.

What it does is find the vtable for the object, find the symbol that represents the vtable, demangle it, chop off "vtable for", and try to find the symbol for the string that results.

Glorious. :)

I regretted it pretty much the second it was done :)

(But nothing else implemented the Itanium C++ ABI yet, and we still had to deal with STABS, DBX, etc., so there wasn't a great way to push conformity here.)

You can imagine what happens: demangler differences between host and target, between compilers, etc., will of course cause failures here.

It's also the case that the demangled name may not be the symbol as known in DWARF, etc.

One of the issues here is the demangling difference between binary and runtime, where one produced Foo<2u> and the other produced Foo<2>.

Personally, as is apparent, i don't think we should solve these by going down the rabbit hole of "using more names", when it's pretty trivial to just link the things together and not have to do the lookup at all.

(There are a bunch of open gdb bugs on differences like the above)

Do any of the common C++ demangler implementations provide any sort of API to get at the demangler tree?

Not that i know of :(

We did this in Swift, and even though our tree design isn't real great, it's been a huge help for implementing various reflection / debugging features.

Yeah, it definitely would be.

Most of what you see to support C++ in GDB are hacks, of course, from overload resolution to you name it.

:(

John McCall via llvm-dev

Mar 4, 2018, 3:32:57 PM

to Daniel Berlin, llvm-dev, Clang Dev

On Mar 4, 2018, at 10:20 AM, Daniel Berlin <dbe...@dberlin.org> wrote:

On Sun, Mar 4, 2018 at 12:33 AM, John McCall <rjmc...@apple.com> wrote:
On Mar 3, 2018, at 11:30 PM, Daniel Berlin via cfe-dev <cfe...@lists.llvm.org> wrote:

To explain to others who didn't follow that thread:

GDB currently does something amazingly stupid (and has since i wrote it) to find the RTTI type. There were no other good options at the time.

What it does is find the vtable for the object, find the symbol that represents the vtable, demangle it, chop off "vtable for", and try to find the symbol for the string that results.

Glorious. :)

I regretted it pretty much the second it was done :)

(but nothing else implemented the itanium C++ ABI yet, we still had to deal with STABS, DBX, etc, so there wasn't a great way to push conformity here).

You can imagine what happens - demangler differences between host and target, compilers, etc, of course, will cause failure here.
It's also the case that the demangled name may not be the symbol as known in DWARF, etc.

One of the issues here is the demangling difference between binary and runtime, where, one produced Foo<2u> and one produced Foo<2>

Personally, as is apparent, i don't think we should solve these by going down the rabbit hole of "using more names", when it's pretty trivial to just link the things together and not have to do the lookup at all.

Yeah, having debug info associated with the v-table object that links to the type information would be pretty sensible.

(There are a bunch of open gdb bugs on differences like the above)

Do any of the common C++ demangler implementations provide any sort of API to get at the demangler tree?

Not that i know of :(

Seems like a reasonable project! Maybe we can get a SoC student to make a standalone C++ demangler library with a tree API (an unstable one should be fine), and debuggers can just use that instead of relying on the OS's cxa_demangle. (I'm really not sure why development tools rely on the system demangler anyway; surely it's always easier to tell users that they'd get a better experience with a new debugger than to tell them that they need to replace their system's C++ standard library?)

We did this in Swift, and even though our tree design isn't real great, it's been a huge help for implementing various reflection / debugging features.

Yeah, it definitely would be.

Most of what you see to support C++ in GDB are hacks, of course, from overload resolution to you name it.
:(

Yeah. LLDB's C++ support doesn't always work (grr argh templates), but being able to rely on an actual compiler frontend is just a huge step forward for making all the language features work right.

John.

David Blaikie via llvm-dev

Mar 5, 2018, 11:38:09 AM

to Daniel Berlin, llvm-dev, Clang Dev

On Sat, Mar 3, 2018 at 8:20 PM Daniel Berlin via llvm-dev <llvm...@lists.llvm.org> wrote:

On Fri, Mar 2, 2018 at 3:58 PM, Roman Popov via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi all,

As you may know modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects, by utilizing C++ RTTI.
Unfortunately this feature does not work with Clang and GDB >= 7.x . The last compiler that worked well was G++ 6.x

I've asked about this issue both on GDB and LLDB maillists. Unfortunately it's hard or impossible to fix it on debugger side.

Errr, i posited a solution on the gdb mailing list that i haven't seen shot down so far, that doesn't require linkage names, it only requires one new attribute that is a DW_FORM_ref, and very cheap.

FWIW, for C++ at least, neither Clang nor GCC (6.3) produces any DWARF to describe the vtable itself (they describe the vtable pointer inside the struct, but not the constant vtable array). So it'll be a bit more than one attribute: first the bytes to describe the vtable (as a global variable? Do we give it a name? If so, we're back to paying that cost), then the reference from that to the type.

I'm also not sure what Apple, or anyone else who ships libraries without debug info that users have to debug, would do (this is what broke -fno-standalone-debug for Apple: their driver API, which ships without debug info of its own, has strong vtables in it).

I can go into more detail there - but there are certainly some annoying edge cases/questions I have here :/

I also wrote the RTTI code for GDB :)

Currently what debugger has to do is to demangle RTTI name and try to match it to DW_AT_name attribute to find type. As you can see it does not work for any of 3 examples.

I've asked about the problem on G++ maillist, and one of the proposed solutions is to emit DW_AT_linkage_name for types.

Can this solution be also implemented in LLVM?

Please, no.

This is completely unneeded and wastes a huge amount of space.

As you can see from the replies to my solution on the gdb mailing list, it is used by other languages (rust, for example) *anyway*, so we might as well use it for C++ too.

Daniel Berlin via llvm-dev

Mar 5, 2018, 12:10:29 PM

to David Blaikie, llvm-dev, Clang Dev

On Mon, Mar 5, 2018 at 8:37 AM, David Blaikie <dbla...@gmail.com> wrote:

On Sat, Mar 3, 2018 at 8:20 PM Daniel Berlin via llvm-dev <llvm...@lists.llvm.org> wrote:
On Fri, Mar 2, 2018 at 3:58 PM, Roman Popov via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi all,

As you may know modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects, by utilizing C++ RTTI.
Unfortunately this feature does not work with Clang and GDB >= 7.x . The last compiler that worked well was G++ 6.x

I've asked about this issue both on GDB and LLDB maillists. Unfortunately it's hard or impossible to fix it on debugger side.

Errr, i posited a solution on the gdb mailing list that i haven't seen shot down so far, that doesn't require linkage names, it only requires one new attribute that is a DW_FORM_ref, and very cheap.

FWIW, for C++ at least, neither Clang nor GCC (6.3) produce any DWARF to describe the vtable itself (they describe the vtable pointer inside the struct, but not the constant vtable array) - so it'll be a bit more than one attribute, but the bytes describe the vtable (as a global variable? Do we give it a name? (if so, we're back to paying that cost)) first, then to add the reference from that to the type.

Right, they produce a named symbol but not debug info.

The only thing you need is a single DIE for that symbol, with a single ref.

(IE they just need to be able to say "find me the DIE for this address range", have it get to the vtable DIE, and get to the concrete type die)

& I'm not sure what Apple would do or anyone else that has libraries without debug info shipped & users have to debug them (this is what broke -fno-standalone-debug for Apple - their driver API which ships without debug info of its own, has strong vtables in it).

I'm confused.

This already seems to have the same issue?
Just because it uses one linker symbol, it still requires full debug info to print the type anyway.

So if it's gone, nothing changes.

I can go into more detail there - but there are certainly some annoying edge cases/questions I have here :/

Constructive alternative?

Right now, relying on *more* names, besides being huge in a lot of binaries, relies on the demangler producing certain text (which is not guaranteed)

That text has to exactly match the text of some other symbol (which is not guaranteed).

That 10 second delay you get sometimes with going to print a C++ symbol in a large binary?

That's this lookup.

So right now it:
1. Uses a ton of memory

2. Uses a ton of time

3. Doesn't work all the time (depends on demanglers, and there are very weird edge cases here).

Adding linkage names will not change any of these, whereas adding a DWARF extension fixes all three, forever.

I don't even care about the details of the extension, my overriding constraint is "please don't extend this hack further given the above".

David Blaikie via llvm-dev

Mar 5, 2018, 12:26:52 PM

to Daniel Berlin, llvm-dev, Clang Dev

On Mon, Mar 5, 2018 at 9:09 AM Daniel Berlin <dbe...@dberlin.org> wrote:

On Mon, Mar 5, 2018 at 8:37 AM, David Blaikie <dbla...@gmail.com> wrote:

On Sat, Mar 3, 2018 at 8:20 PM Daniel Berlin via llvm-dev <llvm...@lists.llvm.org> wrote:
On Fri, Mar 2, 2018 at 3:58 PM, Roman Popov via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi all,

As you may know modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects, by utilizing C++ RTTI.
Unfortunately this feature does not work with Clang and GDB >= 7.x . The last compiler that worked well was G++ 6.x

I've asked about this issue both on GDB and LLDB maillists. Unfortunately it's hard or impossible to fix it on debugger side.

Errr, i posited a solution on the gdb mailing list that i haven't seen shot down so far, that doesn't require linkage names, it only requires one new attribute that is a DW_FORM_ref, and very cheap.

FWIW, for C++ at least, neither Clang nor GCC (6.3) produce any DWARF to describe the vtable itself (they describe the vtable pointer inside the struct, but not the constant vtable array) - so it'll be a bit more than one attribute, but the bytes describe the vtable (as a global variable? Do we give it a name? (if so, we're back to paying that cost)) first, then to add the reference from that to the type.

Right, they produce a named symbol but not debug info.

The only thing you need is a single DIE for that symbol, with a single ref.

When you say "a single DIE" what attributes are you picturing that DIE having? If it has a single attribute, a ref_addr to the type, that doesn't seem to provide anything useful. Presumably this DIE would need a DW_AT_location with the address of the vtable (with a relocation to resolve that address, etc).

No name? No other identifying features? I don't think we've ever really produced DIEs like that, though it sounds OK to me.

(IE they just need to be able to say "find me the DIE for this address range", have it get to the vtable DIE, and get to the concrete type die)

& I'm not sure what Apple would do or anyone else that has libraries without debug info shipped & users have to debug them (this is what broke -fno-standalone-debug for Apple - their driver API which ships without debug info of its own, has strong vtables in it).

I'm confused.
This already seems to have has the same issue?
Just because it uses one linker symbol, it still requires full debug info to print the type anyway.

So if it's gone, nothing changes.

Sorry, I don't quite understand your comment here - could you explain it in more detail - the steps/issues you're seeing?

I'll try to do the same:
Currently the DWARF type information (the actual DW_TAG_class_type DIE with the full definition of the class - its members, etc) on OSX goes everywhere the type is used (rather than only in the object files where the vtable is defined) to ensure that types defined in objects built without debug info, but used in objects built with debug info can still be debugged. (whereas on other platforms, like Linux, the assumption is made that the whole program is built with debug info - OSX is different because it has these system libraries for drivers that break this convention (& because LLDB can't handle this situation) - so, because the system itself breaks the assumption, the default is to turn off the assumption)

I assumed your proposal would only add this debug info to describe the vtable constant where the vtable is defined. Which would break OSX.

If the idea, on OSX (& other -fstandalone-debug situations/platforms/users), would be to include this vtable DIE even where the vtable is not defined, that adds a bit more debug info. It also means debug info describing the declaration of a variable, which is also something we haven't really done in LLVM before. Again, technically possible, but a nuance I'd call out/want to be aware of/think about/talk about (hence this conversation), etc.

I can go into more detail there - but there are certainly some annoying edge cases/questions I have here :/

Constructive alternative?

Not sure - not saying what you're proposing isn't workable - but I do want to understand the practical/implementation details a bit to see how it plays out - hence the conversation above.

Right now, relying on *more* names, besides being huge in a lot of binaries, relies on the demangler producing certain text (which is not guaranteed)
That text has to exactly match the text of some other symbol (which is not guaranteed).

*nod* I agree that the name matching based on demangling is a bad idea.

That 10 second delay you get sometimes with going to print a C++ symbol in a large binary?

That's this lookup.

So right now it:
1. Uses a ton of memory
2. Uses a ton of time
3. Doesn't work all the time (depends on demanglers, and there are very weird edge cases here).

Adding linkage names will not change any of these, whereas adding a DWARF extension fixes all three, forever.

Not sure I follow this - debuggers do lots of name lookups, I would've thought linkage name<>linkage name lookup could be somewhat practical (without all the fuzzy matching logic).

I don't even care about the details of the extension, my overriding constraint is "please don't extend this hack further given the above".

Mangled to demangled name matching seems like a hack - matching the mangled names doesn't seem like such a hack to me - but, yeah, I'm totally open to an address based solution as you're suggesting, just trying to figure out the details/issues.

Have you got a link/steps to a sample/way to get GCC to produce this sort of debug info? (at least with 6.3 using C++ I don't see any debug info like this describing a vtable)

- Dave

Daniel Berlin via llvm-dev

Mar 5, 2018, 1:46:41 PM

to David Blaikie, llvm-dev, Clang Dev

On Mon, Mar 5, 2018, 9:26 AM David Blaikie <dbla...@gmail.com> wrote:

On Mon, Mar 5, 2018 at 9:09 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Mon, Mar 5, 2018 at 8:37 AM, David Blaikie <dbla...@gmail.com> wrote:

On Sat, Mar 3, 2018 at 8:20 PM Daniel Berlin via llvm-dev <llvm...@lists.llvm.org> wrote:
On Fri, Mar 2, 2018 at 3:58 PM, Roman Popov via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi all,

As you may know modern C++ debuggers (GDB and LLDB) support dynamic type identification for polymorphic objects, by utilizing C++ RTTI.
Unfortunately this feature does not work with Clang and GDB >= 7.x . The last compiler that worked well was G++ 6.x

I've asked about this issue both on GDB and LLDB maillists. Unfortunately it's hard or impossible to fix it on debugger side.

Errr, i posited a solution on the gdb mailing list that i haven't seen shot down so far, that doesn't require linkage names, it only requires one new attribute that is a DW_FORM_ref, and very cheap.

FWIW, for C++ at least, neither Clang nor GCC (6.3) produce any DWARF to describe the vtable itself (they describe the vtable pointer inside the struct, but not the constant vtable array) - so it'll be a bit more than one attribute, but the bytes describe the vtable (as a global variable? Do we give it a name? (if so, we're back to paying that cost)) first, then to add the reference from that to the type.

Right, they produce a named symbol but not debug info.

The only thing you need is a single DIE for that symbol, with a single ref.

When you say "a single DIE" what attributes are you picturing that DIE having? If it has a single attribute, a ref_addr to the type, that doesn't seem to provide anything useful. Presumably this DIE would need a DW_AT_location with the address of the vtable (with a relocation to resolve that address, etc).

Location and concrete type it belongs to. That's the minimum you should need here.

You don't need the name, though it doesn't hurt.

No name? No other identifying features? I don't think we've ever really produced DIEs like that, though it sounds OK to me.

(IE they just need to be able to say "find me the DIE for this address range", have it get to the vtable DIE, and get to the concrete type die)

& I'm not sure what Apple would do or anyone else that has libraries without debug info shipped & users have to debug them (this is what broke -fno-standalone-debug for Apple - their driver API which ships without debug info of its own, has strong vtables in it).

I'm confused.
This already seems to have has the same issue?
Just because it uses one linker symbol, it still requires full debug info to print the type anyway.
So if it's gone, nothing changes.

Sorry, I don't quite understand your comment here - could you explain it in more detail - the steps/issues you're seeing?

I think we are starting from different positions here, so let me add a few pieces of data and see how it helps.

Let's assume the below is true and it won't work on OSX as described (i'm certainly in no place to disagree).

Some data points:

1. LLDB works just fine on Darwin (it appears to do the same thing we did in gdb, staring at source/Plugins/LanguageRuntime/CPlusPlus/ItaniumABI/ItaniumABILanguageRuntime.cpp)

2. GDB does not work on Darwin at all for any real debugging right now (You can't debug llvm with it, for example). There are barely working versions here and there. The startup time to debug an "opt" binary from llvm is well over 2 minutes alone to get to a prompt just from typing "gdb bin/opt". It requires 4 gigs of ram. It usually fails to print most symbols/types/crashes calling functions, blah blah blah.

You can't even quit most of the time without hitting an assert.

(gdb) q
thread.c:93: internal-error: struct thread_info *inferior_thread(): Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

3. On every platform, GDB will have to continue to use what it does now as a fallback anyway, as all existing binaries will not be rebuilt.

4. Ditto LLDB

So for GDB, it doesn't really matter whether it breaks OSX, to start. Even if it did, it will still work as well or as not well as it has in the past :)

LLDB works, and should work as well as it did with or without this as well.

Given all that: No matter what we do, LLDB and GDB will continue to work exactly as well or as broken as they have before on OSX. Nothing will change.

So i wouldn't call it broken, i'd call it, at worst, inapplicable to certain situations on OSX, and triggering a fallback :)

I'll try to do the same:
Currently the DWARF type information (the actual DW_TAG_class_type DIE with the full definition of the class - its members, etc) on OSX goes everywhere the type is used (rather than only in the object files where the vtable is defined) to ensure that types defined in objects built without debug info, but used in objects built with debug info can still be debugged. (whereas on other platforms, like Linux, the assumption is made that the whole program is built with debug info - OSX is different because it has these system libraries for drivers that break this convention (& because LLDB can't handle this situation) - so, because the system itself breaks the assumption, the default is to turn off the assumption)

I assumed your proposal would only add this debug info to describe the vtable constant where the vtable is defined. Which would break OSX.

If the idea, on OSX (& other -fstandalone-debug situations/platforms/users), would be to include this vtable DIE even where the vtable is not defined - that adds a bit more debug info & also it means debug info describing the declaration of a variable, also something we haven't really done in LLVM before - again, technically possible, but a nuance I'd call out/want to be aware of/think about/talk about (hence this conversation), etc.

I can go into more detail there - but there are certainly some annoying edge cases/questions I have here :/

Constructive alternative?

Not sure - not saying what you're proposing isn't workable - but I do want to understand the practical/implementation details a bit to see how it plays out - hence the conversation above.

FWIW, i don't have a lot of time/energy to push this, so i'm pretty much going to bow out at this point and leave folks to their own devices. I just wanted to point out there are other solutions that would likely work a lot better over time.

Right now, relying on *more* names, besides being huge in a lot of binaries, relies on the demangler producing certain text (which is not guaranteed)
That text has to exactly match the text of some other symbol (which is not guaranteed).

*nod* I agree that the name matching based on demangling is a bad idea.

That 10 second delay you get sometimes with going to print a C++ symbol in a large binary?

That's this lookup.

So right now it:
1. Uses a ton of memory
2. Uses a ton of time
3. Doesn't work all the time (depends on demanglers, and there are very weird edge cases here).

Adding linkage names will not change any of these, whereas adding a DWARF extension fixes all three, forever.

Not sure I follow this - debuggers do lots of name lookups, I would've thought linkage name<>linkage name lookup could be somewhat practical (without all the fuzzy matching logic).

You'd think it would be optimized for this, but for GDB, it will now pull in every symbol table looking for the name, until it finds it. It does not, for example, build a global index of names so it knows what CU to go read from or anything smart like that.

(it's a little more nuanced than this, but in practice, not)

I don't even care about the details of the extension, my overriding constraint is "please don't extend this hack further given the above".

Mangled to demangled name matching seems like a hack - matching the mangled names doesn't seem like such a hack to me - but, yeah, I'm totally open to an address based solution as you're suggesting, just trying to figure out the details/issues.

At the time, the mangled name was not available anywhere.

It looks like name() is supposed to now return the mangled name in the itanium ABI.

So theoretically, you could just change GDB to call the name() function, look that up in the minimal symbol tables (name->address mappings, without debug info), and go to the full symbol table info for that address. This avoids needing the DW_AT_name in the debuginfo to match, only the name in the symbol table.

This will break if you use -fno-rtti, whereas the vtable way (either existing or proposed) would still work.
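To make the name()-based lookup concrete, here is a minimal C++17 sketch that reuses the `foo` template from the start of the thread. `rtti_name` and `check_distinct_names` are hypothetical helpers, not GDB or LLVM functions. The sketch assumes an Itanium-ABI toolchain (GCC or Clang) with RTTI enabled, where `std::type_info::name()` returns the mangled type name.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Same shape as the example at the start of the thread (C++17 for `auto`).
template <auto x>
struct foo { virtual ~foo() {} };

// Hypothetical helper: the name recorded in the RTTI of obj's dynamic type.
// The string is implementation-defined; on Itanium-ABI toolchains it is the
// mangled type name (the thread reports 3fooILi11EE for foo<11>).
template <typename T>
std::string rtti_name(const T& obj) {
    return typeid(obj).name();
}

// foo<11> and foo<11u> are distinct types. On Itanium-ABI toolchains their
// RTTI names differ, even though the thread reports identical DW_AT_name
// values for them.
inline void check_distinct_names() {
    foo<11> fi;
    foo<11u> fu;
    assert(rtti_name(fi) != rtti_name(fu));
}
```

Under the Itanium ABI the vtable symbol is the same mangled type name prefixed with `_ZTV`, which is presumably what a symbol-table lookup would key on. With -fno-rtti the `typeid` expression does not compile, which matches the caveat above.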

G++ actually *had* linkage names for types for a long time in the debug info, and deliberately removed them due to space usage.

Have you got a link/steps to a sample/way to get GCC to produce this sort of debug info? (at least with 6.3 using C++ I don't see any debug info like this describing a vtable)

Yeah, nothing does it yet.

Bug Tom Tromey, who did it for Rust, not C++

- Dave

Roman Popov via llvm-dev

Mar 6, 2018, 2:55:33 AM

to Daniel Berlin, llvm-dev, Clang Dev

I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.

It will become a problem when you need to use debuginfo as a C++ runtime reflection (I've already seen this in a couple of projects). Or when you need to go back from LLVM IR to Clang AST (I've already encountered this problem).

I wonder if abi::__cxa_demangle guarantees unambiguous names? If so, then I can just replace the current incorrect names that Clang generates with names from the demangler. In this case I don't even need to patch gdb, it will work as is.

-Roman

Pavel Labath via llvm-dev

Mar 6, 2018, 7:13:45 AM

to John McCall, Erik Pilkington, LLVM Dev, Clang Dev

On Sun, 4 Mar 2018 at 20:33, John McCall via llvm-dev <llvm...@lists.llvm.org> wrote:

Seems like a reasonable project! Maybe we can get a SoC student to make a standalone C++ demangler library with a tree API (an unstable one should be fine), and debuggers can just use that instead of relying on the OS's cxa_demangle. (I'm really not sure why development tools rely on the system demangler anyway; surely it's always easier to tell users that they'd get a better experience with a new debugger than to tell them that they need to replace their system's C++ standard library?)

I believe there is some work being done on that already <http://lists.llvm.org/pipermail/lldb-dev/2018-January/013186.html>, but I'm not sure what the current state of it is. Also, in the default configuration, LLDB will use llvm::itaniumDemangle for demangling (although there is a build option to use cxa_demangle).

Erik Pilkington via llvm-dev

Mar 6, 2018, 10:14:06 AM

to Pavel Labath, John McCall, LLVM Dev, Clang Dev

Coincidentally, I was just finishing up the refactoring work I wanted to do first on the demangler in libcxxabi. I'm going to put together an API for the AST and put a patch up on phab as a starting point, hopefully we can figure out the details of what LLDB wants access to from there. I'll ping the thread you linked to when I put that patch up.

Erik

Daniel Berlin via llvm-dev

Mar 6, 2018, 11:39:57 AM

to Roman Popov, llvm-dev, Clang Dev

On Mon, Mar 5, 2018 at 11:55 PM, Roman Popov <rip...@gmail.com> wrote:

I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.

1. Calling them incorrect is ... not right. As Andrew quoted on the gdb mailing list, this is what DWARF specifies should happen, so they are correct by spec. If you believe the spec is wrong, file an issue on the DWARF web site and discuss it on the mailing list, and bring back the consensus of the committee as to what to do :)

2. The failure that was cited on the gdb mailing list only occurs on polymorphic classes. If you have it occurring on non-polymorphic classes, that seems like a very different problem, and probably related to the fact that GDB does not know how to assemble or parse C++ names properly in some cases. Otherwise, this would occur on literally every class you saw in GDB, and that's definitely not the case:)

The only reason linkage names would fix that issue is because they provide an exact match to GDB's parsing failure.

You should just fix GDB.

(GDB already knows how to collect and print out multiple symbols in the case they have the same name, FWIW)

It will become a problem when you need to use debuginfo as a C++ runtime reflection (I've already seen this in a couple of projects).

Or when you need to go back from LLVM IR to Clang AST (I've already encountered this problem).

I don't understand these use cases well enough to help, but if you think it's a serious issue, again, i'd take it up with the DWARF folks.

I wonder if abi::__cxa_demangle guarantees unambiguous names?

No, it does not.

David Blaikie via llvm-dev

Mar 6, 2018, 12:18:34 PM

to Roman Popov, llvm-dev, Clang Dev

On Mon, Mar 5, 2018 at 11:55 PM Roman Popov <rip...@gmail.com> wrote:

I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes.

The case we seem to be discussing is about dynamic types (types with vtables). Non-dynamic types don't have type info in the object code to compare against/match/test to find the dynamic type of an object (eg: you can't dynamic_cast or use typeid on a type without a vtable).
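The distinction between dynamic and non-dynamic types can be sketched in a few lines of C++. The type and function names below are made up for illustration. The point is only that `typeid` applied through a base reference reports the derived type when the base has a vtable, and the static type otherwise.

```cpp
#include <cassert>
#include <typeinfo>

struct PolyBase { virtual ~PolyBase() {} };  // has a vtable
struct PolyDerived : PolyBase {};

struct PlainBase {};                         // no virtual functions, no vtable
struct PlainDerived : PlainBase {};

// Polymorphic base: typeid inspects the object and finds the dynamic type.
inline bool poly_sees_derived(const PolyBase& b) {
    return typeid(b) == typeid(PolyDerived);
}

// Non-polymorphic base: typeid only ever reports the static type PlainBase.
inline bool plain_sees_derived(const PlainBase& b) {
    return typeid(b) == typeid(PlainDerived);
}

inline void check_dynamic_type_lookup() {
    PolyDerived pd;
    PlainDerived qd;
    assert(poly_sees_derived(pd));
    assert(!plain_sees_derived(qd));
}
```

A debugger is in the same position as `plain_sees_derived`: without a vtable pointer in the object there is nothing in memory to match against, so only the polymorphic case is relevant to dynamic type discovery.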

If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.

As I've noted: Having ambiguous names for a type is something that should be fixed because otherwise a debugger's going to get pretty confused about matching types up between TUs.

But unambiguous doesn't necessarily mean "exactly the same name as a certain demangler produces".

It will become a problem when you need to use debuginfo as a C++ runtime reflection (I've already seen this in a couple of projects).

Or when you need to go back from LLVM IR to Clang AST (I've already encountered this problem).

Not sure I quite follow these two points - though they're quite different from the issues discussed so far in terms of motivation/solutions - so might be worth diving into them further to understand if/how they could be supported.

I wonder if abi::__cxa_demangle guarantees unambiguous names? If so, then I can just replace the current incorrect names that Clang generates with names from the demangler. In this case I don't even need to patch gdb, it will work as is.

The problem is that the ABI doesn't guarantee any particular demangling - different implementations could demangle differently (eg: "(unsigned)1" versus "1u" for example). Making a strict contract between the demangler and the pretty printed names is probably not a workable idea.
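For reference, a minimal sketch of calling the system demangler, assuming a toolchain that ships `<cxxabi.h>` (GCC's libstdc++ or LLVM's libc++abi). `demangle` and `same_type_by_mangled_name` are hypothetical helper names. The exact text `demangle` returns is the part that is not guaranteed, so the sketch compares the mangled strings and treats the demangled text as display-only.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: demangle with abi::__cxa_demangle, returning an empty
// string on failure. The returned spelling (e.g. "11u" versus "(unsigned)11")
// depends on the demangler implementation.
inline std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string();
}

// Comparing mangled names needs no demangler and no text contract.
inline bool same_type_by_mangled_name(const std::string& a, const std::string& b) {
    return a == b;
}

inline void check_demangle() {
    // Mangled names taken from the example at the start of the thread.
    assert(!demangle("3fooILi11EE").empty());
    assert(!same_type_by_mangled_name("3fooILi11EE", "3fooILj11EE"));
}
```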

Roman Popov via llvm-dev

Mar 6, 2018, 12:20:43 PM

to Daniel Berlin, llvm-dev, Clang Dev

I wonder if abi::__cxa_demangle guarantees unambiguous names?

No, it does not.

Interesting. Can you give an example of type where it fails?

I'm currently working on a hardware construction library for C++ (similar to Chisel (which is written in Scala)). And since C++ has no standardized reflection, I use DWARF as a source of reflection metadata. And in the case of G++ 6.3, which seems to emit the same names as abi::__cxa_demangle, it has never failed so far in my case. And I have very diverse inputs.

In fact I was working on it for about a year, and I was thinking that this is how it is supposed to work. Only after I upgraded to g++ 7 did I find out that both modern g++ and clang do not emit unambiguous debuginfo.

2. The failure that was cited on the gdb mailing list only occurs on polymorphic classes. If you have it occurring on non-polymorphic classes, that seems like a very different problem, and probably related to the fact that GDB does not know how to assemble or parse C++ names properly in some cases. Otherwise, this would occur on literally every class you saw in GDB, and that's definitely not the case:)

That is correct, these two problems are different. However they can be solved together.

Consider for example if we had only standardized mangled names in debuginfo:

1) GDB can demangle them and pretty-print in interactive sessions

2) Since it is the same name in debuginfo and RTTI, no problems with dynamic type identification in debugger

3) Since a mangled name is unique for a type, we can use it in any other scenario. For example I want to analyze the state of a process in the debugger, and then load it in Clang to do some source-to-source transformations. This is what I've planned for my hardware construction library.

-Roman

David Blaikie via llvm-dev

Mar 6, 2018, 12:22:51 PM

to Daniel Berlin, llvm-dev, Clang Dev

On Tue, Mar 6, 2018 at 8:39 AM Daniel Berlin <dbe...@dberlin.org> wrote:

On Mon, Mar 5, 2018 at 11:55 PM, Roman Popov <rip...@gmail.com> wrote:
I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.
1. Calling them incorrect is ... not right. As Andrew quoted on the gdb mailing list, this is what DWARF specifies should happen,

Might be helpful to point to/include any details cited here for the purpose of this conversation - a bit hard for the rest of us to follow along.

so they are correct by spec. If you believe the spec is wrong, file an issue on the DWARF web site and discuss it on the mailing list, and bring back the consensus of the committee as to what to do :)

The ambiguous names are probably incorrect - having two distinct types that have the same name's not really going to work out well for a consumer. (so having the distinct types foo<11u> and foo<11> in source both produce a DWARF type named "foo<11>" I'd say is a bug that ought to be fixed - as is any other case where the names become ambiguous, otherwise matching up types between TUs would become impossible, which would be not good)

2. The failure that was cited on the gdb mailing list only occurs on polymorphic classes. If you have it occurring on non-polymorphic classes, that seems like a very different problem, and probably related to the fact that GDB does not know how to assemble or parse C++ names properly in some cases. Otherwise, this would occur on literally every class you saw in GDB, and that's definitely not the case:)

Sounds like Roman's talking about other use cases apart from GDB.

The only reason linkage names would fix that issue is because they provide an exact match to GDB's parsing failure.

Not sure I follow this - providing linkage names would provide a reliable mechanism to match the vtable symbol. There wouldn't need to be any parsing, or any failure of parsing involved.

But, yes, addresses would be potentially a better description rather than having to match names in the object's symbol table.

Daniel Berlin via llvm-dev

Mar 6, 2018, 12:28:55 PM

to David Blaikie, llvm-dev, Clang Dev

On Tue, Mar 6, 2018 at 9:22 AM, David Blaikie <dbla...@gmail.com> wrote:

On Tue, Mar 6, 2018 at 8:39 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Mon, Mar 5, 2018 at 11:55 PM, Roman Popov <rip...@gmail.com> wrote:
I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.
1. Calling them incorrect is ... not right. As Andrew quoted on the gdb mailing list, this is what DWARF specifies should happen,

Might be helpful to point to/include any details cited here for the purpose of this conversation - a bit hard for the rest of us to follow along.

"
Reading http://wiki.dwarfstd.org/index.php?title=Best_Practices:
the DW_AT_name attribute should contain the name of the corresponding
program object as it appears in the source code, without any
qualifiers such as namespaces, containing classes, or modules (see
Section 2.15). A consumer can easily reconstruct the fully-qualified
name from the DIE hierarchy. In general, the value of DW_AT_name
should be such that a fully-qualified name constructed from the
DW_AT_name attributes of the object and its containing objects will
uniquely represent that object in a form natural to the source
language."
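The reconstruction the quoted passage describes can be sketched as a walk up the DIE parent chain. `Die` below is a hypothetical stand-in with just a name and a parent link, not a type from LLVM, GDB or libdwarf, and the `::` separator assumes C++ as the source language.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a DWARF DIE: a DW_AT_name and a parent link.
struct Die {
    std::string name;   // DW_AT_name, unqualified
    const Die* parent;  // enclosing namespace/class DIE, nullptr at CU level
};

// Join the DW_AT_name values from the outermost scope down to `die`.
inline std::string qualified_name(const Die& die) {
    if (die.parent == nullptr) return die.name;
    return qualified_name(*die.parent) + "::" + die.name;
}

inline void check_qualified_name() {
    Die ns{"foo", nullptr};
    Die cls{"bar", &ns};
    Die member{"fred", &cls};
    assert(qualified_name(member) == "foo::bar::fred");
}
```

This only yields a unique name if each DW_AT_name is itself unambiguous within its scope, which is exactly what is in question for foo<11> versus foo<11u>.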

so they are correct by spec. If you believe the spec is wrong, file an issue on the DWARF web site and discuss it on the mailing list, and bring back the consensus of the committee as to what to do :)

The ambiguous names are probably incorrect - having two distinct types that have the same name's not really going to work out well for a consumer. (so having the distinct types foo<11u> and foo<11> in source both produce a DWARF type named "foo<11>" I'd say is a bug that ought to be fixed - as is any other case where the names become ambiguous, otherwise matching up types between TUs would become impossible, which would be not good)

I'm sure the spec needs to be updated, i'm just saying "it's not wrong by what the spec and best practices say to do right now".

2. The failure that was cited on the gdb mailing list only occurs on polymorphic classes. If you have it occurring on non-polymorphic classes, that seems like a very different problem, and probably related to the fact that GDB does not know how to assemble or parse C++ names properly in some cases. Otherwise, this would occur on literally every class you saw in GDB, and that's definitely not the case:)

Sounds like Roman's talking about other use cases apart from GDB.

Yes.

The only reason linkage names would fix that issue is because they provide an exact match to GDB's parsing failure.

Not sure I follow this - providing linkage names would provide a reliable mechanism to match the vtable symbol. There wouldn't need to be any parsing, or any failure of parsing involved.

But, yes, addresses would be potentially a better description rather than having to match names in the object's symbol table.

I'm saying the only reason it would fix non-polymorphic classes is if gdb is failing to parse names so that it can do die lookup properly.

GDB gives up in some cases and incorrectly says "lookup foo::bar::fred in the global symbol namespace" instead of "lookup fred inside class bar symbol namespace".

In those cases, the linkage name would fix it because it will appear in the global symbol namespace.

But it would also work if you just fixed the name parsing.

Daniel Berlin via llvm-dev

Mar 6, 2018, 12:36:58 PM

to David Blaikie, llvm-dev, Clang Dev

If you want an example, gdb's parser understands that Foo<unsigned int> and Foo<unsigned> are the same because it parses them properly.

It does not understand that Foo<2> and Foo<2u> are the same because it parses them incorrectly.

Fixing the parsing would fix the lookup issue in that case.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81932

etc

David Blaikie via llvm-dev

Mar 6, 2018, 12:47:54 PM

to Daniel Berlin, llvm-dev, Clang Dev

On Tue, Mar 6, 2018 at 9:28 AM Daniel Berlin <dbe...@dberlin.org> wrote:

On Tue, Mar 6, 2018 at 9:22 AM, David Blaikie <dbla...@gmail.com> wrote:

On Tue, Mar 6, 2018 at 8:39 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Mon, Mar 5, 2018 at 11:55 PM, Roman Popov <rip...@gmail.com> wrote:
I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.
1. Calling them incorrect is ... not right. As Andrew quoted on the gdb mailing list, this is what DWARF specifies should happen,

Might be helpful to point to/include any details cited here for the purpose of this conversation - a bit hard for the rest of us to follow along.

"
Reading http://wiki.dwarfstd.org/index.php?title=Best_Practices:
the DW_AT_name attribute should contain the name of the corresponding
program object as it appears in the source code, without any
qualifiers such as namespaces, containing classes, or modules (see
Section 2.15). A consumer can easily reconstruct the fully-qualified
name from the DIE hierarchy. In general, the value of DW_AT_name
should be such that a fully-qualified name constructed from the
DW_AT_name attributes of the object and its containing objects will
uniquely represent that object in a form natural to the source
language."

so they are correct by spec. If you believe the spec is wrong, file an issue on the DWARF web site and discuss it on the mailing list, and bring back the consensus of the committee as to what to do :)

The ambiguous names are probably incorrect - having two distinct types that have the same name's not really going to work out well for a consumer. (so having the distinct types foo<11u> and foo<11> in source both produce a DWARF type named "foo<11>" I'd say is a bug that ought to be fixed - as is any other case where the names become ambiguous, otherwise matching up types between TUs would become impossible, which would be not good)

I'm sure the spec needs to be updated, i'm just saying "it's not wrong by what the spec and best practices say to do right now".

Looks wrong to me. It doesn't "uniquely represent" the object nor is it natural to the source language (foo<11> gets you the signed one, you'd have to write foo<11u> or foo<(unsigned)11> to get the unsigned one - yet Clang's DWARF currently names them both foo<11>).

2. The failure that was cited on the gdb mailing list only occurs on polymorphic classes. If you have it occurring on non-polymorphic classes, that seems like a very different problem, and probably related to the fact that GDB does not know how to assemble or parse C++ names properly in some cases. Otherwise, this would occur on literally every class you saw in GDB, and that's definitely not the case:)

Sounds like Roman's talking about other use cases apart from GDB.

Yes.

The only reason linkage names would fix that issue is because they provide an exact match to GDB's parsing failure.

Not sure I follow this - providing linkage names would provide a reliable mechanism to match the vtable symbol. There wouldn't need to be any parsing, or any failure of parsing involved.

But, yes, addresses would be potentially a better description rather than having to match names in the object's symbol table.

I'm saying the only reason it would fix non-polymorphic classes is if gdb is failing to parse names so that it can do die lookup properly.

GDB gives up in some cases and incorrectly says "lookup foo::bar::fred in the global symbol namespace" instead of "lookup fred inside class bar symbol namespace".

In those cases, the linkage name would fix it because it will appear in the global symbol namespace.
But it would also work if you just fixed the name parsing.

Can't say I'm following this part.. well, sort of following. But doesn't seem relevant to Roman's situation, which isn't about GDB.

I think the only problem being addressed for GDB is the polymorphic case. The ability to match non-polymorphic types (with what, I'm not sure - not vtables in any case) is motivated by Roman's other examples of IR, etc, not GDB's dynamic type discovery.

- Dave

David Blaikie via llvm-dev

Mar 6, 2018, 12:49:39 PM

to Daniel Berlin, llvm-dev, Clang Dev

Ah, but they aren't necessarily the same type. Clang (& GCC?) are producing two different types but naming them both Foo<2>. That's the ambiguity Roman's referring to.

That's a bug in Clang (& GCC?) that ought to be fixed.
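The two cases can be checked at compile time. This is a small C++17 sketch with made-up template names (`TypeArg`, `ValueArg`): two spellings of the same type argument give one specialization, while the same numeric value with a different argument type gives two.

```cpp
#include <cassert>
#include <type_traits>

template <typename T> struct TypeArg {};
template <auto V> struct ValueArg {};  // `auto` parameter needs C++17

// "unsigned int" and "unsigned" spell the same type: one specialization.
static_assert(std::is_same<TypeArg<unsigned int>, TypeArg<unsigned>>::value,
              "same type, different spelling");

// 2 (int) and 2u (unsigned) have different types: two specializations.
static_assert(!std::is_same<ValueArg<2>, ValueArg<2u>>::value,
              "same value, distinct types");

constexpr bool value_args_are_distinct() {
    return !std::is_same<ValueArg<2>, ValueArg<2u>>::value;
}
```

So a name such as Foo<2> in the debug info cannot identify which of the two specializations is meant, independent of how well the debugger parses it.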

Daniel Berlin via llvm-dev

Mar 6, 2018, 12:50:58 PM

to David Blaikie, llvm-dev, Clang Dev

On Tue, Mar 6, 2018 at 9:46 AM, David Blaikie <dbla...@gmail.com> wrote:

On Tue, Mar 6, 2018 at 9:28 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Tue, Mar 6, 2018 at 9:22 AM, David Blaikie <dbla...@gmail.com> wrote:

On Tue, Mar 6, 2018 at 8:39 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Mon, Mar 5, 2018 at 11:55 PM, Roman Popov <rip...@gmail.com> wrote:
I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.
1. Calling them incorrect is ... not right. As Andrew quoted on the gdb mailing list, this is what DWARF specifies should happen,

Might be helpful to point to/include any details cited here for the purpose of this conversation - a bit hard for the rest of us to follow along.

"
Reading http://wiki.dwarfstd.org/index.php?title=Best_Practices:
the DW_AT_name attribute should contain the name of the corresponding
program object as it appears in the source code, without any
qualifiers such as namespaces, containing classes, or modules (see
Section 2.15). A consumer can easily reconstruct the fully-qualified
name from the DIE hierarchy. In general, the value of DW_AT_name
should be such that a fully-qualified name constructed from the
DW_AT_name attributes of the object and its containing objects will
uniquely represent that object in a form natural to the source
language."

so they are correct by spec. If you believe the spec is wrong, file an issue on the DWARF web site and discuss it on the mailing list, and bring back the consensus of the committee as to what to do :)

The ambiguous names are probably incorrect - having two distinct types that have the same name's not really going to work out well for a consumer. (so having the distinct types foo<11u> and foo<11> in source both produce a DWARF type named "foo<11>" I'd say is a bug that ought to be fixed - as is any other case where the names become ambiguous, otherwise matching up types between TUs would become impossible, which would be not good)

I'm sure the spec needs to be updated, i'm just saying "it's not wrong by what the spec and best practices say to do right now".

Looks wrong to me. It doesn't "uniquely represent" the object nor is it natural to the source language (foo<11> gets you the signed one, you'd have to write foo<11u> or foo<(unsigned)11> to get the unsigned one - yet Clang's DWARF currently names them both foo<11>).

Yes, this is the gdb/gcc bug i cited, and there is probably a clang bug somewhere too.

2. The failure that was cited on the gdb mailing list only occurs on polymorphic classes. If you have it occurring on non-polymorphic classes, that seems like a very different problem, and probably related to the fact that GDB does not know how to assemble or parse C++ names properly in some cases. Otherwise, this would occur on literally every class you saw in GDB, and that's definitely not the case:)

Sounds like Roman's talking about other use cases apart from GDB.

Yes.

The only reason linkage names would fix that issue is because they provide an exact match to GDB's parsing failure.

Not sure I follow this - providing linkage names would provide a reliable mechanism to match the vtable symbol. There wouldn't need to be any parsing, or any failure of parsing involved.

But, yes, addresses would be potentially a better description rather than having to match names in the object's symbol table.

I'm saying the only reason it would fix non-polymorphic classes is if gdb is failing to parse names so that it can do die lookup properly.

GDB gives up in some cases and incorrectly says "lookup foo::bar::fred in the global symbol namespace" instead of "lookup fred inside class bar symbol namespace".

In those cases, the linkage name would fix it because it will appear in the global symbol namespace.
But it would also work if you just fixed the name parsing.

Can't say I'm following this part.. well, sort of following. But doesn't seem relevant to Roman's situation, which isn't about GDB.

He did in fact claim non-polymorphic gdb lookup would fail (it won't), then later talked about different use cases.

I'm just pointing out it will not.

I think the only problem being addressed for GDB is the polymorphic case. The ability to match non-polymorphic types (with what, I'm not sure - not vtables in any case) is motivated by Roman's other examples of IR, etc, not GDB's dynamic type discovery.

Sure.

In those cases, there are a host of other problems

Daniel Berlin via llvm-dev

Mar 6, 2018, 12:51:46 PM

to Roman Popov, llvm-dev, Clang Dev

On Tue, Mar 6, 2018 at 9:20 AM, Roman Popov <rip...@gmail.com> wrote:

I wonder if abi::__cxa_demangle guarantees unambiguous names?

No, it does not.

Interesting. Can you give an example of type where it fails?

I can't construct one out of thin air, but i believe someone cited one to you on the gdb mailing list. It's entirely possible for the human readable form of two symbols to be the same when the symbols are different.

I really just don't have the energy to copy the entire discussion on the other mailing list here.

More to the point, the ABI literally does not guarantee it, and different demanglers for the Itanium ABI (there are a bunch) do different things for human readable names.

You cite below gcc vs gcc, which is different versions of the same demangler. There are a bunch of Itanium C++ ABI implementations, including demanglers and compilers, and i'd strongly caution you to remember that all the world is not clang and GNU.

I'm currently working on a hardware construction library for C++ (similar to Chisel (which is written in Scala)). And since C++ has no standardized reflection, I use DWARF as a source of reflection metadata. And in the case of G++ 6.3, which seems to emit the same names as abi::__cxa_demangle, it has never failed so far in my case. And I have very diverse inputs.
In fact I was working on it for about a year, and I was thinking that this is how it is supposed to work. Only after I upgraded to g++ 7 did I find out that both modern g++ and clang do not emit unambiguous debuginfo.

This seems to be a different question than i thought you asked.

If you are asking "where will the demangled name between what abi::__cxa_demangle and what GCC outputs in the debug info differ", it's unlikely to differ if you use the same versions of both ;)

But it will in some cases. Some bugs, some not.

Mangled names are not a panacea. I think you also wildly underestimate the cost of demangling every symbol in a large binary, for example, which would be required for your suggestion, as well as the size of these symbols, etc. It's enough that people wrote a fast demangler, for example. That's just one issue.

As for the rest, you are taking a perspective that is pretty strongly focused on your use cases, and currently, DWARF is pretty focused on the other ones.

If you want to convince the committee/others that it should give up on the part of the best practices i cited, go for it.

But you started by claiming this was necessary/important to fix the GDB problems here, and it's simply not. In fact, it would not fix most of them without a serious change in the way these things operate, and at high cost.

Suggesting to change gdb, gcc, clang, to fit your non-debugger use case is taking a very big hammer and saying "it can also pound these nails".

While true, that doesn't mean you should.

I'd strongly suggest, if you have concerns about the ability of DWARF to handle your use cases without linkage names, that you go to the DWARF mailing list and start a discussion about it, rather than just proposing a solution.

In my experience, the people there have thought a lot about all of these use cases, and you may in fact find a solution that doesn't require doing anything at all.

Daniel Berlin via llvm-dev

Mar 6, 2018, 12:53:27 PM

to David Blaikie, llvm-dev, Clang Dev

Agreed.

Daniel Berlin via llvm-dev

Mar 6, 2018, 12:56:12 PM

to David Blaikie, llvm-dev, Clang Dev

On Tue, Mar 6, 2018 at 9:46 AM, David Blaikie <dbla...@gmail.com> wrote:

On Tue, Mar 6, 2018 at 9:28 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Tue, Mar 6, 2018 at 9:22 AM, David Blaikie <dbla...@gmail.com> wrote:

On Tue, Mar 6, 2018 at 8:39 AM Daniel Berlin <dbe...@dberlin.org> wrote:
On Mon, Mar 5, 2018 at 11:55 PM, Roman Popov <rip...@gmail.com> wrote:
I don't understand how an extra vtable ref DIE will help in the case of non-polymorphic classes. If you remove the virtual destructor from the example, a vtable won't be generated for the class, but DWARF will still have incorrect ambiguous names for types.
1. Calling them incorrect is ... not right. As Andrew quoted on the gdb mailing list, this is what DWARF specifies should happen,

Might be helpful to point to/include any details cited here for the purpose of this conversation - a bit hard for the rest of us to follow along.

"
Reading http://wiki.dwarfstd.org/index.php?title=Best_Practices:
the DW_AT_name attribute should contain the name of the corresponding
program object as it appears in the source code, without any
qualifiers such as namespaces, containing classes, or modules (see
Section 2.15). A consumer can easily reconstruct the fully-qualified
name from the DIE hierarchy. In general, the value of DW_AT_name
should be such that a fully-qualified name constructed from the
DW_AT_name attributes of the object and its containing objects will
uniquely represent that object in a form natural to the source
language."

so they are correct by spec. If you believe the spec is wrong, file an issue on the DWARF web site and discuss it on the mailing list, and bring back the consensus of the committee as to what to do :)

The ambiguous names are probably incorrect - having two distinct types that have the same name's not really going to work out well for a consumer. (so having the distinct types foo<11u> and foo<11> in source both produce a DWARF type named "foo<11>" I'd say is a bug that ought to be fixed - as is any other case where the names become ambiguous, otherwise matching up types between TUs would become impossible, which would be not good)

I'm sure the spec needs to be updated, I'm just saying "it's not wrong by what the spec and best practices say to do right now".

Looks wrong to me.

It doesn't "uniquely represent" the object nor is it natural to the source language (foo<11> gets you the signed one, you'd have to write foo<11u> or foo<(unsigned)11> to get the unsigned one - yet Clang's DWARF currently names them both foo<11>).
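To make the ambiguity concrete, here is a minimal sketch built on the example from the start of the thread. It only shows that the language and the RTTI treat `foo<11>` and `foo<11u>` as two separate types; it does not inspect DWARF. The helper name `rtti_names_differ` is made up for illustration, and the sketch assumes a C++17 compiler whose ABI gives distinct types distinct `type_info` names (the Itanium ABI used by GCC and Clang does).

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <typeinfo>

// Same shape as the example at the start of the thread (needs C++17 for `auto`).
template <auto x>
struct foo { virtual ~foo() {} };

// As far as the language is concerned these are two different types.
static_assert(!std::is_same<foo<11>, foo<11u>>::value,
              "foo<11> and foo<11u> are distinct types");

// Hypothetical helper: compares the implementation-defined RTTI names.
// Assumption: the ABI mangles the argument type into the name
// (e.g. 3fooILi11EE vs 3fooILj11EE under the Itanium ABI).
inline bool rtti_names_differ() {
    return std::strcmp(typeid(foo<11>).name(), typeid(foo<11u>).name()) != 0;
}

// Small self-check that can be called from any test driver.
inline void check_rtti_names_differ() {
    assert(rtti_names_differ());
}
```

So a consumer that sees only the DWARF name "foo<11>" for both has lost information that both the compiler and the RTTI still have.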

Great, so fix clang/llvm to make it true :)

That still doesn't involve adding linkage names to everything.

If we can't fix it, that's worth a discussion with the DWARF folks.

David Blaikie via llvm-dev

Mar 6, 2018, 12:59:51 PM

to Daniel Berlin, llvm-dev, Clang Dev

I think the only reason Roman's discussing ways that would work without linkage names is because you pretty firmly said that adding linkage names was a bad idea (for the original issue of dynamic class identification).

Which, sure, it's an idea with some issues - but the alternative (vtable DIEs with address/ref) isn't without complications too - not to dismiss it, but to suggest having some real conversation about the tradeoffs seems worthwhile.

& while the vtable DIE solution addresses the dynamic class identification case, it doesn't cover the other use cases Roman has in mind - other use cases that sound like they would be solved by having the linkage name of a type provided in the DWARF.

that you go to the DWARF mailing list and start a discussion about it, rather than just proposing a solution.

In my experience, the people there have thought a lot about all of these use cases, and you may in fact find a solution that doesn't require doing anything at all.

*nod* fair, might be worth it for the broader set of issues Roman seems to be dealing with (beyond the dynamic type identification issues that GDB demonstrates).

Roman Popov via llvm-dev

Mar 6, 2018, 1:17:54 PM

to David Blaikie, llvm-dev, Clang Dev

"
Reading http://wiki.dwarfstd.org/index.php?title=Best_Practices:
the DW_AT_name attribute should contain the name of the corresponding
program object as it appears in the source code, without any
qualifiers such as namespaces, containing classes, or modules (see
Section 2.15). A consumer can easily reconstruct the fully-qualified
name from the DIE hierarchy. In general, the value of DW_AT_name
should be such that a fully-qualified name constructed from the
DW_AT_name attributes of the object and its containing objects will
uniquely represent that object in a form natural to the source
language."

And continuing the quote from same webpage:

"For template instantiations, the DW_AT_name attribute should contain both the source language name of the object and the template parameters that distinguish one instantiation from another. The resulting string should be in the natural form for the language, and should have a canonical representation (i.e., different producers should generate the same representation). For C++, the string should match that produced by the target platform's canonical demangler; spaces should only be inserted where syntactically required by the compiler."

As I said, for about a year I thought that this was how it was supposed to work. Only after I upgraded the compiler did I find all those issues.
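The "canonical demangler" wording in the Best Practices text quoted above can be tried out with a rough sketch like the one below. It assumes a GCC or Clang toolchain on an Itanium-ABI platform, where `abi::__cxa_demangle` from `<cxxabi.h>` is available; the wrapper `demangle` and the check function are hypothetical names used only for illustration. Under that assumption, the demangled RTTI names of the two instantiations stay distinct, which is what a DW_AT_name following the quoted guidance would have to preserve.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <typeinfo>

// Same shape as the example at the start of the thread (needs C++17 for `auto`).
template <auto x>
struct foo { virtual ~foo() {} };

// Hypothetical wrapper around the platform demangler.
// Falls back to the input string if demangling fails.
inline std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get())
                                 : std::string(mangled);
}

// The demangled names of the signed and unsigned instantiations should
// differ, because the mangled names encode the argument type.
inline void check_names_are_distinct() {
    assert(demangle(typeid(foo<11>).name()) !=
           demangle(typeid(foo<11u>).name()));
}
```

Printing the two demangled strings next to the DW_AT_name values from a DWARF dump of the same binary is one way to see the mismatch described earlier in the thread.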

that you go to the DWARF mailing list and start a discussion about it, rather than just proposing a solution.
In my experience, the people there have thought a lot about all of these use cases, and you may in fact find a solution that doesn't require doing anything at all.

*nod* fair, might be worth it for the broader set of issues Roman seems to be dealing with (beyond the dynamic type identification issues that GDB demonstrates).

I can ping the DWARF mailing list about using DWARF as a reflection mechanism. DWARF, however, is language-agnostic, and this seems to be a C++-specific issue.
