[llvm-dev] yaml2obj support for COFF debug directories


Penzin, Petr via llvm-dev

Mar 3, 2020, 10:10:53 PM
to llvm...@lists.llvm.org, jh7...@my.bristol.ac.uk

Spoiler: the following only applies to Windows binary format handling.

The possibility of extending yaml2obj to support COFF debug directories recently came up during a code review (https://reviews.llvm.org/D70606#1873185). Currently, its COFF syntax allows for specifying section data but not debug directories; that's why llvm-readobj tests that depend on debug directory contents use pre-built executable images instead of yaml2obj.

It is possible to extend the tool, but first I would like to gather feedback on how useful this would be, especially on potential uses of this change. It looks like porting the llvm-readobj CodeView tests would depend on this, and D70606 introduces another possible use. But I am not sure how trivial the CodeView effort would be: would it be worth it, or is it easier to leave things as they are for now?

In case this is interesting, a basic YAML syntax for a COFF debug directory might look like this (the enum values represent the COFF debug types):

DebugDirectory:
  - Type: [ {type: str, enum: [...]}, {type: int} ]
  - DebugDirectoryData: {type: str}

This may have to be further specialized for sub-categories, specifically CodeView.
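To make the discussion concrete: whatever YAML syntax is chosen, yaml2obj would ultimately have to emit 28-byte IMAGE_DEBUG_DIRECTORY records as laid out in the PE/COFF specification. The Python sketch below packs one record and, like the proposed Type field, accepts either a symbolic debug type name or a raw integer. The helper names are hypothetical (not part of yaml2obj), and only a few of the debug type values are listed.

```python
import struct

# A few IMAGE_DEBUG_TYPE_* values from the PE/COFF specification (not the full list).
DEBUG_TYPES = {
    "IMAGE_DEBUG_TYPE_CODEVIEW": 2,
    "IMAGE_DEBUG_TYPE_FPO": 3,
    "IMAGE_DEBUG_TYPE_POGO": 13,
    "IMAGE_DEBUG_TYPE_REPRO": 16,
    "IMAGE_DEBUG_TYPE_EX_DLLCHARACTERISTICS": 20,
}

# IMAGE_DEBUG_DIRECTORY, little-endian, 28 bytes:
# Characteristics, TimeDateStamp (DWORD), MajorVersion, MinorVersion (WORD),
# Type, SizeOfData, AddressOfRawData, PointerToRawData (DWORD).
DEBUG_DIRECTORY_FORMAT = struct.Struct("<IIHHIIII")


def resolve_debug_type(value):
    """Hypothetical helper: accept a symbolic name or a raw integer for Type."""
    if isinstance(value, int):
        return value
    return DEBUG_TYPES[value]


def pack_debug_directory(type_, size, address, pointer,
                         characteristics=0, timestamp=0, major=0, minor=0):
    """Hypothetical helper: pack one IMAGE_DEBUG_DIRECTORY record."""
    return DEBUG_DIRECTORY_FORMAT.pack(
        characteristics, timestamp, major, minor,
        resolve_debug_type(type_), size, address, pointer)


record = pack_debug_directory("IMAGE_DEBUG_TYPE_CODEVIEW", 0x20, 0x3000, 0x1200)
print(len(record))  # 28
```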


Best,

Petr


Rui Ueyama via llvm-dev

Mar 4, 2020, 1:29:55 AM
to Penzin, Petr, llvm...@lists.llvm.org, jh7...@my.bristol.ac.uk
Hi Penzin,

From a practical standpoint, I think this is a matter of investment and reward. If we are going to use the feature only for writing a test for lld, I guess it might not be worth it, and we can live with a binary test file, though it's not ideal.

My feeling is that we will eventually have to implement the feature: Microsoft seems to add a new bit to DLLCharacteristics every few years, so we'll likely see more bits defined for ExtendedDLLCharacteristics in the future. For now, though, I don't see an immediate need to implement it, as only one bit is defined for ExtendedDLLCharacteristics.
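For scale, decoding the extended DLL characteristics currently means checking a single bit. This Python sketch assumes the IMAGE_DEBUG_TYPE_EX_DLLCHARACTERISTICS payload starts with a little-endian 32-bit flags word (an assumption, not something stated in this thread); the function name is hypothetical. Unknown bits are returned separately so that bits defined later are surfaced rather than silently dropped.

```python
import struct

# The CET compat bit discussed in this thread; later spec revisions may define more.
IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT = 0x0001
KNOWN_EX_FLAGS = {IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT: "CET_COMPAT"}


def decode_ex_dll_characteristics(payload):
    """Split the payload into known flag names and any remaining unknown bits.

    Assumes the payload starts with a little-endian 32-bit flags word.
    """
    (flags,) = struct.unpack_from("<I", payload, 0)
    names = [name for bit, name in KNOWN_EX_FLAGS.items() if flags & bit]
    known_mask = 0
    for bit in KNOWN_EX_FLAGS:
        known_mask |= bit
    return names, flags & ~known_mask


print(decode_ex_dll_characteristics(b"\x01\x00\x00\x00"))  # (['CET_COMPAT'], 0)
```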

James Henderson via llvm-dev

Mar 4, 2020, 4:25:39 AM
to Penzin, Petr, llvm...@lists.llvm.org
I'm not sure I know enough about COFF and debug directories to know how useful this feature will be, but I do have some thoughts on the syntax, based on my experience working with the ELF part of yaml2obj. From reading the spec you linked, I would think it might look something like the following:

DebugDirectory:
  - Characteristics: 1234 # Optional, defaults to 0. Contains value to write in Characteristics field.
    TimeDateStamp: 4321 # Optional, defaults to 0(?).
    MajorVersion: 1 # Optional, defaults to 0.
    MinorVersion: 2 # Optional, defaults to 0.
    Data: # Required
    - Type: 12 # Required, contains the value of the Type field, can be written as raw number or enum value (see how ELF works for various fields).
      Size: 1111 # Optional, derives size from data field, if not specified.
      Address: 2222 # Optional, defaults to 0(?)
      Pointer: 3333 # Optional, defaults to wherever yaml2obj chooses to place the data.
      RawData: '12345678abcdef00' # Optional byte string (see 'Content' fields for ELF sections). Defaults to empty if not specified.
      ## The following fields are all defined based on the Type value (for unrecognised values, by default only RawData is allowed). Cannot be mixed with RawData field. Only those actually required need to be implemented up front.
      ExtendedDLLCharacteristics: # Used for IMAGE_DEBUG_TYPE_EX_DLLCHARACTERISTICS
        - ... # Fields related to DLL Characteristics
      FPOInfo: # Used for IMAGE_DEBUG_TYPE_FPO
        - ...
        - ... # FPO Information array
      ...

Does this make sense? It's somewhat similar to how Sections are defined in ELF yaml2obj.
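One way to read the defaults in this proposal is as a layout pass. The following hypothetical Python sketch (not yaml2obj code) takes entries using the proposed keys, flattened so that each record carries its own header fields, writes the record table at a given file offset, and places the payloads right after it. Size falls back to the RawData length, Address to 0 and Pointer to wherever the payload landed, mirroring the tentative defaults above. Type must already be an integer here.

```python
import struct

DEBUG_DIRECTORY_FORMAT = struct.Struct("<IIHHIIII")  # 28-byte IMAGE_DEBUG_DIRECTORY


def lay_out_debug_directory(entries, table_offset):
    """Hypothetical yaml2obj-style layout: record table first, payloads after it.

    `entries` is a list of dicts using the proposed YAML keys. Omitted keys use
    the proposal's defaults: Size from RawData, Address 0, and Pointer set to
    the file offset where the payload was placed (0 if there is no payload).
    """
    table_size = DEBUG_DIRECTORY_FORMAT.size * len(entries)
    table = bytearray()
    payloads = bytearray()
    for e in entries:
        raw = bytes.fromhex(e.get("RawData", ""))
        placed_at = table_offset + table_size + len(payloads)
        table += DEBUG_DIRECTORY_FORMAT.pack(
            e.get("Characteristics", 0), e.get("TimeDateStamp", 0),
            e.get("MajorVersion", 0), e.get("MinorVersion", 0),
            e["Type"],  # required
            e.get("Size", len(raw)),
            e.get("Address", 0),
            e.get("Pointer", placed_at if raw else 0))
        payloads += raw
    return bytes(table + payloads)


blob = lay_out_debug_directory(
    [{"Type": 20, "RawData": "01000000"}, {"Type": 16}], table_offset=0x400)
print(len(blob))  # 60
```

Explicit Size or Pointer values are written as given, which is what would let a test deliberately produce a malformed directory.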

James

Pavel Labath via llvm-dev

Mar 4, 2020, 8:37:12 AM
to Penzin, Petr, llvm...@lists.llvm.org
I also don't know much about COFF, but I am always interested in using
yaml2obj to generate "interesting" test cases for lldb. So, if you're
looking for a use case, this sounds like it could be very useful there.

cheers,
pavel

> Penzin, Petr <petr....@intel.com> wrote:
>
> Spoiler: the following only applies to Windows binary format
> handling.
>
> Potential for extending yaml2obj to support COFF debug directories
> <https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#debug-directory-image-only>
> recently came up during a code review
> <https://reviews.llvm.org/D70606#1873185>. Currently, its COFF
> syntax <http://llvm.org/docs/yaml2obj.html#coff-syntax> allows for
> specifying section data, but not debug directories, that's why
> llvm-readobj tests which depend on debug directory contents use
> pre-built executable images instead of yaml2obj.
>
> It is possible to extend the tool, but first I would be interested
> in gathering feedback on usability of this, especially on potential
> uses of this change. It looks like porting llvm-readobj tests for
> codeview would depend on this and also D70606
> <https://reviews.llvm.org/D70606> is introducing another possible
> use. But I am not sure how trivial would the codeview effort, would
> it be worth it or is it easier to leave things as they are for now?
>
> In case this is interesting, base Yaml syntax for COFF debug
> directory may look like this (enum values representing COFF Debug
> Types
> <https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#debug-type>):
>
> DebugDirectory:
>   - Type: [ {type: str, enum: [...]}, {type: int} ]
>   - DebugDirectoryData: {type: str}
>
> This may have to be further specialized for sub-categories,
> specifically codeview.
>
> Best,
>
> Petr
>
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>


Reid Kleckner via llvm-dev

Mar 4, 2020, 3:48:39 PM
to Penzin, Petr, llvm...@lists.llvm.org, jh7...@my.bristol.ac.uk
I think it seems like an oversight, and improvements in this area would be welcome.

I think most of the effort in COFF <-> YAML translation has been for representing object files, and debug directories are a feature of fully linked PE images. With that in mind, it's not too surprising that the feature is missing.


Martin Storsjö via llvm-dev

Mar 5, 2020, 3:11:25 AM
to Reid Kleckner, llvm...@lists.llvm.org, jh7...@my.bristol.ac.uk
On Wed, 4 Mar 2020, Reid Kleckner via llvm-dev wrote:

> I think it seems like an oversight, and improvements in this area would be
> welcome.
> I think most of the effort in COFF <-> YAML translation has been for
> representing object files, and debug directories are a feature of fully
> linked PE images. With that in mind, it's not too surprising that the
> feature is missing.

In general, it should be possible to roundtrip linked PE images via YAML
just fine: their contents would just be part of the opaque section
contents blob. That is hard to inspect and tweak by hand, but so are lots
of other things that are referenced via data directories (like base
relocation tables) and stored in the plain section contents.

But debug directories have one property that would break this: they
have a PointerToRawData field, which should contain the raw byte offset,
within the linked PE image, of their content data. As a roundtrip via YAML
rewrites the file structure (and the output layout of yaml2obj isn't
supposed to be fixed), the value of this field would have to be
updated. As far as I know, yaml2obj doesn't do this at the moment.
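The fix-up described here amounts to mapping each record's AddressOfRawData (an RVA) back to a file offset once the final section layout is known. A minimal Python sketch, assuming the payload sits inside a mapped section; the `Section` class and the function names are hypothetical, and this is not the actual llvm-objcopy code. The records are treated as one contiguous array, with the payloads stored elsewhere.

```python
import struct
from dataclasses import dataclass

DEBUG_DIRECTORY_FORMAT = struct.Struct("<IIHHIIII")  # 28-byte IMAGE_DEBUG_DIRECTORY
ADDRESS_OF_RAW_DATA_FIELD = 6    # index into the unpacked tuple
POINTER_TO_RAW_DATA_OFFSET = 24  # byte offset of the last DWORD in a record


@dataclass
class Section:
    virtual_address: int
    virtual_size: int
    pointer_to_raw_data: int  # file offset chosen by the writer


def rva_to_file_offset(sections, rva):
    """Map an RVA to a file offset via the section containing it, or None."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.pointer_to_raw_data
    return None


def fix_pointer_to_raw_data(image, table_offset, entry_count, sections):
    """Recompute PointerToRawData for each record after the layout changed.

    Records with AddressOfRawData == 0, or an RVA outside every section,
    are left untouched.
    """
    for i in range(entry_count):
        record = table_offset + i * DEBUG_DIRECTORY_FORMAT.size
        fields = DEBUG_DIRECTORY_FORMAT.unpack_from(image, record)
        rva = fields[ADDRESS_OF_RAW_DATA_FIELD]
        if rva == 0:
            continue
        offset = rva_to_file_offset(sections, rva)
        if offset is not None:
            struct.pack_into("<I", image, record + POINTER_TO_RAW_DATA_OFFSET, offset)


image = bytearray(0x100)
DEBUG_DIRECTORY_FORMAT.pack_into(image, 0x10, 0, 0, 0, 0, 2, 8, 0x2010, 0x999)
DEBUG_DIRECTORY_FORMAT.pack_into(image, 0x10 + 28, 0, 0, 0, 0, 20, 4, 0, 0x1234)
fix_pointer_to_raw_data(image, 0x10, 2, [Section(0x2000, 0x100, 0x80)])
```

In this example the first record's pointer becomes 0x90 (0x2010 - 0x2000 + 0x80), while the unmapped second record keeps its original pointer.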

llvm-objcopy's COFF backend does try to do it
(COFFWriter::patchDebugDirectory in
llvm/tools/llvm-objcopy/COFF/Writer.cpp), but rereading the code now,
I'm pretty sure I made some mistakes there. (I incorrectly assumed
that the raw data is interleaved after each debug directory header.) With
your lld patch for the CET compat flag, it should be easy to generate a
test case for that, with more than one debug directory.

One general design question for obj2yaml is this: when the
debug directories are synthesized, should they be appended onto one of the
existing sections (with normal hex-dumped contents) or created as an
entirely new section? Synthesizing them separately works fine for cases
where a file is generated entirely from scratch with YAML, but is tricky
for obj2yaml, where the original debug directories pretty much need to be
left in place. If they were synthesized separately there, each time a PE
image is roundtripped via YAML, it would generate yet another set of debug
directories, orphaning the old ones.

Finally, reading the spec, it also seems that the payload of a debug
directory doesn't even need to be in the mappable parts of sections, but
could be in unmapped areas of the PE image file (by having
AddressOfRawData set to zero, so it can only be found via
PointerToRawData). This doesn't seem like something that e.g.
llvm-readobj's --coff-debug-directory currently supports, though (and
llvm-objcopy expects the payload to be moved along as part of sections'
contents).
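A reader that does not depend on the payload being inside a section could fetch every payload through PointerToRawData alone. The hypothetical Python sketch below does that and reports which records are unmapped. It assumes `table_offset` has already been translated from the debug data directory's RVA to a file offset, and that `table_size` is that data directory's Size.

```python
import struct

DEBUG_DIRECTORY_FORMAT = struct.Struct("<IIHHIIII")  # 28-byte IMAGE_DEBUG_DIRECTORY


def read_debug_payloads(image, table_offset, table_size):
    """Yield (type, is_mapped, payload) for every debug directory record.

    Payloads are always fetched through PointerToRawData, so records with
    AddressOfRawData == 0 (payload outside any mapped section) are read the
    same way as mapped ones.
    """
    for i in range(table_size // DEBUG_DIRECTORY_FORMAT.size):
        (_, _, _, _, debug_type, size, rva, pointer) = DEBUG_DIRECTORY_FORMAT.unpack_from(
            image, table_offset + i * DEBUG_DIRECTORY_FORMAT.size)
        if pointer + size > len(image):
            raise ValueError(f"record {i}: payload runs past end of file")
        yield debug_type, rva != 0, bytes(image[pointer:pointer + size])


image = bytearray(0x60)
DEBUG_DIRECTORY_FORMAT.pack_into(image, 0, 0, 0, 0, 0, 2, 2, 0x1000, 0x40)
DEBUG_DIRECTORY_FORMAT.pack_into(image, 28, 0, 0, 0, 0, 20, 3, 0, 0x50)
image[0x40:0x42] = b"ab"
image[0x50:0x53] = b"xyz"
print(list(read_debug_payloads(image, 0, 56)))  # [(2, True, b'ab'), (20, False, b'xyz')]
```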

I'll make a note to try to fix llvm-objcopy's assumptions about the
location of the payload sometime in the future.


// Martin
