Development of 32- and 64-bit Portable Executable File Format Signatures and results


ross-spencer

Feb 16, 2016, 3:18:10 AM
to droid-list
I've been working on a method of distinguishing between 32-bit and 64-bit Windows executable files based on the documentation on the Microsoft Developer Network (MSDN): https://msdn.microsoft.com/en-us/library/ms809762.aspx 

It describes the Portable Executable header as: 

DWORD Signature; 
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;

We know the Signature to be the byte sequence 50450000, i.e. "PE" followed by two null bytes.
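As a minimal Python sketch of where that Signature sits (an addition to the post; the function name `find_pe_header` is hypothetical): the DOS header that begins with 4D5A ("MZ") stores the file offset of the Signature in the 4-byte little-endian field e_lfanew at offset 0x3C. This is how the Windows loader locates the PE header, rather than by searching for it.

```python
import struct

PE_SIGNATURE = b"PE\x00\x00"  # the bytes 50 45 00 00

def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE Signature, or -1 if it is absent."""
    # The DOS header is 64 bytes long and starts with "MZ" (4D 5A).
    if len(data) < 0x40 or data[:2] != b"MZ":
        return -1
    # e_lfanew: 4-byte little-endian offset of the PE Signature, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # A slice past the end of the file is simply shorter, so the comparison fails.
    if data[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        return -1
    return e_lfanew
```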

The documentation for the IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER is here:


From these we can read off the following field sizes (in bytes) and convert them to offsets: 

File Header (size in bytes, type, field):

2   WORD  Machine;
2   WORD  NumberOfSections;
4   DWORD TimeDateStamp;
4   DWORD PointerToSymbolTable;
4   DWORD NumberOfSymbols;
2   WORD  SizeOfOptionalHeader;
2   WORD  Characteristics;
20  (total)
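A small Python sketch (hypothetical helper names) that unpacks these seven fields with the standard `struct` module. The format string `<HHIIIHH` is my own transcription of the table above, and `calcsize` confirms the 20-byte total that the `{20}` in the signatures relies on.

```python
import struct

# IMAGE_FILE_HEADER: WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD
# "<" means little-endian with no alignment padding, matching the on-disk layout.
FILE_HEADER_FORMAT = "<HHIIIHH"
FILE_HEADER_FIELDS = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)

# The fixed size is what allows a fixed {20} gap after 50450000.
assert struct.calcsize(FILE_HEADER_FORMAT) == 20

def parse_file_header(data: bytes, pe_offset: int) -> dict:
    """Unpack the IMAGE_FILE_HEADER that follows the 4-byte PE Signature."""
    values = struct.unpack_from(FILE_HEADER_FORMAT, data, pe_offset + 4)
    return dict(zip(FILE_HEADER_FIELDS, values))

def optional_header_offset(pe_offset: int) -> int:
    """The optional header starts after the Signature (4) and the file header (20)."""
    return pe_offset + 4 + struct.calcsize(FILE_HEADER_FORMAT)
```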

Optional Header (first column: size in bytes; second column: bytes counted in the gap between Magic and Subsystem, "-" where not counted):

2   -   WORD                 Magic;
1   1   BYTE                 MajorLinkerVersion;
1   1   BYTE                 MinorLinkerVersion;
4   4   DWORD                SizeOfCode;
4   4   DWORD                SizeOfInitializedData;
4   4   DWORD                SizeOfUninitializedData;
4   4   DWORD                AddressOfEntryPoint;
4   4   DWORD                BaseOfCode;
4   4   DWORD                BaseOfData;
4   4   DWORD                ImageBase;
4   4   DWORD                SectionAlignment;
4   4   DWORD                FileAlignment;
2   2   WORD                 MajorOperatingSystemVersion;
2   2   WORD                 MinorOperatingSystemVersion;
2   2   WORD                 MajorImageVersion;
2   2   WORD                 MinorImageVersion;
2   2   WORD                 MajorSubsystemVersion;
2   2   WORD                 MinorSubsystemVersion;
4   4   DWORD                Win32VersionValue;
4   4   DWORD                SizeOfImage;
4   4   DWORD                SizeOfHeaders;
4   4   DWORD                CheckSum;
2   -   WORD                 Subsystem;
70  66  (totals)
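The 66-byte gap also holds for 64-bit (PE32+) files, which is why both signatures can share `{66}`. In IMAGE_OPTIONAL_HEADER64, BaseOfData is absent and ImageBase is an 8-byte ULONGLONG, so the 4 bytes removed and the 4 bytes added cancel out. A minimal Python sketch checking this with `struct.calcsize`; the format strings are my own transcription of the MSDN structures, not taken from the original signature work.

```python
import struct

# Fields from Magic up to (but not including) Subsystem, little-endian, no padding.
# PE32: BaseOfData is present and ImageBase is a 4-byte DWORD (9 DWORDs in a row).
PE32_PREFIX = "<H" + "BB" + "I" * 9 + "H" * 6 + "I" * 4
# PE32+: BaseOfData is absent and ImageBase is an 8-byte ULONGLONG ("Q").
PE32PLUS_PREFIX = "<H" + "BB" + "I" * 5 + "Q" + "I" * 2 + "H" * 6 + "I" * 4

MAGIC_SIZE = 2  # Magic is a WORD

def subsystem_offset(fmt: str) -> int:
    """Offset of Subsystem from the start of the optional header."""
    return struct.calcsize(fmt)

for name, fmt in (("PE32", PE32_PREFIX), ("PE32+", PE32PLUS_PREFIX)):
    # Second value: gap between the end of Magic and the start of Subsystem.
    print(name, subsystem_offset(fmt), subsystem_offset(fmt) - MAGIC_SIZE)
```

Both layouts place Subsystem 68 bytes into the optional header, leaving the same 66-byte gap after Magic.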


My research, and the results of my trial signatures below, show the FILE_HEADER to always be 20 bytes, so we can consistently find the OPTIONAL_HEADER immediately after it. 

To differentiate between 32-bit and 64-bit files I have used the Magic field of the OPTIONAL_HEADER (value: meaning):

  • IMAGE_NT_OPTIONAL_HDR_MAGIC: The file is an executable image. This value is defined as IMAGE_NT_OPTIONAL_HDR32_MAGIC in a 32-bit application and as IMAGE_NT_OPTIONAL_HDR64_MAGIC in a 64-bit application.
  • IMAGE_NT_OPTIONAL_HDR32_MAGIC (0x10b): The file is an executable image.
  • IMAGE_NT_OPTIONAL_HDR64_MAGIC (0x20b): The file is an executable image.
  • IMAGE_ROM_OPTIONAL_HDR_MAGIC (0x107): The file is a ROM image.
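In the file, Magic is stored as a little-endian WORD, so 0x10b appears as the bytes 0B 01 and 0x20b as 0B 02; these are the byte values used in the signatures. A short Python sketch of that mapping (the `classify_magic` helper is hypothetical):

```python
# Magic values from the table above, keyed by their numeric value.
MAGIC_NAMES = {
    0x10B: "IMAGE_NT_OPTIONAL_HDR32_MAGIC (32-bit)",
    0x20B: "IMAGE_NT_OPTIONAL_HDR64_MAGIC (64-bit)",
    0x107: "IMAGE_ROM_OPTIONAL_HDR_MAGIC (ROM image)",
}

def classify_magic(raw: bytes) -> str:
    """Name the optional-header Magic given its two bytes as read from the file."""
    value = int.from_bytes(raw[:2], "little")  # WORDs are little-endian on disk
    return MAGIC_NAMES.get(value, f"unknown (0x{value:x})")

# Show the on-disk byte order of each value next to its name.
for value in MAGIC_NAMES:
    on_disk = value.to_bytes(2, "little")
    print(on_disk.hex(), classify_magic(on_disk))
```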


I was tempted to end the signature at the Magic field, but wanted to iron out potential multiple identifications, so I extended it to the Subsystem field, a further 66 bytes on.

This results in the two signatures: 

  • 32-bit: 4D5A{126-128500}50450000{20}0B01{66}[0000:1000] 
  • 64-bit: 4D5A{126-128500}50450000{20}0B02{66}[0000:1000] 
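To reason about what these two signatures accept, here is a rough Python approximation of a BOF-anchored match. It is a sketch under assumptions: it treats `{126-128500}` as "try every position in that window" and `[0000:1000]` as a byte-wise range comparison, which is how I understand DROID to behave, but it is not DROID's code. The names `matches_signature` and `identify` are hypothetical.

```python
MZ = b"MZ"              # 4D5A
PE_SIG = b"PE\x00\x00"  # 50450000
MAGIC32 = b"\x0b\x01"   # 0B01
MAGIC64 = b"\x0b\x02"   # 0B02

def matches_signature(data: bytes, magic: bytes,
                      min_gap: int = 126, max_gap: int = 128500) -> bool:
    """Approximate 4D5A{min-max}50450000{20}<magic>{66}[0000:1000] at BOF."""
    if not data.startswith(MZ):
        return False
    first = len(MZ) + min_gap  # earliest allowed start of PE_SIG
    last = len(MZ) + max_gap   # latest allowed start of PE_SIG
    pos = data.find(PE_SIG, first, last + len(PE_SIG))
    while pos != -1:
        magic_at = pos + len(PE_SIG) + 20  # skip the 20-byte IMAGE_FILE_HEADER
        subsystem_at = magic_at + 2 + 66   # skip Magic and the 66-byte gap
        subsystem = data[subsystem_at:subsystem_at + 2]
        if (data[magic_at:magic_at + 2] == magic
                and len(subsystem) == 2
                and b"\x00\x00" <= subsystem <= b"\x10\x00"):
            return True
        # A variable offset may be satisfied at any position, so try the next one.
        pos = data.find(PE_SIG, pos + 1, last + len(PE_SIG))
    return False

def identify(data: bytes) -> list:
    """Return every matching label; more than one would be a multiple ID."""
    labels = []
    if matches_signature(data, MAGIC32):
        labels.append("32-bit")
    if matches_signature(data, MAGIC64):
        labels.append("64-bit")
    return labels
```

Because the two signatures differ only in the Magic bytes at a fixed position relative to each PE candidate, a single candidate cannot satisfy both.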

I haven't created a ROM image signature as I have no way to test one. 

Subsystem is a WORD, and the documentation defines values between 0 and 16 (0x00 to 0x10). I opted for PRONOM's range syntax, [0000:1000], to extend the signature purely to strengthen the identification, not to provide additional characterization. 
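Because Subsystem is stored little-endian, the values 0 to 16 occupy the bytes 00 00 through 10 00. Assuming DROID compares a range such as [0000:1000] byte by byte, as an unsigned big-endian value, the following Python sketch (hypothetical helper `in_pronom_range`) shows that every value from 0 to 16 falls inside the range, while some undefined byte pairs such as 01 05 also do. That looseness seems acceptable given the range is only there to strengthen the match.

```python
def in_pronom_range(two_bytes: bytes, low: bytes = b"\x00\x00",
                    high: bytes = b"\x10\x00") -> bool:
    """Byte-wise comparison, assumed to mirror a PRONOM range like [0000:1000]."""
    return len(two_bytes) == 2 and low <= two_bytes <= high

# Every value from 0 to 16, stored little-endian (e.g. 3 -> 03 00), is accepted.
for value in range(17):
    assert in_pronom_range(value.to_bytes(2, "little"))

print(in_pronom_range((17).to_bytes(2, "little")))  # 11 00 lies above 10 00
print(in_pronom_range(b"\x01\x05"))                 # undefined, but inside the range
```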

Attached is a sample signature file. There are a number of different ways to code the signature in PRONOM, including multiple byte sequences or a single BOF sequence. 

These signatures also strengthen the original PE signature by removing its wildcard: 

4D5A*50450000

The following two charts show analysis of the location of the second byte sequence in portable executable files based on 1690 *.exe files (no filename duplicates):



The smallest offset is 126 bytes and the largest shown in the chart is 1174 bytes. Importantly, there is a single outlier, not shown in the chart, at 128342 bytes; the signature's upper bound of 128500 accommodates it. There seems to be no correlation between file size (bytes) and offset location.  
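The figures above came from the bindist tool linked in this post. As a rough illustration only (not the bindist implementation; names are hypothetical), this is one way to measure the gap between 4D5A and the first 50450000 across a set of files. For orientation, a PE header at file offset 0x80 (128) leaves a gap of 126 bytes after the two MZ bytes, matching the smallest value observed. Searching for the first occurrence mirrors a byte-sequence search; reading e_lfanew would give the header offset directly.

```python
import sys

PE_SIG = b"PE\x00\x00"

def pe_gap(data: bytes) -> int:
    """Bytes between the end of "MZ" and the first 50450000, or -1 if absent."""
    if not data.startswith(b"MZ"):
        return -1
    pos = data.find(PE_SIG, 2)
    return pos - 2 if pos != -1 else -1

def gap_summary(paths):
    """Return (smallest, largest) gap over files that contain the PE signature."""
    gaps = []
    for path in paths:
        with open(path, "rb") as handle:
            gap = pe_gap(handle.read())
        if gap >= 0:
            gaps.append(gap)
    return (min(gaps), max(gaps)) if gaps else None

if __name__ == "__main__":
    # Usage: python pe_gaps.py *.exe
    print(gap_summary(sys.argv[1:]))
```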


If this signature is not accepted, perhaps this analysis could still be used to remove the (*) wildcard from the existing signature, if that is considered beneficial. 


For reference, I created the tool used to produce the analysis values: https://github.com/ross-spencer/bindist/releases/tag/v1.0.0-beta 


I have reported an issue that seems to exist in DROID 6.1.5, which appears to return multiple identification results for a small percentage of files: https://github.com/digital-preservation/droid/issues/85 


The results of the signature, including these multiple IDs, are: 


  • 15 multiple-ids
  • 0 zero-id
  • 933 64-bit portable exes
  • 742 32-bit portable exes
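As a quick arithmetic check, the four counts account for the whole corpus: 15 + 0 + 933 + 742 = 1690. The Python sketch below shows how such a tally could be produced from per-file results; its input shape (filename and list-of-labels pairs) is hypothetical and is not DROID's export format.

```python
from collections import Counter

def tally(results):
    """Count outcomes from (filename, [format labels]) pairs."""
    counts = Counter()
    for _name, labels in results:
        if not labels:
            counts["zero-id"] += 1
        elif len(labels) > 1:
            counts["multiple-ids"] += 1
        else:
            counts[labels[0]] += 1
    return counts

# The reported figures sum to the 1690 files analysed.
reported = {"multiple-ids": 15, "zero-id": 0, "64-bit": 933, "32-bit": 742}
assert sum(reported.values()) == 1690
```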


As noted in the issue above, some of these 15 results seem to be caused by an issue in DROID's processing. Of the 15 I have examined, none appears to be caused by potential overlap within the {126-128500} range specified in the signature. 


The questions for TNA and the list then are:

  • Do we differentiate between 32-bit and 64-bit like we do with Linux ELF files? 
  • Do we use any of the other information available to us to create more granular signatures than these still? 
Comments and others' testing results are appreciated. 

Cheers,

Ross


portable-exe-32-64-bit-v1.0-signature-file-subsy.xml