I've been working on a method of distinguishing between 32-bit and 64-bit Windows executable files, based on the documentation on MSDN:
https://msdn.microsoft.com/en-us/library/ms809762.aspx
It describes the Portable Executable header as:
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;
We know the Signature to be the byte sequence 50 45 00 00 (ASCII "PE" followed by two null bytes).
The documentation for the IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER is here:
From these we can read off the size of each field and convert the sizes to offsets:
File Header:
Size (bytes) | Field |
---|---|
2 | WORD Machine; |
2 | WORD NumberOfSections; |
4 | DWORD TimeDateStamp; |
4 | DWORD PointerToSymbolTable; |
4 | DWORD NumberOfSymbols; |
2 | WORD SizeOfOptionalHeader; |
2 | WORD Characteristics; |
20 | (total) |
Optional Header (the second column counts only the bytes skipped between Magic and Subsystem; those two fields are matched directly, so they are excluded):
Size (bytes) | Skipped (bytes) | Field |
---|---|---|
2 | - | WORD Magic; |
1 | 1 | BYTE MajorLinkerVersion; |
1 | 1 | BYTE MinorLinkerVersion; |
4 | 4 | DWORD SizeOfCode; |
4 | 4 | DWORD SizeOfInitializedData; |
4 | 4 | DWORD SizeOfUninitializedData; |
4 | 4 | DWORD AddressOfEntryPoint; |
4 | 4 | DWORD BaseOfCode; |
4 | 4 | DWORD BaseOfData; |
4 | 4 | DWORD ImageBase; |
4 | 4 | DWORD SectionAlignment; |
4 | 4 | DWORD FileAlignment; |
2 | 2 | WORD MajorOperatingSystemVersion; |
2 | 2 | WORD MinorOperatingSystemVersion; |
2 | 2 | WORD MajorImageVersion; |
2 | 2 | WORD MinorImageVersion; |
2 | 2 | WORD MajorSubsystemVersion; |
2 | 2 | WORD MinorSubsystemVersion; |
4 | 4 | DWORD Win32VersionValue; |
4 | 4 | DWORD SizeOfImage; |
4 | 4 | DWORD SizeOfHeaders; |
4 | 4 | DWORD CheckSum; |
2 | - | WORD Subsystem; |
70 | 66 | (total) |
My research, and the results of my trial signatures below, show the IMAGE_FILE_HEADER to always be 20 bytes, so we can consistently find the IMAGE_OPTIONAL_HEADER immediately after it.
To differentiate between 32-bit and 64-bit I have used the Magic field of the optional header:
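As a quick sanity check on the arithmetic, the field sizes can be summed with Python's `struct` module. This is only a sketch: the format strings are my own transcription of the field lists, and the PE32+ layout (no BaseOfData, 8-byte ImageBase) is taken from the 64-bit optional header definition rather than from the table in this post. Under those assumptions the gap between Magic and Subsystem comes out the same for both layouts, which is why one `{66}` skip can serve both signatures.

```python
import struct

# IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
# "<" means little-endian with no padding between fields.
FILE_HEADER_FORMAT = "<HHIIIHH"

# 32-bit optional header, from Magic up to and including Subsystem.
PE32_FORMAT = "<HBB" + "I" * 9 + "H" * 6 + "I" * 4 + "H"

# 64-bit (PE32+) optional header over the same span (assumption: BaseOfData
# is absent and ImageBase is an 8-byte value, hence the single "Q").
PE32_PLUS_FORMAT = "<HBB" + "I" * 5 + "Q" + "I" * 2 + "H" * 6 + "I" * 4 + "H"

file_header_size = struct.calcsize(FILE_HEADER_FORMAT)
pe32_span = struct.calcsize(PE32_FORMAT)
pe32_plus_span = struct.calcsize(PE32_PLUS_FORMAT)

# Magic and Subsystem (2 bytes each) are matched, not skipped.
gap = pe32_span - 2 - 2

print(file_header_size, pe32_span, pe32_plus_span, gap)
```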
Value | Meaning |
---|---|
IMAGE_NT_OPTIONAL_HDR_MAGIC | The file is an executable image. This value is defined as IMAGE_NT_OPTIONAL_HDR32_MAGIC in a 32-bit application and as IMAGE_NT_OPTIONAL_HDR64_MAGIC in a 64-bit application. |
IMAGE_NT_OPTIONAL_HDR32_MAGIC (0x10b) | The file is an executable image. |
IMAGE_NT_OPTIONAL_HDR64_MAGIC (0x20b) | The file is an executable image. |
IMAGE_ROM_OPTIONAL_HDR_MAGIC (0x107) | The file is a ROM image. |
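For anyone wanting to cross-check identification results outside DROID, here is a minimal Python sketch that reads the Magic value directly. It assumes the usual DOS header convention that the 4-byte little-endian value at offset 0x3C points at the PE signature; a PRONOM signature cannot follow that pointer, so this is a verification aid rather than a translation of the signature. The function name `classify_pe` and the labels are my own.

```python
import struct
from typing import Optional

# Optional header Magic values from the table above.
MAGIC_NAMES = {
    0x10B: "PE32 (32-bit)",
    0x20B: "PE32+ (64-bit)",
    0x107: "ROM image",
}

def classify_pe(data: bytes) -> Optional[str]:
    """Return a label for the optional header Magic, or None if not a PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # Assumption: offset of the PE signature is stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Magic follows the 4-byte signature and the 20-byte file header.
    magic_offset = pe_offset + 4 + 20
    if len(data) < magic_offset + 2:
        return None
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    (magic,) = struct.unpack_from("<H", data, magic_offset)
    return MAGIC_NAMES.get(magic)
```

Calling `classify_pe(open(path, "rb").read(4096))` should be enough for typical files, though an outlier with a very large offset would need more of the file read.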
I was tempted to leave the signature here, but wanted to iron out potential multiple identifications, and so extended the signature to the Subsystem field, a further 66 bytes on.
This results in the two signatures:
- 32-bit: 4D5A{126-128500}50450000{20}0B01{66}[0000:1000]
- 64-bit: 4D5A{126-128500}50450000{20}0B02{66}[0000:1000]
I haven't created a ROM image signature as I have no means of testing one.
Subsystem is a WORD, and the values in the documentation range from 0 to 16 (0x00 to 0x10). I opted for range syntax in the extended signature purely to strengthen the identification, not to provide additional characterization.
Attached is a sample signature file. There are a number of different ways to code the signature in PRONOM, including multiple byte sequences or a single BOF sequence.
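To experiment with the two signatures outside DROID, they can be approximated as Python regular expressions over bytes. This is a rough sketch under a few assumptions of mine: the sequence is anchored at the beginning of the file, and `[0000:1000]` is read as a little-endian Subsystem value between 0 and 16 (low byte 0x00 to 0x10, high byte 0x00), which may not be exactly how DROID evaluates a two-byte range. The names `build_pattern` and `identify` are hypothetical.

```python
import re
from typing import Optional

def build_pattern(magic_regex: bytes) -> "re.Pattern[bytes]":
    """Approximate 4D5A{126-128500}50450000{20}<magic>{66}[0000:1000]."""
    return re.compile(
        rb"MZ"                 # 4D5A
        rb".{126,128500}?"     # {126-128500}, shortest gap tried first
        rb"PE\x00\x00"         # 50450000
        rb".{20}"              # IMAGE_FILE_HEADER
        + magic_regex +        # 0B01 or 0B02
        rb".{66}"              # fields between Magic and Subsystem
        rb"[\x00-\x10]\x00",   # assumed reading of [0000:1000]
        re.DOTALL,             # let "." match every byte, including newlines
    )

PE32_PATTERN = build_pattern(rb"\x0B\x01")
PE64_PATTERN = build_pattern(rb"\x0B\x02")

def identify(data: bytes) -> Optional[str]:
    """Return '32-bit', '64-bit', or None; match() anchors at offset 0."""
    if PE32_PATTERN.match(data):
        return "32-bit"
    if PE64_PATTERN.match(data):
        return "64-bit"
    return None
```

A regular expression backtracks where DROID does not, so this is only suitable for spot checks on individual files, not bulk identification.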
A strengthening of the original PE signature is the removal of the wildcard:
4D5A*50450000
The following two charts show analysis of the location of the second byte sequence in portable executable files based on 1690 *.exe files (no filename duplicates):
The smallest offset is 126 bytes and the largest shown in the charts is 1174 bytes. Importantly, there is a single outlier of 128342 bytes, not shown in the charts but covered by the {126-128500} range in the signature. There seems to be no correlation between file size (bytes) and offset location.
If this signature is not accepted, perhaps this analysis can still be used to remove the (*) wildcard in the existing signature, if that is considered to be of benefit.
Just FYI, I created this tool to generate the analysis values: https://github.com/ross-spencer/bindist/releases/tag/v1.0.0-beta
I have reported an issue in DROID 6.1.5, which seems to be returning multiple identification results for a small percentage of files: https://github.com/digital-preservation/droid/issues/85
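For anyone wanting to repeat this measurement on their own corpus, the gap between the two byte sequences can be computed along these lines. This is an illustrative sketch only, not a description of how the bindist tool works internally; `signature_gap` and `gap_range` are names I have made up, and the first occurrence of `PE\0\0` after the `MZ` bytes is assumed to be the real header.

```python
from typing import Iterable, Optional, Tuple

def signature_gap(data: bytes) -> Optional[int]:
    """Bytes between the end of 'MZ' and the start of 'PE\\0\\0', or None."""
    if not data.startswith(b"MZ"):
        return None
    position = data.find(b"PE\x00\x00", 2)
    if position < 0:
        return None
    return position - 2

def gap_range(samples: Iterable[bytes]) -> Optional[Tuple[int, int]]:
    """Smallest and largest gap over all samples that contain both sequences."""
    gaps = [g for g in map(signature_gap, samples) if g is not None]
    if not gaps:
        return None
    return min(gaps), max(gaps)
```

The minimum and maximum returned by `gap_range` are the two numbers that would go into the `{min-max}` part of the signature, ideally with some headroom on the upper bound.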
The results of the signature including these multiple IDs are:
- 15 multiple-ids
- 0 zero-id
- 933 64-bit portable exes
- 742 32-bit portable exes
As noted in the issue above, some of the 15 multiple-id results seem to be due to an issue in DROID's processing. From the 15 I have, the cause does not seem to be a potential overlap arising from the {126-128500} range specified in the signature.
The questions for TNA and the list then are:
- Do we differentiate between 32-bit and 64-bit like we do with Linux ELF files?
- Do we use any of the other information available to us to create more granular signatures than these still?
Comments, and others' testing results, appreciated.
Cheers,
Ross