Suggestion: Add a GetAssemblyHash() method

Florian

Jun 18, 2011, 7:08:34 AM
to mono-cecil, sta...@gmx.de
Hi there,

I am trying to find out whether two assemblies actually have the same
content (i.e. were built from the same sources).

My first attempt was to compute an MD5 hash of the assembly file content
and compare it to the hash of another assembly. This does not work
because some fields within the assembly file format vary on every
build, even when building from the exact same sources.
One example of these fields is the MVID (ModuleVersionID) - see also:
http://stackoverflow.com/questions/2940857/determine-whether-net-assemblies-were-built-from-the-same-source
The StackOverflow post says that with .NET 4 there are now 3 different
fields that vary on each build.

I really need a fast way to check whether the data that I cached
from an assembly needs to be updated (because the assembly
content has changed - i.e. it has been built from changed sources).
So would it be possible to add a GetAssemblyHash() method to the
AssemblyDefinition class that basically takes the file content as a
byte array, blanks out the fields that vary on each build and computes
an MD5 hash of the result?

I would probably give it a go on my own, but I guess my knowledge of
the assembly file format is just not enough to build a stable solution.

If it helps, here is how I did the MD5 hashing:

    // needs System.Security.Cryptography and System.Text
    var md5 = new MD5CryptoServiceProvider();
    var result = md5.ComputeHash(byteArray);
    // format each hash byte as two uppercase hex digits
    var sb = new StringBuilder();
    for (var i = 0; i < result.Length; i++)
        sb.Append(result[i].ToString("X2"));
    return sb.ToString();

Thanks a lot for any help, it's really appreciated.

Best regards

Florian

Alex

Jun 18, 2011, 7:50:16 AM
to mono-...@googlegroups.com, sta...@gmx.de
Hi,

Just use asmDef.Name.Hash[Algorithm]?

Regards,
Alex


Florian

Jun 18, 2011, 9:57:36 AM
to mono-cecil
Hi Alex,

thanks for the pointer, I didn't know that one. But I am afraid it
doesn't help in my case.

The Hash property of AssemblyNameDefinition always returns an empty
byte array.

Does anyone know whether

MVID
Timestamp
Checksum (in .NET 4 assemblies)

have fixed positions within the byte array of an assembly? If so, I would
probably go and write my own hash algorithm.

Thanks
Florian


Florian

Jun 18, 2011, 4:30:54 PM
to mono-cecil
Well, as it seems, the locations of TimeDateStamp and FileChecksum are
quite easy to find. Their positions can be found in the PE file header
(right after the DOS header) and in the PEOptionalHeader, the first
headers read from an assembly file (I looked the code up in
Mono.Cecil.PE.ImageReader.ReadImage()).
Unfortunately, locating the position of the MVID is more complicated.
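
As a rough illustration of why these two fields are easy to find: in the PE/COFF layout both sit at fixed distances from the PE signature, so their file offsets can be computed from the raw bytes alone. The sketch below assumes a well-formed PE file and a little-endian machine. The class and method names (PeOffsets, TimeDateStampOffset, ChecksumOffset) are made up for this example and are not part of Mono.Cecil.

```csharp
using System;

static class PeOffsets
{
    // Offset of the 4-byte e_lfanew field in the DOS header; it holds the
    // file offset of the "PE\0\0" signature.
    const int LfanewOffset = 0x3C;

    // File offset of the 4-byte TimeDateStamp in the PE file (COFF) header.
    public static int TimeDateStampOffset(byte[] image)
    {
        int pe = BitConverter.ToInt32(image, LfanewOffset);
        // skip signature (4) + Machine (2) + NumberOfSections (2)
        return pe + 8;
    }

    // File offset of the 4-byte CheckSum in the PE optional header.
    public static int ChecksumOffset(byte[] image)
    {
        int pe = BitConverter.ToInt32(image, LfanewOffset);
        // skip signature (4) + file header (20), then 64 bytes into the
        // optional header (the same distance for PE32 and PE32+)
        return pe + 24 + 64;
    }
}
```

The MVID has no such fixed distance, because it lives in the metadata GUID heap, whose location has to be resolved through the CLI header and the section table.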

I am not quite sure what to do. Changing the source code of ImageReader
on my local machine to save the byte positions of those three fields
would be the easiest thing to do. This way I could later build a
file hash that ignores these fields, and compare the hashes of
different dlls / exes built from the same sources.

But of course, every time a new Mono.Cecil version is released I
would have to make the same changes again.

@JB: Would you consider including such a hashing method in the
Mono.Cecil sources if I can provide a proper implementation that
meets your quality expectations? I guess other people have the same
requirement, for example when working with
continuous integration build systems.

Best Regards Florian

Greg Young

Jun 18, 2011, 5:07:54 PM
to mono-...@googlegroups.com

I could use this code as well


Florian

Jun 18, 2011, 5:34:20 PM
to mono-cecil
Hi everybody,

well, I got it working - thanks to JB for commenting the field
positions in ImageReader.
In case anybody needs the same thing really urgently, it may be helpful,
so I am posting it here:

Additional code in class ModuleDefinition:

    public int TimeDateStampPosition { get { return Image.TimeDateStampPosition; } }
    public int FileChecksumPosition { get { return Image.FileChecksumPosition; } }

    public string GetBuildIndependentHash()
    {
        var byteArray = File.ReadAllBytes(Image.FileName);

        //overwrite parts of the assembly file content byte array
        //(only in memory of course) that change on every build with
        //an empty byte array and thus make it comparable
        var guidSection = Image.GuidHeap.Section;
        var empty = new byte[guidSection.SizeOfRawData];
        Buffer.BlockCopy(empty, 0, byteArray, TimeDateStampPosition, 4);
        Buffer.BlockCopy(empty, 0, byteArray, FileChecksumPosition, 4);
        //MVID (this zeroes the whole section that contains the GUID heap)
        Buffer.BlockCopy(empty, 0, byteArray,
            (int) guidSection.PointerToRawData, (int) guidSection.SizeOfRawData);

        var md5 = new MD5CryptoServiceProvider();
        var result = md5.ComputeHash(byteArray);

        var sb = new StringBuilder();
        for (var i = 0; i < result.Length; i++)
            sb.Append(result[i].ToString("X2"));
        return sb.ToString();
    }

The int field TimeDateStampPosition was added to the Image class and
is set in the ReadImage() method of ImageReader (right above the
// TimeDateStamp comment) using the following line:

    image.TimeDateStampPosition = (int) BaseStream.Position;

Likewise, the int field FileChecksumPosition was added to the Image class
and is set in the ReadOptionalHeaders() method of ImageReader. The
line goes directly below the Advance(66); statement (which follows
the // FileChecksum comment) and looks like this:

    image.FileChecksumPosition = (int) (BaseStream.Position - 4);

For my requirements this works well enough, but I would of course be
interested in your thoughts on this, JB.

Best regards

Florian

Gábor Kozár

Jun 18, 2011, 5:39:40 PM
to mono-...@googlegroups.com
Well done! :)

Just as a side note, you could use BitConverter.ToString() when stringifying the md5 hash instead of that ugly loop using StringBuilder.
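
One detail worth knowing when switching: the two approaches do not produce the same string, because BitConverter.ToString() puts a hyphen between the bytes. A small sketch of the difference follows; the class and method names (HexDemo, WithLoop, WithBitConverter) are invented for the illustration.

```csharp
using System;
using System.Text;

static class HexDemo
{
    // The StringBuilder loop: two uppercase hex digits per byte, no separator.
    public static string WithLoop(byte[] hash)
    {
        var sb = new StringBuilder(hash.Length * 2);
        foreach (var b in hash)
            sb.Append(b.ToString("X2"));
        return sb.ToString();
    }

    // BitConverter.ToString: same digits, but separated by hyphens.
    public static string WithBitConverter(byte[] hash)
    {
        return BitConverter.ToString(hash);
    }
}
```

For the bytes 0x0A and 0xFF the loop gives "0AFF" and BitConverter gives "0A-FF". That is harmless as long as all cached hashes use the same format; calling .Replace("-", "") on the BitConverter result yields the loop's format.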


Florian

Jun 18, 2011, 5:50:37 PM
to mono-cecil
@Gábor: Thanks, I didn't know that one, that's nice and elegant.
Every day there's something new to learn in programming, that's cool.
I edited the example to do it this way, and removed the two public
properties from ModuleDefinition since they don't make much sense.


    public string GetBuildIndependentHash()
    {
        var byteArray = File.ReadAllBytes(Image.FileName);

        //overwrite parts of the assembly file content byte array
        //(only in memory of course) that change on every build
        //with an empty byte array and thus make it comparable
        var guidSection = Image.GuidHeap.Section;
        var empty = new byte[guidSection.SizeOfRawData];
        Buffer.BlockCopy(empty, 0, byteArray, Image.TimeDateStampPosition, 4);
        Buffer.BlockCopy(empty, 0, byteArray, Image.FileChecksumPosition, 4);
        //MVID (this zeroes the whole section that contains the GUID heap)
        Buffer.BlockCopy(empty, 0, byteArray,
            (int) guidSection.PointerToRawData, (int) guidSection.SizeOfRawData);

        //Create MD5 hash
        var md5 = new MD5CryptoServiceProvider();
        var result = md5.ComputeHash(byteArray);

        //Stringify hash for human readability
        var hashString = BitConverter.ToString(result);
        return hashString;
    }

Well, anyone else using this would probably prefer returning the hash
as a byte[] instead of a string, which would be more low level and
faster and so on... I just did it this way because it is what fits my
needs best at the moment.
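
For anyone who wants the byte[] variant without patching Cecil, here is a minimal standalone sketch of the same idea: zero the varying ranges in a copy of the file bytes, then hash the copy. It assumes the caller already knows the offsets. The BlankedHash class and its Compute method are hypothetical names for this example, not part of Mono.Cecil.

```csharp
using System;
using System.Security.Cryptography;

static class BlankedHash
{
    // Returns the MD5 hash of 'data' computed as if every range in 'ranges'
    // were zero-filled. Each range is a two-element array { offset, length }.
    // The input array itself is not modified.
    public static byte[] Compute(byte[] data, params int[][] ranges)
    {
        var copy = (byte[]) data.Clone();
        foreach (var range in ranges)
            Array.Clear(copy, range[0], range[1]);

        using (var md5 = MD5.Create())
            return md5.ComputeHash(copy);
    }
}
```

A call could look like BlankedHash.Compute(bytes, new[] { timeDateStampPosition, 4 }, new[] { fileChecksumPosition, 4 }), where the two position variables are placeholders for offsets obtained elsewhere. Two results can be compared with Enumerable.SequenceEqual from System.Linq.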

Best regards

Florian