
Checksum with no creation date


David

Feb 24, 2011, 6:45:37 AM
Hi
I want to implement an application that compares jar files using a
checksum. Two jar files should count as the same if both have the same
content (even if the files have different creation dates).
Comparing the files byte by byte takes too much time, and comparing
only file sizes gives me no assurance. So I must use a checksum algorithm.
I have tried CRC32 and MD5 checksums, but they use the file
creation date in the calculation. Do you know any checksum
algorithm that does not? Or maybe there's another way to make the
comparison.

Thanks
David

Roedy Green

Feb 24, 2011, 6:50:42 AM
On Thu, 24 Feb 2011 03:45:37 -0800 (PST), David <dav....@gmail.com>
wrote, quoted or indirectly quoted someone who said :

You might cannibalise the code from UNtouch, which does the same thing
for standalone files. Adler is quick. Combine that with size and the
odds of a false equal are pretty slim. If dates are equal and sizes
are equal you can trust a match without computing a checksum.
See http://mindprod.com/products1.html#UNTOUCH


Jarsigner is slow, but it leaves a checksum for you in a manifest that
you can test without computing anything yourself. It is cryptographic
quality.
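
A minimal sketch of the "Adler plus size" idea, using java.util.zip.Adler32 from the JDK. This is not the UNtouch code; the class and method names are invented for illustration, and it works on a byte array for brevity (for large files one would stream the bytes instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

class QuickFingerprint {

    /** Adler-32 over a byte array. */
    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    /** Hypothetical fingerprint format: "<size>:<adler32 in hex>". */
    static String fingerprint(byte[] data) {
        return data.length + ":" + Long.toHexString(adler32(data));
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        // Two inputs are treated as "probably equal" when fingerprints match.
        System.out.println(fingerprint(data));
    }
}
```

Note that this fingerprints whatever bytes it is given. Applied to a whole jar, it would still see the dates stored inside the archive.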


--
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
.

Lew

Feb 24, 2011, 7:09:53 AM
On 02/24/2011 06:45 AM, David wrote:
> I want to implement an application that compares jar files using a
> checksum. Two jar files should count as the same if both have the same
> content (even if the files have different creation dates).
> Comparing the files byte by byte takes too much time, and comparing
> only file sizes gives me no assurance. So I must use a checksum algorithm.
> I have tried CRC32 and MD5 checksums, but they use the file
> creation date in the calculation. Do you know any checksum

Huh? They do not!

> algorithm that does not? Or maybe there's another way to make the
> comparison.

You do realize that checksum calculations examine every byte in the file,
right? So you are misinformed on two counts: these algorithms don't use file
dates, nor even rely on there being a file in the first place; and you don't
avoid examining every byte of input in the calculations.
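
To illustrate the point, here is a small sketch that computes CRC-32 and MD5 over a byte array held in memory, with no file and therefore no file date anywhere in sight. It uses only java.util.zip.CRC32 and java.security.MessageDigest; the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

class InMemoryChecksums {

    /** CRC-32 of a byte array; every byte is fed to the algorithm. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** MD5 of a byte array as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CRC32: " + Long.toHexString(crc32(data)));
        System.out.println("MD5:   " + md5Hex(data));
    }
}
```

The same input bytes give the same output on every run, whatever the clock says.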

http://en.wikipedia.org/wiki/MD5
http://en.wikipedia.org/wiki/CRC32

So, what now?

--
Lew
Honi soit qui mal y pense.

Message has been deleted

David

Feb 24, 2011, 8:25:17 AM
I have tried this test:
- Generate a jar file from some java sources (with Netbeans)
- Using the Apache FileUtils class, I generate a checksum value
(FileUtils.checksumCRC32())
- Generate the jar file again (without any change in the java files)
- If I execute FileUtils.checksumCRC32 again, the value is different.

That's why I think that CRC32 depends on date.

David

David

Feb 24, 2011, 10:56:41 AM
I'll answer myself. It seems that class files store timestamp data. So,
although the source code doesn't change, the generated jar file has
different data and a different checksum.
This post talks about that problem:

http://www.velocityreviews.com/forums/t150783-creating-new-jar-same-code-different-md5.html

David

Lew

Feb 24, 2011, 11:00:14 AM
David wrote:
> I have tried this test:
> - Generate a jar [sic] file from some java [sic] sources (with Netbeans [sic])

> - Using the Apache FileUtils class, I generate a checksum value
> (FileUtils.checksumCRC32())
> - Generate the jar [sic] file again (without any change in the java [sic] files)

> - If I execute FileUtils.checksumCRC32 again, the value is different.
>
> That's why I think that CRC32 depends on date.
>

Jukka explained that. The hash/checksum just uses the file contents;
it's the JAR itself that's different.

I thought you meant the JAR file date, which is not involved in the
calculation. Nor, actually, are the file dates as such - it's their
representation in the JAR file that changes the result.
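
The effect can be reproduced without a build tool. The sketch below writes the same bytes into two in-memory archives that differ only in the timestamp set on the entry, then compares the archives. The entry name and the two instants are arbitrary choices for the demonstration, and the class name is invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class EntryTimeDemo {

    /** Builds a one-entry archive in memory with the given entry timestamp. */
    static byte[] buildArchive(byte[] content, long entryTimeMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry("Hello.class"); // hypothetical entry name
            entry.setTime(entryTimeMillis);               // stored inside the archive
            zip.putNextEntry(entry);
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** CRC-32 over the whole archive, the way a file-level checksum sees it. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] content = "same bytes both times".getBytes(StandardCharsets.US_ASCII);
        long firstBuild = 1_298_548_800_000L;          // an arbitrary instant in Feb 2011
        long secondBuild = firstBuild + 86_400_000L;   // one day later
        byte[] a = buildArchive(content, firstBuild);
        byte[] b = buildArchive(content, secondBuild);
        System.out.println("archives identical: " + Arrays.equals(a, b));
        System.out.println("CRC32 of first:  " + Long.toHexString(crc32(a)));
        System.out.println("CRC32 of second: " + Long.toHexString(crc32(b)));
    }
}
```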

--
Lew

Lew

Feb 24, 2011, 3:05:27 PM
> http://www.velocityreviews.com/forums/t150783-creating-new-jar-same-c...
>

It's not a "problem", it's a feature.

The ZIP file format has very good reasons for maintaining file dates
and other metadata. To lack that data would be a problem. You only
call it a "problem" because you're trying to misuse the format.

--
Lew

Roedy Green

Feb 24, 2011, 11:28:43 PM
On Thu, 24 Feb 2011 14:33:29 +0200, Jukka Lahtinen
<jtfj...@hotmail.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>I think he also meant the dates of the files contained in the jar, not
>just the dates of the jars.

If you want to exclude information from the jar from your checksum,
such as dates, you can't very well treat the jar as a single blob. You
must look at each member individually. You could do the checksum on
the compressed or uncompressed member. Computing a checksum on the
compressed member would be faster since there are fewer bytes. I am not sure
if there is a way to look at the compressed members unless you parse the
index at the end of the file yourself.
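
One possible sketch of the member-by-member approach, assuming it is acceptable to rely on the CRC-32 that the ZIP format already records for each entry's uncompressed data. java.util.zip.ZipFile exposes that value through ZipEntry.getCrc(), so nothing has to be recomputed or parsed by hand. The class and method names are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class JarEntryCrcs {

    /** Maps each non-directory entry name to the CRC-32 recorded for it. */
    static Map<String, Long> entryCrcs(Path jar) {
        Map<String, Long> crcs = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    crcs.put(entry.getName(), entry.getCrc());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crcs;
    }

    /** True when both archives hold the same entry names with the same CRCs. */
    static boolean sameContent(Path a, Path b) {
        return entryCrcs(a).equals(entryCrcs(b));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java JarEntryCrcs <first.jar> <second.jar>");
            return;
        }
        System.out.println(sameContent(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Entry timestamps never enter the comparison. Two caveats: a 32-bit CRC can collide, so comparing entry sizes as well would tighten it, and a build tool may write varying text into META-INF/MANIFEST.MF, which would show up as a real content difference.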

I could write this to your spec for a fee.

Roedy Green

Feb 24, 2011, 11:43:21 PM
On Thu, 24 Feb 2011 05:25:17 -0800 (PST), David <dav....@gmail.com>

wrote, quoted or indirectly quoted someone who said :

>That's why I think that CRC32 depends on date.

see http://mindprod.com/jgloss/crc.html

The CRC algorithm implemented by java.util.zip.CRC32 is just a way of
computing a checksum on a set of bytes. The
org.apache.commons.io.FileUtils.checksumCRC32
method you found computes a CRC on the entire contents of a file,
which in your case is a jar and contains dates and sizes. I strongly
doubt it is including the file date and size. You can see if I am
correct by trying it on a simple non-jar file, change the file date
and see if the checksum changes.
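
The experiment might look like the sketch below. It uses java.util.zip.CheckedInputStream from the JDK rather than the Commons IO method, on the assumption that a CRC over the file's bytes is what is being tested; the class and method names are invented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

class FileDateExperiment {

    /** CRC-32 over the bytes of a file, read as a stream. */
    static long crc32Of(Path file) {
        try (CheckedInputStream in =
                 new CheckedInputStream(Files.newInputStream(file), new CRC32())) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Each read updates the running checksum; nothing else to do.
            }
            return in.getChecksum().getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Changes the file's modification time and reports whether the CRC stayed the same. */
    static boolean checksumSurvivesDateChange(Path file) {
        try {
            long before = crc32Of(file);
            // An arbitrary instant in 2001, chosen only to differ from "now".
            Files.setLastModifiedTime(file, FileTime.fromMillis(1_000_000_000_000L));
            long after = crc32Of(file);
            return before == after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("plain", ".txt");
        Files.write(temp, "not a jar".getBytes(StandardCharsets.US_ASCII));
        System.out.println("checksum unchanged: " + checksumSurvivesDateChange(temp));
        Files.delete(temp);
    }
}
```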

Roedy Green

Feb 25, 2011, 12:12:19 AM
On Thu, 24 Feb 2011 12:05:27 -0800 (PST), Lew <l...@lewscanon.com>

wrote, quoted or indirectly quoted someone who said :

>


>The ZIP file format has very good reasons for maintaining file dates

The advantage of embedding the compiled date in the class file is that
you can tell, for example, if a class file was compiled with the
old or new version of JavaC. Normally it does not matter, but if there
were a bug in Javac or a new optimisation, it might.

Further, it is necessary to do a clean compile to propagate new values
of public or package static finals. You want a way to make sure
EVERYTHING got recompiled.

I wrote a program called JarCheck which does a similar check to make
sure every module was compiled with the proper JDK level target. It
is amazing how often it gets out of whack. see
http://mindprod.com/products1.html#JARCHECK
You might cannibalise some of the logic in it for your program.

It sounds like you would need a checksum program that understood the
class file format, and optionally skipped that embedded date. There
are also embedded timestamps in the zip/jar format not embedded in the
class file members. See
http://mindprod.com/jgloss/classfileformat.html
http://mindprod.com/jgloss/zip.html
http://mindprod.com/jgloss/jar.html

The game may not be worth the candle. Are you using ANT? Builds are
so much quicker than doing it with bat files. It loads JavaC only
once. Many years ago Jonathan Revusky invented a primitive sort of
ANT. It speeded things up 100 fold over the Linux make we were using.

See http://mindprod.com/jgloss/ant.html

Message has been deleted

David

Feb 25, 2011, 4:32:45 AM
Thanks for your answers.
As Lew said, Jukka was right. I got the CRC value from a class
file, compiled it again without modifications and got the CRC again. The
values are the same, so it's the jar file and not the class file which
has the date information.
I need the jar files comparation for the implementation of an
application update. I have several jar files and an xml file containing the
CRC value of each file on a server. The client application compares the
remote and local xml files and decides which jars it must download.
Each time there is some modification, all jars are compiled and uploaded
to the server.
I could use a version number instead of a CRC in the xml file, and change
the value manually every time a jar has a change. But I'd prefer to
make something automatic; that's why I thought that the jar CRC was a good
option. I will have to change the strategy.
Thanks for the links, Roedy. I'll check them out.
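
The client-side decision described here can be sketched as a comparison of two maps from jar name to checksum string. Parsing the xml files into those maps is left out, and the class name, method name, and map layout are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpdatePlanner {

    /**
     * Returns the names of jars whose remote checksum is missing locally
     * or differs from the local one, in sorted order.
     */
    static List<String> jarsToDownload(Map<String, String> remote,
                                       Map<String, String> local) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : remote.entrySet()) {
            String localChecksum = local.get(entry.getKey());
            if (!entry.getValue().equals(localChecksum)) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> remote = new HashMap<>();
        remote.put("core.jar", "aaaa"); // hypothetical names and checksums
        remote.put("ui.jar", "bbbb");
        Map<String, String> local = new HashMap<>();
        local.put("core.jar", "aaaa");
        local.put("ui.jar", "0000");
        System.out.println(jarsToDownload(remote, local));
    }
}
```

Whatever value goes into the map has to be stable across rebuilds of unchanged sources, which is exactly what a checksum of the whole jar is not.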

David

Lew

Feb 25, 2011, 7:59:36 AM
On 02/25/2011 04:32 AM, David wrote:
> I need the jar [sic] files [sic] comparation [sic] for the implementation of an

I know this is off topic, but "comparation"? I believe the word you intend is
"comparison".

Lew

Feb 25, 2011, 8:07:54 AM
Roedy Green wrote:
> The advantage of embedding the compiled date in the class file is that
> you can tell, for example, if a class file was compiled with the
> old or new version of JavaC [sic]. Normally it does not matter, but if there

As Jukka said, that is not true.

> were a bug in Javac or a new optimisation, it might.
>
> Further, it is necessary to do a clean compile to propagate new values
> of public or package static finals. You want a way to make sure
> EVERYTHING got recompiled.

That's only true if they're compile-time constants, but not generally.

> I wrote a program called JarCheck which does a similar check to make
> sure every module was compiled with the proper JDK level target. It

There's a standard way to do that, and it has nothing to do with dates. Dates
wouldn't work, as Jukka explained. The class version, OTOH, is reliable and
is in there for just that very purpose.

> is amazing how often it gets out of whack. see
> http://mindprod.com/products1.html#JARCHECK
> You might cannibalise some of the logic in it for your program.
>
> It sounds like you would need a checksum program that understood the
> class file format, and optionally skipped that embedded date. There

What embedded date? There's no embedded date in the classfile format! As
this thread had already established.

ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
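
As an illustration of reading the class version from the first three fields of that structure, here is a small sketch using java.io.DataInputStream, which reads big-endian values as the class file format requires. The class and method names are invented, and the sample header bytes in main are hand-written rather than taken from a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassVersionReader {

    /** Returns {major_version, minor_version} read from the start of a class file. */
    static int[] readVersion(InputStream classBytes) {
        try {
            DataInputStream in = new DataInputStream(classBytes);
            int magic = in.readInt();            // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();  // u2 minor_version
            int major = in.readUnsignedShort();  // u2 major_version
            return new int[] {major, minor};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-written header: magic, minor_version 0, major_version 50.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50};
        int[] version = readVersion(new ByteArrayInputStream(header));
        System.out.println("major=" + version[0] + " minor=" + version[1]);
    }
}
```

A major version of 49 corresponds to Java 5 and 50 to Java 6. No date is read because the structure has no field for one.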

> are also embedded timestamps in the zip/jar format not embedded in the
> class file members. See
> http://mindprod.com/jgloss/classfileformat.html
> http://mindprod.com/jgloss/zip.html
> http://mindprod.com/jgloss/jar.html
>
> The game may not be worth the candle. Are you using ANT? Builds are
> so much quicker than doing it with bat files. It loads JavaC only
> once. Many years ago Jonathan Revusky invented a primitive sort of
> ANT. It speeded things up 100 fold over the Linux make we were using.
>
> See http://mindprod.com/jgloss/ant.html

--

Lew

Feb 25, 2011, 8:10:35 AM
On 02/24/2011 11:28 PM, Roedy Green wrote:
> On Thu, 24 Feb 2011 14:33:29 +0200, Jukka Lahtinen
> <jtfj...@hotmail.com.invalid> wrote, quoted or indirectly quoted
> someone who said :
>
>> I think he also meant the dates of the files contained in the jar, not
>> just the dates of the jars.
>
> If you want to exclude information from the jar from your checksum,
> such as dates, you can't very well treat the jar as a single blob. You
> must look at each member individually. You could do the checksum on
> the compressed or uncompressed member. Computing a checksum on the
> compressed member would be faster since there are fewer bytes. I am not sure
> if there is a way to look at the compressed members unless you parse the
> index at the end of the file yourself.
>
> I could write this to your spec for a fee.

Does your fee discount for the fact that you misinformed the OP about the
facts involved?

Daniel Pitts

Feb 26, 2011, 1:43:58 PM
MD5 and CRC32 should be hashes of data only. In theory, they shouldn't
even /care/ that they are working on a file which has metadata. Perhaps
you're using them incorrectly?

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Roedy Green

Feb 26, 2011, 2:50:37 PM
On Fri, 25 Feb 2011 10:30:24 +0200, Jukka Lahtinen
<jtfj...@hotmail.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>If a .class file has a new date, that doesn't ensure that it was compiled
>on the newest javac version.
>It may have been compiled on any version released before the compilation
>date.

If you know when you installed the new compiler, you can tell.

The major and minor class file versions can also give you a clue.

Lew

Feb 26, 2011, 3:39:41 PM

He was encountering a difference in data, not metadata, because he was hashing
the JAR file. The JAR file includes file dates in its data. It was the
representation of those dates that differed between JARs, ergo the JARs were
not the same, ergo their hashes likely differed, and in the OP's particular
case, actually did.

Had the OP compared the file-by-file hashes of the files that were copied into
the JAR, they would have had the kind of comparison they wanted.

Message has been deleted

Lew

Feb 27, 2011, 9:13:47 AM
Jukka Lahtinen wrote:

> Roedy Green writes:
>> The major and minor class file versions can also give you a clue.


> Better and more relevant than the file timestamp, anyway.
> Still, the .class may have been compiled using a newer compiler with the
> -target option, but I believe that doesn't matter in most cases.

That would set the class-file version appropriately. Nothing to fear there.

What do you care that the code was compiled in June, save that you know it was
Java 5?
