Thanks
David
You might cannibalise the code from UNtouch, which does the same thing
for standalone files. Adler is quick. Combine that with size and the
odds of a false equal are pretty slim. If dates are equal and sizes
are equal you can trust a match without computing a checksum.
see http://mindprod.com/products1.html#UNTOUCH
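
To make the size-plus-Adler idea concrete, here is a minimal sketch using java.util.zip.Adler32 from the standard library. The class name QuickFingerprint and the "length:checksum" string format are made up for illustration; this is not taken from UNtouch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

/** Hypothetical helper: a cheap fingerprint made of byte count plus Adler-32. */
final class QuickFingerprint {

    /** Returns "length:adler32-in-hex" for the given bytes. */
    static String of(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return data.length + ":" + Long.toHexString(adler.getValue());
    }

    /** Prints one fingerprint per file named on the command line. */
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(of(bytes) + "  " + name);
        }
    }
}
```

Two files with different fingerprints are certainly different. Two files with the same fingerprint are very probably, but not provably, the same.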
Jarsigner is slow, but it leaves a checksum for you in a manifest that
you can test without computing anything yourself. It is of cryptographic
quality.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
Huh? They do not!
> algorithm for do it? Or may be there's another way for make the
> comparation.
You do realize that checksum calculations examine every byte in the file,
right? So you are misinformed on two counts: these algorithms don't use file
dates, nor even rely on there being a file in the first place; and you don't
avoid examining every byte of input in the calculations.
http://en.wikipedia.org/wiki/MD5
http://en.wikipedia.org/wiki/CRC32
So, what now?
--
Lew
Honi soit qui mal y pense.
That's why I think that CRC32 depends on date.
David
http://www.velocityreviews.com/forums/t150783-creating-new-jar-same-code-different-md5.html
David
Jukka explained that. The hash/checksum just uses the file contents;
it's the JAR itself that's different.
I thought you meant the JAR file date, which is not involved in the
calculation. Nor, actually, are the file dates as such - it's their
representation in the JAR file that changes the result.
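
A small experiment shows the effect. The sketch below builds two one-entry archives in memory from identical content, differing only in the entry timestamp, and checksums each archive as a whole. The class and method names (JarDateDemo, zipOf, crc32) are invented for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo: the same member content with a different stored date. */
final class JarDateDemo {

    /** Builds a one-entry archive in memory with the given entry timestamp. */
    static byte[] zipOf(String name, byte[] content, long timeMillis) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(timeMillis); // stored inside the archive bytes
            zip.putNextEntry(entry);
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** CRC-32 over a whole byte array, here the archive treated as one blob. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] body = {1, 2, 3, 4};
        long t = 1000000000000L; // an arbitrary instant in 2001
        byte[] first = zipOf("A.class", body, t);
        byte[] second = zipOf("A.class", body, t + 86400000L); // one day later
        System.out.println(Long.toHexString(crc32(first)));
        System.out.println(Long.toHexString(crc32(second)));
    }
}
```

The two printed values differ even though the member content is byte-for-byte identical, because the stored timestamp is part of the archive's bytes.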
--
Lew
It's not a "problem", it's a feature.
The ZIP file format has very good reasons for maintaining file dates
and other metadata. To lack that data would be a problem. You only
call it a "problem" because you're trying to misuse the format.
--
Lew
>I think he also meant the dates of the files contained in the jar, not
>just the dates of the jars.
If you want to exclude information from the jar from your checksum,
such as dates, you can't very well treat the jar as a single blob. You
must look at each member individually. You could do the checksum on
the compressed or uncompressed member. Computing a checksum on the
compressed member would be faster since there are fewer bytes. I am not
sure if there is a way to look at the compressed members unless you parse the
index at the end of the file yourself.
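
For the uncompressed variant, a rough sketch of the member-by-member approach follows. It reads each entry with java.util.zip.ZipInputStream and computes a CRC-32 over the uncompressed bytes, so the stored dates never enter the calculation. The names JarMemberChecksums, of and sameContents are hypothetical, and the sketch assumes the whole jar fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: compares jars by member content, ignoring dates. */
final class JarMemberChecksums {

    /** Maps each member name to the CRC-32 of its uncompressed bytes. */
    static Map<String, Long> of(byte[] jarBytes) {
        Map<String, Long> result = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content to compare
                }
                CRC32 crc = new CRC32();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    crc.update(buffer, 0, n);
                }
                result.put(entry.getName(), crc.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** True when both jars hold the same member names with the same content. */
    static boolean sameContents(byte[] jarA, byte[] jarB) {
        return of(jarA).equals(of(jarB));
    }
}
```

The ZIP format already records a CRC-32 of each member's uncompressed data, and java.util.zip.ZipEntry.getCrc() exposes it when the archive is opened with java.util.zip.ZipFile, so comparing those stored values would avoid decompressing anything. Note that a generated META-INF/MANIFEST.MF may itself differ between builds.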
I could write this to your spec for a fee.
>That's why I think that CRC32 depends on date.
see http://mindprod.com/jgloss/crc.html
The CRC algorithm implemented by java.util.zip.CRC32 is just a way of
computing a checksum on a set of bytes. The
org.apache.commons.io.FileUtils.checksumCRC32
method you found computes a CRC on the entire contents of a file,
which in your case is a jar, and a jar stores the dates and sizes of its
members as part of its contents. I strongly doubt the method includes the
jar file's own date and size. You can check whether I am correct by trying
it on a simple non-jar file: change the file date and see if the checksum
changes.
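
That experiment can be sketched in a few lines. To stay self-contained, this version uses java.util.zip.CRC32 directly rather than the Commons IO method, and the class name FileDateExperiment is made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.zip.CRC32;

/** Hypothetical experiment: does changing a file's date change its CRC-32? */
final class FileDateExperiment {

    /** CRC-32 of a byte array. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Writes a temp file, changes its modification time, compares the CRCs. */
    static boolean checksumSurvivesDateChange(byte[] content) {
        try {
            Path file = Files.createTempFile("crc-demo", ".bin");
            try {
                Files.write(file, content);
                long before = crc32(Files.readAllBytes(file));
                // 946684800000 ms is 2000-01-01T00:00:00Z, an arbitrary old date.
                Files.setLastModifiedTime(file, FileTime.fromMillis(946684800000L));
                long after = crc32(Files.readAllBytes(file));
                return before == after;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checksumSurvivesDateChange("some plain text".getBytes()));
    }
}
```

The checksum is computed only from the bytes read out of the file, so the modification time has no way of entering the calculation.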
>
>The ZIP file format has very good reasons for maintaining file dates
The advantage of embedding the compiled date in the class file is that
you can tell, for example, whether a class file was compiled with the
old or new version of JavaC. Normally it does not matter, but if there
were a bug in Javac or a new optimisation, it might.
Further, it is necessary to do a clean compile to propagate new values
of public or package static finals. You want a way to make sure
EVERYTHING got recompiled.
I wrote a program called JarCheck which does a similar check to make
sure every module was compiled with the proper JDK level target. It
is amazing how often it gets out of whack. see
http://mindprod.com/products1.html#JARCHECK
You might cannibalise some of the logic in it for your program.
It sounds like you would need a checksum program that understood the
class file format, and optionally skipped that embedded date. There
are also embedded timestamps in the zip/jar format not embedded in the
class file members. See
http://mindprod.com/jgloss/classfileformat.html
http://mindprod.com/jgloss/zip.html
http://mindprod.com/jgloss/jar.html
The game may not be worth the candle. Are you using ANT? Builds are
so much quicker than doing it with bat files. It loads JavaC only
once. Many years ago Jonathan Revusky invented a primitive sort of
ANT. It speeded things up 100 fold over the Linux make we were using.
See http://mindprod.com/jgloss/ant.html
David
I know this is off topic, but "comparation"? I believe the word you intend is
"comparison".
As Jukka said, that is not true.
> were a bug in Javac or a new optimisation, it might.
>
> Further, it is necessary to do a clean compile to propagate new values
> of public of package static finals. You want a way to make sure
> EVERYTHING got recompiled.
That's only true if they're compile-time constants, but not generally.
> I wrote a program called JarCheck which does a similar check to make
> sure every module was compiled with the proper JDK level target. It
There's a standard way to do that, and it has nothing to do with dates. Dates
wouldn't work, as Jukka explained. The class version, OTOH, is reliable and
is in there for just that very purpose.
> is amazing how often it gets out of whack. see
> http://mindprod.com/products1.html#JARCHECK
> You might cannibalise some of the logic in it for your program.
>
> It sounds like you would need a checksum program than understood the
> class file format, and optionally skipped that embedded date. There
What embedded date? There's no embedded date in the classfile format, as
this thread has already established.
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
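
The first three fields of that structure are all one needs to check the class version. A minimal sketch, with the invented class name ClassVersionReader, reading them through java.io.DataInputStream (which is big-endian, as the class file format is):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: reads the version fields at the start of a class file. */
final class ClassVersionReader {

    /** Returns {major_version, minor_version} from the class file header. */
    static int[] read(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();               // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = data.readUnsignedShort();     // u2 minor_version
            int major = data.readUnsignedShort();     // u2 major_version
            return new int[] { major, minor };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A major version of 49 corresponds to Java 5 and 50 to Java 6. Nothing in these fields, or anywhere else in the structure, records when the file was compiled.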
> are also embedded timestamps in the zip/jar format not embedded in the
> class file members. See
> http://mindprod.com/jgloss/classfileformat.html
> http://mindprod.com/jgloss/zip.html
> http://mindprod.com/jgloss/jar.html
>
> The game may not be worth the candle. Are you using ANT? Builds are
> is so much quicker than doing it with bat files. It loads JavaC only
> once. Many years ago Jonathan Revusky invented a primitive sort of
> ANT. It speeded things up 100 fold over the Linux make we were using.
>
> See http://mindprod.com/jgloss/ant.html
--
Does your fee discount for the fact that you misinformed the OP about the
facts involved?
--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
>If a .class file has a new date, that doesn't ensure that it was compiled
>on the newest javac version.
>It may have been compiled on any version released before the compilation
>date.
If you know when you installed the new compiler, you can tell.
The major and minor class file versions can also give you a clue.
He was encountering a difference in data, not metadata, because he was hashing
the JAR file. The JAR file includes file dates in its data. It was the
representation of those dates that differed between JARs, ergo the JARs were
not the same, ergo their hashes likely differed, and in the OP's particular
case, actually did.
Had the OP compared the file-by-file hashes of the files that were copied into
the JAR, they would have had the kind of comparison they wanted.
> Better and more relevant than the file timestamp, anyway.
> Still, the .class may have been compiled using a newer compiler with the
> -target option, but I believe that doesn't matter in most cases.
That would set the class-file version appropriately. Nothing to fear there.
What do you care that the code was compiled in June, save that you know it was
Java 5?