Expected vs. observed performance of java.util.zip.CRC32 in Java 7 and 8

Ariel Weisberg

Jan 14, 2015, 6:00:51 PM
to mechanica...@googlegroups.com
I heard that java.util.zip.CRC32 became an intrinsic in Java 8 that uses hardware support when it is available. I tried it out, but got some odd numbers, and I am having trouble pinning down the difference between Java 7 and 8.

I saw some mailing-list traffic mentioning that Adler32 got the intrinsic treatment as well, but looking at the source, I don't see any mention of it in C1. Maybe it is just non-obvious?
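
For reference when reading the Adler32 numbers, here is a plain-Java sketch of the Adler-32 definition (two running sums modulo 65521, as specified in RFC 1950), checked against java.util.zip.Adler32. It is deliberately naive, taking a modulo on every byte, and it is not the code from the pastebin benchmark; the class name Adler32Reference is hypothetical. Optimized implementations such as zlib's defer the modulo across blocks of bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class Adler32Reference {
    // Largest prime below 2^16; the modulus Adler-32 is defined with.
    private static final int MOD_ADLER = 65521;

    /** Byte-at-a-time Adler-32: a = 1 + sum of bytes, b = sum of the a values, both mod 65521. */
    static long adler32(byte[] data, int off, int len) {
        int a = 1, b = 0;
        for (int i = off; i < off + len; i++) {
            a = (a + (data[i] & 0xFF)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        // The checksum packs b into the high 16 bits and a into the low 16 bits.
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] msg = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 jdk = new Adler32();
        jdk.update(msg, 0, msg.length);
        // prints "reference=11e60398 jdk=11e60398"
        System.out.printf("reference=%08x jdk=%08x%n", adler32(msg, 0, msg.length), jdk.getValue());
    }
}
```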

I compared the JDK CRC32 and Adler32 with a pure Java slicing-by-8 implementation. I was not expecting to see 13 gigabytes/second per core (about 13,100 ops/s on a 1 MiB buffer), certainly not on Java 7, or on Java 8 without the hardware support. I also was not expecting the performance to be so close for small sizes but so far apart for large sizes.
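
A slicing-by-8 CRC-32 precomputes eight 256-entry tables so that each loop iteration consumes 8 input bytes with 8 independent table lookups, which shortens the serial dependency chain compared with the classic one-lookup-per-byte loop. The sketch below illustrates that technique using the reflected polynomial 0xEDB88320 that java.util.zip.CRC32 also uses. It is not the PureJavaCrc32 class benchmarked above, and the class name SlicingBy8Crc32 is hypothetical.

```java
import java.util.Random;
import java.util.zip.CRC32;

/** Slicing-by-8 CRC-32 over the reflected polynomial 0xEDB88320 (same result as java.util.zip.CRC32). */
public class SlicingBy8Crc32 {
    // T[k][n] = CRC contribution of byte n followed by k zero bytes.
    private static final int[][] T = new int[8][256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            T[0][n] = c;
        }
        for (int k = 1; k < 8; k++) {
            for (int n = 0; n < 256; n++) {
                int prev = T[k - 1][n];
                T[k][n] = (prev >>> 8) ^ T[0][prev & 0xFF];
            }
        }
    }

    static long crc32(byte[] b, int off, int len) {
        int c = 0xFFFFFFFF; // standard initial value
        int i = off;
        int end = off + len;
        // Main loop: 8 bytes per iteration; the first 4 are folded into the state (little-endian).
        for (; i + 8 <= end; i += 8) {
            c ^= (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
                    | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
            c = T[7][c & 0xFF] ^ T[6][(c >>> 8) & 0xFF]
                    ^ T[5][(c >>> 16) & 0xFF] ^ T[4][c >>> 24]
                    ^ T[3][b[i + 4] & 0xFF] ^ T[2][b[i + 5] & 0xFF]
                    ^ T[1][b[i + 6] & 0xFF] ^ T[0][b[i + 7] & 0xFF];
        }
        // Tail: classic byte-at-a-time lookup for the remaining 0..7 bytes.
        for (; i < end; i++) {
            c = (c >>> 8) ^ T[0][(c ^ b[i]) & 0xFF];
        }
        return (~c) & 0xFFFFFFFFL; // final XOR, returned unsigned like CRC32.getValue()
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data);
        CRC32 jdk = new CRC32();
        jdk.update(data, 0, data.length);
        System.out.println(crc32(data, 0, data.length) == jdk.getValue()); // prints "true"
    }
}
```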

The parameter (byteSize) is the number of bytes being checksummed. The JMH code is at http://pastebin.com/y96EFwcL. I ran it on a Haswell quad-core MacBook Pro. I can tell the correct JDK is running because the setup method prints the java.version property.

jdk1.7.0_71

     Benchmark                              (byteSize)   Mode  Samples         Score         Error  Units
     o.a.c.t.m.Sample.Adler32Array                 128  thrpt        6   9484470.705 ± 3544496.362  ops/s
     o.a.c.t.m.Sample.Adler32Array                 512  thrpt        6   7553107.572 ± 2017788.822  ops/s
     o.a.c.t.m.Sample.Adler32Array                1024  thrpt        6   5925103.324 ± 1237581.263  ops/s
     o.a.c.t.m.Sample.Adler32Array             1048576  thrpt        6     12958.857 ±     313.405  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray           128  thrpt        6   8675457.907 ± 2920818.797  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray           512  thrpt        6   6906837.280 ±  949573.737  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray          1024  thrpt        6   5537421.656 ±  658220.086  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray       1048576  thrpt        6     13103.481 ±     400.833  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array           128  thrpt        6  11013067.959 ±  688587.910  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array           512  thrpt        6   2991944.703 ±   72920.216  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array          1024  thrpt        6   1516586.386 ±   68061.147  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array       1048576  thrpt        6      1483.413 ±      88.246  ops/s

jdk1.8.0_25

     Benchmark                              (byteSize)   Mode  Samples         Score          Error  Units
     o.a.c.t.m.Sample.Adler32Array                 128  thrpt        6   9216616.134 ±  4037583.644  ops/s
     o.a.c.t.m.Sample.Adler32Array                 512  thrpt        6   7470492.783 ±  2059216.459  ops/s
     o.a.c.t.m.Sample.Adler32Array                1024  thrpt        6   5792188.710 ±  1363845.066  ops/s
     o.a.c.t.m.Sample.Adler32Array             1048576  thrpt        6     12273.582 ±     1077.855  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray           128  thrpt        6  25619960.160 ± 34966219.926  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray           512  thrpt        6  13858414.869 ±  9113069.866  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray          1024  thrpt        6   8781575.118 ±  3524581.216  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray       1048576  thrpt        6     12631.141 ±     1340.355  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array           128  thrpt        6  10772800.588 ±   398939.396  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array           512  thrpt        6   2917829.545 ±    96841.740  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array          1024  thrpt        6   1541478.127 ±    27234.072  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array       1048576  thrpt        6      1526.471 ±       27.515  ops/s

jdk1.8.0_25 with -XX:-UseCLMUL -XX:-UseCRC32Intrinsics

     Benchmark                              (byteSize)   Mode  Samples         Score         Error  Units
     o.a.c.t.m.Sample.Adler32Array                 128  thrpt        6   9469461.434 ± 3770031.916  ops/s
     o.a.c.t.m.Sample.Adler32Array                 512  thrpt        6   7675500.241 ± 2044377.592  ops/s
     o.a.c.t.m.Sample.Adler32Array                1024  thrpt        6   5693849.558 ± 1477732.642  ops/s
     o.a.c.t.m.Sample.Adler32Array             1048576  thrpt        6     13110.626 ±     386.136  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray           128  thrpt        6   8850789.721 ± 3204385.552  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray           512  thrpt        6   7118956.038 ± 1004576.517  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray          1024  thrpt        6   5590757.571 ±  448188.497  ops/s
     o.a.c.t.m.Sample.CRC32OriginalArray       1048576  thrpt        6     13270.839 ±     598.086  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array           128  thrpt        6  11351558.878 ±  546114.618  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array           512  thrpt        6   3011605.270 ±  217000.560  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array          1024  thrpt        6   1529659.010 ±   70247.359  ops/s
     o.a.c.t.m.Sample.PureJavaCrc32Array       1048576  thrpt        6      1466.362 ±     211.293  ops/s
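
The scores are operations per second, and each operation checksums byteSize bytes, so bandwidth is score × byteSize. For example, 13,103 ops/s × 1,048,576 bytes is about 13.7 × 10^9 bytes/s, which is where the 13 GB/s figure comes from. A small helper doing that conversion for a few scores copied from the jdk1.7.0_71 run (the class name ScoreToBandwidth is hypothetical):

```java
public class ScoreToBandwidth {
    // bytes/second = (operations/second) * (bytes checksummed per operation)
    static double bytesPerSecond(double opsPerSecond, int byteSize) {
        return opsPerSecond * byteSize;
    }

    public static void main(String[] args) {
        // Scores copied from the jdk1.7.0_71 table.
        String[] names = {"CRC32OriginalArray", "PureJavaCrc32Array", "CRC32OriginalArray"};
        int[] sizes = {1048576, 1048576, 128};
        double[] scores = {13103.481, 1483.413, 8675457.907};
        for (int i = 0; i < names.length; i++) {
            double gbPerSec = bytesPerSecond(scores[i], sizes[i]) / 1e9; // decimal GB
            System.out.printf("%-20s %8d bytes: %6.2f GB/s%n", names[i], sizes[i], gbPerSec);
        }
    }
}
```

When reading the tables this way, note that for the 128-byte CRC32OriginalArray row on jdk1.8.0_25 the error (± 34,966,219.926) is larger than the score itself, so that row alone says little.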




Ariel Weisberg

Jan 25, 2015, 1:28:33 PM
to mechanica...@googlegroups.com
Hi,

Well, to wrap this up (for me, at least): I tested with a very simple benchmark on Linux, and I see the expected behavior:

JDK 8: CRC32 is fast, but slows down with -XX:-UseCLMUL -XX:-UseCRC32Intrinsics.
JDK 7: both implementations perform similarly.

The JDK 8 non-intrinsic implementation may have gotten slower; I didn't repeat the runs enough times to confirm.
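
The "very simple benchmark" itself isn't included in the thread, so the following is only a hypothetical sketch of that kind of harness (class and method names are made up). It prints java.version, as the JMH setup did, and times several rounds so run-to-run variation is visible, which is what confirming a small JDK 7 vs. JDK 8 non-intrinsic difference would need. Early rounds include interpretation and JIT compilation, and JMH remains the better tool for publishable numbers.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class SimpleCrc32Bench {
    // Written after timing so the JIT cannot discard the checksum work as dead code.
    static volatile long sink;

    /** Returns throughput in GB/s (10^9 bytes/s) for each timed round. */
    static double[] measure(byte[] data, int rounds, int itersPerRound) {
        CRC32 crc = new CRC32();
        double[] gbPerSec = new double[rounds];
        long acc = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < itersPerRound; i++) {
                crc.reset();
                crc.update(data, 0, data.length);
                acc += crc.getValue();
            }
            long elapsedNs = Math.max(1, System.nanoTime() - start);
            // Bytes per nanosecond is numerically equal to GB/s.
            gbPerSec[r] = (double) data.length * itersPerRound / elapsedNs;
        }
        sink = acc;
        return gbPerSec;
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version"));
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 1 << 20;
        byte[] data = new byte[size];
        new Random(1).nextBytes(data);
        double[] results = measure(data, 10, 500);
        for (int r = 0; r < results.length; r++) {
            // Compare the later, post-warmup rounds across JDKs and flag settings.
            System.out.printf("round %d: %.2f GB/s%n", r, results[r]);
        }
    }
}
```

It could be run as `java SimpleCrc32Bench 1048576` on each JDK, and on JDK 8 additionally with `-XX:-UseCLMUL -XX:-UseCRC32Intrinsics` before the class name.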

I repeated the same test on OS X, and -XX:-UseCLMUL -XX:-UseCRC32Intrinsics have no effect on JDK 8: the performance remains high. JDK 7 showed the same high performance as JDK 8, and this is with me invoking javac/java from the CLI by absolute path. The benchmark also prints the java.version property, which confirms that JDK 7 really is going that fast.
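
One way to check whether flags such as UseCLMUL and UseCRC32Intrinsics actually took effect in a given JVM is to query them from inside the process through HotSpot's com.sun.management.HotSpotDiagnosticMXBean. This is a sketch assuming a HotSpot-based JDK 7 or later; getVMOption throws IllegalArgumentException for a flag the running JVM does not define, which the code reports as "not defined". The class name ShowCrcFlags is hypothetical, and this only shows the flag values, it does not by itself explain the OS X result.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class ShowCrcFlags {
    /** Returns "value (origin)" for a HotSpot flag, or null if this JVM does not define it. */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption opt = hotspot.getVMOption(name);
            return opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the flag does not exist in this JVM build.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version"));
        for (String flag : new String[] {"UseCLMUL", "UseCRC32Intrinsics"}) {
            String v = flagValue(flag);
            System.out.println(flag + " = " + (v == null ? "not defined" : v));
        }
    }
}
```

A flag passed on the command line should report the origin VM_CREATION, so running this with and without `-XX:-UseCLMUL -XX:-UseCRC32Intrinsics` shows whether the options reached the JVM at all.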

Regards,
Ariel