Fastest way to parse data out of array of bytes?


Alan Kent

unread,
Jul 21, 2010, 10:18:01 AM7/21/10
to java...@googlegroups.com
I was wondering what the fastest way (most highly performant) was to
parse data structures serialized as an array of bytes. In my case it's
like a network packet (a true array of bytes where I need to peel off 1,
2, 4, and 8 byte integers, or variable length ASCII (8-bit) strings, etc.)

Note I am after the FASTEST way to do this in Java (and/or Scala). Is
it better to use the stream based classes, or is it better to do direct
array accesses and do bit shift operations and masks with 0xff etc (to
strip sign extension Java will do otherwise)? I suspect the stream
based approaches would be slower. Sample code that sneaks in:

byte b1 = buf[i++];
byte b2 = buf[i++];
byte b3 = buf[i++];
byte b4 = buf[i++];
// Must mask with 0xff or else sign extension will mess up the result.
// Java does not have unsigned bytes or ints!
int n = (b1 << 24) | ((b2 & 0xff) << 16) | ((b3 & 0xff) << 8) | (b4 & 0xff);

I was looking into array bounds checks, and what I found via Google
indicated that HotSpot leaves in array bounds checks, as there was only a
minor performance improvement found in practice. This led me to wonder
if there is a faster way to write the code, since I would be doing lots
of array accesses, each with a bounds check.

Just curious!

Thanks!
Alan

Steven Siebert

unread,
Jul 21, 2010, 11:42:32 AM7/21/10
to java...@googlegroups.com
Alan,

Just to make sure you get the correct answer for your situation: when you say a lot of accesses, do you mean read-only or r/w?  Is concurrency a concern?  You say you're looking for the fastest...but do you have any memory limitations to be concerned about?

Cheers,

Steve




Alexey Zinger

unread,
Jul 21, 2010, 3:26:52 PM7/21/10
to java...@googlegroups.com
Not to nitpick, but in your sample, you're better off using ++i instead of i++ in the interests of speed and memory.



Fabrizio Giudici

unread,
Jul 21, 2010, 3:38:29 PM7/21/10
to java...@googlegroups.com, Alexey Zinger


>
> Note I am after the FASTEST way to do this in Java (and/or Scala).
> Is it better to use the stream based classes, or is it better to
> do direct array accesses and do bit shift operations and masks
> with 0xff etc (to strip sign extension Java will do otherwise)? I
> suspect the stream based approaches would be slower.

I think you have to try and measure. I've faced the same problem with
image decoding. The Image I/O API contains an ImageInputStream which is
supposed to have some optimizations - even for bitwise, not only
bytewise, operations. When I first worked on it a few years ago, I
found that writing some specific classes for specific cases (e.g.
reading 12 or 16 bits) was faster - in other words, the library didn't
provide the fastest code. Some time later, I discovered that some
newer JRE was better and dropped some of my code because it had become
useless. So, it's a matter of trying, measuring and re-measuring for
each JRE update.


mikeb01

unread,
Jul 21, 2010, 4:58:57 PM7/21/10
to The Java Posse
If you care about speed, put together a good set of performance tests
to measure what you're doing.

On the specific example, you could avoid some of the intermediate
variables and a few implicit widening operations (however, HotSpot will
probably optimise this for you). Probably drop it into a static
method for a bit of reuse:

public static int toInt(byte[] buf, int offset) {
    return ((buf[offset] & 0xFF) << 24) + ((buf[offset + 1] & 0xFF) << 16) +
           ((buf[offset + 2] & 0xFF) << 8) + (buf[offset + 3] & 0xFF);
}

There are a whole bunch of these methods in java.io.Bits (package
protected) which may help with your implementation.

Mike.


Rob Wilson

unread,
Jul 22, 2010, 2:13:24 AM7/22/10
to The Java Posse
Are each of the 4 groups of 8 bits likely to be signed? If not, you
could perhaps 'or' together the first three, then perhaps only the
last needs to be 'anded' with FF?

I don't know where you're getting your data from, but wonder whether 3/4
of them are unsigned?

Certainly don't hold temporary variables inside the loop. The static
method sounds better, although I don't know how many invocations will
be made before it is inlined; perhaps you should copy the code inline
to be sure?

Cheers
Rob Wilson

Alan Kent

unread,
Jul 22, 2010, 3:53:24 AM7/22/10
to java...@googlegroups.com
Replying to lots of different emails in one (sorry if I got the
attribution messed up!)

On 22/07/2010 1:42 AM, Steven Siebert wrote:
> Just ensure you get the correct answer for your situation, when you
> say a lot of access, do you mean read-only or r/w? Concurrency a
> concern? You say you're looking for the fastest...but do you have any
> memory limitations to be concerned about?

Single threaded. Might pass different blobs off to different threads to
be processed, but each blob would be parsed by a single thread. So no
concurrency concerns. The result of decoding might be handed off to
other threads to process.

Best analogy is writing a server in Java receiving binary encoded
packets. Need to read a packet from a stream (socket), decode it,
do some work (one packet might be able to be spawned off into multiple
requests done by different threads), then formulate a response packet and
send it back. (I am not actually reading from a socket in my case - it's
coming from files.) But the primary interest is decoding a packet. Now
that you mention it, I am also interested in building up a new packet for
the response. But it would be either read and pull apart, or write and
build up a new one (not both at the same time).


Someone wrote about buf[i++] versus buf[++i]:
I would hope the JRE would be able to optimize things so there is no
real difference here? For buf[++i] I would have to start i at -1, which
feels a bit funny. A loop looking for the end of the packet would also
be a bit strange (while (i + 1 < buf.length) instead of while (i <
buf.length)).


Rob wrote:
> Are each of the 4 groups of 8 bits likely to be signed? if not, you

> could perhaps 'or' together the first three, then perhaps only the
> last needs to be 'anded' with FF?

For a 32-bit integer, since it's an array of byte (which is signed) you have to worry about the sign for 3 of the 4 bytes. The most significant byte you don't have to worry about, because the << 24 shifts the sign bits off the end of the integer (right fills with zeros). The << operator upcasts the byte to an int before the shift occurs, which is when the sign extension happens.
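
A tiny illustration of that upcast (my own made-up values, untested):

byte b = (byte) 0x80;               // -128 as a signed byte
int withoutMask = b << 16;          // 0xff800000 - the sign extension leaks into the high bits
int withMask = (b & 0xff) << 16;    // 0x00800000 - what we actually wanted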

> I don't know where your getting your data from, but wonder whether 3/4
> of them are unsigned?

It's an arbitrary 32-bit integer, so when sliced into bytes I have no guarantee whether the individual bytes have their high (sign) bit set or not.

> Certainly don't hold temporary variables inside the loop, the static
> method sounds better, although I don't know how many invocations will
> be made before it is in-lined; perhaps you should copy the code in-
> line to be sure?

Do you know that local variables inside a loop are slower? I would have assumed with all the cleverness of the JRE that would have been optimized out! I was hoping the &0xff would also be optimized out, with the generated assembler using unsigned byte-to-int conversions (rather than sign-extending ones).

Yes, I would rather use methods than repeat the code. I have to do the operation say a hundred times - I certainly don't want to repeat the code all over the place!


Someone wrote: java.io.Bits
java.io.Bits! I did not know of that one, thanks. Will have a look. Hmm, it seems to be doing pretty much the same thing as my code, but using offsets of +1, +2 etc instead of ++. Interesting... I never realized there was a >>> operator in Java! (Unsigned shift.) There is no <<< operator, but it's not the shift that is an issue anyway - it's the upcast from byte to int.
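
For the record, a quick example of the difference (my own values):

int x = -8;
int arithmetic = x >> 1;   // -4: >> copies the sign bit in from the left
int logical = x >>> 1;     // 0x7ffffffc: >>> fills with zeros instead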


Fabrizio wrote:
> So, it's a matter of trying, measuring and re-measuring for each JRE update.

Hmmm. That is exactly what I was hoping NOT to hear!!!


Thanks everyone for comments!
Alan

Fabrizio Giudici

unread,
Jul 22, 2010, 4:13:07 AM7/22/10
to java...@googlegroups.com, Alan Kent

On 7/22/10 09:53 , Alan Kent wrote:
>
> Fabrizio wrote:
>> So, it's a matter of trying, measuring and re-measuring for each
>> JRE update.
>
> Hmmm. That is exactly what I was hoping NOT to hear!!!
>
>

I know. :-) But one of the first things that I learned when I started
writing an image codec is that I needed performance testing
(automated, I mean). Then I understood I even needed continuous
performance testing, as I discovered that some apparently minor
refactorings of my code could have an impact on performance. And as
is usual for continuous practices, the point is to be aware of the
change as soon as possible, so you can track it down to the cause.

Forgot to say that my experience was also about very different
performance with different operating systems, but this wasn't due to
low-level code like the one we're discussing; rather, it was due to the
imaging libraries in the runtime, so it probably doesn't affect you.


Reinier Zwitserloot

unread,
Jul 22, 2010, 6:43:10 AM7/22/10
to The Java Posse
TL;DR: Your question cannot be answered. Those who are trying are
giving you bad advice. The only way to solve your issue is to run a
profiler on real world data.

More extensive answer:

Heh, nice. This is exactly why you shouldn't ask these questions.
Alexey Zinger's comment that i++ is slower than ++i is exactly the
kind of completely wrong micro optimization bullpuckey you get when
you do.

This is java, it's not C. The answer depends on use case, behaviour,
java version, hot spot compiler, OS, environment, alternate load, and
a few other things. Therefore, your original question: "What's the
fastest way to do X in java" is simply impossible to answer.

The right way to handle this situation is to measure a complete use
case. Do NOT micro benchmark. Run a real profiler, on real-world
representative data, in a real world situation (i.e. run a music
player and a webserver that's being actively pinged if that's the
situation that's likely when performance counts). As a practical
matter, and going mostly by gut instinct, most of the things you said
(such as, should I go by stream) don't really work; how do you think
that stream class reads bytes from the array in a way that avoids a
generated bounds check? It won't.

Also, if this is for Android, well, that's another beast altogether.
There's no actual bounds check in class files; the JVM inserts it when
running your class file. Android takes class files and rewrites them
into Dalvik opcodes. Who knows if Dalvik optimizes array bounds
checks? Also, when you say that some optimization such as bounds check
reduction wasn't added because it barely made a difference, then..
well, experts say it barely makes a difference. That's better advice
than any of us on this list could possibly give you!

jitesh dundas

unread,
Jul 22, 2010, 6:46:16 AM7/22/10
to java...@googlegroups.com
Please don't forget that the way you implement the method will also matter... rather than just Java and its functionality.
 
Regards,
jd

 

Viktor Klang

unread,
Jul 22, 2010, 7:04:23 AM7/22/10
to java...@googlegroups.com
For uber Java performance you want to eliminate all virtual calls and keep methods small (so they can be inlined and optimized further);
you also want to avoid volatile reads and writes, as well as branch misprediction, L2 cache thrashing, blocking IO and locks.






Roel Spilker

unread,
Jul 22, 2010, 7:11:03 AM7/22/10
to java...@googlegroups.com
Maybe we should stop trying to give advice, apart from "Benchmark it in a real world scenario". The virtual calls could (and probably would) still be inlined if the VM can determine that it's possible. If not now, then probably in the next VM update.
 
Roel



Fabrizio Giudici

unread,
Jul 22, 2010, 7:15:54 AM7/22/10
to java...@googlegroups.com, Roel Spilker

On 7/22/10 13:11 , Roel Spilker wrote:
> Maybe we should stop trying to give advice, apart from "Benchmark
> it in a real world scenario". The virtual calls could (and
> probably would) still be inlined if the VM can determine that it's
> possible. If not now, then probably in the next VM update.
>

Agreed. The point is that too many things are happening inside the VM,
and unless you're really a super expert (which means: you are Kirk
Pepperdine, one of the guys who write the C code inside the VM or at
least one who is able to read it) my impression is that any knowledge
about it is at least partially wrong or obsolete. Really, the least
risky strategy is to benchmark appropriately on a real world scenario.
Do that.


Roel Spilker

unread,
Jul 22, 2010, 7:16:25 AM7/22/10
to java...@googlegroups.com
And, Viktor, I do agree with most of your suggestions, but mainly for a different reason. I prefer to optimize towards readability.

For instance:
- Eliminate all virtual calls -> prefer composition over inheritance
- Avoid volatile reads and writes -> prefer immutable objects

That said, I would definitely stay away from thinking too much about branch misprediction.
 
Roel



Roel Spilker

unread,
Jul 22, 2010, 7:35:48 AM7/22/10
to java...@googlegroups.com
So: if you are worried about performance, make sure you use a state-of-the-art VM.


On Thu, Jul 22, 2010 at 1:11 PM, Roel Spilker <R.Sp...@topdesk.com> wrote:
Maybe we should stop trying to give advice, apart from "Benchmark it in a real world scenario". The virtual calls could (and probably would) still be inlined if the VM can determine that it's possible. If not now, then probably in the next VM update.

Absolutely, however, how many Java devs are developing their software for future VMs?
In the future perhaps the entire code will be after-compiled by an artificial intelligence in the cloud...

What I listed are things that _today_ affect performance (if we're talking about millions of operations per second), not a loop that gets completely unrolled.

 There is merit to your words, but I think we should take a minute to reflect on the VMs that are out there, running code, today.
A lot of them are still on Java 1.4 or 5.

Alan Kent

unread,
Jul 28, 2010, 12:56:05 AM7/28/10
to java...@googlegroups.com
In case anyone cares, did a bit of testing and found:

(1) Running in Eclipse using "Run" was about 3 times slower than running
"java" on command line (on my particular machine) on the same class
files. So yes, environment, JVM version etc make a BIG difference
straight up.

(2) If you want to write a class that reads from a stream, with a field
holding the current index being read from, and you want to read 4 bytes,
it's faster to update the index once than to update it 4 times.

class ByteBuf {
    private int i = 0;
    private byte[] buf = ...;

    int slowerReadInt32() {
        byte b1 = buf[i++];
        byte b2 = buf[i++];
        byte b3 = buf[i++];
        byte b4 = buf[i++];
        return (b1 << 24) | ((b2 & 0xff) << 16) | ((b3 & 0xff) << 8) | (b4 & 0xff);
    }

    int fasterReadInt32() {
        byte b1 = buf[i];
        byte b2 = buf[i + 1];
        byte b3 = buf[i + 2];
        byte b4 = buf[i + 3];
        i += 4;
        return (b1 << 24) | ((b2 & 0xff) << 16) | ((b3 & 0xff) << 8) | (b4 & 0xff);
    }
}

Upon reflection, of course it will be faster. The first method has to
update the field 'i' per increment in case there is an index-out-of-
bounds exception. The second only updates the field once.


Doing Google searches on Java vs C++ performance I found some Wikipedia
etc. pages, which I don't trust much. There were a few subjective-
sounding statements in there, and some comments talked about the new JIT
that might speed Java up. The only comments I almost trust were that Java
probably does new/delete faster than C++ but tends to consume more
memory, and that the Java JIT might get code to the same raw execution
speed as C++ (but 4 times slower is not unexpected).

So my next meaningless microbenchmark will be to try Scala with lazy
vals pulled from a byte[] (and so potentially avoiding the need for
converting bytes in a buffer into ints in the first place).

Alan

Fabrizio Giudici

unread,
Jul 28, 2010, 1:20:58 AM7/28/10
to java...@googlegroups.com, Alan Kent


On 7/28/10 06:56 , Alan Kent wrote:
> In case anyone cares, did a bit of testing and found:
>
> (1) Running in Eclipse using "Run" was about 3 times slower than
> running "java" on command line (on my particular machine) on the
> same class files. So yes, environment, JVM version etc make a BIG
> difference straight up.

This sounds pretty strange, indeed.


Casper Bang

unread,
Jul 28, 2010, 3:38:52 AM7/28/10
to The Java Posse
> Doing Google searches on Java vs C++ performance I found some Wikipedia
> etc pages, that I don't trust much.  There were a few subjective
> sounding statements in there, and some comments talked about the new JIT
> that might speed Java up.  The only comments I almost trust were Java
> probably does new/delete faster than C++ but tends to consume more
> memory, and Java JIT might get code to same raw execution speed to same
> as C++ (but 4 times slower is not unexpected).

There's a reason why some core parts of a system are done in a non-
managed language like C/C++; e.g. even Android ships with an NDK for
doing the heavy lifting. No matter how good a JIT can be, there's just
overhead associated with it - especially with the heavy method-based
JIT used by HotSpot. In essence we trade responsiveness, predictability
and memory for [eventual] speed. I would imagine that missing value
types in the JVM's core type system could also be a problem, regardless
of how carefully the developer tries to avoid boxing.

Christian Catchpole

unread,
Jul 28, 2010, 6:18:39 AM7/28/10
to The Java Posse
I've been avoiding this thread because, as much as I know about
optimisation and the JIT, I know there is probably more that I don't
know.

1. regardless of how fast the fasterReadInt32 method works on a
particular JVM, what is the ratio of this to the time spent operating
on that data?

2. if the read time really matters and you are moving lots of chunks
of memory around, look at NIO / Buffers.

3. yes, java can use more memory than a program in C++ can
theoretically use. but there are many many reasons you use a VM for
enterprise / server applications. so if performance matters that much,
optimise your C++ or assembler and welcome to a world of spending 10
times as much time getting the same stuff done (and then not sleeping
at night wondering if your server has come down with a general
protection fault)

4. having said that, VMs just get better and better and you don't have
to lift a finger (other than installing a new one).

5. as Casper mentions, there are some things that are suited to pre-
compiled code. I'm thinking things like media codecs and compression
algorithms as they seem to work on fixed, pre-allocated memory blocks.
and they can be "fail proof" as well (no out of memory) and somewhat
time predictable. but business applications have large, complex and
dynamic object models with lots of strings, lists, maps blah blah
blah. In this case, go the JVM.

6. Using "java" on the command line to start Eclipse may have used a
bunch of defaults that were not suitable. Max memory the obvious one.
So despite how much coolness is in the JVM, we still have the defaults
that bring things undone from the consumer's point of view.

7. A "linux" friend of mine doesn't like Java because a "Hello World"
uses so much memory... My response: Duh!

pforhan

unread,
Jul 28, 2010, 9:14:52 AM7/28/10
to The Java Posse
On Jul 27, 11:56 pm, Alan Kent <ALAN.J.K...@saic.com> wrote:
> Upon reflection of course it will be faster.  The first function has to
> update the field 'i' per increment in case there is an index out of
> bounds exception.  The second only updates the field once.

One quick thing: variable i should not be a field, it should be local
to the method. This class is not thread-safe with i being used in
this manner.

Anyway, it would be interesting to compare those results. I suspect
you'll see much less difference between the methods, as stack
references are always faster than field references.
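
Something like this is roughly what I mean (a quick sketch, method and
parameter names are mine, untested): no field at all, the caller owns
the cursor and keeps it on the stack.

static int readInt32(byte[] buf, int pos) {
    return (buf[pos] << 24) | ((buf[pos + 1] & 0xff) << 16)
         | ((buf[pos + 2] & 0xff) << 8) | (buf[pos + 3] & 0xff);
}

// caller:
int pos = 0;
int first = readInt32(buf, pos);
pos += 4;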

Pat.

Alexey Zinger

unread,
Jul 28, 2010, 12:26:44 PM7/28/10
to java...@googlegroups.com
Regarding point 2, saw a reference to the following article on Slashdot today claiming java.io to be faster than java.nio: http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO  Don't have much experience with NIO and I've never been in a position of having to optimize every last bit out of an IO algorithm, so not sure quite what to make of this.
 
Alexey




Alexey Zinger

unread,
Jul 28, 2010, 12:28:26 PM7/28/10
to java...@googlegroups.com
Have you tried inlining those methods in your benchmark (no field references, as was mentioned -- just local variables)?
 
Alexey




Kirk

unread,
Jul 28, 2010, 12:48:35 PM7/28/10
to java...@googlegroups.com
On Jul 28, 2010, at 6:28 PM, Alexey Zinger wrote:

Have you tried inlining those methods in your benchmark (no field references, as was mentioned -- just local variables).

HotSpot aggressively inlines. No need to inline manually, but you must warm up the bench. Variables will be cached regardless of scoping.

Eclipse (or any IDE, for that matter) instruments the JVM that it launches so it can control it. It does have an effect on performance, but 3x isn't normal.

I doubt that there is any real difference in the performance of the two methods in a single-threaded bench. In fact this code is not thread safe, so... the value will be hoisted into a register and it will sit there. And I doubt there is much of a difference, if any, in the performance of this code written in C++ or Java.

Regards,
Kirk



Reinier Zwitserloot

unread,
Jul 28, 2010, 2:01:59 PM7/28/10
to The Java Posse
This is not a valid performance analysis.

You didn't mention "and that's when the hotspot compiler kicked in"
even once, for example.

Casper Bang

unread,
Jul 28, 2010, 2:02:43 PM7/28/10
to The Java Posse
> I was looking into array bounds checks, and what I found via Google
> indicated that hotspot leaves in array bounds checks as there was only a
> minor performance improvement found in practice.  This lead me to wonder
> if there is a faster way to do the code since I would be doing lots of
> array accesses, each with a bounds check.

Apropos avoiding array bounds checks, you could also venture into how
NIO is implemented and get a bit more low-level. Something along the
lines of this (cooked-up UDP packet decoding as a proof of concept):

byte[] datagram = new byte[] { 0x50, 0x00, (byte) 0xbb, 0x1,
                               0xC, 0x0, 0x0, 0x0,
                               0x74, 0x65, 0x73, 0x74 };

Field field = Unsafe.class.getDeclaredField("theUnsafe");
field.setAccessible(true);
Unsafe $ = (Unsafe) field.get(null);

long shortByteSize = Short.SIZE >> 3;
long offset = $.arrayBaseOffset(byte[].class);

short source = $.getShort(datagram, offset);
short dest = $.getShort(datagram, offset += shortByteSize);
short length = $.getShort(datagram, offset += shortByteSize);
short checksum = $.getShort(datagram, offset += shortByteSize);
String payload = new String(copyOfRange(datagram, UDP_HEADER_SIZE, length));

out.println("UDP packet content");
out.println("Source: " + source);
out.println("Destination: " + dest);
out.println("Length: " + length);
out.println("Checksum: " + checksum);
out.println("Payload: " + payload);

It's hard to know without your benchmarking suite, but it strikes me
that the above, though unsafe, has a good chance of mapping to
efficient native code.

/Casper

Fabrizio Giudici

unread,
Jul 28, 2010, 2:12:32 PM7/28/10
to java...@googlegroups.com, Casper Bang


On 7/28/10 20:02 , Casper Bang wrote:
>
> It's hard to know without your benchmarking suite, but it strikes
> me that the above, though unsafe, has a good chance of mapping to
> efficient native code.

BTW - if I'm not wrong, a few months ago Kohsuke blogged about
something related to performance, and by using some tool he
attached the generated native code (of course, at a given stage of the
run)... Am I wrong? Does such a tool exist? Or did I just dream about it?



Stuart McCulloch

unread,
Jul 28, 2010, 3:37:37 PM7/28/10
to java...@googlegroups.com
On Jul 29, 2010, at 2:12, Fabrizio Giudici <fabrizio...@tidalwave.it> wrote:

>
>
> On 7/28/10 20:02 , Casper Bang wrote:
>>
>> It's hard to know without your benchmarking suite, but it strikes
>> me that the above, though unsafe, has a good chance of mapping to
>> efficient native code.
> BTW - if I'm not wrong, a few months ago Kohsuke blogged about
> something related with the performance and by using some tool he
> attached the generated native code (of course, at a given stage of the
> run)... Am I wrong? Does such a tool exist? Or did I just dream
> about it?

http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html ?


Fabrizio Giudici

unread,
Jul 28, 2010, 4:15:49 PM7/28/10
to java...@googlegroups.com, Stuart McCulloch


On 7/28/10 21:37 , Stuart McCulloch wrote:
> http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html

That's it.

PS My sense of time is getting completely distorted. I believed it was
only a few months ago, and it was more than two years!


Reinier Zwitserloot

unread,
Jul 29, 2010, 1:05:54 AM7/29/10
to The Java Posse
The link just seems to take me to his frontpage.

This is a good start, though: the -XX:+PrintCompilation option will
print the name of each method as it gets JIT-compiled. It doesn't
actually show what native code is produced, but at least you can work
on your benchmark a little bit more.

That's not to say what you're doing is at all useful, really. You
can't microbenchmark code for the JVM. End of discussion. Stop doing
it.


Alan Kent

unread,
Jul 29, 2010, 1:47:03 AM7/29/10
to java...@googlegroups.com
On 29/07/2010 3:05 PM, Reinier Zwitserloot wrote:
> That's not to say what you're doing is at all useful, really. You
> can't microbenchmark code for the JVM. End of discussion. Stop doing
> it.
>

Agreed completely. Microbenchmarks are interesting, but dangerous to
rely on. It was a loop run 100,000,000 times so I would have hoped the
JIT would have jumped in, but it is one small part of an overall problem.

I have a choice between writing something in C++ or Java (maybe Scala if
I can swing it). My take on the state of the universe is C++ is safer
from a performance perspective (for my project which has high throughput
requirements), Java is safer from a speed of code development and
maintainability perspective. I don't need lots of memory allocations,
so the GC benefits of Java are not relevant. Java also has issues being
a less friendly citizen on a box shared with other processes (more
memory hungry).


But thanks Casper, sun.misc.Unsafe was an interesting find that I never
knew about!

Thanks to everyone else for contributions. Interesting stuff.
Alan

Fabrizio Giudici

unread,
Jul 29, 2010, 2:03:17 AM7/29/10
to java...@googlegroups.com, Alan Kent


On 7/29/10 07:47 , Alan Kent wrote:
> On 29/07/2010 3:05 PM, Reinier Zwitserloot wrote:
>> That's not to say what you're doing is at all useful, really.
>> You can't microbenchmark code for the JVM. End of discussion.
>> Stop doing it.
>>
>
> Agreed completely. Microbenchmarks are interesting, but dangerous
> to rely on. It was a loop run 100,000,000 times so I would have
> hoped the JIT would have jumped in, but it is one small part of an
> overall problem.

Agreed on the uselessness of microbenchmarking. My point with the
-XX:+Print... stuff is that it should at least settle some common doubts,
such as "the code should be inlined" etc., that keep recurring and never
seem to be resolved by discussion.


Kirk

unread,
Jul 29, 2010, 2:27:23 AM7/29/10
to java...@googlegroups.com
Microbenchmarks are as useful as any other type of benchmarking. The problem is, they are very very very difficult to get right. You need to do a lot of work to validate the results you get from any benchmark, large or small. I've missed the beginning of the conversation so I never saw all of the code, but then maybe it was never published. My instinct tells me that this is yet another naive attempt at benchmarking.... so be it, this is how we learn. BTW, even the experts get it wrong.. for example, SpecJMS is a benchmark for JMS implementations. It's a great way to measure the disk speed on your machines ;-) IOW, it's yet another busted benchmark published by an organization that is dedicated to providing our industry with a set of standard benchmarks. Lesson learned: know where your bottleneck is and understand if it's where it should be, 'cos you're always going to have one.

I can show you plenty of problems where the language of implementation can never be an issue w.r.t. performance. Choosing a language based on "performance safeness" is the ultimate premature optimization. There are just so many other factors that need to be considered.

Regards,
Kirk

Casper Bang

unread,
Jul 29, 2010, 3:03:15 AM7/29/10
to The Java Posse
> My instinct tells me that this is yet another naive attempt at benchmarking....

But if Alan identified a performance trouble area, then surely he
can't do anything other than what he is doing... trial-and-error under
simulated conditions (acceptance tests). We can't all be Brian Goetz,
walking around with intimate knowledge of the internals of the JVM.
This is where the poor interoperability story of the JVM comes in;
even if you choose to step out of the bounds of the managed
environment for a specific corner case, in favor of determinism,
you'll have a very hard time doing it. You can kinda do it with JNI
and sun.misc.Unsafe, but it's not portable. I like to think that if it
were possible, we'd see Java within a wider spectrum of applications
(closer to systems programming), but I also know Sun has always been
very religious about this... there is no spoon.

Alan Kent

unread,
Jul 29, 2010, 3:07:03 AM7/29/10
to java...@googlegroups.com
On 29/07/2010 4:27 PM, Kirk wrote:
> Microbenchmarks are as useful as any other type of benchmarking. The problem is, they are very very very difficult to get right. You need to do a lot of work to validate the results you get from any benchmark, large or small. I've missed the beginning of the conversation so I never saw all of the code but then maybe it was never published.

Just a bit of back-fill (happy for this thread to die off now) - I had
some raw C-struct-like data in an array of bytes. I am trying to put
forward a case for using Java (or maybe Scala) instead of C/C++ in a
project. Performance is critical. In C/C++, one argument is that you can
cast the pointer to the array of bytes and voila! you can access all
the ints etc. Very performant. Obviously you cannot do this in Java, so
I was trying to work out how close I could get Java, to squash this
argument (if possible). Obviously the overall application makes a big
difference too. Right now C++ is safer from a performance perspective,
Java safer from a code maintainability perspective. There is a hard
performance requirement on the project (harder than the code
maintainability requirement).

Thanks
Alan

Christian Catchpole

unread,
Jul 29, 2010, 3:45:26 AM7/29/10
to The Java Posse
Sounds like a case for NIO.
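
For example, something along these lines with java.nio.ByteBuffer (a
rough sketch, assuming big-endian data already sitting in a byte[]
called buf):

ByteBuffer bb = ByteBuffer.wrap(buf);   // no copy, just wraps the existing array
int n = bb.getInt();                    // reads 4 bytes, big-endian by default
short s = bb.getShort();                // reads 2 bytes
byte b = bb.get();                      // reads 1 byte; the position advances automatically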

Kirk

unread,
Jul 29, 2010, 4:15:07 AM7/29/10
to java...@googlegroups.com

On Jul 29, 2010, at 9:03 AM, Casper Bang wrote:

>> My instinct tells me that this is yet another naive attempt at benchmarking....
>
> But if Alan identified a performance trouble area, then surely he
> can't do anything other than what he is doing... trial-and-error under
> simulated conditions (acceptance tests). We can't all be Brian Goetz,
> walking around with intimate knowledge of the internals of the JVM.

Nonsense.. :-) You may not understand what is going on but that doesn't mean it's not deterministic, very very deterministic. One doesn't need to have all the internal details of the JVM to write a decent benchmark. You just need to know a few simple concepts. There is a profiler that builds an execution model of our application. When a portion of the model is properly developed, the profiler tells the JIT to compile the code. In the process, it looks for common patterns in the execution model that it can optimize. Optimizations often involve a reorganization of the code. Do I know what those patterns are? Nope! Do I care? I might but.... often I don't. I just want it to happen and I want to make sure that I'm making measurements when it's all done. But that's common with every bench... I want to work through the startup phase before I measure. To do that I can tell the JIT to log compilation. If the JIT stops logging, you're finished warming up and it's time to measure. If I *need* to know how the JIT has treated my code, I'll ask it to dump assembler. If you don't know how to do these things, you need to investigate and learn (like Brian has) or you simply shouldn't be benchmarking. And that's ok, benchmarking is not for everyone. I don't write GUI code and probably never should ;-)
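
For what it's worth, the rough shape I have in mind is something like this (class name, buffer size and iteration counts are all made up; run it with -XX:+PrintCompilation and only trust the timing once the compilation log has gone quiet):

public class DecodeBench {
    public static void main(String[] args) {
        byte[] buf = new byte[64 * 1024];
        long sink = 0;
        // warm-up phase: give the JIT time to profile and compile decodeAll()
        for (int warm = 0; warm < 20000; warm++) {
            sink += decodeAll(buf);
        }
        // measured phase
        long start = System.nanoTime();
        for (int rep = 0; rep < 1000; rep++) {
            sink += decodeAll(buf);
        }
        long elapsed = System.nanoTime() - start;
        // print sink so the work cannot be optimized away
        System.out.println(elapsed + " ns (sink=" + sink + ")");
    }

    static long decodeAll(byte[] buf) {
        long total = 0;
        for (int i = 0; i + 3 < buf.length; i += 4) {
            total += (buf[i] << 24) | ((buf[i + 1] & 0xff) << 16)
                   | ((buf[i + 2] & 0xff) << 8) | (buf[i + 3] & 0xff);
        }
        return total;
    }
}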

I wouldn't use Unsafe but for corner cases in specific instances. However one could use AtomicReference which uses Unsafe or the equivalent in other environments. But that's typically not so easy so you're right, there is no spoon.

Regards,
Kirk

Kirk

unread,
Jul 29, 2010, 4:19:10 AM7/29/10
to java...@googlegroups.com
Sorry, I just don't buy the standard line that C/C++ is safe from a performance perspective. Array access in Java will be every bit as fast. Range checking will most likely be jit'ed out of the code. Direct access will most likely be jit'ed into the code....

Regards,
Kirk

Ben Schulz

unread,
Jul 29, 2010, 4:45:23 AM7/29/10
to The Java Posse
On 29 Jul., 10:15, Kirk <kirk.pepperd...@gmail.com> wrote:
> Nonsense.. :-) You may not understand what is going on but that doesn't mean it's not deterministic, very very deterministic.
A small excerpt from Josh Bloch's "Mind the Semantic Gap"[1]:
> [W]hen it does come time to optimize, the process is greatly complicated by the semantic gap. Consider this: Suppose you carefully write a well-designed microbenchmark that does everything right (e.g., warms up the VM, ensures that computation is not optimized away, times a sufficient amount of computation, does multiple timings to ensure repeatability). You run the benchmark, and see that after warmup, every run takes nearly the same amount of time. Happiness? You run the benchmark again, and again every run takes the same amount of time, but it's a different amount! You run the program twenty times, and see the results clustering into several groups, but always consistent within a program-run. What is going on?
>
> In modern VMs such as HotSpot, the task of deciding what to inline and when to inline it is performed by a background thread (at runtime). This process is called "compilation planning." Because it's multithreaded, it's nondeterministic.

[1] http://wiki.jvmlangsummit.com/MindTheSemanticGap

With kind regards
Ben

Roel Spilker

unread,
Jul 29, 2010, 4:55:40 AM7/29/10
to java...@googlegroups.com
And if array access boundary checks are not jitted out, I'm guessing most modern processors will use branch prediction (http://en.wikipedia.org/wiki/Branch_predictor) to execute the array access at the same time as checking the bounds. Since the array index is usually within bounds, there is little chance of misprediction and little overhead in leaving the boundary checks in.

Roel


Kirk

unread,
Jul 29, 2010, 4:58:03 AM7/29/10
to java...@googlegroups.com

This isn't my experience.. I've found HotSpot to be very predictable. Case in point, I recently helped a client diagnose a luke-warm method problem related to hotspot compilation. We were able to completely predict if and when a key method would be compiled along with how it would be compiled. The effect that Josh is talking about is common when the execution profile changes over time. HotSpot decompiles, remeasures and then recompiles.

One other point, microbenchmarking almost always involves creating code that confuses HotSpot so that it cannot apply the optimizations that would normally be applied. IOWs, it's not production code and the effects in production may be different than those found in the benchmark.

To Josh's point, figuring out when these changes may happen isn't easy but it's not impossible.

Regards,
Kirk

Casper Bang

unread,
Jul 29, 2010, 5:37:40 AM7/29/10
to The Java Posse
I'll still claim it's non-deterministic; you can not possibly know
certain aspects of the hardware (word length, branch predictor, hyper-
threading/pipeline-swap support, CAS support etc.) and software (which
gen an object is in, what will be inlined, which compacting strategy,
where there are safe-points, context switches and write barriers).

Don't get me wrong, that's the wonder of the JVM/JIT. But sometimes it
would be nice to be able to just rely on AOT compilation so you'd
truly know what you are getting. Btw. I never really understood why
the JVM doesn't just cache an already JIT'ed memory image which can
then just be loaded next time without verifier and profiler running,
especially when running stuff in client-mode.

/Casper

Fabrizio Giudici

unread,
Jul 29, 2010, 6:05:10 AM7/29/10
to java...@googlegroups.com, Casper Bang


On 7/29/10 11:37 , Casper Bang wrote:
> I'll still claim it's non-deterministic, you can not possibly know
> certain aspects of the hardware (word length, branch predictor,
> hyper- treading/pipeline-swap support, CAS-support etc.) and
> software (which gen an object is in, what will be inlined, which
> compacting strategy, where there are safe-points, context-switches
> and write-barriers).
>
> Don't get me wrong, that's the wonder of the JVM/JIT. But
> sometimes it would be nice to be able to just rely on AOT
> compilation so you'd truly know what you are getting. Btw. I never
> really understood why the JVM doesn't just cache an already JIT'ed
> memory image which can then just be loaded next time without
> verifier and profiler running, especially when running stuff in
> client-mode.

Let me give my interpretation, as a total non-expert on the JIT. Even if
it's deterministic, it is so in controlled conditions (the microbenchmark
context). In the real world, a lot of different things will happen at
the same time, and the determinism goes away. Still, I find that the JIT
native code dump could be useful for guessing the upper bound, in some
cases. I mean, if you have to compute an FFT and are discussing with a
C/C++ guy, it would be nice to see the dump. If it's comparable with
the C/C++ code, you could tell the C/C++ guys that, at least in
optimal cases, there are no big differences (this is already a strong
point in a discussion, as many people completely lacking the JIT
culture still don't get the point). At this point a broader-horizon
benchmark makes sense. If the code is much worse than the C/C++ code,
you already know that you'll have a possible performance hit in that
section, and it would probably make sense to evaluate, together with all
the other requirements and constraints, whether to use native code or
such. I would run a broader-horizon benchmark all the same, but probably
spend less time with it. I mean, I expect that there will be cases in
which the code is good and others in which it isn't, and that could be
an architectural hint. Furthermore, it would be interesting to repeat
the same comparison after an upgrade of the runtime, e.g. when they say
that there are JIT improvements. Maybe you discover that the code in the
optimistic scenario has improved since the past and you can change your
mind.


Alan Kent

unread,
Jul 29, 2010, 7:46:08 PM7/29/10
to java...@googlegroups.com
On 29/07/2010 6:19 PM, Kirk wrote:
> Sorry, I just don't buy the standard line that C/C++ is a safe from a performance perspective. Array access in Java will be every bit as fast. Range checking will most likely be jit'ed out of the code. Direct access will most likely be jit'ed into the code....
>
> Regards,
> Kirk
>

I can only base it on one personal mid-sized program that I have done
myself (involving lots of XML parsing in Java and C++) which went around
10 times faster in C++ (a year or two back), and articles I can find
doing Google searches. Anyone who has concrete info, that would be great.
But it really needs the same program implemented in Java and C++ to do
direct comparisons. Some of the articles have so much subjective-
sounding text in them I simply do not believe them. (One more
realistic one involved, I think, Quake being ported to Java and
performing just as well - but I don't know if most of the time is really
spent in the GPU rather than in Java itself.)

Based on personal experience (10+ years of C++ programming and 5+ years
of Java programming), the discussions I have had writing performant C++
code involve talking about memory alignment, cache lines, avoiding
memory copies, templating, inlining, looking at the resultant assembler
etc. You can write multiple classes in C++, then use them to build up a
more complex data structure where the whole structure takes a single
malloc to create. In Java, if you use multiple classes you get multiple
memory allocations (one per object instance). In C++ I can write an
array of classes (or structs) and all the objects are inlined in the
array - in Java I have to have an array of references to objects, with a
new object for each value in the array. In writing Java you don't have
the sort of control as in C++. However in Java memory management is
cheaper (if you have the same number of mallocs!). In one large
(multi-million line C++ code) multi-threaded program, we found changing
the memory allocation library had a 20% difference (or more) in overall
performance. No other change to the C++ code - just link in a different
malloc library and you get a major difference in performance. We have had to
worry about things like which threads data structures were allocated
from as the malloc library had a pool per thread. If a different thread
ends up doing the frees, you end up with lots more lock contention in
the malloc library.

I don't want to get carried away here (it's all been said before and I
did not mean to start Yet Another Language War), but I have read on the
web numerous opinions (not much evidence) saying Java can generate code
around the same performance as C++ code. I have never seen anything I
trust saying it can be much faster. I have heard of (and experienced)
cases where it's definitely much slower. I have heard many people I
trust in different forums all say C++ code executes faster - use it when
you want to control performance. It's backed up by personal experience.
I have not heard of significant sized projects where Java saved the day
over C++ in terms of performance (that are backed by believable
evidence). What I do believe is Java is much more productive for
programmers, pretty good in performance, and does have harder to measure
benefits in the more modern garbage collectors that only come up when
you have a large running system. Talk to our system admins about Java
and they want to know if anything else needs to run on the same box as
if you have a few Java processes memory consumption goes through the
roof (compared to equivalent C/C++ programs) - making it harder to share
a box without problems. Not trying to be argumentative here, but I have
not seen any evidence that can change my mental model of Java = easier
to write and maintain, C++ = higher performance. (I guess I should add
C# = Microsoft.)

A final word on microbenchmarks: I know their limitations (I have done
lots of performance analysis over the years), but in the words of
Charles Babbage (http://en.wikipedia.org/wiki/Charles_Babbage): "Errors
using inadequate data are much less than those using no data at all".

Thanks all!
Alan

Reinier Zwitserloot

unread,
Jul 30, 2010, 3:19:48 AM7/30/10
to The Java Posse
nio is slower than io.


jitesh dundas

unread,
Jul 30, 2010, 9:37:19 AM7/30/10
to java...@googlegroups.com
RMI and other advanced memory concepts in Java might interest you.

Your ability to measure the performance of C/C++ and Java integration
might also be worth looking at.

Which is the best way to implement pointer-like functionality in pure Java?

I like this concept, but Java is not meant for all this... it's more
distributed & network oriented rather than aimed at machine/memory
handling.

jd

Kevin Wright

unread,
Jul 30, 2010, 9:51:10 AM7/30/10
to java...@googlegroups.com
For Raw throughput, on a single thread, with no blocking involved, nio is slower than io

But for highly concurrent multi-threaded applications with blocking, connection pooling, concern for both throughput and latency, etc...
You really have to clarify how you're defining "slower" :)

Unless you're talking about memory-mapped nio filechannels, that's often faster than the alternatives, even in single-threaded operation.
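
As a rough sketch of what I mean (the file name and what you do with the mapped buffer are just made up for illustration):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

static int firstIntOf(String path) throws java.io.IOException {
    RandomAccessFile raf = new RandomAccessFile(path, "r");
    try {
        FileChannel ch = raf.getChannel();
        // map the whole file; reads go straight against the page cache, no copy into a byte[]
        MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        return map.getInt();   // big-endian by default
    } finally {
        raf.close();
    }
}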







Stuart McCulloch

unread,
Jul 30, 2010, 9:52:09 AM7/30/10
to java...@googlegroups.com
On 29 July 2010 17:37, Casper Bang <caspe...@gmail.com> wrote:
I'll still claim it's non-deterministic, you can not possibly know
certain aspects of the hardware (word length, branch predictor, hyper-
treading/pipeline-swap support, CAS-support etc.) and software (which
gen an object is in, what will be inlined, which compacting strategy,
where there are safe-points, context-switches and write-barriers).

Don't get me wrong, that's the wonder of the JVM/JIT. But sometimes it
would be nice to be able to just rely on AOT compilation so you'd
truly know what you are getting. Btw. I never really understood why
the JVM doesn't just cache an already JIT'ed memory image which can
then just be loaded next time without verifier and profiler running,
especially when running stuff in client-mode.
AOT compilation has actually been around for several years in specialized JVMs, but has not yet gone mainstream

some of the major issues are security, maintaining the cache/versioning, and runtime linking (sorting out offsets, etc.)







--
Cheers, Stuart

jitesh dundas

unread,
Jul 30, 2010, 11:19:04 AM7/30/10
to java...@googlegroups.com

Kirk

unread,
Jul 30, 2010, 12:02:02 PM7/30/10
to java...@googlegroups.com
Agreed, yet another language war isn't useful. I've written plenty of code for supercomputing environments in which C/C++ was the primary choice. In that environment I did exactly what you seem to be describing: I altered my coding style to fit the hardware. Teams that are writing low-latency systems (trading) in Java also do the same. They alter their coding style to fit the JVM/hardware platform they are deploying to. Although we've had more than a decade of rhetoric admonishing the practice, I personally see nothing wrong with that. In those cases the resulting systems are very competitive vs. their C++ counterparts. This isn't to say that Java should be used for all problems. Startup time is a problem, for example, so short-running applications...

As for microbenchmarks, I've seen some really bad stuff done in the name of microbenchmarking, and so I'm not sure that Mr. Babbage's great quote applies. He lived in a much different world than we do today ;-)

Regards,
Kirk

Steven Herod

unread,
Jul 31, 2010, 5:45:18 AM7/31/10
to The Java Posse
So, can I ask, do you actually *have* a performance problem right now?

What's the size of this problem? (Things take 1 minute when you need them
in 20 seconds?)
