Java generated code: faster equals() and hashCode()

848 views
Skip to first unread message

Daniel Augustin Pavel

unread,
Sep 25, 2009, 9:41:31 AM9/25/09
to prot...@googlegroups.com
Hello,

I've worked a bit on improving the perfomance of equals() and
hashCode() for the generated message clases.

The default implementations (from AbstractMessage) are (relatively)
slow; equals() consumes additional memory by allocating four throwaway
Maps per call, and hashCode() two Maps per call.
This might not be relevant for many people, but in my case doing stuff
like List.contains() is severly impacted.
The above does not apply to LITE messages from what I can tell, they
just use the standard Object implementation of these methods (which I
find quite inappropiate for the kind of data protobuf objects are
supposed to represent).

My implementation of equals() compares the two messages
field-by-field. In tests, for equal objects it is about an order of
magnitude faster than the default.

The hashCode() value is cached (the objects are supposed to be
immutable anyway), speeding up equals() in the not-equal case by
another order of magnitude.
For LITE messages, which don't use the AbstractMessage
implementations, the hashCode is computed by going through all the
fields.

The only downside is that the size of the generated classes increases
(additional bytecode), proportional to the number of fields in the
message.

Would love to hear feedback on the patch -- anyone finding it useful /
appropiate?

Cheers,
-pwr

P.S. The patch is for protobuf-2.2.0.

--
Time is an illusion. Lunchtime doubly so.
-- Douglas Adams

improved_equals.diff

Brice Figureau

unread,
Sep 25, 2009, 11:12:20 AM9/25/09
to Daniel Augustin Pavel, prot...@googlegroups.com
Hi,

On Fri, 2009-09-25 at 16:41 +0300, Daniel Augustin Pavel wrote:
> I've worked a bit on improving the perfomance of equals() and
> hashCode() for the generated message clases.
>

> [snipped]


>
> My implementation of equals() compares the two messages
> field-by-field. In tests, for equal objects it is about an order of
> magnitude faster than the default.
>
> The hashCode() value is cached (the objects are supposed to be
> immutable anyway), speeding up equals() in the not-equal case by
> another order of magnitude.

I think it would be safer to do this:
+ printer->Print(
+ "if (otherObject == this) return true;\n"
+ "if (!(otherObject instanceof $classname$)) return false;\n"
+ "if (hashCode() != otherObject.hashCode()) return false;\n"
+ "$classname$ other = ($classname$) otherObject;\n",
+ "classname", descriptor_->name());

(I reversed the hashCode test and the instanceof test)
This way if I happen to pbObject.equals(null) it won't throw a
NullPointerException, but just say they're not equals (which they are,
right?)

> For LITE messages, which don't use the AbstractMessage
> implementations, the hashCode is computed by going through all the
> fields.
>
> The only downside is that the size of the generated classes increases
> (additional bytecode), proportional to the number of fields in the
> message.
>
> Would love to hear feedback on the patch -- anyone finding it useful /
> appropiate?

This is a good addition. I really like it.
It'd be really great if it would be merged in the next revision.
--
Brice Figureau
My Blog: http://www.masterzen.fr/

Daniel Augustin Pavel

unread,
Sep 25, 2009, 12:07:05 PM9/25/09
to prot...@googlegroups.com
Aaand forgot to reply all. Apologies.

Cheers,
-pwr

---------- Forwarded message ----------
From: Daniel Augustin Pavel <daniel...@gmail.com>
Date: Fri, Sep 25, 2009 at 19:04
Subject: Re: Java generated code: faster equals() and hashCode()
To: Brice Figureau <bric...@daysofwonder.com>


On Fri, Sep 25, 2009 at 18:12, Brice Figureau <bric...@daysofwonder.com> wrote:
> I think it would be safer to do this:
> +  printer->Print(
> +    "if (otherObject == this) return true;\n"
> +    "if (!(otherObject instanceof $classname$)) return false;\n"
> +    "if (hashCode() != otherObject.hashCode()) return false;\n"
> +    "$classname$ other = ($classname$) otherObject;\n",
> +    "classname", descriptor_->name());
>
> (I reversed the hashCode test and the instanceof test)
> This way if I happen to pbObject.equals(null) it won't throw a
> NullPointerException, but just say they're not equals (which they are,
> right?)

Quite right.
I was under the impression that most equals() implementation happily
throw NPE when comparinga against a null, so did not think much of it.
 Checked now more closely a few classes in the JDK, and I was
obviously wrong.

I've attached the fixed patch.

Cheers,
-pwr

improved_equals_2.diff

Kenton Varda

unread,
Sep 25, 2009, 1:02:07 PM9/25/09
to Daniel Augustin Pavel, prot...@googlegroups.com
On Fri, Sep 25, 2009 at 6:41 AM, Daniel Augustin Pavel <daniel...@gmail.com> wrote:
Hello,

I've worked a bit on improving the perfomance of equals() and
hashCode() for the generated message clases.

Cool!

Unfortunately, increasing the generated code size really is a big problem.  Protobuf generated code is *huge*, and we're always looking for ways to reduce it.  We made a conscious decision not to implement equals() and hashCode() in generated code because, in our experience, these methods are rarely actually desired outside of tests, so the benefit of reducing the generated code size outweighs the cost of these methods being slow.

Can you tell us a bit about your use case?  Again, in my experience, people who are comparing message objects by content outside of tests are almost always doing something they don't need to be (usually they actually want to compare by identity instead).  But, I'm willing to be proven wrong.

BTW, another way you could possibly speed up equals() and hashCode() is to serialize the objects and compare raw bytes.  This can be done pretty easily in a wrapper class, without modifying the protobuf implementation.  You could also implement your hash-caching trick in a wrapper class, which alone could probably make a huge difference for List.contains().  Or we could perhaps implement the hash caching in the official implementation, since it wouldn't require new generated code.
 
The above does not apply to LITE messages from what I can tell, they
just use the standard Object implementation of these methods (which I
find quite inappropiate for the kind of data protobuf objects are
supposed to represent).

The goal of "lite" messages is to shed features in order to reduce code size.  equals() and hashCode() were two of the casualties.  Since lite messages don't support reflection, our options were either to generate more code (bad -- lite users are exactly the users who don't want more code) or not implement them.

David Yu

unread,
Sep 25, 2009, 1:56:30 PM9/25/09
to Kenton Varda, Daniel Augustin Pavel, prot...@googlegroups.com
I do agree that this should not be applied for LITE_RUNTIME because of the lite runtime goal.
However an extra option could be added where these equals() and hashCode() are implemented.
LITE_RUNTIME_EXTRA maybe. (any name is fine)

Those 2 methods are definitely needed in most cases.

Cheers
Reply all
Reply to author
Forward
0 new messages