Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

hashCode

208 views
Skip to first unread message

bob smith

unread,
Aug 10, 2012, 11:47:12 AM8/10/12
to
Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

Arne Vajhøj

unread,
Aug 10, 2012, 12:13:11 PM8/10/12
to
On 8/10/2012 11:47 AM, bob smith wrote:
> Is it always technically correct to override the hashCode function like so:
>
> @Override
> public int hashCode() {
> return 1;
> }

It meets the minimum contractual obligation for that method.

But performance of anything using the hash capability (like HashMap<>)
would be very bad.

> Would it be potentially better if that was Object's implementation?

It has a different behavior that may not be valid if you override equals.

Arne


Eric Sosman

unread,
Aug 10, 2012, 12:34:28 PM8/10/12
to
On 8/10/2012 11:47 AM, bob smith wrote:
Define "better."

--
Eric Sosman
eso...@ieee-dot-org.invalid

markspace

unread,
Aug 10, 2012, 1:13:14 PM8/10/12
to
I think at this point we are doing Bob's homework for him.



Arne Vajhøj

unread,
Aug 10, 2012, 1:38:19 PM8/10/12
to
Could be.

But I think the question whether returning a constant from
hashCode is a valid Java question for understanding the
contract for that method.

And I am pretty sure that I have seen other similar
examples (just with 42 as constant).

Arne


bob smith

unread,
Aug 10, 2012, 2:39:05 PM8/10/12
to
From: bob smith <b...@coolfone.comze.com>

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

Arne Vajhøj

unread,
Aug 10, 2012, 2:39:05 PM8/10/12
to
To: bob smith
From: Arne Vajhoj <ar...@vajhoej.dk>

On 8/10/2012 11:47 AM, bob smith wrote:
> Is it always technically correct to override the hashCode function like so:
>
> @Override
> public int hashCode() {
> return 1;
> }

It meets the minimum contractual obligation for that method.

But performance of anything using the hash capability (like HashMap<>) would be
very bad.

> Would it be potentially better if that was Object's implementation?

It has a different behavior that may not be valid if you override equals.

Arne

Eric Sosman

unread,
Aug 10, 2012, 2:39:05 PM8/10/12
to
To: bob smith
From: Eric Sosman <eso...@ieee-dot-org.invalid>

On 8/10/2012 11:47 AM, bob smith wrote:
> Is it always technically correct to override the hashCode function like so:
>
> @Override
> public int hashCode() {
> return 1;
> }
>
> Would it be potentially better if that was Object's implementation?

Define "better."

--
Eric Sosman
eso...@ieee-dot-org.invalid

markspace

unread,
Aug 10, 2012, 2:39:06 PM8/10/12
to
To: Arne Vajhøj
From: markspace <-@.>

On 8/10/2012 9:13 AM, Arne Vajhoj wrote:
> On 8/10/2012 11:47 AM, bob smith wrote:
>> Is it always technically correct to override the hashCode function
>> like so:
>>
>> @Override
>> public int hashCode() {
>> return 1;
>> }
>
> It meets the minimum contractual obligation for that method.
>
> But performance of anything using the hash capability (like HashMap<>)
> would be very bad.
>
>> Would it be potentially better if that was Object's implementation?
>
> It has a different behavior that may not be valid if you override equals.


I think at this point we are doing Bob's homework for him.

Arne Vajhøj

unread,
Aug 10, 2012, 2:39:06 PM8/10/12
to
To: markspace
From: Arne Vajhoj <ar...@vajhoej.dk>

On 8/10/2012 1:13 PM, markspace wrote:
> On 8/10/2012 9:13 AM, Arne Vajhoj wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>> Is it always technically correct to override the hashCode function
>>> like so:
>>>
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>
>> It meets the minimum contractual obligation for that method.
>>
>> But performance of anything using the hash capability (like HashMap<>)
>> would be very bad.
>>
>>> Would it be potentially better if that was Object's implementation?
>>
>> It has a different behavior that may not be valid if you override equals.
>
> I think at this point we are doing Bob's homework for him.

Could be.

But I think the question whether returning a constant from hashCode is a valid
Java question for understanding the contract for that method.

And I am pretty sure that I have seen other similar examples (just with 42 as
constant).

Arne

Roedy Green

unread,
Aug 10, 2012, 3:17:29 PM8/10/12
to
On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
<b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
who said :

> @Override
> public int hashCode() {
> return 1;
> }

that's about the worst possible hashCode function.

See http://mindprod.com/jgloss/hashcode.html for tips on how to write
good ones.
--
Roedy Green Canadian Mind Products http://mindprod.com
A new scientific truth does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04


Lew

unread,
Aug 10, 2012, 3:45:07 PM8/10/12
to Roedy Green
Roedy Green wrote:
> bob smith wrote, quoted or indirectly quoted someone
> who said :
>
>> @Override
> > public int hashCode() {
> > return 1;
> > }
>
> that's about the worst possible hashCode function.

Normally that's correct, but it's conceivable that one might do it for
some hackish reason. In most situations where one might do such
an override as this, one would do better not to override hashCode().

> See http://mindprod.com/jgloss/hashcode.html for tips on how to write
> good ones.

The default of assembling it via the mix-in of attribute hash codes
using the Knuth constants is usually good enough.

h(object) = Sum(i ∈ 0.. n-1) of ( attribute[i] * pow(31, n - 1 - i) );

or

public static int calculateHash(Foo arg) {
int h = 0;

for ( each attribute of Foo that contributes to 'equals()' )
{
h = 31 * h + attribute.hashCode();
}
return h;
}

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Java_hashCode()
http://en.wikipedia.org/wiki/Java_hashCode()#Java

--
Lew

bob smith

unread,
Aug 10, 2012, 6:22:34 PM8/10/12
to
Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very broken.

Lew

unread,
Aug 10, 2012, 6:32:36 PM8/10/12
to
bob smith wrote:
> Eric Sosman wrote:
>> bob smith wrote:
>>> Is it always technically correct to override the hashCode function like so:

It complies with the contract for 'hashCode()'. Is that all it takes to be correct?

>>> Would it be potentially better if that was Object's implementation?

Would what be better if what were Object's implementation of what?

>> Define "better."

> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.

No.

No matter what 'Object''s 'hashCode()' implementation were, it would need to
be overridden when you want value equality instead of object identity for
'equals()'.

See Joshua Bloch's seminal work /Effective Java/, which has items that pertain to this.

Bottom line: 'hashCode()', 'equals()', and when present, 'compareTo()' must be consistent.

'toString()' should be consistent with those.

As long as 'hashCode()' fulfills the contract, your code will work - functionally. But a bad
'hashCode()' could and likely will noticeably affect performance. There is more to correctness
than mere functional conformance.

--
Lew

Arne Vajhøj

unread,
Aug 10, 2012, 7:25:27 PM8/10/12
to
On 8/10/2012 3:17 PM, Roedy Green wrote:
> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
> <b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
> who said :
>
>> @Override
>> public int hashCode() {
>> return 1;
>> }
>
> that's about the worst possible hashCode function.

Yes, but the posted asked "Is it always technically correct to ..."
not whether is was "best possible".

Arne


Arne Vajhøj

unread,
Aug 10, 2012, 7:27:00 PM8/10/12
to
On 8/10/2012 6:22 PM, bob smith wrote:
> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>> Is it always technically correct to override the hashCode function like so:
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>> Would it be potentially better if that was Object's implementation?
>>
>> Define "better."
>
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.

It is not broken.

It will perform poorly in many cases.

Arne

Arne Vajhøj

unread,
Aug 10, 2012, 7:30:33 PM8/10/12
to
On 8/10/2012 6:32 PM, Lew wrote:
> bob smith wrote:
>> Now, there are cases where you HAVE to override it, or your code is very broken.
>
> No.

> As long as 'hashCode()' fulfills the contract, your code will work - functionally. But a bad
> 'hashCode()' could and likely will noticeably affect performance. There is more to correctness
> than mere functional conformance.

If the code per specs is guaranteed to work then it is correct.

Good (or just decent) performance is not necessary for code to
be correct.

At least not in the traditional programming terminology.

In plain English maybe.

Arne


rossum

unread,
Aug 11, 2012, 5:36:15 AM8/11/12
to
On Fri, 10 Aug 2012 13:38:19 -0400, Arne Vajhøj <ar...@vajhoej.dk>
wrote:

>And I am pretty sure that I have seen other similar
>examples (just with 42 as constant).
The magic word is "Bloch".

rossum

Roedy Green

unread,
Aug 11, 2012, 7:54:09 AM8/11/12
to
On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
wrote, quoted or indirectly quoted someone who said :

> h =3D 31 * h + attribute.hashCode();
> }
In my essay I recommend XOR which is an inherentely faster operation
than multiply. I wonder which actually works out better. If you had a
large number of fields, the multiply effect could fall off the left
hand end. It is the algorithm used for String which could have very
long strings, so Sun must have thought of that.

Eric Sosman

unread,
Aug 11, 2012, 7:58:20 AM8/11/12
to
On 8/10/2012 6:22 PM, bob smith wrote:
[... many blank lines removed for legibility's sake ...]
> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>
>>> Is it always technically correct to override the hashCode function like so:
>>>
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>>
>>> Would it be potentially better if that was Object's implementation?
>>
>> Define "better."
>
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.

I cannot think of a case where you HAVE to override hashCode(),
except as a consequence of other choices that you didn't HAVE to
make. You don't HAVE to invent classes where distinct instances
are considered equal, and even if you do you don't HAVE to put those
instances in HashMaps or HashSets or whatever.

But that's a bit specious: All it says is that you don't HAVE
to override hashCode() because you don't HAVE to use things that
call it. It's like "You don't HAVE to pay taxes, because you don't
HAVE to live outside prison." So, let's take it as a given that
you will often need to write classes that override equals() and
hashCode() -- I imagine you understand that they go together.

Okay: Then returning a constant 1 (or 42 or 0 or whatever)
would in fact satisfy the letter of the law regarding hashCode():
Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
your example this would be trivially true because x,y,z,... all
have the same hashCode() value, whether they're equal or not --
You have lived up to the letter of the law.

Of course, such a hashCode() would make all those hash-based
containers pretty much useless: They would work in the sense that
they would get the Right Answer, but they'd be abominably slow,
with expected performance of O(N) instead of O(1). See
<http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
for a survey of some denial-of-service attacks that work by driving
hash tables from O(1) to O(N), resulting in catastrophic failure
of the attacked system.

In other words, the letter of the law on hashCode() is a bare
minimum that guarantees correct functioning, but it is not enough
to guarantee usability. Why isn't the law more specific? Because
nobody knows how to write "hashCode() must be correct *and* usable"
in terms that would cover all the classes all the Java programmers
have dreamed up and will dream up. Your hashCode() meets the bare
minimum requirement, but is not "usable." The actual hashCode()
provided by Object also meets the bare minimum requirement, and *is*
usable as it stands, until (and unless; you don't HAVE to) you
choose to implement other equals() semantics, and a hashCode() to
match them.


--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

unread,
Aug 11, 2012, 8:00:01 AM8/11/12
to
He also asked whether it would "be potentially better."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Jan Burse

unread,
Aug 11, 2012, 9:33:05 AM8/11/12
to
bob smith schrieb:
Maybe it would make sense to spell out what the contract
for hashCode() is. Well the contract is simply, the
following invariant should hold:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

It should be noted that this does not imply:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy
the later, although the closer it comes to the later, the
better it works for HashMap, etc..

The default objects implementation of hashCode() matches
the default objects impementation of equals(). The default
objcts implementation of equals() is ==. And the default
objects implementation of hashCode() is
System.identityHashCode().

The System identity hash code is stored in the object
and generated by the system. It does not change during
GC although the internal object address might change
during GC. It is only 32bit although internal object
addresses might by 64bit with a corresponding JVM.

Returning a constant, any constant c not only 1, would be
technically correct correct for the default implementation
of the class object. Since it trivially satisfies the
invariant:

if a.equals(b) then c==c

is trivially true, since c==c is true. But it is not
better. Since you would get very degenerated HashMaps,
etc..

You need to override the hashhCode() when there is danger
that the invariant is not anymore satisified. This is
not the case when equals() is not overridden. So overriding
hashCode() just for fun when equals() is not overriden,
usually doesn't make sense. It will probably only slow
down the hashCode() calculation. So the following:

hashCode() = sum attr_i * c^i

Is not necessary. But it would be a possible way to go
when equals() were overriden in the following way:

equals(other) = and_i attr_i.equals(other.attr_i)

The above happens when you turn your object into a container
of other objects irrespective of the own object identity.
But beware if the container contains itself somewhere. This
is why we find in the code for Hashtable the following
complication:

public synchronized int hashCode() {
/*
* This code detects the recursion caused by computing the hash code
* of a self-referential hash table and prevents the stack overflow
* that would otherwise result. This allows certain 1.1-era
* applets with self-referential hash tables to work. This code
* abuses the loadFactor field to do double-duty as a hashCode
* in progress flag, so as not to worsen the space performance.
* A negative load factor indicates that hash code computation is
* in progress.
*/

Interestingly it will return a constant for the object when
it detects a loop. Maybe one could do better... Dunno

Bye


Jan Burse

unread,
Aug 11, 2012, 9:34:16 AM8/11/12
to
Jan Burse schrieb:
> during GC. It is only 32bit although internal object
> addresses might by 64bit with a corresponding JVM.

Typically even less bits, since the same space
is used for some object flags.

Arne Vajhøj

unread,
Aug 11, 2012, 9:49:02 AM8/11/12
to
On 8/11/2012 8:00 AM, Eric Sosman wrote:
> On 8/10/2012 7:25 PM, Arne Vajh�j wrote:
>> On 8/10/2012 3:17 PM, Roedy Green wrote:
>>> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
>>> <b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
>>> who said :
>>>
>>>> @Override
>>>> public int hashCode() {
>>>> return 1;
>>>> }
>>>
>>> that's about the worst possible hashCode function.
>>
>> Yes, but the posted asked "Is it always technically correct to ..."
>> not whether is was "best possible".
>
> He also asked whether it would "be potentially better."

"better to use Object hashCode" which again should bring the
correctness question before the performance question.

Arne


Joerg Meier

unread,
Aug 11, 2012, 12:25:11 PM8/11/12
to
On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>> h =3D 31 * h + attribute.hashCode();
>> }
> In my essay I recommend XOR which is an inherentely faster operation
> than multiply.

Hasn't that been wrong since about the invention of the 80386 processor
family ? Pretty sure by now MUL and XOR both take one cycle and that's it.


Liebe Gruesse,
Joerg

--
Ich lese meine Emails nicht, replies to Email bleiben also leider
ungelesen.

Peter Duniho

unread,
Aug 11, 2012, 12:56:50 PM8/11/12
to
On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> h =3D 31 * h + attribute.hashCode();
>> }
> In my essay I recommend XOR which is an inherentely faster operation
> than multiply. I wonder which actually works out better. If you had a
> large number of fields, the multiply effect could fall off the left
> hand end. It is the algorithm used for String which could have very
> long strings, so Sun must have thought of that.

Lew's example is better than XOR. I think you should fix your essay.

There are of course even better approaches if one knows something specific
about the input data. But the "multiply previous by prime, add next value"
is a reasonably standard, "pretty good" composition function.

XOR a well-known problems in the context of hash codes: in many situations,
large numbers of combinations of values wind up mapping to the same result.
Consider a pair of integers: any time those values are identical, the
result of XOR is 0. On the other hand, with the multiply-and-add approach,
it's much less common for "typical" integer values to combine to a hash
value of 0.

(Integers are an important scenario, because a 32-bit integer naturally is
its own hash value...there's no reason to try to mix the bits any further
as is done with other data types, so they don't).

Now (unless I've done my math wrong) if your component hash codes are
actually well-distributed across the entire range of a 32-bit integer (i.e.
are completely uncorrelated to each other and have entirely random
distribution), the XOR should work fine. It's just that it's very
susceptible to creating hash value collisions when the input values aren't
themselves well-distributed and so as a general technique it's not nearly
as good as multiply-and-add.

As far as performance goes, even if the multiply-and-add costs more (and as
Joerg points out, this may only be a 1-cycle vs 2-cycle difference anyway
on that specific operation), your code should not be spending a lot of time
computing hash values in the first place. If that's a significant
bottleneck, there are better ways to improve performance than worrying
about XOR vs multiply-and-add, especially since a poor hash function can
hurt performance much worse than using the wrong math operation would.

Pete

Jan Burse

unread,
Aug 11, 2012, 12:58:04 PM8/11/12
to
Roedy Green schrieb:
> If you had a
> large number of fields, the multiply effect could fall off the left
> hand end.

Actually this does not happen, since you multiply with 31,
which is 1+2+4+8+16. So that:

a*31+b = a*16+a*8+a*4+a*2+a+b

So for a HashMap that uses an index = hash & (2^n - 1) (which is
the same as hash mod 2^n), the impact of a will be still seen,
even when it occurs at the very left hand side.

There is some Microsoft C# HashMap implementation which does
not use mod 2^n, but instead some primes. In case the
implementation choses 31 as the designated prime, all
information but for the first field will be lost. But since
mod 2^32 is also applied, this might not be completely true.

For 2^n I don't know exactly how the impact could be
described. I guess in a HashMap with index = hash mod 2^1 the
hash amounts to a parity bit, since the sum in a+b acts like
an xor on the first right hand bit. But 2^n with n>1 the
31 multiplication is a little more crude.

rossum

unread,
Aug 11, 2012, 1:47:55 PM8/11/12
to
On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
wrote:

>public static int calculateHash(Foo arg) {
> int h = 0;
>
> for ( each attribute of Foo that contributes to 'equals()' )
> {
> h = 31 * h + attribute.hashCode();
> }
> return h;
>}
Bloch starts with:

int h = 17;

He says that works beter in cases where the first one or more
attribute.hashCode() values are zero, and hence will not register.

He suggessts any constant non-zero value.

rossum

Mike Winter

unread,
Aug 11, 2012, 1:53:31 PM8/11/12
to
On 11/08/2012 17:25, Joerg Meier wrote:
> On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:
>[...]
>> In my essay I recommend XOR which is an inherentely faster operation
>> than multiply.
>
> Hasn't that been wrong since about the invention of the 80386 processor
> family ?

Not that far back: the Pentium required 9-11 cycles to complete a MUL
instruction compared to 1-3 for XOR (and the like), depending on operand
locations and widths.

> Pretty sure by now MUL and XOR both take one cycle and that's it.

More-or-less, but the former is still slower for wider operands.
However, your point is well-taken: it needn't be as much a concern in
most cases.

--
Mike Winter
Replace ".invalid" with ".uk" to reply by e-mail.

Roedy Green

unread,
Aug 11, 2012, 2:17:49 PM8/11/12
to
To: bob smith
From: Roedy Green <see_w...@mindprod.com.invalid>

On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
<b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
who said :

> @Override
> public int hashCode() {
> return 1;
> }

that's about the worst possible hashCode function.

See http://mindprod.com/jgloss/hashcode.html for tips on how to write good
ones.
--
Roedy Green Canadian Mind Products http://mindprod.com A new scientific truth
does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up
that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04

Lew

unread,
Aug 11, 2012, 2:17:50 PM8/11/12
to
To: Roedy Green
From: Lew <lewb...@gmail.com>

Roedy Green wrote:
> bob smith wrote, quoted or indirectly quoted someone
> who said :
>
>> @Override
> > public int hashCode() {
> > return 1;
> > }
>
> that's about the worst possible hashCode function.

Normally that's correct, but it's conceivable that one might do it for some
hackish reason. In most situations where one might do such an override as this,
one would do better not to override hashCode().

> See http://mindprod.com/jgloss/hashcode.html for tips on how to write
> good ones.

The default of assembling it via the mix-in of attribute hash codes using the
Knuth constants is usually good enough.

h(object) = Sum(i rnn 0.. n-1) of ( attribute[i] * pow(31, n - 1 - i) );

or

public static int calculateHash(Foo arg) {
int h = 0;

for ( each attribute of Foo that contributes to 'equals()' )
{
h = 31 * h + attribute.hashCode();
}
return h;
}

bob smith

unread,
Aug 11, 2012, 2:17:51 PM8/11/12
to
To: Eric Sosman
From: bob smith <b...@coolfone.comze.com>

On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
> On 8/10/2012 11:47 AM, bob smith wrote:
>
> > Is it always technically correct to override the hashCode function like so:
>
> >
>
> > @Override
>
> > public int hashCode() {
>
> > return 1;
>
> > }
>
> >
>
> > Would it be potentially better if that was Object's implementation?
>
>
>
> Define "better."
>
>
>
> --
>
> Eric Sosman
>
> eso...@ieee-dot-org.invalid

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very
broken.

Lew

unread,
Aug 11, 2012, 2:17:51 PM8/11/12
to
To: bob smith
From: Lew <lewb...@gmail.com>

bob smith wrote:
> Eric Sosman wrote:
>> bob smith wrote:
>>> Is it always technically correct to override the hashCode function like so:

It complies with the contract for 'hashCode()'. Is that all it takes to be
correct?

>>> Would it be potentially better if that was Object's implementation?

Would what be better if what were Object's implementation of what?

>> Define "better."

> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.

No.

No matter what 'Object''s 'hashCode()' implementation were, it would need to be
overridden when you want value equality instead of object identity for
'equals()'.

See Joshua Bloch's seminal work /Effective Java/, which has items that pertain
to this.

Bottom line: 'hashCode()', 'equals()', and when present, 'compareTo()' must be
consistent.

'toString()' should be consistent with those.

As long as 'hashCode()' fulfills the contract, your code will work -
functionally. But a bad
'hashCode()' could and likely will noticeably affect performance. There is more
to correctness
than mere functional conformance.

--
Lew

Arne Vajh�j

unread,
Aug 11, 2012, 2:17:53 PM8/11/12
to
To: Roedy Green
From: Arne Vajhoj <ar...@vajhoej.dk>

On 8/10/2012 3:17 PM, Roedy Green wrote:
> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
> <b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
> who said :
>
>> @Override
>> public int hashCode() {
>> return 1;
>> }
>
> that's about the worst possible hashCode function.

Yes, but the posted asked "Is it always technically correct to ..." not whether
is was "best possible".

Arne

Arne Vajhøj

unread,
Aug 11, 2012, 2:17:53 PM8/11/12
to
To: Lew
From: Arne Vajhoj <ar...@vajhoej.dk>

On 8/10/2012 6:32 PM, Lew wrote:
> bob smith wrote:
>> Now, there are cases where you HAVE to override it, or your code is very
broken.
>
> No.

> As long as 'hashCode()' fulfills the contract, your code will work -
functionally. But a bad
> 'hashCode()' could and likely will noticeably affect performance. There is
more to correctness
> than mere functional conformance.

If the code per specs is guaranteed to work then it is correct.

Good (or just decent) performance is not necessary for code to be correct.

At least not in the traditional programming terminology.

In plain English maybe.

Arne

rossum

unread,
Aug 11, 2012, 2:17:54 PM8/11/12
to
To: Arne Vajhøj
From: rossum <ross...@coldmail.com>

On Fri, 10 Aug 2012 13:38:19 -0400, Arne Vajh-,j <ar...@vajhoej.dk> wrote:

>And I am pretty sure that I have seen other similar
>examples (just with 42 as constant).
The magic word is "Bloch".

rossum

Roedy Green

unread,
Aug 11, 2012, 2:17:54 PM8/11/12
to
To: Lew
From: Roedy Green <see_w...@mindprod.com.invalid>

On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com> wrote,
quoted or indirectly quoted someone who said :

> h =3D 31 * h + attribute.hashCode();
> }
In my essay I recommend XOR which is an inherentely faster operation than
multiply. I wonder which actually works out better. If you had a large number
of fields, the multiply effect could fall off the left hand end. It is the
algorithm used for String which could have very long strings, so Sun must have
thought of that.
--
Roedy Green Canadian Mind Products http://mindprod.com A new scientific truth
does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up
that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04

Eric Sosman

unread,
Aug 11, 2012, 2:17:54 PM8/11/12
to
To: bob smith
From: Eric Sosman <eso...@ieee-dot-org.invalid>

On 8/10/2012 6:22 PM, bob smith wrote: [... many blank lines removed for
legibility's sake ...]
> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>
>>> Is it always technically correct to override the hashCode function like so:
>>>
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>>
>>> Would it be potentially better if that was Object's implementation?
>>
>> Define "better."
>
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.

Eric Sosman

unread,
Aug 11, 2012, 2:17:55 PM8/11/12
to
To: Arne Vajhøj
From: Eric Sosman <eso...@ieee-dot-org.invalid>

On 8/10/2012 7:25 PM, Arne Vajhoj wrote:
> On 8/10/2012 3:17 PM, Roedy Green wrote:
>> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
>> <b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
>> who said :
>>
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>
>> that's about the worst possible hashCode function.
>
> Yes, but the posted asked "Is it always technically correct to ..."
> not whether is was "best possible".

He also asked whether it would "be potentially better."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Jan Burse

unread,
Aug 11, 2012, 2:17:55 PM8/11/12
to
To: bob smith
From: Jan Burse <janb...@fastmail.fm>

bob smith schrieb:
> Is it always technically correct to override the hashCode function like so:
>
> @Override
> public int hashCode() {
> return 1;
> }
>
> Would it be potentially better if that was Object's implementation?
>

Maybe it would make sense to spell out what the contract for hashCode() is.
Well the contract is simply, the following invariant should hold:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

It should be noted that this does not imply:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy the later, although
the closer it comes to the later, the better it works for HashMap, etc..

The default objects implementation of hashCode() matches the default objects
impementation of equals(). The default objcts implementation of equals() is ==.
And the default objects implementation of hashCode() is
System.identityHashCode().

The System identity hash code is stored in the object and generated by the
system. It does not change during GC although the internal object address might
change during GC. It is only 32bit although internal object addresses might by
64bit with a corresponding JVM.

Jan Burse

unread,
Aug 11, 2012, 2:17:55 PM8/11/12
to
To: Jan Burse
From: Jan Burse <janb...@fastmail.fm>

Jan Burse schrieb:
> during GC. It is only 32bit although internal object
> addresses might by 64bit with a corresponding JVM.

Typically even less bits, since the same space is used for some object flags.

Arne Vajhøj

unread,
Aug 11, 2012, 2:17:55 PM8/11/12
to
To: Eric Sosman
From: Arne Vajhoj <ar...@vajhoej.dk>

On 8/11/2012 8:00 AM, Eric Sosman wrote:
> On 8/10/2012 7:25 PM, Arne Vajhoj wrote:
>> On 8/10/2012 3:17 PM, Roedy Green wrote:
>>> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
>>> <b...@coolfone.comze.com> wrote, quoted or indirectly quoted someone
>>> who said :
>>>
>>>> @Override
>>>> public int hashCode() {
>>>> return 1;
>>>> }
>>>
>>> that's about the worst possible hashCode function.
>>
>> Yes, but the posted asked "Is it always technically correct to ..."
>> not whether is was "best possible".
>
> He also asked whether it would "be potentially better."

"better to use Object hashCode" which again should bring the correctness
question before the performance question.

Arne

Joerg Meier

unread,
Aug 11, 2012, 2:17:56 PM8/11/12
to
To: Roedy Green
From: Joerg Meier <joerg...@arcor.de>

On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>> h =3D 31 * h + attribute.hashCode();
>> }
> In my essay I recommend XOR which is an inherentely faster operation
> than multiply.

Hasn't that been wrong since about the invention of the 80386 processor family
? Pretty sure by now MUL and XOR both take one cycle and that's it.


Liebe Gruesse,
Joerg

--
Ich lese meine Emails nicht, replies to Email bleiben also leider ungelesen.

Arne Vajhøj

unread,
Aug 11, 2012, 2:17:53 PM8/11/12
to
To: bob smith
From: Arne Vajhoj <ar...@vajhoej.dk>

On 8/10/2012 6:22 PM, bob smith wrote:
> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>> Is it always technically correct to override the hashCode function like so:
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>> Would it be potentially better if that was Object's implementation?
>>
>> Define "better."
>
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.

It is not broken.

It will perform poorly in many cases.

Arne

Peter Duniho

unread,
Aug 11, 2012, 2:17:56 PM8/11/12
to
To: Roedy Green
From: Peter Duniho <NpOeS...@NnOwSlPiAnMk.com>

On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> h =3D 31 * h + attribute.hashCode();
>> }
> In my essay I recommend XOR which is an inherentely faster operation
> than multiply. I wonder which actually works out better. If you had a
> large number of fields, the multiply effect could fall off the left
> hand end. It is the algorithm used for String which could have very
> long strings, so Sun must have thought of that.

Jan Burse

unread,
Aug 11, 2012, 2:17:56 PM8/11/12
to
To: Roedy Green
From: Jan Burse <janb...@fastmail.fm>

Roedy Green schrieb:
> If you had a
> large number of fields, the multiply effect could fall off the left
> hand end.

Actually this does not happen, since you multiply with 31, which is 1+2+4+8+16.
So that:

a*31+b = a*16+a*8+a*4+a*2+a+b

So for a HashMap that uses an index = hash & (2^n - 1) (which is the same as
hash mod 2^n), the impact of a will be still seen, even when it occurs at the
very left hand side.

There is some Microsoft C# HashMap implementation which does not use mod 2^n,
but instead some primes. In case the implementation choses 31 as the designated
prime, all information but for the first field will be lost. But since mod 2^32
is also applied, this might not be completely true.

For 2^n I don't know exactly how the impact could be described. I guess in a
HashMap with index = hash mod 2^1 the hash amounts to a parity bit, since the
sum in a+b acts like an xor on the first right hand bit. But 2^n with n>1 the
31 multiplication is a little more crude.

Mike Winter

unread,
Aug 11, 2012, 3:19:38 PM8/11/12
to
To: Joerg Meier
From: Mike Winter <use...@michael-winter.me.invalid>

On 11/08/2012 17:25, Joerg Meier wrote:
> On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:
>[...]
>> In my essay I recommend XOR which is an inherentely faster operation
>> than multiply.
>
> Hasn't that been wrong since about the invention of the 80386 processor
> family ?

Not that far back: the Pentium required 9-11 cycles to complete a MUL
instruction compared to 1-3 for XOR (and the like), depending on operand
locations and widths.

> Pretty sure by now MUL and XOR both take one cycle and that's it.

More-or-less, but the former is still slower for wider operands. However, your
point is well-taken: it needn't be as much a concern in most cases.

--
Mike Winter
Replace ".invalid" with ".uk" to reply by e-mail.

rossum

unread,
Aug 11, 2012, 3:19:38 PM8/11/12
to
To: Lew
From: rossum <ross...@coldmail.com>

On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com> wrote:

>public static int calculateHash(Foo arg) {
> int h = 0;
>
> for ( each attribute of Foo that contributes to 'equals()' )
> {
> h = 31 * h + attribute.hashCode();
> }
> return h;
>}
Bloch starts with:

int h = 17;

He says that works beter in cases where the first one or more
attribute.hashCode() values are zero, and hence will not register.

He suggessts any constant non-zero value.

rossum

Lew

unread,
Aug 11, 2012, 7:24:37 PM8/11/12
to
On 08/10/2012 04:30 PM, Arne Vajhøj wrote:
> On 8/10/2012 6:32 PM, Lew wrote:
>> bob smith wrote:
>>> Now, there are cases where you HAVE to override it, or your code is very
>>> broken.
>>
>> No.
>
>> As long as 'hashCode()' fulfills the contract, your code will work -
>> functionally. But a bad
>> 'hashCode()' could and likely will noticeably affect performance. There is
>> more to correctness
>> than mere functional conformance.
>
> If the code per specs is guaranteed to work then it is correct.
>
> Good (or just decent) performance is not necessary for code to
> be correct.
>
> At least not in the traditional programming terminology.
>
> In plain English maybe.

I see your point, but that is not to say that the specs exclude performance
considerations.

In the case of 'hashCode()', the Javadocs do say, "This method is supported
for the benefit of hash tables such as those provided by HashMap."
<http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()>

The key question here is how you define "benefit". I argue that a hash code
that is constant does not benefit, say, a 'HashMap' because one of our desired
uses is constant-order retrieval.

"This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the elements
properly among the buckets."
<http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html>

Each specification refers to the other. Ergo they are meant to be considered
together. Taken together, the documentation clearly specifies that "correct"
or "proper" includes performance considerations. Therefore, by what you say,
the simple "return 1;" is not correct.

It certainly would not be correct for the 'Object' implementation.
"As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects." [op. cit.]

As you say, Arne, "correct" means it follows the spec. The OP's suggested
implementation violates the spec on two fronts.

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

Lew

unread,
Aug 11, 2012, 7:29:25 PM8/11/12
to
Eric Sosman wrote:
> Okay: Then returning a constant 1 (or 42 or 0 or whatever)
> would in fact satisfy the letter of the law regarding hashCode():

Not if you consider all aspects of what the Javadocs promise.

See my post upthread.

> Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
> your example this would be trivially true because x,y,z,... all
> have the same hashCode() value, whether they're equal or not --
> You have lived up to the letter of the law.

No, because the law requires that the method support 'HashMap', which in turn
calls for "properly" hashed objects.

> Of course, such a hashCode() would make all those hash-based
> containers pretty much useless: They would work in the sense that
> they would get the Right Answer, but they'd be abominably slow,

Indeed.

> with expected performance of O(N) instead of O(1). See
> <http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
> for a survey of some denial-of-service attacks that work by driving
> hash tables from O(1) to O(N), resulting in catastrophic failure
> of the attacked system.
>
> In other words, the letter of the law on hashCode() is a bare
> minimum that guarantees correct functioning, but it is not enough
> to guarantee usability. Why isn't the law more specific? Because

Actually, if you consider all that the Javadocs tell you, this "letter of the
law" to which you refer is like saying the sequence "ABC" constitutes all of
"the ABCs".

> nobody knows how to write "hashCode() must be correct *and* usable"
> in terms that would cover all the classes all the Java programmers
> have dreamed up and will dream up. Your hashCode() meets the bare
> minimum requirement, but is not "usable." The actual hashCode()
> provided by Object also meets the bare minimum requirement, and *is*
> usable as it stands, until (and unless; you don't HAVE to) you
> choose to implement other equals() semantics, and a hashCode() to
> match them.

As Arne states, "correct" means "fulfills the specification". The
specification for Java API methods is the standard Javadocs, which do impose
performance considerations on 'hashCode()'.

One understands that the spec isn't always fully enforceable by the compiler.
[1] It is correct that the compiler will allow 'return 1;'. It is not correct
that that fulfills the specification.

[1] Doesn't one?

Lew

unread,
Aug 11, 2012, 7:34:52 PM8/11/12
to
Jan Burse wrote:
> Maybe it would make sense to spell out what the contract
> for hashCode() is. Well the contract is simply, the
> following invariant should hold:
>
> /* invariant that should hold */
> if a.equals(b) then a.hashCode()==b.hashCode()

True, but if you read the specification for 'hashCode()' fully, that is not
the entire contract, only the compiler-enforceable part of it.

The entire specification requires that as much as feasible, the 'Object'
implementation distinguish distinct instances, and that the method generally
support 'HashMap', which promises O(1) 'get()' and 'put()' with a "proper"
(i.e., compliant) 'hashCode()'.

Lew

unread,
Aug 11, 2012, 7:37:00 PM8/11/12
to
Lew wrote:
> Jan Burse wrote:
>> Maybe it would make sense to spell out what the contract
>> for hashCode() is. Well the contract is simply, the
>> following invariant should hold:
>>
>> /* invariant that should hold */
>> if a.equals(b) then a.hashCode()==b.hashCode()
>
> True, but if you read the specification for 'hashCode()' fully, that is not
> the entire contract, only the compiler-enforceable part of it.

Oooops!

I made a mistake.

Not even that is compiler-enforceable.

Arne Vajhøj

unread,
Aug 11, 2012, 10:15:07 PM8/11/12
to
Object having the method defined to support effective hashing
does not imply that it has to it just means that the potential
is there.

> "This implementation provides constant-time performance for the basic
> operations (get and put), assuming the hash function disperses the
> elements properly among the buckets."

Yes. And here it makes an assumption. Not that hashCode is implemented
correct, but that it is implemented in a certain way.

> Each specification refers to the other. Ergo they are meant to be
> considered together. Taken together, the documentation clearly specifies
> that "correct" or "proper" includes performance considerations.
> Therefore, by what you say, the simple "return 1;" is not correct.

> As you say, Arne, "correct" means it follows the spec. The OP's
> suggested implementation violates the spec on two fronts.

No it does not.

It follows exactly the explicit stated contract in the Java doc:

<quote>
The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution of
the same application.
If two objects are equal according to the equals(Object) method,
then calling the hashCode method on each of the two objects must produce
the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. However,
the programmer should be aware that producing distinct integer results
for unequal objects may improve the performance of hashtables.
</quote>

The ability to support something does not make it part of the contract.

This is a classic test question in basic Java SE. And that returning
a constant is correct but not smart should be in most Java SE
text books.

Arne





Arne Vajhøj

unread,
Aug 11, 2012, 10:16:43 PM8/11/12
to
It fulfills the spec.

It does not fulfill you bizarre interpretation of "support".

Arne


Arne Vajhøj

unread,
Aug 11, 2012, 10:19:19 PM8/11/12
to
On 8/11/2012 7:34 PM, Lew wrote:
> Jan Burse wrote:
>> Maybe it would make sense to spell out what the contract
>> for hashCode() is. Well the contract is simply, the
>> following invariant should hold:
>>
>> /* invariant that should hold */
>> if a.equals(b) then a.hashCode()==b.hashCode()
>
> True, but if you read the specification for 'hashCode()' fully, that is
> not the entire contract, only the compiler-enforceable part of it.
>
> The entire specification requires that as much as feasible, the 'Object'
> implementation distinguish distinct instances, and that the method
> generally support 'HashMap', which promises O(1) 'get()' and 'put()'
> with a "proper" (i.e., compliant) 'hashCode()'.

Two wrong statements.

It says that the method is defined to support HashMap

And HashMap does not guarantee O(1) with a correct
hashCode - it guarantee that for one that return
good distributed values.

Arne


Arne Vajhøj

unread,
Aug 11, 2012, 10:29:02 PM8/11/12
to
On 8/11/2012 10:15 PM, Arne Vajhøj wrote:
> This is a classic test question in basic Java SE. And that returning
> a constant is correct but not smart should be in most Java SE
> text books.

Effective Java / Joshua Bloch:

<quote>
// The worst possible legal hash function - never use!
public int hashCode() { return 42; }

It is legal because it ensures that equal objects have the
same hash code. It's atrocious because ...
</quote>

Java 2 SUN Certified Programmer & Developer / Kathy Sierra & Bert Bates:

<quote>
A hashCode() that returns the same value for all instances whether
they're equal or not is still a legal - even appropriate - hashCode()
method! For example,
public int hashCode() {
return 1492;
}
would not violate the contract
...
This hashCode() method is horrible inefficient, ...
...
Nontheless, this one-hash-fits-all method would be
considered appropriate and even correct because it
doesn't violate the contract. Once more, correct does
not necessarily mean good.
</quote>

Arne




Arne Vajhøj

unread,
Aug 11, 2012, 10:40:32 PM8/11/12
to
On 8/11/2012 7:54 AM, Roedy Green wrote:
> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewb...@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> h =3D 31 * h + attribute.hashCode();
>> }
> In my essay I recommend XOR which is an inherentely faster operation
> than multiply. I wonder which actually works out better.

Multiply.

XOR has several problems:
- many small values give small result
- same values in different fields give same result
- two identical values give result zero
+ all those I did not think of.

> If you had a
> large number of fields, the multiply effect could fall off the left
> hand end. It is the algorithm used for String which could have very
> long strings, so Sun must have thought of that.

The multiply effect does not fall off the left with a value like 31
(it would with 32).

Arne




Eric Sosman

unread,
Aug 11, 2012, 10:43:31 PM8/11/12
to
All this means is that people know how to describe a "correct"
hashCode(), but nobody knows how to describe a "usable" hashCode()
in terms that apply testably to all circumstances.

The O.P. asked whether it would "be potentially better" if
Object's hashCode() returned a constant. He did *not* ask whether
such an implementation would be correct; he only asked if it would
"be potentially better." Upon prompting he explained what he
meant by "better," and in light of that explanation the answer
to his original question is NO. Discussions about "Oh, but it's
CORRECT" are just red herrings; it's still not "better."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Arne Vajhøj

unread,
Aug 11, 2012, 10:54:35 PM8/11/12
to
On 8/11/2012 10:43 PM, Eric Sosman wrote:
> The O.P. asked whether it would "be potentially better" if
> Object's hashCode() returned a constant. He did *not* ask whether
> such an implementation would be correct; he only asked if it would
> "be potentially better." Upon prompting he explained what he
> meant by "better," and in light of that explanation the answer
> to his original question is NO. Discussions about "Oh, but it's
> CORRECT" are just red herrings; it's still not "better."


The original questions were:

#Is it always technically correct to override the hashCode function
#like so:
#
# @Override
# public int hashCode() {
# return 1;
# }

For which the answer is YES. Per documentation.

But with really poor performance in many relevant cases.

#Would it be potentially better if that was Object's implementation?

Which was clarified to:

#Better in the sense that you would never HAVE to override hashCode.

For which the answer is also YES. Per the previous.

But with the same performance note. And a big sigh because it
seems to want to broaden bad performance from a single class
to the entire programming style (multiple classes).

Arne




Lew

unread,
Aug 12, 2012, 12:46:44 AM8/12/12
to
Arne Vajhøj wrote:
> The original questions were:
>
> #Is it always technically correct to override the hashCode function #like so:
> #
> # @Override
> # public int hashCode() {
> # return 1;
> # }
>
> For which the answer is YES. Per documentation.
>
> But with really poor performance in many relevant cases.
>
> #Would it be potentially better if that was Object's implementation?
>
> Which was clarified to:
>
> #Better in the sense that you would never HAVE to override hashCode.
>
> For which the answer is also YES. Per the previous.

No, that's not true. Value-equality maps, for example, would not work if you
didn't override 'hashCode()' in the key type to match value equality on the keys.

> But with the same performance note. And a big sigh because it
> seems to want to broaden bad performance from a single class
> to the entire programming style (multiple classes).

Overriding 'hashCode()' is done for functional reasons, not performance
reasons. If you fail to override the method, you'll get incorrect behavior,
for example failing to find a collection member that is actually present.

Lew

unread,
Aug 12, 2012, 12:48:41 AM8/12/12
to
Granted.

Jan Burse

unread,
Aug 12, 2012, 6:08:52 AM8/12/12
to
Jan Burse schrieb:
> Maybe it would make sense to spell out what the contract
> for hashCode() is.

Those who are interested can read the original text.
It is found here:

http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode%28%29

---------------------------------------------------------------

"If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of the
two objects must produce the same integer result."

That is:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

----------------------------------------------------------------

"It is not required that if two objects are unequal according
to the equals(java.lang.Object) method, then calling the
hashCode method on each of the two objects must produce
distinct integer results. However, the programmer should be
aware that producing distinct integer results for unequal
objects may improve the performance of hash tables."

That is:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy
the later, although the closer it comes to the later, the
better it works for HashMap, etc..

----------------------------------------------------------------

There is no reference to Comparable and the compareTo. If I am not
using HashMap or HashSet in my application, and still override
equals, I do not need to implement hashCode(). I could for example
use TreeMap or TreeSet and implement the Comparable interface. There
is a reference to equals() back from Comparable though.

http://docs.oracle.com/javase/6/docs/api/java/lang/Comparable.html

Bye

Jan Burse

unread,
Aug 12, 2012, 6:18:45 AM8/12/12
to
Jan Burse schrieb:
> /* not implied and thus not required by the invariant */
> if a.hashCode()==b.hashCode() then a.equals(b)
This is logically equivalent to what is formulated
in the Javadoc by the so called contraposition:

if !a.equals(b) then a.hashCode()!=b.hashCode()

http://en.wikipedia.org/wiki/Contraposition

Patricia Shanahan

unread,
Aug 12, 2012, 6:46:17 AM8/12/12
to
On 8/10/2012 3:22 PM, bob smith wrote:
...
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.
>

I've decided to go back to this message, because I feel the key issue is
the circumstances under which hashCode would need to be overridden if
Object's version returned a constant, compared to the current situation.

If Object's hashCode returned a constant, in practice anyone using an
object as a key in a hash structure would want it overridden with one
that has at least some chance of using multiple buckets. Without that
property, a HashMap is an over-complicated, space-wasting cousin of a
linked list.

The problem with this is that the programmer who knows that Widget
instances are going to be used as HashMap keys does not necessarily
control the Widget implementation. The programmer writing Widget has no
idea whether it will ever be used as a HashMap key, and therefore no way
of knowing whether it is safe, assuming Widget inherits the Object
equals, to also inherit the Object hashCode.

Now compare to the current situation. The programmer implementing Widget
decides whether to inherit a superclass equals or to write a
Widget-specific equals. That programmer can assume the superclass has a
hashCode that would be effective for a HashMap key, and only has to
override hashCode if they are overriding equals.

In practice, it is a long time since I've written a hashCode manually.
Generally, when I decide to override equals I tell Eclipse to generate
an equals/hashCode pair based on the fields that control whether two
instances are equal. Overriding hashCode is no additional work given
that I would be telling Eclipse to generate an equals based on those
fields anyway.

To me, the current situation seems "better".

Patricia

Robert Klemme

unread,
Aug 12, 2012, 11:06:07 AM8/12/12
to
On 11.08.2012 01:27, Arne Vajhøj wrote:
> On 8/10/2012 6:22 PM, bob smith wrote:
>> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>>> On 8/10/2012 11:47 AM, bob smith wrote:
>>>> Is it always technically correct to override the hashCode function
>>>> like so:
>>>> @Override
>>>> public int hashCode() {
>>>> return 1;
>>>> }
>>>> Would it be potentially better if that was Object's implementation?
>>>
>>> Define "better."
>>
>> Better in the sense that you would never HAVE to override hashCode.
>>
>> Now, there are cases where you HAVE to override it, or your code is
>> very broken.
>
> It is not broken.
>
> It will perform poorly in many cases.

Well, I would go as far as to say that it will perform poorly in all
cases where hashCode() is actually needed - and that makes it broken.
The whole idea of hashing is based on the fact that the hash code
_somehow_ represents the item to be hashed. If all items have the same
constant hash code there is no point in hashing at all. So while it
does work, it does not work as intended.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Joshua Cranmer

unread,
Aug 12, 2012, 12:23:24 PM8/12/12
to
On 8/10/2012 6:22 PM, bob smith wrote:
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.

Returning a constant hash code is correct in the same sense that
answering "yes" to the question "Can you tell me the correct way to do
this?" would be--syntactically and semantically correct, but completely
contrary to the actual intent of the question.

The point of the hash code is to provide a cheap way to quickly
distinguish inputs (in the sense that Pr(a.hashCode() == b.hashCode()
and !a.equals(b)) should be as small as possible [1]). A constant-value
hash completely negates the purpose of the hash code, and this renders
the hashCode again completely unusable for anything that actually wants
to use it.

In the default case, a.hashCode() == b.hashCode() only when a == b (this
definitely holds true with 32-bit machines and I'm pretty sure it still
holds true with 64-bit machines, but I'd have to reverify the JVM source
code to be certain). It is thus correct so long as identity equals is
correct. It is also potentially correct in a limited set of cases where
a.equals(b) and a != b. In all of these cases, it would not only be
correct but also extremely useful, having pretty strong guarantees about
the distribution of hash values.

[1] Actually, for good performance, hash codes should go one step
further and make slightly stronger guarantees about independence with
respect to the size of the hash table. But I digress.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Patricia Shanahan

unread,
Aug 12, 2012, 12:40:28 PM8/12/12
to
I think there are two reasonably usable ways of handling this issue. One
is the current arrangement, in which every class has a hashCode that is
expected to be usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket
selection unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only
member of a HashKey interface that would be implemented by every class
whose objects are intended to be suitable for use as has keys. Those
objects that have a hashCode would still have to have a usable one, but
some classes would not implement HashKey and not have a hashCode at all.

Patricia

Jan Burse

unread,
Aug 12, 2012, 1:03:02 PM8/12/12
to
Joshua Cranmer schrieb:
>
> In the default case, a.hashCode() == b.hashCode() only when a == b (this
> definitely holds true with 32-bit machines and I'm pretty sure it still
> holds true with 64-bit machines, but I'd have to reverify the JVM source
> code to be certain).

Not at all. Even not for 32-bit machines. Not only because
the hashCode() usually uses less than 32 bits. But also for
other reasons, namely the internal algorithm how the
hash is produced (although the below bug doesn't reveal
much internals):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

But in the bottom of the above bug you can find code that
searches for a clash. It can take quite a while, but
it is not excluded. This is what a sample run produced
according to the bug:

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

So the 62028120-th new Object produced the same hashcode.

So

Eric Sosman

unread,
Aug 12, 2012, 1:59:07 PM8/12/12
to
On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
>[...]
> I think there are two reasonably usable ways of handling this issue. One
> is the current arrangement, in which every class has a hashCode that is
> expected to be usable for selecting a hash table bucket.
>
> Keeping hashCode as an Object method but making it useless for bucket
> selection unless overridden would not be a good alternative.
>
> A more reasonable alternative would be to have hashCode as the only
> member of a HashKey interface that would be implemented by every class
> whose objects are intended to be suitable for use as has keys. Those
> objects that have a hashCode would still have to have a usable one, but
> some classes would not implement HashKey and not have a hashCode at all.

Ugh. So if J. Random Programmer is too lazy or unimaginative to
write hashCode(), that means I can't use his class as a HashMap key,
or even put instances in a HashSet? Ugh, again.

(And, no: I don't think a HashCalculator interface along the lines
of Comparable would save the day.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Patricia Shanahan

unread,
Aug 12, 2012, 2:17:55 PM8/12/12
to
I'm not saying that it would be better than the current situation, just
better than having hashCode implementations that appear to be there, but
in practice must not be used for hash bucket selection.

Patricia

Lew

unread,
Aug 12, 2012, 2:27:38 PM8/12/12
to
Jan Burse wrote:
> There is no reference to Comparable and the compareTo. If I am not

True.

> using HashMap or HashSet in my application, and still override
> equals, I do not need to implement hashCode(). I could for example

True.

> use TreeMap or TreeSet and implement the Comparable interface. There
> is a reference to equals() back from Comparable though.

This requires a detailed knowledge of the implementation of the particular
'Map' or 'Set'. If its searches do not employ hash codes, then you do not need
to implement 'hashCode()' for value-equal types.

In the general case one prefers to underspecify the mechanics, that is, allow
a client of the type to deploy either 'equals()'-based or hash-based
algorithms, by ensuring the methods are consistent with each other per Joshua
Bloch and other notables.

> http://docs.oracle.com/javase/6/docs/api/java/lang/Comparable.html

Even for cases where one predicts the use will not require one or another of
the consistency practices, it is harmless to enforce them.

There are four methods a type might use to represent equality or identity and
deviations therefrom: 'equals()' of course, and 'hashCode()', 'compareTo()',
and 'toString()'. There may be external 'Comparator's on that type.

All these methods on or of a type, where they exist, should be consistent,
absent an overwhelming and sound reason not to be.

The default case from 'Object' is that 'equals()' implements ==, 'hashCode()'
returns something sort of like an address of the instance which is nearly
always unique within a given JVM run, 'toString()' returns a decorated string
version of the hash code, and 'compareTo()' doesn't exist.

To express value equality in a type, one must override 'equals()'. The rest
are optional in Arne's strictest and most correct use of that notion. It is
also harmless to keep the rest consistent and nearly always (as in you might
encounter one instance per career otherwise) useful.

One generally chooses 'TreeMap' for its sorting capability, not its reliance
on 'equals()' vs. 'hashCode()' to achieve that. (Assuming it has such a
reliance.) Unless there's a compelling argument to rely on 'TreeMap''s
undocumented non-use of hash codes, why do so?

Yes, I am aware that the 'TreeMap' documentation makes it clear that
'hashCode()' shouldn't be involved. Without promising it isn't. But a
'Comparator' might use hash codes under the hood in a naive attempt to
shortcut a comparison, not knowing that the base type assumes no one would do
such a thing. Or some later client might need only equality and not order, and
throw the base type into a 'HashMap' with surprising results.

Should you design anything that violates the consistency rule, then please do
both Javadoc and code-comment it properly.

Lew

unread,
Aug 12, 2012, 2:58:16 PM8/12/12
to
To: Arne Vajhøj
From: Lew <no...@lewscanon.com>

On 08/10/2012 04:30 PM, Arne Vajh-,j wrote:
> On 8/10/2012 6:32 PM, Lew wrote:
>> bob smith wrote:
>>> Now, there are cases where you HAVE to override it, or your code is very
>>> broken.
>>
>> No.
>
>> As long as 'hashCode()' fulfills the contract, your code will work -
>> functionally. But a bad
>> 'hashCode()' could and likely will noticeably affect performance. There is
>> more to correctness
>> than mere functional conformance.
>
> If the code per specs is guaranteed to work then it is correct.
>
> Good (or just decent) performance is not necessary for code to
> be correct.
>
> At least not in the traditional programming terminology.
>
> In plain English maybe.

I see your point, but that is not to say that the specs exclude performance
considerations.

In the case of 'hashCode()', the Javadocs do say, "This method is supported for
the benefit of hash tables such as those provided by HashMap."
<http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()>

The key question here is how you define "benefit". I argue that a hash code
that is constant does not benefit, say, a 'HashMap' because one of our desired
uses is constant-order retrieval.

"This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the elements
properly among the buckets."
<http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html>

Each specification refers to the other. Ergo they are meant to be considered
together. Taken together, the documentation clearly specifies that "correct" or
"proper" includes performance considerations. Therefore, by what you say, the
simple "return 1;" is not correct.

It certainly would not be correct for the 'Object' implementation. "As much as
is reasonably practical, the hashCode method defined by class Object does
return distinct integers for distinct objects." [op. cit.]

As you say, Arne, "correct" means it follows the spec. The OP's suggested
implementation violates the spec on two fronts.

Lew

unread,
Aug 12, 2012, 2:58:16 PM8/12/12
to
To: Eric Sosman
From: Lew <no...@lewscanon.com>
[1] Doesn't one?

Lew

unread,
Aug 12, 2012, 2:58:16 PM8/12/12
to
To: Jan Burse
From: Lew <no...@lewscanon.com>

Jan Burse wrote:
> Maybe it would make sense to spell out what the contract
> for hashCode() is. Well the contract is simply, the
> following invariant should hold:
>
> /* invariant that should hold */
> if a.equals(b) then a.hashCode()==b.hashCode()

True, but if you read the specification for 'hashCode()' fully, that is not the
entire contract, only the compiler-enforceable part of it.

The entire specification requires that as much as feasible, the 'Object'
implementation distinguish distinct instances, and that the method generally
support 'HashMap', which promises O(1) 'get()' and 'put()' with a "proper"
(i.e., compliant) 'hashCode()'.

Lew

unread,
Aug 12, 2012, 2:58:16 PM8/12/12
to
To: Lew
From: Lew <no...@lewscanon.com>

Lew wrote:
> Jan Burse wrote:
>> Maybe it would make sense to spell out what the contract
>> for hashCode() is. Well the contract is simply, the
>> following invariant should hold:
>>
>> /* invariant that should hold */
>> if a.equals(b) then a.hashCode()==b.hashCode()
>
> True, but if you read the specification for 'hashCode()' fully, that is not
> the entire contract, only the compiler-enforceable part of it.

Oooops!

I made a mistake.

Not even that is compiler-enforceable.

Arne Vajhøj

unread,
Aug 12, 2012, 2:58:17 PM8/12/12
to
To: Lew
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <ar...@vajhoej.dk>

On 8/11/2012 7:24 PM, Lew wrote:
Object having the method defined to support effective hashing does not imply
that it has to it just means that the potential is there.

> "This implementation provides constant-time performance for the basic
> operations (get and put), assuming the hash function disperses the
> elements properly among the buckets."

Yes. And here it makes an assumption. Not that hashCode is implemented correct,
but that it is implemented in a certain way.

> Each specification refers to the other. Ergo they are meant to be
> considered together. Taken together, the documentation clearly specifies
> that "correct" or "proper" includes performance considerations.
> Therefore, by what you say, the simple "return 1;" is not correct.

> As you say, Arne, "correct" means it follows the spec. The OP's
> suggested implementation violates the spec on two fronts.

No it does not.

It follows exactly the explicit stated contract in the Java doc:

<quote>
The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently return
the same integer, provided no information used in equals comparisons on the
object is modified. This integer need not remain consistent from one execution
of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method,
then calling the hashCode method on each of the two objects must produce the
same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on each of
the two objects must produce distinct integer results. However, the programmer
should be aware that producing distinct integer results for unequal objects may
improve the performance of hashtables.
</quote>

The ability to support something does not make it part of the contract.

This is a classic test question in basic Java SE. And that returning a constant
is correct but not smart should be in most Java SE text books.

Arne

Arne Vajhøj

unread,
Aug 12, 2012, 2:58:17 PM8/12/12
to
To: Lew
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <ar...@vajhoej.dk>

It fulfills the spec.

It does not fulfill you bizarre interpretation of "support".

Arne

Arne Vajhøj

unread,
Aug 12, 2012, 2:58:17 PM8/12/12
to
To: Lew
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <ar...@vajhoej.dk>

On 8/11/2012 7:34 PM, Lew wrote:
> Jan Burse wrote:
>> Maybe it would make sense to spell out what the contract
>> for hashCode() is. Well the contract is simply, the
>> following invariant should hold:
>>
>> /* invariant that should hold */
>> if a.equals(b) then a.hashCode()==b.hashCode()
>
> True, but if you read the specification for 'hashCode()' fully, that is
> not the entire contract, only the compiler-enforceable part of it.
>
> The entire specification requires that as much as feasible, the 'Object'
> implementation distinguish distinct instances, and that the method
> generally support 'HashMap', which promises O(1) 'get()' and 'put()'
> with a "proper" (i.e., compliant) 'hashCode()'.

Two wrong statements.

It says that the method is defined to support HashMap

And HashMap does not guarantee O(1) with a correct hashCode - it guarantee that
for one that return good distributed values.

Arne

Arne Vajhøj

unread,
Aug 12, 2012, 2:58:17 PM8/12/12
to
To: =?UTF-8?B?QXJuZSBWYWpow7hq?=
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <ar...@vajhoej.dk>

On 8/11/2012 10:15 PM, Arne Vajh-,j wrote:
> This is a classic test question in basic Java SE. And that returning
> a constant is correct but not smart should be in most Java SE
> text books.

Effective Java / Joshua Bloch:

<quote>
// The worst possible legal hash function - never use!
public int hashCode() { return 42; }

It is legal because it ensures that equal objects have the same hash code. It's
atrocious because ...
</quote>

Java 2 SUN Certified Programmer & Developer / Kathy Sierra & Bert Bates:

<quote>
A hashCode() that returns the same value for all instances whether they're
equal or not is still a legal - even appropriate - hashCode() method! For
example,
public int hashCode() {
return 1492;
}
would not violate the contract
...
This hashCode() method is horrible inefficient, ... ...
Nontheless, this one-hash-fits-all method would be considered appropriate and
even correct because it doesn't violate the contract. Once more, correct does
not necessarily mean good.
</quote>

Eric Sosman

unread,
Aug 12, 2012, 2:58:18 PM8/12/12
to
To: =?UTF-8?B?QXJuZSBWYWpow7hq?=
From: Eric Sosman <eso...@ieee-dot-org.invalid>
All this means is that people know how to describe a "correct"
hashCode(), but nobody knows how to describe a "usable" hashCode() in terms
that apply testably to all circumstances.

The O.P. asked whether it would "be potentially better" if
Object's hashCode() returned a constant. He did *not* ask whether such an
implementation would be correct; he only asked if it would "be potentially
better." Upon prompting he explained what he meant by "better," and in light
of that explanation the answer to his original question is NO. Discussions
about "Oh, but it's CORRECT" are just red herrings; it's still not "better."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Arne Vajhøj

unread,
Aug 12, 2012, 2:58:18 PM8/12/12
to
To: Roedy Green
From: Arne Vajhoj <ar...@vajhoej.dk>

Lew

unread,
Aug 12, 2012, 2:58:18 PM8/12/12
to
To: =?UTF-8?B?QXJuZSBWYWpow7hq?=
From: Lew <no...@lewscanon.com>

Arne Vajhøj

unread,
Aug 12, 2012, 2:58:18 PM8/12/12
to
To: Eric Sosman
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <ar...@vajhoej.dk>

On 8/11/2012 10:43 PM, Eric Sosman wrote:
> The O.P. asked whether it would "be potentially better" if
> Object's hashCode() returned a constant. He did *not* ask whether
> such an implementation would be correct; he only asked if it would
> "be potentially better." Upon prompting he explained what he
> meant by "better," and in light of that explanation the answer
> to his original question is NO. Discussions about "Oh, but it's
> CORRECT" are just red herrings; it's still not "better."


The original questions were:

#Is it always technically correct to override the hashCode function
#like so:
#
# @Override
# public int hashCode() {
# return 1;
# }

For which the answer is YES. Per documentation.

But with really poor performance in many relevant cases.

#Would it be potentially better if that was Object's implementation?

Which was clarified to:

#Better in the sense that you would never HAVE to override hashCode.

For which the answer is also YES. Per the previous.

But with the same performance note. And a big sigh because it seems to want to
broaden bad performance from a single class to the entire programming style
(multiple classes).

Arne

Lew

unread,
Aug 12, 2012, 2:58:19 PM8/12/12
to
To: =?UTF-8?B?QXJuZSBWYWpow7hq?=
From: Lew <no...@lewscanon.com>

Arne Vajh-,j wrote:
> Lew wrote:
>> Jan Burse wrote:
>>> Maybe it would make sense to spell out what the contract
>>> for hashCode() is. Well the contract is simply, the
>>> following invariant should hold:
>>>
>>> /* invariant that should hold */
>>> if a.equals(b) then a.hashCode()==b.hashCode()
>>
>> True, but if you read the specification for 'hashCode()' fully, that is
>> not the entire contract, only the compiler-enforceable part of it.
>>
>> The entire specification requires that as much as feasible, the 'Object'
>> implementation distinguish distinct instances, and that the method
>> generally support 'HashMap', which promises O(1) 'get()' and 'put()'
>> with a "proper" (i.e., compliant) 'hashCode()'.
>
> Two wrong statements.
>
> It says that the method is defined to support HashMap
>
> And HashMap does not guarantee O(1) with a correct
> hashCode - it guarantee that for one that return
> good distributed values.

Granted.

Jan Burse

unread,
Aug 12, 2012, 2:58:19 PM8/12/12
to
To: Jan Burse
From: Jan Burse <janb...@fastmail.fm>

Jan Burse schrieb:
> Maybe it would make sense to spell out what the contract
> for hashCode() is.

Those who are interested can read the original text. It is found here:

http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode%28%29

---------------------------------------------------------------

"If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of the
two objects must produce the same integer result."

That is:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

----------------------------------------------------------------

"It is not required that if two objects are unequal according
to the equals(java.lang.Object) method, then calling the
hashCode method on each of the two objects must produce
distinct integer results. However, the programmer should be
aware that producing distinct integer results for unequal
objects may improve the performance of hash tables."

That is:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy the later, although
the closer it comes to the later, the better it works for HashMap, etc..

----------------------------------------------------------------

There is no reference to Comparable and the compareTo. If I am not using
HashMap or HashSet in my application, and still override equals, I do not need
to implement hashCode(). I could for example use TreeMap or TreeSet and
implement the Comparable interface. There is a reference to equals() back from
Comparable though.

http://docs.oracle.com/javase/6/docs/api/java/lang/Comparable.html

Bye

Jan Burse

unread,
Aug 12, 2012, 2:58:19 PM8/12/12
to
To: Jan Burse
From: Jan Burse <janb...@fastmail.fm>

Jan Burse schrieb:
> /* not implied and thus not required by the invariant */
> if a.hashCode()==b.hashCode() then a.equals(b)
This is logically equivalent to what is formulated in the Javadoc by the so
called contraposition:

if !a.equals(b) then a.hashCode()!=b.hashCode()

http://en.wikipedia.org/wiki/Contraposition

Patricia Shanahan

unread,
Aug 12, 2012, 2:58:19 PM8/12/12
to
To: bob smith
From: Patricia Shanahan <pa...@acm.org>

On 8/10/2012 3:22 PM, bob smith wrote: ...
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.
>

Robert Klemme

unread,
Aug 12, 2012, 2:58:20 PM8/12/12
to
To: Arne Vajhøj
From: Robert Klemme <short...@googlemail.com>

On 11.08.2012 01:27, Arne Vajhoj wrote:
> On 8/10/2012 6:22 PM, bob smith wrote:
>> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>>> On 8/10/2012 11:47 AM, bob smith wrote:
>>>> Is it always technically correct to override the hashCode function
>>>> like so:
>>>> @Override
>>>> public int hashCode() {
>>>> return 1;
>>>> }
>>>> Would it be potentially better if that was Object's implementation?
>>>
>>> Define "better."
>>
>> Better in the sense that you would never HAVE to override hashCode.
>>
>> Now, there are cases where you HAVE to override it, or your code is
>> very broken.
>
> It is not broken.
>
> It will perform poorly in many cases.

Well, I would go as far as to say that it will perform poorly in all cases
where hashCode() is actually needed - and that makes it broken. The whole idea
of hashing is based on the fact that the hash code _somehow_ represents the
item to be hashed. If all items have the same constant hash code there is no
point in hashing at all. So while it does work, it does not work as intended.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Joshua Cranmer

unread,
Aug 12, 2012, 2:58:20 PM8/12/12
to
To: bob smith
From: Joshua Cranmer <Pidg...@verizon.invalid>

On 8/10/2012 6:22 PM, bob smith wrote:
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.

Returning a constant hash code is correct in the same sense that answering
"yes" to the question "Can you tell me the correct way to do this?" would
be--syntactically and semantically correct, but completely contrary to the
actual intent of the question.

The point of the hash code is to provide a cheap way to quickly distinguish
inputs (in the sense that Pr(a.hashCode() == b.hashCode() and !a.equals(b))
should be as small as possible [1]). A constant-value hash completely negates
the purpose of the hash code, and this renders the hashCode again completely
unusable for anything that actually wants to use it.

In the default case, a.hashCode() == b.hashCode() only when a == b (this
definitely holds true with 32-bit machines and I'm pretty sure it still holds
true with 64-bit machines, but I'd have to reverify the JVM source code to be
certain). It is thus correct so long as identity equals is correct. It is also
potentially correct in a limited set of cases where a.equals(b) and a != b. In
all of these cases, it would not only be correct but also extremely useful,
having pretty strong guarantees about the distribution of hash values.

[1] Actually, for good performance, hash codes should go one step further and
make slightly stronger guarantees about independence with respect to the size
of the hash table. But I digress.

--
Beware of bugs in the above code; I have only proved it correct, not tried it.
-- Donald E. Knuth

Patricia Shanahan

unread,
Aug 12, 2012, 2:58:20 PM8/12/12
to
To: Joshua Cranmer
From: Patricia Shanahan <pa...@acm.org>

On 8/12/2012 9:23 AM, Joshua Cranmer wrote:
I think there are two reasonably usable ways of handling this issue. One is the
current arrangement, in which every class has a hashCode that is expected to be
usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket selection
unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only member of a
HashKey interface that would be implemented by every class whose objects are
intended to be suitable for use as has keys. Those objects that have a hashCode
would still have to have a usable one, but some classes would not implement
HashKey and not have a hashCode at all.

Patricia

Jan Burse

unread,
Aug 12, 2012, 2:58:20 PM8/12/12
to
To: Joshua Cranmer
From: Jan Burse <janb...@fastmail.fm>

Joshua Cranmer schrieb:
>
> In the default case, a.hashCode() == b.hashCode() only when a == b (this
> definitely holds true with 32-bit machines and I'm pretty sure it still
> holds true with 64-bit machines, but I'd have to reverify the JVM source
> code to be certain).

Not at all. Even not for 32-bit machines. Not only because the hashCode()
usually uses less than 32 bits. But also for other reasons, namely the internal
algorithm how the hash is produced (although the below bug doesn't reveal much
internals):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

But in the bottom of the above bug you can find code that searches for a clash.
It can take quite a while, but it is not excluded. This is what a sample run
produced according to the bug:

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

So the 62028120-th new Object produced the same hashcode.

So

Eric Sosman

unread,
Aug 12, 2012, 4:00:36 PM8/12/12
to
To: Patricia Shanahan
From: Eric Sosman <eso...@ieee-dot-org.invalid>

On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
>[...]
> I think there are two reasonably usable ways of handling this issue. One
> is the current arrangement, in which every class has a hashCode that is
> expected to be usable for selecting a hash table bucket.
>
> Keeping hashCode as an Object method but making it useless for bucket
> selection unless overridden would not be a good alternative.
>
> A more reasonable alternative would be to have hashCode as the only
> member of a HashKey interface that would be implemented by every class
> whose objects are intended to be suitable for use as has keys. Those
> objects that have a hashCode would still have to have a usable one, but
> some classes would not implement HashKey and not have a hashCode at all.

Ugh. So if J. Random Programmer is too lazy or unimaginative to
write hashCode(), that means I can't use his class as a HashMap key, or even
put instances in a HashSet? Ugh, again.

(And, no: I don't think a HashCalculator interface along the lines
of Comparable would save the day.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Patricia Shanahan

unread,
Aug 12, 2012, 4:00:37 PM8/12/12
to
To: Eric Sosman
From: Patricia Shanahan <pa...@acm.org>

On 8/12/2012 10:59 AM, Eric Sosman wrote:
> On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
>> [...]
>> I think there are two reasonably usable ways of handling this issue. One
>> is the current arrangement, in which every class has a hashCode that is
>> expected to be usable for selecting a hash table bucket.
>>
>> Keeping hashCode as an Object method but making it useless for bucket
>> selection unless overridden would not be a good alternative.
>>
>> A more reasonable alternative would be to have hashCode as the only
>> member of a HashKey interface that would be implemented by every class
>> whose objects are intended to be suitable for use as has keys. Those
>> objects that have a hashCode would still have to have a usable one, but
>> some classes would not implement HashKey and not have a hashCode at all.
>
> Ugh. So if J. Random Programmer is too lazy or unimaginative to
> write hashCode(), that means I can't use his class as a HashMap key,
> or even put instances in a HashSet? Ugh, again.
>
> (And, no: I don't think a HashCalculator interface along the lines
> of Comparable would save the day.)
>

I'm not saying that it would be better than the current situation, just better
than having hashCode implementations that appear to be there, but in practice
must not be used for hash bucket selection.

Patricia

Lew

unread,
Aug 12, 2012, 4:00:37 PM8/12/12
to
To: Jan Burse
From: Lew <no...@lewscanon.com>

Jan Burse wrote:
> There is no reference to Comparable and the compareTo. If I am not

True.

> using HashMap or HashSet in my application, and still override
> equals, I do not need to implement hashCode(). I could for example

True.

> use TreeMap or TreeSet and implement the Comparable interface. There
> is a reference to equals() back from Comparable though.

Arne Vajhøj

unread,
Aug 12, 2012, 4:53:11 PM8/12/12
to
On 8/12/2012 12:46 AM, Lew wrote:
> Arne Vajhøj wrote:
>> The original questions were:
>>
>> #Is it always technically correct to override the hashCode function
>> #like so:
>> #
>> # @Override
>> # public int hashCode() {
>> # return 1;
>> # }
>>
>> For which the answer is YES. Per documentation.
>>
>> But with really poor performance in many relevant cases.
>>
>> #Would it be potentially better if that was Object's implementation?
>>
>> Which was clarified to:
>>
>> #Better in the sense that you would never HAVE to override hashCode.
>>
>> For which the answer is also YES. Per the previous.
>
> No, that's not true. Value-equality maps, for example, would not work if
> you didn't override 'hashCode()' in the key type to match value equality
> on the keys.

That was almost exactly what I answered in my first answer to
the original poster.

Then I read it as "would it be better if my class used Object's
implementation".

But after reading the clarification then I think it should be
read as "would it be better if Object used the implementation
shown".

Very different question!

Arne

Arne Vajhøj

unread,
Aug 12, 2012, 4:59:05 PM8/12/12
to
On 8/12/2012 11:06 AM, Robert Klemme wrote:
> On 11.08.2012 01:27, Arne Vajhøj wrote:
>> On 8/10/2012 6:22 PM, bob smith wrote:
>>> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>>>> On 8/10/2012 11:47 AM, bob smith wrote:
>>>>> Is it always technically correct to override the hashCode function
>>>>> like so:
>>>>> @Override
>>>>> public int hashCode() {
>>>>> return 1;
>>>>> }
>>>>> Would it be potentially better if that was Object's implementation?
>>>>
>>>> Define "better."
>>>
>>> Better in the sense that you would never HAVE to override hashCode.
>>>
>>> Now, there are cases where you HAVE to override it, or your code is
>>> very broken.
>>
>> It is not broken.
>>
>> It will perform poorly in many cases.
>
> Well, I would go as far as to say that it will perform poorly in all
> cases where hashCode() is actually needed

More or less.

> - and that makes it broken.

This thread started about whether it is correct. The term correct
has a very specific meaning in programming and that always return 1
code is correct.

Now you talk about broken. If broken means not good, then you are right.
If broken means not correct, then you are wrong. I suspect
that broken is not very well defined.

> The whole idea of hashing is based on the fact that the hash code
> _somehow_ represents the item to be hashed. If all items have the same
> constant hash code there is no point in hashing at all. So while it
> does work, it does not work as intended.

It disable the entire hashing functionality and a HashMap get
characteristics of ArrayList.

But the code should actually work.

Arne



Arne Vajhøj

unread,
Aug 12, 2012, 5:00:12 PM8/12/12
to
On 8/12/2012 12:46 AM, Lew wrote:
> Arne Vajhøj wrote:
>> But with the same performance note. And a big sigh because it
>> seems to want to broaden bad performance from a single class
>> to the entire programming style (multiple classes).
>
> Overriding 'hashCode()' is done for functional reasons, not performance
> reasons. If you fail to override the method, you'll get incorrect
> behavior, for example failing to find a collection member that is
> actually present.

Correct.

But the return constant is a special case. It functions as it
should but performs very poorly.

Arne


Arne Vajhøj

unread,
Aug 12, 2012, 5:02:36 PM8/12/12
to
On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
> Keeping hashCode as an Object method but making it useless for bucket
> selection unless overridden would not be a good alternative.
>
> A more reasonable alternative would be to have hashCode as the only
> member of a HashKey interface that would be implemented by every class
> whose objects are intended to be suitable for use as has keys. Those
> objects that have a hashCode would still have to have a usable one, but
> some classes would not implement HashKey and not have a hashCode at all.

That would avoid using the hash for classes where it does not make
much sense.

I like it.

Given that there is a gazillion lines of code out there that
stores Object, then it can not be changed.

Arne


Arne Vajhøj

unread,
Aug 12, 2012, 5:11:46 PM8/12/12
to
On 8/12/2012 2:27 PM, Lew wrote:
> Even for cases where one predicts the use will not require one or
> another of the consistency practices, it is harmless to enforce them.
>
> There are four methods a type might use to represent equality or
> identity and deviations therefrom: 'equals()' of course, and
> 'hashCode()', 'compareTo()', and 'toString()'. There may be external
> 'Comparator's on that type.
>
> All these methods on or of a type, where they exist, should be
> consistent, absent an overwhelming and sound reason not to be.

> Should you design anything that violates the consistency rule, then
> please do both Javadoc and code-comment it properly.

I agree.

(well toString is not high on my priority list for what needs to
behave certain ways, but ...)

To make it easier to get them consistent, then I think it would
be nice with syntactic sugar.

Like:

public class Data {
@ValueId
private int iv;
@ValueId
private String sv;
// the rest of the class
}

which would cause the compiler to emit equals and hashCode methods
that are test of the @ValueId fields are equal and creates a "decent"
hash.

public class Data implements Comparable<Data> {
@ValueId
private int iv;
@ValueId
private String sv;
// the rest of the class
}

could cause the compiler to output a compareTo as well.

It is invisible code, but we already got that with the
default constructor.

And it will help make those methods consistent.

Of course explicit override should still be possible.

Arne




Robert Klemme

unread,
Aug 13, 2012, 1:17:38 PM8/13/12
to
Actually it wasn't me who brought up the term. :-)

> If broken means not good, then you are right.
> If broken means not correct, then you are wrong. I suspect
> that broken is not very well defined.

Right. :-)

>> The whole idea of hashing is based on the fact that the hash code
>> _somehow_ represents the item to be hashed. If all items have the same
>> constant hash code there is no point in hashing at all. So while it
>> does work, it does not work as intended.
>
> It disable the entire hashing functionality and a HashMap get
> characteristics of ArrayList.

An ArrayList allows multiple occurrences of the same instance - and does
not store key value pairs. That won't be the case with HashMap as
equals() (if properly implemented) will prevent that (albeit slowly, or
more correct: slower than with a proper implementation of hashCode()).
Also, a HashMap will degenerate more to a LinkedList via the chaining of
a bucket's entries.

> But the code should actually work.

Yes, sort of - depending on whether O(1) is a requirement or not.
Still, the idea to implement hashCode() in Object in a way to return a
constant to avoid necessity of overriding it is crazy - especially since
you then cannot efficiently use Object as hash key - which you can today.
It is loading more messages.
0 new messages