How to use maps and multimaps with byte arrays as keys and values?


ib84

Sep 16, 2012, 6:49:14 AM
to haze...@googlegroups.com
Hello,

I'm trying to use MultiMaps and Maps with already-serialized data, i.e. with both keys and values being byte arrays. The containsXY methods and remove don't work.
I saw the Gotchas in the javadoc, but they suggest to me that hashing is done by Hazelcast on the serialized form and does not rely on implementations of hashCode and equals (which byte arrays do not provide). So I don't understand how there can be any problem.

Could somebody please tell me how to use Hazelcast maps with byte arrays?

thanks for any help.

regards,
ingvar


Problem description:

scala> val mm:MultiMap[Array[Byte], Array[Byte]] = hi.getMultiMap("test-baMuMa")
mm: com.hazelcast.core.MultiMap[Array[Byte],Array[Byte]] = MultiMap [test-baMuMa]

scala> mm.put("a".getBytes,"b".getBytes)
res29: Boolean = true

scala> mm.put("a".getBytes,"b".getBytes)
res30: Boolean = true
// => should return false..

scala> mm.remove("a".getBytes,"b".getBytes)
res31: Boolean = false
// =>  should return true

scala> mm.containsEntry("a".getBytes,"b".getBytes)
res32: Boolean = false
// =>  should return true

Mehmet Dogan

Sep 17, 2012, 8:23:14 AM
to haze...@googlegroups.com
Hazelcast always uses the hashCode and equals of the serialized (binary) form for keys.

But operations that rely on a value argument, such as remove(key, value), containsEntry(key, value), and replace(key, oldValue, newValue), use the hashCode and equals implementations of the value's class. In this case the value class is byte[], so the equals of byte[] (which is actually Object.equals()) is used.
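
The difference between identity and content comparison for byte arrays can be seen in plain Java, without Hazelcast. The following is a minimal sketch; the class and method names (ByteArrayEquality, sameByObjectEquals, sameByContent) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: arrays inherit identity-based equals and hashCode
// from Object, so two arrays with the same content are not "equal".
class ByteArrayEquality {
    // What byte[].equals resolves to: Object.equals, i.e. reference identity.
    public static boolean sameByObjectEquals(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Element-by-element comparison of the array contents.
    public static boolean sameByContent(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] first = "b".getBytes(StandardCharsets.UTF_8);
        byte[] second = "b".getBytes(StandardCharsets.UTF_8);

        System.out.println(sameByObjectEquals(first, second)); // false
        System.out.println(sameByContent(first, second));      // true
        // Content-based hash codes agree for equal content.
        System.out.println(Arrays.hashCode(first) == Arrays.hashCode(second)); // true
    }
}
```

This is consistent with the results in the problem description: each call to "b".getBytes creates a new array, so a comparison by reference never matches.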

@mmdogan






--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/BPPNKFIB1b8J.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.

Ingvar Bogdahn

Sep 17, 2012, 12:43:26 PM
to haze...@googlegroups.com
Thank you Mehmet for the answer.
So, in order to store a byte[] as a value, it is necessary to store it in a
wrapper class which implements hashCode and equals?


2012/9/17 Mehmet Dogan <meh...@hazelcast.com>:

Mehmet Dogan

Sep 17, 2012, 1:28:50 PM
to haze...@googlegroups.com
Right, that will serve.

@mmdogan

Ingvar Bogdahn

Sep 17, 2012, 2:04:00 PM
to haze...@googlegroups.com
ok thanks. Hence, it is not possible to store BLOBs directly. Any
database-like library should be able to do that. Filed an issue about
that here:
https://github.com/hazelcast/hazelcast/issues/273

2012/9/17 Mehmet Dogan <meh...@hazelcast.com>:

Tim Peierls

Sep 17, 2012, 3:29:29 PM
to haze...@googlegroups.com
I don't think this is a good idea. Hazelcast might be "database-like" in some ways, but I wouldn't want to see special treatment of byte[] on the basis of such a tenuous connection. 

It's easy enough to create your own wrapper type around byte[]. If you do, consider contributing it to the Hazelcast community.

--tim

Ingvar Bogdahn

Sep 17, 2012, 5:14:54 PM
to haze...@googlegroups.com
Hi Tim,

I'm not sure I understand well what you mean by "tenuous connection".
Are you denying that byte arrays are important? In my opinion, they
should be supported natively. Raw data should not need to be wrapped
and unwrapped unnecessarily.

I'm suggesting a simple thing: in the map implementation there should
just be one more type check. If the value is a byte array (or any
primitive array, for that matter), then calculate the hash code using
java.util.Arrays.hashCode. Is that a big deal?
And also, if it's a byte[], can't serialization be omitted altogether?

Creating the wrapper is indeed very simple; that is not the problem.
Rather, it's an annoying indirection. Forking is not an option,
because I am planning to use Hazelcast in another library, and I don't
want it to depend on a soon-outdated fork.

I don't see a solid argument against making such a trivial change for
better supporting a fundamental datatype.

regards,
ingvar


2012/9/17 Tim Peierls <t...@peierls.net>:

Tim Peierls

Sep 17, 2012, 6:51:40 PM
to haze...@googlegroups.com
On Mon, Sep 17, 2012 at 5:14 PM, Ingvar Bogdahn <ingvar....@googlemail.com> wrote:
> I'm not sure I understand well what you mean by "tenuous connection".

Tenuous connection between Hazelcast and databases. I don't think of Hazelcast as being a database or being primarily about supporting database-like activity. While I think it's wonderful that people have been successful in using Hazelcast as part of database-like applications, I feel strongly that such applications should not dictate the shape of the core API, in particular, the semantics of IMaps.


> Are you denying that byte arrays are important? In my opinion, they
> should be supported natively. Raw data should not need to be wrapped
> and unwrapped unnecessarily.

Not saying that byte arrays aren't important, just that they don't deserve special support for use as keys or values in maps. I don't expect this kind of support in plain Java maps, so I wouldn't expect it in Hazelcast IMaps.

Write one wrapper type -- Blob, say -- that implements DataSerializable with writeBytes, equals with Arrays.equals, and hashCode with Arrays.hashCode, and you can eat your cake and have it, too. I see nothing wrong with providing such a type as part of core Hazelcast, btw.
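
A minimal sketch of such a wrapper follows, under stated assumptions: the name Blob comes from the suggestion above, while the writeData and readData methods are only modeled on the shape of a DataSerializable-style contract using java.io.DataOutput and java.io.DataInput. The Hazelcast interface itself is deliberately not implemented here so that the sketch compiles without Hazelcast on the classpath; wiring it to the real interface is left open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical wrapper giving byte[] content-based equals and hashCode.
// Assumption: for use with Hazelcast it would additionally implement
// Hazelcast's DataSerializable interface, which is omitted here.
final class Blob {
    private byte[] bytes;

    // No-arg constructor so an instance can be created before readData.
    public Blob() {
        this.bytes = new byte[0];
    }

    public Blob(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the hash stable
    }

    public byte[] toByteArray() {
        return bytes.clone();
    }

    // Writes a length prefix followed by the raw bytes.
    public void writeData(DataOutput out) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads the length prefix, then exactly that many bytes.
    public void readData(DataInput in) throws IOException {
        int length = in.readInt();
        bytes = new byte[length];
        in.readFully(bytes);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Blob && Arrays.equals(bytes, ((Blob) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    public static void main(String[] args) throws IOException {
        Blob original = new Blob("b".getBytes());

        // Round trip through the write/read methods.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.writeData(new DataOutputStream(buffer));
        Blob copy = new Blob();
        copy.readData(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(original.equals(copy));                  // true
        System.out.println(original.hashCode() == copy.hashCode()); // true
    }
}
```

The defensive copies are a design choice rather than a requirement: they prevent a caller from mutating the array after it has been stored, which would otherwise change the hash code of an object already held in a map.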

 
> I'm suggesting a simple thing: in the map implementation there should
> just be one more type check. If the value is a byte array (or any
> primitive array, for that matter), then calculate the hash code using
> java.util.Arrays.hashCode. Is that a big deal?

I think it is, for several reasons:
  • I think people who didn't ask for this extra behavior would be very annoyed to have the extra instanceof check foisted upon them. 
  • Why should byte[] get special treatment and not int[], long[], and double[]? What about short[] and float[]?
  • There are already some gotchas in the way that IMaps (intentionally) violate the Map contract; there's no need to widen the gap.
 
> And also, if it's a byte[], can't serialization be omitted altogether?

That's a potential performance optimization to consider. Right now it looks as though byte[] gets treated like Object in SerializationHelper.writeObject, but maybe there's an advantage to testing for it outside of the call to writeObject, in the way that MAP and COLLECTION are tested for. That's something the Hazelcast team will know more about.

But this is a separate issue: serialization handling of basic types is not visible to users. 

 
> Creating the wrapper is indeed very simple; that is not the problem.
> Rather, it's an annoying indirection. Forking is not an option,
> because I am planning to use Hazelcast in another library, and I don't
> want it to depend on a soon-outdated fork.

I wasn't suggesting forking anything. Are you saying that you are using another library that works in terms of Map<K, byte[]> and depends on element equality semantics of values? 

 
> I don't see a solid argument against making such a trivial change for
> better supporting a fundamental datatype.

The most solid argument I can make is that what you're proposing is not how regular Java maps behave. You might be able to get away with identity semantics in a single JVM, but in the distributed world you can't rely on a reference you retrieve from a map at one point being equal to the reference retrieved the same way a moment later, even if the Map hasn't changed.
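
The point about references can be illustrated with a serialization round trip. The sketch below uses standard Java serialization as a stand-in for whatever happens when a value travels between cluster members; that substitution is an assumption, since Hazelcast has its own serialization, and the class name RoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Illustrative only: a value that has been serialized and deserialized
// is a new object, so identity-based equality no longer holds.
class RoundTrip {
    // Serializes the array and reads it back, as a stand-in (assumption)
    // for storing a value remotely and retrieving it again.
    public static byte[] roundTrip(byte[] value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (byte[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = {1, 2, 3};
        byte[] retrieved = roundTrip(stored);

        System.out.println(retrieved == stored);              // false: different reference
        System.out.println(retrieved.equals(stored));         // false: Object.equals is identity
        System.out.println(Arrays.equals(retrieved, stored)); // true: same content
    }
}
```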

--tim

Ingvar Bogdahn

Sep 18, 2012, 3:43:21 AM
to haze...@googlegroups.com
Hi again,

I can understand that changing existing IMap methods might be out of
the question, if only for backward compatibility. I also understand that
a Map is not expected to calculate the hashes of its values itself (*), but
should call hashCode on the key / value objects (#).

However, Java is plainly flawed in that central data structures like
arrays don't provide an adequate hashCode; that's why we are having
trouble. Hazelcast is already doubly inconsistent, since for keys it
does * (inconsistent with Map), whereas for values it does #
(inconsistent with itself). Personally, I can live with Hazelcast
being more liberal with the Map contract for the reasons mentioned,
but I find it bad, and I'm not the first to say so, that it is
inconsistent with itself.

Suggestions:
1. Provide additional variants of all relevant map methods such that
the hash code can be supplied manually for key and value. This may be
interesting for several reasons:
- it would solve the problem of storing primitive arrays without
wrappers and without the Map silently breaking the contract
- it opens the possibility of using other hash algorithms, for example the
more performant Murmur. This may be particularly relevant for reducing the
overhead of deep hashing, which I'd need.
- if I understand correctly, Hazelcast is a DHT, which can be conceived
as a ring of hash values -> key ownership. So specifying the hash could
make it possible to influence where keys are stored in the cluster, by
choosing adequate starting positions for that hash. This in turn would
allow near-cache-like performance without the downsides.

2. A new IMap variant with hash codes being consistently overridden for
both key and value, additionally providing the possibility of
specifying hashes manually.

I repeat that implementing it myself would not be a problem if it were
for my personal usage, but I'm planning to use Hazelcast as a storage
backend for another library (hypergraphdb). Therefore, forking is out
of the question for me.

best,
Ingvar

2012/9/18 Tim Peierls <t...@peierls.net>:

Jason Clawson

Sep 18, 2012, 11:35:36 AM
to haze...@googlegroups.com
Hazelcast is inconsistent with itself in several places. Here, for one, and Multimap in many places. Sometimes it uses the serialized form for equals and hashCode, sometimes it doesn't. And the docs are wrong in a few places about when this switch happens.

I am a fan of always using the serialized form for equals and hashCode. It's consistent, and more performant because it can reduce the amount of deserialization operations needed to run comparisons.
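
Comparing values by their serialized form could look roughly like the following. This is only a sketch of the idea: it uses standard Java serialization as a stand-in for Hazelcast's own binary format (an assumption), and the names BinaryEquality, equalInBinaryForm, and binaryHashCode are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Illustrative only: equality and hashing defined on the serialized bytes
// rather than on the equals/hashCode of the object's class.
class BinaryEquality {
    // Java serialization stands in (assumption) for Hazelcast's binary form.
    public static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Two values are equal if their serialized bytes are identical.
    public static boolean equalInBinaryForm(Object a, Object b) {
        return Arrays.equals(serialize(a), serialize(b));
    }

    // Hash code derived from the serialized bytes.
    public static int binaryHashCode(Object value) {
        return Arrays.hashCode(serialize(value));
    }

    public static void main(String[] args) {
        byte[] first = "b".getBytes();
        byte[] second = "b".getBytes();

        System.out.println(first.equals(second));             // false: identity
        System.out.println(equalInBinaryForm(first, second)); // true: same bytes
        System.out.println(binaryHashCode(first) == binaryHashCode(second)); // true
    }
}
```

One caveat of this approach is that binary equality and logical equality can diverge for some types, for example when two logically equal objects serialize their contents in a different order.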

Tim, you state that it doesn't make sense to compare references in a distributed system, but then say it should function like Java Maps. You can't have both. While this behavior isn't surprising to someone using IMap like a plain old java Map, it is surprising to those who've read the hazelcast documentation O.o

remove(key, value) : "This method uses hashCode and equals of binary form of the key, not the actual implementations of hashCode and equals defined in key's class."

-.- le sigh

So far in the few months I have been on this list, this has bitten 4 people including myself. Either fix the docs, or fix the code.

Mehmet Dogan

Sep 18, 2012, 11:43:04 AM
to haze...@googlegroups.com

The docs here are not wrong:

remove(key, value) : "This method uses hashCode and equals of binary form of the key, not the actual implementations of hashCode and equals defined in key's class."

That statement is about the key, not the value. And what I said earlier in this thread was the same:

"Hazelcast always uses serialized (binary) form's hashCode and equals for keys. But operations that rely on a value argument – such as remove(key, value), containsEntry(key, value), replace(key, oldValue, newValue) – use the hashCode and equals implementations of the value."

@mmdogan

~ Sent from mobile


Tim Peierls

Sep 18, 2012, 1:44:45 PM
to haze...@googlegroups.com
On Tue, Sep 18, 2012 at 11:35 AM, Jason Clawson <jcla...@qualys.com> wrote:
> remove(key, value) : "This method uses hashCode and equals of binary form of the key, not the actual implementations of hashCode and equals defined in key's class."
>
> So far in the few months I have been on this list, this has bitten 4 people including myself. Either fix the docs, or fix the code.

I'm in favor of fixing the docs.

--tim

Ingvar Bogdahn

Sep 19, 2012, 3:21:54 AM
to haze...@googlegroups.com
Fixing the docs only reduces confusion about the status quo. It does
not help with the fact that byte arrays have to be wrapped in
wrapper classes in order to be stored as values in maps.

Is nobody willing to comment on my suggestions from the previous message? Any other suggestions?