Some initial feedback about using custom serialization in Hazelcast 3.0


mongonix

Mar 1, 2013, 5:09:46 PM3/1/13
to haze...@googlegroups.com
Hi,

I checked it out and played a bit with the HZ 3.0 branch.  Specifically, I tested the new and shiny support for custom serialization.

The classes that I took as a basis for my experiments are MainPortable & Co from PortableTest.java. I basically create a MainPortable object (which is pretty big and complex) and then serialize it 1,000,000 times using HZ's built-in mechanisms and using an alternative custom serializer.

So, here comes my initial feedback about the custom serialization support in HZ 3.0:

1) It works!!! HZ finally supports this feature. Very well done! No need for wrapper classes anymore.

    I was able to specify my own serializer for some of my types. To do it, I used the following syntax in the config file:

    <serialization>
        <portable-version>0</portable-version>
        <serializers>
             <type-serializer type-class="name.of.class.to.be.serialized">name.of.serializer.class</type-serializer>
        </serializers>
    </serialization>

    BTW, a few questions about this:
    - Is it possible to use wildcards for specifying the class to be serialized? It could be useful if the same serializer is supposed to serialize a lot of different classes.
    - Another minor thing that I noticed: even if I specify my custom serializer for a certain class, but this class implements DataSerializable or Portable, HZ ignores my custom serializer from the config file and uses the one described by the interface. I think the configuration file setting, if provided, should override those interfaces.

2) I picked Kryo as an alternative serialization framework, just to see how it would compare to HZ's built-in serialization. The outcome of those tests: Kryo is 2 times faster than HZ's DataSerializable serialization.

3) When it comes to the implementation of custom serializers, I implemented TypeSerializer as prescribed by HZ. The implementation is pretty easy and straightforward, just a few lines of code. But I think it is not as efficient as it could be, due to the current limitations of the TypeSerializer interface. Let me explain.
  
   The "write" method that I override expects that I write a binary representation into a DataObjectOutput stream. 
  
   Since I use a totally different serialization framework, which does not support DataObjectOutput stream out of the box, I first need to serialize my objects using Kryo and produce a byte array with a binary representation (this is a first pass over the binary representation). Then I have to write it into DataObjectOutput using write(byte[]) (this is a second pass over the binary representation). Later, when the "write" method returns, HZ would perform DataObjectOutput.toBytes() or something like this, thus allocating a new byte array and copying the binary representation to it (this is a third pass). So, we copy our binary representation 2 times more than required...
   
  I think the reason for this is that there is no way to just return a byte array from the write method. Therefore we have to write byte arrays to streams first and then do the same in the opposite direction, and this kills performance. So I'd suggest extending the TypeSerializer interface with methods that return byte arrays (and maybe ByteBuffers) or take them as parameters (for the read methods). What do you think?
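  To make the three passes concrete, here is a small self-contained simulation using plain JDK streams. The class and variable names (CopyPassesDemo, frameworkBytes, hzStream, wireBytes) are illustrative only; this is not the real Hazelcast or Kryo API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Simulation of the three passes described above, using only JDK classes.
public class CopyPassesDemo {
    public static void main(String[] args) throws IOException {
        // Pass 1: the external framework (e.g. Kryo) serializes the object
        // into its own buffer and hands back a byte[].
        byte[] frameworkBytes = "serialized-object".getBytes("UTF-8");

        // Pass 2: the TypeSerializer.write() implementation can only copy that
        // byte[] into the Hazelcast-provided output stream.
        ByteArrayOutputStream hzStream = new ByteArrayOutputStream();
        hzStream.write(frameworkBytes);

        // Pass 3: Hazelcast converts its stream back into a byte[] for the
        // wire, allocating and copying one more time.
        byte[] wireBytes = hzStream.toByteArray();

        // The payload is unchanged, but it was copied twice more than needed.
        System.out.println(Arrays.equals(frameworkBytes, wireBytes));
    }
}
```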

4) Another minor issue I noticed: I think HZ assumes that serializers are always thread-safe, i.e. the same instance of a serializer can be used by multiple threads. While this is true for many serialization frameworks, it is not always the case. Kryo, for example, is not thread-safe. Maybe there are others. Of course, it is not a big problem to implement a workaround in a custom serializer class derived from TypeSerializer. I just mention it here for the sake of completeness.
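The workaround hinted at above can be as simple as a ThreadLocal holding one serializer instance per thread. Below is a self-contained sketch; "StatefulEngine" is a hypothetical stand-in for a non-thread-safe framework object such as a Kryo instance, and none of these names are real Hazelcast or Kryo APIs.

```java
import java.nio.charset.StandardCharsets;

// One serializer instance per thread via a ThreadLocal.
public class ThreadLocalSerializerDemo {

    // A stand-in for a non-thread-safe serialization engine.
    static class StatefulEngine {
        int calls; // mutable state that makes sharing one instance across threads unsafe

        byte[] toBytes(String s) {
            calls++;
            return s.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Each thread lazily gets its own private engine instance.
    static final ThreadLocal<StatefulEngine> ENGINE = new ThreadLocal<StatefulEngine>() {
        @Override
        protected StatefulEngine initialValue() {
            return new StatefulEngine();
        }
    };

    // What a TypeSerializer.write() implementation could do internally.
    static byte[] write(String obj) {
        return ENGINE.get().toBytes(obj); // safe: no engine instance is shared
    }

    public static void main(String[] args) throws InterruptedException {
        Thread other = new Thread(new Runnable() {
            public void run() {
                write("from-other-thread"); // uses that thread's own engine
            }
        });
        other.start();
        other.join();

        byte[] bytes = write("from-main-thread");
        System.out.println(ENGINE.get().calls); // main thread's engine saw exactly 1 call
        System.out.println(bytes.length);
    }
}
```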

Cheers,
  Leo

Peter Veentjer

Mar 2, 2013, 12:49:18 AM3/2/13
to haze...@googlegroups.com
On Sat, Mar 2, 2013 at 12:09 AM, mongonix <romi...@gmail.com> wrote:
> Hi,
>
> I checked it out and played a bit with the HZ 3.0 branch. Specifically, I
> tested the new and shiny support for custom serialization.
>
> The classes that I took as a basis for my experiments are MainPortable & Co
> from the PortableTest.java. I basically create MainPortable object (which is
> pretty big and complex) and then serialize it 1000000 times using HZ
> built-in mechanisms and using an alternative custom serializer.
>
> So, here comes my initial feedback about the custom serialization support in
> HZ 3.0:
>
> 1) It works!!! HZ finally supports this feature. Very well done! No need for
> wrapper classes anymore.
>
> I was able to specify my own serializer for some of my types. To do it,
> I used the following syntax in the config file:
>
> <serialization>
> <portable-version>0</portable-version>
> <serializers>
> <type-serializer
> type-class="name.of.class.to.be.serialized">name.of.serializer.class</type-serializer>
> </serializers>
> </serialization>

This is how you register global serializers: a serializer that is used if no other serialization mechanism is found.

What you can also do is make use of a portable factory:

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config
                               http://www.hazelcast.com/schema/config/hazelcast-config-3.0.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <serialization>
        <portable-factory-class>PortableFactoryImpl</portable-factory-class>
    </serialization>
</hazelcast>

import com.hazelcast.nio.serialization.Portable;
import com.hazelcast.nio.serialization.PortableFactory;

public class PortableFactoryImpl implements PortableFactory {
    public final static int PERSON_CLASS_ID = 1;

    public Portable create(int classId) {
        switch (classId) {
            case PERSON_CLASS_ID:
                return new Person();
        }
        return null;
    }
}

import com.hazelcast.nio.serialization.Portable;
import com.hazelcast.nio.serialization.PortableReader;
import com.hazelcast.nio.serialization.PortableWriter;

import java.io.IOException;

public class Person implements Portable {
    private String name;

    Person() {
    }

    public Person(String name) {
        this.name = name;
    }

    @Override
    public int getClassId() {
        return PortableFactoryImpl.PERSON_CLASS_ID;
    }

    @Override
    public void writePortable(PortableWriter writer) throws IOException {
        writer.writeUTF("name", name);
    }

    @Override
    public void readPortable(PortableReader reader) throws IOException {
        this.name = reader.readUTF("name");
    }

    @Override
    public String toString() {
        return String.format("Person(name=%s)", name);
    }
}



>
> BTW, a few questions about this:
> - Is it possible to use wildcards for specifying the class to be
> serialized? It could be useful, if the same serializer is supposed to
> serialize a lot of different classes.

+1

> - Another minor thing that I noticed is: Even if I specify my custom
> serializer for a certain class, but this class implements DataSerializable
> or Portable, then HZ would ignore my custom serializer from the config file
> and use the one described by the interface. I think the configuration file
> setting, if provided, should override those interfaces.

It is because you are now configuring the default mechanism. Check the previous
example.


> 2) I picked Kryo as an alternative serialization framework, just to see how
> it would compare to HZ built-in serialization. The outcome on those tests:
> Kryo is 2 times faster than HZs DataSerializable serialization.

Since serialization afaik is the bottleneck in Hazelcast, this is very
good to know.

I'm sure the team will improve performance over time.

> 3) When it comes to the implementation of custom serializers, I implemented
> TypeSerializer as prescribed by HZ. The implementation is pretty easy and
> straight forward, just a few lines of code. But I think it is not as
> efficient as it could be due to the current limitations of TypeSerializer
> interface. Let me explain.
>
> The "write" method that I override expects that I write a binary
> representation into a DataObjectOutput stream.
>
> Since I use a totally different serialization framework, which does not
> support DataObjectOutput stream out of the box, I first need to serialize my
> objects using Kryo and produce a byte array with a binary representation
> (this is a first pass over the binary representation). Then I have to write
> it into DataObjectOutput using write(byte[]) (this is a second pass over the
> binary representation). Later, when the "write" method returns, HZ would
> perform DataObjectOutput.toBytes() or something like this, thus allocating a
> new byte array and copying the binary representation to it (this is a third
> pass). So, we copy our binary representation 2 times more than required...

Sounds like a lot of room for optimizations.

>
> I think the reason for this is the fact that there is no way just to
> return a byte array from the write method. Therefore we have to write byte
> arrays to streams first and then do the same in the opposite direction. And
> this kills performance. Therefore, I'd suggest to extend the TypeSerializer
> interface with methods that return byte arrays (and may be ByteBuffers) or
> take them as parameters (for read methods). What do you think of it?
>
> 4) Another minor issue I noticed is: I think HZ assumes that serializers are
> always thread-safe, i.e. the same instance of a serializer can be used by
> multiple threads. While it is true for many serialization frameworks, it is
> not always the case. Kryo, for example, is not thread-safe. May be there are
> others. Of course, it is not a big problem to implement a workaround in a
> custom serializer class derived from a TypeSerializer. I just mentioned it
> here for the sake of completeness.

I'll add this to the hz book :) Thanks.

>
> Cheers,
> Leo
>

mongonix

Mar 2, 2013, 1:21:53 AM3/2/13
to haze...@googlegroups.com
Hi,
The factory approach is nice; I hadn't realized it before you mentioned it. I'm wondering: is it only possible to have PortableFactories?
Would it be possible to have factories of TypeSerializers serving the same purpose and with a similar API? The only difference would be that they use a different format than portable serializers.

-Leo

Peter Veentjer

Mar 2, 2013, 2:15:15 AM3/2/13
to haze...@googlegroups.com
I made a mistake in my previous comment:

Hazelcast has 2 TypeSerializer registration mechanisms:

<serialization>
    <serializers>
        <type-serializer type-class="Person">PersonTypeSerializer</type-serializer>
    </serializers>
</serialization>

And

<serialization>
    <serializers>
        <global-serializer>PersonTypeSerializer</global-serializer>
    </serializers>
</serialization>

The latter is used as the default mechanism in case no other serialization mechanism is found.

The former can be used if you want to rely on explicitly registered type serializers.

Now getting back to your question:

What would be the advantage of using a factory-based approach? The factory will not help in finding the correct TypeSerializers; it only creates instances of objects that need to be deserialized.

mongonix

Mar 2, 2013, 2:38:07 AM3/2/13
to haze...@googlegroups.com
Ah, OK. Now it is clear.
 
Now getting back to your question:

What would be the advantage of using a factory based approach? The
factory will not help in finding the correct
TypeSerializers; it only creates instances of objects that need to be
deserialized.

It seems I misunderstood the intention of portable factories. You are right, there would be no advantage with their current semantics.

Mehmet Dogan

Mar 4, 2013, 5:14:00 AM3/4/13
to haze...@googlegroups.com

Thanks for the feedback!

I checked it out and played a bit with the HZ 3.0 branch.  Specifically, I tested the new and shiny support for custom serialization.

The classes that I took as a basis for my experiments are MainPortable & Co from the PortableTest.java. I basically create MainPortable object (which is pretty big and complex) and then serialize it 1000000 times using HZ built-in mechanisms and using an alternative custom serializer.

So, here comes my initial feedback about the custom serialization support in HZ 3.0:

1) It works!!! HZ finally supports this feature. Very well done! No need for wrapper classes anymore.

    I was able to specify my own serializer for some of my types. To do it, I used the following syntax in the config file:

    <serialization>
        <portable-version>0</portable-version>
        <serializers>
             <type-serializer type-class="name.of.class.to.be.serialized">name.of.serializer.class</type-serializer>
        </serializers>
    </serialization>

    BTW, a few questions about this:
    - Is it possible to use wildcards for specifying the class to be serialized? It could be useful, if the same serializer is supposed to serialize a lot of different classes.

At the moment wildcards are not supported. Instead, the type class can be a super-class or an interface; Hazelcast will pick the most specific one (it scans super-classes first, then interfaces, etc.).

 
    - Another minor thing that I noticed is: Even if I specify my custom serializer for a certain class, but this class implements DataSerializable or Portable, then HZ would ignore my custom serializer from the config file and use the one described by the interface. I think the configuration file setting, if provided, should override those interfaces.

This is intentional. For DataSerializable and Portable, we want to skip the serializer lookup phase immediately. Also, making a class DataSerializable or Portable while registering a different serializer for that type seems a bit ambiguous.

 

2) I picked Kryo as an alternative serialization framework, just to see how it would compare to HZ built-in serialization. The outcome on those tests: Kryo is 2 times faster than HZs DataSerializable serialization.

How did you compare these two? By implementing a Kryo type serializer for Hazelcast, or ...?
 

3) When it comes to the implementation of custom serializers, I implemented TypeSerializer as prescribed by HZ. The implementation is pretty easy and straight forward, just a few lines of code. But I think it is not as efficient as it could be due to the current limitations of TypeSerializer interface. Let me explain. 
  
   The "write" method that I override expects that I write a binary representation into a DataObjectOutput stream. 
  
   Since I use a totally different serialization framework, which does not support DataObjectOutput stream out of the box, I first need to serialize my objects using Kryo and produce a byte array with a binary representation (this is a first pass over the binary representation). Then I have to write it into DataObjectOutput using write(byte[]) (this is a second pass over the binary representation). Later, when the "write" method returns, HZ would perform DataObjectOutput.toBytes() or something like this, thus allocating a new byte array and copying the binary representation to it (this is a third pass). So, we copy our binary representation 2 times more than required...
   
  I think the reason for this is the fact that there is no way just to return a byte array from the write method. Therefore we have to write byte arrays to streams first and then do the same in the opposite direction. And this kills performance. Therefore, I'd suggest to extend the TypeSerializer interface with methods that return byte arrays (and may be ByteBuffers) or take them as parameters (for read methods). What do you think of it?


Yes, we are aware of this problem. Generally, most serialization frameworks are able to write/read to/from streams; some are not. Do you suggest an additional serializer interface that returns a byte[] and reads from a byte[], or adding these write/read methods to TypeSerializer? If the latter, how will we know which method to call?
 

4) Another minor issue I noticed is: I think HZ assumes that serializers are always thread-safe, i.e. the same instance of a serializer can be used by multiple threads. While it is true for many serialization frameworks, it is not always the case. Kryo, for example, is not thread-safe. May be there are others. Of course, it is not a big problem to implement a workaround in a custom serializer class derived from a TypeSerializer. I just mentioned it here for the sake of completeness.


Yes, TypeSerializers must be thread-safe. I think it should not be a problem to implement a type serializer in a thread-safe way (as you already mentioned). Otherwise the API will become a bit more complex: implement a TypeSerializerFactory, implement a TypeSerializer, etc.


@mmdogan
 

mongonix

Mar 4, 2013, 7:46:53 AM3/4/13
to haze...@googlegroups.com


On Monday, March 4, 2013 11:14:00 AM UTC+1, Mehmet Dogan wrote:

Thanks for the feedback!

I checked it out and played a bit with the HZ 3.0 branch.  Specifically, I tested the new and shiny support for custom serialization.

The classes that I took as a basis for my experiments are MainPortable & Co from the PortableTest.java. I basically create MainPortable object (which is pretty big and complex) and then serialize it 1000000 times using HZ built-in mechanisms and using an alternative custom serializer.

So, here comes my initial feedback about the custom serialization support in HZ 3.0:

1) It works!!! HZ finally supports this feature. Very well done! No need for wrapper classes anymore.

    I was able to specify my own serializer for some of my types. To do it, I used the following syntax in the config file:

    <serialization>
        <portable-version>0</portable-version>
        <serializers>
             <type-serializer type-class="name.of.class.to.be.serialized">name.of.serializer.class</type-serializer>
        </serializers>
    </serialization>

    BTW, a few questions about this:
    - Is it possible to use wildcards for specifying the class to be serialized? It could be useful, if the same serializer is supposed to serialize a lot of different classes.

At the moment wildcards are not supported. Instead type class can be a super-class or an interface. Hazelcast will pick the most specific one (first scan super-classes then interfaces etc). 

OK. This is nice to know. Wildcards could still be interesting if you have a lot of classes without a common interface or super-class. But this is just a nice-to-have feature, not something you cannot live without.
 
 
    - Another minor thing that I noticed is: Even if I specify my custom serializer for a certain class, but this class implements DataSerializable or Portable, then HZ would ignore my custom serializer from the config file and use the one described by the interface. I think the configuration file setting, if provided, should override those interfaces.

This is intentional. For DataSerializable and Portable, we want to skip serializer lookup phase immediately. And also making a class DataSerializable or Portable and also registering a different serializer for that type seems a bit ambiguous.

I see. It absolutely makes sense.
It was just inconvenient for writing the JUnit tests, as I took already-existing classes from the HZ tests (those classes implement Portable and/or DataSerializable) and wanted to serialize them using my own custom serializer. This didn't work, because that serializer was ignored based on the logic you just described ;-) But in real-world scenarios, this is not very likely to be a problem.

 
 

2) I picked Kryo as an alternative serialization framework, just to see how it would compare to HZ built-in serialization. The outcome on those tests: Kryo is 2 times faster than HZs DataSerializable serialization.

How did you compare these two? Implementing a Kryo type serializer for Hazelcast or .. ?

Yes. I implemented a Kryo type serializer for Hazelcast.
 
 

3) When it comes to the implementation of custom serializers, I implemented TypeSerializer as prescribed by HZ. The implementation is pretty easy and straight forward, just a few lines of code. But I think it is not as efficient as it could be due to the current limitations of TypeSerializer interface. Let me explain. 
  
   The "write" method that I override expects that I write a binary representation into a DataObjectOutput stream. 
  
   Since I use a totally different serialization framework, which does not support DataObjectOutput stream out of the box, I first need to serialize my objects using Kryo and produce a byte array with a binary representation (this is a first pass over the binary representation). Then I have to write it into DataObjectOutput using write(byte[]) (this is a second pass over the binary representation). Later, when the "write" method returns, HZ would perform DataObjectOutput.toBytes() or something like this, thus allocating a new byte array and copying the binary representation to it (this is a third pass). So, we copy our binary representation 2 times more than required...
   
  I think the reason for this is the fact that there is no way just to return a byte array from the write method. Therefore we have to write byte arrays to streams first and then do the same in the opposite direction. And this kills performance. Therefore, I'd suggest to extend the TypeSerializer interface with methods that return byte arrays (and may be ByteBuffers) or take them as parameters (for read methods). What do you think of it?


Yes, we are aware of this problem. Generally most of serialization framework are able to write/read to/from streams, some of are not.

You are right. Most serialization frameworks are able to write/read to/from streams. Kryo can do it as well.

The problem is: each serialization framework provides its own implementation of streams, and the APIs differ between frameworks. You cannot tell a serialization framework to use Hazelcast streams directly. As a result, you end up using framework-specific streams; once you have serialized your object into such a framework-specific stream, you do something like "stream.getBytes()" to get your byte[]. In principle, this is exactly what Hazelcast needs at the end. But since a custom type serializer is currently not allowed to return a byte[], you have to write it into a Hazelcast-provided stream, and later Hazelcast executes hzStream.getBytes() or something like this to get the binary representation of the key.


 
Do you suggest an additional serializer interface that returns byte[] and read from byte[] or adding these write/read methods to TypeSerializer? If latter, how we will know which method to call?

Good question. I don't have a strong opinion here. One could introduce a new interface similar to TypeSerializer, but using byte[] instead of streams.
Or one could add new methods to TypeSerializer. In the latter case, to indicate which method to use (i.e. stream vs. byte[]), one could introduce a dedicated attribute in the config file, e.g. "use-byte-arrays". If present and set to true, it would mean that the byte[] versions should be used; by default Hazelcast would use the current stream-based approach. Does that make sense?
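To illustrate the first option, here is a hedged sketch of what a byte[]-based serializer interface might look like. ByteArrayTypeSerializer and the "use-byte-arrays" attribute are hypothetical; neither exists in Hazelcast.

```java
import java.nio.charset.StandardCharsets;

// Sketch of a serializer interface working on byte[] instead of streams.
public class ByteArraySerializerSketch {

    interface ByteArrayTypeSerializer<T> {
        int getTypeId();
        byte[] write(T object); // hand back the framework's buffer directly, no stream copy
        T read(byte[] bytes);   // read straight from the wire bytes
    }

    // Trivial implementation for Strings, just to show the shape of the API.
    static class StringSerializer implements ByteArrayTypeSerializer<String> {
        public int getTypeId() { return 1; }
        public byte[] write(String object) { return object.getBytes(StandardCharsets.UTF_8); }
        public String read(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
    }

    public static void main(String[] args) {
        StringSerializer serializer = new StringSerializer();
        byte[] wire = serializer.write("round-trip");
        System.out.println(serializer.read(wire));
    }
}
```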
 
 

4) Another minor issue I noticed is: I think HZ assumes that serializers are always thread-safe, i.e. the same instance of a serializer can be used by multiple threads. While it is true for many serialization frameworks, it is not always the case. Kryo, for example, is not thread-safe. May be there are others. Of course, it is not a big problem to implement a workaround in a custom serializer class derived from a TypeSerializer. I just mentioned it here for the sake of completeness.


Yes, TypeSerializers must be thread-safe. I think, it should not be a problem to implement a type serializer thread safe (as you already mentioned). Otherwise api will become a bit complex; implement a TypeSerializerFactory, implement a TypeSerializer .. etc.


Agreed. As I explained, it was mentioned more for the sake of completeness.
 
-Leo



Barry Lagerweij

Mar 4, 2013, 2:11:01 PM3/4/13
to haze...@googlegroups.com
Can you confirm that with the HZ3 serialization framework a custom serializer can finally 'stream' the object to/from an InputStream/OutputStream? Currently, HZ2 still uses byte arrays and FastByteArray streams; I would love to send large objects over the wire...

Thanks, Barry





mongonix

Mar 5, 2013, 3:01:50 AM3/5/13
to haze...@googlegroups.com

On Monday, March 4, 2013 8:11:01 PM UTC+1, barryl wrote:
Can you confirm that with the HZ3 serialization framework a custom serializer can finally 'stream' the object to/from a InputStream/OutputStream ? Currently with HZ2 it still uses byte-arrays and FastByteArray streams, would love to send large objects over the wire...

Thanks, Barry


Yes, I can confirm it. The HZ3 custom serialization framework provides the following interface for your custom serializers:

public interface TypeSerializer<T> {

    int getTypeId();

    void write(ObjectDataOutput out, T object) throws IOException;

    T read(ObjectDataInput in) throws IOException;

    void destroy();

}

As you can see, you can use the ObjectDataOutput and ObjectDataInput streams, and these interfaces extend java.io.DataOutput and java.io.DataInput respectively.
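Because of that, a serializer can write fields one by one to the stream instead of building a whole-object byte[] first. Here is a minimal self-contained sketch against the plain JDK interfaces, which stand in for the Hazelcast ones (the class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Streaming sketch: DataOutput/DataInput stand in for Hazelcast's
// ObjectDataOutput/ObjectDataInput, which extend them.
public class StreamingSerializerDemo {

    static void write(DataOutput out, String name, int age) throws IOException {
        out.writeUTF(name); // each field goes straight to the stream
        out.writeInt(age);
    }

    static String read(DataInput in) throws IOException {
        String name = in.readUTF();
        int age = in.readInt();
        return name + ":" + age;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), "barry", 42);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(read(in));
    }
}
```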

I hope this answers your question.

-Leo