Yes, we know about that, and there is already an issue for it:
http://code.google.com/p/hazelcast/issues/detail?id=153
Until that is done, there is another way to implement custom
serialization (similar to Externalizable).
Here is how you do it:
http://www.hazelcast.com/documentation.jsp#InternalsSerialization
Hope this helps.
--
@fuadm
Fuad
We discussed this internally; here are a couple of points that we must
consider in this situation. What I suggested to you on the
other thread was better than your approach, but there is an even better
way.
First I will answer your question about threads. For queues and
topics we must guarantee the order of notifications. So it is
guaranteed that a queue is FIFO and that the messages on a topic are sent
to the listeners in the appropriate order. For the sake of ordering there
is only one thread per queue and topic that notifies the listeners. And
since the event-receiving methods pass the actual object,
those objects are deserialized by that thread, and this is the actual
bottleneck. Instead of the actual object we should have passed a
MessageObject kind of thing, where the actual object is serialized or
deserialized only when you call messageObject.getObject(). This way
you could receive the events in order and process them in as
many threads as you want, and all deserialization would be done on
your threads. We plan to change this in future Hazelcast versions.
This will solve two issues:
1. Deserialization happening on one thread.
2. The classloading problem: your classloader can be different from the one
Hazelcast uses.
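To make the idea concrete, here is a minimal sketch of such a wrapper. This is not Hazelcast API; the `MessageObject` class and its deserializer function are hypothetical, and a byte[] plus a plain function stand in for the real serialization machinery. The point is that the payload stays serialized until getObject() is called, so deserialization happens on the caller's thread, and the result is memoized so repeated calls don't deserialize twice:

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch of the proposed MessageObject: the payload stays
// serialized until getObject() is called, so deserialization runs on the
// caller's thread rather than on Hazelcast's single notifier thread.
class MessageObject<T> {
    private final byte[] payload;                  // serialized form as received
    private final Function<byte[], T> deserializer;
    private volatile T cached;                     // memoized result

    MessageObject(byte[] payload, Function<byte[], T> deserializer) {
        this.payload = payload;
        this.deserializer = deserializer;
    }

    T getObject() {
        T result = cached;
        if (result == null) {
            synchronized (this) {
                if (cached == null) {
                    // Deserialize lazily, exactly once, on the calling thread.
                    cached = deserializer.apply(payload);
                }
                result = cached;
            }
        }
        return result;
    }
}
```

A listener could then hand the MessageObject to a worker pool and let the worker call getObject(), keeping the notifier thread free.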
So for now:
We know that there is only one thread that notifies. I am refining my
suggestion to use the queue approach, where you can control the
number of threads. I suggested using one thread that, in a while loop,
calls q.take(). This is essential if you care about ordering. If you don't,
you can have multiple threads waiting on q.take() that either process
the item themselves or hand it off to another thread pool (executor
service) that processes it. This way:
1. You can take from the queue concurrently.
2. The deserialization is done on a user thread, so you can control it.
We expect this to perform much better.
Hope I didn't confuse you.
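A rough sketch of that consumer pattern, assuming a plain java.util.concurrent BlockingQueue as a stand-in for the Hazelcast distributed queue (which implements the same interface): one thread blocks on take() and fans items out to an executor service, so any deserialization and processing happen on user threads, at the cost of processing order. The `drain` helper and its names are made up for this example:

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// One taker thread preserves the order of dequeueing; the pool processes
// items concurrently, so ordering of *processing* is given up.
class QueueConsumer {
    static Set<String> drain(BlockingQueue<String> q, int expected)
            throws InterruptedException {
        Set<String> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4); // user threads
        CountDownLatch done = new CountDownLatch(expected);
        Thread taker = new Thread(() -> {
            try {
                for (int i = 0; i < expected; i++) {
                    String item = q.take();      // single taker: ordered dequeue
                    pool.submit(() -> {          // processing fans out to the pool
                        processed.add(item);     // (deserialization would go here)
                        done.countDown();
                    });
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        taker.start();
        done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return processed;
    }
}
```

If ordering matters, drop the pool and do the work directly in the taker thread.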
Fuad
Here is a more detailed answer.
For queues and topics we need to guarantee the order of messages.
Therefore there is just one thread that notifies the listener, and
since we pass the object itself, that thread has to deserialize
it, which, as you said, can be very costly. So the actual bottleneck is
this one thread that notifies.
On Wed, Nov 3, 2010 at 10:14 AM, mongonix <romi...@gmail.com> wrote:
> Hi Fuad,
>
>> We discussed this internally; here are a couple of points that we must
>> consider in this situation. What I suggested to you on the
>> other thread was better than your approach, but there is an even better
>> way.
>
> It's getting interesting.
>
>> First I will answer your question about threads. For queues and
>> topics we must guarantee the order of notifications. So it is
>> guaranteed that a queue is FIFO and that the messages on a topic are sent
>> to the listeners in the appropriate order. For the sake of ordering there
>> is only one thread per queue and topic that notifies the listeners.
>
> BTW, what if I need a guaranteed order of notifications only for
> messages from the same source,
> with no ordering restrictions for messages from different sources
> sent to the same destination?
> How would you model that? Multiple queues or topics, one per source?
> Are there any "cheaper" alternatives?
Yes, it seems that is the way you should do it.
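One way to picture "one queue per source": ordering is preserved per source because each source has its own FIFO queue, while different sources are consumed independently. With Hazelcast you would presumably name the distributed queue after the source (e.g. getQueue("events-" + sourceId)); the local map of BlockingQueues below is only a stand-in for that idea, and all names here are invented:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Each source gets its own FIFO queue, so per-source ordering holds while
// different sources can be produced and consumed independently.
class PerSourceQueues {
    private final Map<String, BlockingQueue<String>> queues =
            new ConcurrentHashMap<>();

    BlockingQueue<String> forSource(String sourceId) {
        return queues.computeIfAbsent(sourceId,
                id -> new LinkedBlockingQueue<>());
    }
}
```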
>
>> And since the event-receiving methods pass the actual object,
>> those objects are deserialized by that thread, and this is the actual
>> bottleneck.
>
> Thanks! This analysis makes our current situation very clear.
>
>> Instead of the actual object we should have passed a
>> MessageObject kind of thing, where the actual object is serialized or
>> deserialized only when you call messageObject.getObject(). This way
>> you could receive the events in order and process them in as
>> many threads as you want, and all deserialization would be done on
>> your threads. We plan to change this in future Hazelcast versions.
>> This will solve two issues:
>> 1. Deserialization happening on one thread.
>> 2. The classloading problem: your classloader can be different from the one
>> Hazelcast uses.
>
> Very cool! Sounds like a good plan. I'm looking forward to seeing it in
> trunk.
Can you please create an issue for this and add a link to this
conversation? You'll be notified when there is action on the
issue.
>
> Few comments on this feature:
> a) messageObject.getObject() should probably remember its result once
> it is called, to avoid deserializing the same object
> multiple times.
> b) If there are multiple listeners inside the same JVM/classloader,
> this approach would result in every listener doing its own
> deserialization of the same object. Maybe this can be optimized
> somehow.
>
Agreed; can you also add these to the issue?
>> So for now:
>> We know that there is only one thread that notifies. I am refining my
>> suggestion to use the queue approach, where you can control the
>> number of threads. I suggested using one thread that, in a while loop,
>> calls q.take(). This is essential if you care about ordering. If you don't,
>> you can have multiple threads waiting on q.take() that either process
>> the item themselves or hand it off to another thread pool (executor
>> service) that processes it. This way:
>> 1. You can take from the queue concurrently.
>
>> 2. The deserialization is done on a user thread, so you can control it.
> I guess you assume that I use the byte[] approach and do deserialization
> in my threads, right?
No, since this approach does not use listeners. The bottleneck
occurs when you use listeners: then Hazelcast notifies you, and it uses
one thread per queue or topic to preserve the order.
If you don't use listeners and instead have your own threads that take from
the queue, then deserialization is done on your threads, and you
can increase the number of threads that concurrently wait on take() and
consume the queue. With this approach you get all the benefits but
give up the ordering.
And if you put and get byte[] to and from Hazelcast, then all the problems
I described above disappear. Hazelcast will never try to
serialize/deserialize your objects, and having one thread notifying you
won't be a big problem.
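The byte[] approach can be sketched roughly like this: the producer serializes, the consumer deserializes, and the queue in between only ever sees raw bytes, so the middleware has nothing to deserialize. A local BlockingQueue stands in for the Hazelcast distributed queue here, and String/UTF-8 stands in for whatever serialization you actually use:

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Producer and consumer own the (de)serialization; the queue carries bytes.
class RawBytesQueue {
    static byte[] serialize(String s) {          // producer-side, your own format
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String deserialize(byte[] b) {        // consumer-side, on your thread
        return new String(b, StandardCharsets.UTF_8);
    }

    static String roundTrip(BlockingQueue<byte[]> q, String msg)
            throws InterruptedException {
        q.put(serialize(msg));       // only bytes ever enter the queue
        return deserialize(q.take()); // deserialize locally after take()
    }
}
```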
>
> I have a few comments on the queue-based approach as such:
> a) q.take() is a blocking operation, as you say. What if I need to
> wait for inputs from thousands of queues? I guess for such scenarios
> (which are not as seldom as one may think), the q.take() approach
> does not scale. There are two alternatives:
>    - use asynchronous notifications (as is the case with listeners
> now);
>    - introduce UNIX select-like functionality, i.e. a blocking wait
> for input from any one of the provided sources.
> This can be either an API call, or it can eventually be modeled as a
> "virtual" queue or topic that in fact collects from multiple sources
> (I think it would be rather elegant to implement it as a virtual queue
> or topic).
You can use queue.poll() with a timeout instead of take(): it will wait
until that timeout and, if nothing arrives, return null. But I agree that
this is still not a very good way.
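For reference, that is the standard java.util.concurrent BlockingQueue API (which Hazelcast's IQueue also implements): poll(timeout, unit) returns null on timeout, so a single thread can rotate over several queues instead of parking forever on one take(). The helper name below is made up:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// poll(timeout, unit) blocks for at most the timeout and returns null if
// nothing arrived, unlike take(), which blocks indefinitely.
class TimedPoll {
    static String pollOnce(BlockingQueue<String> q) throws InterruptedException {
        return q.poll(50, TimeUnit.MILLISECONDS); // null on timeout
    }
}
```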
>
> I think it is very important and useful to support this kind of
> functionality.
>
> b) What about control over serialization? Which thread does it,
> the user's thread or a Hazelcast thread? When does it happen? Should it
> be symmetric to the approach for deserialization?
I think I have answered this question above. If it is not clear, let me know.
>
> c) Imagine a situation where Hazelcast is used for both intra-JVM
> and distributed multi-JVM setups, where the concrete topology is decided
> at run-time. It may very well happen that all producers and consumers of
> a given distributed data structure (queue, topic, map) are in the same
> JVM or even in the same classloader. It would be nice in this case
> if Hazelcast could optimize for this situation and eventually avoid
> the (de)serialization altogether, unless it is really required to
> support remote listeners. It would be similar to the local vs.
> remote EJB invocation optimization, where in the local case objects are
> passed by reference instead of being copied. Of course, passing by
> reference would mean that listeners must not modify the message
> object (i.e. it is immutable), but this is acceptable in many cases
> and could perhaps even be indicated at the API level, e.g. by doing
> something like myQueue.setProperty("immutable.messages", true);
Serialization happens when you put into a queue (or map, or topic), and
Hazelcast never stores your objects directly, so it must occur. Imagine
what happens if you add another remote member and the data starts to
migrate, or a backup is taken on that remote member. So serialization must
happen whenever you put.
For maps, though, we also keep a reference to the deserialized object for
fast local access. For maps this makes sense, since you may get the same
object more than once, but it doesn't make sense for queues and
topics: they are one-time objects. I agree that there can be
optimizations, but there are certain rules we don't want to change, for the
sake of simplicity and portability.
>
> d) Minor proposal: It would be nice if listeners could also get the
> name of the shared data structure that triggered them. This would allow
> using the same listener to listen for (thousands of) multiple
> inputs. This is a pure convenience and can be modeled even now, but
> only with wrapper objects.
I agree; when we implement the messageObject stuff, this can be added
as something like messageObject.source.
>
> e) Minor proposal: It would be nice to have notifications about
> changes to a given instance of a data structure. Such a notification
> should not contain the object itself; it should just say which
> data structure changed (e.g. by name) and what kind of change it was
> (e.g. add, remove, update, destroy). Such a notification is therefore
> rather cheap, and the listener can then decide what to do next; e.g. it
> may want to perform an expensive operation like obtaining the changed
> object. The advantage of this approach is that it avoids expensive
> sending/serialization/deserialization of complex objects and lets user
> code decide what to do when changes happen. It also suits the
> asynchronous (IO) model much better, as it avoids any blocking calls.
For maps, the notification passes an event object, and when adding a
listener you can specify whether you want the value to be included in the
event object or not. Again, these optimizations can be done with
messageObject.
>
> Please let me know what you think about these comments and proposals. Do
> they make sense?
>
>> We expect this to perform much better.
>
>> Hope I didn't confuse you.
>
> Not at all, your explanation is very clear. Thanks a lot!
>
> -Roman
Best Regards,
Fuad