WrongTargetException: WrongTarget! target=null version 3.5.1


Ortal

Nov 5, 2015, 10:33:02 AM
to Hazelcast

Hello,

I have integrated Hazelcast as the cache service in my application. My environment consists of 2 nodes. While working on the application and restarting one node at a time, I get an exception on node startup. It seems to try to execute a ContainsKey operation on a partition owned by the other node (which was down). The target is already set to null, so the retried invocation keeps failing, and eventually my servlet init fails:

com.hazelcast.spi.exception.WrongTargetException: WrongTarget! this:Address[15.224.237.141]:5701, target:null, partitionId: 213, replicaIndex: 0, operation: com.hazelcast.map.impl.operation.ContainsKeyOperation, service: hz:impl:mapService
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.initInvocationTarget(Invocation.java:288)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:222)
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.run(Invocation.java:262)
    at com.hazelcast.spi.impl.operationservice.impl.PartitionInvocation.run(PartitionInvocation.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
    at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)


Can you tell me whether this is a bug, or whether there is a way to handle or recover from it?

Jaromir Hamala

Nov 5, 2015, 11:11:57 AM
to Hazelcast
Hi,

It seems like a bug to me. At this point you have just a single member alive, so it should own all partitions. Could you post a bit more context here? A full log from both members would be great.

Cheers,
Jaromir

Peter Litvak

Nov 5, 2015, 11:25:52 AM
to haze...@googlegroups.com
We had the same/similar issue with rolling restarts of the EC2 instances.

-- 
peter....@gmail.com
--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at http://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/d0f77ba5-0972-44d4-9af8-9914f4ea4907%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ortal

Nov 8, 2015, 4:39:50 AM
to Hazelcast

Hi,

Thanks for your responses.

I traced through the stack trace. The flow is as follows:

 

1)

public final class PartitionInvocation extends Invocation {

    public PartitionInvocation(NodeEngineImpl nodeEngine, String serviceName, Operation op, int partitionId,
                               int replicaIndex, int tryCount, long tryPauseMillis, long callTimeout,
                               Object callback, boolean resultDeserialized) {
        // ... (constructor body omitted)
    }

    @Override
    public Address getTarget() {
        return getPartition().getReplicaAddress(replicaIndex);
    }
}

abstract class Invocation implements ResponseHandler, Runnable {

    boolean initInvocationTarget() {
        Address thisAddress = nodeEngine.getThisAddress();

        invTarget = getTarget(); // null is returned
        // ... (rest of method omitted)
    }
}

class InternalPartitionImpl implements InternalPartition {

    @Override
    public Address getReplicaAddress(int replicaIndex) {
        return addresses[replicaIndex]; // => null, as the owner member left the cluster
    }
}
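The lookup chain above can be modelled in a few lines of plain Java. This is only a sketch to illustrate the flow, not Hazelcast code: the class `ReplicaLookupSketch`, its nested `Partition` type, the `memberLeft` helper and the member names are all hypothetical, and the way the slot is cleared when a member leaves is an assumption based on the observation that the owner address is null.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-ins for the internals quoted in this thread.
final class ReplicaLookupSketch {

    /** One address slot per replica index, similar in shape to InternalPartitionImpl. */
    static final class Partition {
        private final String[] addresses;

        Partition(String... addresses) {
            this.addresses = Arrays.copyOf(addresses, addresses.length);
        }

        String getReplicaAddress(int replicaIndex) {
            return addresses[replicaIndex];
        }

        /** Assumption: a departed member's slots stay null until a new owner is assigned. */
        void memberLeft(String address) {
            for (int i = 0; i < addresses.length; i++) {
                if (address.equals(addresses[i])) {
                    addresses[i] = null;
                }
            }
        }
    }

    /** Returns true if a target could be resolved, false if the caller has to retry later. */
    static boolean initInvocationTarget(Partition partition, int replicaIndex) {
        String target = partition.getReplicaAddress(replicaIndex);
        return target != null;
    }

    public static void main(String[] args) {
        Partition partition = new Partition("member-A", "member-B");
        System.out.println(initInvocationTarget(partition, 0)); // prints true

        partition.memberLeft("member-A");
        System.out.println(initInvocationTarget(partition, 0)); // prints false
    }
}
```

In this model every invocation for replica index 0 fails until something writes a new owner into the slot, which matches the repeated failures seen on startup.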

 

 

I have a few questions regarding the flow I described:

1. If the target is null, what is the point of retrying at all?

2. Perhaps containsKey should simply return false in the first place in such a case?

3. Doesn't this seem like a bug?

4. How do you suggest we handle this case, given that the server may currently fail to start?

 

Thanks 



Peter Veentjer

Nov 8, 2015, 8:11:08 AM
to haze...@googlegroups.com
It can be that the owner of that partition is not yet known or is invalid, e.g. pointing to an ex-member.

So in itself this isn't a problem. When this exception is thrown, it is intercepted and the invocation is retried (with some delay), because at some point the target will become available.

But that is exactly the problem: apparently the target is not set in time, so eventually the invocation gives up retrying and the exception is propagated to the end user.

So the root problem is: why is this owner not set in time?

I already made an issue to provide a bit more informative exception:
https://github.com/hazelcast/hazelcast/issues/6668

But this doesn't solve the root cause of this problem. It smells like a bug, and I don't know how to work around this particular issue. It is unclear to me why the owner of this partition is not set within the given time window. That window is pretty big: the default retry count is 250 and there are 500 ms between invocations, so that is 125 seconds, i.e. more than 2 minutes.
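To make the retry arithmetic concrete, here is a minimal sketch of a bounded retry loop in plain Java. It is not the Hazelcast implementation: `RetryBudgetSketch`, `retryWindowMillis` and `invokeWithRetry` are hypothetical names, and the only values taken from this thread are the retry count of 250 and the 500 ms pause.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of a bounded retry loop; not Hazelcast's own code.
final class RetryBudgetSketch {

    /** Approximate time before giving up: tryCount attempts with a fixed pause each. */
    static long retryWindowMillis(int tryCount, long tryPauseMillis) {
        return tryCount * tryPauseMillis;
    }

    /** Calls the task until it succeeds or tryCount attempts are used, then rethrows the last failure. */
    static <T> T invokeWithRetry(Callable<T> task, int tryCount, long tryPauseMillis) throws Exception {
        if (tryCount < 1) {
            throw new IllegalArgumentException("tryCount must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= tryCount; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure, e.g. "target not set yet"
                if (attempt < tryCount) {
                    Thread.sleep(tryPauseMillis); // wait before the next attempt
                }
            }
        }
        throw last; // retries exhausted: the caller finally sees the exception
    }

    public static void main(String[] args) throws Exception {
        System.out.println(retryWindowMillis(250, 500L)); // prints 125000

        int[] calls = {0};
        String result = invokeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("target not set yet");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints ok after 3 attempts
    }
}
```

The last line of `invokeWithRetry` is the step that matters here: once the budget is exhausted, the failure that was silently retried is propagated to the caller.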
 
