ConcurrentModficationException within spy.memcached.MemcachedConnection.handleIO method when recovering from failover

145 views
Skip to first unread message

Owen Tran

unread,
Jul 2, 2012, 1:29:39 PM7/2/12
to spymem...@googlegroups.com
Hi,
We're running spymemcached and couchbase through some basic failover scenarios.

1. Run two nodes in a cluster
2. Stop node #2
3. Check to make sure spymemached still works
4. Start node #2
5. Rebalance
6. See that spymemcached fails due to Concurrent modification

The thread is failing due to a ConcurrentModificationException when node is brought back up...

2012-07-02 09:53:51.085 INFO com.couchbase.client.CouchbaseConnection:  Added {QA sa=/10.100.1.64:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to cache02.corp.ezrez/10.100.1.64:11210 cache01.corp.ezrez/10.100.1.63:11210}" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:248)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:229)

We're using spymemcached 2.8.1 with couchbase server 1.8

Here's the code in question where selector.keys() is looped over.

    // see if any connections blew up with large number of timeouts
    for (SelectionKey sk : selector.keys()) {
      MemcachedNode mn = (MemcachedNode) sk.attachment();
      if (mn.getContinuousTimeout() > timeoutExceptionThreshold) {
        getLogger().warn("%s exceeded continuous timeout threshold", sk);
        lostConnection(mn);
      }
    }

Any help much appreciated. 

Thanks,
Owen


# SEE LOGS BELOW
# NODE 2 IS DOWN

2012-07-02 09:52:44.227 WARN com.couchbase.client.CouchbaseConnection:  Closing, and reopening {QA sa=cache02.corp.ezrez/10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}, attempt 14.
2012-07-02 09:53:14.231 INFO com.couchbase.client.CouchbaseConnection:  Reconnecting {QA sa=cache02.corp.ezrez/10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}
2012-07-02 09:53:14.231 INFO com.couchbase.client.CouchbaseConnection:  Connection state changed for sun.nio.ch.SelectionKeyImpl@fb95e78
2012-07-02 09:53:14.232 INFO com.couchbase.client.CouchbaseConnection:  Reconnecting due to failure to connect to {QA sa=cache02.corp.ezrez/10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:369)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:242)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:229)
2012-07-02 09:53:14.232 WARN com.couchbase.client.CouchbaseConnection:  Closing, and reopening {QA sa=cache02.corp.ezrez/10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}, attempt 15.

# NODE 2 IS BROUGHT BACK UP

2012-07-02 09:53:44.258 INFO com.couchbase.client.CouchbaseConnection:  Reconnecting {QA sa=cache02.corp.ezrez/10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}
2012-07-02 09:53:44.259 INFO com.couchbase.client.CouchbaseConnection:  Connection state changed for sun.nio.ch.SelectionKeyImpl@474a9bee
2012-07-02 09:53:51.085 INFO com.couchbase.client.CouchbaseConnection:  Added {QA sa=/10.100.1.64:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to cache02.corp.ezrez/10.100.1.64:11210 cache01.corp.ezrez/10.100.1.63:11210}" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:248)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:229)

// LATER, CONTINUOUS ERRORS

2012-07-02 09:58:32.794 WARN com.couchbase.client.CouchbaseConnection:  Node exepcted to receive data is inactive.  This could be due to afailure within the cluster.  Will check for updated configuration.  Key without a configured node is: null.SESSION_DESCRIPTOR.
Jul 2, 2012 9:58:32 AM com.couchbase.client.CouchbaseConnectionFactory checkConfigUpdate
WARNING: No reconnect required, though check requested. Current config check is 1 out of a threshold of 10.

ingenthr

unread,
Jul 30, 2012, 11:56:36 PM7/30/12
to spymemcached
Your colleague just posted on this too. See my reply on his.
Reply all
Reply to author
Forward
0 new messages