Hi,
We're running spymemcached and couchbase through some basic failover scenarios.
1. Run two nodes in a cluster
2. Stop node #2
3. Check to make sure spymemached still works
4. Start node #2
5. Rebalance
6. See that spymemcached fails due to Concurrent modification
The thread is failing due to a ConcurrentModificationException when node is brought back up...
2012-07-02 09:53:51.085 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/
10.100.1.64:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to cache02.corp.ezrez/
10.100.1.64:11210 cache01.corp.ezrez/
10.100.1.63:11210}" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:248)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:229)
We're using spymemcached 2.8.1 with couchbase server 1.8
Here's the code in question where selector.keys() is looped over.
// see if any connections blew up with large number of timeouts
for (SelectionKey sk : selector.keys()) {
MemcachedNode mn = (MemcachedNode) sk.attachment();
if (mn.getContinuousTimeout() > timeoutExceptionThreshold) {
getLogger().warn("%s exceeded continuous timeout threshold", sk);
lostConnection(mn);
}
}
Any help much appreciated.
Thanks,
Owen
# SEE LOGS BELOW
# NODE 2 IS DOWN
2012-07-02 09:52:44.227 WARN com.couchbase.client.CouchbaseConnection: Closing, and reopening {QA sa=cache02.corp.ezrez/
10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}, attempt 14.
2012-07-02 09:53:14.231 INFO com.couchbase.client.CouchbaseConnection: Reconnecting {QA sa=cache02.corp.ezrez/
10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}
2012-07-02 09:53:14.231 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@fb95e78
2012-07-02 09:53:14.232 INFO com.couchbase.client.CouchbaseConnection: Reconnecting due to failure to connect to {QA sa=cache02.corp.ezrez/
10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:369)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:242)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:229)
2012-07-02 09:53:14.232 WARN com.couchbase.client.CouchbaseConnection: Closing, and reopening {QA sa=cache02.corp.ezrez/
10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}, attempt 15.
# NODE 2 IS BROUGHT BACK UP
2012-07-02 09:53:44.258 INFO com.couchbase.client.CouchbaseConnection: Reconnecting {QA sa=cache02.corp.ezrez/
10.100.1.64:11210, #Rops=0, #Wops=11, #iq=0, topRop=null, topWop=Cmd: 0 Opaque: 763 Key: 92fe0769.d0d2.4a35.88e0.ca069cd154ae.WEBTIER_EXCEPTION, toWrite=0, interested=0}
2012-07-02 09:53:44.259 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@474a9bee
2012-07-02 09:53:51.085 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/
10.100.1.64:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to cache02.corp.ezrez/
10.100.1.64:11210 cache01.corp.ezrez/
10.100.1.63:11210}" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:248)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:229)
// LATER, CONTINUOUS ERRORS
2012-07-02 09:58:32.794 WARN com.couchbase.client.CouchbaseConnection: Node exepcted to receive data is inactive. This could be due to afailure within the cluster. Will check for updated configuration. Key without a configured node is: null.SESSION_DESCRIPTOR.
Jul 2, 2012 9:58:32 AM com.couchbase.client.CouchbaseConnectionFactory checkConfigUpdate
WARNING: No reconnect required, though check requested. Current config check is 1 out of a threshold of 10.