Concurrent Map problems

105 views
Skip to first unread message

Michael Freeman

unread,
Mar 23, 2011, 8:48:53 AM3/23/11
to haze...@googlegroups.com
One of my team members has managed to induce multiple problems while working with multimaps.
We're using Hazelcast to share dynamically created test data among a bank of Java-based automated regression tests.
We have 7 Hazelcast instances running as standalone Java apps.
The only thing I've changed from the default configuration is the group name and password.
Our test suites connect to the instances as native clients, coming and going as various things launch them.
We're running Hazelcast 1.9.2.2



His basic workflow is like this:    

HazelcastClient hc = HazelcastClient.newHazelcastClient("", "", "");
MultiMap<String, String> mm = hc.<String, String>getMultiMap("");
List<String> keys = new ArrayList<String>(mm.keySet());
Random r = new Random();
int randomKeyIndex = r.nextInt(keys.size());
String key = keys.get(randomKeyIndex);
if (mm.get(key) == null || mm.get(key).size() == 0) {
    mm.remove(key);
} else {
    boolean locked = mm.tryLock(key);
    List<String> values = new ArrayList<String>(mm.get(key));
    Random r1 = new Random();
    int randomValueIndex = r1.nextInt(values.size());
    String value = values.get(randomValueIndex);
    Map<String, Object> dataMap = hc.getMap("maps.test.data." + key + "." + value);
    if (locked) {
        mm.unlock(key);
        mm.remove(key);
    } else {
        mm.remove(key);
    }
    //do other stuff with dataMap now
}
In short we use multimaps as lookup tables to find specific maps that get fed in by other code.

As soon as I can produce a generic example that triggers the problems he's induced I will post it here.




Here's a log snippet of the sort of problems that just seem to suddenly appear after a couple hours of him running his tests.

It looks like we've stumbled across an infinite redo deadlock that hasn't been eliminated yet.

2011-03-23 07:31:00,125 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,125 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391793], firstEnqueue=32676sn., enqueueCount=65343, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65342', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,234 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,234 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391794], firstEnqueue=32554sn., enqueueCount=65100, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_BACKUP_PUT, redoCount='65099', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,234  INFO ConcurrentMapManager:? - [automation] ======= 391794: CONCURRENT_MAP_BACKUP_PUT ========
    thisAddress= Address[10.53.68.67:5702], target= Address[10.53.69.23:5702]
    targetMember= Member [10.53.69.23:5702], targetConn=null, targetBlock=Block [62] owner=Address[10.53.69.24:5702] migrationAddress=null
    Property Information -> Property Address 2 Re-doing [65100] times! c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329 : Suite 3250
2011-03-23 07:31:00,312 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,312 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391795], firstEnqueue=32676sn., enqueueCount=65345, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65344', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391796], firstEnqueue=32555sn., enqueueCount=65098, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_BACKUP_PUT, redoCount='65097', lockThreadId='119'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391797], firstEnqueue=32677sn., enqueueCount=65342, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65341', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,468 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,468 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391798], firstEnqueue=32555sn., enqueueCount=65101, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_BACKUP_PUT, redoCount='65100', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,625 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,625 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391799], firstEnqueue=32677sn., enqueueCount=65344, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65343', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,734 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,734 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391800], firstEnqueue=32555sn., enqueueCount=65101, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_BACKUP_PUT, redoCount='65100', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,812 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,812 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391801], firstEnqueue=32677sn., enqueueCount=65346, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65345', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391802], firstEnqueue=32555sn., enqueueCount=65099, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_BACKUP_PUT, redoCount='65098', lockThreadId='119'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391803], firstEnqueue=32677sn., enqueueCount=65343, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65342', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:00,953 DEBUG ClusterManager:64 - [automation] not sending heartbeat because connection is null or not live Address[10.53.69.23:5702]
2011-03-23 07:31:00,968 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:00,968 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391804], firstEnqueue=32555sn., enqueueCount=65102, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_BACKUP_PUT, redoCount='65101', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:01,125 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:01,125 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391805], firstEnqueue=32677sn., enqueueCount=65345, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65344', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:01,234 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:01,234 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391806], firstEnqueue=32555sn., enqueueCount=65102, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_BACKUP_PUT, redoCount='65101', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:01,312 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:01,312 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391807], firstEnqueue=32677sn., enqueueCount=65347, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65346', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391808], firstEnqueue=32556sn., enqueueCount=65100, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_BACKUP_PUT, redoCount='65099', lockThreadId='119'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391809], firstEnqueue=32678sn., enqueueCount=65344, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65343', lockThreadId='-1'}, target=Address[10.53.69.23:5702]} Packet cannot be sent to Address[10.53.69.23:5702]
2011-03-23 07:31:01,390  INFO ConcurrentMapManager:? - [automation] ======= 391808: CONCURRENT_MAP_BACKUP_PUT ========
    thisAddress= Address[10.53.68.67:5702], target= Address[10.53.69.23:5702]
    targetMember= Member [10.53.69.23:5702], targetConn=null, targetBlock=Block [62] owner=Address[10.53.69.24:5702] migrationAddress=null
    Property Information -> Property Address 2 Re-doing [65100] times! c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083 : Suite 3430

And last night calling size() on a MultiMap somewhere managed to induce a stack overflow exception

2011-03-22 22:05:02,381  WARN ClientService:? - [automation] exception during handling CONCURRENT_MAP_SIZE: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.hazelcast.impl.FactoryImpl$MProxyImpl$DynamicInvoker.invoke(FactoryImpl.java:1909)
    at $Proxy0.size(Unknown Source)
    at com.hazelcast.impl.FactoryImpl$MProxyImpl.size(FactoryImpl.java:2101)
    at com.hazelcast.impl.FactoryImpl$MultiMapProxyImpl$MultiMapReal.size(FactoryImpl.java:1799)
    at com.hazelcast.impl.FactoryImpl$MultiMapProxyImpl.size(FactoryImpl.java:1688)
    at com.hazelcast.impl.ClientService$MapSizeHandler.doMultiMapOp(ClientService.java:741)
    at com.hazelcast.impl.ClientService$ClientCollectionOperationHandler.processCall(ClientService.java:1139)
    at com.hazelcast.impl.ClientService$ClientOperationHandler.handle(ClientService.java:1062)
    at com.hazelcast.impl.ClientRequestHandler.doRun(ClientRequestHandler.java:41)
    at com.hazelcast.impl.FallThroughRunnable.run(FallThroughRunnable.java:23)
    at com.hazelcast.impl.ClientService$Worker.run(ClientService.java:161)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.hazelcast.impl.FactoryImpl$MProxyImpl$DynamicInvoker.invoke(FactoryImpl.java:1902)
    ... 11 more
Caused by: java.lang.StackOverflowError
    at java.text.DateFormat.format(Unknown Source)
    at org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
    at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
    at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at com.hazelcast.logging.Log4jFactory$Log4jLogger.log(Log4jFactory.java:121)
    at com.hazelcast.logging.LoggingServiceImpl$DefaultLogger.log(LoggingServiceImpl.java:121)
    at com.hazelcast.logging.LoggingServiceImpl$DefaultLogger.log(LoggingServiceImpl.java:108)
    at com.hazelcast.impl.BaseManager$MultiCall.logRedo(BaseManager.java:829)
    at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:881)
    at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:884)
(at this point "at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:884)" repeats a couple hundred more times so I cut it out)

Michael Freeman

unread,
Mar 23, 2011, 10:39:41 AM3/23/11
to Hazelcast
Another discovery in my logfiles from last night. a size() call blew
up with "java.lang.RuntimeException cannot be cast to
java.lang.Integer" in our client programs and below is what I found on
the instance node's logs
This was an ordinary getMultiMap("").size() call melting down.

2011-03-22 19:25:08,242 WARN ClientService:? - [automation] exception
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.hazelcast.impl.FactoryImpl$MProxyImpl
$DynamicInvoker.invoke(FactoryImpl.java:1902)
... 11 more
Caused by: java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(Unknown
Source)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(Unknown
Source)
at java.util.concurrent.locks.ReentrantLock.unlock(Unknown Source)
at com.hazelcast.util.ResponseQueueFactory
$LockBasedResponseQueue.poll(ResponseQueueFactory.java:68)
at com.hazelcast.impl.BaseManager
$ResponseQueueCall.waitAndGetResult(BaseManager.java:470)
at com.hazelcast.impl.BaseManager$SubCall.getResult(BaseManager.java:
946)
at com.hazelcast.impl.BaseManager
$RequestBasedCall.getResultAsObject(BaseManager.java:373)
at com.hazelcast.impl.BaseManager
$ResponseQueueCall.getResultAsObject(BaseManager.java:445)
at com.hazelcast.impl.BaseManager
$RequestBasedCall.getResultAsObject(BaseManager.java:369)
at com.hazelcast.impl.BaseManager
$ResponseQueueCall.getResultAsObject(BaseManager.java:445)
at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:
879)
at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:
884)
(" at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:
884)" repeats another 1000 times or so in the stack trace)

Fuad Malikov

unread,
Apr 11, 2011, 9:08:11 AM4/11/11
to haze...@googlegroups.com, Michael Freeman
The redo is the problem. The stackoverflowexception is thrown becouse of two many recursive redo method is called. I am looking for the code to reproduce it.

Fuad

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.



--

@fuadm
Reply all
Reply to author
Forward
0 new messages