One of my team members has managed to induce multiple problems while working with multimaps.
We're using Hazelcast to share dynamically created test data among a bank of Java-based automated regression tests.
We have 7 Hazelcast instances running as standalone Java apps.
The only thing I've changed from the default configuration is the group name and password.
Our test suites connect to the instances as native clients, coming and going as various things launch them.
We're running Hazelcast 1.9.2.2
His basic workflow is like this:
HazelcastClient hc = HazelcastClient.newHazelcastClient("", "", "");MultiMap<String, String> mm = hc.<String, String>getMultiMap("");List<String> keys = new ArrayList<String>(mm.keySet());Random r = new Random();int randomKeyIndex = r.nextInt(keys.size());String key = keys.get(randomKeyIndex);if (mm.get(key) == null || mm.get(key).size() == 0) { mm.remove(key);} else { boolean locked = mm.tryLock(key); List<String> values = new ArrayList<String>(mm.get(key)); Random r1 = new Random(); int randomValueIndex = r1.nextInt(values.size()); String value = values.get(randomValueIndex); Map<String, Object> dataMap = hc.getMap("maps.test.data." + key + "." + value); if (locked) { mm.unlock(key); mm.remove(key); } else { mm.remove(key); } //do other stuff with dataMap now}In short we use multimaps as lookup tables to find specific maps that get fed in by other code.
As soon as I can produce a generic example that triggers the problems he's induced I will post it here.
Here's a log snippet of the sort of problems that just seem to suddenly appear after a couple hours of him running his tests.It looks like we've stumbled across an infinite redo deadlock that hasn't been eliminated yet.2011-03-23 07:31:00,125 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,125 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391793], firstEnqueue=32676sn., enqueueCount=65343, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65342', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,234 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,234 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391794], firstEnqueue=32554sn., enqueueCount=65100, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_BACKUP_PUT, redoCount='65099', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,234 INFO ConcurrentMapManager:? - [automation] ======= 391794: CONCURRENT_MAP_BACKUP_PUT ========
thisAddress= Address[
10.53.68.67:5702], target= Address[
10.53.69.23:5702]
targetMember= Member [
10.53.69.23:5702], targetConn=null, targetBlock=Block [62] owner=Address[
10.53.69.24:5702] migrationAddress=null
Property Information -> Property Address 2 Re-doing [65100] times! c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329 : Suite 3250
2011-03-23 07:31:00,312 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,312 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391795], firstEnqueue=32676sn., enqueueCount=65345, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65344', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391796], firstEnqueue=32555sn., enqueueCount=65098, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_BACKUP_PUT, redoCount='65097', lockThreadId='119'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,390 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391797], firstEnqueue=32677sn., enqueueCount=65342, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65341', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,468 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,468 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391798], firstEnqueue=32555sn., enqueueCount=65101, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_BACKUP_PUT, redoCount='65100', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,625 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,625 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391799], firstEnqueue=32677sn., enqueueCount=65344, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65343', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,734 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,734 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391800], firstEnqueue=32555sn., enqueueCount=65101, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_BACKUP_PUT, redoCount='65100', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,812 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,812 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391801], firstEnqueue=32677sn., enqueueCount=65346, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65345', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391802], firstEnqueue=32555sn., enqueueCount=65099, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_BACKUP_PUT, redoCount='65098', lockThreadId='119'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,890 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391803], firstEnqueue=32677sn., enqueueCount=65343, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65342', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:00,953 DEBUG ClusterManager:64 - [automation] not sending heartbeat because connection is null or not live Address[
10.53.69.23:5702]
2011-03-23 07:31:00,968 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:00,968 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391804], firstEnqueue=32555sn., enqueueCount=65102, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_BACKUP_PUT, redoCount='65101', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:01,125 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:01,125 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391805], firstEnqueue=32677sn., enqueueCount=65345, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65344', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:01,234 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:01,234 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391806], firstEnqueue=32555sn., enqueueCount=65102, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_BACKUP_PUT, redoCount='65101', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:01,312 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:01,312 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391807], firstEnqueue=32677sn., enqueueCount=65347, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300844071128.287600329',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65346', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_BACKUP_PUT BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:546 - [automation] CONCURRENT_MAP_MIGRATE_RECORD BeforeRedo target Address[
10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:? - [automation] MBackup{[391808], firstEnqueue=32556sn., enqueueCount=65100, Request{name='c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083',CONCURRENT_MAP_BACKUP_PUT, redoCount='65099', lockThreadId='119'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:01,390 DEBUG ConcurrentMapManager:? - [automation] MMigrate{[391809], firstEnqueue=32678sn., enqueueCount=65344, Request{name='c:manualreferral.Desktop Test Client CI4.Foreclosure.QA1.AutoCI4_1300843836370.287600241',CONCURRENT_MAP_MIGRATE_RECORD, redoCount='65343', lockThreadId='-1'}, target=Address[
10.53.69.23:5702]} Packet cannot be sent to Address[
10.53.69.23:5702]
2011-03-23 07:31:01,390 INFO ConcurrentMapManager:? - [automation] ======= 391808: CONCURRENT_MAP_BACKUP_PUT ========
thisAddress= Address[
10.53.68.67:5702], target= Address[
10.53.69.23:5702]
targetMember= Member [
10.53.69.23:5702], targetConn=null, targetBlock=Block [62] owner=Address[
10.53.69.24:5702] migrationAddress=null
Property Information -> Property Address 2 Re-doing [65100] times! c:manualreferral.null.Foreclosure.QA1.AutoCI1_1300846571761.287601083 : Suite 3430
And last night calling size() on a MultiMap somewhere managed to induce a stack overflow exception2011-03-22 22:05:02,381 WARN ClientService:? - [automation] exception during handling CONCURRENT_MAP_SIZE: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.hazelcast.impl.FactoryImpl$MProxyImpl$DynamicInvoker.invoke(FactoryImpl.java:1909)
at $Proxy0.size(Unknown Source)
at com.hazelcast.impl.FactoryImpl$MProxyImpl.size(FactoryImpl.java:2101)
at com.hazelcast.impl.FactoryImpl$MultiMapProxyImpl$MultiMapReal.size(FactoryImpl.java:1799)
at com.hazelcast.impl.FactoryImpl$MultiMapProxyImpl.size(FactoryImpl.java:1688)
at com.hazelcast.impl.ClientService$MapSizeHandler.doMultiMapOp(ClientService.java:741)
at com.hazelcast.impl.ClientService$ClientCollectionOperationHandler.processCall(ClientService.java:1139)
at com.hazelcast.impl.ClientService$ClientOperationHandler.handle(ClientService.java:1062)
at com.hazelcast.impl.ClientRequestHandler.doRun(ClientRequestHandler.java:41)
at com.hazelcast.impl.FallThroughRunnable.run(FallThroughRunnable.java:23)
at com.hazelcast.impl.ClientService$Worker.run(ClientService.java:161)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.hazelcast.impl.FactoryImpl$MProxyImpl$DynamicInvoker.invoke(FactoryImpl.java:1902)
... 11 more
Caused by: java.lang.StackOverflowError
at java.text.DateFormat.format(Unknown Source)
at org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at com.hazelcast.logging.Log4jFactory$Log4jLogger.log(Log4jFactory.java:121)
at com.hazelcast.logging.LoggingServiceImpl$DefaultLogger.log(LoggingServiceImpl.java:121)
at com.hazelcast.logging.LoggingServiceImpl$DefaultLogger.log(LoggingServiceImpl.java:108)
at com.hazelcast.impl.BaseManager$MultiCall.logRedo(BaseManager.java:829)
at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:881)
at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:884)
(at this point "at com.hazelcast.impl.BaseManager$MultiCall.call(BaseManager.java:884)" repeats a couple hundred more times so I cut it out)