ChronicleMap - SIGSEGV on Linux under load


Neil Avery

Nov 9, 2015, 7:03:50 AM
to Chronicle
Hi all,

First off, Chronicle Map looks amazing! We are looking at moving from MapDB.

Our first implementation, as part of our message throughput path, uses it as a Map replacement. (We experimented with Queue, but thought this off-heap drop-in would be simpler to start with.)

During unit and benchmarking tests we are seeing excellent performance and no crashes (200+k/sec); however, on deployment we experience SIGSEGVs. A concurrent benchmarking test on OSX showed no problems.

The goal is to process 500+k messages per second, depending on the hardware; it certainly looks possible.

As you can see from the code below, it's all pretty simple.

Any help would be very much appreciated!

Regards,
Neil.

Version Info: chronicle-map-2.3.9

// TimeUID and ReplayEvent implement Externalizable

Construction:
chonmap = ChronicleMapBuilder
        .of(TimeUID.class, ReplayEvent.class)
        .entries(A10_MILLION_ENTRIES)
        .averageValueSize(1 * 1024)
        .create();

// consumer-service-thread
this.future = scheduler.scheduleWithFixedDelay(this, 1, Integer.getInteger("replay.queue.pump", 2), TimeUnit.MILLISECONDS);


NB: Publisher threads (8-12 of them) call the handle method:

public void handle(ReplayEvent event) {
    if (cancelled || chonmap == null) {
        return;
    }

    chonmap.put(event.getId(), event);
    outgoing.add(event.getId());
}

The consumer is a single thread (service-thread):

public void run() {
    try {
        int chunkSize = 4 * 1024;
        ArrayList<ReplayEvent> sending = null;
        int bytes = 0;

        while (outgoing.size() > 0) {
            TimeUID item = outgoing.poll(10, TimeUnit.MILLISECONDS);
            if (cancelled) return;
            if (item != null) {
                if (sending == null) sending = new ArrayList<ReplayEvent>();
                ReplayEvent item1 = chonmap.remove(item);
                bytes += item1.getRawData().length();
                sending.add(item1);
            }

            if (bytes > chunkSize) {
                replayHandler.handle(sending);
                sending = null;
                bytes = 0;
            }
        }
        if (sending != null && sending.size() > 0) {
            replayHandler.handle(sending);
        }
    } catch (Throwable t) {
        LOGGER.warn("Update to handler failed:" + t, t);
        if (t.getMessage().contains("RetryInvocationException: SendFailed.Throwable:noSender")) {
            replayHandler = null;
        }
    } finally {
        if (isExpired() || request.isCancelled()) {
            cancel();
        }
    }
}

There are two hs_err files providing information.

hs_err 1: the information indicates the consumer thread is failing:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fa0dd6827fc, pid=17495, tid=140324411561728
#
# JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J  net.openhft.chronicle.map.VanillaChronicleMap$Segment.writeLock()Lnet/openhft/chronicle/map/VanillaChronicleMap$WriteLocked;
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007fa03031a800):  JavaThread "services-1-6" daemon [_thread_in_Java, id=17785, stack(0x00007f9fd2a45000,0x00007f9fd2b46000)]



hs_err 2: this one provides more information about the possible cause:

net.openhft.lang.io.NativeBytes
 - klass: 'net/openhft/lang/io/NativeBytes'
RDI=
[error occurred during error reporting (printing register info), id 0xb]

Stack: [0x00007f2374c4a000,0x00007f2374c8b000],  sp=0x00007f2374c88f60,  free space=251k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
net.openhft.lang.io.AbstractBytes.addLong(JJ)J
j  net.openhft.chronicle.map.VanillaChronicleMap$Segment.removeEntry(Lnet/openhft/lang/threadlocal/ThreadLocalCopies;Lnet/openhft/chronicle/map/SearchState;Ljava/lang/Object;JLnet/openhft/chronicle/map/InstanceOrBytesToInstance;Lnet/openhft/chronicle/map/InstanceOrBytesToInstance;Lnet/openhft/chronicle/map/ReadValue;ZLnet/openhft/chronicle/map/MultiMap;Lnet/openhft/lang/io/MultiStoreBytes;JJJZZZ)Ljava/lang/Object;+106
j  net.openhft.chronicle.map.VanillaChronicleMap$Segment.removeWithoutLock(Lnet/openhft/lang/threadlocal/ThreadLocalCopies;Lnet/openhft/chronicle/map/VanillaChronicleMap$SegmentState;Lnet/openhft/chronicle/hash/serialization/internal/MetaBytesInterop;Ljava/lang/Object;Ljava/lang/Object;JLnet/openhft/chronicle/map/InstanceOrBytesToInstance;Lnet/openhft/chronicle/map/GetValueInterops;Ljava/lang/Object;Lnet/openhft/chronicle/map/InstanceOrBytesToInstance;JLnet/openhft/chronicle/map/ReadValue;Z)Ljava/lang/Object;+216
j  net.openhft.chronicle.map.VanillaChronicleMap$Segment.remove(Lnet/openhft/lang/threadlocal/ThreadLocalCopies;Lnet/openhft/chronicle/map/VanillaChronicleMap$SegmentState;Lnet/openhft/chronicle/hash/serialization/internal/MetaBytesInterop;Ljava/lang/Object;Ljava/lang/Object;JLnet/openhft/chronicle/map/InstanceOrBytesToInstance;Lnet/openhft/chronicle/map/GetValueInterops;Ljava/lang/Object;Lnet/openhft/chronicle/map/InstanceOrBytesToInstance;JLnet/openhft/chronicle/map/ReadValue;Z)Ljava/lang/Object;+48
j  net.openhft.chronicle.map.VanillaChronicleMap.remove(Lnet/openhft/lang/threadlocal/ThreadLocalCopies;Lnet/openhft/chronicle/map/VanillaChronicleMap$SegmentState;Lnet/openhft/chronicle/hash/serialization/internal/MetaBytesInterop;Ljava/lang/Object;Ljava/lang/Object;JLnet/openhft/chronicle/map/InstanceOrBytesToInstance;Lnet/openhft/chronicle/map/GetValueInterops;Ljava/lang/Object;Lnet/openhft/chronicle/map/InstanceOrBytesToInstance;Lnet/openhft/chronicle/map/ReadValue;Z)Ljava/lang/Object;+62
j  net.openhft.chronicle.map.VanillaChronicleMap.removeIfValueIs(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+106
j  net.openhft.chronicle.map.VanillaChronicleMap.remove(Ljava/lang/Object;)Ljava/lang/Object;+3
j  com.liquidlabs.log.space.agg.ChronicleReplayAggregator.run()V+81



---------------  S Y S T E M  ---------------

OS:wheezy/sid

uname:Linux 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64
libc:glibc 2.15 NPTL 2.15
rlimit: STACK 8192k, CORE 0k, NPROC 31487, NOFILE 80000, AS infinity
load average:8.87 2.54 2.55


Roman Leventov

Nov 9, 2015, 7:41:54 AM
to java-ch...@googlegroups.com
Thanks for the detailed report,

This might be use-after-close. Please check that you don't access the map after close() has been called on it.
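
One way to rule this out is to route every map access through a guard that refuses to touch the map once it is closed. Below is a minimal sketch under stated assumptions: GuardedMap is a hypothetical name (not part of Chronicle Map), and a plain java.util.Map stands in for the off-heap map. Publishers and the consumer share a read lock; close() takes the write lock, so it waits for in-flight put/remove calls and blocks new ones.

```java
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical guard (not Chronicle Map API): no put/remove can start
 * after close(), and close() waits for operations already in flight.
 */
final class GuardedMap<K, V> implements AutoCloseable {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> delegate; // stand-in for the off-heap map
    private boolean closed;           // guarded by lock

    GuardedMap(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    /** Returns false instead of touching the map once it is closed. */
    boolean put(K key, V value) {
        lock.readLock().lock();       // many threads may hold this at once
        try {
            if (closed) return false;
            delegate.put(key, value);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns null if the key is absent or the map is already closed. */
    V remove(K key) {
        lock.readLock().lock();
        try {
            return closed ? null : delegate.remove(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Idempotent; blocks until in-flight put/remove calls have finished. */
    @Override
    public void close() {
        lock.writeLock().lock();
        try {
            if (closed) return;
            closed = true;
            if (delegate instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) delegate).close(); // release resources
                } catch (Exception e) {
                    throw new IllegalStateException("close failed", e);
                }
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With a guard like this, handle() would call the guarded put and skip the event when it returns false, and run() would need to treat a null from remove as "nothing to send" rather than dereferencing it. The read lock adds some overhead per operation, so it is probably better suited to confirming the diagnosis than to the 500k/sec path.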

