When running performance tests I was surprised to find that with multiple producers (threads that log messages), throughput of the log4j (ArrayBlockingQueue-based) AsynchAppender was much higher than the Disruptor throughput.
Is this to be expected with multiple producers?
On Solaris 10 (64bit) with JDK1.7.0_06, 4-core Xeon X5570 CPU @2.93GHz with hyperthreading switched on (8 virtual cores):
Single Thread
DisruptorLogger throughput: 1,268,737 ops/sec. latency(ns): avg=808.8 99% < 1024.0 99.99% < 131072.0 (50000 samples)
AsynchAppender throughput: 857,867 ops/sec. latency(ns): avg=892.2 99% < 3686.4 99.99% < 29491.2 (50000 samples)
Two Threads
DisruptorLogger throughput: 536,877 ops/sec. latency(ns): avg=2351.9 99% < 10649.6 99.99% < 222822.4 (100000 samples)
AsynchAppender throughput: 444,856 ops/sec. latency(ns): avg=1926.4 99% < 8192.0 99.99% < 183500.8 (100000 samples)
Four Threads
DisruptorLogger throughput: 61,183 ops/sec. latency(ns): avg=3015.3 99% < 10649.6 99.99% < 5236326.4 (200000 samples)
AsynchAppender throughput: 183,071 ops/sec. latency(ns): avg=4270.5 99% < 32768.0 99.99% < 327680.0 (199998 samples)
Eight Threads
DisruptorLogger throughput: 27,097 ops/sec. latency(ns): avg=3430.8 99% < 16128.0 99.99% < 18419712.0 (399966 samples)
AsynchAppender throughput: 111,550 ops/sec. latency(ns): avg=3817.5 99% < 39731.2 99.99% < 4449894.4 (399994 samples)
On Windows 7 (64bit) with JDK1.7.0_11, 2-core Intel i5-3317u CPU @1.70Ghz with hyperthreading switched on (4 virtual cores):
Single Thread
DisruptorLogger throughput: 838,069 ops/sec. latency(ns): avg=2064.6 99% < 4096.0 99.99% < 996147.2 (4492 samples)
AsynchAppender throughput: 613,668 ops/sec. latency(ns): avg=2506.0 99% < 14745.6 99.99% < 996147.2 (2765 samples)
Two Threads
DisruptorLogger throughput: 606,741 ops/sec. latency(ns): avg=1433.8 99% < 5324.8 99.99% < 235929.6 (9690 samples)
AsynchAppender throughput: 458,478 ops/sec. latency(ns): avg=1497.9 99% < 7372.8 99.99% < 196608.0 (10000 samples)
Four Threads
DisruptorLogger throughput: 54,436 ops/sec. latency(ns): avg=3573.5 99% < 32051.2 99.99% < 1395916.8 (20000 samples)
AsynchAppender throughput: 229,912 ops/sec. latency(ns): avg=4627.6 99% < 13619.2 99.99% < 589824.0 (20000 samples)
Eight Threads
DisruptorLogger throughput: 34,308 ops/sec. latency(ns): avg=7705.5 99% < 20736.0 99.99% < 2340454.4 (40000 samples)
AsynchAppender throughput: 165,455 ops/sec. latency(ns): avg=5167.7 99% < 13670.4 99.99% < 1238220.8 (40000 samples)
I suspect that I am doing something wrong somewhere...
First, could there be a problem with my performance test?
public static void singleThreadPerfTest() {
// JVM warmup
Histogram warmupHist = createHistogram();
for (int i = 0; i < 10; i++) {
perfTest(200000, warmupHist);
}
// wait for buffers to drain
Thread.sleep(10000);
// test
Histogram histogram = createHistogram();
perfTest(100000, warmupHist);
printReport(histogram);
}
private void perfTest(int lines, Histogram histogram) {
if (throughput) {
runThroughputTest(lines, histogram);
} else {
runLatencyTest(histogram);
}
}
public static void multiThreadPerfTest(int threadCount) {
final Histogram warmupHist = createHistogram();
Runnable run1 = new Runnable() {
public void run() {
for (int i = 0; i < 10; i++) {
perfTest(200000, warmupHist);
}
}
};
Thread thread1 = new Thread(run1);
Thread thread2 = new Thread(run1);
thread1.start();
thread2.start();
thread1.join();
thread2.join();
// wait for buffers to drain
Thread.sleep(10000);
// test
final Histogram[] histograms = new Histogram[threadCount];
for (int i = 0; i < histograms.length; i++) {
histograms[i] = createHistogram();
}
final int LINES = 100000;
Thread[] threads = new Thread[threadCount];
for (int i = 0; i < threads.length; i++) {
final Histogram histogram = histograms[i];
threads[i] = new Thread() {
public void run() {
perfTest(LINES, histogram);
}
};
}
for (Thread thread : threads) {
thread.start();
}
for (Thread thread : threads) {
thread.join();
}
for (Histogram histogram : histograms) {
printReport(threadCount, histogram);
}
}
private static void runThroughputTest(int lines, Histogram histogram) {
long s1 = System.nanoTime();
Logger logger = LogManager.getLogger(PerfTest.class);
for (int j = 0; j < lines; j++) {
}
long s2 = System.nanoTime();
long opsPerSec = (1000L * 1000L * 1000L * lines) / (s2 - s1);
histogram.addObservation(opsPerSec);
}
private static void runLatencyTest(Histogram histogram) {
long nanoTimeCost = calcNanoTimeCost();
Logger logger = LogManager.getLogger(PerfTest.class);
final int SAMPLES = 10000;
for (int i = 0; i < SAMPLES; i++) {
long s1 = System.nanoTime();
long s2 = System.nanoTime();
long value = s2 - s1 - nanoTimeCost;
if (value > 0) {
histogram.addObservation(value);
}
// wait 1 microsec
final long PAUSE_NANOS = 1000;
long pauseStart = System.nanoTime();
while (PAUSE_NANOS > (System.nanoTime() - pauseStart)) {
// busy spin
}
}
}