JanusGraph tests OOM on macOS: tips on Mac setup and a couple of observations on HBase as backend


Demai

Sep 28, 2017, 6:15:43 PM
to JanusGraph developers
keyword: java.lang.OutOfMemoryError: unable to create new native thread, HBase

Hi,

Long story short: I struggled to get the JanusGraph tests running on macOS and have finally got them into 'better' shape. So I'd like to share what I ran into on macOS, plus a couple of observations about the tests with HBase as the backend:

  1. The MacBook's limits on "max user processes" and "kern.num_threads"
  2. High thread demand during HBase testing, for example in HBaseIDAuthorityTest and HBaseLockStoreTest
  3. HBase 0.98 vs HBase 1.0: the same tests in (2) pass on HBase 0.98 and fail on HBase 1.x

I have a "solution" for (1), but (2) and (3) are only observations and potential improvements.

(0) Failures
*environment: macOS Sierra 10.12.6 with 16GB memory
*java:  "1.8.0_102" ,SE Runtime Environment (build 1.8.0_102-b14), 64-Bit Server VM (build 25.102-b14, mixed mode)
*mvn: Apache Maven 3.3.9
*test method: "mvn clean install" through iTerm2

Typical failures (both occur only when testing against HBase 1.x)
 HBaseLockStoreTest>LockKeyColumnValueStoreTest.parallelNoncontendedLockStressTest:364 expected:<100> but was:<80>
 [pool-2-thread-2] ERROR diskstorage.LockKeyColumnValueStoreTest: Unexpected locking-related exception on iteration 81/100
java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread

org.janusgraph.diskstorage.hbase.HBaseIDAuthorityTest
testMultiIDAcquisition[0](org.janusgraph.diskstorage.hbase.HBaseIDAuthorityTest)  Time elapsed: 27.565 sec  <<< ERROR!
java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
...
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:910)
at org.janusgraph.diskstorage.hbase.HTable1_0.batch(HTable1_0.java:51)

(1) The MacBook's limits on "max user processes" and "kern.num_threads"
Both failures are due to "unable to create new native thread". The relevant limits are set very low on the Mac and are difficult to change. The defaults:
$ ulimit -u   ==> max user processes    (-u) 709
$ sysctl kern.num_threads ==> kern.num_threads: 10240

The direct cause is kern.num_threads, which according to Apple cannot be changed independently. To 'influence' it, a two-step hack is needed:
step 1) change "max user processes" with this instruction
step 2) turn on performance mode per Apple

Now my macOS shows max user processes: 2499 and kern.num_threads: 25000

(2) High thread demand during HBase testing
I dug into HBaseIDAuthorityTest, which calls HTable.batch() 15K+ times.
Something similar happens in LockKeyColumnValueStoreTest.parallelNoncontendedLockStressTest(), though it uses a thread pool. BTW, its error message is a bit confusing; I originally took it for a locking bug.

I wonder whether it is OK to reduce the stress level, for example by changing lockOperationsPerThread from 100 to 70, which works fine for testing purposes?

(3) HBase 0.98 vs HBase 1.0: the same test cases in (2) pass on HBase 0.98 but fail on HBase 1.x

This is odd, and I can reproduce it consistently.
HBaseIDAuthorityTest: HBase 1.x fails at around the 10350th HTable1_0.batch() call, while HBase 0.98 tolerates all 15549 batches.
HBaseLockStoreTest#parallelNoncontendedLockStressTest: HBase 1.x begins to throw OOM at around iteration 80/100, while HBase 0.98 sees no issues.
I wonder what causes HBase 1.x to require more resources. BTW, table.flushCommits() was removed from HTable1_0.batch(), but that doesn't look like the cause.

---------------------
OK, that is all I have figured out so far, working backward and the hard way. I'm sharing it here in the hope that it helps whoever uses macOS. Or maybe there is a smarter way.

I also wonder whether (2) and (3) warrant further investigation or filing an issue. Thanks for reading.

Demai

Jason Plurad

Sep 28, 2017, 6:23:26 PM
to JanusGraph developers
Thanks for the feedback, Demai.

These sound like HBase-specific issues. Jerry He might have some insight into what's going on.

Ultimately I don't think Mac OS X is a well supported operating system for HBase.

Demai

Sep 28, 2017, 7:22:20 PM
to JanusGraph developers
@Jason, thanks. Aside from the anomaly, the tests run pretty long (particularly the ones under HBase). For example, the two test cases in question each run for 10+ minutes.

@Jerry, when you get a chance, would you please take a look? I can provide more trace info. Thanks

Demai

Jerry He

Sep 29, 2017, 12:08:32 AM
to Demai, JanusGraph developers
Hi,  Demai

Thanks for your digging!

Are you able to run the tests successfully after you increased the kernel parameters?

Thanks,

Jerry


Demai

Sep 29, 2017, 12:27:46 PM
to JanusGraph developers
Jerry, 

Yeah. I am able to run through both tests individually after increasing the kernel thread limits.

I am not able to run the whole test suite though, as it seems to take forever (10+ hours).

Demai

Demai

Oct 2, 2017, 2:21:01 PM
to JanusGraph developers
Hi all,

I did a bit of investigation (thanks also to Jerry's help). When running the test case 'HBaseIDAuthorityTest', there is a significantly higher HConnection count on HBase 1.x (hundreds) compared to HBase 0.98 (30). Something weird is going on there, and it is worth an issue to track further investigation. I will open one later today.
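
For anyone who wants to reproduce such a count, here is a rough sketch of one way to do it. It is not the method used above; it simply counts live threads in the current JVM by name prefix using plain JDK calls. The class name ThreadNameCensus and the default prefix "hconnection-" are assumptions for illustration, and the code has to run inside the test JVM (for example, called from a test hook) to see the client threads.

```java
import java.util.Map;

// Hypothetical helper: counts live threads in this JVM whose names start with a prefix.
public class ThreadNameCensus {

    /** Returns the number of live threads whose name starts with the given prefix. */
    static int countByPrefix(String prefix) {
        int count = 0;
        // getAllStackTraces() returns one map entry per live thread in this JVM.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "hconnection-" is an assumed prefix for HBase client threads.
        String prefix = args.length > 0 ? args[0] : "hconnection-";
        System.out.println(prefix + "* threads: " + countByPrefix(prefix));
        System.out.println("total live threads: " + Thread.getAllStackTraces().size());
    }
}
```

Calling countByPrefix() at a few points during a test run should show whether the thread count keeps growing toward the OS limit.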

Demai

Jerry He

Oct 4, 2017, 12:25:12 AM
to Demai, JanusGraph developers
Hi, Demai

I think you meant the number of threads within one client connection.
The thread names look like 'hconnection-0xXXX-shared'.
In the 1.2 code, the default is 256.

org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation.getBatchPool()

private ExecutorService getBatchPool() {
  if (batchPool == null) {
    synchronized (this) {
      if (batchPool == null) {
        this.batchPool =
            getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256),
                conf.getInt("hbase.hconnection.threads.core", 256),
                "-shared-", null);
        this.cleanupPool = true;
      }
    }
  }
  return this.batchPool;
}
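
To illustrate why a core size of 256 can translate into hundreds of threads per connection, here is a minimal sketch using only the JDK. It assumes the pool behaves like a standard java.util.concurrent.ThreadPoolExecutor with core size equal to max size; whether HBase's getThreadPool() builds its pool exactly this way is an assumption, and the class name PoolGrowthSketch is hypothetical. A standard executor starts a new thread for each submitted task until the core size is reached, even when the existing threads are idle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo, not HBase code: shows thread growth in a pool with core == max.
public class PoolGrowthSketch {

    /** Runs `tasks` trivial jobs strictly one after another and returns the pool size afterwards. */
    static int poolSizeAfter(int coreThreads, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreThreads, coreThreads, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        try {
            for (int i = 0; i < tasks; i++) {
                CountDownLatch done = new CountDownLatch(1);
                pool.execute(done::countDown);
                done.await(); // wait, so at most one task is ever in flight
            }
            return pool.getPoolSize();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // 256 mirrors the default quoted above for hbase.hconnection.threads.core.
        System.out.println("pool size: " + poolSizeAfter(256, 300));
    }
}
```

Even though only one task is ever in flight here, the pool grows up to its core size, so several such connections together could plausibly approach the macOS thread limits discussed earlier in the thread.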