mappers.per.region option to HBaseInputFormat

HadoopMarc

Oct 22, 2017, 7:29:46 PM

to JanusGraph developers

Hi dev team,

The JanusGraph users list has seen a number of threads regarding OLAP performance with janusgraph-hbase. In particular, initial loading of a graph turns out to be problematic when the HBase table is stored in a small number of large regions of, say, 10 GB. Such large regions are what give HBase its best performance, so system administrators are unlikely to accept HBase-backed graphs with the many small regions that good parallelism during OLAP operations would require. For this reason, HBase 2.0 alpha has introduced a mappers.per.region option to TableInputFormatBase, which allows a single region to be spread over multiple mappers (or Spark tasks). Eager to use this feature before HBase 2.0 and a JanusGraph version supporting it come out, I made a quick attempt to backport it. This turns out to be quite doable, see: https://github.com/vtslab/janusgraph/commit/87bf1000c01dfce92e857349ba479db0d3ef6bd1. This is initial work, and I plan to run a performance benchmark with the Friendster graph, as the TinkerPop team did.
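To make the idea concrete: spreading one region over several mappers boils down to cutting the region's row-key range [startKey, endKey) into N contiguous sub-ranges, each scanned by its own mapper or Spark task. The sketch below is a minimal illustration under stated assumptions. It is neither the HBase 2.0 code nor the linked commit, the class and method names (RegionSplitSketch, split, KeyRange) are hypothetical, and it assumes non-empty start and end keys, treating keys as unsigned big-endian integers.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the HBase or JanusGraph implementation.
public class RegionSplitSketch {

    /** Half-open row-key range [start, end) that one mapper (Spark task) would scan. */
    public static final class KeyRange {
        public final byte[] start;
        public final byte[] end;

        KeyRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Splits one region's key range [start, end) into n contiguous sub-ranges.
     * Assumes both keys are non-empty and start sorts before end in
     * unsigned lexicographic byte order.
     */
    public static List<KeyRange> split(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Right-pad both keys with zero bytes to a common length; for
        // equal-length arrays, unsigned lexicographic order equals numeric order.
        int len = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger width = hi.subtract(lo);

        List<KeyRange> ranges = new ArrayList<>();
        // Range too narrow to yield n distinct split points: keep it whole.
        if (width.compareTo(BigInteger.valueOf(n)) < 0) {
            ranges.add(new KeyRange(start, end));
            return ranges;
        }
        byte[] prev = start; // first sub-range starts at the original start key
        for (int i = 1; i < n; i++) {
            // Evenly spaced interior split point: lo + width * i / n.
            BigInteger point = lo.add(width.multiply(BigInteger.valueOf(i))
                                           .divide(BigInteger.valueOf(n)));
            byte[] splitKey = toFixedLength(point, len);
            ranges.add(new KeyRange(prev, splitKey));
            prev = splitKey;
        }
        ranges.add(new KeyRange(prev, end)); // last sub-range ends at the original end key
        return ranges;
    }

    /** Converts a non-negative BigInteger to exactly len big-endian bytes. */
    private static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading 0x00 sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

For example, split(new byte[]{0x00}, new byte[]{0x10}, 4) yields the four sub-ranges starting at 0x00, 0x04, 0x08 and 0x0C. The real option lives in HBase's TableInputFormatBase; this sketch ignores details a production version must handle, such as the empty start and end keys HBase uses for a table's first and last regions.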

My questions to you:

Would this work be welcomed as a JanusGraph PR before a release based on HBase 2.0 comes out?
If so, do you have any suggestions for improving the work?

Some additional notes:

SparkGraphComputer has an option to repartition the graph using the workers() method of the GraphComputer builder, but this does not improve the parallelism of the initial load, because the repartitioning only happens after the input has been read.
The current HBaseInputFormat has a rather intricate inheritance structure, which will probably need thorough refactoring to use the HBase 2.0 TableInputFormatBase.

Cheers, Marc

Robert Dale

Oct 22, 2017, 11:55:01 PM

to HadoopMarc, JanusGraph developers

We can’t make a release that depends on a snapshot. Does HBase have a pre-release version? We already have one dependency on an rc1 release.

--

Robert Dale

HadoopMarc

Oct 23, 2017, 10:25:23 AM

to JanusGraph developers

Hi Robert,

I see I caused some confusion. The commit I linked does not depend on HBase 2.0; it only back-ports a small piece of HBase 2.0 code, which becomes part of JanusGraph and is tested as such. Once HBase 2.0 becomes available, the back-port would be removed from JanusGraph again in the release branch that depends on HBase 2.0.

So my question is whether there would be support for this back-ported feature on the 0.2.X and 0.3.X branches which depend on HBase 1.Y.

Cheers, Marc

On Monday, October 23, 2017 05:55:01 UTC+2, Robert Dale wrote:

Jerry He

Oct 23, 2017, 1:30:25 PM

to HadoopMarc, JanusGraph developers

I think it is a useful feature to have. See the JIRA for more
details: https://issues.apache.org/jira/browse/HBASE-16894
On the other hand, it is a matter of timing. It is very likely that the
next release of JanusGraph (say, three months from now) will be close to
either HBase 2.0 or HBase 1.4 (which also contains the fix). Then we
would have a code backport that may quickly become redundant.

Thanks,


HadoopMarc

Oct 24, 2017, 1:45:28 AM

to JanusGraph developers

Hi Jerry,

Thanks for the info about the HBase 1.4 branch; I was not aware of that. I agree, then, that it is better to focus our efforts on having the JanusGraph HBaseInputFormat inherit from the HBase TableInputFormatBase. I will run the benchmark with my current code anyway, to see whether performance works out as expected.

Marc

On Monday, October 23, 2017 19:30:25 UTC+2, Jerry He wrote:

Jerry He

Oct 24, 2017, 10:55:31 AM

to HadoopMarc, JanusGraph developers

That will be a very nice performance test!

Thanks.

