It reminds me of that one too! At present, I'm locked in with
HBase, so I can't make the switch to Cassandra very easily. I did
try:
result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
It took a little over 8 hours to run, but did complete once I
adjusted the hbase.client.scanner.timeout.period to something very
long. Interestingly, I had to modify that in the included jar
file, not in the file in /etc/hbase/conf.
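For what it's worth, JanusGraph is supposed to be able to forward client-side HBase settings through its own properties file via the storage.hbase.ext.* namespace, which might avoid patching the jar. A rough sketch (hostnames are from my setup; the ext key passthrough is my assumption from the JanusGraph HBase docs):
storage.backend=hbase
storage.hostname=10.22.5.63,10.22.5.64,10.22.5.65
# passed through to the HBase client configuration
storage.hbase.ext.hbase.client.scanner.timeout.period=3600000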
I'd really like to get this runtime way down, but I'm not sure what other method to try.
-Joe
--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/1bf6c7c5-84b6-483e-982c-c299fca3e8ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thank you Marc. I assume this would be java code that would be executed via spark-submit?
-Joe
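In case it helps anyone following along, here is roughly what I picture that driver looking like before handing it to spark-submit. This is an untested sketch; the class names are from TinkerPop (GraphFactory, SparkGraphComputer, PageRankVertexProgram), and the properties file name is made up:
// open a graph configured for the HBase input format
Graph graph = GraphFactory.open("read-hbase.properties");
// run PageRank on Spark and block until it finishes
ComputerResult result = graph.compute(SparkGraphComputer.class)
        .program(PageRankVertexProgram.build().create())
        .submit().get();
System.out.println(result.memory().getRuntime());
graph.close();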
Hi Marc - not sure I understand. I tried this:
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[hbase:[10.22.5.63:2181, 10.22.5.64:2181, 10.22.5.65:2181]], standard]
gremlin> result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
Thank you Marc. That runs on my cluster, but takes a very long time. If I try it on a larger graph, the YARN jobs run out of heap. Right now I'm giving them 10G each.
On a small graph, I can run it OK, and I can run the BulkDumperVertexProgram as well. What I can't do, when I run with SparkGraphComputer, is look at the results.
After running:
result = graph.compute(SparkGraphComputer).program(PageRankVertexProgram.build().create()).submit().get()
I can call result.memory().runtime, which returns a number (in my case 609821).
I then do:
g = result.graph().traversal(computer(SparkGraphComputer))
Unfortunately, any command on g gives the same error. For example, g.V().valueMap() returns:
java.io.IOException: No input paths specified in job
Since this is a small graph, if I run it without
SparkGraphComputer, those commands on g work fine, such as:
g.V(id).valueMap('gremlin.pageRankVertexProgram.pageRank')
I'm trying to find any method to run PageRank on a very large graph stored in JanusGraph. Thanks! Is there anything you would like me to try?
-Joe
Thank you Marc. This seems to suggest that if I split the HBase table up into many regions, that would fix the problems I'm seeing when running PageRank.
Any idea why I can't execute any commands on the graph once the SparkGraphComputer job completes? They all return java.io.IOException: No input paths specified in job
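In case it's relevant, my guess is that the post-job traversal is reading the Hadoop graph configuration, where no input path is set once the Spark job has finished. A sketch of the hadoop-graph properties involved (key and class names are from TinkerPop's hadoop-gremlin; the locations are assumptions, not my actual config):
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output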
Thanks again!
-Joe
Hi Marc,
Ah - I see the output in /user/username/output/~g. This appears to be Gryo format. Thank you! Do you know of a way to update the actual JanusGraph with a new page rank property on each vertex, instead of writing out an entire graph to HDFS? Would that be a modification of the PageRank code?
What appears to work, increases performance, and reduces memory
requirements is splitting the tables up into many regions. I have
a graph that is about 24.4 million vertices, uses 7.8G of space in
HBase and I've split it into 462 regions. I can run PageRank on
that graph in 44 minutes on a 5 server cluster with 128G of RAM in
each server. In this case, I gave each task 10G of RAM with a max
memory per node of 96G. I think what may work is to set the max file size in HBase to something very small, like 16M, to force splits. In the HBase shell:
alter 't1', MAX_FILESIZE => '16777216'
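For a brand new table it might be cleaner to pre-split at creation time instead of forcing splits afterwards. An untested HBase shell sketch (table and column family names are placeholders; the region count matches what worked for me):
create 't1', 'cf', {NUMREGIONS => 462, SPLITALGO => 'HexStringSplit'}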