Memory issues using a very simple query


rodria...@gmail.com

May 8, 2019, 10:22:02 AM
to JanusGraph users
We are using JanusGraph with Cassandra DB. Our schema is simple and it works fine with small and medium samples. However, we get memory issues when the data grows, even with very simple queries.

The following query:
g.V().hasLabel('customer')

for more than 3 million users, produces the following error:
{"message":"Java heap space","Exception-Class":"java.lang.OutOfMemoryError","exceptions":["java.lang.OutOfMemoryError"],"stackTrace":"java.lang.OutOfMemoryError: Java heap space\n\tat java.util.Arrays.copyOfRange(Arrays.java:3664)\n\tat java.lang.String.<init>(String.java:207)\n\tat java.lang.StringBuilder.toString(StringBuilder.java:407)\n\tat org.apache.tinkerpop.shaded.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:404)\n\tat org.apache.tinkerpop.shaded.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:83)\n\tat org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3213)\n\tat org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0.serializeResponseAsString(GraphSONMessageSerializerV1d0.java:98)\n\tat org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.lambda$channelRead$1(HttpGremlinEndpointHandler.java:250)\n\tat org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler$$Lambda$172/1143745749.apply(Unknown Source)\n\tat org.apache.tinkerpop.gremlin.util.function.FunctionUtils.lambda$wrapFunction$0(FunctionUtils.java:36)\n\tat org.apache.tinkerpop.gremlin.util.function.FunctionUtils$$Lambda$173/1328679220.apply(Unknown Source)\n\tat org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:297)\n\tat org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor$$Lambda$104/1217416538.call(Unknown Source)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"}


Is this supposed to happen? Or do we need to adjust our Java heap or JanusGraph Server configuration?

Stephen Mallette

May 8, 2019, 10:46:57 AM
to janusgra...@googlegroups.com
You're using the HTTP endpoint and asking for 3 million results that all need to be realized in memory, serialized into a JSON object, and then returned over the network. That may be a "simple" query, but it's a very expensive one. I think you can expect the OutOfMemoryError in this case. You would need to increase -Xmx considerably to even get that to return, and perhaps shouldn't use the HTTP endpoint at all - WebSockets will at least stream results back rather than realize the whole result in memory. In any case, no matter how you do this, it won't be fast, and it will gum up a thread in Gremlin Server while it's working. Submit enough of those kinds of traversals and Gremlin Server will fall over.
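
As a rough sketch of the WebSocket approach (the conf/remote.yaml path below is only a placeholder for whatever remote-connection file points your Gremlin Console at the server), results then come back in batches instead of as one giant in-memory JSON string:

// rough sketch: connect to Gremlin Server over WebSockets from the Gremlin Console;
// conf/remote.yaml is an assumed placeholder for your remote connection config
:remote connect tinkerpop.server conf/remote.yaml
:remote console
g.V().hasLabel('customer')    // now evaluated remotely, results streamed back in batches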


Rodrigo Aldecoa

May 8, 2019, 10:55:08 AM
to JanusGraph users
Thank you so much for your quick reply.

So the solution is not to use that type of query in JanusGraph? Querying the customers in batches? Or writing more specific queries so that the result set is smaller?

Stephen Mallette

May 8, 2019, 11:13:51 AM
to janusgra...@googlegroups.com
Finding all "customers" in a graph is an OLAP style workload because you're effectively traversing the entire graph, so you would probably execute a traversal like that over Spark:



Tharindu Madanayake

May 8, 2019, 11:46:13 AM
to janusgra...@googlegroups.com
I guess you can do pagination by using the range() step, where you can limit the results and traverse according to offset and page size.

Ex: g.V().hasLabel("customer").range(0, 100) will give the first 100 records.
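
Continuing the pattern (a rough sketch; keep in mind that each page re-evaluates the traversal from the start, so very deep offsets get progressively slower):

// rough sketch of paging through customers 100 at a time
g.V().hasLabel("customer").range(0, 100)      // page 1
g.V().hasLabel("customer").range(100, 200)    // page 2
g.V().hasLabel("customer").range(200, 300)    // page 3, and so on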

Kind Regards,
Tharindu
