Getting "GC overhead limit exceeded" while executing batches of Cypher queries


Cherie Pun

May 12, 2016, 12:52:37 PM
to Neo4j
I am trying to iterate through the results of a first query and execute an extra Cypher query for each result, to either create a relationship or find the existing one. However, I ran into a GC overhead error; the stack trace is as follows. I thought running the queries in batches of 100 would solve the memory problem, but it seems that either memory is being used up too quickly or it is not being released quickly enough. After I increased the batch size to 1000, it broke down after processing 200,000 queries. Is there a way to optimise my query to reduce the memory used, or is there an alternative way to achieve the same result? Thanks.

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at scala.collection.mutable.OpenHashMap.foreachUndeletedEntry(OpenHashMap.scala:226)
    at scala.collection.mutable.OpenHashMap.foreach(OpenHashMap.scala:219)
    at org.neo4j.cypher.internal.compiler.v2_3.ExecutionContext.foreach(ExecutionContext.scala:46)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:155)
    at org.neo4j.cypher.internal.compiler.v2_3.ExecutionContext.foldLeft(ExecutionContext.scala:33)
    at org.neo4j.cypher.internal.frontend.v2_3.helpers.Eagerly$.mapToBuilder(Eagerly.scala:44)
    at org.neo4j.cypher.internal.frontend.v2_3.helpers.Eagerly$.immutableMapValues(Eagerly.scala:37)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:76)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:72)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator$$anonfun$failIfThrows$1.apply(ResultIterator.scala:121)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.decoratedCypherException(ResultIterator.scala:130)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.failIfThrows(ResultIterator.scala:119)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.next(ResultIterator.scala:72)
    at org.neo4j.cypher.internal.compiler.v2_3.ClosingIterator.next(ResultIterator.scala:50)
    at org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult.next(PipeExecutionResult.scala:77)
    at org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult$$anon$2.next(PipeExecutionResult.scala:70)
    at org.neo4j.cypher.internal.compiler.v2_3.PipeExecutionResult$$anon$2.next(PipeExecutionResult.scala:68)
    at org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1$$anonfun$next$1.apply(CompatibilityFor2_3.scala:234)
    at org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1$$anonfun$next$1.apply(CompatibilityFor2_3.scala:234)
    at org.neo4j.cypher.internal.compatibility.exceptionHandlerFor2_3$.runSafely(CompatibilityFor2_3.scala:116)
    at org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1.next(CompatibilityFor2_3.scala:234)
    at org.neo4j.cypher.internal.compatibility.ExecutionResultWrapperFor2_3$$anon$1.next(CompatibilityFor2_3.scala:229)
    at org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:233)
    at org.neo4j.cypher.javacompat.ExecutionResult.next(ExecutionResult.java:55)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at Neo4j.TrustCalculator.iterateAndExecuteBatchedInSeparateThread(TrustCalculator.java:239)
    at Neo4j.TrustCalculator.insertSimpleTransitiveTrust(TrustCalculator.java:127)
    at Neo4j.TrustCalculator.main(TrustCalculator.java:265)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)

private final String CREATE_UNIQUE_TRUST_RELATION_QUERY =
        "MATCH (a:" + TwitterLabels.USERS + "),(b:" + TwitterLabels.USERS + ")\n" +
                "WHERE id(a) = {startNode} AND id(b) = {endNode} \n" +
                "CREATE UNIQUE (a) - [r:" + TwitterRelationships.TRUST + "] -> (b) \n" +
                "return r";

private final String FIND_TRANSITIVE_RELATION_QUERY =
        "MATCH (a:" + TwitterLabels.USERS + ") " +
                "- [ab:" + TwitterRelationships.TRUST + "] -> " +
                "(b:" + TwitterLabels.USERS + ") " +
                "- [bc:" + TwitterRelationships.TRUST + "] -> " +
                "(c:" + TwitterLabels.USERS + ")\n" +
                "WHERE a <> c\n" +
                "return ab, bc";

private ExecutorService createSinglePool() {
    return Executors.newSingleThreadExecutor();
}

private void iterateAndExecuteBatchedInSeparateThread(int batchsize, Iterator iterator, Consumer consumer) {
    final int[] opsCount = {0};
    final int[] batchesCount = {0};
    ExecutorService executorService = createSinglePool();
    try {
        final Transaction[] workerTransaction = {executorService.submit(() -> graphDb.beginTx()).get()};

        iterator.forEachRemaining(t -> executorService.submit(() -> {
            consumer.accept(t);
            if ((++opsCount[0]) % batchsize == 0) {
                batchesCount[0]++;
                workerTransaction[0].success();
                workerTransaction[0].close();
                workerTransaction[0] = graphDb.beginTx();
            }
        }));
        executorService.submit(() -> {
            workerTransaction[0].success();
            workerTransaction[0].close();
        }).get();

    } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
    }
}


public void insertSimpleTransitiveTrust() {
    try (Transaction tx = graphDb.beginTx()) {
        Result results = graphDb.execute(FIND_TRANSITIVE_RELATION_QUERY, new HashMap<String, Object>());

        iterateAndExecuteBatchedInSeparateThread(1000, results, result -> insertOneSimpleTransitiveTrust((Map<String, Object>) result));

        tx.success();
    }
}

private void insertOneSimpleTransitiveTrust(Map<String, Object> map) {
    Object object = map.get("ab");
    Relationship trustAb = (Relationship) object;
    object = map.get("bc");
    Relationship trustBc = (Relationship) object;
    if (trustAb.hasProperty(TwitterProperties.CONVERSATIONAL_TRUST) && trustBc.hasProperty(TwitterProperties.CONVERSATIONAL_TRUST)) {
        Node nodeA = trustAb.getStartNode();
        Node nodeB = trustAb.getEndNode();
        Node nodeC = trustBc.getEndNode();

        Map<String, Object> params = new HashMap<String, Object>();
        params.put("startNode", nodeA.getId());
        params.put("endNode", nodeC.getId());
        Result results = graphDb.execute(CREATE_UNIQUE_TRUST_RELATION_QUERY, params);
        if (results.hasNext()) {
            Map<String, Object> result = results.next();
            object = result.get("r");
            Relationship transitiveTrust = (Relationship) object;
            transitiveTrust.setProperty(TwitterProperties.SIMPLE_TRANSITIVE_TRUST, trustBc.getProperty(TwitterProperties.CONVERSATIONAL_TRUST));
        }
    }
}


Michael Hunger

May 13, 2016, 6:09:51 AM
to ne...@googlegroups.com
Can you share your actual code and queries?

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Cherie Pun

May 13, 2016, 6:49:39 AM
to ne...@googlegroups.com

Hi,

Thanks for your reply. By actual code, do you mean all of my related files? I have attached the main part of the code that does the querying underneath the stack trace. The other files just define strings for the property names and enum types for the relationships and nodes.

The query that I ran first was:


private final String FIND_TRANSITIVE_RELATION_QUERY =
        "MATCH (a:" + TwitterLabels.USERS + ") " +
                "- [ab:" + TwitterRelationships.TRUST + "] -> " +
                "(b:" + TwitterLabels.USERS + ") " +
                "- [bc:" + TwitterRelationships.TRUST + "] -> " +
                "(c:" + TwitterLabels.USERS + ")\n" +
                "WHERE a <> c\n" +
                "return ab, bc";

Iterating through the results of that query, I then perform another query that creates a unique trust relationship between nodes a and c:


private final String CREATE_UNIQUE_TRUST_RELATION_QUERY =
        "MATCH (a:" + TwitterLabels.USERS + "),(b:" + TwitterLabels.USERS + ")\n" +
                "WHERE id(a) = {startNode} AND id(b) = {endNode} \n" +
                "CREATE UNIQUE (a) - [r:" + TwitterRelationships.TRUST + "] -> (b) \n" +
                "return r";

The second query is executed on another thread, and that thread commits the transaction after every 1000 queries.

This executes fine until it has processed 200,000 results, and then crashes because the GC overhead limit is reached. I took a heap dump and found that RelationshipProxy objects were taking up most of the space.

I hope that gives you sufficient information, please let me know if you need more explanation of my code.

Kind regards,
Cherie


Michael Hunger

May 13, 2016, 7:16:24 AM
to ne...@googlegroups.com

Your tx-handling code looks wrong.

You should execute the whole batch of creating the connections in the other thread. Just grabbing the transaction there is very dangerous, because the transaction attached to the current thread will be used instead.

I also recommend batching in Cypher, which makes it much easier.
Just send in a batch (e.g. 100k) of id pairs as a list parameter and then do:

pool.submit(() ->
    Iterables.<Long>singleOrNull(db.execute(
        "UNWIND {data} AS pair " +
        "MATCH (a),(b) WHERE id(a) = pair[0] AND id(b) = pair[1] " +
        "MERGE (a)-[:TRUST]->(b) RETURN count(*) AS c",
        batchOfIds()).columnAs("c"))
).get();

You return ab, bc, which are relationships, but in the other query you match nodes?

You should only return id(a),id(c) to minimize the data created.
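As a rough illustration of the two suggestions above (run each whole batch on the worker thread, and pass id pairs as one list parameter), here is a minimal sketch in plain Java. The names PairBatcher, submitInBatches and batchHandler are hypothetical and not part of Neo4j; the Neo4j calls appear only in a comment and reuse the calls already shown in this thread. Waiting for each batch before reading further keeps at most one batch of pairs in memory, which is an assumption about what is wanted here rather than a tuned design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

final class PairBatcher {

    // Same query as in the message above, using the {param} syntax from this thread.
    static final String MERGE_BATCH_QUERY =
            "UNWIND {data} AS pair " +
            "MATCH (a),(b) WHERE id(a) = pair[0] AND id(b) = pair[1] " +
            "MERGE (a)-[:TRUST]->(b) RETURN count(*) AS c";

    /**
     * Groups id pairs into batches of at most batchSize and runs each batch as ONE task
     * on a single worker thread, waiting for it to finish before reading more input.
     *
     * With Neo4j, batchHandler (a hypothetical callback) might look like this, so that
     * the transaction is opened, used and committed on the worker thread itself:
     *
     *   params -> {
     *       try (Transaction tx = graphDb.beginTx()) {
     *           graphDb.execute(MERGE_BATCH_QUERY, params); // consume or close the Result
     *           tx.success();
     *       }
     *   }
     *
     * Returns the number of batches that were executed.
     */
    static int submitInBatches(Iterator<long[]> idPairs, int batchSize,
                               Consumer<Map<String, Object>> batchHandler)
            throws InterruptedException, ExecutionException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        int batches = 0;
        try {
            List<List<Long>> batch = new ArrayList<>(batchSize);
            while (idPairs.hasNext()) {
                long[] pair = idPairs.next();
                batch.add(Arrays.asList(pair[0], pair[1]));
                // Flush when the batch is full or the input is exhausted.
                if (batch.size() == batchSize || !idPairs.hasNext()) {
                    Map<String, Object> params = Collections.singletonMap("data", batch);
                    worker.submit(() -> batchHandler.accept(params)).get();
                    batches++;
                    batch = new ArrayList<>(batchSize);
                }
            }
        } finally {
            worker.shutdown();
        }
        return batches;
    }
}
```

Because each task receives only ids, nothing in the queue holds on to relationship objects from the first query's result.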

Cherie Pun

May 13, 2016, 7:31:23 AM
to ne...@googlegroups.com

Hi,

Thanks for the quick reply. I returned the relationships because the property of the transitive relationship that I am trying to create depends on the properties of the original two relationships. Should I have just returned the properties and node ids to reduce the data? There are two different types of transitive relationship I want to create: 1) assign the property of bc to ac; 2) assign the product of the trust strengths of ab and bc to ac.

The tx-handling code was written according to answers from Stack Overflow, as I don't have much knowledge of batch-executing Cypher queries. Thanks for correcting it.

I have read that MERGE does not guarantee uniqueness of relationships, so when should I use CREATE UNIQUE and when should I use MERGE? Thanks.

Kind regards,
Cherie

Michael Hunger

May 13, 2016, 7:38:00 AM
to ne...@googlegroups.com
Filter what you return in Cypher, and yes, only return the properties needed.

For the batching you can also send in a list of maps on which you can store additional props or information. 
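A list of maps along these lines could carry the extra value with each pair. This is only a sketch: the class TrustRows, the map keys start, end and trust, and the property name transitiveTrust are all hypothetical, and the product shown corresponds to the second kind of transitive relationship mentioned earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TrustRows {

    // Hypothetical query: each row map supplies the two node ids and the value to store.
    // The property name transitiveTrust is a placeholder, not taken from the thread.
    static final String MERGE_WITH_PROPS_QUERY =
            "UNWIND {data} AS row " +
            "MATCH (a),(c) WHERE id(a) = row.start AND id(c) = row.end " +
            "MERGE (a)-[r:TRUST]->(c) " +
            "SET r.transitiveTrust = row.trust " +
            "RETURN count(*) AS c";

    /** Builds one row: the ids of a and c plus the product of the two trust strengths. */
    static Map<String, Object> row(long startId, long endId, double trustAb, double trustBc) {
        Map<String, Object> row = new HashMap<>();
        row.put("start", startId);
        row.put("end", endId);
        row.put("trust", trustAb * trustBc);
        return row;
    }

    /** Wraps a batch of rows as the {data} parameter expected by the query above. */
    static Map<String, Object> params(List<Map<String, Object>> rows) {
        Map<String, Object> params = new HashMap<>();
        params.put("data", new ArrayList<>(rows));
        return params;
    }
}
```

For the first kind of transitive relationship, the row would simply carry the bc value instead of the product.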

With concurrent execution, neither MERGE nor CREATE UNIQUE guarantees uniqueness of relationships, but since you execute the statements one after another, you do have that guarantee with both.

Where did you find that answer on SO?

Sent from my iPhone

Cherie Pun

May 13, 2016, 7:41:38 AM
to ne...@googlegroups.com

Thanks for the advice! I will try it out and let you know how it goes.

The Stack Overflow answer is here: http://stackoverflow.com/questions/37126965/iterating-through-relationships-from-query-result-while-adding-new-relationship/37133192#37133192

I just copied that part of the code and modified it to suit my needs. I might have made some incorrect modifications, though.

I really appreciate your help, thank you very much!

Kind regards,
Cherie

Cherie Pun

May 16, 2016, 10:03:27 AM
to Neo4j
Hi,

The correction to the tx-handling code fixed the problem, so I don't exceed the GC limit any more. Thank you very much!
Is it more efficient to manipulate the properties in Cypher queries or in Java? I realised I have a lot of transitive relationships, and they took a very long time to process.
Also, how big should my batches usually be? I am currently using 1000 and it seems quite fast.
Thanks.