Extreme memory rise on neo4j-server when saving entities using OGM


Ivan Senic

Nov 17, 2015, 3:28:56 PM
to Neo4j
Hi,

I am quite new to Neo4j and I started using it with neo4j-ogm to map my Java objects into the graph and back. I am experiencing quite strange behavior when I start saving entities via the OGM into the graph. It seems that the memory usage rises fast on the neo4j-server (I also experienced a few OOM exceptions):

[screenshots: neo4j-server memory usage rising over time; pic 1 = save with depth 1, pic 2 = save with depth -1]

As you can see from the screens, the memory rises within seconds. I started by saving with depth 1 (the behavior described by pic 1) and then also tried with depth -1 (that's pic 2, where I always hit the OOM exception). It's important to mention that saving is also very slow, making some of my requests time out (with a 3 second timeout). Setting the depth to 0 works as expected (in terms of memory and speed), but then I get no relationships in the graph.


The interesting thing is that I am not saving much data at all. In my use case I end up with about 6k nodes and 6k relationships when all requests are processed. IMO that's nothing for Neo4j, so I am surprised I am seeing this.


With the graph I am trying to represent all the classes (interfaces, annotations, methods, etc.) that are loaded in the JVM when starting a small example application. So with every new class being loaded I need to update the graph in order to store the information I can read from the byte-code.


The only "special" thing might be that I have a lot of bi-directional relationships (for example, a class implements an interface, and the interface is realized by the class). Here are some screens of my graph:

[graph screenshots]
It must be that I am doing something wrong, and it has to be related to the OGM. It cannot be that saving such a small number of nodes pushes the memory so high.
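
To make the bi-directional mapping concrete, here is a simplified sketch of two of my entities (ClassType and fqn are real, but InterfaceType, the field names, and the REALIZES type are just illustrative; my actual classes have more fields):

import java.util.HashSet;
import java.util.Set;

import org.neo4j.ogm.annotation.GraphId;
import org.neo4j.ogm.annotation.NodeEntity;
import org.neo4j.ogm.annotation.Relationship;

@NodeEntity
class ClassType {
    @GraphId
    Long id;

    String fqn;

    // class -> interface; InterfaceType holds the inverse side
    @Relationship(type = "REALIZES", direction = Relationship.OUTGOING)
    Set<InterfaceType> realizedInterfaces = new HashSet<InterfaceType>();
}

@NodeEntity
class InterfaceType {
    @GraphId
    Long id;

    String fqn;

    // interface -> class: the same REALIZES relationship, seen from the other end
    @Relationship(type = "REALIZES", direction = Relationship.INCOMING)
    Set<ClassType> realizedBy = new HashSet<ClassType>();
}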


Here is some more information about my setup: neo4j 2.3.0, neo4j-ogm 1.1.3, Ubuntu 14.04, JDK 1.7.0_70. I have no special settings other than the defaults.

Any help would be great.

Michael Hunger

Nov 17, 2015, 7:47:00 PM
to ne...@googlegroups.com, Luanne Misquitta
Can you try neo4j-ogm 1.1.4-SNAPSHOT?

You might need to add m2.neo4j.org as a Maven repository with snapshots enabled.
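
Something like this in the pom.xml (the exact repository URL is from memory, please double-check it):

<repositories>
    <repository>
        <id>neo4j-snapshots</id>
        <url>http://m2.neo4j.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-ogm</artifactId>
    <version>1.1.4-SNAPSHOT</version>
</dependency>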

I think it has some fixes in this regard.

Michael


Luanne Coutinho

Nov 17, 2015, 10:22:47 PM
to Neo4j, Luanne Misquitta
Hi Ivan,

When you create relationships, are the nodes on either end already persisted? Or are you creating both new nodes and relationships between them in one go via the save with depth -1? How many relationships, approximately, are created via a single save?

-Luanne

Ivan Senic

Nov 18, 2015, 5:08:01 AM
to Neo4j, lua...@graphaware.com
Will try it and post back if there's any difference.

Ivan Senic

Nov 18, 2015, 5:17:41 AM
to Neo4j, lua...@graphaware.com
Hi Luanne,

It's hard to answer your questions, but I will try.

When you create relationships, are the nodes on either end already persisted?
Almost never. It might happen that some of the nodes are already persisted, and also that all of them are new. But how would I save "only the relationship"? Say I have both nodes saved with depth 0; how would I save just the relationship, given that in my entity objects I have Sets referring to the "connected" nodes?

Or are you creating both new nodes and relationships to those in one go via the save with depth -1?
Yes, often I am creating new nodes and relationships at the same time. I feel like there is something important in this question: should I clearly separate the saving of nodes and relationships?
 
How many relationships approximately are created via a single save?
I can only give you an approximate estimate, so let's say 10-15.

Any idea?

Luanne Coutinho

Nov 18, 2015, 5:28:09 AM
to Neo4j
Hi Ivan,

The reason for those questions is that the fix that Michael refers to only applies to saving a large number of new relationships when the start and end nodes have been previously persisted. 
For this condition to hold, you'd do something like persist the nodes without relationships (either not yet created, or persisted with depth 0), and then add the relationships and persist the nodes again. In this case, the OGM will create only relationships between known nodes and the optimised Cypher query will kick in. Note that this fix does not apply to relationship entities either.
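
As a sketch, reusing the simplified ClassType/InterfaceType entities from Ivan's earlier post (session is the OGM Session):

// 1. Save all nodes first with depth 0 - only the nodes' own
//    properties are written, no relationships yet.
session.save(classType, 0);
session.save(interfaceType, 0);

// 2. Now add the relationship between the already-persisted nodes...
classType.realizedInterfaces.add(interfaceType);

// 3. ...and save again with depth 1. Both endpoints are known to the
//    session, so only the new relationship has to be created and the
//    optimised UNWIND-based query kicks in.
session.save(classType, 1);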

Further optimisation of the OGM queries is work in progress. Until it is released, the workaround for saving a large number of relationships is the one described above.

10-15 is not much at all; is it possible to share your code privately with me?

Thanks
Luanne

Michael Hunger

Nov 18, 2015, 7:32:57 AM
to ne...@googlegroups.com
Perhaps also enable query logging for the OGM
and share the queries with us.

Sent from my iPhone

Ivan Senic

Nov 18, 2015, 10:13:28 AM
to Neo4j
Hi Michael, hi Luanne,

So I managed to continue checking what's going wrong. I updated to the 1.1.4-SNAPSHOT version and tried to always first save all nodes (with depth 0) and then save the relationships (a depth 1 save on the node referencing all relationships). I also diagnosed my code with inspectIT, and I am attaching the storage with the data. If you go to https://github.com/inspectIT/inspectIT/releases/tag/1.6.4.69 you can download the inspectIT UI and import the .itds file. Then go to Invocation Sequences and you can see all the traces I have of the method where the OGM is used, as well as all the JSON being sent to the neo4j server. I can also share my code with you if you think that would help, since this is work for an open source project. The only problem is that I am not sure how much it would help, as it's very complicated to set up the environment and reproduce the situation. So maybe checking with inspectIT is a good first step, as you would see all the queries anyway.

So from the diagnosis I figured out a few things:

1. Some of the hand-written Cypher queries I use quite often for reading from the database are not really optimized. They always take ~10ms although I am just retrieving one object. I used EXPLAIN to figure out the plan, so I will be adding some indexes to improve them. But I still believe this has nothing to do with the memory.
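
For example, since I am constantly looking up ClassType nodes by their fqn property, I plan to start with an index like:

CREATE INDEX ON :ClassType(fqn)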

2. When saving nodes with depth 0, the operation also takes a considerable amount of time, 10ms+. This is quite a lot. I pulled out the queries being executed at this time and they look like:
'cyper': {"statements":[{"statement":"CREATE (_0:`MethodType`{_0_props}) RETURN id(_0) AS _0","parameters":{"_0_props":{"name":"<init>","parameters":["java.util.IdentityHashMap"],"modifiers":2,"returnType":"void"}},"resultDataContents":["row"],"includeStats":false}]} 
So nothing suspicious; IMO it's exactly the query that should be sent, but why is it so slow? If I want to save 50 new nodes, it's already more than half a second just to save all of those with depth 0. There's no such function as save-all, right? Are these times expected and normal?

3. The biggest problem, I think, is when I make that final save with depth 1. There I also saw different types of queries being produced. For example, this one seems correct and is relatively fast:
'cyper': {"statements":[{"statement":"UNWIND {rowsDECLARES} as row MATCH (startNode) WHERE ID(startNode)=row.startNodeId MATCH (endNode) WHERE ID(endNode)=row.endNodeId MERGE (startNode)-[rel:`DECLARES`]->(endNode) RETURN row.relRef as relRef, ID(rel) as relId UNION ALL UNWIND {rowsEXTENDS} as row MATCH (startNode) WHERE ID(startNode)=row.startNodeId MATCH (endNode) WHERE ID(endNode)=row.endNodeId MERGE (startNode)-[rel:`EXTENDS`]->(endNode) RETURN row.relRef as relRef, ID(rel) as relId","parameters":{"rowsEXTENDS":[{"startNodeId":2891,"endNodeId":54655,"relRef":"_0"}],"rowsDECLARES":[{"startNodeId":2891,"endNodeId":2892,"relRef":"_1"},{"startNodeId":2891,"endNodeId":2901,"relRef":"_10"},{"startNodeId":2891,"endNodeId":2902,"relRef":"_11"},{"startNodeId":2891,"endNodeId":2903,"relRef":"_12"},{"startNodeId":2891,"endNodeId":2904,"relRef":"_13"},{"startNodeId":2891,"endNodeId":2905,"relRef":"_14"},{"startNodeId":2891,"endNodeId":2906,"relRef":"_15"},{"startNodeId":2891,"endNodeId":2907,"relRef":"_16"},{"startNodeId":2891,"endNodeId":2908,"relRef":"_17"},{"startNodeId":2891,"endNodeId":2909,"relRef":"_18"},{"startNodeId":2891,"endNodeId":2910,"relRef":"_19"},{"startNodeId":2891,"endNodeId":2893,"relRef":"_2"},{"startNodeId":2891,"endNodeId":2911,"relRef":"_20"},{"startNodeId":2891,"endNodeId":2912,"relRef":"_21"},{"startNodeId":2891,"endNodeId":2913,"relRef":"_22"},{"startNodeId":2891,"endNodeId":2914,"relRef":"_23"},{"startNodeId":2891,"endNodeId":2915,"relRef":"_24"},{"startNodeId":2891,"endNodeId":2916,"relRef":"_25"},{"startNodeId":2891,"endNodeId":2917,"relRef":"_26"},{"startNodeId":2891,"endNodeId":2918,"relRef":"_27"},{"startNodeId":2891,"endNodeId":2919,"relRef":"_28"},{"startNodeId":2891,"endNodeId":2920,"relRef":"_29"},{"startNodeId":2891,"endNodeId":2894,"relRef":"_3"},{"startNodeId":2891,"endNodeId":2921,"relRef":"_30"},{"startNodeId":2891,"endNodeId":2922,"relRef":"_31"},{"startNodeId":2891,"endNodeId":2923,"relRef":"_32"},{"startNodeId":2891,"endNodeId":2924,"relRef":"_33"},{"startNodeId":2891,"endNodeId":2925,"relRef":"_34"},{"startNodeId":2891,"endNodeId":2926,"relRef":"_35"},{"startNodeId":2891,"endNodeId":2927,"relRef":"_36"},{"startNodeId":2891,"endNodeId":2928,"relRef":"_37"},{"startNodeId":2891,"endNodeId":2929,"relRef":"_38"},{"startNodeId":2891,"endNodeId":2930,"relRef":"_39"},{"startNodeId":2891,"endNodeId":2895,"relRef":"_4"},{"startNodeId":2891,"endNodeId":2931,"relRef":"_40"},{"startNodeId":2891,"endNodeId":2932,"relRef":"_41"},{"startNodeId":2891,"endNodeId":2933,"relRef":"_42"},{"startNodeId":2891,"endNodeId":2934,"relRef":"_43"},{"startNodeId":2891,"endNodeId":2935,"relRef":"_44"},{"startNodeId":2891,"endNodeId":2936,"relRef":"_45"},{"startNodeId":2891,"endNodeId":2937,"relRef":"_46"},{"startNodeId":2891,"endNodeId":2938,"relRef":"_47"},{"startNodeId":2891,"endNodeId":2939,"relRef":"_48"},{"startNodeId":2891,"endNodeId":2940,"relRef":"_49"},{"startNodeId":2891,"endNodeId":2896,"relRef":"_5"},{"startNodeId":2891,"endNodeId":2941,"relRef":"_50"},{"startNodeId":2891,"endNodeId":2942,"relRef":"_51"},{"startNodeId":2891,"endNodeId":2943,"relRef":"_52"},{"startNodeId":2891,"endNodeId":2944,"relRef":"_53"},{"startNodeId":2891,"endNodeId":2945,"relRef":"_54"},{"startNodeId":2891,"endNodeId":2946,"relRef":"_55"},{"startNodeId":2891,"endNodeId":2947,"relRef":"_56"},{"startNodeId":2891,"endNodeId":2948,"relRef":"_57"},{"startNodeId":2891,"endNodeId":2949,"relRef":"_58"},{"startNodeId":2891,"endNodeId":2950,"relRef":"_59"},{"startNodeId":2891,"endNodeId":2897,"relRef":"_6"},{
"startNodeId":2891,"endNodeId":2951,"relRef":"_60"},{"startNodeId":2891,"endNodeId":2952,"relRef":"_61"},{"startNodeId":2891,"endNodeId":2953,"relRef":"_62"},{"startNodeId":2891,"endNodeId":2954,"relRef":"_63"},{"startNodeId":2891,"endNodeId":2898,"relRef":"_7"},{"startNodeId":2891,"endNodeId":2899,"relRef":"_8"},{"startNodeId":2891,"endNodeId":2900,"relRef":"_9"}]},"resultDataContents":["row"],"includeStats":false}]}

But in the same place I am also sometimes getting something like the query in the attached file, exampleDepth1Query.txt. And yes, the whole 182K is one single save query. So I assume these kinds of queries are causing the memory problems.

Does this remind you of something?

Greets,
Ivan
exampleDepth1Query.txt
neo4j Traces.itds

Luanne Coutinho

Nov 18, 2015, 10:40:27 AM
to Neo4j
Thanks for the data Ivan.
The UNWIND query is the one that we introduced to fix the performance issue when saving many relationships. 

The one you see in exampleDepth1Query is the not-so-optimal query, and it's being used because what you're saving does not satisfy the conditions for only new relationships (no new nodes, no updated nodes, no relationship updates, no relationship entities). Unfortunately this means that the optimisation applies to one operation only, and that is "create relationships when the nodes on either end are already persisted". As I mentioned earlier, work is underway to optimise all the queries, and then you should not have to worry about the manner in which you save entities.

There is a save-all: you can use Session.save with a collection of entities.
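
For example (a sketch; classType and methodType stand for your own entity instances):

import java.util.ArrayList;
import java.util.List;

List<Object> entities = new ArrayList<Object>();
entities.add(classType);
entities.add(methodType);

// one call for the whole collection, at depth 0,
// instead of looping over session.save(entity, 0) yourself
session.save(entities, 0);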

Regards
Luanne

Ivan Senic

Nov 19, 2015, 5:06:30 AM
to Neo4j
Hi Luanne,

Regarding save with a collection of entities: it doesn't really improve anything, as each entity is still saved in a separate HTTP request, meaning the same number of server round-trips is needed to save all entities. I imagined that if I save a collection there would be only one HTTP request that saves all of them. That would be a nice improvement IMO.

I also have one more important question. If I use the query methods of the session to return one node, is there any way to get back only the data on that one node, without any relationships? I see that I am constantly getting the whole graph back, and I assume this is because the request specifies "resultDataContents":["graph"] (see below). Is there any chance to set the data content to row? I assume I would then get just that one node.

'cyper': {"statements":[{"statement":"MATCH (n:ClassType) WHERE n.fqn = { fqn } RETURN n LIMIT 1","parameters":{"fqn":"javax.swing.JLabel"},"resultDataContents":["graph"],"includeStats":false}]}
 
Thanks in advance,
Ivan

Luanne Coutinho

Nov 23, 2015, 4:10:24 AM
to Neo4j
Hi Ivan,

Good point about the save collection. I'll put it up for discussion in one of our planning meetings.

The query method will return exactly what you ask for in the Cypher query. So if you don't ask for relationships, that is all that will come back, even though the format is graph. Why do you feel you're getting the whole graph back?
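
For example, with your query, something like this should map just the single node (a sketch; I'm using the query overload that takes the entity type):

import java.util.HashMap;
import java.util.Map;

Map<String, Object> params = new HashMap<String, Object>();
params.put("fqn", "javax.swing.JLabel");

// only the node n is returned by the statement, so only that node is
// mapped; no relationships come back because none are asked for
Iterable<ClassType> result = session.query(ClassType.class,
        "MATCH (n:ClassType) WHERE n.fqn = { fqn } RETURN n LIMIT 1",
        params);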

Regards
Luanne