Problems with structure after multiple BatchInserter runs and supernodes

Joseph Guhlin

Oct 21, 2014, 1:25:28 PM
to ne...@googlegroups.com
JDK: 1.8
Neo4j: 2.1.5 -  Embedded and then stand-alone to test the data

I'm using a program to insert a large amount of data into Neo4j. Because of memory and speed limitations I usually have to do this in a few batches using BatchInserter (separate runs, each started long after the database has shut down, not multiple threads). I'm getting results like this:

START x=node:main(id = "Medtr2125s0010")
  MATCH (x)-[:EXPRESSED]-(y)
WITH x,y
  MATCH (y)-[:EXPRESSED]-(g)
RETURN x.id,y.id,g.id

Results (typed by hand, since copy and paste from the Web Console wasn't pretty):
x.id: Medtr2125s0010
y.id: Nodule
g.id: PAC:26323170
Returned 1 row in 110 ms

This doesn't make sense to me. Not only should there be over 20,000 rows, but even if it finds just this one relationship, it should also return a row where g.id is the same as x.id.
I've had no trouble doing multiple BatchInserter runs before, but I've had a lot of trouble with the 2.1.x line. I believe it is related to the RelationshipGroupStore, which was causing a massive slowdown (see my StackOverflow question here: http://stackoverflow.com/questions/26451609/relationshipgroupstore-mapped-memory-setting-for-batchinserter ).
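
To illustrate why a row with g.id equal to x.id is expected, here is a minimal sketch in Python of the two undirected hops. It is a toy in-memory model, not Neo4j code, and it assumes (as I understand Cypher) that relationship uniqueness applies only within a single MATCH, so the second MATCH after WITH may walk back over the same relationship. The function name `two_hop` and the second gene's link to Nodule are hypothetical.

```python
from collections import defaultdict


def two_hop(edges, start):
    """Return (x, y, g) rows for two independent undirected hops from start.

    The hops are evaluated separately, mirroring two MATCH clauses split
    by WITH, so the relationship used in hop one may be reused in hop two.
    """
    neighbours = defaultdict(list)
    for a, b in edges:
        # An undirected pattern sees the relationship from both endpoints.
        neighbours[a].append(b)
        neighbours[b].append(a)
    return [(start, y, g) for y in neighbours[start] for g in neighbours[y]]


# Hypothetical data: two genes EXPRESSED in the same tissue.
edges = [("Medtr2125s0010", "Nodule"), ("PAC:26323170", "Nodule")]
rows = two_hop(edges, "Medtr2125s0010")
for row in rows:
    print(row)
```

In this model the start node always comes back as g at least once, which is the row missing from the result above.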

I plan to try a single BatchInserter run tonight to see whether it finishes properly, and will report back. But this seems like a bug.

Any advice on speeding up when RelationshipGroupStore slows down during the insert would also be greatly appreciated.

Thanks,
--Joseph

Michael Hunger

Oct 21, 2014, 2:28:29 PM
to ne...@googlegroups.com
Joseph, could it be that you write to the index in only one of the runs?
I thought there was an issue once where, if you didn't write to an index during a run, it removed the index definition.

Could you check that?

Michael

Joseph Guhlin

Oct 21, 2014, 2:35:05 PM
to ne...@googlegroups.com
I did have that issue before, but it has been fixed (on the Neo4j side), and the index lookup in the query above gives the proper result.

The problem is that the relationship seems to exist only when coming from node (x) in the above example, not when coming from (y), even though the pattern does not specify a direction. On top of that, several thousand relationships are missing. I'm not using a relationship index, as I haven't needed one.
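
The symptom described above can be sketched as a consistency check in Python. This is a toy model only: it assumes each node keeps its own list of relationship ids and that a healthy store lists every relationship under both of its endpoints. It does not read Neo4j's actual store files, and the names `missing_from_chain`, `relationships` and `chains` are hypothetical.

```python
def missing_from_chain(relationships, chains):
    """Return (rel_id, node) pairs where node is an endpoint of rel_id
    but rel_id is absent from that node's own relationship list."""
    problems = []
    for rel_id, (start, end) in relationships.items():
        for node in (start, end):
            if rel_id not in chains.get(node, []):
                problems.append((rel_id, node))
    return problems


# Hypothetical store: relationship 0 is reachable from x but not from y.
relationships = {0: ("x", "y"), 1: ("g", "y")}
chains = {"x": [0], "y": [1], "g": [1]}

problems = missing_from_chain(relationships, chains)
print(problems)
```

A store in this state would behave like the query result: hopping from x reaches y, but hopping from y only reaches g.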

Sorry my first message wasn't clearer.

Best,
--Joseph

Joseph Guhlin

Oct 28, 2014, 10:52:46 AM
to ne...@googlegroups.com
I was able to create the database using only one instance of the BatchInserter, and I am still seeing the error:

START x=node:main(id = "Medtr2125s0010")
  MATCH (x)-[:EXPRESSED]-(y)
WITH x,y
  MATCH (y)-[:EXPRESSED]-(g)
RETURN x.id,y.id,g.id

Here x never appears as g in the second part of the statement. Is there any way to regenerate the Relationship Store or Relationship Group Store?

I'll see if I can make this a test case and report it as a bug, unless anyone has any other ideas.

--Joseph