NullPointerException on first query consistently

Aseem Kishore

Oct 27, 2012, 4:21:56 PM

to Neo4j Discussion

On two machines now, both running Neo4j 1.8 GA (Enterprise, Server), I've noticed that certain queries consistently throw a NullPointerException the very first time they run after Neo4j starts. Simply retrying returns the valid result.

Here are two examples:

For this particular graph, both of these queries throw an NPE every single time after Neo4j is started. Also worth noting: the response JSON has no `message` property, just `exception` and `stacktrace`.

The `exception` is "NullPointerException", and the `stacktrace` is always just two elements:

Any idea what might be up? Thanks!

Aseem

Aseem Kishore

Oct 27, 2012, 4:30:21 PM

to Neo4j Discussion

I omitted one detail, which turns out to be very relevant:

Right before these two queries, I'm sending one other query in the background, a mutating one that changes a property on the first node:

If I don't send this query, the other two consistently work. If I do send this query, the other two consistently fail.

I was counting on mutating Cypher's transactional behavior to avoid exactly this. In theory, it shouldn't be a problem to send multiple queries that touch or mutate the same node, since each one runs in its own transaction, no?

Aseem

brian

Nov 2, 2012, 10:34:56 AM

to ne...@googlegroups.com

I am seeing a similar issue. I've been working with a test that uncovers some concurrency issues when issuing mutating Cypher queries. With a fresh database I get NPEs; on the second and subsequent runs I don't. I eventually added some retry logic and noticed that the NPEs go away after 4-5 retries. That may have nothing to do with the retries themselves; the NPEs may simply clear after some period of time. In my case, I'm using DELETE queries.
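Brian's retry logic isn't shown in the thread. Below is a minimal Python sketch of that kind of loop, assuming a hypothetical `query_fn` callable that sends one Cypher query and raises a hypothetical `TransientQueryError` when the server reports an NPE or a deadlock. The attempt limit and backoff delays are illustrative choices, not Neo4j recommendations.

```python
import time


class TransientQueryError(Exception):
    """Hypothetical stand-in for a transient server failure (NPE, deadlock).

    Not a real Neo4j client exception class.
    """


def run_with_retries(query_fn, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call query_fn(), retrying on TransientQueryError with exponential backoff.

    Returns query_fn()'s result, or re-raises the last error once
    max_attempts attempts have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return query_fn()
        except TransientQueryError:
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap each query, e.g. `run_with_retries(lambda: send_cypher(query))`, where `send_cypher` is whatever (hypothetical) helper your client uses. Passing `sleep` in as a parameter keeps the loop testable without real waits.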

In researching the concurrency issues, I discovered that the problems appear to be related to the node cache. If I disabled the node cache by setting

cache_type=none

in neo4j.properties, the concurrency issues went away. So in general, I think there are bugs related to concurrent mutating Cypher queries and the cache. You might want to try disabling your cache to see if the problems you're seeing go away.
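For anyone applying this workaround from a test-setup script, here is a minimal Python sketch that sets `cache_type=none` in the text of a `.properties` file. The helper name `set_property` is hypothetical, and it only handles simple uncommented `key=value` lines; Java properties files also allow `:` or whitespace separators and line continuations, which this ignores.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with `key` set to `value`.

    Every uncommented `key=...` line is rewritten; if there is none, a new
    line is appended. Commented-out lines are left untouched.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(text):
        # Use a function so backslashes in `value` are not treated as escapes.
        return pattern.sub(lambda _m: line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```

A caller would read `conf/neo4j.properties` (its usual location in a 1.8 server install, but check yours), pass the text through `set_property(text, "cache_type", "none")`, write it back, and restart the server so the setting takes effect.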

-brian

Wes Freeman

Nov 2, 2012, 12:58:31 PM

to ne...@googlegroups.com

Is this fixed in 1.9-M01? I had a couple of Cypher concurrency issues that were fixed there.

Wes

Brian Levine

Nov 2, 2012, 1:00:31 PM

to ne...@googlegroups.com

I'll install 1.9-M01 today, run my tests and report back.

-brian

brian

Nov 2, 2012, 1:23:22 PM

to ne...@googlegroups.com, br...@brianlevine.net

Just installed 1.9-M01. I no longer see the NullPointerExceptions. However, the other problem still exists. In my case this is the "RelationshipRecord[nnn] not in use" error that occurs when I issue concurrent Cypher DELETEs across a number of nodes that share relationships. The test is designed to produce deadlock exceptions so that I can test a retry strategy. Once this error occurs, any query that would traverse that relationship (e.g. start r = rel(*) return r) fails with the same error until I restart the neo4j server. This error does not occur if I set cache_type=none.
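Brian's observations (the error persists until a server restart and never appears with `cache_type=none`) fit the picture of a cache entry that still points at a record which has since been deleted. The toy Python model below only illustrates that general pattern; it is not Neo4j's cache implementation, and `Store`, `NodeCache`, and `RecordNotInUse` are hypothetical names. The delete deliberately skips cache invalidation, standing in for an invalidation lost to a concurrent race.

```python
class RecordNotInUse(Exception):
    """Hypothetical stand-in for the 'RelationshipRecord[n] not in use' error."""


class Store:
    """Toy record store: relationship id -> (start node id, end node id)."""

    def __init__(self, records):
        self.records = dict(records)

    def load(self, rel_id):
        if rel_id not in self.records:
            raise RecordNotInUse(f"RelationshipRecord[{rel_id}] not in use")
        return self.records[rel_id]

    def delete(self, rel_id):
        # Toy simplification: the cache is never told about the delete,
        # standing in for an invalidation that was lost to a race.
        del self.records[rel_id]


class NodeCache:
    """Toy cache of node id -> relationship ids, filled on first access."""

    def __init__(self, store, enabled=True):
        self.store = store
        self.enabled = enabled
        self.rel_ids = {}

    def relationships(self, node_id):
        if self.enabled and node_id in self.rel_ids:
            ids = self.rel_ids[node_id]  # may be stale
        else:
            ids = [r for r, ends in self.store.records.items() if node_id in ends]
            if self.enabled:
                self.rel_ids[node_id] = ids
        return [self.store.load(r) for r in ids]


if __name__ == "__main__":
    store = Store({7: (1, 2)})
    cache = NodeCache(store)
    print(cache.relationships(1))  # populates the cache
    store.delete(7)
    try:
        cache.relationships(1)  # the stale id fails on every call
    except RecordNotInUse as err:
        print(err)
    print(NodeCache(store).relationships(1))  # fresh cache, like a restart
    print(NodeCache(store, enabled=False).relationships(1))  # like cache_type=none
```

In this toy, only discarding the cache (a "restart") or not caching at all makes the error go away, which mirrors the symptoms reported here without claiming to explain Neo4j's internals.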

-brian

Peter Neubauer

Nov 2, 2012, 1:41:38 PM

to Neo4j User, Brian Levine

Brian,

would you mind filing an issue for this to track it? I will make sure we look at it ASAP.

Thanks!

/peter

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

Wes Freeman

Nov 2, 2012, 1:42:09 PM

to ne...@googlegroups.com

Oh, yes! I have gotten that one as well. Until now I was actually doubting it was a Neo problem, but hadn't dug deep enough into it. My integration tests are failing and I'm pretty sure it is related (because I get the same error trying to clean up after the failing tests locally, and restarting fixes the problem). Can you post a github issue with a description?

Thanks,
Wes

Brian Levine

Nov 2, 2012, 1:58:13 PM

to ne...@googlegroups.com

Sure. I just realized there was no issue for this, just a number of discussions. Is it possible to include a tarball in an issue? I'd like to attach the test driver I wrote that reproduces this problem.

-brian

Wes Freeman

Nov 2, 2012, 2:00:36 PM

to ne...@googlegroups.com

You can't attach files to GitHub issues, but you can put a link in the description, for example to a GitHub repo with the test driver code.

Wes

Brian Levine

Nov 2, 2012, 2:11:10 PM

to ne...@googlegroups.com

Yes, I'll definitely post a GitHub issue. You might also want to try disabling the cache to see if your integration tests work better.

-b

Peter Neubauer

Nov 2, 2012, 2:21:06 PM

to Neo4j User

+1

Cheers,

/peter neubauer


Wes Freeman

Nov 2, 2012, 2:21:12 PM

to ne...@googlegroups.com

Our hack was to put sleeps between some of the steps, but I've taken those out, and the tests are failing intermittently for the moment. I'll try disabling the cache to see if that resolves it.

Wes Freeman

Nov 2, 2012, 3:42:07 PM

to ne...@googlegroups.com

I think that might be it. I managed to get the tests to fail again after re-enabling the cache, and after restarting with the cache disabled again, they have passed a couple dozen times. But who knows, I might just not be running them enough to see a failure after a restart. I'm going to leave the cache disabled for a while to see if I ever hit these failures again. I hate intermittent failures!

At least our github travis status is green for now. :)

https://travis-ci.org/#!/AnormCypher/AnormCypher/builds/3036770

By the way, if anyone wants a script for running against 1.9.M01 on Travis, here is ours, with the cache disabled for now (adapted from the one in scholrly/neo4django):
https://github.com/AnormCypher/AnormCypher/blob/master/install_local_neo4j.bash


brian

Nov 2, 2012, 3:49:22 PM

to ne...@googlegroups.com

Issue is https://github.com/neo4j/community/issues/962. Test is at https://github.com/blevine/neo4j-concurrent-delete-test

-brian

Jacob Hansson

Nov 2, 2012, 4:38:45 PM

to ne...@googlegroups.com

I think this is a symptom of a known issue with isolation levels. In essence, we need to update isolation-level handling to treat each Cypher statement as a single read. Today the core API methods have well-defined consistency guarantees, but the same is not true for Cypher, where a query can see changes that are committed while it is still running. In this case, Cypher is operating on parts of the graph while they are being concurrently deleted.
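Jacob's point can be illustrated with a toy model: one statement first reads a node's relationship ids and then loads each record, while another transaction commits a delete in between. The Python sketch below is hypothetical throughout (`make_graph`, `delete_rel`, `expand`, `expand_isolated`, and the dict-based graph are inventions for illustration), and the deep copy is a naive stand-in for statement-level isolation, not how Neo4j implements or planned to implement it.

```python
import copy


def make_graph():
    # rels: relationship id -> (start, end); adj: node id -> relationship ids
    return {"rels": {7: (1, 2), 8: (1, 3)}, "adj": {1: [7, 8], 2: [7], 3: [8]}}


def delete_rel(graph, rel_id):
    """A committed DELETE: removes the record and its adjacency entries."""
    start, end = graph["rels"].pop(rel_id)
    graph["adj"][start].remove(rel_id)
    graph["adj"][end].remove(rel_id)


def expand(graph, node_id, between_reads=lambda: None):
    """One 'statement': list the node's relationships, then load each record.

    between_reads runs after each record load, standing in for another
    transaction that commits while this statement is still running.
    """
    rel_ids = list(graph["adj"][node_id])
    out = []
    for rel_id in rel_ids:
        out.append(graph["rels"][rel_id])  # KeyError if deleted mid-statement
        between_reads()
    return out


def expand_isolated(graph, node_id, between_reads=lambda: None):
    """Same statement, but reading a copy taken when the statement starts."""
    view = copy.deepcopy(graph)  # naive stand-in for statement-level isolation
    return expand(view, node_id, between_reads)
```

With a concurrent `delete_rel(g, 8)` injected via `between_reads`, `expand(g, 1, ...)` fails on its second lookup, much like the errors in this thread, while `expand_isolated` returns both relationships as they were when the statement began, which is roughly the "single read" semantics Jacob describes.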

This is a high-priority item, and we've worked out what the fix would look like. Hopefully we will be able to address it soon.


Brian Levine

Nov 2, 2012, 5:57:18 PM

to ne...@googlegroups.com

Great! Thanks for the update.
