NullPointerException on first query consistently

Aseem Kishore

Oct 27, 2012, 4:21:56 PM
to Neo4j Discussion
On two machines now, both running Neo4j 1.8 GA (Enterprise, Server), I've noticed that certain queries consistently throw a NullPointerException the very first time they run after a Neo4j startup. Just trying again returns the valid result.

Here are two examples:

Query:
START page=node({id})
MATCH (page) -[rel:page_author]-> (user)
RETURN rel, user

Params:
{"id":18}

Query:
START u=node({id})
SET u.seen = {now}
RETURN u.seen

Params:
{"id":5,"now":1351368753342}

For this particular graph, these two queries both throw an NPE every single time after Neo4j is started. Also worth noting is that the response JSON doesn't have a `message` property, just `exception` and `stacktrace`.

The `exception` is "NullPointerException", and the `stacktrace` is always just two elements:

[
  "org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:79)",
  "java.lang.reflect.Method.invoke(Method.java:597)"
]
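
On the client side, the missing `message` means error handling can't assume that key exists. A minimal sketch in Python, assuming the error body is a JSON object with the `exception` and `stacktrace` keys described above; the function name `describe_cypher_error` is hypothetical and not part of any Neo4j client:

```python
import json


def describe_cypher_error(body):
    """Summarize an error body from the server as one line.

    Assumes the shape reported in this thread: a JSON object with
    `exception` and `stacktrace`, and possibly no `message` at all.
    """
    payload = json.loads(body)
    exception = payload.get("exception", "UnknownException")
    message = payload.get("message")  # absent in the NPE responses above
    stacktrace = payload.get("stacktrace") or []
    location = stacktrace[0] if stacktrace else "unknown location"
    if message:
        return "{}: {} (at {})".format(exception, message, location)
    return "{} (at {})".format(exception, location)
```

Reading `message` with `.get()` keeps the handler from raising its own error when the server omits the field.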

Any idea what might be up? Thanks!

Aseem

Aseem Kishore

Oct 27, 2012, 4:30:21 PM
to Neo4j Discussion
I omitted one detail, which turns out to be very relevant:

Right before these two queries, I'm sending one other query in the background that mutates -- it changes a property on the first node:

Query:
START n=node({id})
SET n.views = COALESCE(n.views?, 0) + 1
RETURN n.views

Params:
{"id":18}

If I don't send this query, the other two consistently work. If I do send this query, the other two consistently fail.

I was looking forward to mutable Cypher's transactional behavior to avoid exactly this kind of problem. In theory, it shouldn't be a problem to send multiple queries that touch/mutate the same node, since each is a separate transaction, no?
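
Until the root cause is known, one blunt client-side workaround is to keep the requests from overlapping at all. A minimal sketch in Python, assuming each query is wrapped in a callable that sends it and waits for the response; `SerializedRunner` is a hypothetical helper, and it sidesteps the concurrency rather than fixing anything in Neo4j. It uses a single lock rather than one per node, because the second failing query starts from node 5, not node 18:

```python
import threading


class SerializedRunner:
    """Run callables one at a time, no matter which thread submits them."""

    def __init__(self):
        self._lock = threading.Lock()

    def run(self, operation):
        # Only one operation holds the lock at a time, so a background
        # mutating query finishes before the next query is sent.
        with self._lock:
            return operation()
```

Each query would then go through `runner.run(...)`, including the background one that updates `views`.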

Aseem

brian

Nov 2, 2012, 10:34:56 AM
to ne...@googlegroups.com
I am seeing a similar issue. I've been working with a test that uncovers some concurrency issues when issuing mutating Cypher queries. With a fresh database, I get NPEs. On the second and subsequent runs I don't. I finally put some retry logic in and noticed that after 4-5 retries, the NPEs go away. It may have nothing to do with the retries; it may just be that the NPEs clear after some time period. In my case, I'm using DELETE queries.
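
Retry logic of the kind described above might look roughly like this. It is a sketch in Python, not the actual test driver: `run_with_retry`, its parameters, and the fixed delay are all assumptions, and `operation` stands for whatever callable sends the query and raises on an error response.

```python
import time


def run_with_retry(operation, max_attempts=5, delay_seconds=0.5):
    """Call `operation` until it succeeds or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay_seconds)  # fixed pause before trying again
```

Because each retry also lets time pass, a wrapper like this cannot tell whether the retry itself helps or the error simply clears on its own.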

In researching the concurrency issues, I discovered that the problems appear to be related to the node cache. When I disabled the node cache by setting

cache_type=none

in neo4j.properties, the concurrency issues went away. So in general, I think there are bugs related to concurrent mutating Cypher queries and the cache. You might want to try disabling your cache to see if the problems you're seeing go away.

-brian

Wes Freeman

Nov 2, 2012, 12:58:31 PM
to ne...@googlegroups.com
Is this fixed in 1.9-M01? I had a couple of Cypher concurrency issues that were fixed there.

Wes


Brian Levine

Nov 2, 2012, 1:00:31 PM
to ne...@googlegroups.com
I'll install 1.9-M01 today, run my tests and report back.

-brian




brian

Nov 2, 2012, 1:23:22 PM
to ne...@googlegroups.com, br...@brianlevine.net
Just installed 1.9-M01. I no longer see the NullPointerExceptions. However, the other problem still exists. In my case this is the "RelationshipRecord[nnn] not in use" error that occurs when I issue concurrent Cypher DELETEs across a number of nodes that share relationships. The test is designed to produce deadlock exceptions so that I can test a retry strategy. Once this error occurs, any query that would traverse that relationship (e.g. `start r = rel(*) return r`) fails with the same error until I restart the Neo4j server. This error does not occur if I set cache_type=none.

-brian

Peter Neubauer

Nov 2, 2012, 1:41:38 PM
to Neo4j User, Brian Levine
Brian,
would you mind filing an issue for this to track it? I will make sure we look at it ASAP.

Thanks!

/peter


Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html



Wes Freeman

Nov 2, 2012, 1:42:09 PM
to ne...@googlegroups.com
Oh, yes! I have gotten that one as well. Until now I was actually doubting it was a Neo problem, but hadn't dug deep enough into it. My integration tests are failing and I'm pretty sure it is related (because I get the same error trying to clean up after the failing tests locally, and restarting fixes the problem). Can you post a github issue with a description?

Thanks,
Wes


Brian Levine

Nov 2, 2012, 1:58:13 PM
to ne...@googlegroups.com
Sure. I just realized there was no issue for this, just a number of discussions. Is it possible to include a tarball in an issue? I'd like to attach the test driver I wrote that reproduces this problem.

-brian



Wes Freeman

Nov 2, 2012, 2:00:36 PM
to ne...@googlegroups.com
You can't attach things to GitHub issues, but you can put a link in the description, for example to a GitHub repo with the test driver code.

Wes


Brian Levine

Nov 2, 2012, 2:11:10 PM
to ne...@googlegroups.com
Yes. I'll definitely post a GitHub issue. You might also want to try disabling the cache to see if your integration tests work better.

-b



Peter Neubauer

Nov 2, 2012, 2:21:06 PM
to Neo4j User
+1



Wes Freeman

Nov 2, 2012, 2:21:12 PM
to ne...@googlegroups.com
Our hack was to put sleeps between some of the steps, but I've taken that out, so the tests are just failing intermittently for the moment. I'll try disabling the cache to see if that resolves it.



Wes Freeman

Nov 2, 2012, 3:42:07 PM
to ne...@googlegroups.com
I think that might be it. I managed to get it to fail again after re-enabling the cache, and since restarting with the cache disabled it has succeeded a couple dozen times. But who knows, I might just not be running it enough to get it to fail after a restart. I'm going to leave it disabled for a while to see if I ever hit these failures again. Hate intermittent failures!

At least our github travis status is green for now. :)

Btw, if anyone wants a script for running against 1.9.M01 on travis, this is ours with cache disabled for now (borrowed and modified the one from scholrly/neo4django): 
https://github.com/AnormCypher/AnormCypher/blob/master/install_local_neo4j.bash


brian

Nov 2, 2012, 3:49:22 PM
to ne...@googlegroups.com

Jacob Hansson

Nov 2, 2012, 4:38:45 PM
to ne...@googlegroups.com

I think this is a symptom of a known issue with isolation levels. In essence, we need to update isolation level handling to treat each Cypher statement as a single read. Today the core API methods have known, guaranteed consistency behavior, but the same is not true for Cypher, where it is possible to see things that get committed during the runtime of a single query. In this case Cypher is working with parts of the graph while they are concurrently being deleted.

This is a high-priority item, and we've fleshed out what the fix would look like. Hopefully we will be able to address it soon.

Sent from my phone, please excuse typos and brevity.


Brian Levine

Nov 2, 2012, 5:57:18 PM
to ne...@googlegroups.com
Great! Thanks for the update.