errors on delete


Yaron Naveh

Jun 23, 2012, 2:04:33 PM
to ne...@googlegroups.com
Hi

I'm using neo4j 1.8 with node-neo4j (which uses the REST API of neo4j).

I need to delete 100 nodes. When I do this synchronously there is no problem (e.g. I wait for one delete to finish before calling the next one).
When I delete them in an async manner (many REST requests sent concurrently) I intermittently get a few failures. I see this coming back in the HTTP response, once per error:

HTTP/1.1 500 Transaction(15744)[STATUS_ACTIVE,Resources=1] can't wait on resource RWLock[Relationship[8253]] since => Transaction(15744)[STATUS_ACTIVE,Resources=1] <-[:HELD_BY]- RWLock[Node[1841]] <-[:WAITING_FOR]- Transaction(15744)[STATUS_ACTIVE,Resources=1] <-[:HELD_BY]- RWLock[Relationship[8253]]
Content-Length: 0
Server: Jetty(6.1.25)

I believe neo4j uses keep-alive to reuse the same connection, but this does not seem relevant. I have verified my code is (logically) correct, e.g. I do not delete an already deleted node, etc.

Any idea?

Thanks,
Yaron

Florent Empis

Jun 23, 2012, 3:52:39 PM
to ne...@googlegroups.com
(Answering this with very partial knowledge, I use this as an occasion to learn stuff about Neo4J)
A similar question came up in a 2009 thread; the advice given then was to manage the locking mechanism by hand.
Looking at your trace, I'd say that your data is like this:

(Node X) -[Rel 8253]- (Node 1841)

You are currently trying to delete node X.
Sometime before, you ran an operation (a delete, I guess?) on node 1841.
To be able to perform this, Relationship 8253 has to be locked (it will be deleted, since per http://docs.neo4j.org/chunked/milestone/transactions-delete.html all properties and relationships of a node are deleted when the node itself is deleted).
When you do it in synch mode:
You: Delete Node 1841
You: Wait
Server: Delete confirmed
You: Delete Node X

Works, obviously
When you do it in asynch mode:
What you are attempting to do is:
You:Delete Node 1841
Server:Locks Node 1841 and all its relationships
You:Delete Node X
Server:Attempts to lock 1841 and all its relationships: fails on rel 8253
Server:Boom, Err 500 on delete Node X
Server:(probably) Delete Node 1841 confirmed

Either you can manage the lock on the server side as suggested in the 2009 thread, or I think you'll have to manage it externally.

Can you add some sort of management in your delete calls to the server, partitioning them to avoid collisions? Something akin to this:
[0...33][34...66][67...99]
(this example assumes that relationships are all between direct neighbours of course...)
3 threads run in parallel.
Thread A starts at 0, stops at 33
Thread B starts at 66, stops at 34
Thread C starts at 99, stops at 67

This should in theory avoid collisions if we assume delete operation duration to be constant for all nodes...?
(if you want to be extra careful, add a dampening at the end of the partition: as you get closer to the boundary, wait a little bit between calls... it will give extra time to the neighbouring thread to move further away)

Please remember I'm probably as new to Neo4J as you, so take all this with a grain of salt :-)

Florent
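Florent's partitioning idea can be sketched roughly like this. It is only an illustration, assuming node IDs 0..99 and three workers; the chunk boundaries come out slightly different from the ranges above because this uses equal ceiling-sized chunks:

```python
def partition(ids, workers):
    """Split node IDs into contiguous, non-overlapping chunks, one per
    worker, so concurrent deletes work on well-separated parts of the graph."""
    size = -(-len(ids) // workers)  # ceiling division
    return [ids[i:i + size] for i in range(0, len(ids), size)]

# 100 node IDs split across 3 workers: [0..33], [34..67], [68..99]
chunks = partition(list(range(100)), 3)
```

Each worker then deletes only its own chunk. As Florent notes, this only helps if relationships stay between near neighbours.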

Yaron Naveh

Jun 23, 2012, 6:44:34 PM
to ne...@googlegroups.com
Thanks

Actually I have a node X, and multiple nodes a..z that are linked to it. I am deleting all the links. Since all links are different (except their shared endpoint X), I'm not sure why deleting one link should lock another?

What does it mean to do it on the server side?
I'm not sure how to handle it externally - if I have multiple clients on different machines I cannot synchronize them. I am expecting neo4j to lock everything of interest while deleting a link.
--

I'm on Twitter (@YaronNaveh)


Michael Hunger

Jun 23, 2012, 10:19:11 PM
to ne...@googlegroups.com
Deleting a relationship currently locks both nodes.

What about deleting them all in a batch transaction?

Or deleting them with mutating cypher? http://docs.neo4j.org/chunked/snapshot/query-delete.html

Michael
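For illustration, the batch operation Michael mentions sends a single JSON array of jobs in one request, which the server executes in one transaction. A sketch of such a payload (the relationship IDs are made up; the job shape and endpoint path are the ones documented for the 1.8-era REST API):

```python
import json

rel_ids = [8253, 8254, 8255]  # hypothetical relationship IDs to delete
# Each job describes one REST call; "id" lets you correlate results with jobs.
jobs = [{"method": "DELETE", "to": "/relationship/%d" % rid, "id": i}
        for i, rid in enumerate(rel_ids)]
payload = json.dumps(jobs)
# POST this body to http://localhost:7474/db/data/batch
```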

Yaron Naveh

Jun 24, 2012, 5:59:52 AM
to ne...@googlegroups.com
Thanks

I still don't get the problem. I have node A with links to different nodes. I delete the links. There should be no deadlock in my scenario. Possibly A will be locked for a while by one delete, but why don't other deletes wait for the lock to be released?

Also can you give me a link for batch transactions? 

Peter Neubauer

Jul 5, 2012, 4:30:49 AM
to ne...@googlegroups.com
Yaron,
batch transactions are covered at
http://docs.neo4j.org/chunked/snapshot/rest-api-batch-ops.html

Is that what you are looking for?

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

Yaron Naveh

Jul 5, 2012, 11:20:09 AM
to ne...@googlegroups.com
Thanks

I'm currently using a workaround - doing everything in a sync manner. But I'm still not sure why the async version fails; that is my main worry now. I wouldn't expect batch to be required here - deleting multiple relationships of a single node.

Peter Neubauer

Jul 5, 2012, 11:21:46 AM
to ne...@googlegroups.com
Could you just try Cypher to delete your relationships?
http://docs.neo4j.org/chunked/snapshot/query-delete.html#delete-remove-a-node-and-connected-relationships

Cheers,

/peter neubauer



Michael Hunger

Jul 5, 2012, 1:10:08 PM
to ne...@googlegroups.com
#1 Each REST call is an individual tx and individual thread, except in cypher and batch operations which group multiple operations.
#2 If you want to send properties as parameters (which you should anyway), then you can post real JSON as params, see


Michael

On 05.07.2012 at 18:52, Jesse wrote:

The REST API seems to have serious issues with concurrency - the new mutating Cypher stuff does appear to address this but you'll basically be doing everything the Cypher way. Which, mind you, isn't necessarily a bad thing; I am finding Cypher to be very intuitive and powerful... excepting the inconvenient pseudo-JSON format for sending property data :)

Peter Neubauer

Jul 5, 2012, 1:29:15 PM
to ne...@googlegroups.com

Jesse,
How would you envision property data to be sent? Speak up :-)

/peter

Send from mobile.




Jesse

Jul 5, 2012, 6:56:01 PM
to ne...@googlegroups.com
Hey guys, thanks for getting back to me - #2 looks like it solves my JSON issue, I will definitely be checking that out.

Back to the original topic though: the new mutating Cypher syntax, I realize, only makes it easier to group multiple operations into the same transaction - there still seem to be concurrency problems if one tries to simultaneously delete two distinct relationships to the same node with a Cypher call each. I get back something like:

Transaction(197)[STATUS_ACTIVE,Resources=1] can't wait on resource RWLock[Relationship[120]] since => Transaction(197)[STATUS_ACTIVE,Resources=1] <-[:HELD_BY]- RWLock[Node[60]] <-[:WAITING_FOR]- Transaction(197)[STATUS_ACTIVE,Resources=1] <-[:HELD_BY]- RWLock[Relationship[120]]

If Neo4j can detect and prevent a RWLock ahead of time, can't it simply queue my Cypher transactions FIFO style? I can't think of an easy way to handle this situation in a scalable fashion from my application (a webserver).

Jesse
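Pending any server-side queueing, one client-side workaround is to serialize only the operations that touch the same node, while unrelated work runs in parallel. A rough single-process sketch (the per-node lock granularity is an assumption about where the contention is, and the REST call itself is stubbed out):

```python
import threading
from collections import defaultdict

node_locks = defaultdict(threading.Lock)  # one lock per node ID

def delete_relationship(rel_id, node_id, deleted):
    # Serialize deletes that share an endpoint node; other nodes proceed in parallel.
    with node_locks[node_id]:
        deleted.append(rel_id)  # stand-in for the actual REST DELETE call

deleted = []
threads = [threading.Thread(target=delete_relationship, args=(rid, 60, deleted))
           for rid in (120, 121, 122)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```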

Jesse

Jul 5, 2012, 7:03:15 PM
to ne...@googlegroups.com
I suppose I could just keep retrying until the resource opens up? Seems like a pain though if the db could handle it for me...

Michael Hunger

Jul 5, 2012, 7:10:05 PM
to ne...@googlegroups.com
It detects and aborts deadlocks, so usually you would retry the operation.

The deadlock detection happens after the fact; precomputing all that would not be feasible, as the tx-system would have to know a lot about the semantics of all your operations. And deadlocks are usually seldom enough that a simple retry is a good solution.

Michael
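The retry Michael recommends can be as simple as a loop with backoff. A sketch (here any exception triggers a retry; a real client would match only the deadlock error in the response body, and the delay values are arbitrary):

```python
import time

def with_retry(operation, attempts=5, base_delay=0.01):
    """Run an operation, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_delete():
    # Simulates a delete that hits a deadlock twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("deadlock detected")
    return "deleted"
```

With this simulated failure, `with_retry(flaky_delete)` succeeds on the third attempt.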

Jesse

Jul 5, 2012, 7:31:10 PM
to ne...@googlegroups.com
Ok. Thanks again for getting back to me so quick. I've just tested passing JSON as a param and auto-retrying when I see the RWLock error - seems to solve both problems. Cheers!

Jesse

Shireesh

Jul 7, 2012, 1:26:16 PM
to ne...@googlegroups.com

I tried Cypher delete, but it deletes all the connected nodes and edges. I need to delete connected nodes based on their properties (e.g. delete a connected node only if it has value "x" on a certain property and is connected to the parent node through the edge "relx").

I used server plugins to achieve the above behavior.

How good is this approach, any comments would be helpful.

Thanks,
Shireesh.

Peter Neubauer

Jul 7, 2012, 3:44:43 PM
to ne...@googlegroups.com

Hi there,
If you can find and return the nodes you are interested in deleting with a Cypher query then you should be able to delete them too. Do you have any example graph and what to delete?

/peter

Send from mobile.

Michael Hunger

Jul 7, 2012, 4:58:45 PM
to ne...@googlegroups.com
Shireesh,

you can filter with cypher before deleting.

start n=...
match n-[r]-m
where r.foo = 'bar'
delete r

shireesh adla

Jul 8, 2012, 11:25:55 AM
to ne...@googlegroups.com

Thanks for the response.

I have a connected graph in this order:

A --> [produces] X1 [consumes] <-- B
A --> [produces] X2
A --> [consumes] X3 [produces] <-- B
A --> [uses] R1 [uses] <-- B
A --> [uses] R2
X2 -->[has] L1

nodes : [A,B,X1,X2,X3,R1,R2,L1]
edge types : [produces,consumes,uses,has]

Now deleting [A] needs to delete in the following way:
-- delete all those nodes which A produces, only if no other node consumes them [e.g. X2]
-- delete all nodes which A uses, if no other node uses them [e.g. R2]
-- delete the grandchild nodes connected to a deleted child node [e.g. L1]

So based on the above conditions, if I delete A,
  nodes to be deleted: A, X2, L1, R2

Thanks,
Shireesh
--
Thanks and Regards ,
Shireesh Adla
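Shireesh's rules can be checked against a tiny model of that example graph. This is only a sketch of the selection logic (which nodes should go), not of how a server plugin or Cypher query would actually execute it:

```python
edges = [  # (source, relationship type, target), from the example above
    ("A", "produces", "X1"), ("B", "consumes", "X1"),
    ("A", "produces", "X2"),
    ("A", "consumes", "X3"), ("B", "produces", "X3"),
    ("A", "uses", "R1"), ("B", "uses", "R1"),
    ("A", "uses", "R2"),
    ("X2", "has", "L1"),
]

def nodes_to_delete(node):
    doomed = {node}
    produced = {t for s, typ, t in edges if s == node and typ == "produces"}
    # rule 1: drop produced nodes unless some other node consumes them
    doomed |= {t for t in produced
               if not any(typ == "consumes" and tgt == t and s != node
                          for s, typ, tgt in edges)}
    used = {t for s, typ, t in edges if s == node and typ == "uses"}
    # rule 2: drop used nodes unless some other node uses them
    doomed |= {t for t in used
               if not any(typ == "uses" and tgt == t and s != node
                          for s, typ, tgt in edges)}
    # rule 3: drop grandchildren hanging off deleted children
    doomed |= {t for s, typ, t in edges if s in doomed and typ == "has"}
    return doomed
```

On this model, `nodes_to_delete("A")` yields {A, X2, R2, L1}, matching the expected set.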

shireesh adla

Jul 8, 2012, 11:27:09 AM
to ne...@googlegroups.com

Thanks Michael,

Will give it a try.

Regards,
Shireesh.

brian

Oct 8, 2012, 11:16:16 AM
to ne...@googlegroups.com
I am running into the same problem and will have to use the Cypher method you describe.  However I'm also confused as to why deleting multiple relationships concurrently via the REST API produces a deadlock.  Could you explain further?

Thanks.

-brian



Aseem Kishore

Oct 27, 2012, 7:42:40 PM
to ne...@googlegroups.com
Just discovered this thread -- great stuff. We used to see transaction errors from time to time too, but they've mostly disappeared now that I'm finally migrating our app to mutable Cypher.

Still, I just saw one earlier -- a deadlock error message for the first time -- and it's good to know that a simple retry is recommended in this case.

(Though I agree 100% that, intuitively, I would expect Neo4j to internally serialize queries that need access to the same resources. Isn't transactional support a key selling point of mutable Cypher?)

We obviously shouldn't be retrying every 500 response from Neo4j though (or should we?). So I wanted to ask how you guys recommend we detect retryable failures specifically.

Should we be checking the `exception` property in the response for a whitelisted set of exception names? Which names?

Thanks much!

Aseem
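One pragmatic shape for that check, until the server returns a distinct status code, is to parse the error body and compare its `exception` field against a whitelist. The deadlock exception name below is what the kernel's deadlock detector throws, but treat the whitelist as an assumption to verify against the responses your server actually returns:

```python
import json

RETRYABLE = {"DeadlockDetectedException"}  # extend after inspecting real responses

def is_retryable(response_body):
    """Decide from a Neo4j REST error body whether retrying makes sense."""
    try:
        error = json.loads(response_body)
    except ValueError:
        return False  # not a structured error body; don't blindly retry
    return error.get("exception") in RETRYABLE

# a made-up error body in the REST API's general shape
body = json.dumps({"exception": "DeadlockDetectedException",
                   "message": "can't wait on resource RWLock[...]"})
```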

Michael Hunger

Oct 27, 2012, 8:45:41 PM
to ne...@googlegroups.com
Aseem,

can you raise an issue about this?

I think it is most sensible to return a different error code as you suggested when a deadlock exception arises.

What do you think?
- 409 Conflict Indicates that the request could not be processed because of conflict in the request, such as an edit conflict.
- 423 Locked (WebDAV; RFC 4918) The resource that is being accessed is locked.

The only thing I see as an issue (in general) is when streaming results back from requests: the deadlock exception (and others) occurs only after the header has already been sent (and parts of the response too), as the execution happens lazily and is ongoing on the server.

Michael

Aseem Kishore

Oct 27, 2012, 8:57:42 PM
to ne...@googlegroups.com
Done:


Streaming will indeed be tricky. Besides this signaling issue, how should clients act btw if they got and processed some results back before this kind of transaction error?

Any example kind of scenarios where a transaction error might invalidate earlier results? Or are earlier results always okay by definition?

Aseem


Michael Hunger

Oct 27, 2012, 9:28:28 PM
to ne...@googlegroups.com
Good points,

As the transaction is rolled back, the previous elements that originated from a mutating operation are invalid. So earlier results of those are _never_ OK. The transaction that is rolled back spans the whole HTTP request.

Things that originated from a read operation which was not affected by the mutation should still be valid.

I would want to change the results of streaming operations to be more individual records, so that one or more of those records might be "error-records" reporting failures, much like the Twitter streaming API results. This is still TBD though.

Michael


Andres Taylor

Oct 28, 2012, 10:26:35 AM
to ne...@googlegroups.com

Updating queries are executed eagerly, to avoid this exact problem. If an exception is thrown during execution, no results are returned.

Andrés


Brian Levine

Oct 28, 2012, 11:08:00 AM
to ne...@googlegroups.com
A few points.  In any Cypher query which may contain parts that are mutating and parts that are not, isn't there always a single response which depends on the last clause in the query?  For example, if the last clause is a RETURN, you'll get a response that is dictated by the RETURN clause. If the final clause is a mutating clause, you'll get whatever that clause is supposed to return (e.g. number of rows affected by a DELETE). The point is that the entire query is evaluated before the server begins to return the response.  This is true even in the streaming case, correct?  So if a transaction error occurs during even the most complex Cypher query, it's a transaction error for that whole query. And it occurs before the first byte is returned to the caller.  So it should be possible to immediately communicate the deadlock error in the response. Please correct me if I'm wrong. 

Having said all this, I think there is too much focus on how to map any given Neo4j error to an HTTP status code.  Yes, that's supposed to be the "RESTful way", but I think that when you're sending queries (which may be read queries, mutating queries, or both), a different approach is necessary because you're effectively tunneling a command (Cypher query) inside a REST request. In that case, you either need the HTTP status to reflect the underlying error as closely as possible or the status code should be a 200 (since the query was valid and was accepted by the server).  409 is probably as close as you can get for a deadlock exception, but I'm still not sure it's appropriate as compared to a 200. When an error occurs the body should contain a structured message that provides as much information as possible about what happened and what you can do about it if it's correctable.  In other words, as part of the media type you've defined for your REST API, the representation of an error should also be defined.  The actual form of the error result should at least contain well-documented Neo4j-specific error codes/strings.

With respect to deadlock errors, IMHO it has become way too easy to introduce them. This can happen any time you try to delete 2 or more nodes that share relationships, since you must delete the relationships prior to deleting the nodes. Deleting a node locks the node. Deleting a relationship locks the relationship and both its nodes. As Aseem pointed out, it seems like Neo4j itself could figure out how to order the operations so that the locks are created/released in the correct order.  It also seems that the server itself could implement a configurable retry strategy so that it's not the responsibility of the client of the REST API to retry over the wire when these errors do occur.

-brian
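The lock-ordering idea is the classic fix: if every transaction acquires the locks it needs in one global order (say, ascending node ID), no waits-for cycle can form. Sketched here with plain in-process locks, since the server controls its own; it assumes a relationship connects two distinct nodes:

```python
import threading

locks = {nid: threading.Lock() for nid in range(5)}  # stand-ins for per-node locks

def delete_relationship(node_a, node_b):
    # Lock both endpoint nodes in ascending ID order; two concurrent deletes
    # that share a node can then never end up waiting on each other in a cycle.
    first, second = sorted((node_a, node_b))
    with locks[first]:
        with locks[second]:
            return (first, second)  # stand-in for the actual delete
```

Two concurrent calls `delete_relationship(1, 2)` and `delete_relationship(2, 1)` then always take locks 1 and 2 in the same order and cannot deadlock.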



Peter Neubauer

Oct 28, 2012, 12:33:28 PM
to ne...@googlegroups.com
Brian,
I like the retry strategy config for the REST endpoint. Should that be part of every query as an additional parameter, or a global config in the server properties, do you think?


Cheers,

/peter neubauer



Brian Levine

Oct 28, 2012, 12:58:31 PM
to ne...@googlegroups.com
Good question. It might be useful to allow this to be specified in the request. Perhaps there are use-cases in which the caller would want to be informed about a problem immediately without waiting for (possibly costly) retries.  I guess the foolproof answer is "both": allow a default to be set as a config and then a dynamic override. Though, I still question whether this should be in the REST server or pushed down even further into the Cypher query engine. If the latter, I would vote for this being a neo4j.properties config.

A couple more points on the whole deadlock issue.  IMHO, I should be able to delete a list of nodes without any problem. If I issue a query to delete nodes 1, 2, 3, I should be able to do it even if those nodes share relationships. The problem arises because of the restriction that a node cannot be deleted if it has relationships.  This means that the caller is required to delete the relationships by some means prior to deleting the nodes.  For example:

start n = node(1,2,3) match n-[r?]-() delete r,n

There's no way from this query to determine that my real intent was to delete 1, 2, 3. To the engine, it just looks like I want to delete some nodes and relationships.

The problem is that the intent of the Cypher query was not clear.  But if you could have some sort of 'force' modifier that says "do whatever you need to do to delete these nodes", that might be helpful, as it takes the responsibility for first deleting the relationships away from the caller, e.g.:

START n = node(1,2,3) FORCE DELETE n (or something like that)

The intent of this query is clear. I realize that this is specific to one type of error (deadlock errors) and one type of mutation (deletes) and a more generic solution might be required.  It's just that the deadlock-on-delete issue is the one I happen to be wrestling with right now ;-)

-brian


