"Waiting for all transactions to close"

lauren...@gmail.com

Sep 4, 2015, 11:38:06 AM
to Neo4j
Hi there,

I need some help with a message appearing in the logs:
-----------------------------------------------
2015-09-02 03:43:09.291+0000 INFO  [o.n.k.i.s.StoreFactory]: Waiting for all transactions to close...
  committed: out-of-order-sequence:133247005 []
  closed:    out-of-order-sequence:133247004 [133246941]
2015-09-02 03:43:39.291+0000 INFO  [o.n.k.i.s.StoreFactory]: Waiting for all transactions to close...
  committed: out-of-order-sequence:133247005 []
  closed:    out-of-order-sequence:133247004 [133246941]
...
-----------------------------------------------

Context:
- Neo4j 2.2.0
- 2 concurrent scripts (written in Python 3.3 with py2neo 1.6.2) read, write and update data in the Neo4j database

After a while, the scripts seem to freeze. The only suspicious trace in the logs is the message pasted above.
One script running alone doesn't seem to trigger the problem.

From my understanding, this message might be triggered by a process that doesn't consume all the results returned by a request. Is that the only possible cause?
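To make that hypothesis concrete, here is a toy model in plain Python (no Neo4j or py2neo involved) of a result stream whose server-side transaction is released only when every row has been read or the stream is explicitly closed. `ResultStream` and `open_transactions` are hypothetical names for illustration, not py2neo APIs; `contextlib.closing` from the standard library guarantees the close even when the caller stops reading early.

```python
from contextlib import closing

# Hypothetical stand-in for server-side state: ids of transactions still open.
open_transactions = set()


class ResultStream:
    """Toy result stream (hypothetical, not a py2neo class)."""

    def __init__(self, tx_id, rows):
        self.tx_id = tx_id
        self._rows = iter(rows)
        open_transactions.add(tx_id)  # transaction opened by the request

    def __iter__(self):
        for row in self._rows:
            yield row
        self.close()  # reached only once every row has been read

    def close(self):
        open_transactions.discard(self.tx_id)


# 1. Reading only the first row leaves transaction 1 open.
first = next(iter(ResultStream(1, ["a", "b", "c"])))

# 2. Reading every row releases transaction 2.
all_rows = list(ResultStream(2, ["a", "b", "c"]))

# 3. closing() calls close() on exit, even after a partial read.
with closing(ResultStream(3, ["a", "b", "c"])) as stream:
    first_again = next(iter(stream))

print(sorted(open_transactions))  # [1]
```

Only the partially read stream that was never closed stays registered, which is the kind of leak the question suspects.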

lauren...@gmail.com

Sep 5, 2015, 9:33:10 PM
to Neo4j
After hours spent trying to troubleshoot the problem, it seems the issue comes from the WriteBatch class provided by py2neo.
For now, it seems that creating a new instance of WriteBatch at each iteration of the process and calling the run() method instead of submit() may help solve the problem.
The problem with this "solution" is that run() doesn't seem to propagate exceptions. :(
Investigation in progress...



Nigel Small

Sep 6, 2015, 11:41:30 AM
to Neo4j
Batches should never be reused; a new one needs to be created for each unit of work. Also, unlike execute, run does not spot exceptions, since it does not decode the output from the server (including any exceptions). This makes it slightly faster, but it is a tradeoff against using execute.
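Nigel's point can be illustrated with a toy model. In the sketch below, errors only exist inside the JSON body of the response, so a method that closes the response without decoding it (like `run`) never sees them, while one that decodes it (like `submit`) can raise. `FakeResponse`, `ToyBatch` and `BatchError` are hypothetical stand-ins, not py2neo's real classes, and the reply format with an `"error"` key is made up for the example; the reuse guard mirrors the advice to create a fresh batch per unit of work.

```python
import json


class BatchError(Exception):
    """Hypothetical error raised when a decoded reply reports a failure."""


class FakeResponse:
    """Hypothetical HTTP response: the body stays raw JSON text until decoded."""

    def __init__(self, body):
        self.body = body
        self.closed = False

    @property
    def json(self):
        return json.loads(self.body)

    def close(self):
        self.closed = True


class ToyBatch:
    """Toy batch (not py2neo's WriteBatch): one instance per unit of work."""

    def __init__(self, server_reply):
        self._server_reply = server_reply
        self._used = False

    def _execute(self):
        if self._used:
            raise RuntimeError("batch already executed; create a new one")
        self._used = True
        return FakeResponse(self._server_reply)

    def run(self):
        # Close without decoding: faster, but any error in the body is ignored.
        return self._execute().close()

    def submit(self):
        responses = self._execute()
        try:
            results = responses.json  # decoding is where errors become visible
            for item in results:
                if "error" in item:
                    raise BatchError(item["error"])
            return results
        finally:
            responses.close()


reply = '[{"id": 0, "body": {}}, {"id": 1, "error": "lock timeout"}]'
ToyBatch(reply).run()  # returns silently
try:
    ToyBatch(reply).submit()
except BatchError as exc:
    print("submit raised:", exc)  # submit raised: lock timeout
```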

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

lauren...@gmail.com

Sep 6, 2015, 2:06:04 PM
to Neo4j
Hi Nigel,

Thanks for the answer. Very much appreciated and always good to get a confirmation from the expert :)
I'm still investigating, but progress is very slow because I can't reproduce the problem with simple "artificial" code.

For now, my intuition is that a Cypher query or a write batch gets stuck and blocks the following ones. My main problem is that I can't get any exception logged.
Currently, I'm checking whether enabling execution_guard could help unblock the situation.

Just one more question: in py2neo 1.6, WriteBatch has the following code:

def run(self):
    return self._execute().close()

def submit(self):
    responses = self._execute()
    try:
        return [BatchResponse(rs).hydrated for rs in responses.json]
    finally:
        responses.close()

One of my previous tests seems to indicate that run() doesn't forward exceptions but, more importantly for me, it doesn't seem to freeze the processes :)
I guess this behavior is related to the call to close(). So I wonder whether the following modification of the submit() method might solve my problem. Do you see any potential problem with it?

def submit(self):
    try:
        responses = self._execute()
        return [BatchResponse(rs).hydrated for rs in responses.json]
    finally:
        responses.close()

Mattias Persson

Sep 7, 2015, 8:07:25 AM
to Neo4j
You should try with Neo4j 2.2.5

Nigel Small

Sep 7, 2015, 10:04:46 AM
to Neo4j
Hi Laurent

If you want to see exceptions, you will have to use submit rather than run. Exceptions are only raised when the response is JSON-decoded, and run does not do this.

I strongly suggest that you upgrade to py2neo 2.0 and, if possible, migrate to using Cypher transactions instead of batches.

Nigel


lauren...@gmail.com

Sep 7, 2015, 12:46:33 PM
to Neo4j
Hi Nigel, hi Mattias,

Thanks for the suggestions!


> You should try with Neo4j 2.2.5
Are you thinking of a specific fix released in this version that may help here?
Anyway, I think I'll upgrade in the near future.


> I strongly suggest that you upgrade to py2neo 2.0 and, if possible, migrate to using Cypher transactions instead of batches.
Yep. It's definitely on my to-do list! :)
For reasons related to the planning of my project, I can't do it right now because it would have to be done in a rush, and that sounds risky.

Anyway, I think I've made some good progress:
- I've found why no exception was bubbling up, even with a call to submit(): the exception was silenced by some of my own exception-handling code. Arghhhhh. Stupid me!
- The scenario seems to be the following:
  - a process writes a bunch of updates/creates with a WriteBatch
  - a concurrent process tries to read some data with a CypherQuery.
  - a lock is detected for the read request. As I've implemented a "retry" pattern around my calls to CypherQuery, the read request is sent again, but the first one is never closed => the error message appears in the Neo4j server logs.
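The last bullet describes a retry loop that abandons the failed read without closing it. A plain-Python toy model of that failure mode, and of the fix of closing each abandoned attempt before retrying, might look like this. `ToyServer`, `LockedError` and `query_with_retry` are hypothetical names; the toy server just records transactions that were opened but never closed, standing in for the ones the server log reports as still pending.

```python
class LockedError(Exception):
    """Hypothetical error: the read hit a lock held by a concurrent writer."""


class ToyServer:
    """Hypothetical server that tracks transactions opened but not closed."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.open = set()
        self._next_id = 0

    def begin(self):
        self._next_id += 1
        self.open.add(self._next_id)
        return self._next_id

    def query(self, tx_id):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise LockedError("tx %d blocked by a lock" % tx_id)
        return ["row"]

    def close(self, tx_id):
        self.open.discard(tx_id)


def query_with_retry(server, attempts=3, close_on_failure=True):
    """Retry a read; each attempt runs in its own transaction."""
    last_error = None
    for _ in range(attempts):
        tx_id = server.begin()
        try:
            rows = server.query(tx_id)
        except LockedError as exc:
            last_error = exc
            if close_on_failure:
                server.close(tx_id)  # release the abandoned attempt
            continue
        server.close(tx_id)
        return rows
    raise last_error


leaky = ToyServer(failures_before_success=2)
query_with_retry(leaky, close_on_failure=False)
print(len(leaky.open))  # 2 (the two failed attempts were never closed)

fixed = ToyServer(failures_before_success=2)
query_with_retry(fixed)
print(len(fixed.open))  # 0
```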

I'm currently testing this modification of the submit() method:
---------------------------------------------------------------------------------------------------------

def submit(self):
    try:
        responses = self._execute()
        return [BatchResponse(rs).hydrated for rs in responses.json]
    finally:
        if responses:
            responses.close()
---------------------------------------------------------------------------------------------------------

So far, the results look good, but I want the processes to run over a long period first.
I just want to be sure that putting the call to _execute() inside the try/finally block won't have nasty side effects (especially if an exception occurs in the _execute() method).
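One concrete side effect is easy to check. With `responses = self._execute()` inside the `try`, an exception raised by `_execute()` means `responses` is never bound, so the `finally` clause itself fails with `UnboundLocalError` and the original error only survives as its chained context; `if responses:` does not help, because reading the unbound name is what fails. This is also why the original `submit()` calls `_execute()` before entering the `try`. The stand-alone sketch below reproduces this with hypothetical `failing_execute()` and `FakeResponses` stand-ins and shows the usual fix of binding the name before the `try`.

```python
class FakeResponses:
    """Hypothetical response object that records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def failing_execute():
    """Hypothetical stand-in for an _execute() call that fails (e.g. a timeout)."""
    raise ConnectionError("server unreachable")


def submit_unsafe(execute):
    # Same shape as the proposed submit(): the assignment is inside the try.
    try:
        responses = execute()
        return responses
    finally:
        if responses:  # UnboundLocalError if execute() raised
            responses.close()


def submit_safe(execute):
    # Bind the name before the try so the finally clause can always read it.
    responses = None
    try:
        responses = execute()
        return responses
    finally:
        if responses is not None:
            responses.close()


for submit in (submit_unsafe, submit_safe):
    try:
        submit(failing_execute)
    except Exception as exc:
        print(submit.__name__, "->", type(exc).__name__)
# submit_unsafe -> UnboundLocalError
# submit_safe -> ConnectionError
```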

laurent

Mattias Persson

Sep 8, 2015, 3:42:25 AM
to Neo4j Development
On Mon, Sep 7, 2015 at 6:46 PM, <lauren...@gmail.com> wrote:
> Are you thinking of a specific fix released in this version that may help here?
Yes, there have been resolved issues regarding bugs just like that, so it's definitely a possibility.

> I strongly suggest that you upgrade to py2neo 2.0 and, if possible, migrate to using Cypher transactions instead of batches.
Yep. It's clearly a task in my todo list ! :)
For reasons related to the planning of my project, I can't do it right now because it would need to be done in the rush and that sounds risky.

Anyway, I think I've made some good progresses:
- I've found why no exception was bubbling even with a call to submit(). Basically, the exception was silenced by some of my code handling exceptions, Arghhhhh. Stupid me !
- The scenario seems to be the following:
  - a process writes a bunch of update/create with a WriteBatch
  - a concurrent process tries to read some data with a CypherQuery.
  - a lock is detected for the read request. As I've implemented a "retry" pattern around my calls to CypherQuery, the read request is sent again but the first one is never closed => Error message appearing in the logs of Neo4j server.

I'm currently testing this modification of the submit() method:
---------------------------------------------------------------------------------------------------------
def submit(self):
        try:
            responses = self._execute()
            return [BatchResponse(rs).hydrated for rs in responses.json]
        finally:
            if responses:
                responses.close()
---------------------------------------------------------------------------------------------------------

So far, results seem good but I want the processes to run on a long period.
I just want to be sure that putting the call to _execute() inside the try/except block won't have nasty side-effects (especially in case of an exception occurring in the _execute() method.

laurent

--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/fQx9O3cu0n0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Mattias Persson
Neo4j Hacker at Neo Technology

lauren...@gmail.com

Sep 8, 2015, 10:33:44 AM
to Neo4j
Thanks Mattias! I've downloaded 2.2.5 and I will upgrade the server in the coming days.

FWIW, the modification of the submit() method doesn't sound like the way to go. If the scenario described in my previous email is correct, this modification can't fix the problem (stupid me, episode 2 :D).
I'm currently trying another fix based on the modification of my 'retry' pattern for failed CypherQueries.

lauren...@gmail.com

Sep 8, 2015, 11:21:28 AM
to Neo4j
Mattias,
I've just tried to install 2.2.5 on my dev server.
It seems that the store version of my database is v0.A.4 (Neo4j 2.2.0), but Neo4j 2.2.5 expects v0.A.3.
So it doesn't look like an option for me. I fear I'll have to wait for a stable 2.3.0 to upgrade my store to v5.


Mattias Persson

Sep 14, 2015, 5:13:26 PM
to Neo4j Development
Store version 0.A.4 was used in the 2.2 milestones, and upgrading from it isn't supported. 0.A.3 is the store version of Neo4j 2.1, and 2.2.x uses store version 0.A.5. So you'll have to recreate your milestone-created database with version 2.2.5.
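The rule above can be captured as a small lookup table, which may be handy when checking a store before upgrading. It only encodes the three store versions mentioned in this thread; `STORE_VERSIONS` and `usable_by_225()` are hypothetical helpers, and reading the actual version from a store on disk is not shown.

```python
# Store format versions mentioned in this thread (not an exhaustive list).
# Each entry: (what created the store, can Neo4j 2.2.5 use it?)
STORE_VERSIONS = {
    "0.A.3": ("Neo4j 2.1", True),              # upgrade path supported
    "0.A.4": ("2.2 milestone builds", False),  # no upgrade path
    "0.A.5": ("Neo4j 2.2.x", True),            # native 2.2 format
}


def usable_by_225(store_version):
    """Hypothetical helper: True if Neo4j 2.2.5 can open or upgrade this store."""
    _, usable = STORE_VERSIONS.get(store_version, (None, False))
    return usable


for version in ("0.A.3", "0.A.4", "0.A.5"):
    origin, _ = STORE_VERSIONS[version]
    print(version, origin, "->", usable_by_225(version))
```

Unknown versions default to `False`, which errs on the side of recreating the store rather than attempting an upgrade.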


lauren...@gmail.com

Sep 15, 2015, 12:13:30 PM
to Neo4j
@Mattias:
Thanks for the information.
I guess this solution requires some serious preparation/tools on my side, considering that my database has around 400M nodes and 800M relationships.
I've seen some nice bug fixes and improvements in v2.2.x, but if 2.2.0 is stable enough, maybe I'll wait for the 2.3 release and its store upgrade.

@Nigel:
Regarding my initial problem, I've tried a few fixes with v1.6.4. Things seemed better, but I eventually hit the same problem after 24h of processing.
Therefore, I've decided to upgrade to v2.0.7. It required a few adaptations of my code but nothing too terrible. :)
Things seem better. The "retry pattern" (try ...except... rollback) seems to do its job.
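The "retry pattern" (try ... except ... rollback) can be expressed as a context manager that commits on success and rolls back on any exception, so that no failed attempt is left open on the server. With py2neo 2.0 the transaction would typically come from `graph.cypher.begin()`; the sketch below deliberately uses a hypothetical `ToyTransaction` instead so it runs without a server, and `transaction()`, `run_with_retry()` and the choice of `TimeoutError` as the retryable error are illustrative assumptions, not library functions.

```python
from contextlib import contextmanager


class ToyTransaction:
    """Hypothetical transaction that records how it was finished."""

    def __init__(self):
        self.state = "open"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


@contextmanager
def transaction(begin):
    """Commit on success; roll back on any exception, then re-raise it."""
    tx = begin()
    try:
        yield tx
    except BaseException:
        tx.rollback()
        raise
    else:
        tx.commit()


def run_with_retry(begin, work, attempts=3, retry_on=(TimeoutError,)):
    """Run work(tx) in a fresh transaction per attempt."""
    for attempt in range(1, attempts + 1):
        try:
            with transaction(begin) as tx:
                return work(tx)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error


started = []


def begin():
    started.append(ToyTransaction())
    return started[-1]


calls = {"n": 0}


def flaky_work(tx):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("first attempt times out")
    return "done"


print(run_with_retry(begin, flaky_work))  # done
print([tx.state for tx in started])       # ['rolled back', 'committed']
```

Errors outside `retry_on` are not retried: the transaction is still rolled back, and the exception propagates immediately, so it cannot be silently swallowed.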
So far, the only problem encountered was a socket timeout, which ended my script and left unclosed connections on the server (the same messages as described in my original post).
I'm currently testing the solution proposed in this StackOverflow post: http://stackoverflow.com/questions/28776140/py2neo-py2neo-packages-httpstream-http-socketerror-timed-out-execute-stream
Is it still the recommended solution for this problem?

