Possible bug (2.0.0) - fail to commit transactions that are read-only cypher queries (nothing to commit !?)

748 views
Skip to first unread message

M. David Allen

unread,
Feb 5, 2014, 9:56:19 AM2/5/14
to ne...@googlegroups.com
I'm running into exceptions where my transactions fail to commit, even though they're read-only cypher queries.   I created a small test database that's attached to some software I've written (running embedded DB). I'm attaching messages.log for a lot of diagnostics.    I ran some simple stub code to populate the test database, then do some querying on it using methods I wrote.

I'm hoping to find some help with why this is happening.  I can actually safely ignore these transaction exceptions, and nothing seems to go horribly wrong, but they just shouldn't be there and I don't want to ignore them as my strategy.   Below I'm providing what the exceptions look like, and the code of the method that's creating these exceptions.  In other parts of my code base I have other similar queries (always read-only cypher queries wrapped in transactions) that have the same problems.   I ran this stub to try to illustrate the problem in one specific compact way.

The exception dumps --

SEVERE: Failed transaction: Unable to commit transaction
org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:134)
    at org.mitre.provenance.db.neo4j.Neo4JStorage.getMembers(Neo4JStorage.java:545)
    at org.mitre.provenance.test.Stub.main(Stub.java:17)
Caused by: javax.transaction.RollbackException: Failed to commit, transaction rolled back
    at org.neo4j.kernel.impl.transaction.TxManager.rollbackCommit(TxManager.java:623)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:402)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:122)
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)

The code that's throwing the exception:

    public static ProvenanceCollection getMembers(PLUSWorkflow wf, User user, int maximum) {
        ViewedCollection d = new ViewedCollection(user);
        if(db == null) initialize();
       
        if(maximum <= 0 || maximum > Neo4JPLUSObjectFactory.MAX_OBJECTS)
            maximum = 100;
       
        Map<String,Object>params = new HashMap<String,Object>();
        params.put("wf", wf.getId());
        String query = "start r=relationship:relationship_auto_index(workflow={wf}) " +                
                "return r " +
                "limit " + maximum;
       
        try (Transaction tx = db.beginTx()) {        
            ResourceIterator<Relationship> rs = Neo4JStorage.execute(query, params).columnAs("r");       
           
            try {
                while(rs.hasNext()) {
                    Relationship r = rs.next();
                               
                    d.addNode(Neo4JPLUSObjectFactory.newObject(r.getStartNode()));
                    d.addNode(Neo4JPLUSObjectFactory.newObject(r.getEndNode()));
                    d.addEdge(Neo4JPLUSObjectFactory.newEdge(r));           
                }
            } catch(PLUSException exc) {
                exc.printStackTrace();
            }

            rs.close();
            tx.success();
        } catch(TransactionFailureException exc) {
            log.severe("Failed transaction: " + exc.getMessage());
            exc.printStackTrace();
        }
       
        return d;
    } // End getMembers   

Another relevant method (Neo4JStorage.execute):  (db is a GraphDatabaseService)

    public static ExecutionResult execute(String cypherQuery, Map<String,Object>params) {
        if(db == null) initialize();
       
        ExecutionEngine engine = new ExecutionEngine(db);
               
        assert(db.index().getNodeAutoIndexer().isEnabled());
               
        StringBuffer sb = new StringBuffer("");
        for(String k : params.keySet()) sb.append(" " + k + "=" + params.get(k));
       
        //log.info("EXECUTING: " + cypherQuery + " /" +sb);
        return engine.execute(cypherQuery + " ", params);       
    }

messages.log

Jacob Hansson

unread,
Feb 5, 2014, 10:13:55 AM2/5/14
to ne...@googlegroups.com

David, are there any additional cause exceptions available for that stack trace?

Sent from my phone, please excuse typos and brevity.

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

M. David Allen

unread,
Feb 5, 2014, 10:24:09 AM2/5/14
to ne...@googlegroups.com
No, not that I can see.

I removed the catch code and re-ran to see what else it might come up with, and below is what I got:

Exception in thread "main" org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction

    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:134)
    at org.mitre.provenance.db.neo4j.Neo4JStorage.getMembers(Neo4JStorage.java:545)
    at org.mitre.provenance.test.Stub.main(Stub.java:17)
Caused by: javax.transaction.RollbackException: Failed to commit, transaction rolled back
    at org.neo4j.kernel.impl.transaction.TxManager.rollbackCommit(TxManager.java:623)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:402)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:122)
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
    ... 2 more

M. David Allen

unread,
Feb 6, 2014, 8:19:21 AM2/6/14
to ne...@googlegroups.com
I don't mean to be a pest, but I would really appreciate some feedback on this from the devs.  It's something I've been wrestling with since I made the move to 2.0.0, and I'd really like to figure it out.

I am more than willing to run any sample code or test cases that will help diagnose this, or provide any additional information that would be useful to get to the bottom of this.

Thanks.

Jacob Hansson

unread,
Feb 7, 2014, 6:25:13 AM2/7/14
to ne...@googlegroups.com

Thanks for bumping this David, it fell off my radar, sorry.

Im gonna have a look at your original sample code, and Ill get back to you.

in the mean time, would you be able to send me the messages.log file that neo puts in its database folder? If you are running the server, thats in data/graph.db/, otherwise its whatever folder you tell the embedded database to use.

jake

Sent from my phone, please excuse typos and brevity.

Jacob Hansson

unread,
Feb 7, 2014, 7:24:10 AM2/7/14
to ne...@googlegroups.com
Update on this: I've modified the example code you sent, but am so far unable to replicate your issue. The messages.log should contain useful info though. If it does not, I might need your help to put together a stand-alone piece of code that recreates the problem.

Best,
Jake

M. David Allen

unread,
Feb 7, 2014, 9:15:51 AM2/7/14
to ne...@googlegroups.com
Attached is the messages.log - it was also attached to the original message in this thread.

In my original message, I didn't include all the details of neo version, JVM, OS, etc because that all appears to be in this log anyway.

Thanks!




--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/w1L_21z0z04/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.



--
M. David Allen
Mobile: (804) 787-0289
messages.log

Jacob Hansson

unread,
Feb 7, 2014, 2:15:24 PM2/7/14
to ne...@googlegroups.com

Sweet, thanks man, sorry for missing it, was on my phone earlier today. Ill have a look at it on monday.

jake

Sent from my phone, please excuse typos and brevity.

Jacob Hansson

unread,
Feb 10, 2014, 4:50:46 AM2/10/14
to ne...@googlegroups.com
Hey,

I'm sorry to say I've gone through the messages.log, and it contains from what I can see no hints to your problem. The upside of that is that it's not a "global" problem, eg. the database itself is not concerned about these exceptions (and thus does not log them), it only affects these specific transactions. That could mean that what you are seeing is something like a constraint violation or a detected deadlock.

Does this exception happen every time you run the code, or intermittently? Are you using constraints, or are you manually taking locks with tx.acquireXXLock()?

I'll need you to help me out to get further. Would you be able to take the code you have, and slowly remove parts until you get it down to a stand-alone java class with a main method that replicates the problem? Without running code that replicates the problem, or the cause of the exceptions you are seeing, it's nearly impossible to tell what is wrong.

It could be worth trying to upgrade to 2.0.1, which contains a lot of bug fixes - but fair warning, I haven't seen any issues like this that was a "real" bug, so it is not at all certain that upgrading would resolve your issue. 


Sorry this is a bit tricky to debug, hang in there and I'm sure we'll get it sorted.
jake

M. David Allen

unread,
Feb 10, 2014, 8:07:55 AM2/10/14
to ne...@googlegroups.com
Jake,

Thanks for looking into this.  I'm going to try a few extra things including going to 2.0.1.   To answer your immediate questions, the problem happens consistently when I run particular cypher queries - the one I sent is one such query.  It's consistent in that whenever I run certain methods that contain those queries, this results.    It's not consistent in that I have many other read-only cypher queries that don't display this same behavior.

I doubt I'm going to be able to remove code until I can give you something functional; our test database is populated with stuff generated by other classes.  Based on how the code is written I fear I'd have to send you at least a dozen or more classes, it would take substantial work to trim things out, and we haven't yet obtained permission to release this (working on eventual open source release of our software, as it happens).

Can you give me any tips on how I might diagnose/debug myself?  I.e. a different way of logging what neo is doing to look for the problem as an alternative if I can't ship you a working code base?   Also any tips you might suggest about where to look.  What are the possible causes of failure here?   Why is neo even attempting to "commit" a transaction?  (What's to commit when you're only fetching data?)

I'm not manually using locks; we do use constraints.   All of our constraints look like this:
CREATE CONSTRAINT ON (node:MyLabel) ASSERT node.myID IS UNIQUE;

FWIW, before we moved from 1.9.3 -> 2.0.0, we didn't have this problem; this is recent after I started wrapping cypher queries in transactions.  In 1.9.3, I would just run the query and all was fine; I started getting exceptions after moving to 2.0.0, and advice on this list said that wrapping those read-only cypher queries in transactions was necessary, so I started doing it in the java 7 style:    try (Transaction tx = db.beginTx()) { blah(); tx.success(); }

Presently my workaround is to catch and ignore these transactions.  The data coming back from the read-only query looks correct, and so these exceptions are an annoyance more than a blocker, but I'm assuming they're symptomatic of either a bug or something I'm doing wrong that will bite me later if I ignore it.

Thanks

Jacob Hansson

unread,
Feb 12, 2014, 10:48:58 AM2/12/14
to ne...@googlegroups.com
On Mon, Feb 10, 2014 at 2:07 PM, M. David Allen <david...@oldhat.org> wrote:
Jake,

Thanks for looking into this.  I'm going to try a few extra things including going to 2.0.1.   To answer your immediate questions, the problem happens consistently when I run particular cypher queries - the one I sent is one such query.  It's consistent in that whenever I run certain methods that contain those queries, this results.    It's not consistent in that I have many other read-only cypher queries that don't display this same behavior.

I doubt I'm going to be able to remove code until I can give you something functional; our test database is populated with stuff generated by other classes.  Based on how the code is written I fear I'd have to send you at least a dozen or more classes, it would take substantial work to trim things out, and we haven't yet obtained permission to release this (working on eventual open source release of our software, as it happens).

Can you give me any tips on how I might diagnose/debug myself?  I.e. a different way of logging what neo is doing to look for the problem as an alternative if I can't ship you a working code base?   Also any tips you might suggest about where to look.  What are the possible causes of failure here?   Why is neo even attempting to "commit" a transaction?  (What's to commit when you're only fetching data?)

So Neo4j always "commits", even if read-only transactions don't end up actually writing anything. For read-only transactions, commit simply includes releasing any locks held and any resources associated with the transaction and firing transaction event handlers. 

You may find something by lowering the log level for messages.log, which you would do by putting a logback.xml file on your classpath, with appropriate flags for debug logging. Something like:

<configuration>
  <root level="debug">
  </root>
</configuration>

I think that should work, although I haven't tested it. Otherwise, refer to the logback docs for how to set up debug logging. Make sure not to put that in production, it'll make the logging incredibly chatty and will fill up disk space on production systems.
 

I'm not manually using locks; we do use constraints.   All of our constraints look like this:
CREATE CONSTRAINT ON (node:MyLabel) ASSERT node.myID IS UNIQUE;

FWIW, before we moved from 1.9.3 -> 2.0.0, we didn't have this problem; this is recent after I started wrapping cypher queries in transactions.  In 1.9.3, I would just run the query and all was fine; I started getting exceptions after moving to 2.0.0, and advice on this list said that wrapping those read-only cypher queries in transactions was necessary, so I started doing it in the java 7 style:    try (Transaction tx = db.beginTx()) { blah(); tx.success(); }

So we just found that there is an error case in 2.0.1 where previously the system would correctly throw an exception, whereas in 2.0.1 it only marks the transaction for rollback. I'm guessing this is the issue you've ran into. It's on our backlog, so hopefully it will be resolved in upcoming maintenance releases.

Something that may work in your case, for read-only transactions, is to not call tx.success(). That way the system will effectively just throw the transaction away, rather than trying to validate it. I can't swear that that will resolve your issue though, because I still haven't had time to look at the underlying cause. Do try it though, and let me know if it works (or doesn't). 

M. David Allen

unread,
Feb 12, 2014, 12:02:44 PM2/12/14
to ne...@googlegroups.com
I removed calls to tx.success() at the end of read-only queries, and the exceptions disappeared.  I then re-instated them, added level=debug to logback.xml, and I'm not really seeing much different in the way of logging messages.    So I'll do more investigation there as it looks like the logging package isn't picking up my configuration even though the file is in $CLASSPATH as specified by the documentation.

But at least I know that the exceptions go away if tx.success() is not there;  I guess I'm going to stick with my strategy of ignoring those exceptions for now, and look for the maintenance release.

David

ronge

unread,
Aug 11, 2014, 4:53:17 PM8/11/14
to ne...@googlegroups.com, david...@oldhat.org
Hi, sorry for bringing up this old thread. Are there any news regarding this bug. It would be nice to avoid a try catch around the close method in my code.
Here is an example how to reproduce it using JRuby calling directly the Java 2.1.3 Neo4j api .

# This does not work
6] pry(main)> tx = Neo4j::Transaction.new
=> #<Java::OrgNeo4jKernel::TopLevelTransaction:0x6bdaf749>
[7] pry(main)> tx2 = Neo4j::Transaction.new
=> #<Java::OrgNeo4jKernel::PlaceboTransaction:0x696cc212>
[8] pry(main)> tx2.failure
=> nil
[9] pry(main)> tx2.success
=> nil
[10] pry(main)> tx2.close
=> nil
[11] pry(main)> tx.success
=> nil
[12] pry(main)> tx.close
Java::OrgNeo4jGraphdb::TransactionFailureException: Unable to commit transaction
from org.neo4j.kernel.TopLevelTransaction.close(org/neo4j/kernel/TopLevelTransaction.java:134)

# But this works
[13] pry(main)> tx = Neo4j::Transaction.new
=> #<Java::OrgNeo4jKernel::TopLevelTransaction:0x24edefe9>
[14] pry(main)> tx2 = Neo4j::Transaction.new
=> #<Java::OrgNeo4jKernel::PlaceboTransaction:0xfa097b7>
[15] pry(main)> tx2.failure
=> nil
[16] pry(main)> tx2.close
=> nil
[17] pry(main)> tx.close
=> nil
[18] pry(main)> Neo4j::Community::VERSION
=> "2.1.3"

/Andreas

M. David Allen

unread,
Aug 11, 2014, 5:02:49 PM8/11/14
to ronge, ne...@googlegroups.com
I'm not aware of any update.

It was easy enough for me to work around via try/catch - indeed failing to commit a transaction never mattered much on read-only queries.

Since 2.1.3, I've found it more difficult to reproduce this even in my own code.  It still happens but I've been reluctant to bring it up out of inability to provide a code sample which would reliably demonstrate the bug.


Craig Taverner

unread,
Aug 12, 2014, 7:24:40 AM8/12/14
to ne...@googlegroups.com, andrea...@gmail.com, david...@oldhat.org
I've also been hit by this, or a similar problem. I found my problem was related to nested read-only transactions and the case when the inner transaction did not call succss(), but the outer one did. I wrote of a blog on this at http://blog.amanzi.org/2014/08/neo4j-read-only-transactions-still-need.html. My final conclusion is that you really must call success() on all readonly transactions, even though they are readonly.

Curiously I got the idea of playing with the presence or lack of the success call from Jakob Hansson's suggestion above to consider leaving it off. It certainly is the right method to focus on, but I think leaving it off is the wrong solution.

M. David Allen

unread,
Aug 12, 2014, 8:36:11 AM8/12/14
to Craig Taverner, ne...@googlegroups.com, Andreas Ronge
Well, thanks for that blog post.  The explanation makes sense, and it's now clear on how to fix this.

OTOH, I thought one of the benefits of the Java 7-style transaction was that you didn't have to call tx.success().    I.e., we used to write this:

try { 
     Transaction tx = graphDb.beginTx();
     someOperation();
} finally { 
     tx.success();

Now the finding with this blog post is that we now need to write this:
try (Transaction tx = graphDb.beginTx()) { 
     someOperation();
} finally { 
     tx.success();

When I think what we wanted to write was this:
try (Transaction tx = graphDb.beginTx()) { someOperation(); } 

The new-ish try/with idiom doesn't really buy us anything here except a bit of syntax sugar.   Presumably it would be better to use a finally block here in case there were many different ways you could exit the try block successfully, to avoid the duplication you mentioned in the blog post.



David

Craig Taverner

unread,
Aug 12, 2014, 9:28:15 AM8/12/14
to M. David Allen, ne...@googlegroups.com, Andreas Ronge
One option I liked was that the implied close() would also imply success for read-only transactions. I'm not sure why it is still possible to have a failed transaction when it's read-only. Perhaps something to do with HA, so we probably need to hear from the experts on this. I believe they brought back transactions for read-only code to support the subtleties of HA, so it is likely they have reasons for this too.
Reply all
Reply to author
Forward
0 new messages