node.hasProperty("foo") causes NotInTransactionException? (2.0.0)


M. David Allen

Dec 31, 2013, 9:53:51 AM12/31/13
to ne...@googlegroups.com
I'm in the middle of retrofitting Neo4j code that was running on 1.9.3 to 2.0.0. This is all happening under Windows, jdk1.7.0_45, and Eclipse.

I expected some teething problems and cut-overs; here's one I've run into:

org.neo4j.graphdb.NotInTransactionException
    at org.neo4j.kernel.impl.persistence.PersistenceManager.getResource(PersistenceManager.java:214)
    at org.neo4j.kernel.impl.persistence.PersistenceManager.currentKernelTransaction(PersistenceManager.java:84)
    at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.transaction(ThreadToStatementContextBridge.java:53)
    at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.instance(ThreadToStatementContextBridge.java:47)
    at org.neo4j.kernel.impl.core.NodeProxy.hasProperty(NodeProxy.java:346)
    at blah.blah.mypackage.MyStorage.isMyObjectNode(Neo4JStorage.java:219)
    at blah.blah.mypackage.MyFactory.newObject(Neo4JPLUSObjectFactory.java:190)
    at blah.blah.mypackage.MyFactory.listWorkflows(Neo4JStorage.java:933)

I can make it happy by wrapping the methods below in a transaction, but I'm pretty sure that's not the right thing to do.

Here's the relevant method:

    public static boolean isMyObjectNode(Node n) {
        return n != null && n.hasProperty(PROP_PLUSOBJECT_ID) && n.hasProperty(PROP_TYPE) && n.hasProperty(PROP_SUBTYPE);
    }

(Context: PROP_PLUSOBJECT_ID, PROP_TYPE, and PROP_SUBTYPE are all public static final Strings.)

The "newObject" method in my package just creates one of my domain objects out of a node. In order to do that, it needs to check which kind of node it's dealing with. I'd rather not paste that code because it's involved and includes many other subclasses, which make the rabbit hole deeper. Suffice it to say I'm sure that it's just creating my domain objects, not modifying anything in the graph.

The "listWorkflows" method looks like this:

    public static List<MyWorkflow> listWorkflows(User user, int maxReturn) throws MyException {
        if(maxReturn <= 0 || maxReturn > 1000) maxReturn = 100;
       
        String query = "start n=node:node_auto_index(type=\"" + MyWorkflow.TYPE_WORKFLOW + "\") " +
                       "where has(n.oid) " +
                       "return n " +
                       "limit " + maxReturn;
       
        Iterator<Node> ns = Neo4JStorage.execute(query).columnAs("n");
        ArrayList<MyWorkflow> wfs = new ArrayList<MyWorkflow>();
       
        while(ns.hasNext()) {
            MyObject o = MyFactory.newObject(ns.next());
            if(o.isWorkflow()) wfs.add((MyWorkflow)o);
            else {
                log.warning("Returned non-workflow " + o + " from workflow query!");
            }
        }
       
        return wfs;
    } // End listWorkflows

Indeed this code (and its caller) doesn't happen within a transaction, but it shouldn't need to -- no modification is going on here.

Lastly -- yes, I know about labels, and how labels would probably be a better way to do this.  I'm planning on exploiting that, but the first thing to do is to get the current code base working on the new release, and I've got other fish to fry (like cypher query syntax updates to get rid of "?") before I get to using labels.

Thanks!

Michael Hunger

Dec 31, 2013, 10:07:08 AM12/31/13
to ne...@googlegroups.com
Hi

Read operations on the database, its nodes, and its relationships now require transactions.

Probably most sensible to wrap the tx around the entry point to your service layer.

HTH

Michael

Sent from mobile device
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

M. David Allen

Dec 31, 2013, 10:12:37 AM12/31/13
to ne...@googlegroups.com
Just curious, what's the thinking behind requiring transactions for reads?

Also, some documentation updates might be appropriate.  For example:

http://api.neo4j.org/current/org/neo4j/graphdb/NotInTransactionException.html

public class NotInTransactionException
extends RuntimeException
Thrown when attempting to modify the graph outside of a transaction.

Finally, I noticed that with 2.0 the code wants you to use try-with-resources instead of try/finally, e.g.
try ( Transaction tx = graphDb.beginTx() ) { blah(); }

I'm also curious about that decision, since it forces me to choose between living with deprecation warnings on my use of tx.finish() or upgrading all my developers' IDEs to Java 7 code-style compliance.

Thanks

Johannes Mockenhaupt

Jan 2, 2014, 5:41:44 PM1/2/14
to ne...@googlegroups.com
On 12/31/2013 04:12 PM, M. David Allen wrote:
> Just curious, what's the thinking behind requiring transactions for reads?

Since nobody else answered yet I'll take a stab at it :-)

As far as I know, the reason for requiring transactions for read operations is optimization: fetched resources can be released once the transactional context has been closed. A nice side effect is consistency: your read operations are unaffected by write transactions going on in parallel.

> Also, some documentation updates might be appropriate. For example:
>
> http://api.neo4j.org/current/org/neo4j/graphdb/NotInTransactionException.html
>
> public class NotInTransactionException
> extends RuntimeException
>
> Thrown when attempting to modify the graph outside of a transaction.
>

Thanks for catching that; the docs have been updated, but that spot was missed. I have opened a pull request updating that JavaDoc, and it has already been merged.

> Finally, I noticed that with 2.0, apparently the code wants you to use
> try/with, instead of try/finally, e.g.
>
> |try ( Transaction tx = graphDb.beginTx() ) { blah(); }
>
> I'm also curious about that decision, since it requires I choose between looking
> at deprecation warnings on my use of tx.finish() or trying to upgrade all my
> developer's IDEs to java7 code style compliance.

I like the reduction of redundant code this change brings, but then
again, I don't have to migrate a (big) code base to it ;-)

Johannes

Michael Hunger

Jan 2, 2014, 7:36:25 PM1/2/14
to ne...@googlegroups.com
Thanks a lot Johannes for chiming in.

Adding transactions for reads is mostly about creating the foundations for:
- repeatable reads
- better resource management
- allowing support for more isolation levels
- better automatic locking

The move from try/finally to try-with-resources and AutoCloseable came after Java 7 became required; it's just an easier-to-use pattern.
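For readers not yet on Java 7: try-with-resources calls close() automatically on every exit path, including exceptions, which is what makes it a natural fit for a transaction object. A minimal self-contained sketch (FakeTx is a made-up stand-in, not Neo4j's Transaction):

```java
// FakeTx: hypothetical AutoCloseable stand-in (not Neo4j's Transaction).
class FakeTx implements AutoCloseable {
    boolean closed = false;
    @Override
    public void close() { closed = true; }
}

public class TryWithDemo {
    public static void main(String[] args) {
        FakeTx tx = new FakeTx();
        // close() runs automatically on every exit path, including exceptions.
        try (FakeTx t = tx) {
            // work with t here
        }
        System.out.println(tx.closed); // true
    }
}
```

The same guarantee is what try/finally provided, just without the boilerplate finally block.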

Currently in Neo4j the approach is fail-by-default; other tx systems use succeed-by-default and offer something like tx.setRollbackOnly() to mark a transaction for failure.

According to the core team this is by intent, but it unfortunately makes it easy to miss the necessary tx.success() call. :(


So the currently required pattern is:

    try (Transaction tx = db.beginTx()) {
        // do stuff
        tx.success();
    }


HTH

Michael

M. David Allen

Jan 3, 2014, 8:40:09 AM1/3/14
to ne...@googlegroups.com
Thanks to both of you for those responses.  They totally make sense.

Like many people, I'm riding the learning curve with graph databases. I've been doing RDBMS work forever, but I've only been working with Neo4j for about a year and a half, and there are still so many things I don't know yet.

RE: the use of transactions: in the RDBMS world it's kind of a secret that all reads are wrapped in transactions, but usually the programmer is oblivious to that; they just issue SELECT queries and get data back. The DBMS may do that inside a transaction for all the reasons you give, but the user never sees it. Users reach for explicit transactions more when they want to do things incrementally, roll back if something in the middle fails, and so on.

So I won't take issue with the rationale, but perhaps some of this can be hidden from users over time. With all the benefits you cite for read-only operations, and with transactions being required everywhere, perhaps there are opportunities for the API to do this for me. If I do have to do it everywhere (reads and writes), then even though the Java 7 pattern reduces the amount of code I'm writing, I'm still writing a lot of boilerplate try-with blocks and trying to remember to put tx.success() before every exit point of the block (e.g. when my try block returns from three different spots in the method).
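One way an API (or a service layer) could hide that boilerplate is a "transaction template" that owns the try-with block and the single success() call. A sketch in Java 7 style; DemoTx, TxWork, and inTx are all made-up names, not Neo4j API:

```java
// DemoTx: hypothetical AutoCloseable transaction stand-in, not Neo4j API.
class DemoTx implements AutoCloseable {
    boolean succeeded = false;
    void success() { succeeded = true; }
    @Override
    public void close() { /* commit if succeeded, else roll back */ }
}

// Work to run inside a transaction; hypothetical, not Neo4j API.
interface TxWork<T> {
    T doWork(DemoTx tx);
}

public class TxTemplate {
    // The template owns the try-with block and the single success() call,
    // so callers with several return points never touch it.
    static <T> T inTx(TxWork<T> work) {
        try (DemoTx tx = new DemoTx()) {
            T result = work.doWork(tx);
            tx.success();
            return result;
        }
    }

    public static void main(String[] args) {
        // Java 7 style: anonymous class instead of a lambda.
        String out = inTx(new TxWork<String>() {
            @Override
            public String doWork(DemoTx tx) { return "done"; }
        });
        System.out.println(out); // done
    }
}
```

The caller's code can return from as many spots as it likes; success() is marked exactly once, at the template's single exit.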

David

Michael Hunger

Jan 3, 2014, 9:12:06 AM1/3/14
to ne...@googlegroups.com
Right,

the same happens in Neo4j if you use Cypher (comparable to SQL).

But if you use the Java API you are, in a sense, below the query-language level (somewhere you would never get in an RDBMS), and there tx management is manual.
In general we want to move away from the Java API toward more query-language-based usage for all environments; then you will have the same behavior as with an RDBMS. When Cypher is fast enough we'll get there :)

The only leak in this whole thing is when you use Cypher from the Java API, return db entities (like nodes, rels, paths), and want to access them later on. Then you are kind of caught between the two worlds.

I agree with you that the current transaction pattern is not really optimal, but I doubt it will change. So perhaps it makes sense to manage the tx at your service-level entry point, with a single entry and exit, and let the lower-level methods just participate in the already-running transaction.


HTH

Michael

M. David Allen

Jan 3, 2014, 10:36:16 AM1/3/14
to ne...@googlegroups.com
I get what you're saying with "being between worlds", and why that requires you to use transactions.

General database nerd stuff ahead...correct me if I'm wrong on this, but...

Cypher is a declarative language: you specify what you want, but you don't get to specify how the engine goes about fetching it. Declarative languages are awesome because they free programmers from having to think about how the storage is done. The declarative property is why SQL defeated CODASYL (https://en.wikipedia.org/wiki/CODASYL). If you're not working with a declarative language, you're probably doing imperative query; CODASYL was that, and I think working with Neo4j API objects and writing your own traversers qualifies for imperative status. OTOH, imperative query is awesome because, while it sticks you with understanding more about the storage, you can tweak access and "hand optimize", which is regrettably sometimes necessary with graphs (but not generally a good idea).

So... overall, IMHO moving more toward Cypher over time is absolutely the right thing to do, and it maybe solves some of this transaction stuff in the bargain. On the other hand, getting rid of imperative query altogether probably isn't going to happen for a graph database, because there are too many performance-sensitive graph traversals: things that would be extremely complex for Cypher to execute, but which can be coded imperatively with smart assumptions about the domain graph that greatly reduce the complexity of running a certain algorithm (for example, computing all-pairs distances). I'm not sure I'd want to get rid of imperative query for graphs, but I'd certainly want to get to the point where newcomers are strongly discouraged from going that route unless they know what they're doing.

OK, so we've got these two worlds, imperative and declarative, for getting to data. Maybe we can't ever get rid of imperative. But an architectural goal worth pursuing is finding a smart way of separating them so you can't mix. Notice that from a SQL ResultSet object you never get a "Row" object or a "Cell" object; you generally just get primitive types. Perhaps over the long run it's a good thing to separate these by ensuring the Cypher API doesn't return API objects subject to further imperative query; instead, Cypher could return just metadata and primitive types, which is ultimately all any database stores. Just a thought.
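The ResultSet analogy could be sketched as a query layer that copies primitives out of live entities before handing rows back, so nothing imperatively queryable escapes. All names here are hypothetical illustrations, not Neo4j API:

```java
import java.util.HashMap;
import java.util.Map;

// Stands in for a live graph entity that would support further
// imperative query; hypothetical, not Neo4j API.
class LiveNode {
    final Map<String, Object> props = new HashMap<String, Object>();
}

// Sketch of the proposed separation: the query layer hands back detached
// maps of primitives, never the live entities themselves.
public class DetachedRows {
    // Copy only the property values out, severing the link to the store.
    static Map<String, Object> detach(LiveNode n) {
        return new HashMap<String, Object>(n.props);
    }

    public static void main(String[] args) {
        LiveNode n = new LiveNode();
        n.props.put("oid", 42L);
        n.props.put("type", "workflow");
        Map<String, Object> row = DetachedRows.detach(n);
        System.out.println(row.get("type")); // workflow
    }
}
```

The caller gets values it can keep after the transaction closes, at the cost of not being able to traverse onward from the result.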

What do you think?

David

Michael Hunger

Jan 3, 2014, 10:40:17 AM1/3/14
to ne...@googlegroups.com
Fully agree with you, in both regards. Cypher will get smarter and faster over time, but you will always be able to beat it with hand-written assembler (ahem, SPI code) for certain use cases (hopefully the 10% or 5% case).

The only way I can think of to get them to "not mix" is to never return database objects from a Cypher query: just primitive values, collections, and maps, like it currently is in the HTTP API.

Ad Astra,

Michael

M. David Allen

Jan 3, 2014, 10:51:29 AM1/3/14
to ne...@googlegroups.com
> The only way I can think of to get them to "not mix" is to never return database objects from a Cypher query: just primitive values, collections, and maps, like it currently is in the HTTP API.

That's exactly what I'm suggesting. It'd be a pretty big change, but that's the way to do it, and I think it's how other systems do it: return only metadata and primitives.

Anyway, good discussion.

David