Transaction design.

Jatin

unread,

May 22, 2012, 5:57:03 AM5/22/12

to Neo4j

Hello,

There is a small design issue I am stuck up with. It's kind of crazy
but will reduce my computations multifold times.
I am implementing improved version of Yen's K-Shortest path algorithm
(https://estudogeral.sib.uc.pt/bitstream/10316/7763/1/obra.pdf)

Now there is a small issue here. Inside the algorithm, in say
computing top 4 shortest path, what the algorithm does is that it say
when computing the 2nd shortest path; It removes arcs of the 1st
shortest path from the graph. And so removes the 2nd while computing
3rd shortest path and so on. And later in the end, it adds all the
removed arcs.

Now the issue here is that when one thread is computing a kth shortest
path, other threads cannot use the algorithm as the graph is not in
its complete shape.

Here is the crazy part which I am unsure of if at all its a good
practice:
I will have the code inside a transaction, but not let the transaction
succeed. i.e. I will implement the algorithm and remove the respective
arcs in the process, when I have the desired output, I will call the
transaction a failure. hence the arcs are never removed and the same
instance of the algorithm can be used. Further there will be no need
to later add those arcs as in reality they were never removed.

But is it a good design to use transactions in such a way?

Regards

Mattias Persson

unread,

May 22, 2012, 6:34:42 AM5/22/12

to ne...@googlegroups.com

2012/5/22 Jatin <puri...@gmail.com>

It's certainly possible and will work as you'd expect. The only problem I see here is that removing relationships will grab write locks which will be held throughout the transaction and prevent any other shortest path algo to remove those relationships as long as the transaction is alive.

I'm a bit torn to these kinds of transactions as they are useful, but often (I'm not saying this case in particular) points to a shortcoming in the application having to use transactions this way. IMHO since the only real issue with this is that write locks are grabbed and held, but never necessary (since it will not commit) it can easily be mitigated if neo4j was able to start an uncommittable transaction which cannot be committed (tx.success() will throw exception perhaps) and no write locks would ever be grabbed in such transactions.

--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

Jatin

unread,

May 22, 2012, 7:36:09 AM5/22/12

to Neo4j

Interesting. Well yes that would have solved the problem.

Another way would be again (this one pretty sure is a wrong design) is
instead of removing them, have a flag in the relationship for a
particular par of nodes. Flag will be cleared once the computation is
over. But when running many instances in parallel there shall be many
such key value pairs in the relationships.

Now I do not know the preffered one. As in the second one there will
be some computation is clearing the flags. And little latency
(negligible) in accessing key value pairs of relationship's. Earlier I
estimated this cost to be higher but guess due to write locks there
will be unpredicted behavior in terms of performance.

On May 22, 3:34 pm, Mattias Persson <matt...@neotechnology.com> wrote:
> 2012/5/22 Jatin <purija...@gmail.com>

> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technologywww.neotechnology.com

Mattias Persson

unread,

May 22, 2012, 8:18:57 AM5/22/12

to ne...@googlegroups.com

2012/5/22 Jatin <puri...@gmail.com>

Interesting. Well yes that would have solved the problem.

Another way would be again (this one pretty sure is a wrong design) is
instead of removing them, have a flag in the relationship for a
particular par of nodes. Flag will be cleared once the computation is
over. But when running many instances in parallel there shall be many
such key value pairs in the relationships.

That's fine because they will be local to the transaction, so the amount of concurrent traversals running doesn't affect anything, other than (again) the write locks being held.

Now I do not know the preffered one. As in the second one there will
be some computation is clearing the flags. And little latency
(negligible) in accessing key value pairs of relationship's. Earlier I
estimated this cost to be higher but guess due to write locks there
will be unpredicted behavior in terms of performance.

You wouldn't need to clear the flags, they will undone when the transaction rolls back. The two solutions are very similar in execution/performance and is merely a matter of taste imho.

Jatin

unread,

May 22, 2012, 11:48:38 AM5/22/12

to Neo4j

Thanks. Clear now :)

Just as a matter of interest. Am I right below:
In a transaction, whenever there is a write on a particular
relationship, the write lock is holded only for that relationship.
Other transactions are still free to write on other nodes and
relationships but will have to wait to write anything on this
relationship.

> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technologywww.neotechnology.com

Mattias Persson

unread,

May 22, 2012, 12:33:25 PM5/22/12

to ne...@googlegroups.com

There is actually a difference in the two solutions. If putting a flag on the relationships only the relationships themselves are locked, whereas when deleting relationships start/end nodes are also locked.

2012/5/22 Jatin <puri...@gmail.com>

Thanks. Clear now :)

Just as a matter of interest. Am I right below:
In a transaction, whenever there is a write on a particular
relationship, the write lock is holded only for that relationship.
Other transactions are still free to write on other nodes and
relationships but will have to wait to write anything on this
relationship.

Absolutely correct. Except for creating/deleting relationships, where its start/end node are also locked. Other nodes/relationships aren't locked.

Craig Taverner

unread,

May 23, 2012, 6:13:15 AM5/23/12

to ne...@googlegroups.com

I'm curious about what I see as another crucial point here, and that is memory usage. If the algorithms will work with very large graphs, I see the uncommitted transaction as using up a lot of memory, or am I wrong? I always assumed that since many threads can open many transactions in parallel, all of them must be keeping the changelog in memory and the only reason for taking write locks is to avoid a kind of merge problem at commit time.

Anyway, if I'm right and the transaction is in memory anyway, then it seems that re-writing the algorithm in memory is another solution that would work without playing with transactions or keeping write locks. Since the final state of the graph should be the same as the initial, it is ideal to never modify the graph, IMHO.

If memory is an issue and the algorithm should run on large graphs, then I like the property/flag on relationship idea, but commit the transactions regularly. This would prevent the same algorithm running in parallel, but other code (that does not care about the flag) would be able to run.

Jatin

unread,

May 23, 2012, 1:25:15 PM5/23/12

to Neo4j

> I'm curious about what I see as another crucial point here, and that is
> memory usage. If the algorithms will work with very large graphs,

Yes true. Infact that is the problem with GraphAlgoFactory.dijkstra
profiled well here (http://groups.google.com/group/neo4j/browse_thread/
thread/749051d07568d6ec/8ab658a0100caf10)

> If memory is an issue and the algorithm should run on large graphs, then I
> like the property/flag on relationship idea, but commit the transactions
> regularly. This would prevent the same algorithm running in parallel, but
> other code (that does not care about the flag) would be able to run.

This needs to be profiled as many clock cycles will be spent in
clearing those flags. So its this added computation vs the latency
caused due to long transactions.
And because I talking about k-shortest path, if k is large then there
can be many flags to clear,

> On Tue, May 22, 2012 at 6:33 PM, Mattias Persson

> <matt...@neotechnology.com>wrote:

>
>
>
>
>
>
>
> > There is actually a difference in the two solutions. If putting a flag on
> > the relationships only the relationships themselves are locked, whereas
> > when deleting relationships start/end nodes are also locked.
>

> > 2012/5/22 Jatin <purija...@gmail.com>

> > Mattias Persson, [matt...@neotechnology.com]
> > Hacker, Neo Technology
> >www.neotechnology.com

Reply all

Reply to author

Forward