How to abort a transaction with sessions? (.NET)

Armin Bashizade

Apr 21, 2020, 5:21:34 PM
to Gremlin-users
I'm developing an API in .NET to create and query data on an AWS Neptune instance.
I want to validate new data before committing a write; for example, new data must not create a cycle in the graph. I'm trying to use sessions to control the transaction, along these lines:
  1. start a session
  2. create vertices and edges
  3. run cycle detection
    1. if no cycle, close the session
    2. else abort
How can I close the session without committing the transaction (step 3.2 above)?
AWS Neptune will abort a transaction after a session has been left open for more than 10 minutes (https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-sessions.html), but there has to be a better way.

Also, sessions were added to the .NET driver just recently (https://github.com/apache/tinkerpop/pull/1263). When is that expected to be released?

Stephen Mallette

Apr 22, 2020, 7:56:23 AM
to gremli...@googlegroups.com
I'm not sure how Neptune handles or allows for such things so I can't answer that part of your question. I'm curious about your use case though. Could you say more about how you plan to validate that there are no cycles in your graph? Are you just doing that validation among the newly created vertices (i.e. newly created subgraph)?

> Also, sessions were added to the .NET driver just recently (https://github.com/apache/tinkerpop/pull/1263), when is that expected to be released?

We don't have a definitive release date as of yet - I would probably throw out the end of May as a rough target as that would be about 4 months since our last release and we'd generally discussed releasing on that sort of cycle for 2020.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/860506f5-5a6a-43cb-9760-8bd3561c45f1%40googlegroups.com.

Armin Bashizade

Apr 22, 2020, 4:36:23 PM
to Gremlin-users
Does Gremlin server allow for closing sessions without committing?

About our use case: the graph consists of multiple directed acyclic subgraphs that are not connected to each other (like a polyforest, but not exactly), and each subgraph has fewer than 500 nodes. The plan is to run a cycle detection traversal on the newly created vertices:

g.V(newVerticesIds).as('a').repeat(both().simplePath()).emit(loops().is(gt(1))).
           both().where(eq('a')).path().
           dedup().by(unfold().order().by(id).dedup().fold())


Stephen Mallette

Apr 23, 2020, 7:26:50 AM
to gremli...@googlegroups.com
>  Does Gremlin server allow for closing sessions without committing?

Yes. It will execute a rollback() in that case of whatever is left uncommitted. Of course, Neptune, like other remote Gremlin providers, may not abide by those semantics for reasons specific to their implementation. There is a lot to consider with sessions, perhaps more than what is written in our reference documentation on the topic[1] and we tend to say that it's best to avoid them when possible. They serve a fairly narrow use case which usually falls to special tools like gremlify or DataStax Studio, but I see what you're doing in your case and why sessions seem to fit there.

> the graph consists of multiple directed acyclic subgraphs, which are not connected to each other (like a Polyforest but not exactly), and each subgraph has less than 500 nodes. The plan is to run a cycle detection traversal on newly created vertices:

Here are a few ideas more specific to your case:

1.  With a long timeout, I would imagine you could try to fire off one giant Gremlin statement that, in one request, would create the subgraph, validate it, then drop it on failure. I'd imagine that might work given that the subgraphs are disconnected. It might be something to experiment with because it would get you out of sessions, allow use of bytecode instead of scripts, and move you toward a more portable way to build your application. I'd imagine the challenges/downsides would be something like:
  a. One of the challenges with transactions is that different graphs have different transactional semantics and these differences tend to be even more exposed for remote graph providers and distributed graphs. I'm not sure how Neptune would behave but some graphs might make some or all of that graph readable to other transactions during your validation phase. I'm not sure if that matters to you or not.
  b. The size of the Gremlin traversal might have some effect on performance depending on the platform. DS Graph might complain about step length given the default configuration. A pure Gremlin Server implementation might see some spikes in memory usage or, if you fire enough of these types of long-running requests at it, might consume too many resources. You might mitigate some of this concern with dynamic use of addV() and addE() in your Gremlin, submitting a List of Map data as discussed in many posts here on the ML and on Stack Overflow - I've written about one aspect of such an approach here[2]
2. You could encode a flag into your schema to mark the elements of the subgraph as valid or not. Load the subgraph however you like and include a "valid" property on all the vertices, setting it to "false". In your application, when you query for particular subgraphs, just be sure to check whether elements are valid so that invalid ones are filtered out. Now you can do the validation step at your leisure and have options on how to go about doing it. You could issue your cycle detection query in real time or in batch. If this is a high transaction environment where you will be loading millions of these little subgraphs, you could batch validate with Spark. The downside is just maintaining the "valid" property. I'd suggest building a simple DSL[3] for your application that automatically injects the has('valid',true) into your Gremlin for you. That way you won't have to pepper your Gremlin with extra has() everywhere...it will just be implied. A custom TraversalStrategy might be better but that doesn't work well with most remote Gremlin providers like Neptune.
3. I know you'd said that you use .NET but if you used Java you could pre-validate your data. Load your subgraph to TinkerGraph then run cycle detection on that. If it passes, write the data to Neptune. I really like this approach because it will be really really fast given the in-memory processing you will get for the cycle detection step, has none of the downsides mentioned above and is completely portable to any TinkerPop-enabled system.
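
For subgraphs of a few hundred vertices, the pre-validation in idea 3 could in principle also run in plain client code, without TinkerGraph. Below is a minimal sketch, written in Python for brevity, assuming the affected subgraph (existing plus new edges) is available as a list of (source id, destination id) pairs. The function name `creates_cycle` and the edge representation are made up for illustration and are not part of any TinkerPop or Neptune API. Like the both()-based traversal in this thread, it ignores edge direction; it also deliberately skips self-loops and duplicate edges between the same pair of vertices, so drop that filter if those should count as cycles.

```python
def creates_cycle(edges):
    """Return True if the undirected graph formed by `edges` contains a cycle.

    `edges` is an iterable of (src_id, dst_id) pairs; direction is ignored.
    Self-loops and repeated edges between the same pair are skipped.
    """
    parent = {}

    def find(v):
        # Find the root of v's component, shortening the chain as we go.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    seen = set()
    for src, dst in edges:
        key = frozenset((src, dst))
        if src == dst or key in seen:
            continue
        seen.add(key)
        root_src, root_dst = find(src), find(dst)
        if root_src == root_dst:
            # Both endpoints are already connected, so this edge closes a cycle.
            return True
        parent[root_src] = root_dst
    return False


# Edges of the "modern" toy graph with 1->4 removed, as used later in the thread.
existing = [(1, 2), (1, 3), (4, 5), (4, 3), (6, 3)]
print(creates_cycle(existing))             # False
print(creates_cycle(existing + [(1, 4)]))  # True
```

With a check like this, the application would only write to the database when the result is False, so the validation itself needs no session or rollback. It does assume that nothing else modifies the same subgraph between the read and the write.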





Armin Bashizade

Apr 23, 2020, 6:04:02 PM
to Gremlin-users
> Yes. It will execute a rollback() in that case of whatever is left uncommitted.

Sorry, I worded the question badly. I meant: is there a step (currently in the .NET driver or coming to it in the future) that I can use either to abort the transaction explicitly and then close the session, or alternatively to make the transaction fail intentionally? Just to make sure we're on the same page, I'm looking for something to roll back with in the if statement below:

// requires: using System; using Gremlin.Net.Driver; using Gremlin.Net.Driver.Remote; using Gremlin.Net.Process.Traversal;
var sessionId = Guid.NewGuid().ToString();
// endpoint and port identify the Neptune instance
using (var gremlinClient = new GremlinClient(new GremlinServer(endpoint, port), sessionId: sessionId))
{
    var remoteConnection = new DriverRemoteConnection(gremlinClient);
    var g = AnonymousTraversalSource.Traversal().WithRemote(remoteConnection);
    // add vertices
    // add/remove edges
    // true if the cycle detection traversal finds at least one cycle
    var createsCycle = await g.V().As("a").Repeat(__.Both().SimplePath()).Emit(__.Loops().Is(P.Gt(1))).Both().Where(P.Eq("a")).Promise(t => t.HasNext());
    if (createsCycle)
    {
        // abort or rollback
    }
}


> Of course, Neptune, like other remote Gremlin providers, may not abide by those semantics for reasons specific to their implementation.

Neptune supports transactions with sessions, as per their docs:
> The client starts a session transaction when it is initialized. All the queries that you run during the session form are committed only when you call client.close( ). Again, if a single query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction fails, and all the queries in it are rolled back.

It says that closing the session will commit, but I was hoping there would be a step to roll back before closing.


Thanks for the suggestions. I wish there were a TinkerGraph for other languages as well.
Neptune guarantees no dirty reads, so 1.a is not a problem. 1.b might become an issue because Neptune does not support the sideEffect step and variables; still, I don't think my use case would create very large traversals. I wanted to go with 2, but it's like managing the transaction myself, and I prefer to leave that to the database if possible.
So I'd rather go with 1 if what I'm asking above cannot be done.

To simplify the problem, let's say it's okay if the new vertices remain in the graph; however, the edges must be rolled back if a cycle is found.
I should clarify that requests to the API are not necessarily edge creations; they can also include edge removals.

I tried to come up with a traversal that would modify->validate->rollback/commit and this is what I thought should work:
g.V(newEdgeSrcId0).addE(newEdgeLabel0).to(V(newEdgeDstId0)).as('newEdge')
    ...
    .V(newEdgeSrcIdN).addE(newEdgeLabelN).to(V(newEdgeDstIdN)).as('newEdge')
    .coalesce(
        V().outE(edgeIdsList).drop(),
        // look for cycles
        V(newVerticesIds).as('a')
            .repeat(both().simplePath())
            .emit(loops().is(gt(1)))
            .both().where(eq('a')).path()
            .dedup().by(unfold().order().by(id).dedup().fold())
            .as('cycle')
        .where(count().is(gt(0)))
        // rollback and return cycle
        .V(deleteEdgeSrcId0).addE(deleteEdgeLabel0).to(V(deleteEdgeDstId0))
        ...
        .V(deleteEdgeSrcIdN).addE(deleteEdgeLabelN).to(V(deleteEdgeDstIdN))
        .coalesce(
            V().outE().hasId(select(all, 'newEdge').unfold().id()).drop(),
            select('cycle')
        )
    )

I tried the query for a single edge creation; it neither returned the cycle nor dropped the edge:
gremlin> g = TinkerFactory.createModern().traversal()

// break the existing cycle by removing edge 1->4
gremlin> g.E('8').drop()
gremlin> g.V('1').addE('test').to(V('4')).as('newEdge')
            .V('1', '4').as('a').repeat(both().simplePath()).emit(loops().is(gt(1)))
            .both().where(eq('a')).path()
            .dedup().by(unfold().order().by(id).dedup().fold()).as('cycle')
            .where(count().is(gt(0)))
            .coalesce(V().outE().hasId(select(all, 'newEdge').unfold().id()).drop(),
                select('cycle'))

Next I tried just to get the cycle, which again returns nothing:
// reload graph and break existing cycle
gremlin> g.V('1').addE('test').to(V('4')).as('newEdge')
            .V('1', '4').as('a').repeat(both().simplePath()).emit(loops().is(gt(1)))
            .both().where(eq('a')).path()
            .dedup().by(unfold().order().by(id).dedup().fold())

But if I run the cycle detection part afterwards, it returns the cycle:
gremlin> g.V('1', '4').as('a').repeat(both().simplePath()).emit(loops().is(gt(1)))
           .both().where(eq('a')).path()
           .dedup().by(unfold().order().by(id).dedup().fold())
==>[v[1],v[4],v[3],v[1]]

Even dropping the created edge in the same traversal didn't work:
gremlin> g.V('1').addE('test').to(V('4')).as('newEdge').V().outE().hasId(select(all, 'newEdge').unfold().id()).drop()

gremlin> g.E()
==>e[7][1-knows->2]
==>e[8][1-knows->4]
==>e[9][1-created->3]
==>e[10][4-created->5]
==>e[11][4-created->3]
==>e[12][6-created->3]
==>e[13][1-test->4]

Do you have any hints for fixing this?

Stephen Mallette

Apr 24, 2020, 2:06:34 PM
to gremli...@googlegroups.com
The solution ended up being pretty simple, but I thought it was due some explanation, so I wrote the answer as a blog post:


The short answer is that you need to restrict the left-hand side of the path being evaluated by simplePath() to start at the step labelled "a".

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.E(8).drop()
gremlin> g.V(1).addE('test').to(V(4)).
......1>   V(1, 4).as('a').
......2>   repeat(both().simplePath().from('a')).
......3>     emit(loops().is(gt(1))).
......4>   both().
......5>   where(eq('a')).
......6>   path().
......7>   dedup().
......8>     by(unfold().order().by(id).dedup().fold())
==>[v[1],e[13][1-test->4],v[1],v[4],v[3],v[1]]
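
To see why from('a') matters, here is a rough model of the check, written in Python rather than Gremlin. It is not TinkerPop code; the function `is_simple` and the list-based path are assumptions made for illustration. simplePath() drops a traverser whose path contains a repeated object. Because the traversal begins with V(1).addE(...), v[1] and the new edge are already in the path before the step labelled "a", so a walk that starts again at v[1] looks non-simple unless the check begins at "a".

```python
def is_simple(path, start=0):
    """True if no object repeats in path[start:]; a rough stand-in for simplePath()."""
    tail = path[start:]
    return len(tail) == len(set(tail))


# Path of one traverser after two hops from 'a':
# V(1) -> addE('test') -> V(1).as('a') -> both() -> both()
path = ["v1", "e13", "v1", "v4", "v3"]
a_index = 2  # position of the step labelled 'a'

print(is_simple(path))           # False: v1 appears both before and at 'a'
print(is_simple(path, a_index))  # True: only [v1, v4, v3] is checked
```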

> 1.b might become an issue, because Neptune does not support sideEffect step and variables, still I don't think my use case would create very large traversals.

I thought Neptune supported this style of dynamic object creation:


so that might help depending on what you're doing. Perhaps it will help inspire some more concise and efficient Gremlin. I didn't realize Neptune still didn't have sideEffect(). You might be able to use some other steps to work your way around that hopefully.




Kelvin Lawrence

Apr 24, 2020, 2:43:57 PM
to Gremlin-users
Hi all, 

Sorry I am late to the party on this thread. Been one of those weeks!

Neptune supports sideEffect(), not sure what was meant by "with variables" but I use it all the time in my Gremlin code with Neptune.

Currently, to abort a Neptune Gremlin session you would need to force an exception by doing something like trying to add a Vertex with an ID that already exists. Hopefully this will be improved in a future release but for now that would be a way to force a rollback.
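
The control flow of that workaround might look roughly like the sketch below. It is written in Python against a toy in-memory stand-in, not against Neptune or any Gremlin driver: `ToySession`, `add_vertex`, and `write_with_validation` are hypothetical names, and the toy only models the behaviour described in this thread (writes are committed on close, and a failed query rolls back the whole session).

```python
class ToySession:
    """Toy stand-in for a session: buffers writes, commits on close, rolls back on error."""

    def __init__(self, store):
        self.store = store    # set of committed vertex ids (shared, mutated on commit)
        self.pending = set()  # vertex ids written in this session
        self.failed = False

    def add_vertex(self, vid):
        if vid in self.store or vid in self.pending:
            # Model of "a single failed query fails the session transaction".
            self.failed = True
            self.pending.clear()
            raise ValueError(f"vertex {vid} already exists")
        self.pending.add(vid)

    def close(self):
        if not self.failed:
            self.store.update(self.pending)
        self.pending = set()


def write_with_validation(session, new_ids, has_cycle):
    for vid in new_ids:
        session.add_vertex(vid)
    if has_cycle:
        try:
            session.add_vertex(new_ids[0])  # duplicate id: fails on purpose
        except ValueError:
            pass  # expected; the session is now rolled back
    session.close()


store = {"v1"}
write_with_validation(ToySession(store), ["v2", "v3"], has_cycle=True)
print(sorted(store))  # ['v1']
write_with_validation(ToySession(store), ["v2", "v3"], has_cycle=False)
print(sorted(store))  # ['v1', 'v2', 'v3']
```

In a real application the deliberately failing request and the exception type would depend on the driver and on how Neptune reports the duplicate id, so treat the names above as placeholders.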

Cheers
Kelvin

Armin Bashizade

Apr 28, 2020, 9:25:12 AM
to Gremlin-users
You're right, Kelvin. I was looking at Unsupported Gremlin Methods in the Neptune docs and only read the function name. However, the unsupported sideEffect function is this one:
org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.sideEffect(java.util.function.Consumer)

Thanks, Stephen, for the blog post. I also realized I can use the from() step on path() to skip the added edges.