Remove Duplicate Edges in Azure Cosmos DB

Matthew Landowski

Oct 2, 2019, 6:56:54 PM
to Gremlin-users
To preface: I'm a novice with the Gremlin API.

I ran into an issue while creating edges and ended up with duplicate edges: the from vertex, the to vertex, and the label are all the same.

However, the Gremlin API in Cosmos DB isn't complete, and the filter step isn't implemented.

gremlin> :> g.V().outE().as('e').limit(10).where(inV().has('type', 'Criteria'))
==>[id:9dc58a44-67d2-4d13-81bf-a959bbffce54,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:93c1e4b7-b154-79b3-fd30-c14ffec810ed,outV:02487654]
==>[id:d8c3cec8-8759-48b8-857a-593349215849,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:50411853-6231-5404-7d0c-c55f2c613d70,outV:03673332]
==>[id:c3dffeff-051e-4afa-9554-2add0154ef5c,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:9638f0e3-ec7f-bb20-03fe-4dceadb97744,outV:03685383]
==>[id:a7523870-b0f0-4f39-afcb-23911f447807,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:abac8f2d-703c-68e9-754a-5978230f80c3,outV:03774888] # duplicate
==>[id:bb70e9ff-1a4b-44d9-bc27-af1be8f6247e,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:abac8f2d-703c-68e9-754a-5978230f80c3,outV:03774888] # duplicate
==>[id:2337ed63-4d37-4a56-ad65-b206c061aa2e,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:a4fbbdbb-0928-8156-0c19-db2bb56cfcd2,outV:03631225]
==>[id:562325f4-c44e-4417-adfc-c836af875424,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:46f9796a-40b1-ae6c-32e8-6d2689c81333,outV:03505463]
==>[id:c2ba3b1c-1ed8-4947-97bc-64b40b7eda9b,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:12f67ba3-b36e-5cc9-7259-9d60aedb94a6,outV:03777813]
==>[id:872ae1c9-15d5-4eb0-84cf-d4c97710b347,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:12f67ba3-b36e-5cc9-7259-9d60aedb94a6,outV:03777813]
==>[id:86812ed9-aec4-4062-b455-e243a8b749cd,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:fb284663-dc3c-2e18-54a5-fa042eefea1f,outV:03637868]
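
For a small result set like the one above, the duplicates can also be spotted client-side. A minimal Python sketch, assuming the console output lines are available as plain strings; the helper names `parse_edge` and `find_duplicate_edge_ids` are hypothetical, and only fields that appear in the output (`id`, `label`, `inV`, `outV`) are used:

```python
def parse_edge(line):
    """Parse one '==>[key:value,...]' console line into a dict."""
    start = line.index("[") + 1
    end = line.index("]")  # anything after ']' (e.g. a comment) is ignored
    fields = {}
    for pair in line[start:end].split(","):
        key, _, value = pair.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def find_duplicate_edge_ids(lines):
    """Return ids of edges that repeat an earlier (outV, label, inV) triple.

    The first edge seen for each triple is kept; later ones are reported.
    """
    seen = set()
    duplicates = []
    for line in lines:
        edge = parse_edge(line)
        key = (edge["outV"], edge["label"], edge["inV"])
        if key in seen:
            duplicates.append(edge["id"])
        else:
            seen.add(key)
    return duplicates


# Three lines copied from the output above.
sample = [
    "==>[id:a7523870-b0f0-4f39-afcb-23911f447807,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:abac8f2d-703c-68e9-754a-5978230f80c3,outV:03774888] # duplicate",
    "==>[id:bb70e9ff-1a4b-44d9-bc27-af1be8f6247e,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:abac8f2d-703c-68e9-754a-5978230f80c3,outV:03774888] # duplicate",
    "==>[id:2337ed63-4d37-4a56-ad65-b206c061aa2e,label:has,type:edge,inVLabel:Document,outVLabel:Test,inV:a4fbbdbb-0928-8156-0c19-db2bb56cfcd2,outV:03631225]",
]

print(find_duplicate_edge_ids(sample))
```

Because the first edge of each triple is kept, one copy always survives. Each reported id could then be removed in its own small request, for example with `g.E('<id>').drop()`, assuming edge lookups by id are available on the account.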

List of Gremlin steps available: https://docs.microsoft.com/en-us/azure/cosmos-db/gremlin-support

Thanks,
Matt

Amit Chandak

Oct 2, 2019, 8:48:22 PM
to Gremlin-users

Kfir Dadosh

Oct 3, 2019, 5:55:23 AM
to Gremlin-users
I'm not sure to which example you are referring, but you can just replace "filter" with "where":
https://stackoverflow.com/a/57761342/1140697

Kfir

Matthew Landowski

Oct 3, 2019, 10:26:00 AM
to Gremlin-users
Thanks for the tip. Not quite there, but getting closer.

gremlin> :> g.V().as("ov").outE().as("e1").inV().as("iv").inE().where(neq("e1")).where(outV().as("ov")).as("e2").where(select("e1","e2").by(label).where("e1", eq("e2"))).drop()
ActivityId : f7711623-46db-4c2d-8ff4-bb8a174f322a
ExceptionType : OperationCanceledException
ExceptionMessage :
        The operation was canceled.
Source : mscorlib
        CancellationToken : 27681391
        HResult : 0x8013153B
 - try increasing the timeout with the :remote command
Type ':help' or ':h' for help.
Display stack trace? [yN]

Oliver Towers

Oct 3, 2019, 11:46:56 AM
to gremli...@googlegroups.com
The traversal requires a large amount of intermediate state and processing to complete, which is likely why it is timing out.

In addition, drop() operations can slow down execution.

Can you first try returning the count of edges that need to be dropped and see if that completes before hitting the timeout?

g.V().as("ov").outE().as("e1").inV().as("iv").inE().where(neq("e1")).where(outV().as("ov")).as("e2").where(select("e1","e2").by(label).where("e1", eq("e2"))).count()

If this returns, you may be able to retry the original traversal with drop() until it succeeds. Cosmos DB Gremlin traversal execution is not transactional, so it can partially complete the drop() before timing out.
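
The count-then-retry idea can be sketched as a small client-side loop. This is only an illustration in Python: `submit` is a hypothetical callable that sends a Gremlin string to the server and returns the result (or raises `TimeoutError` on timeout); it is not part of any real driver API, and the attempt limit is an arbitrary assumption.

```python
def drop_until_done(submit, count_query, drop_query, max_attempts=10):
    """Re-issue drop_query until count_query reports zero matches.

    submit: hypothetical callable(query_string) -> result; may raise
            TimeoutError when the server cancels the request.
    Returns the number of drop attempts that were made.
    Raises RuntimeError if matches remain after max_attempts.
    """
    for attempt in range(max_attempts):
        if submit(count_query) == 0:
            return attempt
        try:
            submit(drop_query)
        except TimeoutError:
            # Drops are not transactional, so a timed-out request may still
            # have removed some edges; check the count and try again.
            pass
    if submit(count_query) == 0:
        return max_attempts
    raise RuntimeError("duplicates remain after %d attempts" % max_attempts)
```

The sketch assumes the count query itself completes within the timeout; if it does not, narrowing the traversal is the next thing to try.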

If the traversal still hits the error, then I'd suggest breaking the work into separate requests to reduce the duration. For example, filtering the initial set of vertices and edges to a subgraph based on labels avoids processing the entire graph in one request:

g.V().hasLabel('groupA').as("ov").outE('GroupAToGroupB').as("e1").inV().as("iv").inE().where(neq("e1")).where(outV().as("ov")).as("e2").where(select("e1","e2").by(label).where("e1", eq("e2"))).drop()

A similar approach can be applied if you know a property you can filter on.
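
Splitting the work by label could be scripted by generating one traversal per (vertex label, edge label) pair. A Python sketch, assuming the labels are known in advance and contain no single quotes; `build_dedup_queries` is a hypothetical helper, and the template is the traversal from this message with the labels left as placeholders:

```python
# Template based on the traversal in this message; {v} and {e} are the
# vertex and edge labels, {final} is the terminal step (count or drop).
TEMPLATE = (
    "g.V().hasLabel('{v}').as('ov').outE('{e}').as('e1').inV().as('iv')"
    ".inE().where(neq('e1')).where(outV().as('ov')).as('e2')"
    ".where(select('e1','e2').by(label).where('e1', eq('e2'))).{final}()"
)


def build_dedup_queries(pairs, final="count"):
    """Build one traversal string per (vertex_label, edge_label) pair."""
    if final not in ("count", "drop"):
        raise ValueError("final must be 'count' or 'drop'")
    return [TEMPLATE.format(v=v, e=e, final=final) for v, e in pairs]


# Labels taken from the example above.
for query in build_dedup_queries([("groupA", "GroupAToGroupB")]):
    print(query)
```

Defaulting to count() makes it possible to check how much work each slice represents before switching to drop().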


Oliver

