Preventing duplicate edges ?

493 views
Skip to first unread message

Jérémy Berthet

unread,
Oct 21, 2014, 11:49:18 AM10/21/14
to aran...@googlegroups.com
Hello,

I'm trying to play a bit with the concept of graph database. I have a simple use case to implement. This is not a real project, just something for testing how everything goes one. I'm tracing contact (let says phone call) between people and I want to log the date of each contact. I was thinking of a vertices collection called "person" and a unoriented relation definition "in_contact" beetween 2 persons.

It is pretty easy to create an edge beetween 2 persons with the date for the first contact. But what to do when a second contact beetween the 2 same persons is happening ? I wanted to update the already existing edge data for adding the new date, but I couldn't figure out a simple way to find the id of an already existing edge given 2 persons id.

Did I miss something ? Maybe my way to think is wrong and I should create a new edge on each contact with the date ?

The final goal, in that use case, would be then to weight and report relations between people by the number and frequency of contacts.

Thanks for any answer,
Jérémy


Michael Hackstein

unread,
Oct 22, 2014, 5:29:02 AM10/22/14
to aran...@googlegroups.com
Hi Jérémy,

there are several conceptional approaches you could follow for your use case.
I will try to explain all of them and give some hints on their pros & cons:

1) Create one edge per contact:
* Simply store a new edge (_from, _to, details) for edge contact.
+ Will be fastest write performance.
+ You do not have to take care of writing conflicts if several instances are storing these edges.
+ You can directly see how many calls a person has made in total (to any other person) (Length of connected edges)
- Edge collection grows rather fast
- You have to write an AQL Filter to find all contacts between person A and B.

2) Create one edge per person-pair:
* Store one (directed) edge per person pair and add a counter and all other relevant values.
+ Stored amount of data is minimal
+ You can directly see how many other persons one person has contacted (Length of connected edges)
+ You can directly see how many contacts a person has to a specific other person on one edge.
+ Graph-queries should perform faster as their runtime is typically bound by the amount of edges per vertex.
- Requires a transaction for each write
- Reduces write performance in case of many concurrent writers.


I would always prefer version 2) above version 1) except one specific case, you have a large amount of concurrent writers of edges.

To update the edge you can for example use the AQL statement:

FOR contact in in_contact
  FILTER contact._from == @from AND contact._to @to
  LET oldCount = contact.count
  LET dates = contact.dates
  UPDATE contact WITH  {count: oldCount + 1, dates: UNION(dates, [@newDate]) } in in_contact

and replace the bindVars @from, @to, @newDate accordingly.
This will then be transactional and can be used concurrently.

hope that helps

best
Michael


--
You received this message because you are subscribed to the Google Groups "ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to arangodb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-----------------------------------

Michael Hackstein

 

Mobil:   +49-176-44410782
Fax:     +49-221-2722999-88

 

triagens GmbH
Hohenstaufenring 43-45
50674 Köln

 

Sitz der Gesellschaft: Köln
Registergericht Köln; HRB 53597

 

Geschäftsführung:
Dr. Frank Celler
Martin Schönert
Claudius Weinberger

 

 

Diese e-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese e-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese e-Mail. Wir haben alle verkehrsüblichen Maßnahmen unternommen, um das Risiko der Verbreitung virenbefallener Software oder e-Mails zu minimieren, dennoch raten wir Ihnen, Ihre eigenen Virenkontrollen auf alle Anhänge an dieser e-Mail durchzuführen. Wir schließen außer für den Fall von Vorsatz oder grober Fahrlässigkeit die Haftung für jeglichen Verlust oder Schäden durch virenbefallene Software oder e-Mails aus.

 

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. We have taken precautions to minimize the risk of transmitting software viruses but nevertheless advise you to carry out your own virus checks on any attachment of this message. We accept no liability for loss or damage caused by software.

Jérémy Berthet

unread,
Oct 27, 2014, 8:30:26 AM10/27/14
to aran...@googlegroups.com
Hello Michael,

Thank you for your complete answer and suggestions. I've also thought about a 3rd way, similar to the 2nd where I would use deterministic keys for the edge (like the concatenation of person1_id and person2_id) to maybe simplify the needed AQL a bit.

Anyway, I was hoping that the graph module of Arango would offer a standard function to find the edges existing between 2 vertices. Look like it isn't the case.

Regards,
Jérémy

Michael Hackstein

unread,
Oct 27, 2014, 8:43:59 AM10/27/14
to aran...@googlegroups.com
Hi Jérémy,

i will add that as a feature request.
I think you are right and it is needed quite often.
I will keep you updated here.

About the 3rd way, i think it is a good approach but there are some things to consider:
* You can only use _key of your persons as "/" is not allowed within the key.
* You have to add a spaceing symbol between both _key's which is not in your _key values (if you do not set key's yourself in the vertices you can go with "-" f.e.)
* If you are using several vertex collections the _key is probably not unique. {_id: "v/1", _key:"1"}, {_id:"v2/1", _key:"1"} can coexist in different collections, but key is equal.

So in the list is nothing hard to implement in your logic, it is just to avoid going into the trap ;)

best
Michael

Michael Hackstein

unread,
Oct 27, 2014, 9:01:46 AM10/27/14
to aran...@googlegroups.com
Hi Jérémy,

i was just thinking about the features of the graph_module:
What is possible is the following (assume graph is from the general-graph module):

    var edges = graph._edges({_from: "v/1", _to:"v/2"}).toArray();

This will give you the list of all edges between v/1 and v/2.
You can use this as an alternative.

However if you want to modify this edge the set of actions is not transactional by default:

  var edge = graph._edges({_from: "v/1", _to:"v/2"}).next();
  // Some other thread might modify edge in between here!
  graph.edges.update(edge._key, {count: edge.count: + 1});

So you would have to use the transaction module in this case (or make sure that there cannot by any concurrency)

You can also make use of the graph_module in AQL:

FOR contact in GRAPH_EDGES("myGraph", @from, { "direction": "outbound", edgeExamples: {"_to": @to}});

  LET oldCount = contact.count
  LET dates = contact.dates
  UPDATE contact WITH  {count: oldCount + 1, dates: UNION(dates, [@newDate]) } in in_contact

This is now transactional.

best
Michael


Am 27.10.2014 um 12:30 schrieb Jérémy Berthet <grape...@gmail.com>:

Jérémy Berthet

unread,
Oct 27, 2014, 10:06:25 AM10/27/14
to aran...@googlegroups.com
Hello,

Thank you for those complements.

I will make some tests from this point and see what fit the best.

Since it is not a real project, I don't have due date, but only need to find some spare time to play around a bit :)

Regards,
Jérémy

prakash Pandey

unread,
Sep 5, 2016, 6:24:06 AM9/5/16
to ArangoDB
Hi Michael,

I am facing the same problem. You was talking about the feature to add.

I am currently using 3.x.x.

Is the feature available in this version?? 

Jan

unread,
Sep 5, 2016, 6:45:34 AM9/5/16
to ArangoDB
Hi,

since 3.0 it is possible to create a unique index on the `_from` and `_to` attributes.
This will make each combination of `_from` and `_to` in the edge collection unique.

    db._createEdgeCollection("relations");
    db.relations.ensureIndex({ type: "hash", fields: ["_from", "_to"], unique: true });

Inserting edges works as usual:

    db.relations.insert({ _from: "test/1", _to: "test/2" });
    {
      "_id" : "relations/321436929",
      "_key" : "321436929",
      "_rev" : "321436929"
    }

    db.relations.insert({ _from: "test/2", _to: "test/2" });
    {
      "_id" : "relations/321436964",
      "_key" : "321436964",
      "_rev" : "321436964"
    }

But inserting an edge with the same `_from` and `_to` attribute values as an already existing edge will now fail:

    db.relations.insert({ _from: "test/1", _to: "test/2" });
    JavaScript exception: ArangoError 1210: unique constraint violated
    !db.relations.insert({ _from: "test/1", _to: "test/2" });
    !     ^
    stacktrace: ArangoError: unique constraint violated
        at Error (native)
        at <shell command>:1:6

Best regards
Jan

prakash Pandey

unread,
Sep 5, 2016, 12:24:17 PM9/5/16
to aran...@googlegroups.com
Dear Michael,

Please guide me (or please show me direction) to solve this problem using Arangodb (if possible using ArangoDb)?


I have 2 documents A and B
and Edge E

A,B : Vertex
E : Edge

what I want is every time A(some user) does perform some event and goes to vertex B
I don't want to create new Relations if relation already exist between them.

i.e if relationship already exist between A and B for some event, then I simply want to increase the counter of Edge E, without creating new Edges.

how can I achieve This?

Any help is highly appreciated..!

Thanks..!

--
You received this message because you are subscribed to a topic in the Google Groups "ArangoDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/arangodb/UDJF0Wv50Rw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to arangodb+unsubscribe@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages