AQL graph traversal filtering (more conditional edges for vertex), intersection substitution

545 views
Skip to first unread message

Eduard Tomášek

unread,
Nov 23, 2016, 10:39:02 AM11/23/16
to ArangoDB


Hi,


i would like to ask for help. We are trying use ArangoDB in our startup and now Im little jammed.
I have a question for following schema:


I tried to get person who is musician and painter. Solution that i found is:

FOR x IN
    INTERSECTION (
        (
            FOR v,e IN ANY 'test_vertices/419620442844' 
            GRAPH 'test'
            OPTIONS {
bfs: true, 
uniqueVertices: 'global'
}
            FILTER e.label == 'instance_of'
RETURN v
        ),
        (
            FOR v,e IN ANY 'test_vertices/419620442862'
            GRAPH 'test'
            OPTIONS {
bfs: true, 
uniqueVertices: 'global'
}
            FILTER e.label == 'instance_of'
RETURN v
        )
    )
RETURN x

Using intersection is fine, but is it exist any better way? I cannot imagine if i would like all pop albums of performers who are musicians and painters.

And I have similar problem with next schema:

In this I want all rock albums of John Smith. When I used graph traversal and got neighborhood of two steps (IN 1..2) and there I could filter path in current layers (steps).

But only one condition (instance_of or genre) or using OR operator not AND. At the beginning i expected that I take neighborhood of John Smith and define acceptance rules 

for neighborhood node like has edge instance_of connected to Album and has edge genre pointing to Pop or some callback like solution where i can scan neighborhood 

and decide. Only i have now is this:


FOR p IN 1..2 ANY 'John Smith' GRAPH 'test'
FILTER
p.edges[0].label == 'performer'
AND p.edges[1].label == 'instance_of'
AND p.vertices[2].label == 'album'
RETURN p

This gives me all John Smith's albums, but I dont know how to add that I want all pop albums of John Smith. I cannot use AND operator because Im filtering path,
so it doesnt make sense and of course with AND it returns empty list.

I would like to ask for help. Maybe I did not understand how it works in ArangoDB. I used google and ArangoDB docs, but i cannot find good solution or some material where to study it.

Thanks for help.

My problem summary:

In first schema Im solving problem: all pop albums of performers who are musicians and painters

In second schem Im solving problem: all rock albums of John Smith

mic...@arangodb.com

unread,
Nov 24, 2016, 5:47:45 AM11/24/16
to ArangoDB
Hi Eduard,

regarding your first example i think it would be better to do a two step traversal instead of of the intersection.
The two step traversal should first find all people that have one of the professions, say musician and in the second step should filter those ones that are artists as well.
My solution is only assumed to be faster if your people only have few professions not hundreds or even thousands of them.


FOR person, e1 IN INBOUND 'test_vertices/419620442844' GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
  FILTER e1.type == "instance_of"
  // here we have all musicians
  LET temp = (
    // We start a subquery to check if the person is a painter as well
    FOR prof, e2 IN OUTBOUND person GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
      FILTER e2.type == "instance_of"
      FILTER prof._key == "419620442862" // Check if the target is the painter
      LIMIT 1 // If we find at least 1 we are good
      RETURN 1 // don't care for any result
  )
  FILTER LENGTH(temp) > 0 // Now throw away every person that is no painter
  RETURN person


In order to get the pop albums of all these artists you can continue the query with exactly the same trick:


FOR person, e1 IN INBOUND 'test_vertices/419620442844' GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
  FILTER e1.type == "instance_of"
  // here we have all musicians
  LET temp = (
    // We start a subquery to check if the person is a painter as well
    FOR prof, e2 IN OUTBOUND person GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
      FILTER e2.type == "instance_of"
      FILTER prof._key == "419620442862" // Check if the target is the painter
      LIMIT 1 // If we find at least 1 we are good
      RETURN 1 // don't care for any result
  )
  FILTER LENGTH(temp) > 0 // Now throw away every person that is no painter
  LET albums = (
      // Now collect a list of pop albums for each artist
      FOR album, e3 IN INBOUND person GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
         FILTER e3.type == "performer"
         LET temp2 = (FOR genre, e4 IN OUTBOUND album GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
           FILTER e4.type == "genre"
           FILTER genre._key == "pop" // Check that target ist pop
           LIMIT 1
           RETURN 1
        )
        FILTER LENGTH(temp2) > 0
        RETURN album
  )
  RETURN {
    artist: person,
    albums: albums
  }
    

For your second question, we can actually use the same trick again.
But now the subquery traversal needs 2 results instead of 1, we need "genre: rock" and "instance_of: album".
I would suggest the following query:

      FOR album, e1 IN INBOUND "test_vertices/john_smith" GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
         FILTER e1.type == "performer"
         LET temp1 = (FOR other, e2 IN OUTBOUND album GRAPH 'test' OPTIONS {bfs: true, uniqueVertices: 'global'}
           FILTER (e2.type == "genre" AND other._key == "rock") // If we have a genre make sure it's rock
           OR (e2.type == "instance_of" AND other._key == "album") // If we have instance_of make sure it's album
           LIMIT 2 // We need one of each
           RETURN 1
        )
        // If not both the above conditions are fulfilled we should reject this album.
        FILTER LENGTH(temp2) == 2
        RETURN album


NOTE: This query works only if there are no duplicate edges. So if an album is tagged twice as "rock" the above query may not work. In this case you have to do two LIMIT 1 queries instead, one for each condition.


Finally i have some general hint for you to most likely improve the performance:

In the idea of ArangoDB data-modeling in most cases the "type" label of edges can be modeled by using a collection for it instead of an attribute.
Say you have one collection "instance_of" strong all edges with "{type: instance_of}".
Than in the traversal you can define that you want to traverse this collection only, instead of all edge collections of the Graph.
This is assumed to be faster (if you do not define vertex centric indexes in addition) than post filtering on type.

If that is true for your use-case has to be evaluated.

Alternative Example for the first query would look like:

FOR person IN INBOUND 'test_vertices/419620442844' instance_of OPTIONS {bfs: true, uniqueVertices: 'global'}
  // here we have all musicians
  LET temp = (
    // We start a subquery to check if the person is a painter as well
    FOR prof IN OUTBOUND person instance_of OPTIONS {bfs: true, uniqueVertices: 'global'}
      FILTER prof._key == "419620442862" // Check if the target is the painter
      LIMIT 1 // If we find at least 1 we are good
      RETURN 1 // don't care for any result
  )
  FILTER LENGTH(temp) > 0 // Now throw away every person that is no painter
  LET albums = (
      // Now collect a list of pop albums for each artist
      FOR album IN INBOUND person performer OPTIONS {bfs: true, uniqueVertices: 'global'}
         LET temp2 = (FOR genre IN OUTBOUND album genre OPTIONS {bfs: true, uniqueVertices: 'global'}
           FILTER genre._key == "pop" // Check that target ist pop
           LIMIT 1
           RETURN 1
        )
        FILTER LENGTH(temp2) > 0
        RETURN album
  )
  RETURN {
    artist: person,
    albums: albums
  }

I did just replace all `GRAPH "test"` by the name of the collection e.g: `instance_of`.

best
Michael
Reply all
Reply to author
Forward
0 new messages