Lowest common ancestor of two given vertices in a tree

82 views
Skip to first unread message

Jean-Baptiste Musso

unread,
Jul 25, 2016, 4:13:13 PM7/25/16
to Gremlin-users
Using TinkerPop 3.2.1, I'm trying to find the lowest common ancestor (https://en.wikipedia.org/wiki/Lowest_common_ancestor) in the following tree:

graph = TinkerGraph.open()
g = graph.traversal() 
  
g 
  .addV("name", "A").as("a")
  .addV("name", "B").as("b")
  .addV("name", "C").as("c")
  .addV("name", "D").as("d")
  .addV("name", "E").as("e")
  .addV("name", "F").as("f")
  .addV("name", "G").as("g")
  .addE("hasParent").from("a").to("b")
  .addE("hasParent").from("b").to("c")
  .addE("hasParent").from("d").to("c")
  .addE("hasParent").from("c").to("e")
  .addE("hasParent").from("e").to("f")
  .addE("hasParent").from("g").to("f")



Here's my closest attempt at trying to fetch the lowest common ancestor of the two vertices called "A" and "D":

g.V().has("name", within("A", "D"))
  .match(
    __.as("children").until(outE("hasParent").count().is(0)).repeat(out("hasParent")).emit().as("ancestors"), __.as("ancestors").path().unfold().count().as("pathLength")
  )
  .order()
    .by(select("pathLength"))
  .limit(1)
  .select("ancestors")
    .by("name")

// Result:

==> B // id: 2

// Expected:

==> C // id: 4

There's obviously something wrong here with this query and I've been unable to get anything better. I also feel that the .match() step may not be necessary here. I added extra as() steps for easier select()/debugging. Any hints would be appreciated - thanks.

Daniel Kuppitz

unread,
Jul 25, 2016, 5:00:58 PM7/25/16
to gremli...@googlegroups.com
It's really easy if you only need the common ancestor for 2 vertices:

g.V().has("name","A").
  repeat(out("hasParent")).emit().as("x").
  repeat(__.in("hasParent")).emit(has("name","D")).
  select("x").limit(1)

Complexity increases dramatically for more than 2 vertices. Let me know if you need a solution for that case too.

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/04e2c14a-26de-4d2a-b7f4-1817f6270cf0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Kuppitz

unread,
Jul 25, 2016, 5:34:32 PM7/25/16
to gremli...@googlegroups.com
For more than 2 vertices this will work (in OLTP):

input = ["A","B","D"]
g.V().has("name", input.head()).
  repeat(out("hasParent")).emit().as("x").
  V().has("name", within(input.tail())).
  repeat(out("hasParent")).emit(where(eq("x"))).
  group().by(select("x")).by(path().count(local).fold()).
  unfold().filter(select(values).count(local).is(input.tail().size())).
  order().by(select(values).unfold().sum()).
  select(keys).limit(1)

Cheers,
Daniel

Jean-Baptiste Musso

unread,
Jul 25, 2016, 6:06:14 PM7/25/16
to gremli...@googlegroups.com
Thanks Daniel.

Regarding that second emit()in the first query, I didn't realize that emit() could take a predicate even though it's in the documentation. However, the difference between .emit(has("name","D")) and .emit().has("name","D") is still unclear to me since the results appear to be the same for each pair of vertices I tried with. Could you please explain the differences, if any?

I'll study carefully that second query for more than 2 vertices - that's some heavy stuff, ah!

Jean-Baptiste

Daniel Kuppitz

unread,
Jul 25, 2016, 6:38:40 PM7/25/16
to gremli...@googlegroups.com
 the difference between .emit(has("name","D")) and .emit().has("name","D") is still unclear to me

The results are the same, but emit(<boolean>) eliminates invalid paths earlier. That's especially important in OLAP as it makes a difference whether you collect a handful of valid paths or a ton of invalid paths just to filter them at the next step.

Cheers,
Daniel


Reply all
Reply to author
Forward
0 new messages