[TinkerPop3] Union, Intersection, Difference and the future of "copySplit" -- Gremlin3 RFC

Skip to first unread message

Marko Rodriguez

unread,
Jan 30, 2014, 11:02:12 AM1/30/14
to gremli...@googlegroups.com
Hi everyone,

So, as many of your know, Gremlin2's copySplit()-step is sketchy! With the new "Holder"-model in Gremlin3, such behaviors are no longer an issue and remain semantically consistent when used within a loop() and when path() information is attempted to be retrieved from them.

Next, Daniel Kuppitz was never a big fan of the name "copySplit" and there has always been a desire to add more operations to parallel streams, so we decided to go with:

union()
intersection()
difference()

To start this RFC thread, I implemented UnionPipe:


Pretty basic stuff here … though intersect() and difference() will, unfortunately, not be so easy given that they require memory structures. To demonstrate union() in action, please see:


Gremlin.of(g).v(1).union(
    Gremlin.of().out("knows"),
    Gremlin.of().out("created").in("created")
).value("name").forEach(System.out::println);
 
//////
 
vadas
josh
josh
peter
marko

Whatever objects comes out of the step previous to union() is added to the starts of each of the internal pipes of union(). Then, these pipes are next()'d in a round robin fashion and turned into a single stream.

Watch this now, we have correct behavior around path() and notice how the path lengths are difference as the internal union() pipes have difference lengths.

Gremlin.of(g).v(1).union(
    Gremlin.of().out("knows"),
    Gremlin.of().out("created").in("created")
).value("name").path().forEach(System.out::println);

/////

[v[1], v[2], vadas]
[v[1], v[3], v[4], josh]
[v[1], v[4], josh]
[v[1], v[3], v[6], peter]
[v[1], v[3], v[1], marko]

Finally, looping now works without issue (unlike copySplit and Gremlin2):

Gremlin.of(g).v(1).as("x").union(
    Gremlin.of().out("knows"),
    Gremlin.of().out("created").in("created")
).jump("x", h -> h.getLoops() < 2).value("name").path().forEach(System.out::println);

//////

[v[1], v[3], v[4], v[3], v[4], josh]
[v[1], v[3], v[4], v[3], v[6], peter]
[v[1], v[3], v[4], v[3], v[1], marko]
[v[1], v[3], v[4], v[5], v[4], josh]
[v[1], v[4], v[3], v[4], josh]
[v[1], v[4], v[3], v[6], peter]
[v[1], v[4], v[3], v[1], marko]
[v[1], v[4], v[5], v[4], josh]
[v[1], v[3], v[6], v[3], v[4], josh]
[v[1], v[3], v[6], v[3], v[6], peter]
[v[1], v[3], v[6], v[3], v[1], marko]
[v[1], v[3], v[1], v[2], vadas]
[v[1], v[3], v[1], v[3], v[4], josh]
[v[1], v[3], v[1], v[4], josh]
[v[1], v[3], v[1], v[3], v[6], peter]
[v[1], v[3], v[1], v[3], v[1], marko]

TADA! …the new Pipes model is so much easier to work with internally. Again, for those that care, the trick to all this is Holder<>

^^ the gold of Gremlin3.

Enjoy,
Marko.


Philippe Mulet

unread,
Jan 31, 2014, 1:37:28 PM1/31/14
to gremli...@googlegroups.com
Love it. Do you still need OR step then ?
Could do union(...) and then a BACK step.

Marko Rodriguez

unread,
Jan 31, 2014, 1:40:01 PM1/31/14
to gremli...@googlegroups.com
Hi,

That is interesting. I didn't realize that. You can do:

union() + back() for OR
intersect() + back() for AND

The drawback is that back() is a path-based step and thus, computes paths during execution. For Gremlin OLTP, no biggie. For Gremlin OLAP, expensive. However, we don't know if we will be able to do internal pipelines in Gremlin OLAP as (at least right now how things are designed), it would be cRaZy.

Nice thought Mr. Mulet,
Marko.
On Jan 31, 2014, at 11:37 AM, Philippe Mulet <mulet.p...@gmail.com> wrote:

Love it. Do you still need OR step then ?
Could do union(...) and then a BACK step.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Marko Rodriguez

unread,
Feb 3, 2014, 7:26:24 PM2/3/14
to gremli...@googlegroups.com
Hello,

IntersectPipe is now complete and ready for comments. I'm not to happy with the logic right now, a bit convoluted -- would like a Stream.of() one-liner :D


Anywho, here it is practice:

"What are the vertices that some vertex in V knows AND created."

Gremlin.of(g).V().intersect(
     Gremlin.of().out("knows"),
     Gremlin.of().out("created")
).path(v -> ((Vertex) v).getValue("name")).forEach(System.out::println);

The result being, of course, those things that 'marko' knows and created as he is the only vertex in the toy graph that both knows someone and has created something.

[marko, vadas]
[marko, lop]
[marko, josh]

Now, Mr. Mullet realized that union() + back() = OR and intersect() + back() = AND. This got me to thinking --- notice how match()-pipe does AND, but returns the pre-pipeline object.

Gremlin.of(g).V().match("a", "b",
     Gremlin.of().as("a").identity().as("b"),
     Gremlin.of().as("b").out("knows"),
     Gremlin.of().as("b").out("created")
).path(v -> ((Vertex) v).getValue("name")).forEach(System.out::println);

…returning:

[marko]

Now this is an artifact of there only being 1 "endPipeline" in match(). However, we could probably do away with this restriction and simply union endPipelines in match() and thus, intersect() would be simulated as:

Gremlin.of(g).V().match("a", "b",
     Gremlin.of().as("a").out("knows").as("b"),
     Gremlin.of().as("a").out("created").as("b")
).path(v -> ((Vertex) v).getValue("name")).forEach(System.out::println);

Given that match() is currently AND only, I'm wondering how nicely we could fold all this together -- union() [OR], intersect() [AND], and match()………………boil the brain => refrain.

Enjoy!,
Marko.

Daniel Kuppitz

unread,
Feb 3, 2014, 9:05:43 PM2/3/14
to gremli...@googlegroups.com
Hey,

"What are the vertices that some vertex in V knows AND created."
Gremlin.of(g).V().intersect(
     Gremlin.of().out("knows"),
     Gremlin.of().out("created")
).path(v -> ((Vertex) v).getValue("name")).forEach(System.out::println);
 

The result being, of course, those things that 'marko' knows and created as he is the only vertex in the toy graph that both knows someone and has created something.
 
[marko, vadas]
[marko, lop]
[marko, josh]

That really makes no sense. To me that means that marko knows vadas, lop and josh and also created vadas, lop and josh, because it's an intersection of everything that marko knows and created. Your last example makes it even clearer:

Gremlin.of(g).V().match("a", "b",
     Gremlin.of().as("a").out("knows").as("b"),
     Gremlin.of().as("a").out("created").as("b")
).path(v -> ((Vertex) v).getValue("name")).forEach(System.out::println);

b clearly represents something that marko knows and created.

A better example for intersection would be: What did people create together with someone they know:

Gremlin.of(g).V().intersect(
     Gremlin.of().out("created"),
     Gremlin.of().both("knows").out("created")
).path(v -> ((Vertex) v).getValue("name")).forEach(System.out::println);

Should result in:

[marko, lop]    marko knows josh and marko and josh created lop
[josh, lop]     josh knows marko and josh and marko created lop

Cheers,
Daniel



2014-02-04 Marko Rodriguez <okram...@gmail.com>:

Hambardzum Abajyan

unread,
Feb 14, 2017, 5:19:49 PM2/14/17
to Gremlin-users
Is there any implementation now for this concepts in TinkerPop 3 ?

I could only find Union in the source code. Was this idea discontinued ?

Daniel Kuppitz

unread,
Feb 14, 2017, 5:26:02 PM2/14/17
to gremli...@googlegroups.com
intersection() and difference() were not implemented as separate steps, but all 3 operations can be done using the currently available steps.

// union
gremlin> g.V().hasLabel("person").union(has("age", lte(29)), has("age", gte(29))).valueMap()
==>[name:[marko],age:[29]]
==>[name:[marko],age:[29]]
==>[name:[vadas],age:[27]]
==>[name:[josh],age:[32]]
==>[name:[peter],age:[35]]

// intersection
gremlin> g.V().hasLabel("person").has("age", lte(29)).aggregate("x").has("age", gte(29)).where(within("x")).valueMap()
==>[name:[marko],age:[29]]

// difference
gremlin> g.V().hasLabel("person").union(has("age", lte(29)).store("x"), has("age", gte(29)).store("y")).barrier().union(where(without("x")), where(without("y"))).valueMap()
==>[name:[vadas],age:[27]]
==>[name:[josh],age:[32]]
==>[name:[peter],age:[35]]

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/6b98b0e3-8322-4a56-8cd1-90828e8ca87e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jean-Baptiste Musso

unread,
Feb 17, 2017, 7:14:50 AM2/17/17
to gremli...@googlegroups.com
This is very useful, maybe we could add it to the Gremlin recipes page?


Jean-Baptiste

Craig McClendon

unread,
Jan 2, 2018, 4:15:19 PM1/2/18
to Gremlin-users
I'm struggling to determine how to perform an intersection for a slightly more complex case. 

Using the gremlin console v 3.3.1 and the "Modern" graph.

Creating the graph with this:
gremlin>graph = TinkerFactory.createModern()
gremlin>g = graph.traversal()

I can find all the people that know "vadas" like this:
g.V().hasLabel('person').has('name', 'vadas').in('knows').hasLabel('person').valueMap()


And I can find all the people that created the software "lop" with this:
g.V().hasLabel('software').has('name', 'lop').in('created').hasLabel('person').valueMap(


I can find all the people that know "vadas" OR created "lop" with a union operation:
g.V().union(
g.V().hasLabel('person').has('name', 'vadas').in('knows').hasLabel('person'),
g.V().hasLabel('software').has('name','lop').in('created').hasLabel('person')
).dedup().valueMap()

But I can't figure out how to find all the people that know "vadas" AND created "lop". Essentially I want an INTERSECT operation (I think), but can't quite determine how to work it using aggregate() or other means. 


Robert Dale

unread,
Jan 2, 2018, 4:57:57 PM1/2/18
to gremli...@googlegroups.com
g.V().where(
        and(__.out('knows').has('name','vadas'),
            __.out('created').has('name','lop'))
).valueMap(true)

// can also be shortened without where()
g.V().and(__.out('knows').has('name','vadas'), __.out('created').has('name','lop')).valueMap(true)

// as match()
g.V().match(__.as('a').out('knows').has('name','vadas'), __.as('a').out('created').has('name','lop')).select('a').valueMap(true)





Robert Dale

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.

Stephen Mallette

unread,
Jan 3, 2018, 9:13:35 AM1/3/18
to Gremlin-users
I offered a similar answer to Robert's on SO but included an alternative that should always use indices:


Craig McClendon

unread,
Jan 3, 2018, 10:48:43 AM1/3/18
to Gremlin-users
Thanks Robert, Stephen. 
I didn't intend to cross post it. I had hit the POST button here and it didn't show up for ~20 minutes. I finally gave up and posted to SO instead. 
I appreciate your assistance, still trying to get into thinking in graphs.. 
Cheers. 

Antriksh Shah

unread,
Apr 6, 2018, 2:07:49 AM4/6/18
to Gremlin-users
Hey would union query require a scan over all vertices?

gremlin> g.V().has("name","abc")
==>v[25366056984] 

gremlin> g.V().has("name","pqr")

But when I fire:

gremlin> g.V().union(has("name","abc"), has("name","pqr"))
11:28:07 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

Am I missing something here?

Stephen Mallette

unread,
Apr 6, 2018, 6:16:49 AM4/6/18
to Gremlin-users
JanusGraph doesn't appear to optimize union(). They would need to modify their traversal strategies to do so.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.

Antriksh Shah

unread,
Apr 6, 2018, 6:21:18 AM4/6/18
to Gremlin-users
Facing similar issue with OR query also. Is there any optimised way of performing a union or a boolean OR query?

Stephen Mallette

unread,
Apr 6, 2018, 6:27:28 AM4/6/18
to Gremlin-users
I would have thought that JanusGraph would have optimized all variations of P so maybe:

 g.V().has("name", within("abc","pqr"))

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.

Robert Dale

unread,
Apr 6, 2018, 6:59:42 AM4/6/18
to gremli...@googlegroups.com

Antriksh Shah

unread,
Apr 11, 2018, 2:02:00 AM4/11/18
to Gremlin-users
Hey Robert,

I wanted to clarify one small thing, in continuation to the previous trail.

Is there any way I can find two vertices based on two different properties?

Vertex1 name: abc
Vertex2 age=24

g.V().union(has("name","abc"), has("age","24"))
g.V().or(has("name","abc"), has("age","24"))
g.V().coalesce(has("name","abc"), has("age","24"))
all are returning me WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes


g.V().has("first_name","abc").fold().
  union(unfold(), V().has("last_name","pqr"))

The above works fine, but I wanted to know if this is the best option at the moment.
For my work, I want to implement a coalesce function g.V().coalesce(has("name","abc"), has("age","24"))

Robert Dale

unread,
Apr 11, 2018, 5:24:12 AM4/11/18
to gremli...@googlegroups.com
coalesce() is an exclusive or. It will not get you both vertices.  See also http://tinkerpop.apache.org/docs/current/reference/#coalesce-step

If you don't like fold()/unfold(), you can use inject() to start some traversals where the step is not a start.

In the case where you want both:
g.inject(1).union(V().has("name","abc"),V().has("age","24"))

Or in the case where you want one or the other:
g.inject(1).coalesce(V().has("name","abc"),V().has("age","24"))



Robert Dale

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.

Antriksh Shah

unread,
Apr 11, 2018, 6:51:00 AM4/11/18
to Gremlin-users
Thanks Robert. Worked like a charm. Didn't knew inject(1) use case before. 
Reply all
Reply to author
Forward
0 new messages