[Blueprints] RFC: Added methods to GraphQuery and VertexQuery

Marko Rodriguez

May 22, 2013, 5:08:40 PM

to gremli...@googlegroups.com

Hi,

*** EXPERIMENTAL FEATURE -- NO RELEASE DATE -- REQUEST FOR COMMENT ***

I have been working with Luca (of OrientDB) for the last couple of days, and we have added some more methods to GraphQuery and VertexQuery that provide richer semantics for querying a graph/vertex according to the underlying support for graph-centric and vertex-centric indices. I would like to get people's thoughts on these methods. The work is currently in the queryfeature/ branch, available here:

https://github.com/tinkerpop/blueprints/blob/queryfeature/blueprints-core/src/main/java/com/tinkerpop/blueprints/Query.java

Here are the new methods:

has(key) = "return those elements that have a property with the provided key."

hasNot(key) = "return those elements that do not have a property with the provided key."

has(key,values…) = "return those elements that have their property equal to any of the var arg values."

has(key, compare, values…) = "return those elements that have their property comparable to any of the var arg values."

limit(skip, total) = "skip the first X elements and then return a total of Y number of elements."

My personal notes:

1. I've noticed has(key,NOT_EQUAL,null) in numerous client code bases and this is bad as we don't want to make "null" be a token for "emptiness."

- hence the inclusion of has(key) and hasNot(key).

- also, with var args in has(), "null" as an Object becomes ambiguous.

2. While developing Faunus, has(key,values…) was needed to make ORing efficient in MapReduce. It's generally pleasing as it moves us away from filter{ it.value == x || it.value == y || …}.

- we now have AND/OR semantics in querying -- has()-chaining is AND and has(values…) is OR.

- these method signatures, while not "queries" themselves, will find their way through to Gremlin, as Gremlin may compile such chains into the respective queries as needed.

3. OrientDB's SKIP/LIMIT functionality is very handy for paging results and for those databases that support non-iterative "jumping," this will be efficient.
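The var-args ambiguity around "null" mentioned in note 1 can be seen in plain Java. The following is a minimal sketch only; `VarArgsNull` and `countValues` are hypothetical names for illustration and are not part of Blueprints.

```java
// Hypothetical helper, not part of Blueprints: shows how Java binds null to a var-args parameter.
final class VarArgsNull {
    // Returns -1 when the var-args array itself is null, otherwise the number of values passed.
    static int countValues(String key, Object... values) {
        return values == null ? -1 : values.length;
    }

    public static void main(String[] args) {
        System.out.println(countValues("name"));                  // no values: an empty array
        System.out.println(countValues("name", (Object) null));   // one value, which is null
        System.out.println(countValues("name", (Object[]) null)); // the array itself is null
    }
}
```

Without a cast, a bare `null` argument compiles with a warning and is treated as the null array, not as a single null value, which is why "null" makes a poor token for "no such property."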

Note that not all databases will support all of these efficiently and, at minimum, these will be implemented with linear scans. As always, Blueprints provides default implementations for those databases that don't have anything more intelligent than what can be accomplished via the Blueprints-specific methods. Finally, these default implementations can be extended as needed (e.g. OrientGraphQuery/OrientVertexQuery) to weave in database-specific optimizations that implement the semantics efficiently.
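To make the linear-scan fallback concrete, here is a rough sketch of the semantics described above. It is not the code in DefaultGraphQuery or DefaultVertexQuery: `LinearScanQuery` is a hypothetical class, elements are modeled as plain property maps, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for a default (non-indexed) query; elements are plain property maps.
final class LinearScanQuery {
    private final List<Map<String, Object>> elements;

    LinearScanQuery(List<Map<String, Object>> elements) {
        this.elements = elements;
    }

    // Every filter is a full pass over the elements: O(n), no index involved.
    private LinearScanQuery scan(Predicate<Map<String, Object>> test) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> element : elements) {
            if (test.test(element)) kept.add(element);
        }
        return new LinearScanQuery(kept);
    }

    // has(key): keep elements that carry the property at all.
    LinearScanQuery has(String key) {
        return scan(e -> e.containsKey(key));
    }

    // hasNot(key): keep elements that lack the property.
    LinearScanQuery hasNot(String key) {
        return scan(e -> !e.containsKey(key));
    }

    // has(key, values...): OR semantics, the property must equal at least one value.
    LinearScanQuery has(String key, Object... values) {
        List<Object> allowed = Arrays.asList(values);
        return scan(e -> e.containsKey(key) && allowed.contains(e.get(key)));
    }

    // limit(skip, total): skip the first `skip` elements, then keep at most `total`.
    LinearScanQuery limit(int skip, int total) {
        int from = Math.min(skip, elements.size());
        int to = (int) Math.min((long) from + total, elements.size());
        return new LinearScanQuery(new ArrayList<>(elements.subList(from, to)));
    }

    int count() {
        return elements.size();
    }

    private static Map<String, Object> element(String name, String location) {
        Map<String, Object> props = new HashMap<>();
        props.put("name", name);
        if (location != null) props.put("location", location);
        return props;
    }

    // Four made-up elements; the last one has no "location" property.
    static LinearScanQuery sample() {
        return new LinearScanQuery(Arrays.asList(
                element("marko", "usa"),
                element("daniel", "germany"),
                element("luca", "italy"),
                element("anonymous", null)));
    }
}
```

For example, `LinearScanQuery.sample().has("location", "usa", "italy").count()` scans all four elements and keeps two. A database with a suitable index could answer the same call without touching every element, which is the point of letting vendors override the defaults.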

https://github.com/tinkerpop/blueprints/blob/queryfeature/blueprints-core/src/main/java/com/tinkerpop/blueprints/util/DefaultGraphQuery.java

https://github.com/tinkerpop/blueprints/blob/queryfeature/blueprints-core/src/main/java/com/tinkerpop/blueprints/util/DefaultVertexQuery.java

Thank you,

Marko.

http://markorodriguez.com

Daniel Kuppitz

May 22, 2013, 5:25:10 PM

to gremli...@googlegroups.com

Hi Marko,

has(key, compare, values…) = "return those elements that have their property comparable to either of the var arg values."

Useful example for this one? Looks like YAGNI to me.

Cheers,

Daniel

Marko Rodriguez

May 22, 2013, 5:54:43 PM

to gremli...@googlegroups.com

Hi,

has(key, compare, values…) = "return those elements that have their property comparable to either of the var arg values."

Useful example for this one?

"Give me all people that are over 30 years old and whose location is either USA, Germany, or Italy."

g.query().has('location','usa','germany','italy').has('age',GREATER_THAN,30)

In Gremlin, the following is (will be) compiled down to that query:

g.V.has('location','usa','germany','italy').has('age',T.gt,30)

has()-chaining = AND

has(var args) = OR
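The AND/OR rule can be sketched with plain Java predicates. This is only an illustration of the semantics, not how Blueprints or Gremlin implement it; `AndOrSketch`, `Person`, `locationIn`, and `olderThan` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of "has()-chaining = AND, has(var args) = OR".
final class AndOrSketch {
    static final class Person {
        final String location;
        final int age;

        Person(String location, int age) {
            this.location = location;
            this.age = age;
        }
    }

    // has('location', values...) as one predicate: true if the location equals any value (OR).
    static Predicate<Person> locationIn(String... values) {
        List<String> allowed = Arrays.asList(values);
        return p -> allowed.contains(p.location);
    }

    // has('age', GREATER_THAN, value) as one predicate.
    static Predicate<Person> olderThan(int age) {
        return p -> p.age > age;
    }

    // Chaining has() calls corresponds to and(): every step must hold.
    static boolean matches(Person p) {
        return locationIn("usa", "germany", "italy").and(olderThan(30)).test(p);
    }
}
```

Under this reading, a 35-year-old in Germany matches, while a 30-year-old in Germany does not (GREATER_THAN is strict) and a 40-year-old in France does not (no location value matches).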

Looks like YAGNI to me.

What is YAGNI?

Thoughts?,

Marko.

http://thinkaurelius.com

Daniel Kuppitz

May 22, 2013, 6:39:21 PM

to gremli...@googlegroups.com

g.V.has('location','usa','germany','italy').has('age',T.gt,30)

Hm, that's this one: has(key,values…) and has(key, compare, value), right?

I mean this one: has(key, compare, values…) - why multiple values?

What is YAGNI?

YAGNI: You aren't gonna need it. Or: We don't have a use case for this, but maybe someone else will have one.

And I have one more suggestion for limit(skip, total). I prefer the .NET/LINQ way: .skip(x).take(y)

The problem with limit(x, y) is that I always need to ask myself: "What was limit(x, y)? Skip x and take y, or take from x to y?" Splitting into skip and take also has the advantage that you have a short form for limit(0, x) and limit(x, ?Integer.MAX_VALUE?).

Cheers,

Daniel

Marko Rodriguez

May 22, 2013, 6:54:16 PM

to gremli...@googlegroups.com

Hi,

Hm, that's this one: has(key,values…) and has(key, compare, value), right?
I mean this one: has(key, compare, values…) - why multiple values?

That is a very good point. However, it runs into the situation of:

has("country",NOT_EQUAL,"usa","germany","italy")

"All people not in USA, Germany, Italy." Though this will be difficult for indices to use, as it's hard to say "NOT" with an index call.

I would be more than happy to remove has(key,compare,values…) and only have var args on has(key,values…) (i.e. EQUALS).

WDYT?

Good thoughts Daniel,

Marko.

http://thinkaurelius.com

Marko Rodriguez

May 22, 2013, 7:14:08 PM

to gremli...@googlegroups.com

Hi,

I just removed the var args on has(key,compare,values…).

One thing we could do is merge has(key) into has(key, values…). Likewise for hasNot(key).

Meaning:

has(key) is equivalent to has(key, values…) with no values provided. Thus, if there are no values, it asks, "does the property exist?"

hasNot(key) is equivalent to hasNot(key, values…) with no values provided. Thus, if there are no values, it asks, "does the property not exist?"

I'm a bit anti-hasNot, but again, it's something I see creeping up a lot in people's codebases:

filter{!it.getPropertyKeys().contains('location')}

Thoughts?,

Marko.

http://markorodriguez.com

Marko Rodriguez

May 22, 2013, 7:30:36 PM

to gremli...@googlegroups.com

Hi Daniel (everyone),

Check this out now:

https://github.com/tinkerpop/blueprints/blob/queryfeature/blueprints-core/src/main/java/com/tinkerpop/blueprints/Query.java

There is now:

has(key, values…)

hasNot(key, values…)

has(key, compare, value)

If no values… are provided then it's a wildcard -- does the property exist or not?

As Daniel mentioned, has(key,compare,values…) is ?YAGNI?, as comparators (besides == and !=) are "sweeping" in that you will likely never multi-compare. has() provides EQUALS semantics, but not NOT_EQUAL. Thus, hasNot() is provided, which also now (along with has(key,values…)) gives you key-existence checking.
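A small sketch of the wildcard behavior for a single element, modeled as a property map. `WildcardHas` is a hypothetical class, and treating hasNot(key, values…) as the exact negation of has(key, values…) is an assumption of this sketch; the thread does not spell out how hasNot treats an element that lacks the key.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical per-element checks; not the Blueprints implementation.
final class WildcardHas {
    // has(key, values...): with no values, only the key's existence is checked (wildcard).
    static boolean has(Map<String, Object> element, String key, Object... values) {
        if (!element.containsKey(key)) return false;
        return values.length == 0 || Arrays.asList(values).contains(element.get(key));
    }

    // ASSUMPTION: hasNot(key, values...) is the exact negation of has(key, values...),
    // so an element without the key satisfies hasNot for any values.
    static boolean hasNot(Map<String, Object> element, String key, Object... values) {
        return !has(element, key, values);
    }
}
```

With an element whose only property is location = "usa", `has(e, "location")` and `has(e, "location", "usa", "italy")` both hold, while `hasNot(e, "age")` and `hasNot(e, "location", "germany")` hold under the negation assumption.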

Finally, this bleeds perfectly into Gremlin's API (and, for var args, Faunus' Gremlin implementation).

Thoughts?,

Marko.

http://markorodriguez.com

Daniel Kuppitz

May 22, 2013, 7:45:07 PM

to gremli...@googlegroups.com

Great, just as I would have suggested it in my next mail.

What do you think about limit vs. skip/take?

Cheers,

Daniel

Marko Rodriguez

May 22, 2013, 7:54:53 PM

to gremli...@googlegroups.com

Hi,

Great, just as I would have suggested it in my next mail.

Cool. This makes me happy. I think these are nice additions and will make Blueprints ?2.4.0? all the prettier.

If you provide me your @author signature I can add you to the interface /** **/ JavaDoc for your contribution. E.g.

/**
 * @author Marko A. Rodriguez (http://markorodriguez.com)
 */

What do you think about limit vs. skip/take?

I don't like it as two methods --- skip(x).take(y). Why?

g.query().has('name','marko').skip(10).has('age',T.gt,22).take(4)

With two methods, and Query being a fluent-interface, you can get into weird behaviors as such -- the two methods that work together being split by another method in the chain.
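The ambiguity can be reproduced with Java streams, where each call does apply exactly where it is written. This is an analogy only: `SkipTakeOrder` is a hypothetical class, and the two integer filters merely stand in for the two has() steps in the chain above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration: the same four steps give different results under two readings.
final class SkipTakeOrder {
    // Reading 1: each call applies where it is written, so skip() runs before the second filter.
    static List<Integer> positional() {
        return IntStream.rangeClosed(1, 20).boxed()
                .filter(n -> n % 2 == 0)   // stands in for the first has()
                .skip(3)
                .filter(n -> n > 10)       // stands in for the second has()
                .limit(2)
                .collect(Collectors.toList());
    }

    // Reading 2: all filters are gathered first, then skip/take page the final result.
    static List<Integer> filtersFirst() {
        return IntStream.rangeClosed(1, 20).boxed()
                .filter(n -> n % 2 == 0)
                .filter(n -> n > 10)
                .skip(3)
                .limit(2)
                .collect(Collectors.toList());
    }
}
```

Here `positional()` yields [12, 14] while `filtersFirst()` yields [18, 20]. A single limit(skip, take) call leaves only the second reading available, so the question never comes up.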

I did do this --- I changed the Query API to have the variables named as such:

limit(long take)

limit(long skip, long take)
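As a sketch of how the two overloads relate, and of how they cover paging, consider the following. `Paging` and `page` are hypothetical names, and the methods operate on an in-memory list, not on a Blueprints Query.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helpers showing the relationship between the two limit() overloads.
final class Paging {
    // limit(skip, take): drop the first `skip` results, then return at most `take`.
    static <T> List<T> limit(List<T> results, long skip, long take) {
        return results.stream().skip(skip).limit(take).collect(Collectors.toList());
    }

    // limit(take): shorthand for limit(0, take).
    static <T> List<T> limit(List<T> results, long take) {
        return limit(results, 0, take);
    }

    // Page numbers start at 0; each page holds up to `pageSize` results.
    static <T> List<T> page(List<T> results, long pageNumber, long pageSize) {
        return limit(results, pageNumber * pageSize, pageSize);
    }
}
```

Naming the parameters skip and take settles the "skip x and take y, or take from x to y?" question at the signature level: the second argument is a count, not an end position.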

Thoughts?,

Marko.

http://thinkaurelius.com

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Daniel Kuppitz

May 22, 2013, 8:21:45 PM

to gremli...@googlegroups.com

Hey,

Cool. This makes me happy. I think these are nice additions and will make Blueprints ?2.4.0? all the prettier.

If you provide me your @author signature I can add you to the interface /** **/ JavaDoc for your contribution. E.g.

Nice. Simply @author Daniel Kuppitz <daniel....@shoproach.com>

I don't like it as two methods --- skip(x).take(y). Why?

g.query().has('name','marko').skip(10).has('age',T.gt,22).take(4)

With two methods, and Query being a fluent-interface, you can get into weird behaviors as such -- the two methods that work together being split by another method in the chain.

Ok, you're right. It'll probably add more problems than it solves.

I did do this --- I changed the Query API to have the variables named as such:

limit(long take)
limit(long skip, long take)

It doesn't help in the Gremlin REPL (and hopefully I never need to do something in Java :)), but it's absolutely ok. Thanks for taking my suggestions into account.

Cheers,

Daniel

Daniel Kuppitz

May 23, 2013, 4:26:00 AM

to gremli...@googlegroups.com

Hey Marko,

it's often good to sleep on it.

There's currently one more case besides equality and inequality comparison where someone could need multiple args: CONTAINS

// all documents containing the words aurelius and titan

g.query().has('type','document').has('text', CONTAINS, 'aurelius', 'titan')...

I don't know anything about future plans for fulltext integration, but IMO has() is not made for fulltext queries. A simple fulltext query implementation should also give me the option to query over multiple fields/properties (that's at least what I usually do with fulltext queries), something like:

// find all products where either the title, description or sku contains foo or bar

g.query().match(['title','description','sku'], 'foo', 'bar')

Cheers,

Daniel

On Thursday, May 23, 2013 at 01:30:36 UTC+2, Marko A. Rodriguez wrote:

Daniel Kuppitz

May 23, 2013, 7:02:24 AM

to gremli...@googlegroups.com

Never mind. There's no CONTAINS and/or fulltext search in Blueprints. I've mixed up a few things here.

Cheers,

Daniel

Marko Rodriguez

May 23, 2013, 9:26:57 AM

to gremli...@googlegroups.com

Hi,

Ultimately, we would like to get such comparators as:

CONTAINS

WITHIN

…

…into Blueprints. A simple way would be to add them to the Compare enum in Query.
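A rough sketch of what "adding a comparator to the enum" could look like. `SketchCompare` is a hypothetical enum written for this illustration; it is not Blueprints' actual Compare enum, and the substring semantics chosen for CONTAINS are an assumption.

```java
// Hypothetical comparator enum for illustration; it is not Blueprints' actual Compare enum.
enum SketchCompare {
    EQUAL {
        boolean evaluate(Object property, Object value) {
            return property == null ? value == null : property.equals(value);
        }
    },
    NOT_EQUAL {
        boolean evaluate(Object property, Object value) {
            return !EQUAL.evaluate(property, value);
        }
    },
    // ASSUMED semantics: substring match on the string form of the property.
    CONTAINS {
        boolean evaluate(Object property, Object value) {
            return property != null && value != null
                    && property.toString().contains(value.toString());
        }
    };

    // Each constant decides whether a property value satisfies the comparison.
    abstract boolean evaluate(Object property, Object value);
}
```

With this shape, a new comparator is one more enum constant, and a has(key, compare, value) signature would not need to change.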

Regarding match(). That is nice. However, then you need within(). Then you need ????… You start to bloat out the Query methods and you then need to worry about being exhaustive.

Marko.

http://markorodriguez.com

Daniel Kuppitz

May 23, 2013, 9:41:29 AM

to gremli...@googlegroups.com

Don't you think that fulltext queries are the only special case? I mean, location queries can still be done with the default has (has('location', WITHIN|OUTSIDE, Rectangle|Polygon)). IMO only fulltext queries need special handling, because they are so complex and often used with multiple properties (actually, I have never done a fulltext search over a single field/property).