[titan-0.5.0] - query by vertex label possible?


Damir Vandic

Aug 15, 2014, 9:26:20 PM
to aureliu...@googlegroups.com
Hi,

Is it possible to efficiently get vertices by vertex label? (i.e., through some index) I don't see it mentioned anywhere in the docs.

Thanks.

Damir

Matthias Broecheler

Aug 15, 2014, 9:59:57 PM
to aureliu...@googlegroups.com

No, it is not, because this would not be efficient. Imagine a database with a billion "people" - how would you efficiently retrieve those?

--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/5cac4934-c5cd-4482-a28b-63f3a6535d1d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Damir Vandic

Aug 16, 2014, 5:17:04 AM
to aureliu...@googlegroups.com
Yes that makes sense, but I wanted to use it in conjunction with other constraints. For example, querying by the name attribute but only searching for Persons and not, let's say, Companies. I guess I will just use an extra "type" attribute. 

Thank you for your answer.

Kind regards,

Damir Vandic | Ontostream



Matthias Broecheler

Aug 16, 2014, 6:29:18 AM
to aureliu...@googlegroups.com

Oh, I misunderstood. This is absolutely possible. You can define an index for the name property key and then use the indexOnly(person) call to build this index for person vertices only. Then, calls involving g.query().has("name","xyz").has("label","person") will be very fast.
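
To make the idea concrete, here is a minimal sketch in plain Java of what a label-restricted index does. It is only a model and does not use Titan's API: the class names (LabelRestrictedIndexSketch, NameIndex, Vertex) are made up for illustration. The index on "name" only admits vertices with one label, so a lookup by name is a single map access and never touches vertices of other labels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabelRestrictedIndexSketch {

    /** Hypothetical stand-in for a vertex: an id, a label, and one "name" property. */
    static final class Vertex {
        final long id;
        final String label;
        final String name;

        Vertex(long id, String label, String name) {
            this.id = id;
            this.label = label;
            this.name = name;
        }
    }

    /** Index on "name" that only admits vertices with a single label (the indexOnly idea). */
    static final class NameIndex {
        private final String onlyLabel;
        private final Map<String, List<Vertex>> byName = new HashMap<>();

        NameIndex(String onlyLabel) {
            this.onlyLabel = onlyLabel;
        }

        /** Called on every write; vertices with any other label are never indexed. */
        void add(Vertex v) {
            if (onlyLabel.equals(v.label)) {
                byName.computeIfAbsent(v.name, k -> new ArrayList<>()).add(v);
            }
        }

        /** Models has("name", name).has("label", onlyLabel) as one map lookup. */
        List<Vertex> lookup(String name) {
            return byName.getOrDefault(name, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        NameIndex personByName = new NameIndex("person");
        personByName.add(new Vertex(1, "person", "xyz"));
        personByName.add(new Vertex(2, "company", "xyz")); // skipped: wrong label
        personByName.add(new Vertex(3, "person", "abc"));

        for (Vertex v : personByName.lookup("xyz")) {
            System.out.println(v.id + " " + v.label + " " + v.name);
        }
    }
}
```

Note that this kind of index answers "name AND label" together. It does not help with a query on the label alone, because there is no key to look up.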

Damir Vandic

Aug 16, 2014, 11:08:48 AM
to aureliu...@googlegroups.com
Oh, indeed, that's nice! However, in the case where you have 1 billion vertices in total and of those you have 1000 Product vertices, what would your recommended approach be if you just want to fetch all product vertices? Still use a custom nodeType field? (I'm assuming that g.has("label", "Product") has to scan all 1 billion vertices.)

Kind regards,

Damir Vandic | Ontostream



Matthias Broecheler

Aug 16, 2014, 12:14:00 PM
to aureliu...@googlegroups.com

We were thinking about altering this but then thought that the likelihood for abuse is too high, i.e., people will use it on small datasets, think that it works, and then notice issues when they try to scale.

If you feel strongly about this, please file an issue and we will reconsider in the future.

Damir Vandic

Aug 16, 2014, 1:25:34 PM
to aureliu...@googlegroups.com
Sure, no problem. By the way: is this also the reason why you recommend using mixed indices for unselective (low-cardinality) queries/fields? I guess you mean Elasticsearch and not Lucene, as Lucene is not restricted to one machine.

Kind regards,

Damir Vandic | Ontostream



dmill

Aug 17, 2014, 3:49:06 AM
to aureliu...@googlegroups.com
I'm also interested in this subject, as I'm looking into a use case similar to the one Damir described.
I will be using labels and indexOnly to properly index some properties and will also need to retrieve (in some instances) all "1000 Product" entries.

Would the correct data model for such a case (or at the very least an alternative in this situation) be to create an indexed Product super node with unidirectional labeled edges towards the products? I guess that would be akin to what we had to do before vertex labels. It makes this slightly redundant, though this time around we have the added benefit of indexOnly().

Thanks.  

Matthias Broecheler

Aug 19, 2014, 2:15:59 AM
to aureliu...@googlegroups.com

For labels, the unidirected edge goes the other way. But if you know there will only be a small number of products, you can create a dedicated property or one product root vertex that points to all products.
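
As a rough illustration of the root-vertex alternative, here is a small plain-Java model (not Titan code; ProductRootSketch and its methods are hypothetical names) that counts how many vertices each approach has to examine. A label-only query with no index examines every vertex, whereas following the root vertex's outgoing edges examines only the products.

```java
import java.util.ArrayList;
import java.util.List;

public class ProductRootSketch {

    /** Hypothetical stand-in for a vertex with a label and outgoing edges. */
    static final class Vertex {
        final String label;
        final List<Vertex> out = new ArrayList<>();

        Vertex(String label) {
            this.label = label;
        }
    }

    /** Full scan: every vertex is examined, whatever its label. Returns the number examined. */
    static int scanByLabel(List<Vertex> allVertices, String label, List<Vertex> result) {
        int examined = 0;
        for (Vertex v : allVertices) {
            examined++;
            if (label.equals(v.label)) {
                result.add(v);
            }
        }
        return examined;
    }

    /** Root-vertex lookup: only the root's outgoing edges are followed. */
    static int fetchViaRoot(Vertex root, List<Vertex> result) {
        result.addAll(root.out);
        return root.out.size();
    }

    public static void main(String[] args) {
        List<Vertex> graph = new ArrayList<>();
        Vertex productRoot = new Vertex("productRoot");
        graph.add(productRoot);

        for (int i = 0; i < 100_000; i++) {
            graph.add(new Vertex("person"));
        }
        for (int i = 0; i < 1_000; i++) {
            Vertex product = new Vertex("product");
            graph.add(product);
            productRoot.out.add(product); // edge added when the product is created
        }

        List<Vertex> viaScan = new ArrayList<>();
        List<Vertex> viaRoot = new ArrayList<>();
        System.out.println("scan examined: " + scanByLabel(graph, "product", viaScan));
        System.out.println("root examined: " + fetchViaRoot(productRoot, viaRoot));
        System.out.println("same result size: " + (viaScan.size() == viaRoot.size()));
    }
}
```

The trade-off is that the root vertex's edge list grows with the number of products, which is why this only makes sense when that number is known to stay small.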


Asaf Shakarzy

Aug 19, 2014, 5:19:18 AM
to aureliu...@googlegroups.com
Guys,

Back to this,

How come a query such as g.query().has("label", "product").vertices() does a full graph scan?

(I guess there is not really a "label" property on the vertex, right?) Either way, how come Titan doesn't find the label vertex "product" and retrieve all products associated with it?


With 1 billion products it makes no sense, but with thousands of vertices per label it makes more sense, and I don't see why we need to go through a full graph scan.

(By the way, I tried to retrieve the vertices from the label vertex, but since the edges are unidirectional I couldn't find a way to do that either.)

Thanks

dmill

Aug 21, 2014, 8:45:45 AM
to aureliu...@googlegroups.com
Asaf: Damir and Matthias were discussing this earlier.
I'm a little confused about that discussion, however. Maybe someone can clear it up?
For the query and index mentioned above (indexOnly(person) on the name index), g.query().has("name","xyz").has("label","person"): does this retrieve the name:xyz + label:person set directly through the index, or does it first retrieve by name and then scan the results to find the correctly labeled ones?
Thanks in advance

Daniel Kuppitz

Aug 21, 2014, 9:53:35 AM
to aureliu...@googlegroups.com
Hi,

> Does this retrieve all name:xyz + label:person set through indexing. Or does it first retrieve by name then scan all of them to find the correctly labeled ones?

It's a composite index, so it retrieves the name:xyz + label:person set directly from the index.

Cheers,
Daniel



Asaf Shakarzy

Aug 21, 2014, 10:45:48 AM
to aureliu...@googlegroups.com


What I was trying to say is that querying by g.query('label', 'foo') causes a full scan, while Titan could simply retrieve those vertices through the edges that connect them to the label vertex.

This requires me to add another root vertex attached to all vertices of the 'foo' label which is very annoying.

(Unless I'm missing something here)

Liu Yiming

Jul 5, 2015, 10:10:11 AM
to aureliu...@googlegroups.com
Hi, could you tell me how to achieve the same goal through the Java API?

titanGraph.query().has("label","person") ?

I don't think this will work for Java. What if the vertex has a property called "label"?

Thanks,

Yiming

On Saturday, August 16, 2014 at 6:29:18 PM UTC+8, Matthias wrote:

Daniel Kuppitz

Jul 5, 2015, 10:27:16 AM
to aureliu...@googlegroups.com
> What if the vertex has a property called "label"?

That's not possible, label is reserved and cannot be used as a property name.

Cheers,
Daniel


Liu Yiming

Jul 5, 2015, 10:32:40 AM
to aureliu...@googlegroups.com
I see, thanks very much.

Yiming

On Sunday, July 5, 2015 at 10:27:16 PM UTC+8, Daniel Kuppitz wrote:

Jordi Aranda

Dec 1, 2016, 10:58:42 AM
to Aurelius
Does this still work as commented above? (i.e. does a query by label imply a full scan?). I was profiling some queries and it seems nodes can be retrieved by label efficiently:

gremlin> g.V().hasLabel('person').profile().cap(TraversalMetrics.METRICS_KEY)
16:55:30 WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(~label = person)]. For better performance, use indexes
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TitanGraphStep([~label.eq(person)])                                   25          25          23.041    99.26
  optimization                                                                                 0.033
  scan                                                                                         0.000
SideEffectCapStep([~metrics])                                          1           1           0.172     0.74
                                            >TOTAL                     -           -          23.214        -

Obviously, I have different types of nodes, not only of type 'person'. From the profile output, it seems like there are only n hits in the storage backend, n being the number of nodes of type 'person' in the graph (and not N, the total number of nodes in the graph, which is what I understand other users were suggesting).

Best,

Nikolai Grigoriev

May 18, 2017, 3:27:13 PM
to Aurelius, daniel....@shoproach.com
In Gremlin you would use "has(T.label, "vertex_label")".