Titan, Tinkerpop and Sail


Rob Styles

Nov 7, 2012, 12:18:59 PM
to aureliu...@googlegroups.com
Hi All,

I'm new to Titan, coming from a mostly RDF graph background.

I have an existing Java app that works with RDF, and I had hoped to use Sail as an interim step before moving to TinkerPop.

I've got an HDFS cluster running and have Titan running against HBase.

        Configuration conf = new BaseConfiguration();
        conf.setProperty("storage.backend", "hbase");
        conf.setProperty("storage.hostname", "cluster01.local");
        TitanGraph graph = TitanFactory.open(conf);

This seems to open up a connection just fine, and I have Titan appearing in HBase.

        hbase(main):001:0> status 'titan'
        1 servers, 0 dead, 3.0000 average load

Then I try to wrap the TitanGraph in a GraphSail to get Sail RDF functionality on top of Titan/HBase

        GraphSail sail = new GraphSail(graph);

This fails with an exception as TitanGraph appears to not support sufficient indexing.

Exception in thread "main" java.lang.IllegalArgumentException: Only vertex indexing is supported
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsTransaction.createKeyIndex(TitanBlueprintsTransaction.java:140)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.createKeyIndex(TitanBlueprintsGraph.java:88)
at com.tinkerpop.blueprints.oupls.sail.GraphSail.createIndices(GraphSail.java:152)
at com.tinkerpop.blueprints.oupls.sail.GraphSail.<init>(GraphSail.java:120)
at com.tinkerpop.blueprints.oupls.sail.GraphSail.<init>(GraphSail.java:98)
at App.main(App.java:35)

Is it possible to overcome this?

thanks in advance

rob



Marko Rodriguez

Nov 7, 2012, 12:24:01 PM
to aureliu...@googlegroups.com, Joshua Shinavier
Hi,

> I'm new to Titan, coming from a mostly RDF graph background.

Welcome.

> Exception in thread "main" java.lang.IllegalArgumentException: Only vertex indexing is supported
>
> Is it possible to overcome this?

No. Titan does not support edge indexing: it is simply too expensive to index edges by their labels (and/or named graph property). As such, a faithful mapping to the Sail API is not possible with Titan.

One thing could be asked: is it possible to support a non-faithful mapping that throws that exception if you try to do a SPARQL query of this nature: ?x knows ?y. This could be possible, but I'm not an expert on the GraphSail implementation. Josh Shinavier is the expert (cc'd on this email). Josh: would it be possible to add this to GraphSail in Blueprints?

Thank you for your question Rob and good luck exploring the Aurelius Graph Cluster.

Marko.

http://markorodriguez.com

Rob Styles

Nov 7, 2012, 12:29:21 PM
to aureliu...@googlegroups.com, Joshua Shinavier
Thanks for the swift reply :)

In what way is it too expensive? Storage or performance?

?x knows ?y is the kind of query I'm going to want to do quite a lot of for the kind of analysis I'm doing. I'm also doing more than that, such as

?person1 lived at ?address
?person2 lived at ?address
?person1 != ?person2

and so on.

I already have several of these patterns mapped out as SPARQL queries. Rewriting them in Gremlin would not take too long, I hope.

rob




Marko Rodriguez

Nov 7, 2012, 12:36:10 PM
to aureliu...@googlegroups.com, Joshua Shinavier
Hey,

For those types of queries (global pattern match), you would write them in Gremlin but for execution in Faunus. These are global graph computations and something that Titan is not designed to evaluate. Titan is optimized for short, local neighborhood graph queries (i.e. ego-centric traversals). However, with Faunus, you turn such global scans of your data into a Map/Reduce job using Hadoop.


Faunus is still in 0.1-alpha, with a 0.1 release planned before year's end. So... you would be bleeding edge in that respect. Is the system you currently have being used (i.e. in production)?

HTH,
Marko.


Matthias Broecheler

Nov 7, 2012, 12:44:09 PM
to aureliu...@googlegroups.com

Hey Rob,
To add on to what Marko said, global pattern matching without an anchor vertex (i.e. some starting vertex in the graph) is hard to do at scale without some sort of precomputation. The reason is that you would need to maintain low-selectivity indices, which will break sooner or later. For those queries, Faunus is the right tool. You can store your data in Titan and then use Faunus to execute those queries. This is scalable. If you need more real-time answers, you can use Faunus to precompute a partial answer set.
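
To make the anchored/unanchored distinction concrete, here is a minimal in-memory sketch in plain Java. It is not Titan, Faunus, or Blueprints code; the class and method names are hypothetical, and the map simply stands in for the edges incident to each address vertex. It uses the "two different people lived at the same address" pattern from earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: an in-memory stand-in for a graph of livedAt edges. */
final class AnchoredVsGlobalSketch {
    // address -> people with a livedAt edge pointing at it
    private final Map<String, List<String>> residentsByAddress = new HashMap<>();

    void addLivedAt(String person, String address) {
        residentsByAddress.computeIfAbsent(address, k -> new ArrayList<>()).add(person);
    }

    /** Anchored: starts at one known address and touches only its incident edges. */
    List<String[]> pairsAt(String address) {
        List<String> people = residentsByAddress.getOrDefault(address, List.of());
        List<String[]> pairs = new ArrayList<>();
        for (String p1 : people) {
            for (String p2 : people) {
                if (!p1.equals(p2)) {
                    pairs.add(new String[] {p1, p2});
                }
            }
        }
        return pairs;
    }

    /** Unanchored: with no starting vertex, every address has to be visited. */
    List<String[]> allPairs() {
        List<String[]> pairs = new ArrayList<>();
        for (String address : residentsByAddress.keySet()) {
            pairs.addAll(pairsAt(address));
        }
        return pairs;
    }
}
```

The cost of `pairsAt` depends only on the neighborhood of one address, while `allPairs` grows with the whole data set; that second shape is the kind of work described above as a batch job rather than a real-time query.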

HTH,
Matthias


Rob Styles

Nov 7, 2012, 12:48:28 PM
to aureliu...@googlegroups.com
Thanks Matthias,

The triple stores I've worked with handle this situation by indexing POS and PSO to allow entry by predicate. That would give you the relatively low selectivity indexes.

Are you saying that doesn't scale? Many of the triple stores have problems scaling and most are scale-up solutions rather than scale-out. That's one of the reasons I'm looking at Titan to start with :)

rob




Matthias Broecheler

Nov 7, 2012, 12:58:07 PM
to aureliu...@googlegroups.com

Exactly, PSO and POS won't scale because p has too low selectivity. Imagine storing the census data. That's some 100 million lives-in. Whatever index you build, it will be a hot spot and cannot easily scale out.
Titan was designed for scalability from the ground up. So, these indexes had to go.
We do have plans to support global patterns at scale. These improvements have passed our prototype stage and are being published soon. However, they won't be production ready until summer of 2013 on our current roadmap.
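
The hot-spot argument can be illustrated without any graph database at all. The sketch below is plain Java with made-up data and hypothetical names; it groups the same triples once by predicate (the leading key of a POS or PSO index) and once by subject, and reports the size of the largest group.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration only: triples are {subject, predicate, object} string arrays. */
final class SelectivitySketch {
    /** Returns the size of the largest partition when triples are grouped by the given key. */
    static int largestPartition(List<String[]> triples, Function<String[], String> key) {
        Map<String, Integer> sizes = new HashMap<>();
        int largest = 0;
        for (String[] triple : triples) {
            // merge() returns the updated count for this key
            int size = sizes.merge(key.apply(triple), 1, Integer::sum);
            largest = Math.max(largest, size);
        }
        return largest;
    }
}
```

With 1,000 `livesIn` triples for 1,000 distinct people, grouping by predicate (`t -> t[1]`) puts all 1,000 entries in a single partition, while grouping by subject (`t -> t[0]`) puts one entry in each. Scaled to census size, the single predicate partition is the part that cannot be spread across machines.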
Best,
Matthias


Rob Styles

Nov 7, 2012, 1:03:44 PM
to aureliu...@googlegroups.com
OK, that makes sense, using low-selectivity to solve (or avoid) the distributed join problems.

What I'm doing is not dissimilar to census data in both volume and complexity. My volume is slightly lower, but complexity slightly higher.

The current plan is to convert data into a well-connected graph and use further graph analysis to introduce more connections. Once the data has been made into a rich graph, we plan to create denormalised document-style views in a NoSQL store (Elasticsearch) to serve the application at runtime.

It sounds like I should look at Faunus, though that may be too new for my needs (beta with customers in next few months)

rob



Marko Rodriguez

Nov 7, 2012, 1:24:54 PM
to aureliu...@googlegroups.com
Hi,

> It sounds like I should look at Faunus, though that may be too new for my needs (beta with customers in next few months)

Yes. I would say spring/summer 2013 is when Faunus can be used in a production setting. Right now it is great for offline data science -- analyzing your Titan graph (descriptive statistics, etc.). However, for a complete production workflow in which Titan+Faunus+(Fulgora--something else) interact seamlessly, expect smooth perfection next year.

Take care and good luck with your project,
Marko.



Joshua Shinavier

Nov 8, 2012, 12:04:37 AM
to Marko Rodriguez, aureliu...@googlegroups.com
Hi guys,

The answer is no, GraphSail does not work without edge indexing.  GraphSail uses a combination of graph-based and index-based matchers to answer queries, and can be configured to rely more heavily on the one style than the other depending on the needs of the application.  However, there are two triple (quad) patterns which *require* index-based matching: ?p?? (e.g. the ?x knows ?y example Marko gave, in which ?x and ?y are unbound) and ?p?c.  Yes, in theory we could flag those patterns as unsupported if the graph doesn't support edge indices -- then there would simply be certain (unusual) queries you couldn't get an answer for.  Currently not the case, though.
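
The two index-dependent patterns can be made concrete with a small sketch. The class and method names below are hypothetical and are not part of GraphSail or Blueprints; the sketch only encodes which of the four positions (subject, predicate, object, context) are bound, flags the two patterns named above, and shows what "flag those patterns as unsupported" could look like.

```java
import java.util.Set;

/** Hypothetical helper, not GraphSail code: classifies quad patterns by bound positions. */
final class QuadPatternSketch {
    // The two patterns named in this post as requiring index-based matching.
    private static final Set<String> NEEDS_EDGE_INDEX = Set.of("?p??", "?p?c");

    /** Builds the four-character pattern, e.g. "?p??" when only the predicate is bound. */
    static String patternOf(boolean s, boolean p, boolean o, boolean c) {
        return (s ? "s" : "?") + (p ? "p" : "?") + (o ? "o" : "?") + (c ? "c" : "?");
    }

    static boolean requiresEdgeIndex(String pattern) {
        return NEEDS_EDGE_INDEX.contains(pattern);
    }

    /** Rejects a pattern only when it needs an edge index and the store has none. */
    static void check(String pattern, boolean storeSupportsEdgeIndexing) {
        if (requiresEdgeIndex(pattern) && !storeSupportsEdgeIndexing) {
            throw new UnsupportedOperationException(
                    "pattern " + pattern + " requires edge indexing, which this store lacks");
        }
    }
}
```

Under this encoding, `?x foaf:knows ?y` maps to `?p??` and is flagged, while `?person1 ex:livedAt <http://example.org/someAddress>` maps to `?po?` and is not, which matches the failing and succeeding queries shown below.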

Just to be totally clear, here's an example of a query which would fail on GraphSail-on-Titan:

PREFIX foaf: ...
SELECT * WHERE {
    ?x foaf:knows ?y .
    ?x foaf:name ?xname .
    ?y foaf:name ?yname .
}

So would this:

PREFIX ex: ...
SELECT * WHERE {
    ?person1 ex:livedAt ?address .
    ?person2 ex:livedAt ?address .
    FILTER(?person1 != ?person2)
}

This would succeed:

PREFIX ex: ...
SELECT * WHERE {
    ?person1 ex:livedAt <http://example.org/someAddress> .
    ?person2 ex:livedAt <http://example.org/someAddress> .
    FILTER(?person1 != ?person2)
}

Best,

Josh

Marko A. Rodriguez

Dec 27, 2012, 3:58:27 PM
to Greg McFall, aureliu...@googlegroups.com, jo...@fortytwo.net
Hi,

What would need to happen is that GraphSail would accept graphs that do NOT support edge indexing and then, when a query that requires edge indexing is attempted, throw an appropriate exception:

"The underlying graph store does not support edge indexing and thus, a query of the provided form."

I don't know how easy this would be as this is Josh's territory. But if you make a ticket in Blueprints, that would be the way in which an idea is planted and ultimately, if all goes well, a manifestation is realized.

Thanks Greg,
Marko.


BTW: Do you know the song Rosa Lee McFall? 




On Dec 27, 2012, at 8:14 AM, Greg McFall <gregory...@gmail.com> wrote:

> I would certainly be interested in GraphSail-on-Titan even if it fails for certain unsupported queries.
> How does one request that this feature be added to the backlog?

Greg McFall

Dec 27, 2012, 4:28:51 PM
to aureliu...@googlegroups.com, Greg McFall, jo...@fortytwo.net
I have submitted a ticket in Blueprints.

Joshua Shinavier

Jan 2, 2013, 9:14:56 PM
to Greg McFall, aureliu...@googlegroups.com
Hi Greg,

I have made the necessary changes to GraphSail (in Blueprints master HEAD).  If you try it out with Titan, please let me know how it goes.  You will need to use this form of the 2-arg constructor:

     GraphSail(baseGraph, "");

The "" argument tells GraphSail not to index on any triple patterns via edge properties (the default value being "p,c,pc").  Now, if Titan were to support edge iteration, GraphSail-on-Titan would fully support all SPARQL queries against its store (although some queries might execute slowly).  Since Titan does not support edge iteration, if you submit a query like:

PREFIX foaf: ...
SELECT * WHERE {
    ?x foaf:knows ?y .
    ?x foaf:name ?xname .
    ?y foaf:name ?yname .
}

...the query operation will fail with a Titan-level exception.  Meanwhile, more typical queries (with known subjects and objects in all triple patterns) should succeed.

Best,

Josh

Marko A. Rodriguez

Jan 3, 2013, 11:19:03 AM
to aureliu...@googlegroups.com, Greg McFall
Hey Josh,

Cool stuff man. If this goes well, then with the release of Blueprints 2.3.0, we can say that Titan supports OpenRDF. 

As a side: there was a paper about SPARQL + Hadoop. … Just had a passing thought of Faunus+GraphSail. :)
I think that LARC project also has done something in this vein (…was a ACM Communication paper I believe :? ).

If you ever want to pair on testing stuff with GraphSail/Titan over a multi-machine EC2 cluster, I'm around.

Marko.

Greg McFall

Jan 4, 2013, 11:47:23 AM
to aureliu...@googlegroups.com, Greg McFall
Josh,
I've run into a snag using GraphSail with Titan.

I pulled blueprints-graph-sail-2.3.0-SNAPSHOT from git and had to make a couple of tweaks in order to integrate with Titan.

My version of Titan is using blueprints-core-2.2.0.  To ensure compatibility, I modified the GraphSail POM to use blueprints-core-2.2.0 as well.
This change resulted in a couple of compiler errors.  In two places, I needed to replace

((TransactionalGraph) store.graph).commit();

with 
((TransactionalGraph) store.graph).stopTransaction(Conclusion.SUCCESS); 

I also needed to comment out a number of @Override annotations.

With those changes, I was able to build GraphSail 2.3.0-SNAPSHOT.

Then I wrote the following test program...

Configuration config = new BaseConfiguration();
config.setProperty("storage.backend", "cassandra");
config.setProperty("storage.hostname", "127.0.0.1");
TitanGraph titan = TitanFactory.open(config);
Sail sail = new GraphSail<TitanGraph>(titan, "");
sail.initialize();
 

Unfortunately, this test program is failing with the following stack trace...


Jan 4, 2013 11:04:10 AM com.tinkerpop.blueprints.oupls.sail.GraphSail createTripleIndices
WARNING: no (?s p ?o ?c) index. Certain query operations will be inefficient
Jan 4, 2013 11:04:10 AM com.tinkerpop.blueprints.oupls.sail.GraphSail createTripleIndices
WARNING: no (?s ?p ?o c) index. Certain query operations will be inefficient
Exception in thread "main" java.lang.IllegalArgumentException: Only vertex indexing is supported
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsTransaction.createKeyIndex(TitanBlueprintsTransaction.java:140)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.createKeyIndex(TitanBlueprintsGraph.java:88)
at com.tinkerpop.blueprints.oupls.sail.GraphSail.createTripleIndices(GraphSail.java:404)
at com.tinkerpop.blueprints.oupls.sail.GraphSail.<init>(GraphSail.java:137)
at com.pearson.graph.SailMain.main(SailMain.java:24)

Do you have any idea what might be going wrong?
Is my dependency on blueprints-core 2.2.0 the culprit here?  

Regards,
Greg

Greg McFall

Jan 4, 2013, 3:02:09 PM
to aureliu...@googlegroups.com, Greg McFall
Josh,
In the method GraphSail.createTripleIndices(String tripleIndexes), you check whether the input parameter is null.
I think you also need to check whether it has zero length, and return immediately if true...

        if (tripleIndexes.length() == 0) {
            return;
        }
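
As an aside on why an empty-string guard can matter in Java: `"".split(",")` returns a one-element array containing the empty string, so code that splits first and iterates afterwards sees one blank entry rather than none. Whether that is what happened inside GraphSail is not confirmed by this thread. The sketch below is a hypothetical stand-alone parser, not the GraphSail method; it treats `null` the same as `""`, which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a comma-separated index spec such as "p,c,pc". */
final class TripleIndexSpecSketch {
    static List<String> parse(String tripleIndexes) {
        List<String> patterns = new ArrayList<>();
        // Guard both null and empty/blank input: no patterns means no indices.
        if (tripleIndexes == null || tripleIndexes.trim().isEmpty()) {
            return patterns;
        }
        for (String part : tripleIndexes.split(",")) {
            String pattern = part.trim();
            if (!pattern.isEmpty()) {
                patterns.add(pattern);
            }
        }
        return patterns;
    }
}
```

Here `parse("")` yields an empty list and `parse("p,c,pc")` yields the three patterns, so a caller can loop over the result without a separate special case.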

I made this change and got past the first problem.  But now I am running into another problem.
I am using a SailConnection as shown in the following snippet...

SailConnection g = sail.getConnection();
ValueFactory f = sail.getValueFactory();
URI alice = f.createURI("http://example.com/alice");
URI bob = f.createURI("http://example.com/bob");
URI likes = f.createURI("http://example.com/vocab/likes");
URI status = f.createURI("http://example.com/vocab/status");
URI aliceStatus = f.createURI("http://example.com/alice/status");
g.addStatement(alice, likes, bob);
g.addStatement(alice, status, aliceStatus);
CloseableIteration<? extends Statement,SailException> sequence = g.getStatements(alice, null, null, false);
while (sequence.hasNext()) {
    Statement s = sequence.next();
    ...
}

The method call sequence.next() is producing a NullPointerException with the following stack trace...

Exception in thread "main" java.lang.NullPointerException
at com.tinkerpop.blueprints.oupls.sail.GraphSailConnection.toSesame(GraphSailConnection.java:756)
at com.tinkerpop.blueprints.oupls.sail.GraphSailConnection.fillStatement(GraphSailConnection.java:644)
at com.tinkerpop.blueprints.oupls.sail.GraphSailConnection.access$5(GraphSailConnection.java:641)
at com.tinkerpop.blueprints.oupls.sail.GraphSailConnection$StableStatementIteration.next(GraphSailConnection.java:631)
at com.tinkerpop.blueprints.oupls.sail.GraphSailConnection$StableStatementIteration.next(GraphSailConnection.java:1)
at info.aduna.iteration.IterationWrapper.next(IterationWrapper.java:71)
at com.pearson.graph.SailMain.main(SailMain.java:45)

Do you have any insight into this problem?

~ Greg

Joshua Shinavier

Apr 7, 2013, 4:16:10 AM
to aureliu...@googlegroups.com, Greg McFall
Hi Ivan,

I didn't add that note to Wikipedia, but it is correct: you can now load RDF data into Titan and query it using SPARQL, if you use the right combo of constructor and Titan back end.  However, GraphSail-on-Titan does not pass the full suite of Sail tests (for any of the three back ends), so I'm not going to call it a triple store just yet.  That is likely to change in the coming weeks, as the necessary changes are probably not very difficult.

Note: for best results, use GraphSail 2.4.0-SNAPSHOT with Titan; I have just committed some changes which improve the behavior of the "no edge indices" GraphSail on Cassandra and BDB, e.g.

        conf.setProperty("storage.backend", "cassandra");
        conf.setProperty("storage.hostname", "127.0.0.1");
        TitanGraph g = TitanFactory.open(conf);
        GraphSail sail = new GraphSail(g, "");

GraphSail-on-Titan-on-Cassandra is mostly compliant w.r.t. the Sail tests when the default edge indices are used, only slightly less so when edge indices are disabled.  Both GraphSail-on-Titan-on-HBase and GraphSail-on-Titan-on-BDB are almost completely compliant with edge indices disabled, but do not work with edge indices enabled.

HTH.

Josh



On Sat, Apr 6, 2013 at 6:17 AM, Ivan Balepin <csm...@gmail.com> wrote:
> Bump - now with 0.3.0 and edge indexing out, curious if Sail and SPARQL work with Titan. I keep getting "Only vertex indexing is supported" with GraphSail on Titan 0.2.1. Titan is listed as supporting SPARQL in Wikipedia.


On Friday, January 4, 2013, 12:02:09 UTC-8, Greg McFall wrote:


Sreekanth S

Oct 13, 2017, 9:03:03 AM
to Aurelius
Hi,

Now that Titan has been taken over by JanusGraph, does TinkerPop GraphSail support JanusGraph? I am working with JanusGraph now. Is it possible to load an RDF/OWL file directly into JanusGraph?

Please advise. Thanks in advance.