Titan gem


Ilya Kardailsky

Mar 13, 2014, 3:18:55 PM
to pacer...@googlegroups.com
Hey guys,

I've been playing around with Pacer recently; it's a great library. Good to see people talking about using it with Titan. After reading Mark's post about his experiences, I managed to get Titan 0.4.2 running in my Ruby app in a reasonably stable manner. Here is an updated pacer-titan gem that you can load up and use without too much trouble; I've put it up on RubyGems too. It requires the latest bleeding-edge Pacer for Blueprints 2.4.0 compatibility with Titan.

It's my first gem, so it's a bit rough around the edges, and I've had some issues. Sorry about the wall of text:

I ran into the same problem Mark described with class loading when starting 'embedded' storage and index backends. I ended up not including any of the backend classes in the gem's pom; for now, the user can load the extra jars and dependencies for whatever backend they choose, which works well using jbundler. Cassandra Thrift works fine if you keep titan-cassandra in the gem's jar, but Elasticsearch crashes, so I've kept that out of the manifest too in favour of including it via jbundler.
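
For anyone trying this, a Jarfile along these lines is roughly what I mean. It is only a sketch: the Maven coordinates and the 0.4.2 version are assumptions to check against the Titan release you are actually running, and you would list only the backends you use.

```ruby
# Jarfile (read by jbundler). Coordinates below are assumed, not taken from the gem.
# Storage backend, loaded by the application instead of the gem:
jar 'com.thinkaurelius.titan:titan-cassandra', '0.4.2'
# External index backend, kept out of the gem's own jar:
jar 'com.thinkaurelius.titan:titan-es', '0.4.2'
```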

Any ideas about improving class loading in a gem, and should I include all backend support jars anyway?

I used the monkey patches for enabling Pacer indexing that were mentioned in an issue on GitHub. Index lookups via Pacer are still a bit slower than calling Titan's index query methods via its blueprints_graph and then converting the results to a Pacer route: by a factor of 10 in my testing (10 milliseconds instead of 1 millisecond, mind you). Hitting an external index causes a full graph scan, however.

I've added a couple helper methods to do index lookups using Titan's native GraphCentricQueryBuilder and convert to a Pacer route: g.query{ has('property', 'value').has('text', Text::CONTAINS, 'value') }.out...
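
To show the shape of a block-style helper like that, here is a minimal sketch that runs without Titan. Everything in it is a hypothetical stand-in: SketchGraph, SketchQuery and the :contains symbol are made up for illustration (the real helper hands the block to Titan's query builder and uses Text::CONTAINS), and the "index" is just an array of hashes. The point is only that the block is evaluated with the builder as self, so bare has(...) calls chain.

```ruby
# Hypothetical stand-in for a graph-centric query builder; not Titan's API.
class SketchQuery
  # Comparison names used by this sketch only.
  PREDICATES = {
    equal:    ->(actual, expected) { actual == expected },
    contains: lambda do |actual, expected|
      actual.to_s.downcase.split(/\W+/).include?(expected.to_s.downcase)
    end
  }.freeze

  def initialize(elements)
    @elements = elements
    @conditions = []
  end

  # has(key, value) or has(key, predicate, value); returns self so calls chain.
  def has(key, *args)
    predicate, value = args.length == 2 ? args : [:equal, args.first]
    @conditions << [key, PREDICATES.fetch(predicate), value]
    self
  end

  # Elements that satisfy every recorded condition.
  def results
    @elements.select do |element|
      @conditions.all? { |key, test, value| test.call(element[key], value) }
    end
  end
end

# Hypothetical graph wrapper exposing the block-style query helper.
class SketchGraph
  def initialize(elements)
    @elements = elements
  end

  # Evaluate the block with the builder as self, so bare has(...) calls work.
  def query(&block)
    builder = SketchQuery.new(@elements)
    builder.instance_eval(&block)
    builder.results
  end
end

g = SketchGraph.new([
  { 'property' => 'value', 'text' => 'some value here' },
  { 'property' => 'value', 'text' => 'nothing relevant' },
  { 'property' => 'other', 'text' => 'value' }
])

matches = g.query { has('property', 'value').has('text', :contains, 'value') }
```

In the real gem the result is converted to a Pacer route rather than returned as an array, which is what lets you keep chaining .out and friends.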

Darrick, do you think it's worth me trying to feed Titan's methods into your index route wrappers (which I'm still trying to figure out), or do you think it will start working once you fix the index registering issue with Pacer?

Cheers!
Ilya

Darrick Wiebe

Mar 13, 2014, 4:42:15 PM
to Pacer Group
Hi Ilya,

Thanks for your work on pacer-titan, it sounds great! I'll have a look at the code and get back to you on your questions as soon as I can, probably within the next few days.

Cheers!
Darrick


Ilya Kardailsky

Mar 15, 2014, 4:54:22 PM
to pacer...@googlegroups.com
Thanks Darrick! 

I've been studying more of your code and the Pipes framework. Please disregard the above questions on the index lookup monkey patches; it seems Pacer is correctly building indexed routes for Titan.

It seems that the query() method is available in Blueprints too, with its own GraphQueryPipe. I've ended up making a similar filter pipe in my pacer-titan gem for index lookups (sort of like your Lucene implementation in pacer-neo4j), with some good performance gains.

Querying Titan's standard exact-matching index of the SNOMED-CT medical terminology database, I found routing through your index pipes in JRuby introduced quite a bit more latency:

irb(main):011:0> Benchmark.measure { g.v(concept_id: 103571007) }

=> @real=0.015000104904174805


irb(main):012:0> Benchmark.measure { g.query{ has('concept_id', 103571007) } }

=> @real=0.0009999275207519531


After changing index_query to use my GraphQuery pipe, times were much more comparable:


irb(main):027:0> Benchmark.measure { g.v(concept_id: 103571007) }

=> @real=0.003999948501586914
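
The numbers above are single calls, which can be noisy on the JVM while the JIT warms up. A rough way to get steadier figures is to average over many calls after a warm-up, something like the sketch below. The mean_seconds helper is hypothetical, and the Hash lookup is only a stand-in for the real graph call so the snippet runs on its own.

```ruby
require 'benchmark'

# Hypothetical helper: make `warmup` untimed calls, then time `runs` calls
# and return the mean wall-clock seconds per call.
def mean_seconds(runs: 1_000, warmup: 100)
  warmup.times { yield }
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

# Stand-in for an index lookup; replace the block body with the real call.
index = { 103571007 => :concept }
mean = mean_seconds { index[103571007] }
puts format('%.9f s per lookup', mean)
```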


Given that this would let us do property filters, best index selection, etc., all in pure-Java Blueprints, do you think it would be a good idea to refactor Pacer to use this?

Ilya

Darrick Wiebe

Mar 19, 2014, 2:49:02 PM
to Pacer Group
Hey Ilya,

Sorry for the slow response!

I actually did temporarily implement using GraphQueryPipe in Pacer a few weeks ago, but had to revert it when it turned out that it has the strange behaviour of including the source graph as the first element in its paths. Working around that turned out to be such a pain in the ass that I decided to just go back to using my own pipes instead.

It's interesting to know the actual numbers for the difference between using a native Java pipe vs. one written in Ruby. An option that I've had in mind since day 1 but never ultimately needed to spend the time on is to reimplement my Pacer-specific pipes in Java. If you are interested in contributing to that, it would be easy to set up the Pacer project to build its own jar. Then we could have either a reimplementation of the GraphQueryPipe with Pacer's path semantics, or if we're lucky we could accomplish that by simple subclassing. Most of the other Pacer pipes could be moved from pure Ruby to Java for some general speedups as well, I'd expect.

On another topic, it would be interesting to start to incorporate vertex queries into Pacer. That's another thing that I've had in mind for a long time but have not found time to pursue, especially since I haven't been using a graphdb that would actually benefit much (if at all) from them. If you've got many super nodes in your Titan data, I'd expect that you'd see even better performance improvements from that change than from using the GraphQueryPipe.

Thanks again for the great work on pacer-titan!

Darrick

Ilya Kardailsky

Mar 21, 2014, 9:55:39 PM
to pacer...@googlegroups.com
Thanks Darrick,

Yes, I might play around with using Titan's MultiVertexQuery in a few pipes at some point and report my results.

Ilya