Any plans to offer Spark as well as or instead of Hadoop for Titan?


Michael Woytowitz

Jul 10, 2014, 12:45:47 PM
to aureliu...@googlegroups.com
Hi Titan team,

I just sat through a DataStax Cassandra webinar. They now provide an integrated Spark-based analytics solution for Cassandra. The webinar presented significant performance benefits of Spark over traditional Hadoop.

Their approach is to dedicate some of the nodes in the Cassandra cluster to analytics while the others handle transactional workloads, with the standard eventual-consistency mechanism keeping the analytics nodes updated.
We plan to use a similar approach for reporting in our Titan cluster.

Any plans to add Spark based analytics or replace Titan-Hadoop with Titan-Spark?

http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/analytics#navtop

Thank you in advance for your reply.

Josh Mahonin

Jul 10, 2014, 1:04:52 PM
to aureliu...@googlegroups.com
Hi Michael,

I know Titan 0.5 provides TitanInputFormat and TitanOutputFormat classes which, in theory, Spark can use to create an RDD from Titan as well as to persist results back to Titan.

I've attempted this myself, but ran into some tricky class-loading issues between the Spark runtime and some of Titan's dependencies, so I ended up shelving it. Recently, however, I came across some slides from Spark Summit that showed someone making it work, so it should be possible.

I'm planning on taking a look at it again when 0.5 hits a full release and I get some spare bandwidth.

Josh


--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthias Broecheler

Jul 15, 2014, 9:18:44 PM
to aureliu...@googlegroups.com
Yes, we do want to support Spark but haven't yet had the time or engineering resources to do so. We'll keep you guys in the loop.

--
Matthias Broecheler
http://www.matthiasb.com

Michael Oczkowski

Jul 17, 2014, 6:18:34 PM
to aureliu...@googlegroups.com
Perhaps by way of Apache Crunch, since it generates MR pipelines that can run on both Hadoop and Spark?

Patrick Barker

Jul 17, 2014, 8:33:33 PM
to aureliu...@googlegroups.com
Spark with Titan would be amazing!

Byung-Wan Lim

Jul 17, 2014, 9:58:30 PM
to aureliu...@googlegroups.com
Spark with Titan (or Faunus) would be amazing!!! (2)

Matt Chamberlin

Jul 21, 2014, 5:26:48 PM
to aureliu...@googlegroups.com
+1 for Titan+Spark!

Michael Woytowitz

Jul 21, 2014, 7:12:00 PM
to aureliu...@googlegroups.com
Would Titan+Spark use a GraphX bridge / integration?
https://spark.apache.org/graphx/


Robert Duffy

Apr 15, 2015, 8:59:14 PM
to aureliu...@googlegroups.com
Any updates in this space on integrating Spark with Titan?

Austin Sharp

Apr 16, 2015, 1:44:54 PM
to aureliu...@googlegroups.com
As others have said, this would be great!

Stephen Mallette

Apr 16, 2015, 1:59:24 PM
to aureliu...@googlegroups.com
If you are interested in Spark integration, you might want to follow along with TinkerPop3 development on the gremlin-users mailing list. Here's a link to the most recent work with Spark - a benchmark over the Friendster dataset:


Matthias Broecheler

Apr 17, 2015, 5:16:08 PM
to aureliu...@googlegroups.com
Titan 1.0 will support a Spark connector so that the Spark GraphComputer referenced by Stephen can be executed over a graph stored in Titan. This might be limited to those storage backends that provide a Spark connector.

Todd Leo

Jul 8, 2015, 9:09:09 AM
to aureliu...@googlegroups.com

Hi Matthias,

We are planning to store our knowledge base in Titan and use Spark GraphX to perform heavy graph computation, so we were thrilled to learn that “Titan 1.0 will support a spark connector”. Does this mean GraphX can easily fetch data from Titan?

Also, after browsing Titan’s GitHub issue tracker, I found no Spark-related issue on the 1.0 milestone. And I can’t tell whether the Spark connector is available now that #1045, which is under the 0.9 milestone, has been closed.

Best,
Todd Leo

Matthias Broecheler

Jul 10, 2015, 2:14:49 PM
to aureliu...@googlegroups.com
Yes, Spark is supported in the 0.9 line and will be in 1.0. However, we do need help from the community in testing the release.
In principle, this should entail support for GraphX. Note, however, that TP3 supports graph analytics directly through Gremlin, which is likely much easier to use and has a lower learning curve.
