Taking a snapshot of a Titan graph for OLAP

Roy Levin

Dec 11, 2014, 3:17:01 AM
to aureliu...@googlegroups.com
Hi,

I wanted to thank Daniel, Matthias, Marko, Stephen and the others for taking the time to answer all my previous questions.

I have another question about Titan.
In our scenario we are examining possible solutions for combining OLTP and OLAP capabilities in the best possible way
(i.e. achieving both scalability and performance).
I have been thinking about using Titan as an OLTP graph and then taking a read-only snapshot of it
to use as an RDD that GraphX (Spark) can process with OLAP analytics.
I know it is possible (based on existing DB techniques) to generate such a snapshot in O(1) time.

I was wondering if a Titan read-only Transaction
(as explained here http://s3.thinkaurelius.com/docs/titan/current/tx.html)
can serve as an RDD for Spark?

To better understand this my main questions are:
(1) How are transactions implemented in Titan, and is it OK to hold such a transaction open for a long time (i.e. until the analytics algorithms are done)?
(2) I guess that to use it with GraphX I will need to create this transaction in Scala and then provide it as an RDD to GraphX. Are there any code examples of Titan with Scala and Spark?

Thanks again for answering all my questions.

Regards,
Roy.

Matthias Broecheler

Dec 11, 2014, 2:39:26 PM
to aureliu...@googlegroups.com
Hello Roy,

I think this is better addressed using a Hadoop-based input format. We are currently reworking how that is implemented, but the idea is to provide a distributed read mode for Titan rather than having one big scanning transaction (which doesn't scale and is slow). We are not yet at the point where we can support Spark, but given its increasing adoption and native support in the Hadoop ecosystem and by Cassandra (in DSE, at least for now), this should be feasible soon.
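To make the contrast with one big scanning transaction concrete, here is a minimal sketch of the split-based idea, assuming a toy in-memory adjacency map in place of the storage backend. All names (`ADJACENCY`, `make_splits`, `in_degrees`) are hypothetical, and this is not Titan's or Hadoop's API: the vertex-id space is cut into disjoint ranges, each range is read and processed independently (here, counting in-degrees), and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the storage backend:
# vertex id -> ids of its out-neighbours.
ADJACENCY = {1: [2, 3], 2: [3], 3: [1], 4: [1, 2, 3], 5: [], 6: [4]}


def make_splits(max_id, num_splits):
    """Cut vertex ids 1..max_id into contiguous, non-overlapping ranges."""
    size = -(-max_id // num_splits)  # ceiling division
    return [range(start, min(start + size, max_id + 1))
            for start in range(1, max_id + 1, size)]


def in_degrees_for_split(split):
    """Read only the vertices of one split; count the in-edges they emit."""
    counts = Counter()
    for vid in split:
        for target in ADJACENCY.get(vid, []):
            counts[target] += 1
    return counts


def in_degrees(num_splits):
    splits = make_splits(max(ADJACENCY), num_splits)
    # Each split could be handled by a separate worker or machine;
    # a thread pool stands in for that here.
    with ThreadPoolExecutor(max_workers=len(splits)) as pool:
        partials = list(pool.map(in_degrees_for_split, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge ("reduce") the per-split results
    return total
```

Because the splits are disjoint and the merge (adding counters) is associative, the splits need no coordination with each other. The trade-off is that each split is read at a slightly different time, so without a snapshot the combined result is not a single point-in-time view of the graph.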

Cheers,
Matthias

--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/adad4d7c-6d8f-41f2-a506-40fe1c3ba63f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Matthias Broecheler
http://www.matthiasb.com

Roy Levin

Dec 12, 2014, 12:34:39 AM
to aureliu...@googlegroups.com
Thanks for the reply, Matthias.

Just to clarify, I am not referring to a big scanning transaction but rather to an O(1) read-only snapshot, which can be implemented using various versioning techniques (as in other DBs).
I don't have much information about how transactions are implemented in Titan so I don't know if these can be used to create this snapshot, but in theory this could be possible.
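The O(1) snapshot idea can be illustrated with a toy multi-version key-value store, assuming every write is appended as a new version rather than overwriting the old value; taking a snapshot then only records the current version number. `VersionedStore` and `Snapshot` are hypothetical names, and this is a sketch of the general versioning technique, not how Titan or any of its storage backends work.

```python
import bisect


class VersionedStore:
    """Every write adds a new version; nothing is overwritten in place."""

    def __init__(self):
        self._version = 0
        self._history = {}  # key -> ([versions...], [values...]), versions ascending

    def put(self, key, value):
        self._version += 1
        versions, values = self._history.setdefault(key, ([], []))
        versions.append(self._version)  # stays sorted: the counter only grows
        values.append(value)

    def snapshot(self):
        # O(1): a snapshot is nothing more than the current version number.
        return Snapshot(self, self._version)

    def read_at(self, key, version):
        """Newest value of `key` written at or before `version`, else None."""
        versions, values = self._history.get(key, ([], []))
        i = bisect.bisect_right(versions, version)
        return values[i - 1] if i > 0 else None


class Snapshot:
    """Read-only view of the store as of one version."""

    def __init__(self, store, version):
        self._store = store
        self._version = version

    def get(self, key):
        return self._store.read_at(key, self._version)
```

The O(1) snapshot is paid for elsewhere: old versions must be kept until no snapshot needs them (garbage collection is not modelled here), and each read does a binary search over the key's history.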

Regarding the Hadoop-based input format, I guess one could implement a Hadoop-based input format that provides such a read-only snapshot of the Titan graph. Is that what you are suggesting?
Is that one of the things that are planned for Titan 1.0?

Thanks,
Roy.

Matthias Broecheler

Dec 15, 2014, 2:46:11 PM
to aureliu...@googlegroups.com
Hello Roy,

I probably don't understand what you are asking for. Titan is not an MVCC database, at least not with the currently supported storage backends, unless you are using BDB with transactions enabled. The view or snapshot is given to you by one transaction. With a sufficient transaction size, a transaction will give you a repeatable-read model.

HTH,
Matthias


Roy Levin

Dec 15, 2014, 10:26:01 PM
to aureliu...@googlegroups.com
Thanks for the reply, Matthias.

Actually, you are right: I am referring to something along the lines of MVCC, but somewhat simpler.
Is there any relevant documentation about how transactions are implemented in Titan?

Thanks,
Roy.

Matthias Broecheler

Dec 18, 2014, 9:21:27 PM
to aureliu...@googlegroups.com
Hello Roy,

There isn't a document on how transactions are implemented, but essentially Titan wraps the transaction mechanism of the underlying storage backend. So, if you use BDB, you get whatever transactional guarantees BDB affords you (which in turn depend on how BDB is configured). Against C* and HBase, transactions don't exist, so Titan transactions don't have any consistency guarantees, but they are atomic and they cache read values, which gives you repeatable reads.
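As a rough illustration of how caching read values yields repeatable reads even when the backend itself has no transactions, here is a sketch assuming a plain dict as the backend. `CachingTransaction` is a hypothetical name; this shows the general technique, not Titan's implementation.

```python
class CachingTransaction:
    """Hypothetical transaction wrapper; `backend` is any dict-like store."""

    def __init__(self, backend):
        self._backend = backend
        self._read_cache = {}
        self._writes = {}

    def get(self, key):
        if key in self._writes:            # see this transaction's own writes
            return self._writes[key]
        if key not in self._read_cache:    # first read goes to the backend...
            self._read_cache[key] = self._backend.get(key)
        return self._read_cache[key]       # ...repeated reads hit the cache

    def set(self, key, value):
        self._writes[key] = value          # buffered until commit

    def commit(self):
        # Hand all buffered writes to the backend as one bulk update; whether
        # that update is atomic is up to the backend, not this wrapper.
        self._backend.update(self._writes)
        self._writes = {}
```

Note what the cache does not provide: nothing checks at commit time whether another client changed the keys this transaction read, so conflicting updates go undetected, which is consistent with the "no consistency guarantees" caveat.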

HTH,
Matthias


Roy Levin

Dec 20, 2014, 6:56:56 AM
to aureliu...@googlegroups.com
Thanks, Matthias.

So with BDB I think the picture is very clear: Titan will delegate transaction management to BDB, and BDB can handle transactions as I choose.

Yet, with C* and HBase, I don't even see how atomicity can be achieved, since atomicity guarantees that transactions cannot be partially committed.
HBase and C* guarantee that an individual insert operation (e.g. a put in HBase) is atomic, but I don't see how Titan can implement a transaction using a single insert operation,
especially when a single transaction can span multiple vertices and edges.
For instance, if a node performing a transaction that inserts multiple vertices and edges (using multiple inserts)
goes down in the middle of the transaction, the transaction will be partially committed.
Rollback will not occur when the node restarts unless it maintains some transaction log
(or unless there is a more sophisticated mechanism behind the scenes that I am not aware of).
As far as HBase and C* are concerned, they provide atomicity for each insert, but this is not enough for graph DB atomicity.
Furthermore, even when inserting a single edge between v1 and v2, both vertices need to be updated (v1's outgoing and v2's incoming edges),
hence I am not sure even this can be done using a single DB insert.
So potentially, even an addEdge does not guarantee atomicity.
Is this correct or am I missing something?
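The half-committed addEdge case can be reproduced with a toy model, assuming (hypothetically; this is not a claim about Titan's actual storage layout) that each vertex is one row holding its adjacency lists, so one edge needs two separate single-row writes. `add_edge` and `SimulatedCrash` are illustrative names only.

```python
from typing import Optional


class SimulatedCrash(Exception):
    """Stands in for the node going down mid-operation."""


def add_edge(rows, out_v, in_v, crash_after_writes: Optional[int] = None):
    """Store an edge as two single-row writes (hypothetical layout)."""
    writes = [
        (out_v, "out", in_v),  # v1's row records the outgoing edge
        (in_v, "in", out_v),   # v2's row records the incoming edge
    ]
    for n, (row_key, direction, other) in enumerate(writes):
        if crash_after_writes is not None and n == crash_after_writes:
            raise SimulatedCrash(f"node died after {n} of {len(writes)} writes")
        row = rows.setdefault(row_key, {"out": [], "in": []})
        row[direction].append(other)  # each single-row write is atomic by itself
```

If the crash happens after the first write, v1's row says the edge exists while v2's row has no record of it: each row write was atomic, but the edge as a whole was only half committed.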

Regards,
Roy.

boorad

Dec 22, 2014, 10:16:07 AM
to aureliu...@googlegroups.com
We accomplish the snapshot by hosting our Titan graph in a MapR-DB table, which is HBase-compatible. The snapshot takes a few milliseconds for any size of table or cluster, and gives you a read-only copy of your graph as it existed at that moment.

BA

Roy Levin

Dec 23, 2014, 3:28:02 AM
to aureliu...@googlegroups.com
Thanks, Boorad. This sounds like a good option for what I am looking for, except for the licensing issues :(
I suppose you are not aware of any open source alternatives that can accomplish the same task.
Also, I prefer working with Cassandra and Titan rather than with HBase, as Titan is more optimized to work with Cassandra.

Thanks,
Roy.

Matthias Broecheler

Jan 7, 2015, 1:04:05 PM
to aureliu...@googlegroups.com
Hello Roy,

> So with BDB I think the picture is very clear: Titan will delegate transaction management to BDB, and BDB can handle transactions as I choose.

Correct.

> Yet, with C* and HBase, I don't even see how atomicity can be achieved, since atomicity guarantees that transactions cannot be partially committed.

Here, the picture gets more complicated. C* supports atomic_batch_mutate, which is what Titan uses under the hood. HBase does not have such functionality yet. For both HBase and C*, Titan uses batch operations, so all updates and removes are done in one operation. That operation is only atomic against C* at the moment.
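Why a batch can be atomic even though each mutation is applied separately is easiest to see in a toy model of the log-then-apply ("batchlog") idea. As I understand it, Cassandra's atomic batches work roughly this way (the batch is logged to other nodes before being applied and is replayed if the coordinator fails), but this sketch is a single-process simplification with hypothetical names (`Node`, `atomic_batch`, `recover`), not Cassandra or Titan code.

```python
from typing import Optional


class Node:
    def __init__(self):
        self.data = {}
        self.batchlog = {}  # assumed durable: survives the simulated crash
        self._next_id = 0

    def atomic_batch(self, mutations, crash_after: Optional[int] = None):
        batch = list(mutations)
        batch_id = self._next_id
        self._next_id += 1
        self.batchlog[batch_id] = batch               # 1. persist the whole batch first
        for i, (key, value) in enumerate(batch):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash while applying the batch")
            self.data[key] = value                    # 2. apply mutations one by one
        del self.batchlog[batch_id]                   # 3. fully applied: drop the log entry

    def recover(self):
        """Replay every batch that was logged but not fully applied."""
        for batch_id, batch in list(self.batchlog.items()):
            for key, value in batch:
                self.data[key] = value                # re-applying is idempotent here
            del self.batchlog[batch_id]
```

The guarantee this models is "all or nothing, eventually", not isolation: between the crash and the replay, a reader can still see half of the batch. Without such a log, nothing repairs a half-applied batch, which is the situation described for HBase.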
 
