Async writes, batch-loading, auto schema issues


Ashic Mahtab

Jun 8, 2016, 8:13:00 PM
to Aurelius
I'm using Titan 1.0 with Cassandra. I'm trying to use Scala Futures to write many vertices to a graph concurrently. A naive attempt failed for most of the writes with "Unable to close transaction". I then set storage.batch-loading to true, but with that setting even writing a single vertex fails with:

Exception in thread "main" java.lang.IllegalArgumentException: Property Key with given name does not exist: x
at com.thinkaurelius.titan.graphdb.types.typemaker.DisableDefaultSchemaMaker.makePropertyKey(DisableDefaultSchemaMaker.java:32)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.getOrCreatePropertyKey(StandardTitanTx.java:923)

I tried setting schema.default=default explicitly, but it still seems to go to DisableDefaultSchemaMaker. Reading the docs for storage.batch-loading, I see:

Enables batch loading which improves write performance but assumes that only one thread is interacting with the graph

So, my question is: can we really not write to Titan from multiple threads or using Futures? Given that there's no async API, this severely limits its use in a streaming ingestion context. I'm obviously doing something wrong, and surely there's some feature to enable this. For my use case, I *want* to have auto schema, and not create schema entries for each and every property.

Any pointers would be welcome.

-Ashic.

HadoopMarc

Jun 9, 2016, 3:06:33 PM
to Aurelius
Hi Ashic,

I do not know about multiple threads (maybe there is a viable solution there too), but bulk loading with multiple Titan instances is possible; see the BulkLoader in TinkerPop3, which uses Spark/Giraph to load into Titan.

HTH,  Marc


Stephen Mallette

Jun 10, 2016, 6:34:39 AM
to Aurelius
I don't know of any "feature" that enables writing to the graph concurrently, but you do need to understand the transaction model of TinkerPop/Titan in order to do it. You should probably review that documentation again to make sure that your transactions, and the threads executing them, are doing what you expect.
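To make the point about transactions and threads concrete, here is a minimal Scala sketch of one possible arrangement, not a confirmed fix for the error above. The default `graph.tx()` transaction is bound to the calling thread, which can interact badly with Futures scheduled on a thread pool, so each Future below opens its own transaction with `newTransaction()` and commits or rolls it back itself. The config path, the `name` property, and the `writeVertex` helper are hypothetical, and the sketch has not been run against a live cluster.

```scala
import com.thinkaurelius.titan.core.{TitanFactory, TitanGraph}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrentWrites {
  // Hypothetical helper: one independent transaction per write.
  def writeVertex(graph: TitanGraph, name: String): Future[Unit] = Future {
    // newTransaction() returns a transaction that is not tied to the current thread.
    val tx = graph.newTransaction()
    try {
      tx.addVertex("name", name) // "name" is an assumed property key
      tx.commit()
    } catch {
      case e: Throwable =>
        tx.rollback() // release the transaction before propagating the failure
        throw e
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed location of the Titan/Cassandra properties file.
    val graph = TitanFactory.open("conf/titan-cassandra.properties")
    try {
      val writes = Future.sequence((1 to 100).map(i => writeVertex(graph, s"v$i")))
      Await.result(writes, 5.minutes)
    } finally {
      graph.close()
    }
  }
}
```

In practice several vertices would usually be grouped into each transaction rather than committing one at a time; the one-vertex-per-transaction form is only there to keep the sketch short.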

You probably want to stay away from the batch-loading setting for what you are trying to do, but you should reconsider not defining your schema. Unless you truly can't know your schema beforehand, you won't get the most out of Titan by letting it define the types dynamically; it's almost never a good idea to skip defining the schema ahead of time. Schema changes also acquire locks, I think, so if you let Titan build the schema dynamically under high concurrency, you might hit more locking exceptions than you would like, depending on how often new keys are added.
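As an illustration of defining the schema ahead of time, the following Scala sketch creates a property key through Titan's management API before any concurrent writers start. The key name `x` comes from the stack trace earlier in the thread; its `String` data type, the config path, and the `ensurePropertyKey` helper are assumptions made for the example.

```scala
import com.thinkaurelius.titan.core.{TitanFactory, TitanGraph}

object DefineSchema {
  // Hypothetical helper: create the property key only if it does not exist yet.
  def ensurePropertyKey(graph: TitanGraph, name: String, dataType: Class[_]): Unit = {
    val mgmt = graph.openManagement()
    try {
      if (!mgmt.containsPropertyKey(name)) {
        mgmt.makePropertyKey(name).dataType(dataType).make()
      }
      mgmt.commit()
    } catch {
      case e: Throwable =>
        mgmt.rollback() // discard the partial schema change
        throw e
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed location of the Titan/Cassandra properties file.
    val graph = TitanFactory.open("conf/titan-cassandra.properties")
    try {
      // The String type for "x" is an assumption; use the real type of your data.
      ensurePropertyKey(graph, "x", classOf[String])
    } finally {
      graph.close()
    }
  }
}
```

Running a step like this once from a single thread, before ingestion begins, keeps schema creation out of the concurrent write path where the locking mentioned above would come into play.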
