Hello
I am using Titan with Cassandra as the backend (single node).
I have a file which has graph data in GraphSON format. The file size is around 9GB.
There are 6.2 million vertices and around 5 million edges.
Edges have a HashMap as a property value, so I used the serializer found here:
https://github.com/pluradj/titan-attribute-serializer
It took about 40 hours to load this data into Cassandra. I hit out-of-memory errors a few times, and was finally able to run to completion with around 200 GB of RAM (fortunately, for this experiment I had access to a machine with lots of memory).
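For reference, I register the serializer through Titan's custom attribute configuration, along these lines (the serializer class name below is simplified; the real one comes from that repository):

```properties
# Titan 1.0 custom attribute registration ("attributes.custom" namespace).
# attribute-class is the property value type; serializer-class is a
# placeholder name here -- substitute the class from the repo above.
attributes.custom.attribute1.attribute-class=java.util.HashMap
attributes.custom.attribute1.serializer-class=com.example.HashMapSerializer
```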
I have looked at a lot of posts that talk about the BulkLoaderVertexProgram, but I could not find an example configuration that resembles my case. Most of the posts talk about the KryoSerializer, but my input is a GraphSON file produced by another system, and I cannot change that. Going forward I will not have access to the machine with lots of RAM.
The application is written in Java, and I use Titan embedded (i.e. via the jar; I don't have a separate Titan installation).
Currently I use TitanGraph.io(IoCore.graphson()) to read the file. Does this construct the entire graph in memory before persisting it, or does it persist incrementally? If it persists incrementally, I should not need a lot of RAM, right?
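In case it helps, the alternative I am considering is streaming the file myself: my understanding is that this GraphSON output is one JSON object (one vertex) per line, so I could read it line by line and commit in fixed-size batches, keeping memory bounded by the batch size rather than the whole graph. A minimal sketch of that batching skeleton, with the Titan-specific parse/write/commit left as a callback:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchedLoader {
    /**
     * Streams lines (one GraphSON vertex per line, assuming line-delimited
     * output) and hands off fixed-size batches, so memory is bounded by
     * batchSize rather than the whole graph. persistBatch stands in for the
     * Titan-specific work: parse the lines, add the vertices, then
     * graph.tx().commit().
     */
    static int load(BufferedReader in, int batchSize, Consumer<List<String>> persistBatch) {
        List<String> batch = new ArrayList<>(batchSize);
        int total = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    persistBatch.accept(batch); // committing here keeps each tx small
                    total += batch.size();
                    batch.clear();              // release the rows for GC
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {                 // flush the trailing partial batch
            persistBatch.accept(batch);
            total += batch.size();
        }
        return total;
    }
}
```

With this shape, RAM usage depends only on batchSize (plus whatever Titan buffers per transaction), not on the 9 GB file. Edges referencing not-yet-loaded vertices would need a second pass or an ID cache, which I have left out.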
I did try setting ids.block-size to 100000. Titan also does not seem to accept user-supplied IDs: I set graph.set-vertex-id to true, but an exception is thrown.
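For completeness, the relevant part of my Titan properties file looks roughly like this (backend connection details trimmed):

```properties
storage.backend=cassandra
storage.hostname=127.0.0.1
ids.block-size=100000
# tried this to supply my own vertex IDs, but it throws an exception:
graph.set-vertex-id=true
```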
Titan 1.0.0
Cassandra (DataStax version) 3.7.0
TinkerPop 3.0.1-incubating (since Titan only runs with this version)
I can use Spark, but I am not allowed to install Hadoop.
I want to load this data into Cassandra using Titan. Could someone please help me reduce the load time and the RAM consumption?
Thank you
Regards