Inserting data into Titan using a MapReduce job


Darsh

Mar 3, 2015, 3:10:14 PM
to aureliu...@googlegroups.com
Hi,

I am new to Titan and want to create a graph from data stored in HDFS and insert it into Titan using Java. I have gone through different posts in this group but am still confused about the best way to do it. Can anyone please guide me or point me to correct, up-to-date documentation? I am going to use the latest version, titan-0.5.4-hadoop. Any sample example would be a great help.


Darsh

Daniel Kuppitz

Mar 3, 2015, 3:38:25 PM
to aureliu...@googlegroups.com
Hi,

please see the Powers of Ten blog post (part 2). It shows how to load a ~2B edge graph using Faunus. Titan/Hadoop can use almost the same code (I'm not sure if you have to change anything at all). Read the blog post and the documentation for Script IO Format - that should get you started pretty quickly.

Cheers,
Daniel



Darsh

Mar 5, 2015, 3:49:02 PM
to aureliu...@googlegroups.com
Thank you for your reply, Daniel. The blog post is very useful for understanding batch graph loading.

The blog post explains everything using Gremlin. Is it possible to do this using Java? Does the Titan Java API support all the input and output formats, or do I have to use Gremlin?

Daniel Kuppitz

Mar 5, 2015, 4:30:04 PM
to aureliu...@googlegroups.com
BatchGraph loads can be done completely in Java. Faunus loads can be partially done in Java (the ScriptInputFile has to be a Groovy file).

Cheers,
Daniel


Darsh

Mar 10, 2015, 3:29:42 PM
to aureliu...@googlegroups.com
Thank you, Daniel.

Sorry, I am a bit confused. I have a couple of questions:

  1. I am assuming that, with the output of a map/reduce job (using the Titan Java API (0.5.4) and Faunus), I will be able to insert vertices directly into the remote Cassandra datastore I am using. Am I missing anything here?
  2. How can I pass Faunus properties to my map/reduce job? Inside the setup() method?
  3. Someone on the group said "The output of my MR job is used as input for faunus". If so, how can I insert that into Cassandra?
  4. Do we have any sample example written in Java using titan-hadoop and the Java API?
Thanks,

Darsh

Daniel Kuppitz

Mar 10, 2015, 6:04:48 PM
to aureliu...@googlegroups.com
Hi Darsh.

I am assuming that, with the output of a map/reduce job (using the Titan Java API (0.5.4) and Faunus), I will be able to insert vertices directly into the remote Cassandra datastore I am using. Am I missing anything here?

That's right, all you need is the TitanCassandraOutputFormat.


How can I pass Faunus properties to my map/reduce job? Inside the setup() method?

It's no different from the shell examples (e.g. in Getting Started):

HadoopFactory.open(yourProperties)
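
For the Java side, a minimal sketch of how such properties might be assembled in code and written to a file whose path could then be handed to HadoopFactory.open(...). The property keys and format class names below are assumptions recalled from the Titan 0.5.x sample configs and should be checked against the conf/hadoop files in your distribution; the class name TitanHadoopProps, the input location, the host name and the output file name are hypothetical placeholders. The key that points to the Groovy script for the Script IO Format is deliberately left out; look it up in that format's documentation. The sketch uses only java.util.Properties, so it runs without Titan on the classpath.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class TitanHadoopProps {

    // ASSUMPTION: key names and class names follow the Titan 0.5.x sample
    // configs; verify them against conf/hadoop/ before relying on them.
    static Properties build(String inputLocation, String cassandraHost) {
        Properties p = new Properties();
        // Input side: raw lines in HDFS, parsed by a Groovy script.
        p.setProperty("titan.hadoop.input.format",
                "com.thinkaurelius.titan.hadoop.formats.script.ScriptInputFormat");
        p.setProperty("titan.hadoop.input.location", inputLocation);
        // Output side: write straight into Titan on Cassandra.
        p.setProperty("titan.hadoop.output.format",
                "com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraOutputFormat");
        p.setProperty("titan.hadoop.output.conf.storage.backend", "cassandra");
        p.setProperty("titan.hadoop.output.conf.storage.hostname", cassandraHost);
        return p;
    }

    // Stores the properties in the standard java.util.Properties text format.
    static Path write(Properties p, Path target) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            p.store(out, "Titan/Hadoop job configuration (generated)");
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; replace with your own HDFS path and Cassandra host.
        Path file = write(build("input/edges.txt", "cassandra-host"),
                Paths.get("titan-job.properties"));
        System.out.println("Wrote " + file.toAbsolutePath());
        // With Titan on the classpath, the next step would be along the lines of
        // HadoopFactory.open(file.toString()), as in the shell examples.
    }
}
```

Generating the file from code keeps host names and input paths out of checked-in config, but a hand-written properties file works just as well.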


Someone on the group said "The output of my MR job is used as input for faunus". If so, how can I insert that into Cassandra?



Do we have any sample example written in Java using titan-hadoop and the Java API?

I don't think so. Even in client projects I (almost?) never used Java for Faunus / Titan/Hadoop jobs; instead I wrote cronjobs to trigger the tasks:

0 0 * * * /usr/local/titan/bin/gremlin.sh -e /scripts/daily-analytics.groovy >> /tmp/analytics.out

However - as mentioned before - if you want to use Java, it shouldn't be much different from the shell samples. Just make sure that all the necessary environment variables (like HADOOP_PREFIX) are set properly.
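
One way to stay close to the shell approach from a Java application is to launch the same Gremlin script as a child process with the environment set explicitly. The sketch below is only an illustration under stated assumptions: the script and log paths are the ones from the cron line above, while the class name GremlinScriptLauncher and the HADOOP_PREFIX value are hypothetical. It uses nothing beyond java.lang.ProcessBuilder.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class GremlinScriptLauncher {

    // Builds the same command as the cron entry and sets HADOOP_PREFIX
    // for the child process only.
    static ProcessBuilder prepare(String hadoopPrefix) {
        List<String> cmd = Arrays.asList(
                "/usr/local/titan/bin/gremlin.sh", "-e", "/scripts/daily-analytics.groovy");
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.environment().put("HADOOP_PREFIX", hadoopPrefix);
        // Equivalent of ">> /tmp/analytics.out": append stdout to the log file.
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(new File("/tmp/analytics.out")));
        // Pass stderr through to the parent so the child never blocks on a full pipe.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical Hadoop location; use the value from your own installation.
        ProcessBuilder pb = prepare("/usr/local/hadoop");
        if (!new File(pb.command().get(0)).canExecute()) {
            System.out.println("gremlin.sh not found at " + pb.command().get(0));
            return;
        }
        int exit = pb.start().waitFor();
        System.out.println("gremlin.sh exited with code " + exit);
    }
}
```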

Cheers,
Daniel


