I've been doing some work with Faunus recently on a reasonably large graph and hit a number of problems that I think come up a lot. I say that because I did a fair amount of searching for answers and either found them scattered across different places or didn't find clear answers at all. I'm not sure I have clear answers here either, but perhaps it will help someone out if I document things a bit.
First of all, some environmental information:
+ Titan + Cassandra
+ Cassandra cluster: 6 m1.xlarge EC2 instances
+ Hadoop cluster: 6 m2.xlarge EC2 instances
My first job was simply to get all the data into a sequence file with g._(). The first problem I solved was the "Message length exceeded" error, and I think most people know the answer to that one: add these lines to your faunus.properties file:
cassandra.thrift.framed.size_mb=49
cassandra.thrift.message.max_size_mb=50
It actually took some trial and error to get these settings right; I just kept bumping the sizes up until the errors disappeared. I'm not sure if there's a better way to do that, or a way to know the right setting from the outset. Given my graph structure, I ended up with 256 as the value for both settings.
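In other words, the values I ultimately ran with looked like this (these numbers are specific to my graph; yours may differ, and the usual advice is to keep the framed size at or below the max message size):

```properties
# bumped up from the commonly suggested 49/50 until the errors went away
cassandra.thrift.framed.size_mb=256
cassandra.thrift.message.max_size_mb=256
```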
The next error message I dealt with was: "TitanException: Could not connect to Cassandra to read partitioner information. Please check the connection". In this case, I fixed it by changing the storage backend from "cassandrathrift" to:
faunus.graph.input.titan.storage.backend=cassandra
I can never remember which one of those works best in EC2. Always feels like I guess wrong, no matter which I pick first, but in this case "cassandra" was the answer.
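For reference, the input side of my faunus.properties looks something like this. The hostname and port lines are illustrative (substitute your own cluster's values), and I'm going from memory on the input format class name, so double-check it against your Faunus distribution:

```properties
# read the graph out of Titan/Cassandra
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.cassandra.TitanCassandraInputFormat
faunus.graph.input.titan.storage.backend=cassandra
# illustrative values - point these at your own Cassandra cluster
faunus.graph.input.titan.storage.hostname=<your-cassandra-host>
faunus.graph.input.titan.storage.port=9160
```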
Then I started to blow the heap, getting "GC overhead limit exceeded" and OutOfMemoryError exceptions. I edited my faunus.properties file as follows:
mapred.map.child.java.opts=-Xmx6144M
The m2.xlarge instances are memory optimized and have 17G of RAM. Given two mappers per node, I figured 6G was OK to allocate to each mapper: two mappers at 6G apiece is 12G, which leaves roughly 5G for the OS and the Hadoop daemons.
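Put together, the memory side of the configuration looks something like this. The map-slot count is a standard Hadoop 1.x TaskTracker setting that I'm showing here to make the two-mappers-per-node assumption explicit; adjust it for your own instance sizes:

```properties
# two map slots per node (TaskTracker setting)
mapred.tasktracker.map.tasks.maximum=2
# 6G heap per mapper: 2 x 6G = 12G, leaving ~5G of the 17G for OS/daemons
mapred.map.child.java.opts=-Xmx6144M
```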
Then I started to get timeout exceptions when connecting to Cassandra (usually a good way into the job I was executing). I fixed that with:
cassandra.range.batch.size=256
The default for that value is 4096. If anyone can share why the 4096 number was too high, I'd like to know the reason.
Anyway, since those changes went into play, I've not had any problems executing my Faunus jobs. Hope this helps someone out in the future...even if that person is just me :)
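For convenience, here is everything above pulled together into one place. Again, these numbers are what worked for my graph and my cluster, not universal defaults:

```properties
# thrift frame/message sizes - bumped until "Message length exceeded" went away
cassandra.thrift.framed.size_mb=256
cassandra.thrift.message.max_size_mb=256

# "cassandra" rather than "cassandrathrift" fixed the partitioner error
faunus.graph.input.titan.storage.backend=cassandra

# 6G heap per mapper on 17G m2.xlarge nodes (two mappers per node)
mapred.map.child.java.opts=-Xmx6144M

# smaller range batches to avoid Cassandra timeouts (default is 4096)
cassandra.range.batch.size=256
```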
Stephen