Hey AJ,
1) It's both, actually, depending on your configuration. Random partitioning (RP) is the "safe" option and the one we are supporting in the alpha release of Titan. "Safe" because it leads to very well-balanced partitions, is easy to use, requires no token ring configuration, etc. Titan also supports a vertex-id prefixing mode in which vertices are slotted into partition buckets by virtue of sharing an id prefix. Together with Cassandra's ByteOrderedPartitioner (BOP), this gives vertex locality in the Cassandra cluster, which can lead to better performance for traversals.
However, those performance differences are a lot smaller than one might think. One reason is that by distributing the vertices across machines (and RP does that very well), you get the benefit of being able to multi-thread your traversals across multiple Cassandra machines, which actually gives you very low latencies. Titan supports the ThreadedTransactionalGraph interface in Blueprints to enable such multi-threaded traversals.
Then again, some domains have an obvious vertex partitioning, which can lead to significant performance gains. For those cases we will make vertex-id-prefix partitioning with BOP easier to use in a future version.
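To make the id-prefix idea concrete, here is a minimal sketch. It is not Titan's actual id scheme: the 16/48 bit split, the class name and the helpers makeId, bucketOf and localOf are all assumptions made up for illustration. It only shows why ids that share a prefix end up next to each other when keys are kept in byte order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical id layout (NOT Titan's real encoding):
// high bits = partition bucket, low 48 bits = local counter.
final class PrefixedIdSketch {
    static final int LOCAL_BITS = 48;
    static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1;
    // Keep the bucket below 2^15 so ids stay non-negative; for non-negative
    // longs, numeric order equals big-endian byte order.
    static final int MAX_BUCKET = (1 << 15) - 1;

    static long makeId(int bucket, long local) {
        if (bucket < 0 || bucket > MAX_BUCKET) {
            throw new IllegalArgumentException("bucket out of range");
        }
        if (local < 0 || local > LOCAL_MASK) {
            throw new IllegalArgumentException("local id out of range");
        }
        return ((long) bucket << LOCAL_BITS) | local;
    }

    static int bucketOf(long id) {
        return (int) (id >>> LOCAL_BITS);
    }

    static long localOf(long id) {
        return id & LOCAL_MASK;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        ids.add(makeId(2, 7));
        ids.add(makeId(1, 900));
        ids.add(makeId(2, 1));
        ids.add(makeId(1, 3));
        // Sorting stands in for an order-preserving partitioner:
        // all ids of bucket 1 come before all ids of bucket 2.
        Collections.sort(ids);
        for (long id : ids) {
            System.out.println(bucketOf(id) + ":" + localOf(id));
        }
    }
}
```

With an order-preserving partitioner, each bucket then maps to a contiguous key range, which is also why a bucket that grows too large turns into a hotspot.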
2) RP leads to much less trouble with hotspots. When using vertex-id-prefix partitioning with BOP, you need to make sure that you don't achieve too much locality, otherwise you end up with hotspots. Constantly rebalancing would be too expensive.
3) Two options based on the above:
a) Multi-threaded traversals + RP + topology-aware Cassandra setup (e.g. Amazon cluster groups) = high performance and easy
b) Vertex id prefix and partition buckets for local subgraphs + BOP = very high performance but requires more configuration
==> we are working on making this easy in a future version of Titan
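A rough sketch of what option a) looks like from the traversal side. This does not use the Titan or Blueprints API; the in-memory map stands in for the storage backend, and the class and method names (ParallelTraversalSketch, neighbors, twoHop) are made up for illustration. The point is only that each neighbor lookup is independent, so the lookups can be fanned out over a thread pool and, with RP, are likely to land on different machines. Requires Java 9+ for List.of and Map.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelTraversalSketch {
    // Stand-in for a storage read; in a real cluster each lookup
    // may be served by a different node.
    static List<Long> neighbors(Map<Long, List<Long>> graph, long v) {
        return graph.getOrDefault(v, List.of());
    }

    // Collects the vertices two hops away from start,
    // issuing the second-hop lookups in parallel.
    static Set<Long> twoHop(Map<Long, List<Long>> graph, long start, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> pending = new ArrayList<>();
            for (long v : neighbors(graph, start)) {
                pending.add(pool.submit(() -> neighbors(graph, v)));
            }
            Set<Long> result = new TreeSet<>();
            for (Future<List<Long>> f : pending) {
                result.addAll(f.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> graph = Map.of(
                1L, List.of(2L, 3L),
                2L, List.of(4L, 5L),
                3L, List.of(5L, 6L));
        System.out.println(twoHop(graph, 1L, 4)); // prints [4, 5, 6]
    }
}
```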
4) Yes, Titan uses its own format for encoding edges and properties in ByteBuffers, both to make use of graph-centric compression schemes and to store data compactly for quick retrieval. However, there is no danger of data corruption, as all key-column-value triples stored are independent of each other. If you were to lose one for whatever reason, you would lose an edge in the graph. Titan preserves the data model of the underlying storage backend in that sense.
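As a toy illustration of that last point (not Titan's real storage layout): assume one row per vertex and one column per edge, with plain strings standing in for the encoded ByteBuffers. The class and method names below are hypothetical. Dropping a single triple removes that one edge and leaves everything else readable.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy key-column-value store: row key = vertex id, column = one edge,
// value = that edge's payload. Layout is assumed for illustration only.
final class KeyColumnValueSketch {
    private final Map<Long, TreeMap<String, String>> rows = new TreeMap<>();

    void put(long key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    // Simulates losing a single key-column-value triple.
    void drop(long key, String column) {
        TreeMap<String, String> row = rows.get(key);
        if (row != null) {
            row.remove(column);
        }
    }

    String get(long key, String column) {
        TreeMap<String, String> row = rows.get(key);
        return row == null ? null : row.get(column);
    }

    int edgeCount(long key) {
        TreeMap<String, String> row = rows.get(key);
        return row == null ? 0 : row.size();
    }

    public static void main(String[] args) {
        KeyColumnValueSketch store = new KeyColumnValueSketch();
        store.put(1L, "knows:2", "since=2010");
        store.put(1L, "knows:3", "since=2011");
        store.put(1L, "likes:9", "weight=0.5");
        store.drop(1L, "knows:3");
        // The other two edges of vertex 1 are untouched.
        System.out.println(store.edgeCount(1L));      // prints 2
        System.out.println(store.get(1L, "knows:2")); // prints since=2010
    }
}
```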