The main purpose of these experiments was to determine the best setup for a single-node server. There had been rumblings in the Aurelius community that people weren't seeing much improvement in speed, across a variety of metrics, when running Titan with embedded Cassandra versus running Cassandra in a separate JVM process locally.
The main scenario that was tested can be found here (the Scala formatting seems odd for some reason on GitHub):
The scenarios tested some simple queries involving traversals and random side effects. Before anyone starts throwing their own scripts and scenarios into the mix, please note that there are infinitely many possible scripts we could run, each meant to achieve a different goal. We chose a simple set of scripts that represented a semi-realistic workload for a single-node setup.
The following scripts were used to start the embedded and local Titan setups:
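The start scripts themselves aren't reproduced here, but the core difference between the two setups comes down to Titan's storage backend selection. A minimal sketch of the two configurations (property names follow the Titan 0.4-era documentation; the hostname and the cassandra.yaml path are placeholders, not values from the actual test):

```properties
# Local setup: Titan connects over Thrift to a Cassandra process
# running in its own JVM on the same machine.
storage.backend=cassandrathrift
storage.hostname=127.0.0.1

# Embedded setup: Cassandra runs inside the same JVM as Titan.
# storage.backend=embeddedcassandra
# storage.cassandra-config-dir=file:///path/to/cassandra.yaml
```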
For hardware, I used m3.2xlarge instances on EC2 running Ubuntu 12.04 LTS: one instance for Gatling and one for Titan. The instances were networked with each other via Elastic IPs. The embedded setup was given 20 GB of heap space to work with, while the local setup had 10 GB for Titan and 10 GB for Cassandra.
Here are some of the results (if you are reading this in the future and the webpages are down, send me an email and I'll put them back up for you):
Local results (along with gc output):
Embedded results (along with gc output):
Local results are generally faster than embedded results, period: more requests per second, smaller means, smaller standard deviations, smaller percentiles. All of them were better. These tests were repeated several times, and every run showed that embedded was consistently worse than local. The Aurelius team theorized that garbage collection was behind the unexpected slowdown. Pavel Yaskevich explained it as follows:
"By separating Titan and Cassandra processes you get separate GC behavior (all ParNew and PSYoungGen are Stop-The-World events so even for 5-10 ms stops like that disrupt the whole pipeline) and help from operating system to buffer producer (Titan) packets while consumer is stopped (Cassandra) via loopback, scheduling also becomes easier task for OS as both now have separate quantum."
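The GC output linked above comes from standard HotSpot logging. If you want to see the stop-the-world pauses Pavel describes in your own setup, JVM flags along these lines will surface them (HotSpot, Java 6/7 era; the heap size and log path here are just examples, not the exact flags used in these tests):

```
-Xms10g -Xmx10g
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/titan/gc.log
```

`PrintGCApplicationStoppedTime` is the interesting one here: it records how long application threads were halted, which is exactly the 5-10 ms disruption window described in the quote.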
In conclusion, we did not see a reason to keep Titan embedded in active development. The expectation was that embedded would at the very least blow local out of the water in terms of speed, but we weren't able to find a scenario where that happened. There may still be some scenario where running Cassandra embedded with Titan is a win, but it didn't win in the simple scenarios that are most likely to occur in practice, so it doesn't make sense to maintain and develop Titan embedded moving forward. If you have any questions, please shoot them my way and I'll answer as best I can.
-Zack
P.S.: perf doesn't collect many stats on EC2 because the kernel there is compiled with most of the useful perf-related flags turned off.