Cassandra cluster extremely slow


CasperCLD

Mar 4, 2015, 2:58:21 AM
to aureliu...@googlegroups.com

I'm doing some prototyping/benchmarking on Titan. I've got one Titan-Cassandra-ES VM running and two Cassandra VMs.
Each of them owns roughly 33% of the data (according to bin/nodetool status).

All the machines are VirtualBox VMs with 4 GB of RAM and 4 i7 cores.
I'm interested in all adjacent nodes, so I call Rexster with: http://192.168.33.10:8182/graphs/graph/vertices/35082496/both
These are the results (in seconds):

Edges    Titan CC (1N)  Titan WC (1N)  Titan CC (2N)  Titan WC (2N)  Titan CC (3N)  Titan WC (3N)
1035     2781           1185           4344           780            5736           2207
1505     1834           1068           3124           1076           3632           2621
2147     1716           1675           2525           1649           4539           3761
3113     2320           2189           2738           2170           5546           4982
5140     3968           3799           4191           3887           9236           8579
100000   79075          76408          80041          76803          176018         167188


Again in fancy graph form:
NOTE: for the two-node test, the setup was exactly the same as described above, except with one Cassandra node fewer. The two nodes (titan-cassandra and Cassandra) each owned 50% of the data.

Titan is fastest with 1 node, and performance tends to degrade when more nodes are added. This is the opposite of what distribution should accomplish, so obviously I'm doing something wrong, right?
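To put a number on that degradation, here is a minimal sketch that divides the multi-node timings by the single-node timing for each row. The values are copied from the results table in this post; the function name `slowdown_vs_one_node` is just an illustrative helper, and only the WC columns are compared (I'm assuming those are the more repeatable measurements).

```python
# Timings copied from the results table in this post (same units as the table).
ROWS = [
    # edges, CC 1N, WC 1N, CC 2N, WC 2N, CC 3N, WC 3N
    (1035, 2781, 1185, 4344, 780, 5736, 2207),
    (1505, 1834, 1068, 3124, 1076, 3632, 2621),
    (2147, 1716, 1675, 2525, 1649, 4539, 3761),
    (3113, 2320, 2189, 2738, 2170, 5546, 4982),
    (5140, 3968, 3799, 4191, 3887, 9236, 8579),
    (100000, 79075, 76408, 80041, 76803, 176018, 167188),
]


def slowdown_vs_one_node(rows=ROWS):
    """For each row, divide the 2N and 3N WC timings by the 1N WC timing."""
    result = []
    for edges, _cc1, wc1, _cc2, wc2, _cc3, wc3 in rows:
        result.append((edges, round(wc2 / wc1, 2), round(wc3 / wc1, 2)))
    return result


for edges, two, three in slowdown_vs_one_node():
    print(f"{edges:>6} edges: 2N/1N = {two:.2f}, 3N/1N = {three:.2f}")
```

A ratio above 1 means the multi-node setup was slower than the single node for that row. With these numbers the two-node WC timings stay close to the one-node timings for most rows, while the three-node timings come out at roughly double.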
How to improve this performance?
If you need any additional information, don't hesitate to reply.
Thanks

Casper

CasperCLD

Mar 4, 2015, 5:57:32 AM
to aureliu...@googlegroups.com
These are my configurations:
Node 1:
titan-cassandra-es.properties: http://pastebin.com/k92by2yS

Node 2 and node 3 have the exact same YAML file. The only difference is the listen_address, which is set to each node's own IP.

CasperCLD

Mar 5, 2015, 3:02:10 AM
to aureliu...@googlegroups.com
Bump



Bryn Cooke

Mar 5, 2015, 9:52:07 AM
to aureliu...@googlegroups.com
Can you try running your queries through the Gremlin console to make sure that Rexster isn't the bottleneck?
Make sure you run each query multiple times to give Cassandra time to warm up.
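A rough way to script the repeated runs, as a minimal sketch rather than a definitive benchmark. The URL is the one from the original post; the helper names `fetch`, `time_runs` and `summarize` are made up for illustration. Note that this still goes through Rexster over HTTP, so it only separates the first (cold) run from the later (warm) ones; to rule out Rexster itself, the same loop would need to be repeated in the Gremlin console.

```python
import statistics
import time
import urllib.request

# URL taken from the original post; adjust host and vertex id for your setup.
URL = "http://192.168.33.10:8182/graphs/graph/vertices/35082496/both"


def fetch(url=URL, timeout=300):
    """Issue one GET request and read the whole response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_runs(query, runs=10):
    """Call `query` `runs` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Report the first (cold) run separately from the later (warm) runs."""
    cold, warm = durations[0], durations[1:]
    return {
        "cold": cold,
        "warm_median": statistics.median(warm) if warm else None,
        "warm_min": min(warm) if warm else None,
    }
```

Calling `summarize(time_runs(fetch))` against a running server gives one cold figure and the median and minimum of the warm runs, which is usually a fairer basis for comparing the 1, 2 and 3 node setups than a single request.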

For a low number of concurrent queries I would expect a single node to be faster than multiple nodes as no network communication has to take place.

Other things I have done when using Titan are:
  • Use standalone installations of Cassandra and Elasticsearch. In particular, use a recent 2.x version of Cassandra. I have used the DataStax Community packages in the past.
  • Use SSDs. This makes a huge difference to performance.

Bryn

CasperCLD

Mar 6, 2015, 3:50:58 AM
to aureliu...@googlegroups.com
Thanks for your reply Bryn! Much appreciated. I'll try out your suggestions.