|Cassandra performance decreases drastically with increase in data size.||srmore||5/29/13 9:32 PM|
Hello, I am observing that performance decreases drastically as my data size grows. I have a 3-node cluster with 64 GB of RAM, and my data size is around 400 GB on all the nodes. I also see that when I restart Cassandra the performance goes back to normal and then starts decreasing again after some time.
Some hunting led me to this page, http://wiki.apache.org/cassandra/LargeDataSetConsiderations, which talks about large data sets and explains that it might be because I am going through multiple layers of OS cache, but it does not tell me how to tune it.
So, my question is: are there any optimizations I can do to handle these large datasets? And why does my performance go back to normal when I restart Cassandra?
|Re: Cassandra performance decreases drastically with increase in data size.||Jonathan Ellis||5/29/13 10:05 PM|
Sounds like you're spending all your time in GC, which you can verify
by checking what GCInspector and StatusLogger say in the log.
The fix is to increase your heap size or upgrade to 1.2:
Project Chair, Apache Cassandra
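A rough way to check the GC theory is to total up the GCInspector pauses reported in the log. The sketch below assumes the line shape commonly seen in 1.x-era logs ("GC for <collector>: N ms for M collections, X used; max is Y"). The sample lines and their numbers are made up for illustration, and the pattern may need adjusting for your version.

```python
import re

# Assumed GCInspector line shape; adjust the pattern if your log differs.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def summarize_gc(lines):
    """Return {collector: (total pause ms, collections, peak used/max ratio)}."""
    summary = {}
    for line in lines:
        m = GC_LINE.search(line)
        if m is None:
            continue  # not a GCInspector pause line
        total_ms, count, peak = summary.get(m["collector"], (0, 0, 0.0))
        summary[m["collector"]] = (
            total_ms + int(m["ms"]),
            count + int(m["count"]),
            max(peak, int(m["used"]) / int(m["max"])),
        )
    return summary


# Hypothetical sample lines with small, made-up numbers.
sample = [
    "INFO [ScheduledTasks:1] GCInspector.java GC for ConcurrentMarkSweep: 1200 ms for 2 collections, 6000 used; max is 8000",
    "INFO [ScheduledTasks:1] GCInspector.java GC for ParNew: 250 ms for 1 collections, 2000 used; max is 8000",
    "INFO [ScheduledTasks:1] StatusLogger.java Pool Name Active Pending",
    "INFO [ScheduledTasks:1] GCInspector.java GC for ConcurrentMarkSweep: 300 ms for 1 collections, 4000 used; max is 8000",
]

for collector, (ms, count, peak) in summarize_gc(sample).items():
    print(f"{collector}: {ms} ms over {count} collections, peak heap {peak:.0%}")
```

To run it against a real log, pass an open file object instead of the sample list. Long or frequent ConcurrentMarkSweep pauses with the heap staying near its maximum would support the GC explanation.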
|Re: Cassandra performance decreases drastically with increase in data size.||srmore||5/30/13 2:31 PM|
You are right, it looks like I am doing a lot of GC. Is there any short-term solution for this other than bumping up the heap? Even if I increase the heap I will run into the same issue; it will only take longer before I hit OOM. It will be a while before we move to the latest and greatest Cassandra.
|Re: Cassandra performance decreases drastically with increase in data size.||Bryan Talbot||5/30/13 8:48 PM|
One or more of these might be effective, depending on your particular usage:
- remove data (rows especially)
- add nodes
- add ram (has limitations)
- reduce bloom filter space used by increasing fp chance
- reduce row and key cache sizes
- increase index sample ratio
- reduce compaction concurrency and throughput
- upgrade to Cassandra 1.2, which does some of these things for you
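To get a feel for the bloom filter item in the list above, here is a back-of-the-envelope sketch using the textbook sizing formula for an optimally configured bloom filter, bits per key = -ln(p) / (ln 2)^2. It is only an estimate: Cassandra's actual filter sizing may round differently, and the key count below is a hypothetical figure, not one taken from this thread.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized bloom filter (textbook formula)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mb(num_keys, fp_chance):
    """Approximate filter size in MiB for num_keys keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8 / (1024 ** 2)


num_keys = 1_000_000_000  # hypothetical number of row keys on one node

for p in (0.001, 0.01, 0.1):
    print(f"fp chance {p}: {bloom_bits_per_key(p):.1f} bits/key, "
          f"~{bloom_size_mb(num_keys, p):.0f} MiB")
```

Under this formula, raising the false-positive chance from 0.01 to 0.1 halves the filter size, at the cost of more unnecessary SSTable reads.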
|Re: Cassandra performance decreases drastically with increase in data size.||Aiman Parvaiz||5/30/13 11:47 PM|
I believe you should roll out more nodes as a temporary fix to your problem. 400GB on all nodes means (as correctly mentioned in other mails in this thread) that you are spending more time on GC. Check out the second comment in this link by Aaron Morton; he says that more than 300GB can be problematic. That post is about an older version of Cassandra, but I believe the concept still holds:
|Re: Cassandra performance decreases drastically with increase in data size.||srmore||6/3/13 7:07 AM|
Thanks all for the help. I ran the traffic over the weekend. Surprisingly, my heap was doing OK (around 5.7G of 8G), but GC activity went nuts and dropped the throughput. I will probably increase the number of nodes.
The other interesting thing I noticed was that there were some objects with finalize() methods; this could potentially cause GC issues.