We have an 18-node Cassandra cluster running on AWS. The cluster has been running for quite some time. We now have a situation where clients time out on some queries, and we are looking for ideas on how to track the problem down.
The general situation is as follows:
We have many tables, but two are primarily involved in this issue. The application performs an in-application join, per advice we received from DataStax engineers.
A user has interests, stored in what is essentially a wide table with a partition key of userId and, per user, the list of partition keys of that user's interests. Those keys point into the interests table, where the metadata for each interest resides. This design helps us avoid updating thousands of records in fully materialized views whenever the metadata for a particular interest changes.
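For concreteness, the two tables are shaped roughly like this (table and column names here are illustrative, not our exact schema):

    CREATE TABLE user_interests (
        user_id     uuid,
        interest_id uuid,   -- partition key of a row in the interests table
        PRIMARY KEY (user_id, interest_id)
    );

    CREATE TABLE interests (
        interest_id uuid PRIMARY KEY,
        name        text,
        description text
        -- ... 13 small attributes in total
    );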
We resolve the metadata for a user's interests by performing a second query, and that second query is the one now timing out. It looks like this, with 200 specific interests to fetch (I have edited out most of the IDs to keep this succinct):
    select * from interests
    where interest_id in (
        a1d82d54-136e-4032-8ed6-6a233e13f0c2,
        38ffa7fb-747b-465c-ae68-eb4652966ce9,
        f3c40c79-4e93-43f6-9029-48cb6554588a,
        859b798f-cc36-4961-9165-345c6c8e4363,
        f16f4fcd-2df2-4127-b1e8-83e45bcfebdb,
        06411ad8-aff1-429f-855a-3816c87da6e1,
        635ca1cb-1708-44fd-9050-a3e2014e2735);
The metadata for any interest is quite small: 13 attributes and fewer than 500 characters in total.
I can run this query against one node using the CQL shell and it returns nearly instantly. It times out when issued from the application, where other nodes may be acting as the coordinator.
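For reference, the query can be traced from cqlsh to see which node coordinates the request and where the time goes; cqlsh defaults to consistency ONE, so matching whatever level the application uses matters (the LOCAL_QUORUM below is just a placeholder):

    TRACING ON;
    CONSISTENCY LOCAL_QUORUM;
    select * from interests where interest_id in (a1d82d54-136e-4032-8ed6-6a233e13f0c2, 38ffa7fb-747b-465c-ae68-eb4652966ce9);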
What changed, and what we believe to be the root cause: a team used a REST endpoint to update the interest metadata. The endpoint was written to perform an upsert, and that action created tombstones for every interest. That is when the queries suddenly got slow.
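For context on the mechanism: in CQL, writing null to a column is a delete, so an upsert that binds null for attributes it does not set writes a cell tombstone for each null column. A sketch of the kind of statement we believe the endpoint issued (column names hypothetical):

    -- Binding null here does not leave description untouched; it writes a
    -- cell tombstone for that column on every row it touches.
    INSERT INTO interests (interest_id, name, description)
    VALUES (a1d82d54-136e-4032-8ed6-6a233e13f0c2, 'some interest', null);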
The interests table has narrow rows: a partition key plus 13 columns of data.
We have since compacted the tables, but we are still getting timeouts.
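The compaction was a forced major compaction, something like:

    nodetool compact interests interests

One caveat we are aware of: compaction only purges tombstones older than gc_grace_seconds (default 864000, i.e. 10 days), so tombstones from recent writes would survive the compaction.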
cfstats output for the table looks like this:
nodetool cfstats interests.interests
Keyspace: interests
    Read Count: 303758334
    Read Latency: 0.11072883718476016 ms.
    Write Count: 7792
    Write Latency: 0.031202130390143735 ms.
    Pending Tasks: 0
        Table: interests
        SSTable count: 2
        Space used (live), bytes: 7736156
        Space used (total), bytes: 7736156
        Off heap memory used (total), bytes: 70920
        SSTable Compression Ratio: 0.4049112590236776
        Number of keys (estimate): 46336
        Memtable cell count: 325
        Memtable data size, bytes: 100555
        Memtable switch count: 114
        Local read count: 303758334
        Local read latency: 0.104 ms
        Local write count: 7792
        Local write latency: 0.000 ms
        Pending tasks: 0
        Bloom filter false positives: 61357
        Bloom filter false ratio: 0.00651
        Bloom filter space used, bytes: 58752
        Bloom filter off heap memory used, bytes: 58736
        Index summary off heap memory used, bytes: 10136
        Compression metadata off heap memory used, bytes: 2048
        Compacted partition minimum bytes: 259
        Compacted partition maximum bytes: 1331
        Compacted partition mean bytes: 409
        Average live cells per slice (last five minutes): 1.0
        Average tombstones per slice (last five minutes): 0.0
Any ideas on how to approach this problem?