We saw very high I/O on the Cassandra disk during a job run. The stats reported the disk as 100% utilized, and the knock-on effect was high CPU I/O wait as well.
The job reads about 6 months of data.
Snippets from the sar reports during the job run:
tps stats:
Time              tps      rtps    wtps     bread/s  bwrtn/s
12:00:01 PM  61103.19  60872.85  230.34  2602958.15  1862.07
12:10:01 PM  53627.49  53322.65  304.84  2398993.61  2453.60
12:20:01 PM  51178.42  50967.70  210.72  2242130.22  1699.57
12:30:01 PM  20184.13  19977.62  206.51   840243.49  1672.00
12:40:01 PM    348.86     42.39  306.48     3456.17  2465.28
12:50:02 PM   9299.85   9105.51  194.34   408328.59  1567.97
01:00:01 PM  71104.14  70805.07  299.08  3006636.39  2412.64
01:10:01 PM  55499.15  55259.37  239.78  2464872.79  1931.55
01:20:01 PM  50671.93  50399.19  272.74  2299741.84  2195.87
01:30:02 PM  18517.04  18334.13  182.92   790755.50  1483.49
01:40:01 PM  51815.33  51638.28  177.05  2359860.97  1430.31
01:50:01 PM  60880.49  60303.03  577.46  2593977.88  4633.45
02:00:01 PM  55850.12  55651.77  198.35  2391941.71  1605.65
02:10:01 PM  38998.24  38848.46  149.78  1667650.90  1212.22
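As a rough sanity check on the tps numbers, the sketch below converts bread/s into bytes per second and an average size per read request. It assumes the sysstat convention that bread/s counts 512-byte blocks (sectors); the `read_profile` helper is hypothetical, and only three intervals from the table are copied in.

```python
# Sketch: turn sar tps columns into throughput and average request size.
# Assumption: bread/s is reported in 512-byte blocks (sysstat convention).
SECTOR_BYTES = 512

# (time, rtps, bread/s) copied from the tps table; a subset for brevity.
samples = [
    ("12:00:01 PM", 60872.85, 2602958.15),
    ("01:00:01 PM", 70805.07, 3006636.39),
    ("12:40:01 PM", 42.39, 3456.17),
]


def read_profile(rtps, bread_per_s):
    """Return (read MB/s, average KiB per read request) for one interval."""
    bytes_per_s = bread_per_s * SECTOR_BYTES
    mb_per_s = bytes_per_s / 1e6
    # Guard against idle intervals with no read requests.
    avg_kib = bytes_per_s / rtps / 1024 if rtps else 0.0
    return mb_per_s, avg_kib


for t, rtps, bread in samples:
    mb, kib = read_profile(rtps, bread)
    print(f"{t}: {mb:8.1f} MB/s read, {kib:5.1f} KiB per read")
```

For the 12:00:01 PM interval this works out to about 1.3 GB/s of reads from roughly 61,000 read requests per second, averaging about 21 KiB each.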
CPU stats:
Time         CPU  %user  %nice  %system  %iowait  %steal  %idle
12:00:01 PM  all  28.13   0.02    15.39    11.51    0.00  44.96
12:10:01 PM  all  23.72   0.18    13.63     9.97    0.00  52.50
12:20:01 PM  all  21.57   0.00    12.75     9.31    0.00  56.36
12:30:01 PM  all   9.63   0.00     5.26     3.48    0.00  81.63
12:40:01 PM  all   1.95   0.16     0.59     0.00    0.00  97.29
12:50:02 PM  all   5.02   0.00     2.72     1.72    0.00  90.54
01:00:01 PM  all  28.76   0.30    17.12    12.27    0.00  41.55
01:10:01 PM  all  24.16   0.25    13.98    10.24    0.00  51.38
01:20:01 PM  all  22.06   0.02    12.91     9.45    0.00  55.56
01:30:02 PM  all   8.99   0.00     4.92     3.23    0.00  82.86
01:40:01 PM  all  22.05   0.02    13.15     9.85    0.00  54.94
01:50:01 PM  all  25.80   0.83    14.89    10.63    0.00  47.85
Can we tune Cassandra's caching and heap parameters?
Cassandra version: 2.1.7
Current settings:
cat cassandra.yaml |grep -v "#" |grep cache
key_cache_size_in_mb: 100
key_cache_save_period: 14400
row_cache_size_in_mb: 500
row_cache_save_period: 14400
vm.swappiness = 60 (Linux sysctl)
Settings suggested on various blogs:
Disable or lower key_cache_save_period: 0
Disable or lower row_cache_save_period: 0
key_cache_size_in_mb: 512
row_cache_size_in_mb: 10240
row_cache_provider: SerializingCacheProvider
Disable or lower vm.swappiness = 10
It's a production cluster. Can someone please help?