Cassandra Reaper takes too long to finish


Abdul Patel

Feb 25, 2019, 9:43:01 PM
to TLP Apache Cassandra Reaper users
Hi,

We recently moved to Reaper in prod; non-prod works fine.
Prod has a bigger footprint, but looking at Reaper's capabilities it is still small.
I have an 18-node, 3-DC cluster with about 1.2 TB of data overall.
I installed Reaper in each DC and the config is EACH (datacenterAvailability).
One of my keyspaces took 1 day and 6 hours to finish, and
even the system_distributed keyspace has been running for almost a day with an estimated 2 more days to go.
There are only 1537 segments.

Another point: this is a full repair, and we never ran full repair before.
Do I need to change any config or strategy?
Please advise.

Alexander Dejanovski

Feb 26, 2019, 1:05:44 AM
to Abdul Patel, TLP Apache Cassandra Reaper users
Hi Abdul,

which version of Cassandra and Reaper are you running?
The fact that you have 18 nodes in 3 DCs means 6 nodes per DC, which with RF 3 will only allow at most 2 segments to be processed at the same time.
Each segment though will involve 9 nodes, which could take a fairly long time. 
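(For reference, here is the back-of-the-envelope math as a small Python sketch. The numbers come straight from this thread, not from Reaper itself; the point is that a segment can only start when none of its replicas are already busy repairing another segment.)

    # Illustration of the segment concurrency limit described above, not actual Reaper code.
    nodes_per_dc = 18 // 3   # 18 nodes spread evenly over 3 DCs
    rf_per_dc = 3            # replication factor assumed in this reply

    # Each running segment ties up one replica set (rf_per_dc nodes) in every DC,
    # so at most nodes_per_dc // rf_per_dc segments can run at the same time.
    max_concurrent_segments = nodes_per_dc // rf_per_dc      # -> 2
    nodes_involved_per_segment = rf_per_dc * 3               # -> 9 (one replica set per DC)

    print(max_concurrent_segments, nodes_involved_per_segment)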
You could speed things up by using PARALLEL as repair parallelism instead of the default DATACENTER_AWARE, which will make all replicas compute the Merkle trees at the same time instead of one per DC (it's going to put more load on the cluster though).
I don't think system_distributed is really worth repairing if you want to save some time.

Before Cassandra 4.0 comes out, we still recommend running full repair instead of incremental repair. If you ran incremental repair before, your SSTables are probably still marked as repaired, which will cause some trouble. Please check my blog post for more information: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
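If you want to check whether your SSTables are still flagged as repaired, the sstablemetadata tool that ships with Cassandra prints a "Repaired at" line for each SSTable (0 means unrepaired). Below is a rough, unverified Python sketch of scanning a table's data directory with it; the tool path, the data directory and the exact output line are assumptions and depend on your Cassandra version and packaging.

    # Hedged sketch: report the repaired state of SSTables via sstablemetadata.
    # Paths and the exact output format are assumptions; adjust for your install.
    import glob
    import subprocess

    SSTABLEMETADATA = "sstablemetadata"  # may live under tools/bin/ in your install
    DATA_GLOB = "/var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db"  # hypothetical

    for sstable in glob.glob(DATA_GLOB):
        out = subprocess.run([SSTABLEMETADATA, sstable],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if line.strip().startswith("Repaired at"):
                # "Repaired at: 0" means the SSTable is not marked as repaired
                print(sstable, "->", line.strip())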

Also, as long as you can run repair within GC grace seconds, you're fine with cyclic repairs. The plan is to be able to repair within gc grace without impacting performance.
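(As a trivial sanity check, the rule of thumb is simply that a full repair cycle has to fit inside the smallest gc_grace_seconds you use. Illustrative Python with placeholder numbers, not taken from your cluster:)

    # Placeholder numbers, just to make the constraint explicit.
    gc_grace_seconds = 10 * 24 * 3600      # Cassandra's default gc_grace: 10 days
    repair_cycle_seconds = 3 * 24 * 3600   # e.g. a cycle that takes ~3 days end to end

    assert repair_cycle_seconds < gc_grace_seconds, "repair cycle exceeds gc_grace"
    print("headroom (hours):", (gc_grace_seconds - repair_cycle_seconds) / 3600)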

Cheers,


--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting

Abdul Patel

Feb 26, 2019, 7:13:37 AM
to Alexander Dejanovski, TLP Apache Cassandra Reaper users
Hi Alex,

The replication factor is 4 in each DC, as high availability was a requirement for the project.
One of the keyspaces deletes data every 48 hours, so to avoid tombstone issues gc_grace_seconds is set to around 54 hours.
I have seen that the system and system_schema keyspaces repair pretty fast.
Also, it seems that in Reaper we cannot schedule an incremental repair if a full repair is already scheduled.


Alexander Dejanovski

Feb 26, 2019, 7:23:17 AM
to Abdul Patel, TLP Apache Cassandra Reaper users
RF 4 is probably not a great choice because LOCAL_QUORUM will require 3 nodes to be up, which only allows for one node down, which is the same as RF 3.
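(The quorum arithmetic behind that, as a quick Python sketch:)

    # LOCAL_QUORUM needs floor(RF / 2) + 1 replicas up in the local DC.
    def local_quorum(rf):
        return rf // 2 + 1

    for rf in (3, 4, 5):
        tolerated_down = rf - local_quorum(rf)
        print(f"RF {rf}: quorum {local_quorum(rf)}, tolerates {tolerated_down} node(s) down")
    # RF 3 -> quorum 2, tolerates 1 down
    # RF 4 -> quorum 3, still tolerates only 1 down (more storage, no extra availability)
    # RF 5 -> quorum 3, tolerates 2 down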
Are you performing actual DELETE statements every 48 hours, or do you have 48-hour TTLs?

Incremental repair is fully supported by Reaper and you can create incremental repair schedules. If you have to go down that path for performance reasons I suggest you only use it on tables using SizeTieredCompactionStrategy. Running incremental with LCS is way too dangerous (and inefficient). Do note that incremental repair is fully broken for tables using Materialized Views (as data goes through the write path for such tables, and all streamed data cannot be marked as repaired).

system is a keyspace that is local to each node, and system_schema doesn't contain much data anyway, so they are indeed likely to be fast. I still don't think you should bother repairing system_distributed as it mostly contains repair history (which is expendable). If you have MVs (which I hope is not the case) then you can repair system_distributed.view_build_status since I'm not sure what it's used for. But repairing repair history seems a bit redundant :)

Cheers,



Abdul Patel

Feb 26, 2019, 7:48:11 AM
to Alexander Dejanovski, TLP Apache Cassandra Reaper users
Those are actual deletes, not TTLs.


Alexander Dejanovski

Feb 26, 2019, 7:52:03 AM
to Abdul Patel, TLP Apache Cassandra Reaper users
I'll assume you know the nitty-gritty details of how tombstones get evicted then (and what that means for compaction). With such a short gc_grace window, I guess incremental repair could be the only viable option.


Abdul Patel

Feb 26, 2019, 8:21:04 AM
to Alexander Dejanovski, TLP Apache Cassandra Reaper users
Ok, thanks.
Another question: if I try to schedule an incremental repair when a full repair is already scheduled for a keyspace, it gives an error. Does that mean I can have only one schedule per keyspace, either full or incremental?
Also, will there be a performance impact if we increase the repair threads?

Alexander Dejanovski

Feb 26, 2019, 8:57:17 AM
to Abdul Patel, TLP Apache Cassandra Reaper users
Increasing repair threads will only help if you're using Cassandra 2.2. In 3.0/3.11 it won't change a thing for Reaper.
I confirm you can only have one type of repair for a keyspace (either full or incremental, not both). It may require you to remove all existing instances of the full repair in the history, and the associated schedule before you can schedule the incremental repair.
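(If it helps, Reaper also exposes a REST API you could script that cleanup against. The sketch below is an assumption, not verified against your Reaper version: the endpoint paths, JSON field names and required parameters such as "owner" may differ, so check the REST API docs for your release before relying on anything like it.)

    # Unverified sketch: list repair schedules and delete the full-repair ones for a
    # keyspace via Reaper's REST API. Endpoints, field names and params are assumptions.
    import requests

    REAPER = "http://localhost:8080"   # Reaper's default HTTP port
    KEYSPACE = "my_keyspace"           # hypothetical keyspace name
    OWNER = "abdul"                    # owner recorded on the schedules

    for sched in requests.get(f"{REAPER}/repair_schedule").json():
        if sched.get("keyspace_name") == KEYSPACE and not sched.get("incremental_repair"):
            print("deleting full-repair schedule", sched["id"])
            requests.delete(f"{REAPER}/repair_schedule/{sched['id']}",
                            params={"owner": OWNER})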


Abdul Patel

Mar 6, 2019, 10:19:20 AM
to Alexander Dejanovski, TLP Apache Cassandra Reaper users
Alex/Team,

Just to provide an update: we found that one of the nodes was unresponsive after the first Reaper run (which took 1 day to finish), with no errors.
We tried to restart the node but it wasn't coming up and kept hanging at startup, so we finally removed the node from the cluster, added it back, and ran cleanup on the other nodes.
Now Reaper is working smoothly on all nodes, with a max time of 3 hours 30 minutes for the biggest keyspace.

Looks like the problematic node was the issue; we're still digging to find out why the node never reported any problems earlier.

Alexander Dejanovski

Mar 6, 2019, 10:54:24 AM
to Abdul Patel, TLP Apache Cassandra Reaper users
Thanks a lot for following up!
Happy to hear it's working now.

Cheers
