Remove 1 Repair per node limit


Dan Bader

Nov 6, 2019, 6:53:40 AM
to TLP Apache Cassandra Reaper users
Is there a way to remove or change the limit of 1 running repair per node? We would like to run 2 or 4 repairs per node right now.

Alexander Dejanovski

Nov 6, 2019, 9:45:32 AM
to Dan Bader, TLP Apache Cassandra Reaper users
Not yet, but we're considering it for the near future.

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting



Dan Bader

Nov 7, 2019, 8:31:38 AM
to TLP Apache Cassandra Reaper users
If we could have 2 or more repairs running on a node, that would greatly help us. Besides the settings below, is there another way to speed up repairs?

repairIntensity: 1
repairRunThreadCount: 100
repairManagerSchedulingIntervalSeconds: 0

We are also using a parallel repair with 4 threads.

segmentCountPerNode: 128, which gives us about 70k segments in total and looks like it will take about 5 days.

If we could get a full repair down to about 3 days, that would be preferred.



Alexander Dejanovski

Nov 7, 2019, 9:28:36 AM
to Dan Bader, TLP Apache Cassandra Reaper users
Hi Dan,

Setting repairManagerSchedulingIntervalSeconds to 0 can be fairly heavy if you use Cassandra as the backend for Reaper, especially if you have a lot of segments. I'd set it to 10 at a minimum.
128 segments per node is fairly high already. If you're repairing 3.0/3.11 clusters, you should be able to reduce this further, despite your number of vnodes. Start as low as you can and let Reaper group token ranges that have the same replicas for you (you can try setting 1 segment per node and see what the lowest count is that Reaper manages to come up with).
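
A minimal sketch of how that could look in cassandra-reaper.yaml, reusing the property names quoted above (the exact values are only illustrative):

repairManagerSchedulingIntervalSeconds: 10   # 0 is heavy on the Cassandra backend; 10 is a safer floor
segmentCountPerNode: 1                       # start as low as possible and let Reaper group ranges with identical replicas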

Also, if you're using Cassandra 2.2, raise the number of threads to 4 in the advanced settings of the repair, in order to have 4 token ranges processed concurrently (this is not necessary with 3.0/3.11, where Cassandra processes multiple token ranges in a single repair session, which is faster).

Cheers,


-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting


Dan Bader

Nov 7, 2019, 9:58:44 AM
to TLP Apache Cassandra Reaper users
Our vnodes are 32 per host, with 480 hosts. What segment count per node would you use? We are using a newer 3.11 cluster.

Repairs are taking about 5 minutes to complete with 128 segments per node right now.

Alexander Dejanovski

Nov 7, 2019, 10:03:40 AM
to Dan Bader, TLP Apache Cassandra Reaper users
Well, it's going to be a tradeoff between speed and overstreaming (which will create compaction pressure).
I'd bring segmentCountPerNode down to the number of vnodes per node and see how that goes (32 segments per node here, though you'll probably end up with more segments than that given the size of the cluster). It also depends on your node density and the other repair settings, especially parallelism, which I'd set to DC_AWARE to relieve pressure on the cluster; you can put more pressure on it and go faster by setting it to PARALLEL.

5 minutes is the time it takes per segment, right? You still have some margin to make segments larger, as the default segment timeout in Reaper is 30 minutes.
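
Putting that together, a rough cassandra-reaper.yaml sketch of these suggestions (illustrative only; repairParallelism is the standard Reaper property for the DC_AWARE/PARALLEL choice and isn't named explicitly in this thread):

segmentCountPerNode: 32                  # match the 32 vnodes per node and see what Reaper ends up with
repairParallelism: DATACENTER_AWARE      # easier on the cluster; PARALLEL is faster but adds pressure
hangingRepairTimeoutMins: 30             # the default segment timeout, leaving margin over ~5-minute segments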


-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting


Reid Pinchback

Nov 7, 2019, 10:16:02 AM
to Dan Bader, TLP Apache Cassandra Reaper users

Dan, if it helps, we found that running 4 parallel repairs seemed counterproductive. The TLP folks can speak better to the details, but from what I could see in the source, the way Reaper tells whether a node is busy with a repair is pretty simplistic right now, so it was better not to pile up reasons for segments to be skipped because a node was deemed to be in repair. The real problem we could see was the time to complete the repair on any given segment; that was what we had to fight.

The biggest obstacle was really the memory load from the merkle tree height. Going to a parallelism of 1 and applying a backpatch from 4.0 to reduce the memory footprint of the merkle trees, together with intensity 1 and 64 segments per node (4 vnodes), worked out well. The other thing was to be careful not to set hangingRepairTimeoutMins too aggressively low, or the cluster burned time on repairs we just gave up on and did again later. Once we had things tuned to the point that the number of attempts per segment was almost always 1, we were in pretty good shape. Initially repairs were taking about 9 to 10 days; once all of the above was done, we were down to about 2. We're on 3.11.4, so your situation could have similarities.
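
For comparison, a rough sketch of that tuning in the same cassandra-reaper.yaml terms (the merkle tree backpatch is a Cassandra code change rather than a Reaper setting, and the values are simply the ones described above):

repairIntensity: 1
segmentCountPerNode: 64              # with 4 vnodes per node in this case
hangingRepairTimeoutMins: 30         # don't set this too low; 30 is the Reaper default, and segments should rarely need a second attempt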

R
