Questions on sidecar mode

Dan Sarisky

Feb 21, 2020, 9:57:26 AM
to TLP Apache Cassandra Reaper users
I have a series of Cassandra clusters where external JMX access is not allowed.  Sidecar mode looks like a great fit.  However:

Question 1:  If a single node is down, no segments get processed.  The job just waits with "Postponed repair segment 6aa8c238-4f64-11ea-a302-db24c380e2af on repair run with id 6aa23280-4f64-11ea-a302-db24c380e2af because we couldn't get hosts metrics on 13.1.231.10".  Is that just a current limitation, that all nodes have to be up for any of them to make progress in repair?  This is on a 2-datacenter 18-node Cassandra 3.11.3 grid.

Question 2:  Even on a small grid (1-datacenter 3-node Cassandra 3.11.3 grid) I see stalls where Reaper has segments to process but does not launch any for many hours on end.  No validation compactions or repair processes were running in Cassandra; it was as if Reaper wasn't even examining the node to try to launch segments.  After letting it sit in that state for 24 hours, I speculatively restarted the Reaper process on each node, and it immediately started chewing through segments again.  Are there any known issues in sidecar mode where it gets stuck?
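One way to make "stuck" concrete before restarting anything is to sample the run's progress periodically and flag any window where the repaired-segment count stops moving while segments remain. This is a minimal sketch, assuming you already collect `(timestamp, segments_repaired, total_segments)` samples by some means (for example by noting the progress Reaper's UI shows); the collection step is not shown, and `find_stall` is a hypothetical helper, not part of Reaper.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# One progress sample: (when it was taken, segments repaired so far, total segments).
Sample = Tuple[datetime, int, int]


def find_stall(samples: List[Sample], threshold: timedelta) -> Optional[datetime]:
    """Return when the first stall began, or None if progress never stalled.

    A stall is a period of at least `threshold` during which the repaired
    count did not change even though segments were still outstanding.
    Samples must be sorted by timestamp. Hypothetical helper, not Reaper code.
    """
    last_change: Optional[datetime] = None
    last_count: Optional[int] = None
    for ts, repaired, total in samples:
        if repaired != last_count:
            # Progress was made (or this is the first sample): restart the clock.
            last_change, last_count = ts, repaired
        elif repaired < total and ts - last_change >= threshold:
            # Segments remain but nothing moved for at least `threshold`.
            return last_change
    return None


if __name__ == "__main__":
    start = datetime(2020, 2, 20, 9, 0)
    # Hypothetical hourly samples: progress reaches 40 of 120 segments, then freezes.
    samples = [(start + timedelta(hours=h), min(10 * h, 40), 120) for h in range(30)]
    print(find_stall(samples, timedelta(hours=6)))  # 2020-02-20 13:00:00
```

A finished run (repaired equals total) is never reported, so the check only fires while work is genuinely pending, which matches the symptom described above.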

Reid Pinchback

Feb 21, 2020, 10:48:56 AM
to Dan Sarisky, TLP Apache Cassandra Reaper users

For question 1, did progress just freeze at that point and not attempt any other segments?  Or did you see a lot of these messages for different segments, all complaining about the same host being down?  The question wording suggests the former; I would expect the latter.
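Reid's expectation can be sketched as a per-segment check: a segment is postponed only when metrics are missing for one of its own replicas, so segments that don't involve the down host keep running. This illustrates that expected behavior under simplifying assumptions; it is not Reaper's actual scheduler. `Segment`, `partition_segments`, and hosts `n1` through `n6` are hypothetical, and replica placement is reduced to a 6-node ring with three replicas per range.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Segment:
    segment_id: str
    replicas: FrozenSet[str]  # hosts that hold a replica of this token range


def partition_segments(
    segments: Iterable[Segment], hosts_with_metrics: Set[str]
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split segments into runnable IDs and (ID, missing hosts) to postpone."""
    runnable: List[str] = []
    postponed: List[Tuple[str, List[str]]] = []
    for seg in segments:
        # Only this segment's own replicas matter; other down hosts are ignored.
        missing = seg.replicas - hosts_with_metrics
        if missing:
            postponed.append((seg.segment_id, sorted(missing)))
        else:
            runnable.append(seg.segment_id)
    return runnable, postponed


if __name__ == "__main__":
    # Hypothetical 6-node ring, 3 replicas per range; n4 is down, so no metrics.
    ring = [
        Segment("s1", frozenset({"n1", "n2", "n3"})),
        Segment("s2", frozenset({"n2", "n3", "n4"})),
        Segment("s3", frozenset({"n4", "n5", "n6"})),
        Segment("s4", frozenset({"n5", "n6", "n1"})),
    ]
    runnable, postponed = partition_segments(ring, {"n1", "n2", "n3", "n5", "n6"})
    print(runnable)   # ['s1', 's4']
    print(postponed)  # [('s2', ['n4']), ('s3', ['n4'])]
```

Under this model only the segments that touch the unreachable node are postponed, and each postponement names the same host, which is the "many messages, same host" pattern Reid describes.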

R

--
You received this message because you are subscribed to the Google Groups "TLP Apache Cassandra Reaper users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlp-apache-cassandra-r...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlp-apache-cassandra-reaper-users/90e2f6b8-19d7-49c5-aea8-03a9f864f123%40googlegroups.com.

Sarisky, Dan

Feb 21, 2020, 10:54:23 AM
to Reid Pinchback, TLP Apache Cassandra Reaper users
"did progress just freeze at that point and not attempt any other segments?"
That's what I see.
I expect segments involving the down node to fail or otherwise misbehave.
I did not expect a single down node to essentially stop all progress to where no new segments are launched.