Cleaning up the Stargate Cassandra node after a pod is restarted


CameronG

Feb 9, 2021, 11:19:33 PM
to Stargate Mailing List
Hi All,

    During testing of Stargate in K8s I was often restarting the pods manually. I noticed that after a pod was restarted it would often fail to connect. Calling

> nodetool status

would show the old node as down in Cassandra. When the new pod starts up, if it gets the same IP address, you see the error "A node with address /x.x.x.x already exists, cancelling join. Use cassandra.replace_address if you want to replace this node." To resolve this issue I was calling "nodetool assassinate" to get rid of the dead Cassandra node.

  1. Is there a more elegant solution in K8s to automatically clean up the dead Cassandra nodes, e.g. call something to remove the node from the cluster before it is shut down? (A rough sketch of the kind of hook I mean is below.)
  2. If I have to manually clean up the node, is there a better way than the assassinate command? (I assume that as the Stargate node is not supposed to hold any data, this option should be OK.)
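
To make question 1 concrete, the hook I have in mind would be something like a preStop hook on the Stargate pod. This is only a rough sketch: the image tag and seed host name are made up, and I'm assuming nodetool is available in the Stargate image and can reach a storage node's JMX port (and whether assassinate is even safe here is exactly question 2):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stargate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stargate
  template:
    metadata:
      labels:
        app: stargate
    spec:
      # Leave enough grace time for the hook to run before SIGKILL.
      terminationGracePeriodSeconds: 60
      containers:
        - name: stargate
          image: stargateio/stargate-3_11:v1.0.0   # hypothetical tag
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  # Ask a storage node to drop this pod's gossip entry
                  # before the container stops.
                  - nodetool -h cassandra-0.cassandra assassinate $(hostname -i)
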
    Thanks,

    Cameron

Dmitri Bourlatchkov

Feb 10, 2021, 12:10:22 PM
to CameronG, Stargate Mailing List
(including the mailing list)

Hi Cameron,

We have https://github.com/stargate/stargate/issues/587 to track improvements in that area.

At the moment, the easiest option may be to delay the Stargate node restart by about 30-40 sec. The storage nodes will then automatically remove the old node from their local gossip state, and the restarted process should be able to connect as a "fresh" node. During this process you will notice messages about "fat clients" in the storage nodes' logs - that fat client is the Stargate node.
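
One way to get that delay in k8s is an init container that simply sleeps before the Stargate container starts. This is only a sketch, assuming a plain Deployment; the busybox image, the 40 sec value, and the Stargate image tag are arbitrary/hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stargate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stargate
  template:
    metadata:
      labels:
        app: stargate
    spec:
      initContainers:
        # Hold the restarted pod back long enough for the storage nodes
        # to expire the old fat client entry from gossip (~30 sec).
        - name: gossip-expiry-delay
          image: busybox:1.33
          command: ["sleep", "40"]
      containers:
        - name: stargate
          image: stargateio/stargate-3_11:v1.0.0   # hypothetical tag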

Also, the latest Stargate version should exit with an error code if it hits the IP address collision problem, so if k8s auto-restarts failed pods, SG should be able to reconnect after a few restarts, once that 30 sec timeout has elapsed on the storage nodes.

Cheers,
Dmitri.


CameronG

Feb 10, 2021, 3:24:26 PM
to Stargate Mailing List, dmitri.bo...@datastax.com, Stargate Mailing List, CameronG
Hi Dmitri,

    Thanks for the information.

    Is my understanding of how this would work correct?
  1. Cassandra running 3 nodes; nodetool status shows the 3 Cassandra nodes
  2. SG pod starts up and joins the cluster; nodetool status will show 4 nodes
  3. To simulate the SG pod going down I manually stop the SG pod, e.g. set replicas to 0; nodetool status shows 4 nodes with the SG node DN
  4. Wait 30-40 seconds
  5. nodetool status should now show only the 3 Cassandra nodes, i.e. the SG node removed from the list of nodes
    Regards,

    Cameron 

CameronG

Feb 10, 2021, 4:14:56 PM
to Stargate Mailing List, CameronG, dmitri.bo...@datastax.com, Stargate Mailing List
Hi Dmitri,

    I tested the steps above. After stopping the SG pod I see the following in the Cassandra logs (where 192.168.230.101 is the SG node):

INFO [HANDSHAKE-cassandra-0/192.168.230.98] 2021-02-10 20:57:56,719 OutboundTcpConnection.java:561 - Handshaking version with cassandra-0/192.168.230.98
INFO [GossipStage:1] 2021-02-10 20:58:45,329 Gossiper.java:1120 - InetAddress /192.168.230.101 is now DOWN
INFO [HANDSHAKE-/192.168.230.101] 2021-02-10 20:58:45,784 OutboundTcpConnection.java:561 - Handshaking version with /192.168.230.101

After 15 minutes I still haven't seen any messages about the FatClient being removed from gossip. Could this be a setting in Cassandra that needs to be enabled?

Regards,

Cameron 

Dmitri Bourlatchkov

Feb 10, 2021, 4:26:03 PM
to CameronG, Stargate Mailing List
Hi Cameron,

> I tested the steps above. After stopping the SG pod I see the following in the cassandra logs (where 192.168.230.101 is the SG node):

> INFO [HANDSHAKE-cassandra-0/192.168.230.98] 2021-02-10 20:57:56,719 OutboundTcpConnection.java:561 - Handshaking version with cassandra-0/192.168.230.98
> INFO [GossipStage:1] 2021-02-10 20:58:45,329 Gossiper.java:1120 - InetAddress /192.168.230.101 is now DOWN
> INFO [HANDSHAKE-/192.168.230.101] 2021-02-10 20:58:45,784 OutboundTcpConnection.java:561 - Handshaking version with /192.168.230.101

This looks a bit strange. The "Handshaking version with /192.168.230.101" message hints that the Stargate node might have come back up... is that a possibility?

Cheers,
Dmitri.

Dmitri Bourlatchkov

Feb 10, 2021, 4:33:10 PM
to CameronG, Stargate Mailing List
Hi Cameron,

On Wed, Feb 10, 2021 at 3:24 PM CameronG <cameron...@elafent.com> wrote:
    Is my understanding of how this would work correct?
  1. Cassandra running 3 nodes; nodetool status shows the 3 Cassandra nodes
  2. SG pod starts up and joins the cluster; nodetool status will show 4 nodes
  3. To simulate the SG pod going down I manually stop the SG pod, e.g. set replicas to 0; nodetool status shows 4 nodes with the SG node DN
  4. Wait 30-40 seconds
  5. nodetool status should now show only the 3 Cassandra nodes, i.e. the SG node removed from the list of nodes

Stargate nodes, being "fat clients", do not normally show up in "nodetool status", i.e. they are not part of the "ring".

You can use "nodetool describecluster" to check whether SG nodes are connected.

Cheers,
Dmitri.

CameronG

Feb 10, 2021, 5:17:49 PM
to Stargate Mailing List, dmitri.bo...@datastax.com, Stargate Mailing List, CameronG
Hi Dmitri,

    I did think that was strange as well. I re-ran the test. After SG was connected I set replicas to 0 for the SG Deployment. The Cassandra logs printed out:

INFO [GossipStage:1] 2021-02-10 22:05:52,788 Gossiper.java:1120 - InetAddress /192.168.230.79 is now DOWN
INFO [HANDSHAKE-/192.168.230.79] 2021-02-10 22:05:53,145 OutboundTcpConnection.java:561 - Handshaking version with /192.168.230.79

   There are no SG pods running.

   I've found that the only way I can get SG to successfully connect again after an SG pod restart is:
  1. Set the SG deployment to 0 replicas to shut down all pods
  2. Remove the Cassandra node using assassinate
  3. Start up the SG pods again
    If the SG pod is running but can't connect because the Cassandra node is DN and I call assassinate at that point, then I get one of two errors, depending on whether the pod has restarted on the same IP address or a different one:
    
    Different IP: Caused by: java.lang.RuntimeException: A node required to move the data consistently is down (/192.168.230.101). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false

    or 

    Same IP: Caused by: java.lang.RuntimeException: A node with address /192.168.230.108 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

    Regards,

    Cameron

Dmitri Bourlatchkov

Feb 10, 2021, 5:49:26 PM
to CameronG, Stargate Mailing List
Hi Cameron,

Sorry, I'm a bit lost at this point.

On Wed, Feb 10, 2021 at 5:17 PM CameronG <cameron...@elafent.com> wrote:
    I did think that was strange as well. I re-ran the test. After SG was connected I set replicas to 0 for the SG Deployment. The Cassandra logs printed out:

INFO [GossipStage:1] 2021-02-10 22:05:52,788 Gossiper.java:1120 - InetAddress /192.168.230.79 is now DOWN
INFO [HANDSHAKE-/192.168.230.79] 2021-02-10 22:05:53,145 OutboundTcpConnection.java:561 - Handshaking version with /192.168.230.79

   There are no SG pods running.

   I've found that the only way I can get SG to successfully connect again after an SG pod restart is:
  1. Set the SG deployment to 0 replicas to shut down all pods
  2. Remove the Cassandra node using assassinate
  3. Start up the SG pods again

I believe the "assassinate" should not be required at all.

    If the SG pod is running but can't connect because the Cassandra node is DN and I call assassinate at that point, then I get one of two errors, depending on whether the pod has restarted on the same IP address or a different one:
    
    Different IP: Caused by: java.lang.RuntimeException: A node required to move the data consistently is down (/192.168.230.101). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false

Stargate nodes do not store data; they act only as coordinators. Why would SG restarting under a different IP cause a data move error? I've never seen this error with Stargate :)

    Same IP: Caused by: java.lang.RuntimeException: A node with address /192.168.230.108 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

In this situation SG should eventually be able to reconnect (after the 30 sec timeout). If it is restarted automatically, a few restarts may be required; if manually, it is generally easier to wait.

Could you share your "nodetool status" and "nodetool describecluster" outputs when SG is running and when it is down?

Thanks,
Dmitri.



CameronG

Feb 10, 2021, 6:09:17 PM
to Stargate Mailing List, dmitri.bo...@datastax.com, Stargate Mailing List, CameronG
Hi Dmitri,

    Here you go. I have taken the information at 3 stages:
  1. When SG is running and successfully connected to Cassandra
  2. After the SG Deployment is stopped, i.e. replicas set to 0
  3. After starting the SG Deployment up again after stopping it. The pod has a different IP address. This fails and continuously restarts.
    Regards,

    Cameron


SG Running
----------

I have no name!@cassandra-0:/$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.108.14   232.56 KiB  256          25.6%             568cc89e-a051-4f9c-83d8-342eb46526cf  rack1
UN  192.168.230.82   133.81 KiB  256          25.1%             c0ae4ac9-e359-4255-8d00-08628bd93884  rack1
UN  192.168.230.98   316.92 KiB  256          25.9%             b482ab94-4dd2-4553-9b65-bb405b9a2521  rack1
UN  192.168.127.215  239.67 KiB  256          23.3%             72a835c5-275c-4973-addb-27efd159a19f  rack1


I have no name!@cassandra-0:/$ nodetool describecluster
Cluster Information:
        Name: cassandra
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                99eb09ab-4954-37db-9a9b-da222b57f29e: [192.168.108.14, 192.168.230.82, 192.168.230.98, 192.168.127.215]




SG Deployment Shutdown (Replicas 0)
-----------------------------------

I have no name!@cassandra-0:/$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.108.14   297.17 KiB  256          25.6%             568cc89e-a051-4f9c-83d8-342eb46526cf  rack1
DN  192.168.230.82   133.81 KiB  256          25.1%             c0ae4ac9-e359-4255-8d00-08628bd93884  rack1
UN  192.168.230.98   372.24 KiB  256          25.9%             b482ab94-4dd2-4553-9b65-bb405b9a2521  rack1
UN  192.168.127.215  239.67 KiB  256          23.3%             72a835c5-275c-4973-addb-27efd159a19f  rack1

I have no name!@cassandra-0:/$ nodetool describecluster
Cluster Information:
        Name: cassandra
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                99eb09ab-4954-37db-9a9b-da222b57f29e: [192.168.108.14, 192.168.230.98, 192.168.127.215]

                UNREACHABLE: [192.168.230.82]


SG Deployment Restarted with 1 replica
--------------------------------------

Note: this fails to start up with the following error:

Caused by: java.lang.RuntimeException: A node required to move the data consistently is down (/192.168.230.82). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false


I have no name!@cassandra-0:/$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.108.14   287.56 KiB  256          25.6%             568cc89e-a051-4f9c-83d8-342eb46526cf  rack1
DN  192.168.230.82   133.81 KiB  256          25.1%             c0ae4ac9-e359-4255-8d00-08628bd93884  rack1
UN  192.168.230.98   372.24 KiB  256          25.9%             b482ab94-4dd2-4553-9b65-bb405b9a2521  rack1
UJ  192.168.230.103  15.4 KiB   256          ?                 758e62fb-efe2-42ac-a402-67e5087abf2d  rack1
UN  192.168.127.215  294.76 KiB  256          23.3%             72a835c5-275c-4973-addb-27efd159a19f  rack1

I have no name!@cassandra-0:/$ nodetool describecluster
Cluster Information:
        Name: cassandra
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                99eb09ab-4954-37db-9a9b-da222b57f29e: [192.168.108.14, 192.168.230.98, 192.168.127.215, 192.168.230.103]

                UNREACHABLE: [192.168.230.82]

CameronG

Feb 10, 2021, 6:12:41 PM
to Stargate Mailing List, CameronG, dmitri.bo...@datastax.com, Stargate Mailing List
Hi Dmitri,

 I forgot to mention in the previous message that I've seen this issue before, when I was doing some simple testing in a local minikube environment running the bitnami/cassandra Helm charts and Stargate. If it helps, I can confirm this setup again and send you the details if you want to reproduce it.

    Regards,

    Cameron

Dmitri Bourlatchkov

Feb 11, 2021, 10:35:10 AM
to CameronG, Stargate Mailing List
Hi Cameron,

I'm hesitant to say this, but it looks like a configuration mistake to me. If I understand correctly, 192.168.230.82 is the address of a Stargate node, right? Well, it appears to join the C* ring and claim ownership over some data ranges (according to the nodetool status output below). This should not be the case: Stargate nodes normally do not own any data.

Please double-check your setup. Is it possible that your Stargate runs in "developer" mode (and is thus actually acting as a storage node)?

Thanks,
Dmitri.


CameronG

Feb 11, 2021, 5:08:18 PM
to Stargate Mailing List, dmitri.bo...@datastax.com, Stargate Mailing List, CameronG
Hi Dmitri,

    I think that was it!

    In my deployment.yaml I had the following:

        - env:
            - name: DEVELOPER_MODE
              value: "false"

    Based on the documentation here: https://stargate.io/docs/stargate/1.0/developers-guide/install/starctl.html#_starctl_options, developer mode should only be enabled if the value is 'true'. Anyway, I removed the entry completely from the YAML and tried the same test. This time I got this in the Cassandra logs:

INFO [GossipStage:1] 2021-02-11 21:12:45,312 Gossiper.java:1120 - InetAddress /192.168.230.109 is now DOWN
INFO [GossipTasks:1] 2021-02-11 21:12:58,368 Gossiper.java:894 - FatClient /192.168.230.109 has been silent for 30000ms, removing from gossip
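
For reference, the relevant part of my deployment.yaml after the fix is simply the variable gone. The remaining entry below is just a placeholder for whatever else is in the env list:

        - env:
            # DEVELOPER_MODE removed entirely: setting it at all, even to
            # "false", appeared to switch Stargate into developer mode.
            - name: CLUSTER_NAME       # placeholder entry
              value: "cassandra"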

    It also looks like, after a restart of the pod, it is re-connecting correctly this time.

    Thanks very much for your help with this issue.

    Is it worth me posting an issue regarding the DEVELOPER_MODE variable? It looks like it's getting enabled for values other than 'true'.

    Regards,

    Cameron  

Dmitri Bourlatchkov

Feb 11, 2021, 5:26:44 PM
to CameronG, Stargate Mailing List
Hi Cameron,

>  Based on the documentation here: https://stargate.io/docs/stargate/1.0/developers-guide/install/starctl.html#_starctl_options developer mode should only be enabled if the value is 'true'. [...]
>    Is it worth me posting an issue regarding the DEVELOPER_MODE variable? It looks like it's getting enabled for values other than true?

Yes, please open an issue about this.

Thanks,
Dmitri.