Failed to update node with citus_update_node


Timur Luchkin

Mar 7, 2023, 8:36:04 AM3/7/23
to citus-users
Hi,

I have the following cluster:
mydb=# SELECT * FROM pg_dist_node ORDER BY nodecluster, noderole;
 nodeid | groupid |   nodename   | nodeport | noderack | hasmetadata | isactive | noderole  | nodecluster  | metadatasynced | shouldhaveshards
--------+---------+--------------+----------+----------+-------------+----------+-----------+--------------+----------------+------------------
      1 |       0 | 10.22.14.80  |     5454 | default  | t           | t        | primary   | default      | t              | f
      2 |      11 | 10.22.14.90  |     5454 | default  | t           | t        | primary   | default      | t              | t
      3 |      12 | 10.22.14.92  |     5454 | default  | t           | t        | primary   | default      | t              | t
      4 |      11 | 10.22.214.90 |     5454 | default  | f           | t        | secondary | read_only_bo | f              | t
      5 |      12 | 10.22.114.92 |     5454 | default  | f           | t        | secondary | read_only_bo | f              | t
(5 rows)


I want to replace the failed "secondary" worker (10.22.114.92) with another worker (10.22.14.92), which is currently also in the "primary" group.
Unfortunately, I'm getting an error:
mydb=# SELECT 1 FROM citus_update_node(5, '10.22.14.92', 5454);
ERROR:  there is already another node with the specified hostname and port


I believe this is wrong behavior. I'm trying to switch a node in the "read_only_bo" cluster with the "secondary" role, so it shouldn't interfere with the "primary" role in the "default" cluster.

I get the same error when switching the primary over to its backup node:
SELECT 1 FROM citus_update_node(3, '10.22.114.92', 5454);
So it is impossible to do a failover in case of a worker failure.

P.S.
Is it possible to fix this, please? I don't want to use multiple aliases in /etc/hosts to work around it.

Env:
Citus 11.2.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
PostgreSQL 14.7 (Ubuntu 14.7-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

Timur Luchkin

Mar 15, 2023, 8:17:33 AM3/15/23
to citus-users
Anyone, please?

Teja M

Mar 15, 2023, 8:02:40 PM3/15/23
to citus-users
Let me take a look and we will get back to you.

Thanks
Teja

Teja M

Mar 16, 2023, 7:08:31 PM3/16/23
to citus-users
Hi,

In the code we have a UNIQUE constraint on (hostname, port), so the same node can't be registered as both a primary and a secondary at the same time. I'm trying to get more details to see if we can work around this.

In the end, do you want nodeids 3 and 5 to point to the same node (VM), or the primary and secondary to point to the same IP and port? If it is the former, we can start the secondary server on a different port on the same IP. Can you please provide more specifics?
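For reference, the row that already occupies a given hostname/port can be seen with a plain metadata query (illustrative only; the actual check happens inside citus_update_node rather than through this SQL):

SELECT nodeid, groupid, noderole, nodecluster
FROM pg_dist_node
WHERE nodename = '10.22.14.92' AND nodeport = 5454;
-- in your setup this returns nodeid 3 (the primary), which is why
-- citus_update_node(5, '10.22.14.92', 5454) is rejected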

Regards
Teja

Timur Luchkin

Mar 21, 2023, 6:24:59 AM3/21/23
to citus-users
Hi.

Both yes and no.
Yes, because nodeids 3 and 5 will point to the same IP/port for some amount of time (this is a typical failover solution) while the original server is unavailable due to failover or maintenance.
No, because I can't start another instance on the same, already highly loaded host.
In short, I need to be able to point two (or more) clusters to the same host/IP, because the original node may be unavailable for some time.

Example: let's assume we have a single Citus worker group with a primary DB (0.0.0.1) and two replicas (0.0.0.2 and 0.0.0.3).
And I have 3 clusters configured in pg_dist_node:
default (primary) points to 0.0.0.1,
read_only_bo (secondary) points to 0.0.0.2,
adhoc_sql_db (secondary) points to 0.0.0.3.

Let's say there was a disaster and both 0.0.0.2 and 0.0.0.3 are destroyed.
I need to temporarily point "read_only_bo" and "adhoc_sql_db" to 0.0.0.1 (until 0.0.0.2 and 0.0.0.3 are recovered).
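For context, this is roughly how I register such a layout today (just a sketch; I'm assuming the citus_add_node/citus_add_secondary_node helpers with the cluster-name argument, ports as in my setup):

SELECT citus_add_node('0.0.0.1', 5454);  -- primary, cluster 'default'
SELECT citus_add_secondary_node('0.0.0.2', 5454, '0.0.0.1', 5454, 'read_only_bo');
SELECT citus_add_secondary_node('0.0.0.3', 5454, '0.0.0.1', 5454, 'adhoc_sql_db');

-- What I would like to be able to do after the disaster (currently rejected):
-- SELECT citus_update_node(<read_only_bo nodeid>, '0.0.0.1', 5454);
-- SELECT citus_update_node(<adhoc_sql_db nodeid>, '0.0.0.1', 5454);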

Teja M

Mar 21, 2023, 1:29:16 PM3/21/23
to citus-users
The same node can't be registered as both a primary and a secondary at the same time; instead, we can do the following.

-- If a node failed (assuming primary node 3 here)

BEGIN;

SELECT citus_update_node(3, '10.22.14.92-demoted', 5454); -- temporarily move the primary to a placeholder hostname
SELECT citus_update_node(5, '10.22.14.92', 5454);         -- point the secondary at the primary's IP
SELECT citus_update_node(3, '10.22.114.92', 5454);        -- point the primary at the old secondary's IP

COMMIT;
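(The temporary '-demoted' name is only a placeholder so the (hostname, port) pair is free for node 5 during the swap; all three updates run inside one transaction.)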

Regards
Teja

Timur Luchkin

Mar 22, 2023, 1:10:26 AM3/22/23
to citus-users
Unfortunately, I can't do this. Or rather, I don't need to. In your example you have me swap the primary and secondary IPs, but that's not what I need.
Imagine that '10.22.114.92' is not available anymore, but I need to continue serving the clients connected to it. That's why I want to point them to 10.22.14.92.

>> The same node can't be registered as the primary and secondary at the same time
Can two or more secondary nodes be registered on the same ip/port?

Teja M

Mar 22, 2023, 8:46:48 PM3/22/23
to citus-users
Unfortunately, citus_update_node() doesn't distinguish between the types of nodes; if there is already another node with the same IP and port, it errors out.

Regards
Teja

Teja M

Mar 22, 2023, 8:51:05 PM3/22/23
to citus-users
Going back to your original requirement:

>> I want to replace the failed "secondary" worker (10.22.114.92) with another worker (10.22.14.92), which is currently also in the "primary" group.

My understanding is that your secondary worker has gone bad and you want to replace it with another worker; is that correct? If so, is the replacement you are trying to do something quick and temporary, or do you want to keep it in place for a while?

Regards
Teja

Timur Luchkin

Mar 24, 2023, 4:42:41 AM3/24/23
to citus-users
Hi, 
>> My understanding is that your secondary worker has gone bad and you want to replace it with another worker; is that correct?
That is correct. "Secondary" here is just one case; it could be a primary too. The overall idea is to tell the coordinator (via pg_dist_node) to redirect connections to a healthy worker when there are issues with the current one.

>> If so, is the replacement you are trying to do something quick and temporary, or do you want to keep it in place for a while?
Mostly this is temporary: several minutes or hours (maybe even days). It depends on how fast we can return the broken worker to the cluster.

There is also a scenario where it would be kept for a longer time. For example, let's say we have a single worker group (primary-secondary-secondary):
 nodeid | groupid |   nodename   | nodeport | noderack | hasmetadata | isactive | noderole  | nodecluster
--------+---------+--------------+----------+----------+-------------+----------+-----------+--------------
      1 |       0 | 10.22.14.80  |     5454 | default  | t           | t        | primary   | default
      3 |      12 | 10.22.14.92  |     5454 | default  | t           | t        | primary   | default
      5 |      12 | 10.22.114.92 |     5454 | default  | f           | t        | secondary | read_only_bo
      9 |      12 | 10.22.314.92 |     5454 | default  | f           | t        | secondary | adhoc_sql_db

In an environment with a huge load I want to point "adhoc_sql_db" and "read_only_bo" at dedicated workers ("10.22.314.92" and "10.22.114.92" respectively).
For a tiny environment I want to point both "adhoc_sql_db" and "read_only_bo" at a single worker (say "10.22.114.92");
"10.22.314.92" would not even be deployed.

BUT!!! The primary use case, of course, is failover of one of the workers.
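To make the tiny-environment case concrete, this is roughly the call that the uniqueness check rejects today (illustrative, using the nodeids from the table above):

-- point adhoc_sql_db at the worker that read_only_bo already uses
SELECT citus_update_node(9, '10.22.114.92', 5454);
ERROR:  there is already another node with the specified hostname and port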

P.S.
I fully understand why you don't want several "primary" workers on the same host/port, but as you can see there are plenty of cases where several secondaries could use the same worker.
A primary and one or more secondaries could also share the same host/port.

Would it be hard to change the existing constraint to a unique (nodename, nodeport) WHERE noderole = 'primary'?
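In SQL terms, something like the following partial unique index is what I have in mind (just a sketch of the idea, with an illustrative index name; I realize the actual check is enforced in the Citus code rather than as a plain table constraint):

CREATE UNIQUE INDEX pg_dist_node_primary_nodename_nodeport_idx  -- name for illustration only
    ON pg_dist_node (nodename, nodeport)
    WHERE noderole = 'primary';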

Timur Luchkin

Apr 3, 2023, 8:30:03 AM4/3/23
to citus-users
Hi,

Can I get your decision, please? Can this be fixed?

Teja M

Apr 5, 2023, 2:36:54 PM4/5/23
to citus-users
>> Also "primary" and "secondary"(ies) can share same host/port. 

This statement doesn't sound correct. The unique constraint is there to prevent such things.

regards
Teja

Timur Luchkin

Apr 6, 2023, 9:32:26 AM4/6/23
to citus-users
>>  The unique constraint is there to prevent such things
But why? What does it protect? The 'primary' and 'secondary' roles don't interfere with each other.

Teja M

Apr 12, 2023, 1:20:14 PM4/12/23
to citus-users
  1. It is simply not possible to have two Postgres instances running on the same host:port combination. Only the first one that starts listening will run; the second will fail with an error.
  2. A user could configure a Postgres instance to run as a primary and a secondary at the same time, but that is also wrong. By definition a secondary node is a standby (or replica) running in read-only mode, replaying the WAL stream from the real primary via physical replication.
  3. The user may want to "register" the real primary as a fake secondary in the metadata in order to use it for read scaling. That could be a valid use case, but it is at the same time dangerous and not recommended in practice: excessive read traffic may cause performance degradation and impact the read-write workload, which is the primary's main responsibility.
Regards
Teja

Timur Luchkin

Apr 14, 2023, 3:33:07 AM4/14/23
to citus-users
Hi,
I feel like we're going in circles...
Let's not teach each other the basics of the IT/DB industry (what a port and replication are).
I'm telling you, as a database expert, that this functionality is necessary to build a reliable and fault-tolerant cluster.
I want to control the "placement" of workers; it solves many fault-tolerance problems.
I would like your product to be a complete, production-ready scale-out solution.
Why is it so hard to change this restriction/constraint to a UNIQUE (nodename, nodeport) WHERE noderole = 'primary'?
What will it break?

Teja M

Apr 14, 2023, 1:27:42 PM4/14/23
to citus-users
Thanks. Can you please open a new issue (labelled as a feature request) and expand on both points:

>> this functionality is necessary to build a reliable and fault-tolerant cluster.
>> I want to control the "placement" of workers; it solves many fault-tolerance problems.

Timur Luchkin

Apr 17, 2023, 3:07:46 AM4/17/23
to citus-users
Hi,
OK, thanks. Registered: url
