Hi,
>> My understanding is, your secondary worker has gone bad, and you want to replace it with another worker, is my understanding correct?
This is correct. "Secondary" is a special case, it can be a primary too. The overall idea - inform coordinator (pg_dist_node) to redirect connections to a healthy worker in case of issues with the current worker.
>> If so, the above replacement you are trying to do, is that something quick and temporary or you want to keep this replacement for a while?
This is a temporary option mostly. Several minutes or hours (may be even days). It depends how fast we can return a broken worker back to the cluster.
There is also an option when it can be used for a longer time. For example, let's say we have a single worker group (primary-secondary-secondary).
nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster
--------+---------+--------------+----------+----------+-------------+----------+-----------+--------------
1 | 0 | 10.22.14.80 | 5454 | default | t | t | primary | default
3 | 12 | 10.22.14.92 | 5454 | default | t | t | primary | default
5 | 12 | 10.22.114.92 | 5454 | default | f | t | secondary | read_only_bo
9 | 12 | 10.22.314.92 | 5454 | default | f | t | secondary | adhoc_sql_db
If we have environment with a huge load I want to point "adhoc_sql_db" and "read_only_bo" to a dedicated worker ("10.22.114.92" and "10.22.314.92" correspondingly).
For a tiny ENV I want point "adhoc_sql_db" and "read_only_bo" to single worker (say "10.22.114.92") .
"10.22.314.92" will not even be deployed.
BUT!!! Primary case, of course, is a failover of one of the workers.
P.s.
I perfectly understand why U don't want to allow several "primary" workers to be on the same host/port, but as U can see there are a lot of cases when several "secondary"(ies) can use the same worker.
Also "primary" and "secondary"(ies) can share same host/port.
Is it hard to change existing constraint and add unique "nodename + nodeport" WHERE noderole = 'primary' ?