data updation during failure of Worker Node

46 views

Skip to first unread message

Rishabh Chaudhary

unread,

Oct 12, 2015, 8:18:38 AM10/12/15

to pg_shard users

Hi,

I have set up postgresql and pg_shard.

I have one master and two workers. Total 16 shards and 2 replicas. same table as here: https://www.citusdata.com/citus-products/pg-shard/pg-shard-quick-start-guide

I do three insertions as mentioned in the guide and everything is fine until now.

I stop worker 1.

I update one of the entries.

I start worker 1.

I stop worker 2.

i query through master: "select * from customer_reviews;"

I get this:

WARNING: Bad result from ss-postgres-slave2:5432

DETAIL: Remote message: terminating connection due to administrator command

WARNING: Connection failed to ss-postgres-slave2:5432

DETAIL: Remote message: could not connect to server: Connection refused

Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting

TCP/IP connections on port 5432?

WARNING: Connection failed to ss-postgres-slave2:5432

DETAIL: Remote message: could not connect to server: Connection refused

Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting

TCP/IP connections on port 5432?

WARNING: Connection failed to ss-postgres-slave2:5432

DETAIL: Remote message: could not connect to server: Connection refused

Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting

TCP/IP connections on port 5432?

WARNING: Connection failed to ss-postgres-slave2:5432

DETAIL: Remote message: could not connect to server: Connection refused

Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting

TCP/IP connections on port 5432?

WARNING: Connection failed to ss-postgres-slave2:5432

DETAIL: Remote message: could not connect to server: Connection refused

Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting

TCP/IP connections on port 5432?

WARNING: Connection failed to ss-postgres-slave2:5432

DETAIL: Remote message: could not connect to server: Connection refused

Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting

TCP/IP connections on port 5432?

here ss-postgres-slave2 is my worker 2.

Isnt the update made during the time when worker 1 was off auto synced to worker 1 when it is switched on again?

Its good that worker 1 doesnt show bad(un-updated) data, but when it is up again, how to make sure the data is auto synced?

Onder Kalaci

unread,

Oct 13, 2015, 3:28:37 AM10/13/15

to pg_shard users

Hi Rishabh,

Thanks for reaching out. Let me frame your question with some more abstraction:

Time 0:

Both worker1 and worker2 are up.

Time 1:

worker1 is down.

worker2 is up.

An update is executed. Since worker2 is up, the data in the shard_placement on worker2 is updated.

The shard_placement on worker1 is not updated, and the corresponding shard_placement row in the metadata is marked as invalid not to accept any queries when its up again. At this step, you can check status of each shard_placement from the metadata. You'll see that pg_shard marks the shard placement on worker-1 as invalid, where the operation is supposed to be executed.

Time 2:

Start worker1.

Stop worker2.

Issue any query. Since one of the shard_placement on worker1 marked as invalid and the replica of it on worker2 is unavailable, the query fails. In other words, pg_shard decides that there is no valid copy of some part of the data and errors out.

For instance, if your query is "SELECT * from customer_reviews WHERE customer_id=x"; where x is an customer_id whose value is hashed to healthy shard, you could be able to get query results.

Answer to your question on sync

pg_shard does not do auto-sync, instead it provides a UDF which can be called to repair an inactive shard placement using data from a healthy placement. Note that you can run the UDF when both workers are up.

Onder,

Thanks

Rishabh Chaudhary

unread,

Oct 13, 2015, 4:39:04 AM10/13/15

to pg_shard users

Hi Onder,

Thanks. That was quite helpful.

However, the repairing of shards doesnt seem to be happening for me. I get this:

WARNING: Bad result from ss-postgres-slave1:5432

DETAIL: Remote message: function worker_copy_shard_placement(unknown, unknown, integer) does not exist

ERROR: could not copy shard data

HINT: Consult recent messages in the server logs for details.

here ss-postgres-slave1 is my faulty node.

Onder Kalaci

unread,

Oct 13, 2015, 4:43:42 AM10/13/15

to pg_shard users

Hi Rishabh,

As noted in the documentation, in order for this function to operate, pg_shard must be installed on all worker nodes and not just the master node. Have you created pg_shard on all workers?

Onder,

Thanks

Rishabh Chaudhary

unread,

Oct 13, 2015, 4:55:52 AM10/13/15

to pg_shard users

Hi,

Yes I did. and then i restarted the postgres service. Do i need to create the extension from inside the worker node? as in: "create extension pg_shard;"?

Onder Kalaci

unread,

Oct 13, 2015, 5:01:09 AM10/13/15

to pg_shard users

Hi,

You need to do the following two on all workers to successfully create pg_shard:

First, you will need to add pg_shard to shared_preload_libraries in your postgresql.conf for each worker:

shared_preload_libraries = 'pg_shard' # (change requires restart)

Connect to each worker via psql and run:

CREATE EXTENSION pg_shard;

Thanks

Rishabh Chaudhary

unread,

Oct 13, 2015, 5:09:07 AM10/13/15

to pg_shard users

Hi,

Thanks a lot!

I missed out on the last step.

That was a great help. :)

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu