data updation during failure of Worker Node

46 views
Skip to first unread message

Rishabh Chaudhary

unread,
Oct 12, 2015, 8:18:38 AM10/12/15
to pg_shard users
Hi,

I have set up postgresql and pg_shard.

I have one master and two workers. Total 16 shards and 2 replicas. same table as here: https://www.citusdata.com/citus-products/pg-shard/pg-shard-quick-start-guide

I do three insertions as mentioned in the guide and everything is fine until now.

I stop worker 1.

I update one of the entries.

I start worker 1.
I stop worker 2.

i query through master: "select * from customer_reviews;"

I get this:

WARNING:  Bad result from ss-postgres-slave2:5432
DETAIL:  Remote message: terminating connection due to administrator command
WARNING:  Connection failed to ss-postgres-slave2:5432
DETAIL:  Remote message: could not connect to server: Connection refused
Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting
TCP/IP connections on port 5432?
WARNING:  Connection failed to ss-postgres-slave2:5432
DETAIL:  Remote message: could not connect to server: Connection refused
Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting
TCP/IP connections on port 5432?
WARNING:  Connection failed to ss-postgres-slave2:5432
DETAIL:  Remote message: could not connect to server: Connection refused
Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting
TCP/IP connections on port 5432?
WARNING:  Connection failed to ss-postgres-slave2:5432
DETAIL:  Remote message: could not connect to server: Connection refused
Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting
TCP/IP connections on port 5432?
WARNING:  Connection failed to ss-postgres-slave2:5432
DETAIL:  Remote message: could not connect to server: Connection refused
Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting
TCP/IP connections on port 5432?
WARNING:  Connection failed to ss-postgres-slave2:5432
DETAIL:  Remote message: could not connect to server: Connection refused
Is the server running on host "ss-postgres-slave2" (172.19.10.91) and accepting
TCP/IP connections on port 5432?

here ss-postgres-slave2 is my worker 2.

Isnt the update made during the time when worker 1 was off auto synced to worker 1 when it is switched on again?
Its good that worker 1 doesnt show bad(un-updated) data, but when it is up again, how to make sure the data is auto synced?

Onder Kalaci

unread,
Oct 13, 2015, 3:28:37 AM10/13/15
to pg_shard users
Hi Rishabh, 

Thanks for reaching out. Let me frame your question with some more abstraction:

Time 0: 
     Both worker1 and worker2 are up.
Time 1: 
     worker1 is down.
     worker2 is up.
     An update is executed. Since worker2 is up, the data in the shard_placement on worker2 is updated.
     The shard_placement on worker1 is not updated, and the corresponding shard_placement row in the metadata is marked as invalid not to accept any queries when its up again. At this step, you can check status of each shard_placement from the metadata. You'll see that pg_shard marks the shard placement on worker-1 as invalid, where the operation is supposed to be executed.
Time 2:
   Start worker1.
   Stop worker2.
   Issue any query. Since one of the shard_placement on worker1 marked as invalid and the replica of it on worker2 is unavailable, the query fails. In other words, pg_shard decides that there is no valid copy of some part of the data and errors out. 
   For instance, if your query is "SELECT * from customer_reviews WHERE customer_id=x"; where x is an customer_id whose value is hashed to healthy shard, you could be able to get query results.

Answer to your question on sync
pg_shard does not do auto-sync, instead it provides a UDF which can be called to repair an inactive shard placement using data from a healthy placement. Note that you can run the UDF when both workers are up.

Onder,
Thanks

Rishabh Chaudhary

unread,
Oct 13, 2015, 4:39:04 AM10/13/15
to pg_shard users
Hi Onder,

Thanks. That was quite helpful.

However, the repairing of shards doesnt seem to be happening for me. I get this:

WARNING:  Bad result from ss-postgres-slave1:5432
DETAIL:  Remote message: function worker_copy_shard_placement(unknown, unknown, integer) does not exist
ERROR:  could not copy shard data
HINT:  Consult recent messages in the server logs for details.

here ss-postgres-slave1 is my faulty node.

Onder Kalaci

unread,
Oct 13, 2015, 4:43:42 AM10/13/15
to pg_shard users
Hi Rishabh,

As noted in the documentation, in order for this function to operate, pg_shard must be installed on all worker nodes and not just the master node. Have you created pg_shard on all workers?

Onder,
Thanks

Rishabh Chaudhary

unread,
Oct 13, 2015, 4:55:52 AM10/13/15
to pg_shard users
Hi,

Yes I did. and then i restarted the postgres service. Do i need to create the extension from inside the worker node? as in: "create extension pg_shard;"?

Onder Kalaci

unread,
Oct 13, 2015, 5:01:09 AM10/13/15
to pg_shard users
Hi, 

You need to do the following two on all workers to successfully create pg_shard:
  • First, you will need to add pg_shard to shared_preload_libraries in your postgresql.conf for each worker:
    • shared_preload_libraries = 'pg_shard' # (change requires restart)
  • Connect to each worker via psql and run:
    • CREATE EXTENSION pg_shard;
Thanks

Rishabh Chaudhary

unread,
Oct 13, 2015, 5:09:07 AM10/13/15
to pg_shard users
Hi,

Thanks a lot!

I missed out on the last step.

That was a great help. :)
Reply all
Reply to author
Forward
0 new messages