Is replication supported?

155 views
Skip to first unread message

n0ctu...@gmail.com

unread,
Mar 13, 2015, 12:06:18 PM3/13/15
to pg_shar...@googlegroups.com
Hello,

I am currently running a master-hot standby replication system, I would like to know how pg_shard interacts with this.

In the actual setup, writes are propagated from master to all the slave, as usual. 

I was wondering what happens if I install pg_shard on the current master and I create a partitioned table, adding the slaves as workers.


I'm looking for a good way to scale out write speed.

Thank you

Jason Petersen

unread,
Mar 13, 2015, 5:32:13 PM3/13/15
to n0ctu...@gmail.com, pg_shar...@googlegroups.com
On Mar 13, 2015, at 10:06 AM, n0ctu...@gmail.com wrote:

I am currently running a master-hot standby replication system, I would like to know how pg_shard interacts with this.

pg_shard is presently entirely agnostic about details of replication. It uses real PostgreSQL objects for all its bookkeeping, so they would be propagated to the standby as expected (in fact, that’s one method of implementing higher availability, so you can fail over to a new master if the main one fails).

In the actual setup, writes are propagated from master to all the slave, as usual. 

I was wondering what happens if I install pg_shard on the current master and I create a partitioned table, adding the slaves as workers.

Can you describe what you mean by this? You say adding the slaves as workers: do you mean the replication slaves?

pg_shard actually creates new tables on each of the worker nodes, so they basically exist entirely independent of one another as plain old PostgreSQL servers. Writes come into the master and it hashes and routes rows to the workers covering those values in the hash space. I gave a talk at SFPUG about pg_shard and covered this logic.

In short, the writes go through the master and are routed to shards based on its knowledge of where those shards are placed. Having replication on the pg_shard master or workers won’t interfere with this, but it wouldn’t really add anything, either… unless I’m missing something.

I'm looking for a good way to scale out write speed.

Because pg_shard uses multiple backend workers for storing data, with a good configuration (high speed networking between the master and workers) it is possible to exceed any write throughput available from a single worker node. We have been developing some AWS CloudFormation templates to quickly bring up pg_shard clusters, I’ll ask the developer whether there are any pre-tuned for high write throughput.

--
Jason Petersen
Software Engineer | Citus Data

signature.asc

n0ctu...@gmail.com

unread,
Mar 14, 2015, 6:02:05 AM3/14/15
to pg_shar...@googlegroups.com, n0ctu...@gmail.com
Hello Jason, 

I'll try to be more detailed.

- The current postgres setup is 1 master with streaming replication to 3 slaves as described in https://wiki.postgresql.org/wiki/Streaming_Replication
- There are N workers process generating data and adding it to a main queue
- A single process reads from the main queue and writes to postgres. The master server is hitting write capacity, because is unable to process queued data fast enough to keep the queue empty.

My question regard the ability to deploy a sharded AND replicated configuration, in order to gain in write speed and keep high availability.

I was thinking on a configuration like this: 

                   Shard master              (where the table is created with pg_shard, and also master)
                    /        \  
         Shard worker     Shard worker       (Each one implements a streaming replication, and is also a master of it)
           /     \         /     \   
        Slave   Slave    Slave    Slave      (Each data node keeps a full replica of its master, meaning that each has the exact                                                   copy of the partition of sharded table)


Is such configuration possible ? 
If so, how would it be possible to read sharded tables from slave servers ?

Thank you

Jason Petersen

unread,
Apr 6, 2015, 8:04:53 PM4/6/15
to n0ctu...@gmail.com, pg_shar...@googlegroups.com
Sorry for not replying sooner… Responses inline.

On Mar 14, 2015, at 4:02 AM, n0ctu...@gmail.com wrote:

- The current postgres setup is 1 master with streaming replication to 3 slaves as described in https://wiki.postgresql.org/wiki/Streaming_Replication

Nothing in pg_shard will interfere with streaming replication, though to do what you describe you’d probably just tell pg_shard about the master nodes and turn off its own replication logic (i.e. set replication factor to one).

My question regard the ability to deploy a sharded AND replicated configuration, in order to gain in write speed and keep high availability.

While you could configure such a system, pg_shard doesn’t have any built-in way of knowing about the read replicas, so it wouldn’t do anything special with them.

If so, how would it be possible to read sharded tables from slave servers ?

In theory you could manually configure a second pg_shard master pointing at the read replicas instead of the master shards and perform all reads through that pg_shard master. In practice, this would be pretty hacky, so it might be best if you could open a feature request on GitHub to get first-class read replica awareness.

To sum up, it sounds like you want pg_shard to be aware of read replicas created by streaming replication and use them when desired. We haven’t had anyone ask about this particular configuration yet, so having a ticket would help us prioritize and track it moving forward.
signature.asc

khan aamir

unread,
Apr 5, 2016, 8:22:57 AM4/5/16
to pg_shard users
can u plz help us how to install the pg_shard, i am getting errors.

on create extension pg_shard ' 

do we need to have postgres installed ?
do we need to install citus data with pg_shard?

Murat Tuncer

unread,
Apr 5, 2016, 8:55:44 AM4/5/16
to pg_shard users
Hello Khan

You could follow the instructions at pg_shard github page at https://github.com/citusdata/pg_shard
to build and install pg_shard.

You will need install PostgresSQL 9.3 or above prior to pg_shard. Citus is not required to for pg_shard to work.

Please be aware that pg_shard has now reached end of life status, and it is included in Citus 5.0.  You could just install Citus 5.0 on top of an existing PostgreSQL (version 9.3 and above).  Our documentation page describes how to install and configure Citus.

Best

Murat
Reply all
Reply to author
Forward
This conversation is locked
You cannot reply and perform actions on locked conversations.
0 new messages