Is insertion speed for master with shards will always slower than master without shard's?

Skip to first unread message

Yan Cheng Cheok

Feb 23, 2016, 12:49:33 AM2/23/16
to pg_shard users
From the discussion on!topic/pg_shard-users/B-o5OOQWsok , by having multiple write clients, I thought I can I have higher throughput compared with case without using any shard.

I have 5 servers, which each server is having 4 CPU cores.

I experiment 2 cases.
  • 1 master. No shard.
  • 1 master. 16 shards distributed among 4 servers. Created via master_create_worker_shards('events', 16, 2)

I perform insertion of total 500k row of data. Initially, I was using 1 write client. I realize 16 shards case is significant slower than no shard case.

Soon, from!topic/pg_shard-users/B-o5OOQWsok , I realize I need to have multiple write clients, to take advantage of 16 shards.

Although I had increased number of write clients to 32, I still doesn't find 16 shard performance better than no share.

Here's my experiment result.


Number of write worker at master

Time (s)

No shard. 1 master



No shard. 1 master



No shard. 1 master



16 shard distributed among 4 servers. 2 Replica each.



16 shard distributed among 4 servers. 2 Replica each.



16 shard distributed among 4 servers. 2 Replica each.



I was wondering, is this the expected result? Is it possible to make 16 shard case faster than no shard case?

Onder Kalaci

Feb 23, 2016, 10:22:16 PM2/23/16
to pg_shard users, Marco Slot
Hi Yen,

Thanks for reaching out us.

Yes, this is the expected behaviour. There are two reasons leading to this in your case:

(a) Replication factor is set to 2. This means that each INSERT you issued is propagated to two shards. So basically, we're INSERTing two identical rows to the two different shard placements on two different worker nodes. 

(b) In the distributed case, the master has to find the correct shard that the INSERT should go into. This includes fetching the partition column's value from the query, hashing it's value and routing the INSERT to the corresponding worker node. So, in summary, executing an INSERT statement on a distributed tables has some overhead.

Because of the above items (and some others), in a single master case, it is not likely to hit the non-distributed equivalent in terms of the INSERT rate. Does this explanation make sense?

Currently, we are working on a masterless version of our product. In that case, each node in the cluster will be able to execute INSERTs, which will allow pg_shard (or CitusDB) exceed non-sharded table's INSERT rates. 

Btw, what kind of use case are you targeting? 


Yan Cheng Cheok

Feb 24, 2016, 12:54:54 AM2/24/16
to pg_shard users,
Hi Onder,

Thank you for your info.

In short, we tend to store "view", "lead" and "sales" information of high traffic web-page. Later, we are going to generate marketing report, to visualize relation between "view", "lead" and "sales".

Currently, there are 4 millions row views data for a country, per day. We can foresee data size will grow, when we roll out to more country. Hence, we tend to look for a DB solution, which we can scale up with minimal effort, when business grows.

What our concern is scalable. When data grows :-
  • Disk space. When running out of disk space, can we add in more nodes to expand disk space?
  • Write speed. When write speed goes slow, can we add in more nodes to prevent write speed from getting slower?
  • Read speed. When read speed goes slow, can we add in more nodes to prevent read speed from getting slower?
Reply all
Reply to author
0 new messages