Some Basic pointers


Jeff Mundine

Aug 21, 2015, 8:50:12 PM
to pg_shard users
Most here seem very familiar with PostgreSQL; unfortunately I'm used to other databases and haven't worked with it for years.
I need some basic rules of thumb for sizing pg_shard hardware.
Basically I'm looking at a single database, about 1.5 TB in size; my best guess is around 5 billion rows, most of them
spread between two tables.
I'm planning to do testing, but people can't wait and want some rough specs to go with it, like yesterday.
Hardware for the master? (I'll be running at least two.)
How many workers, and what hardware specs for them?
How many shards per server (probably VMs), and how many servers for the shards?

I understand the master shouldn't go beyond 16 cores (someone reported degradation for anything higher),
but otherwise, keep the core count up on that box.


Sorry, there isn't really much information around on this. You don't have to spec it exactly, but some general suggestions,
like X amount of memory per X amount of storage per worker, would be great.

THX!

Samay Sharma

Aug 25, 2015, 3:21:55 PM
to pg_shard users
Hey Jeff,

Thanks for reaching out to us.

In general, hardware requirements for the master node depend on the INSERT rates you want to achieve. To maximize INSERT throughput, you should run concurrent statements in parallel and hence having more CPU cores helps.
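A minimal sketch of that concurrency pattern, assuming a Python client. The `execute_insert` callable and the `events` table are stand-ins for whatever driver and schema you actually use (e.g. a psycopg2 cursor), so the parallelism logic can be shown without a live cluster:

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrent_inserts(execute_insert, rows, concurrency=16):
    """Issue one INSERT per row across `concurrency` parallel workers.

    `execute_insert` is a hypothetical callable taking (sql, params);
    with a real driver it would run the statement over its own connection.
    """
    sql = "INSERT INTO events (id, payload) VALUES (%s, %s)"
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(execute_insert, sql, row) for row in rows]
        # Surface any errors raised inside the workers.
        for f in futures:
            f.result()
```

With a real cluster, each worker thread should hold its own connection to the master, since a single PostgreSQL connection processes statements serially and would negate the parallelism.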

For the worker node configuration, are you looking to maximize write throughput or parallelize analytics queries? If your workload is more analytical rather than operational, then CitusDB has more comprehensive analytics capabilities.

In terms of number of shards, we generally recommend creating more shards than the number of CPU cores. This way, when you want to scale out your cluster by adding more nodes, you can easily move those shards around to effectively use the cores on those machines.
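As a concrete example using pg_shard's setup functions (the table and column names here are illustrative), you might create 32 shards with replication factor 2 on a cluster whose workers total, say, 16 cores:

```sql
-- Register the table as distributed, hash-partitioned on customer_id.
SELECT master_create_distributed_table('events', 'customer_id');

-- Create 32 shards, each replicated to 2 workers: more shards than
-- total cores, so shards can be rebalanced when nodes are added later.
SELECT master_create_worker_shards('events', 32, 2);
```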

I find that we can give better recommendations once we know more about the use case. If you'd like, I'd be happy to hop on a quick call to understand your use case and provide suggestions accordingly. What do you think?

Regards,
Samay

Jeff Mundine

Aug 25, 2015, 7:41:53 PM
to pg_shard users
Thanks for the info; we don't really do any analytics, just standard operational work.
I was just trying to size up some base specs and get an idea of good ballpark numbers.
When I first started out, I had 22 individual nodes with 1 shard on each node, using SSDs and 8 GB of RAM on each node;
this was for a 1.2 TB database.
I picked up a few things along the way, like running multiple shards per core, but there's little information on memory usage or example setups for
different needs.
I'm now down to 4 shard nodes, 1 CPU core each, still at 8 GB each, with about 4-6 shards per node (all VMs); that's per replication factor.
I'm just trying to get some validation; this is for a future project that needs specs now.

Appreciate the feedback, it was helpful!

Ozgun Erdogan

Aug 27, 2015, 9:10:33 PM
to pg_shard users
Hi Jeff,

We're working on improving our documentation and providing more examples. Your observations are correct, and I'm summarizing a few points for operational workloads below.
  • pg_shard currently has one master and multiple worker nodes. The master node routes and replicates queries to the workers, and having more CPU on the master helps. The worker nodes are regular PostgreSQL instances. Each worker node will manage its memory and disk, and you can resolve those bottlenecks by adding more workers.
  • As Samay noted, running concurrent INSERTs helps the write throughput. Roughly, I found that I could sustain 25-40K inserts/sec through the master with four worker nodes. Your mileage may vary.
  • We have exciting announcements around write scalability coming up in our pg_shard v2.0.
  • (For completeness, if you're planning to run analytical queries in the future, the following section could help.)
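The routing in the first point above can be illustrated with a toy sketch. pg_shard actually hashes the partition column with PostgreSQL's internal hash function and matches the result against each shard's token range; the `crc32`-plus-modulo scheme below is just a stand-in to show the idea:

```python
import zlib


def pick_shard(partition_value, shard_count):
    """Toy hash routing: map a partition-column value to a shard index.

    This is an approximation of pg_shard's behavior: hash the partition
    column, then pick the shard whose range covers the hash value.
    """
    h = zlib.crc32(str(partition_value).encode("utf-8"))
    return h % shard_count
```

The key property is that the same value always routes to the same shard, which is why the master can route single-row INSERTs and key lookups without consulting the workers first.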
Hope this helps somewhat. If you have any questions or feedback for pg_shard, please send them our way!

Best,
Ozgun