Performance question

83 views
Skip to first unread message

armand pirvu

unread,
Aug 12, 2015, 5:10:30 PM8/12/15
to pg_shard users
Hi,

A dirty test .

Same table , same data to be inserted (using insert into not copy) gets me the following results

50k inserts using sharding
Start Time: 12:59:51
End Time: 13:02:26
Total Time Taken 155 seconds = 2:35

Same 50k inserts without sharding 
Start Time: 13:08:36
End Time: 13:09:20
Total Time Taken 44 seconds

Same server

Am I wrong to assume that sharding should have been faster ? After all in my mind it is about disrtibuting. 
What am I missing ?

Thanks
-- Armand

Jason Petersen

unread,
Aug 13, 2015, 5:59:04 PM8/13/15
to armand pirvu, pg_shard users
On Aug 12, 2015, at 3:10 PM, armand pirvu <armand...@gmail.com> wrote:

Am I wrong to assume that sharding should have been faster ?

To fully answer that question I’d need to know a few things:

  • How many machines are in your cluster?

  • How many shards did you create?

  • What’s the replication level?

  • How many clients were you using to INSERT data?

What am I missing ?

Without any other information, I’m guessing that you did your benchmark using a single writer process. Under typical conditions, that’s not even sufficient to saturate the write capacity of a normal PostgreSQL box, let alone a cluster.

With many writers, we’ve easily observed 50k rows/second using pg_shard. One of my colleagues (Marco) was able to achieve far greater throughput in an experiment to test what could happen if we remove the single master node restriction.

Whether using pg_shard or not, you’ll want to exploit writer parallelism to see the highest throughput. Because of the planner overhead in pg_shard, it’s unlikely it will defeat stock PostgreSQL in a head-to-head comparison using a single writer; however, when parallel writers come into play it will exceed anything a single PostgreSQL box can do.

--
Jason Petersen
Software Engineer | Citus Data

signature.asc

armand pirvu

unread,
Aug 14, 2015, 3:02:01 PM8/14/15
to pg_shard users, armand...@gmail.com
Hi Jason

- How many machines are in your cluster?
3 - one master and  2 workers
- How many shards did you create?
4 shards
How many clients were you using to INSERT data?
1 client, psql straight on the master
When you mention writer, I presume you mean background writers ?

Thanks
Armand

Jason Petersen

unread,
Aug 17, 2015, 3:29:33 PM8/17/15
to armand pirvu, pg_shard users
On Aug 14, 2015, at 12:02 PM, armand pirvu <armand...@gmail.com> wrote:

How many clients were you using to INSERT data?
1 client, psql straight on the master
When you mention writer, I presume you mean background writers ?

No, I meant how many clients are connecting to the master and performing INSERTs. It sounds like the answer is “one”. Since the next INSERT won’t begin processing until the previous one has returned successfully, a single client is very limited. Since we’re doing additional processing above and beyond what PostgreSQL does to plan a query, the time to process a single row will be higher than in typical PostgreSQL.

In other words, even though you have two workers, your client is only ever talking to one at a time. To see better performance, use multiple clients. The -c and -j arguments to pgbench can help here, or you can do something like what’s described in this blog post and use xargs -P to parallelize a script that talks to your cluster. In that case, Marco saw 50,000 rows/second using 64 processes to ingest data. Your mileage may vary, but you’ll never get the best throughput with a single client… parallelism is a must to load data quickly into a distributed system.
signature.asc

armand pirvu

unread,
Aug 17, 2015, 4:21:44 PM8/17/15
to pg_shard users, armand...@gmail.com
Got it Jason

Now, I do have another question about AWS cloud formation. While it is performance based, I think another thread would be needed. The question is whether I could post in this group and if not where , how do I get in touch with Marco ?


Many thanks
Armand

armand pirvu

unread,
Aug 17, 2015, 4:27:21 PM8/17/15
to pg_shard users, armand...@gmail.com
Well duh me I noticed the page has a comment box

Will use that, although truth to be told i would prefer an approach list this forum or direct connection 
Reply all
Reply to author
Forward
0 new messages