Column value to shard mappings in pg_shard

57 views
Skip to first unread message

Ozgun Erdogan

unread,
Aug 12, 2015, 2:50:41 PM8/12/15
to pg_shard users
Hi all,

I'm reposting this question on behalf of an advanced CitusDB + pg_shard user.

"We have a formula to map a column value to a particular shard in pg_shard. When we apply this formula, we're now seeing a slightly uneven distribution for hash token ranges.

Here's the breakdown of the shard ranges that led me to having to derive a formula.  The number of shards is 256.

select range_size, count(*) as shard_count from (select shardid, (shardmaxvalue::bigint - shardminvalue::bigint) + 1 as range_size  from pg_dist_shard where logicalrelid = 94514) as t group by 1 order by 1;

 range_size | shard_count 
------------+-------------
   16777215 |         255
   16777471 |           1
(2 rows)

This is for sharding keys based on BIGINT.  I can show that this is consistent for all four tables on each cluster that are sharded on the same key.

The formula in SQL for column "x":

CASE WHEN hashint8(x) >= 2130706177 THEN 255 ELSE (hashint8(x) + 2147483648) / 16777215 END AS shardid

(A quick note on why this formula incorporates additional IF / ELSE checks. When PostgreSQL applies hashint8 on a Datum vs human readable text, you may get different reasonable. This formula therefore applies some transformations.)

It looks like the width of each shard other than the last one is (2^32 / 2^8) - 1. Is this expected?"

Best,
Ozgun

Jason Petersen

unread,
Aug 24, 2015, 12:20:02 PM8/24/15
to Ozgun Erdogan, pg_shard users
On Aug 12, 2015, at 12:50 PM, Ozgun Erdogan <oz...@citusdata.com> wrote:

It looks like the width of each shard other than the last one is (2^32 / 2^8) - 1. Is this expected?"

Turns out that’s a bug! After looking at the code, I would expect it to do that, but I doubt it was what was intended…

We inadvertently used UINT_MAX to calculate the distance between shards, which is 2^32 - 1. This is incorrect because there are actually 2^32 hash values (something inexpressible using 32-bit numbers). If we do all the calculations using 64-bit integers and change the number of tokens to be 2^32, the problem goes away.

--
Jason Petersen
Software Engineer | Citus Data

signature.asc
Reply all
Reply to author
Forward
0 new messages