Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

Determine which shard data ends up in

99 views
Skip to first unread message

Sanjana Sridhar

unread,
Jun 30, 2015, 6:04:29 PM6/30/15
to pg_shar...@googlegroups.com
Hello,

This might be a stupid question but I was wondering if there is anyway to determine which logical shard a particular row of data will end up in?

Regards,
Sanjana

Alibek Beldinov

unread,
Jul 1, 2015, 12:32:25 AM7/1/15
to pg_shar...@googlegroups.com
SELECT * FROM pgs_distribution_metadata.partition;
SELECT * FROM pgs_distribution_metadata.shard;
SELECT * FROM pgs_distribution_metadata.shard_placement;

+

pgAdmin

Sanjana Sridhar

unread,
Jul 2, 2015, 2:09:43 PM7/2/15
to pg_shar...@googlegroups.com
Thanks Alibek! I finally went with the following query:

SELECT id FROM pgs_distribution_metadata.shard

WHERE hashtext('G00CD27Bby') BETWEEN min_value::integer AND max_value::integer AND relation_id = 'tablename'::regclass;

Jason Petersen

unread,
Jul 7, 2015, 6:56:05 PM7/7/15
to Sanjana Sridhar, pg_shar...@googlegroups.com
On Jul 2, 2015, at 12:09 PM, Sanjana Sridhar <sanjana....@gmail.com> wrote:

SELECT id FROM pgs_distribution_metadata.shard WHERE hashtext('G00CD27Bby') BETWEEN min_value::integer AND max_value::integer AND relation_id = 'tablename'::regclass;

Though that might work with the text type as your partition key, pg_shard is actually looking up the appropriate hash method for whatever type is used by the partition column. So hashtext would be inappropriate if you were sharding by time, or a biginteger.

I think using hash_array with a single-element array consisting of your partition value would give the same hash value as our internal implementation, but there isn’t currently any SQL-level function that directly exposes the way we determine which shard contains a given partition value: that’s all internal.

Do you have a specific need for this functionality? How does knowing this information help you?

--
Jason Petersen
Software Engineer | Citus Data

signature.asc

Sanjana Sridhar

unread,
Jul 8, 2015, 3:13:49 PM7/8/15
to Jason Petersen, pg_shar...@googlegroups.com
Thanks for getting back to me Jason. The question was from a purely theoretical perspective.

Ozgun Erdogan

unread,
Aug 12, 2015, 2:58:46 PM8/12/15
to pg_shard users, ja...@citusdata.com
Hey Sanjana,

We just heard a similar question on mapping column values to shards, and I wanted to drop in a quick note.

At a higher level, pg_shard resolves the column's data type, and looks up that type's hash function in PostgreSQL. pg_shard then hashes the value and compares it against the hash token ranges in the shard metadata tables.

There's also one points that makes things slightly more complicated. When PostgreSQL applies a hash function on a Datum (Postgres' internal representation) vs a human readable text value, you may get different results. You may therefore need to do some more transformations.

I posted an example of this transformation formula (thanks to Matt) in this thread: https://groups.google.com/forum/#!topic/pg_shard-users/yrBEOV8iuQ8

Best,
Ozgun

Ozgun Erdogan

unread,
Aug 12, 2015, 3:16:34 PM8/12/15
to pg_shard users, ja...@citusdata.com

I'm inlining a small clarification below.


There's also one points that makes things slightly more complicated. When PostgreSQL applies a hash function on a Datum (Postgres' internal representation) vs a human readable text value, you may get different results. You may therefore need to do some more transformations.

This happens because hash functions internally return unsigned 32-bit integers. However, when called externally such as through psql, the return value becomes a signed 32-bit integer. 
Reply all
Reply to author
Forward
0 new messages