Multi-tenant and UNIQUE constraints, migrating to Citus

Javier Aguirre

unread,

Nov 15, 2016, 4:38:49 AM11/15/16

to citus-users

Hi!

Our company is testing options to scale our database and we think Citus can be the solution we need. :-)

Although we're finding problems on how to adapt our current Postgres database schema to Citus requirements.

We want to create distributed tables based on a company_id, but there are other unique/primary keys on those tables

and that's not allowed.

An example of what we want to achieve using one of the tutorials from Citus documentation.

If we start with this:

CREATE TABLE wikipedia_editors (

editor TEXT UNIQUE, -- The name of the editor

bot BOOLEAN, -- Whether they are a bot (self-reported)

edit_count INT, -- How many edits they've made

added_chars INT, -- How many characters they've added

removed_chars INT, -- How many characters they've removed

first_seen TIMESTAMPTZ, -- The time we first saw them edit

last_seen TIMESTAMPTZ -- The time we last saw them edit

);

But we also have an ID, a UNIQUE key (like slug for example) and our company_id, what would be the best solution to be able to distribute this table?

Example:

CREATE TABLE wikipedia_editors (

id PRIMARY KEY,

slug TEXT UNIQUE,

company_id INT UNIQUE,

editor TEXT UNIQUE, -- The name of the editor

bot BOOLEAN, -- Whether they are a bot (self-reported)

edit_count INT, -- How many edits they've made

added_chars INT, -- How many characters they've added

removed_chars INT, -- How many characters they've removed

first_seen TIMESTAMPTZ, -- The time we first saw them edit

last_seen TIMESTAMPTZ -- The time we last saw them edit

);

We want to do:

SELECT master_create_distributed_table('wikipedia_changes', 'company_id', 'hash');

Thank you!

Onder Kalaci

unread,

Nov 15, 2016, 5:35:15 PM11/15/16

to citus-users

Hi Javier,

Thanks for contacting us. Happy to know that you are doing tests with Citus and working with the tenant based sharding approach. I'm also sharing a link from our blog which discusses some of the details on multi-tenant use case: https://www.citusdata.com/blog/2016/08/10/sharding-for-a-multi-tenant-app-with-postgres/

Regarding your question related UNIQUE/PRIMARY key constraint, our recommendation in tentant based sharding, is to add tenant_id(in your case company_id) to the constraint. So in your case it would be something like, (id,tenant_id).

If you think in terms of correctness, without including the distribution column, it is not possible to enforce uniqueness across multiple shards for non-distribution columns. By adding the distribution column to the unique constraints, citus could actually enfore the uniquness within each shard. This is due to the distributed nature of the data. For instance, given the specific table you've asked, there is no easy way of enforcing uniqeness of "slug" across the shards. Is that clear to you as well?

CREATE TABLE wikipedia_editors (

id int,

slug TEXT,

company_id INT,

editor TEXT, -- The name of the editor

bot BOOLEAN, -- Whether they are a bot (self-reported)

edit_count INT, -- How many edits they've made

added_chars INT, -- How many characters they've added

removed_chars INT, -- How many characters they've removed

first_seen TIMESTAMP, -- The time we first saw them edit

last_seen TIMESTAMP, -- The time we last saw them edit

UNIQUE(company_id, id),

UNIQUE(company_id, slug),

UNIQUE(company_id, editor)

);

-- See our new table creation API for Citus 6.0 which we've just release

-- http://docs.citusdata.com/en/v6.0/reference/user_defined_functions.html#create-distributed-table

SELECT create_distributed_table('wikipedia_editors', 'company_id');

Let me know if the above works for your use case. Otherwise, we could look into other approaches.

And it would be great to share more about your use case? What application are you building? What are the number of tenants and, what is the data model? Just that we could help you with your data modeling with Citus :)

Javier Aguirre

unread,

Nov 16, 2016, 11:11:26 AM11/16/16

to Onder Kalaci, citus-users

El mar., 15 nov. 2016 a las 23:35, Onder Kalaci (<on...@citusdata.com>) escribió:

Hi Javier,

Hi!

Thanks for contacting us. Happy to know that you are doing tests with Citus and working with the tenant based sharding approach. I'm also sharing a link from our blog which discusses some of the details on multi-tenant use case: https://www.citusdata.com/blog/2016/08/10/sharding-for-a-multi-tenant-app-with-postgres/

Regarding your question related UNIQUE/PRIMARY key constraint, our recommendation in tentant based sharding, is to add tenant_id(in your case company_id) to the constraint. So in your case it would be something like, (id,tenant_id).

I came to that conclusion, I’m glad we’re in the right path then. :-)

If you think in terms of correctness, without including the distribution column, it is not possible to enforce uniqueness across multiple shards for non-distribution columns. By adding the distribution column to the unique constraints, citus could actually enfore the uniquness within each shard. This is due to the distributed nature of the data. For instance, given the specific table you've asked, there is no easy way of enforcing uniqeness of "slug" across the shards. Is that clear to you as well?

CREATE TABLE wikipedia_editors (
id int,
slug TEXT,
company_id INT,
editor TEXT, -- The name of the editor
bot BOOLEAN, -- Whether they are a bot (self-reported)

edit_count INT, -- How many edits they've made
added_chars INT, -- How many characters they've added
removed_chars INT, -- How many characters they've removed

first_seen TIMESTAMP, -- The time we first saw them edit
last_seen TIMESTAMP, -- The time we last saw them edit

UNIQUE(company_id, id),
UNIQUE(company_id, slug),
UNIQUE(company_id, editor)
);

-- See our new table creation API for Citus 6.0 which we've just release
-- http://docs.citusdata.com/en/v6.0/reference/user_defined_functions.html#create-distributed-table
SELECT create_distributed_table('wikipedia_editors', 'company_id');

Let me know if the above works for your use case. Otherwise, we could look into other approaches.

Thank you very much for the example, I think after understanding how Citus works I realised there couldn’t be those UNIQUE constraints apart from my tenant_id.

And it would be great to share more about your use case? What application are you building? What are the number of tenants and, what is the data model? Just that we could help you with your data modeling with Citus :)

We have already an application in production, A Postgres master-slave architecture on AWS. The application is developed in Django. We want to migrate the master-slave to Citus.

We have ~150 tenants.

Regarding the data model, It’s kind of long and complex, so maybe It’s better to create a first version and discuss about the problems that could arise?

Thank you very much!

Javi.

--

—

Javier Aguirre

https://javaguirre.net

Reply all

Reply to author

Forward