Multi-tenant and UNIQUE constraints, migrating to Citus

227 views
Skip to first unread message

Javier Aguirre

unread,
Nov 15, 2016, 4:38:49 AM11/15/16
to citus-users
Hi!

Our company is testing options to scale our database and we think Citus can be the solution we need. :-)

Although we're finding problems on how to adapt our current Postgres database schema to Citus requirements.
We want to create distributed tables based on a company_id, but there are other unique/primary keys on those tables
and that's not allowed.

An example of what we want to achieve using one of the tutorials from Citus documentation.

If we start with this:

CREATE TABLE wikipedia_editors (
  editor TEXT UNIQUE, -- The name of the editor
  bot BOOLEAN, -- Whether they are a bot (self-reported)

  edit_count INT, -- How many edits they've made
  added_chars INT, -- How many characters they've added
  removed_chars INT, -- How many characters they've removed

  first_seen TIMESTAMPTZ, -- The time we first saw them edit
  last_seen TIMESTAMPTZ -- The time we last saw them edit
);

But we also have an ID, a UNIQUE key (like slug for example) and our company_id, what would be the best solution to be able to distribute this table?

Example:

CREATE TABLE wikipedia_editors (
  id PRIMARY KEY,
  slug TEXT UNIQUE,
  company_id INT UNIQUE,
  editor TEXT UNIQUE, -- The name of the editor
  bot BOOLEAN, -- Whether they are a bot (self-reported)

  edit_count INT, -- How many edits they've made
  added_chars INT, -- How many characters they've added
  removed_chars INT, -- How many characters they've removed

  first_seen TIMESTAMPTZ, -- The time we first saw them edit
  last_seen TIMESTAMPTZ -- The time we last saw them edit
);

We want to do:

SELECT master_create_distributed_table('wikipedia_changes', 'company_id', 'hash');

Thank you!

Onder Kalaci

unread,
Nov 15, 2016, 5:35:15 PM11/15/16
to citus-users
Hi Javier,

Thanks for contacting us. Happy to know that you are doing tests with Citus and working with the tenant based sharding approach. I'm also sharing a link from our blog which discusses some of the details on multi-tenant use case: https://www.citusdata.com/blog/2016/08/10/sharding-for-a-multi-tenant-app-with-postgres/

Regarding your question related UNIQUE/PRIMARY key constraint, our recommendation in tentant based sharding, is to add tenant_id(in your case company_id) to the constraint. So in your case it would be something like, (id,tenant_id). 

If you think in terms of correctness, without including the distribution column, it is not possible to enforce uniqueness across multiple shards for non-distribution columns. By adding the distribution column to the unique constraints, citus could actually enfore the uniquness within each shard. This is due to the distributed nature of the data. For instance, given the specific table you've asked, there is no easy way of enforcing uniqeness of "slug" across the shards. Is that clear to you as well? 

CREATE TABLE wikipedia_editors (
  id int,
  slug TEXT,
  company_id INT,
  editor TEXT, -- The name of the editor
  bot BOOLEAN, -- Whether they are a bot (self-reported)

  edit_count INT, -- How many edits they've made
  added_chars INT, -- How many characters they've added
  removed_chars INT, -- How many characters they've removed

  first_seen TIMESTAMP, -- The time we first saw them edit
  last_seen TIMESTAMP, -- The time we last saw them edit

  UNIQUE(company_id, id),
  UNIQUE(company_id, slug),
  UNIQUE(company_id, editor)
);

-- See our new table creation API for Citus 6.0 which we've just release 
SELECT create_distributed_table('wikipedia_editors', 'company_id');

Let me know if the above works for your use case. Otherwise, we could look into other approaches.

And it would be great to share more about your use case? What application are you building? What are the number of tenants and, what is the data model? Just that we could help you with your data modeling with Citus :)

Javier Aguirre

unread,
Nov 16, 2016, 11:11:26 AM11/16/16
to Onder Kalaci, citus-users


El mar., 15 nov. 2016 a las 23:35, Onder Kalaci (<on...@citusdata.com>) escribió:
Hi Javier,


Hi!
 
Thanks for contacting us. Happy to know that you are doing tests with Citus and working with the tenant based sharding approach. I'm also sharing a link from our blog which discusses some of the details on multi-tenant use case: https://www.citusdata.com/blog/2016/08/10/sharding-for-a-multi-tenant-app-with-postgres/

Regarding your question related UNIQUE/PRIMARY key constraint, our recommendation in tentant based sharding, is to add tenant_id(in your case company_id) to the constraint. So in your case it would be something like, (id,tenant_id). 


I came to that conclusion, I’m glad we’re in the right path then. :-)
 
If you think in terms of correctness, without including the distribution column, it is not possible to enforce uniqueness across multiple shards for non-distribution columns. By adding the distribution column to the unique constraints, citus could actually enfore the uniquness within each shard. This is due to the distributed nature of the data. For instance, given the specific table you've asked, there is no easy way of enforcing uniqeness of "slug" across the shards. Is that clear to you as well? 

CREATE TABLE wikipedia_editors (
  id int,
  slug TEXT,
  company_id INT,
  editor TEXT, -- The name of the editor
  bot BOOLEAN, -- Whether they are a bot (self-reported)

  edit_count INT, -- How many edits they've made
  added_chars INT, -- How many characters they've added
  removed_chars INT, -- How many characters they've removed

  first_seen TIMESTAMP, -- The time we first saw them edit
  last_seen TIMESTAMP, -- The time we last saw them edit

  UNIQUE(company_id, id),
  UNIQUE(company_id, slug),
  UNIQUE(company_id, editor)
);

-- See our new table creation API for Citus 6.0 which we've just release 
SELECT create_distributed_table('wikipedia_editors', 'company_id');

Let me know if the above works for your use case. Otherwise, we could look into other approaches.


Thank you very much for the example, I think after understanding how Citus works I realised there couldn’t be those UNIQUE constraints apart from my tenant_id.
 
And it would be great to share more about your use case? What application are you building? What are the number of tenants and, what is the data model? Just that we could help you with your data modeling with Citus :)


We have already an application in production, A Postgres master-slave architecture on AWS. The application is developed in Django. We want to migrate the master-slave to Citus.

We have ~150 tenants.

Regarding the data model, It’s kind of long and complex, so maybe It’s better to create a first version and discuss about the problems that could arise?

Thank you very much!
Javi.
--
Javier Aguirre
Reply all
Reply to author
Forward
0 new messages