Citus distribution of Tables with Inheritance


Krishnamurthy Narayanan

Dec 20, 2021, 3:12:12 PM
to citus-users
Hello all,

We have a Django application that makes considerable use of concrete table inheritance. The larger tables in our system (along with their associated inheritance hierarchies) have to be sharded across multiple nodes.

We face two problems in sharding our tables:
 
1) Django does not play well with composite primary or foreign keys.
2) Any tenant or shard ID we create on a base table may have to be replicated in every derived table, which under the hood has a One-To-One key to its parent (at least that is how I read the documentation).

To frame my questions, assume we have two tables. First, we have an Employee table with a column called 'institution', which we use as the distribution column. Next, we have a Manager table that inherits (concretely) from Employee through a Django One-To-One field. We might have more tables inheriting from Manager, and so on. My questions are:

A) If the Employee table is sharded by 'institution' can the Manager table and other derived tables also be sharded automatically by Citus? In Django, the Manager table will NOT have the 'institution' column and adding it to each table in the hierarchy is painful in the extreme.

B) Even if Citus were to be able to do (A) easily, the simple fact is that the entire app we wrote will have to be reworked for composite primary keys. Our solution for this would be to create a single primary key column in Employee which (at creation) would have the format "<shard-id>|<unique key>". The nice thing about this approach is that it will automatically propagate across the entire concrete inheritance hierarchy of Django. However, I would like to custom hash or range partition the table based on just the <shard-id> part of my primary key. Ideally I could do this easily with a custom HASH function or range function that operated ONLY on the <shard-id> portion of the primary key which I can extract. How can I override the HASH/RANGE functions used in table creation with custom ones? Does this involve any performance penalties (these are VERY large tables) and if so how might I be able to minimize them?
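To make the key format in (B) concrete, here is a minimal Python sketch of building such a key and extracting only the `<shard-id>` part. The function names (`make_pk`, `shard_id_of`, `bucket_for`) are hypothetical, and CRC32 is only a stand-in for whatever hash would actually be used; this is not the hash function Citus applies internally.

```python
import uuid
import zlib
from typing import Optional

SEPARATOR = "|"  # delimiter from the "<shard-id>|<unique key>" format


def make_pk(shard_id: str, unique_key: Optional[str] = None) -> str:
    """Build a single-column key of the form '<shard-id>|<unique key>'."""
    if SEPARATOR in shard_id:
        raise ValueError("shard id must not contain the separator")
    if unique_key is None:
        # Assumption: any globally unique suffix will do; a UUID is used here.
        unique_key = uuid.uuid4().hex
    return f"{shard_id}{SEPARATOR}{unique_key}"


def shard_id_of(pk: str) -> str:
    """Return only the '<shard-id>' portion of a key."""
    shard_id, sep, _ = pk.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"key has no shard-id prefix: {pk!r}")
    return shard_id


def bucket_for(pk: str, bucket_count: int) -> int:
    """Map a key to a bucket using the shard-id prefix only.

    CRC32 is an illustrative stand-in, not the hash Citus uses.
    """
    return zlib.crc32(shard_id_of(pk).encode("utf-8")) % bucket_count
```

With this scheme, two keys that share a shard ID always land in the same bucket regardless of their unique suffix, which is the co-location property the custom HASH/RANGE function would need to preserve.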

C) If I can create custom HASH/RANGE functions (and from the documentation it appears that Citus allows this for composite types), would queries, foreign keys, etc. use the same function to determine query distribution across shards in the Postgres engine?

Thank you in advance for your input.

Best regards,

Nandu
 

jelte....@microsoft.com

Dec 27, 2021, 6:44:07 PM
to citus-users
When using Django together with Citus, the recommendation is usually to use our django-multitenant library: https://groups.google.com/u/1/g/citus-users
Did you try that, to see if it fits your needs?

Jelte

PS. Sorry if I sent a similar email twice; from my end it looked as if the first one was not sent successfully.

Mediphore

Dec 27, 2021, 7:23:47 PM
to citus-users
Migrations with the django-multitenant library are difficult, and using it will not solve my problem, which involves vastly more complex inheritance.
The primary key is the basis for all inheritance, and replacing it with the TenantForeignKey or TenantOneToOneKey is not easily done.
A custom hash function passed as a parameter along with the distribution column would be a much simpler solution.

Narayanan
