Hello all,
We have an application that has been developed in Django that has a considerable amount of concrete table inheritance. The larger tables in our system (along with the associated inheritance table hierarchy) have to be sharded across multiple nodes.
We face two problems in sharding our tables:
1) Django does not play well with composite primary or foreign keys
2) Any tenant or shard-id we create on a base table, may have to be replicated in every derived table, which under the hood has a One-To-One key to its parent (at least that is how I read the documentation).
To frame my questions, assume we have two tables. First, we have an Employee table with a column called 'institution' which we use as the distribution column. Next we have a Manager Table that inherits (concretely) from Employee with a Django One-To-One field. We might have more tables inheriting from Manager etc. My questions are:
A) If the Employee table is sharded by 'institution' can the Manager table and other derived tables also be sharded automatically by Citus? In Django, the Manager table will NOT have the 'institution' column and adding it to each table in the hierarchy is painful in the extreme.
B) Even if Citus were to be able to do (A) easily, the simple fact is that the entire app we wrote will have to be reworked for composite primary keys. Our solution for this would be to create a single primary key column in Employee which (at creation) would have the format "<shard-id>|<unique key>". The nice thing about this approach is that it will automatically propagate across the entire concrete inheritance hierarchy of Django. However, I would like to custom hash or range partition the table based on just the <shard-id> part of my primary key. Ideally I could do this easily with a custom HASH function or range function that operated ONLY on the <shard-id> portion of the primary key which I can extract. How can I override the HASH/RANGE functions used in table creation with custom ones? Does this involve any performance penalties (these are VERY large tables) and if so how might I be able to minimize them?
C) If I can create custom HASH/RANGE functions (and from the documentation it appears that Citus allows this for composite types) would queries/foreign keys etc use the same function to determine query distribution across shardsd in the postgres engine?
Thank you in advance for your input.
Best regards,
Nandu