The SQL Layer heavily relies on clustering related rows from different tables using a concept called Table-Groups. Logically, Table-Groups organize a set of user-defined tables into a hierarchy that reflects an object structure: a CUSTOMERS table will be the parent of an ORDERS table, which will be the parent of the ITEMS table:
CUSTOMERS
|
+--- ORDERS
|
+--- ITEMS
Tables from the logical schema cannot be arbitrarily interleaved: the child table must have a foreign key to its parent table.
This Table-Group definition is then reflected in the physical organization of data. Related rows from the tables are stored in a hierarchy, interleaved according to the Table-Group structure, where each child table rows are clustered with and interleaved with its parent(s) rows: ACUSTOMER row is stored followed by its first ORDER row, followed by that order’s ITEMS, followed by the next ORDER and ITEMS for that customer, thereby reflecting the hierarchy of the tables:
CUSTOMER ROW
ORDER ROW
ITEM ROW
ITEM ROW
ORDER ROW
ITEM ROW
CUSTOMER ROW
ORDER ROW
ITEM ROW
...
Rows that belong to the CUSTOMER object are clustered together, effectively pre-joining the data. This clustering is critical to minimizing latency and allows direct object access alongside the SQL access.
Taking advantage of this hierarchical row storage requires an algebra and query execution operators that are Table-Group or Clusteringaware. Our GroupScan operator efficiently joins a parent table with descendant tables. Other operators leverage clustering information in indexes, such as the Index Intersection that can intersect indexes from different tables in a Table-Group.