CREATE MATERIALIZED VIEW mv as SELECT * FROM aott DISTRIBUTED BY (j);
main enhancement being
- in-core automation present (incremental refresh of materialized view kind-of) to keep the view updated as base table is changing
- planner modified to make use of projection which possible can be done for MV as well so that users don't have to modify their queries to explicitly use MV
Some notes/questions:
- So essentially with this feature COPY operation (which is loading data to a table) is able to reshuffle/motion the tuples to other nodes (to populate its projection) while loading?
- Partial projections for AOCS should still be helpful to reduce full double size of the table if known for queries which will use projection don't fetch all the columns data
- What all use cases do you envision for PROJECTIONS other than just to avoid MOTIONS during query processing?
- So all schema alteration (AT ADD COLUMN and such) operations have to be mirrored to PROJECTIONS as well?
- Can PROJECTIONS have different storage (table type and tablespace) and compression compared to the base table?
- Does creating PROJECTION by default carry associated objects with it like INDEXES (unique index can't be carried but rest)?
- Not just the table's size on disk doubles also WAL, (physical and logical) backups, etc... Possibly should have code to just exclude them from logical backups for sure
- Creating projection for REPLICATED or RANDOM tables seems aren't helpful
- What could be perceived as direct dispatch before in presence of PROJECTIONS may no more be direct dispatch and require 2PC (though I am sure PROJECTION use case is for OLAP centric workload and not OLTP), just for correction of code though will have to plug the things here
Gives a feeling of a lot more static planning and overhead for usage of PROJECTIONS. Instead something more dynamic like how caching works transparently to users would be great if it can get going (understand its a lot more complicated). First time query performs the motion of tuples based on base table, that gets cached (in some temp table of sorts) and used for ongoing queries. Though goes away based on time retention or restart of cluster and such. It's probably partial caching of the interim query results bucket item.
- Ashwin (Broadcom)