Hi!
There are two modes of data ingestion in MCS. The first is built on top of cpimport. INSERT..SELECT and the UDF for S3-columnstore ingestion belong to this category. cpimport ingestion has an almost fixed overhead that depends on the number of columns in the target table.
The second mode covers both single and batched INSERT. A single-record INSERT is slower because it pays the same overhead as cpimport, but pays it for every record.
A batched INSERT doesn't use cpimport, but it pays that same overhead once per batch, so it is more efficient than single-record INSERTs.
As for DELETE: think of a deleted row as space (spread across multiple columnar files) that MCS can't reclaim automatically without reloading the data into the table.
There are ways to reload only part of the table's data. The table consists of partitions, and every partition holds a full set of the table's columns. Every column in this set has 4 extents of data, and an extent is 8 000 000 values by default, so a partition is 32 000 000 values. You can calculate how many values you have per partition, and if the amount of wasted space is significant you should reload that partition. We will automate this algorithm, but it will take some time.
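To make the per-partition arithmetic concrete, here is a rough Python sketch. It only uses the default numbers quoted above (4 extents per column, 8 000 000 values per extent). The function names, the input format and the 30% threshold are my own assumptions for illustration, not anything MCS provides, and how you obtain the row counts per partition is left up to you.

```python
from typing import Dict, List, Tuple

# Defaults as quoted in this thread; adjust if your installation differs.
EXTENTS_PER_COLUMN = 4
VALUES_PER_EXTENT = 8_000_000
VALUES_PER_PARTITION = EXTENTS_PER_COLUMN * VALUES_PER_EXTENT  # 32 000 000


def wasted_fraction(allocated_rows: int, live_rows: int) -> float:
    """Share of a partition's row slots occupied by deleted rows.

    allocated_rows: rows ever loaded into the partition (hypothetical input).
    live_rows: rows still visible, i.e. not deleted (hypothetical input).
    """
    if allocated_rows <= 0:
        return 0.0
    if not 0 <= live_rows <= allocated_rows:
        raise ValueError("live_rows must be between 0 and allocated_rows")
    return (allocated_rows - live_rows) / allocated_rows


def partitions_to_reload(
    stats: Dict[int, Tuple[int, int]], threshold: float = 0.3
) -> List[int]:
    """Return partition numbers whose wasted share reaches the threshold.

    stats maps a partition number to (allocated_rows, live_rows).
    The 0.3 threshold is an arbitrary example value, not a recommendation.
    """
    return sorted(
        part
        for part, (allocated, live) in stats.items()
        if wasted_fraction(allocated, live) >= threshold
    )


if __name__ == "__main__":
    example = {
        0: (VALUES_PER_PARTITION, 16_000_000),  # half of the rows deleted
        1: (VALUES_PER_PARTITION, 30_000_000),  # only a few rows deleted
    }
    print(partitions_to_reload(example))
```

With the example numbers, only partition 0 crosses the threshold, so it would be the one worth reloading.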
Regards,
Roman
On Thursday, July 20, 2023 at 08:42:13 UTC+3, pantonis wrote: