Few inserts per minute

23 views
Skip to first unread message

pantonis

unread,
Jul 14, 2023, 8:51:48 AM7/14/23
to MariaDB ColumnStore
Hello,

I have read that MCS is optimized for batch inserts and I wanted to ask if anyone know.
My system runs every minute extracts, transforms and then load data to a MCS database.
Every minute I have about 10-100 new items to insert. Would that cause any issue to the database because of the small number of data (any extents fragmentation etc...)

Thanks in advance

Allen Herrera

unread,
Jul 19, 2023, 11:50:19 AM7/19/23
to pantonis, MariaDB ColumnStore
High level not an issue,
Loading via csv with cpimport, or load_from_s3 UDF are most scalable.
insert into select * from x is most convenient.
avoid single inserts because of scalability issues.
An append only strategy is best and most scalable.

Extent fragmentation comes from running updates/deletes, which should be heavily avoided on columnar tables. if required or can not be engineered around, its recommended a full table/ ordered data reload after some time depending on your data/volume/needs.

--
You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/5f6ebd88-27c2-43b1-869c-ed033301dbacn%40googlegroups.com.


--

Allen Herrera
Customer Engineer | MariaDB Corporation
+1.360.888.3938 | calendly.com/allen-herrera | allen.herrera@mariadb.com

pantonis

unread,
Jul 20, 2023, 1:42:13 AM7/20/23
to MariaDB ColumnStore
Unfortunately cpimport is not available for .NET 7

"avoid single inserts because of scalability issues."
Scalability issues? You mean is too slow?

Extent fragmentation comes from running updates/deletes, which should be heavily avoided on columnar tables. if required or can not be engineered around, its recommended a full table/ ordered data reload after some time depending on your data/volume/needs.
Even if we have some rare deletes (every once in a month or so because of some bug to the business logic in the microservice that pushed data)? What is full table/ordered data reload?


Thank you


drrtuy

unread,
Jul 22, 2023, 11:05:41 AM7/22/23
to MariaDB ColumnStore
Hi!

There are two modes of data ingestion in MCS. The first is based on top of cpimport. INSERT..SELECT and the UDF for S3-columnstore ingestion belong to this category. cpimport ingestion has almost fixed overhead that depends on number of columns in a target table.
The second mode covers both single and batched INSERT. Single record INSERT is slower b/c the overhead is the same as with cpimport but it happens for every record.
Batched insert doesn't use cpimport but it has the same overhead as previously but for the batch so it is more efficient comparing with a single record INSERT.

Speaking about DELETE. Imagine a deleted row as a space(in multiple columnar files) that MCS can't automatically reclaim w/o reloading the data into the table.
There ways to partially reload the data into the table.  The table consist of partitions, every partition has a full set of table columns. Every column in this set has 4 extents of data and extent is 8 000 000 values by default so partition is 32 000 000 values. You can calculate how many values do you have per partition and if the amount of the wasted space is significant you should reload the partition. We will automate this algo but it will take some time.

Regards,
Roman

четверг, 20 июля 2023 г. в 08:42:13 UTC+3, pantonis:

pantonis

unread,
Jul 22, 2023, 11:11:56 AM7/22/23
to MariaDB ColumnStore
Thanks for your reply

1. Any plans to release cpimport for .NET? 
2. How do I reload the partition?
3.  "We will automate this algo but it will take some time"  - Any rough estimate (1 month, 6 months, 1year, 2years)? Is there anywhere that we can track this?

TIA

drrtuy

unread,
Aug 8, 2023, 9:24:30 AM8/8/23
to MariaDB ColumnStore
1. cpimport is a tool that needs a Columnstore.xml config + some direct access to the cluster so it is hard to port its functionality into .NET. I suppose that using batched INSERT from .NET or use a temp table with INSERT..SELECT might suite your needs.
2. block activity for a table, select from partition into a temp table, disable partition, ingest the data back, remove partition. [1]
3. A year.

1. The doc that has partition-related functions usage examples.

Regards,
Roman

суббота, 22 июля 2023 г. в 18:11:56 UTC+3, pantonis:
Reply all
Reply to author
Forward
0 new messages