MCS select performance


Ferran Gil Cuesta

Mar 24, 2022, 7:30:42 AM3/24/22
to MariaDB ColumnStore
Hi all,

We have two servers with the same specs (128 GB RAM, 16 vCPU) and the same database tables (100 GB total size, biggest table ~8 GB).

Old server is running MariaDB version 10.1.23, Columnstore version 1.0.9

New server is running MariaDB version 10.6.5, Columnstore version 5.5

Columnstore on the old server was configured with 2 PMs (many years ago, when it was deployed). On the new one, following the recommended settings, there is only one PM.

I am attaching the columnstore.xml settings for both servers, but we have been unable to find any relevant differences (though we did some testing, changing values on the new server to mimic the old one). None of it improved query time, which is the main concern at the moment.

If a query only fetches data for a short period (like the last month), both systems return data in an acceptable amount of time (let's say less than 5 seconds, depending on the number of columns requested).

But when we request data for a longer period, like 2 or 3 years, the old system delivers results in ~15 to 30 seconds at most, while the new server takes up to 5 minutes (even more on some occasions), which is too slow to be usable at all. The new server is not live, so there are no external queries affecting performance.

When running queries, both servers show all CPUs at 100%, which makes us wonder if this is all we can get from Columnstore community. However, it seems strange that a much newer version delivers much worse performance. It just doesn't make sense (remember: same data and same server specs).

How can we improve this performance on the new server?

Any ideas would be appreciated! 

Thanks a lot,

Ferran

Columnstore.xml_new_server.xml
Columnstore.xml_old_server.xml

Ferran Gil Cuesta

Apr 11, 2022, 5:15:20 AM4/11/22
to MariaDB ColumnStore
Hi again,

We have been doing some testing to try to improve performance, but so far nothing seems to work.

We tried to install Columnstore with 2 PMs (on a single machine), but we don't think this option is available any longer for the Community server. All the documentation we found is outdated, for (very) old versions.

Additionally, we split the data into multiple tables to see if the combined queries were faster than a single query hitting one bigger table. Speed is the same. The original table had 200M records and 8 GB of data, covering a five-year period. The smaller tables were split by year. This is obviously far from ideal, as the goal is to be able to store a lot of data in Columnstore while keeping good performance.

Finally, we also increased two settings:
 - NumBlocksPct from 50 to 60
 - TotalUmMemory from 25% to 35%
but so far we cannot see any increase in RAM usage at all. The VM has 128GB and more than 50% of it is completely unused.
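
For reference, a change like this can be applied with mcsSetConfig (assuming NumBlocksPct lives under the DBBC section and TotalUmMemory under HashJoin, as in the attached Columnstore.xml files; mcsSetConfig takes section, parameter, value), e.g.:

mcsSetConfig DBBC NumBlocksPct 60
mcsSetConfig HashJoin TotalUmMemory 35%
systemctl restart mariadb-columnstore   # restart so the new values are picked up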

If you have an idea on how to improve performance, please say!
Thanks,
Ferran

drrtuy

Apr 14, 2022, 7:52:39 AM4/14/22
to MariaDB ColumnStore

Hey Ferran,

There are multiple reasons why the current community MCS is slower compared with previous versions.
Varchar UTF-8 columns might become dictionary columns in MCS for MDB >= 10.5, and this affects performance: utf8mb3 varchar columns now store 3 bytes per character, as the encoding requires. In most cases this is the main source of slowdowns. Another thing is that you now use 1 server for the whole processing instead of two, so roughly speaking this also doubles the timings.
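You can check which of your columns ended up with a dictionary, e.g. (this assumes the dictionary_object_id field of information_schema.columnstore_columns; adjust the schema name):

-- a non-NULL dictionary_object_id means the column is stored via a dictionary
select table_name, column_name, dictionary_object_id
from information_schema.columnstore_columns
where table_schema = 'your_db' and dictionary_object_id is not null;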
There is a way to introduce another dbroot into Columnstore.xml manually. I will share the changed version of your Columnstore.xml a bit later.
Did you analyze hardware bottlenecks: CPU, disks?

Regards,
Roman

Monday, April 11, 2022 at 12:15:20 UTC+3, fg...@g-n.com:

Zorion Arrizabalaga Aguirre

Apr 14, 2022, 10:12:13 AM4/14/22
to MariaDB ColumnStore
Hi Roman,
Thank you very much for answering!
We very much appreciate your offer to provide the updated Columnstore.xml at your convenience.

Ferran is out of the office at the moment, so he may update my answer with better information next week, but in case I can help: we notice that CPU is our main hardware bottleneck, because all CPUs are busy while we are waiting for query results.
We may also be facing disk bottlenecks, I'm not sure. There is a potentially relevant difference* there between our two servers.

Each server is a single virtual machine (the old database is installed as two PMs, but they run on the same server).
Both servers use 16 CPUs and 126G of RAM (n2-highmem-16 in Google Cloud Platform).

*The potentially relevant difference is how we manage disks; the old MCS uses three different disks:
- The boot disk (/)
- A disk for mysql (/mnt/disks/mysql)
- A disk for both data1 and data2 (/mnt/disk/data)
image.png

The new MCS for MDB uses only the boot disk.
image.png
Do you think that using different disks might improve our performance?
In that case, since the structure is /var/lib/mysql and /var/lib/columnstore, do you have any suggestions on how we should split the disks? (A rough sketch of what we have in mind is below.)
/var/lib/mysql -> one disk
/var/lib/columnstore -> another disk (or should we use it for /var/lib/columnstore/data1?)
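
Something like this is what we have in mind (device names are placeholders for whatever GCP exposes; we would stop MariaDB and move the existing data over first):

# hypothetical extra disks attached to the VM
mkfs.ext4 /dev/sdb
mkfs.ext4 /dev/sdc
mount /dev/sdb /var/lib/mysql
mount /dev/sdc /var/lib/columnstore
# plus matching /etc/fstab entries so the mounts survive a reboot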

We may test this next week as well.

Many thanks in advance!
Zorion



Ferran Gil Cuesta

Apr 20, 2022, 5:06:48 AM4/20/22
to MariaDB ColumnStore
Hi Roman,

Thank you again for your valuable input.

We were aware of some of the performance implications related to datatype and length, as explained at https://mariadb.com/kb/en/mariadb-columnstore-performance-concepts/#data-scanning-and-filtering but we didn't know about utf8mb3. Indeed, if we run "SHOW CREATE TABLE client_table;" we see that the CHARSET on the new server says utf8mb3, while on the old system it only says utf8 (which matches your explanation).

The structure of one of the tables is the following:
CREATE TABLE `client_table` (
  `account` varchar(20) DEFAULT NULL,
  `campaign_id` varchar(20) DEFAULT NULL,
  `network` varchar(128) DEFAULT NULL,
  `device` varchar(15) DEFAULT NULL,
  `source` varchar(30) DEFAULT NULL,
  `day` date DEFAULT NULL,
  `week` date DEFAULT NULL,
  `month` date DEFAULT NULL,
  `field1` float DEFAULT NULL,
  `field2` float DEFAULT NULL,
  `field3` float DEFAULT NULL,
  `field4` float DEFAULT NULL,
  `field5` float DEFAULT NULL,
  `field6` float DEFAULT NULL,
  `field7` float DEFAULT NULL,
  `field8` float DEFAULT NULL,
  `field9` float DEFAULT NULL,
  `field10` float DEFAULT NULL,
  `field11` float DEFAULT NULL,
  `field12` float DEFAULT NULL,
  `field13` float DEFAULT NULL,
  `field14` float DEFAULT NULL,
  `field15` float DEFAULT NULL,
  `field16` float DEFAULT NULL,
  `field17` float DEFAULT NULL,
  `field18` float DEFAULT NULL,
  `field19` float DEFAULT NULL,
  `field20` float DEFAULT 0,
  `field21` float DEFAULT 0,
  `field22` float DEFAULT 0,
  `field23` float DEFAULT 0,
  `field24` float DEFAULT 0,
  `field25` float DEFAULT 0,
  `field26` float DEFAULT 0
) ENGINE=Columnstore DEFAULT CHARSET=utf8mb3


The columns device, network and source have very low cardinality (fewer than 10 different values in each case), which makes us wonder whether we could create another table with those values and use a tinyint in this big table. We would need to join this new hypothetical table in every query (and also change the way we insert data); we are not sure whether this would bring any performance benefit or not. Regarding the float columns, it is also possible that some of them don't need to be "that big", but for convenience we have always used float (now and in the past).
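
Just to illustrate the idea (hypothetical names, nothing we have implemented yet):

CREATE TABLE device_dim (
  device_id tinyint,
  device varchar(15)
) ENGINE=Columnstore;

-- the fact table would store device_id instead of the varchar,
-- and queries would join back to recover the label:
SELECT d.device, SUM(f.field1) AS field1
FROM client_table_v2 f
INNER JOIN device_dim d ON f.device_id = d.device_id
GROUP BY d.device;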

I just want to clarify that on the old system we have 2 PMs, but on the same single VM instance (we don't have 2 different VM instances). As CPU rises to 100% while querying data (both on the old system and on the new one), I thought the difference was not that important. Do 2 PMs perform better than 1 on the same hardware?

It would be great to test new columnstore.xml settings with 2 PMs on the new system. I guess we will also need to copy/duplicate the existing files in columnstore/data1 to data2, am I right?

Thank you very much for your time.

Best regards,
Ferran

drrtuy

Apr 20, 2022, 8:32:57 AM4/20/22
to MariaDB ColumnStore
Hey,

To get performance back I suggest using the xxx_nopad_bin collation family when creating the table. Also check whether you really need utf8, because looking at your old config I see no utf8 options, so your tables never used utf8 to store data despite what MDB's SHOW CREATE TABLE says. You might only need ASCII.
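Roughly like this, on a trimmed-down version of your table (a sketch only; it assumes utf8mb3_nopad_bin is available in your MariaDB build):

CREATE TABLE `client_table_bin` (
  `account` varchar(20) DEFAULT NULL,
  `network` varchar(128) DEFAULT NULL,
  `day` date DEFAULT NULL,
  `field1` float DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_nopad_bin;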
I attached the modified XML so you can diff it and see the pattern of changes needed to add a dbroot.

Regards,
Roman

Wednesday, April 20, 2022 at 12:06:48 UTC+3, fg...@g-n.com:
Columnstore.xml_new_server.xml

drrtuy

Apr 20, 2022, 8:33:55 AM4/20/22
to MariaDB ColumnStore
And yes, you can benefit from replacing the varchar in the fact table with an id plus a satellite table.

Regards,
Roman

Wednesday, April 20, 2022 at 15:32:57 UTC+3, drrtuy:

Ferran Gil Cuesta

Apr 21, 2022, 6:29:40 AM4/21/22
to MariaDB ColumnStore
Hi Roman,

I think we can't use ASCII because we have accent marks, although we don't store rich text or emojis or anything else. I've created a new table with collation latin1_swedish_ci and inserted some data (the same amount of data as in another new table using utf8mb3_general_ci as before). Performance did not improve for our test query (around 30 seconds). I will test this a bit further later on.

On the topic of having 2 PMs, I've done the following:
- stop mariadb & mariadb-columnstore
- (review and) add the new columnstore.xml settings file
- start mariadb & mariadb-columnstore

It started fine but I didn't see any change in performance or behaviour. In /var/lib/columnstore, where I can see a data1 directory, there was no data2 directory (i.e. starting MariaDB with the new config did not create it).

Later on, I stopped mariadb again and duplicated the files in /var/lib/columnstore/data1 to data2. Started the database again and ran some tests. No improvement. I don't think data1 and data2 should have the same files, but wanted to test it just in case.
If I execute mcsGetConfig -a it seems that the new config is read, but I am not sure whether it is really being used.

Screenshot 2022-04-21 at 12.20.00.png
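
For anyone checking the same thing, individual values can also be queried (assuming the dbroot entries live under the SystemConfig section, as in the sample Columnstore.xml files):

mcsGetConfig SystemConfig DBRootCount
mcsGetConfig SystemConfig DBRoot1
mcsGetConfig SystemConfig DBRoot2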

I am going to test a satellite table for those 3 varchar columns that have very few different values. However, in most queries we are not selecting that data, which I guess means Columnstore does not even need to read those columns.

Thank you for your help.
Ferran

Ferran Gil Cuesta

Apr 26, 2022, 11:10:46 AM4/26/22
to MariaDB ColumnStore
Hi,

Just a quick message to say that using HDD disks instead of SSD improves the speed of most queries (from 45 seconds to 30, and similar gains).
We thought that CPU was the bottleneck (and it may still be), but increasing to 32 vCPUs instead of 16 gave very little improvement, almost negligible.

We are wondering whether we should install the OS on an SSD disk and attach another disk (HDD) for the MariaDB data.

Ferran

drrtuy

Apr 30, 2022, 4:28:13 AM4/30/22
to MariaDB ColumnStore
You need a _bin collation, not _ci, because the semantics of a _bin collation allow characters to be treated simply as binary numbers. JFYI, we have some improvements for non-ASCII, non-_bin collations, e.g. vectorization, coming out with MCS for MDB 10.8 and future releases.

For the second part: MDB doesn't pick up or create the dbroot dir on its own without CMAPI. Sorry, I forgot to mention that you need to create the /var/lib/columnstore/data2 dir and chown it so that it belongs to mysql:mysql.
And you need to redistribute data between dbroot1 and dbroot2, which happens when you re-ingest the data into a table.
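
Roughly:

# create the second dbroot directory and hand it over to the mysql user
mkdir -p /var/lib/columnstore/data2
chown mysql:mysql /var/lib/columnstore/data2
# then re-ingest the table data so it gets spread across dbroot1 and dbroot2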

Regards,
Roman



Thursday, April 21, 2022 at 13:29:40 UTC+3, fg...@g-n.com:

drrtuy

Apr 30, 2022, 4:28:58 AM4/30/22
to MariaDB ColumnStore
That is a peculiar observation; in most cases SSD doesn't make processing slower. Could you share monitoring data for CPU/disk utilization before and after you switched to HDD (if you have this data)?

Regards,
Roman

Tuesday, April 26, 2022 at 18:10:46 UTC+3, fg...@g-n.com:

Ferran Gil Cuesta

May 2, 2022, 10:48:33 AM5/2/22
to MariaDB ColumnStore
We created the /var/lib/columnstore/data2 directory and left it empty. After ingesting data, files started to appear there.

We created a table2 using "create table table2 like table1" and then inserted all data from table1 (8 GB, 200 million records) via "insert into table2 select * from table1" (we saw, via htop, that it used cpimport in the background). However, we were unable to see any improvement in speed, whether querying table1 or table2. Is there anything else we need to do?

Regarding the SSD vs HDD observation, we really don't have specific monitoring data. We just saw an improvement with the same data and the same CPU/memory. Google Compute Engine makes it very easy to load a snapshot of a disk (complete data and OS) into a new VM (where you can also change the disk details); this way we can quickly compare results with very small changes in configuration. On our old MCS the disk was HDD, and we gave it a try. In both cases the size of the disks was set to 500GB (obviously it is not a physical disk of that size; Google virtualises it).

Thanks for your help.
Ferran

drrtuy

Jun 18, 2022, 2:38:44 AM6/18/22
to MariaDB ColumnStore
Hey Ferran,

Maybe the structure of the query (or queries) forces it to be processed either single-threaded or in ExeMgr, so that multiple dbroots don't make sense for such queries. Can I take a look at them?
The second point, regarding HDD, calls for a comparison between the two different execution plans. One can get a plan with timings by calling select calsettrace(1); then the query; then select calgettrace(0);. There will be a difference between execution steps.
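I.e. something like this (the middle query is just an illustration):

select calsettrace(1);
-- run the query you want to profile, e.g.:
select device, sum(cost) from xxx_recent group by device;
select calgettrace(0);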

Regards,
Roman

Monday, May 2, 2022 at 17:48:33 UTC+3, fg...@g-n.com:

Ferran Gil Cuesta

Jun 21, 2022, 6:54:18 AM6/21/22
to MariaDB ColumnStore
Hey Roman,

Thank you for your reply.

First of all, the improvement we saw using an HDD disk was indeed a wrong conclusion: it was faster, but only because that VM had a different processor (AMD Milan). We further tested the same specs with just SSD vs HDD and indeed SSD was usually a bit faster. We are using SSD and will stick with it.

Almost all queries have subqueries, due to how we store the data in two different tables (the last 90 days in xxx_recent, everything older than 90 days in xxx_cons). Additionally, we join another Columnstore table, xxx_attributes, which is usually smaller than xxx_cons and xxx_recent.
SELECT device AS device, campaign_type AS campaign_type,
       IFNULL(SUM(cost),0) AS cost, IFNULL(SUM(clicks),0) AS clicks, IFNULL(SUM(revenue),0) AS revenue
FROM (
  (SELECT stats.device AS device, atts.campaign_type AS campaign_type,
          IFNULL(SUM(IFNULL(stats.cost,0)),0) AS cost,
          IFNULL(SUM(IFNULL(stats.clicks,0)),0) AS clicks,
          IFNULL(SUM(IFNULL(stats.revenue,0)),0) AS revenue
   FROM xxx_recent AS stats
   INNER JOIN xxx_attributes AS atts ON stats.campaign_id = atts.campaign_id
   WHERE stats.day BETWEEN date("2022-01-01") AND date("2022-06-21")
     AND day >= date("2022-03-23")
   GROUP BY campaign_type, device)
  UNION ALL
  (SELECT stats.device AS device, atts.campaign_type AS campaign_type,
          IFNULL(SUM(IFNULL(stats.cost,0)),0) AS cost,
          IFNULL(SUM(IFNULL(stats.clicks,0)),0) AS clicks,
          IFNULL(SUM(IFNULL(stats.revenue,0)),0) AS revenue
   FROM xxx_cons AS stats
   INNER JOIN xxx_attributes AS atts ON stats.campaign_id = atts.campaign_id
   WHERE stats.day BETWEEN date("2022-01-01") AND date("2022-06-21")
     AND day < date("2022-03-23")
   GROUP BY campaign_type, device)
) AS t
GROUP BY campaign_type, device
ORDER BY campaign_type, device
LIMIT 0,1000;

When we only need data for the last 90 days or less, we skip xxx_cons in the query (as no data would come from it).

It's been a while since we last updated the status of our setup: we are currently having issues with slow queries, especially after one or two days of uptime. After a reboot, inserts are "quick" (less than 1 second), but at some point the system degrades and inserts start taking 20 seconds, then 60... and even 300 seconds (the data in the inserts is equivalent; to be clear, the slower inserts are NOT inserting more data). Every 3 days we perform a VM reboot, but quite often we need to do a manual reboot in between because the system becomes unusable. It is tricky to reboot while INSERT queries are being performed: in some cases we had issues with table locks (luckily we could inspect them with viewtablelock and clear them with cleartablelock...) and rollbacks that never end. When those things happen, MariaDB starts in read-only mode, and even if we change it to read-write, after a reboot it starts in read-only again. We had to build a new VM instance from a backup because we were not able to get it out of read-only mode at all.
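
(For reference, when a lock got stuck, what we ran was roughly the following; the lock id comes from the viewtablelock output:)

viewtablelock            # list current table locks and their ids
cleartablelock 12345     # illustrative lock id taken from the output above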

We are a bit lost on what to test or how to improve stability. The current system is stable for a day or two, until it starts degrading and a reboot is needed (but, as explained, rebooting while queries are running is a risky process).

We know we can improve performance by adding more CPUs and RAM (at a cost, of course), but the real issue is that we cannot trust the system to continue working for days without active monitoring.

Thank you very much for any advice you may have.

Best regards,
Ferran

Ferran Gil Cuesta

Jun 21, 2022, 7:09:37 AM6/21/22
to MariaDB ColumnStore
Hi Roman,

I am attaching a screenshot of the calgettrace(0) output for a query like the one in my previous message. This is on a VM that only has one PM (the tests we did with two PMs, which required us to re-ingest data, were performed on a test VM which is no longer available).

Screenshot 2022-06-21 at 13.05.00.png

In this case it was quite fast, but for other clients it may require many more seconds.

I could spin up a new VM, configure it with 2 PMs, re-ingest the data and see whether a query like that one produces a different trace or not. It is a somewhat time-consuming task, but it can be done if performance could be increased with the same server specs (basically the same number of CPUs).

Thanks,
Ferran 

drrtuy

Jun 26, 2022, 8:42:33 AM6/26/22
to MariaDB ColumnStore
There are three bottlenecks: the 2 BPS scans of the stats tables plus the aggregation in the second UNION part. Since there is no PIO, only LIO, disk IO doesn't affect performance and the data is read from PrimProc's block buffer.
What is campaign_type's data type?
Could you run the query for some time and monitor CPU utilization with top switched to thread output, so that you can see the utilization patterns? Which thread names show up at the top of the output, and is the CPU underutilized? How many hardware threads do you have available? You still run this in VMs, right?
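E.g. something like this (assuming PrimProc is the process doing the scans):

top -H -p $(pidof PrimProc)   # -H switches top to per-thread view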

Regards,
Roman
Tuesday, June 21, 2022 at 14:09:37 UTC+3, fg...@g-n.com:

Ferran Gil Cuesta

Jun 28, 2022, 7:27:07 AM6/28/22
to MariaDB ColumnStore
query_example.png
campaign_type is a VARCHAR(128). Not all tables have this column, but they have similar ones (usually VARCHAR(128) too, even if in some cases the values are quite short).
campaign_type.jpg

I tested a new query and I did see PIO in the output of calgettrace(0). If I re-execute the same query, then calgettrace(0)  shows 0 for PIO.
The query, which is a typical query, was:

First output:
Screenshot 2022-06-28 at 12.56.33.png
Second output:
Screenshot 2022-06-28 at 12.56.51.png
I guess this is OK due to caching.

We are running VMs, yes. The test one says it has 8 threads, which matches the number of CPUs you choose when setting up the instance in Google Compute Engine.
Output of lscpu is:
Screenshot 2022-06-28 at 13.06.31.png

The VM we use in production has 16 threads (if we are not wrong):
Screenshot 2022-06-28 at 13.17.56.png

When SELECT queries are being executed, CPU is always at 100%. I guess this is OK and expected. I am not sure whether this output is the one you requested or not:
Screenshot 2022-06-28 at 13.23.21.png

Thanks for any insight!
Ferran

drrtuy

Jun 30, 2022, 3:49:25 PM6/30/22
to MariaDB ColumnStore
I would say that you are capped by CPU ATM. There are two ways out of this: scale vertically (because you are bound to single-node mode) by adding more cores.
If you can install Linux perf and show me the output of perf top run for a minute while a typical workload is running, I can tell you whether it is possible to change the schema to reduce timings. The second stats scan is painful because of the wide filter on date, namely BETWEEN 2020-05-01 AND 2022-06-21. Can I take a look at the output of a full calgetstats? There is a field that tells whether Extent Elimination has any effect on this or not.
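E.g.:

sudo perf top        # run for ~1 minute while a typical query is executing

and for the stats, in the SQL session right after the query:

select calgetstats();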

Regards,
Roman

Tuesday, June 28, 2022 at 14:27:07 UTC+3, fg...@g-n.com:

Ferran Gil Cuesta

Jul 1, 2022, 10:59:34 AM7/1/22
to MariaDB ColumnStore
We installed Linux perf and ran it (perf top) while performing a typical query (it took 30 seconds; we executed a few of them).
We could see this:
Screenshot 2022-07-01 at 16.51.34.png
I am not sure if this is useful enough. If we need to change any setting in perf, just let us know.

Most of our tables have data since January 2019 (not older). We currently have them in two tables: one for only the last 90 days, another for data older than 90 days (going back to January 2019). I wonder whether we would see an improvement in query time if we split each table by year and only queried the tables needed. It seems overly complicated, and especially the opposite of what Columnstore is supposed to offer (big tables with good read performance).

You said we have two options, but we may be missing the second one. The first one, adding more CPUs, works for sure (we did a lot of testing): more CPUs, less time per query, but also more expensive VMs.

Thanks!

drrtuy

Jul 11, 2022, 3:10:48 AM7/11/22
to MariaDB ColumnStore
Breaking the tables into pieces doesn't sound like a good approach.
The second approach would be to check whether the data is clustered by the day field you are filtering on. Could you find the oid for the day column (SELECT column_id FROM information_schema.columnstore_tables WHERE table_name='x' AND column_name='y')? Then use the oid to query the Extent Map with editem -o ${oid}.
Speaking about stats.day: there is some overlap in the filters, namely stats.day BETWEEN date("2022-05-01") AND something, plus day >= date("2022-05-23").
The upcoming release (August 2022) will contain some important speedup features for a single node.

Regards,
Roman
Friday, July 1, 2022 at 17:59:34 UTC+3, fg...@g-n.com:

Ferran Gil Cuesta

Jul 11, 2022, 10:55:54 AM7/11/22
to MariaDB ColumnStore
We are unable to display any information via editem. What we did was:
- Query to view oid for each table:
Screenshot 2022-07-11 at 16.04.37.png
- Called editem -o with many of those object_ids, but we never got any output. Even with -v (verbose), no output:
Screenshot 2022-07-11 at 16.41.05.png
We are unsure whether we should always get some output or only in specific cases. We queried the database (some specific tables) and after that called editem for those tables, but with the same result (= no output).

About the overlap, you are right. The queries use a UNION ALL of two subqueries, against the xx_recent and xx_cons tables. The xx_recent table only holds data for the last ~90 days, and xx_cons holds everything older than that. Initially the queries only had the WHERE stats.day BETWEEN xxx AND yyy condition (where xxx and yyy come from user input). We added a second condition at the end with the goal of speeding them up somehow (though, to be honest, we never fully tested this). The idea is to let MariaDB/Columnstore filter data quickly. We can probably improve it and only query the tables that may actually contain data, with just the BETWEEN condition. What we are already doing is only querying the xx_recent table if the requested start date is later than today minus 90 days; we should do the same for queries that only request old data (i.e. avoid querying the xx_recent tables altogether).

We look forward to the August update. Great news.

Thanks,
Ferran

Ferran Gil Cuesta

Jul 12, 2022, 5:11:33 AM7/12/22
to MariaDB ColumnStore
Hi Roman,

I managed to get the correct object_id for the day column (for my own future reference, the query to view all columns of all tables, with their information, was: select * from information_schema.columnstore_columns;)
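
More specifically, something like this (the table name is just an example from our schema; the oid number is illustrative):

select table_name, column_name, object_id
from information_schema.columnstore_columns
where table_name = 'client_table' and column_name = 'day';

editem -o 3023    # replace 3023 with the object_id returned above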

For our biggest table, we can see this:

Screenshot 2022-07-12 at 10.50.28.png

There are some strange values (32767-15-63)... but the rest are values within the range we have in the database (2018 up to today).
I queried this table and I don't see those strange values.
Screenshot 2022-07-12 at 11.08.47.png

For the same client but the table with recent data (which we recreate every few days, so I guess that's why the oid is a bigger number):
Screenshot 2022-07-12 at 11.03.42.png
Min and max values make sense too, here.

Thanks,
Ferran

Ferran Gil Cuesta

Sep 7, 2022, 10:54:52 AM9/7/22
to MariaDB ColumnStore
Any news on the upcoming release? I could not find any info on the website.

The upcoming release (August 2022) will contain some important speedup features for a single node.

Thanks,
Ferran

drrtuy

Sep 16, 2022, 9:59:50 AM9/16/22
to MariaDB ColumnStore
Hey Ferran,

Let me first comment on those performance suggestions I made earlier. As you can see, the min/max ranges overlap, so MCS potentially has to look into multiple extents. To optimize the ranges, one needs to sort the data before ingestion. The clustering/sorting feature is on the near-term roadmap.
The weird values resemble NULL values represented as a date, so you probably have extents that are full of NULLs or empty.
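
A sketch of the kind of reload that I mean (it assumes the insert path preserves the row order down to the extents):

-- reload into a fresh table with rows ordered by the filter column
create table client_table_sorted like client_table;
insert into client_table_sorted select * from client_table order by day;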

Unfortunately the MDB server team couldn't adjust their CI nodes in time to deliver the new release (it uses C++20, so it needs a relatively fresh compiler or some tweaks), so the closest release will still contain MCS 6, which doesn't have the feature I mentioned. I think the new version will be out with the next MDB release though.

Regards,
Roman


Wednesday, September 7, 2022 at 17:54:52 UTC+3, fg...@g-n.com:

Ferran Gil Cuesta

Sep 21, 2022, 10:46:10 AM9/21/22
to MariaDB ColumnStore
Thanks Roman,

We loaded the data sorted by month (which, more or less, should also be sorted by day). Apart from that, what is clear (because I checked) is that we don't have any NULL or empty day values in that table (the one I passed to editem).
I guess I could create a copy of the table, ingest the data into it day by day, and then check again with editem (using the new oid for the column of the new table).

Best regards,
Ferran
