Table with 23mil objects slow SUM

34 views
Skip to first unread message

pantonis

unread,
May 27, 2023, 7:40:53 AM5/27/23
to MariaDB ColumnStore
I have a table that currently hosts 23 million rows

I run the following query
SELECT SUM(Profit)
FROM TradeFact
WHERE DimAccountKey = 1234

I have tried with different account keys. it always takes between 53-58 seconds to run
Profit is double
DimAccountKey is int type

I have run EXPLAIN
Capture.JPG


Why is it so slow. I m running it on a MCS Community installed HyperV Ubuntu 20.04 with 8 cores and 32GB of RAM.

Any idea why is it so slow. It is a the simplest aggregation however it is very slow




Faisal Saeed

unread,
May 27, 2023, 11:05:39 AM5/27/23
to pantonis, MariaDB ColumnStore
Can you please try to create a new table and insert the same data into that new table as 

CREATE TABLE TradeFactSorted LIKE TradeFact;

INSERT INTO TradeFactSorted SELECT * FROM TradeFact ORDER BY DimAccountKey;

Then execute your select based on the new TradeFactSorted table.

Just for the sake of testing. Your DimAccountKey might be too fragmented into many extents that might explain the slowness. 

One extent holds 8 million rows, just for your info.

Thanks,
Faisal.


--
You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/6b54b235-314b-40cc-aa19-d4868bb74807n%40googlegroups.com.
--
Best Regards,
Faisal Saeed | Principal Solution Architect
MariaDB Corporation | Tel: +65 9386 5771 | fai...@mariadb.com | Twitter: @RF_FaisalZ

Message has been deleted

Todd Stoffel

unread,
May 27, 2023, 3:18:33 PM5/27/23
to MariaDB ColumnStore
I'm still not convinced there isn't a problem with your HyperV environment. I just duplicated this test using our flight data:

https://github.com/mariadb-corporation/mariadb-columnstore-samples

I'm also using our Docker image:


I have 4 CPU and 8 GB of RAM allocated for my container.

The flights table has over 38 million rows.

MariaDB [columnstore_bts]> SELECT SUM(distance)
    -> FROM flights
    -> WHERE carrier = 'UA';
+---------------+
| SUM(distance) |
+---------------+
3866035471.00 |
+---------------+
1 row in set (0.329 sec)

pantonis

unread,
May 28, 2023, 6:13:49 AM5/28/23
to MariaDB ColumnStore
Hello Todd. This is a brand new VM on another machine Intel I9 with 128GB of RAM of where 32GB allocated to VM and also 10 cores.

I have tried my tests on 4 different VMs and event tried a physical server. All of them give problematic performance or errors. It is either my configuration wrong or something is wrong with the MCS.

drrtuy

unread,
May 28, 2023, 7:02:03 AM5/28/23
to MariaDB ColumnStore
Hi!
Could you run the query this way  to collect some statistics.
select mcssettrace(1); run query; select mcsgettrace(0)\G;
and share the output. It is 'EXPLAIN ANALYZE' for MCS.
I would appreciate if you share the output for
select mcsgettrace(1)\G after running the query.

Regards,
Roman

суббота, 27 мая 2023 г. в 14:40:53 UTC+3, pantonis:

Antonis Panagi

unread,
May 28, 2023, 7:04:47 AM5/28/23
to drrtuy, MariaDB ColumnStore
Thank you for your reply. I will do of course. Could you send me your email address as I cannot disclose any more data public?

Thank you very much,

Antonis

--
You received this message because you are subscribed to a topic in the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mariadb-columnstore/TZ4e5Aih28U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mariadb-columns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/9ed5d72d-eac2-47b7-84e5-94423b1dfc24n%40googlegroups.com.

pantonis

unread,
May 29, 2023, 2:19:03 AM5/29/23
to MariaDB ColumnStore
I tried the following:

SELECT calSetTrace(1);
SELECT SUM(Profit) FROM TradeFact WHERE DimAccountKey = 1234;
select calGetTrace();

and the 3rd query returns null

pantonis

unread,
May 29, 2023, 11:12:32 AM5/29/23
to MariaDB ColumnStore
How do we avoid fragmentation?

We are using INSERT INTO Temp tables and then insert into columnstore from temp tables. We cannot use cpimport as we are using a fully automated solution written in .NET.

Any help please would be highly appreciated?

Roland Noland

unread,
May 30, 2023, 3:37:11 AM5/30/23
to pantonis, MariaDB ColumnStore
Could you try "select calgettrace(0);"(mind 0 argument)?
Moreover the extended explain output is available in journalctl -u mcs-exemng.

Regards,
Roman

пн, 29 мая 2023 г., 7:19 pantonis <anton...@gmail.com>:
You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/ff489082-25ac-4829-9ff3-2cc0e4c41550n%40googlegroups.com.

pantonis

unread,
May 30, 2023, 3:44:36 AM5/30/23
to MariaDB ColumnStore
Hello Roland,

Due to space limitation I had to remove the entire database.

The question here is how do we avoid fragmentation, as I suspect what Faisal mentioned might be the case. We have a job running every 1 minute adding new records (about 30 records each minute). 
And during initial load of database we have daily inserts (about 150-200k rows per day)
We are not using cpimport as we use a .net microservice to add rows to database using INSERT INTO Temp table and then INSERT INTO FactTable SELECT from Temptable

Thank you

Allen Herrera

unread,
May 30, 2023, 12:08:52 PM5/30/23
to pantonis, MariaDB ColumnStore
Hey Pantois,

Can you show a create table for TradeFact. I believe you have it created as an innodb engine instead of columnstore. 
Please make sure ENGINE=Columnstore

I suspect this because
1) the explain should have showed select_type = PUSHED SELECT, meaning its pushing the query to the columnstore engine.
2) the calgettrace should have returned the columnstore query plan but it returned null meaning nothing was done in columnstore.



--

Allen Herrera
Customer Engineer | MariaDB Corporation
+1.360.888.3938 | calendly.com/allen-herrera | allen.herrera@mariadb.com

Roland Noland

unread,
May 30, 2023, 1:03:14 PM5/30/23
to pantonis, MariaDB ColumnStore
The 8 000 0000 column value series is called an extent. An extent has min/max values. As far as I understand Faisal meant the min/max values for most extents becoming so wide so extent elimination that preliminary filters the extents out has a smaller impact.
I would start looking into the extent ranges using either select * from information_schema.columnstore_extents or `editem -d` utility output.

Regards,
Roman

вт, 30 мая 2023 г. в 10:44, pantonis <anton...@gmail.com>:

Faisal Saeed

unread,
May 30, 2023, 1:07:54 PM5/30/23
to Roland Noland, pantonis, MariaDB ColumnStore
Hi Roland,

That's exactly what I meant that the query is reading all the extents and taking so much time. My idea was to sort the table by the "criteria" field if that does not make any difference then there is definitely something wrong with the VM.

Best Regards,
Faisal Saeed | Principal Solution Architect
MariaDB Corporation | Tel: +65 9386 5771 | fai...@mariadb.com | Twitter: @RF_FaisalZ


Reply all
Reply to author
Forward
0 new messages