MCS hanging every few days

Ferran Gil Cuesta

Feb 22, 2022, 9:16:55 AM
to MariaDB ColumnStore
Greetings,

First of all, thank you for your work on MariaDB ColumnStore. We have been using it since 2017, but we wanted to upgrade our setup, starting from a fresh install of a recent version. That seemed fair and simple, but problems have arisen.

Server details:
  • Google Compute Engine VM with 128 GB RAM, 16 vCPUs, 500 GB disk
  • Debian 10
  • MariaDB Community Server 10.6.5
  • ColumnStore 5.5 (installed following this)
Settings are mostly defaults with a few comments:
  • <VersionBufferFileSize>8GB</VersionBufferFileSize> (to mimic old-and-still-working server setting)

  • AllowDiskBasedAggregation is kept at N, but with the same database tables we cannot complete an "insert into xxx select xxx from xxx group by xxx". This query does work (~8 minutes) on the old server. On this server, enabling disk-based aggregation lets the query complete.

  • <TotalUmMemory>25%</TotalUmMemory>,  <NumBlocksPct>50</NumBlocksPct>

Currently, data is updated with simple inserts (using executemany in Python), plus a few selects.
Data size is 100 GB, distributed across 38 tables (36 ColumnStore, 2 small InnoDB):
Screenshot 2022-02-22 at 14.18.11.png

After a few days of updating data in all the tables, the database stops working with the error: Internal error: MCS-2033: Error occurred when calling system catalog.

A full restart of the VM is required (trying via systemctl restart mariadb mariadb-columnstore does not finish after at least 30 minutes).
After restarting, all data seems fine and the system continues to work for a few days. No data corruption, and no manual tasks are performed.

How can we troubleshoot this big issue? 

We recently added Munin to see if we could spot anything strange. A few comments: 
  • mariadbd spikes in CPU usage once ColumnStore "breaks". At ~8:00 am we rebooted the VM and everything went back to normal. Screenshot 2022-02-22 at 14.26.15.png
  • It may be unrelated, but worth noting: threads keep increasing until they drop back down, recurrently. When ColumnStore stops working, the threads are kept (until the reboot). There is a correlation between our update tasks and the thread growth (as if our tasks kept them "open"), but we could not determine whether this is a real issue. Screenshot 2022-02-22 at 14.33.22.png
  • There is no swap on the server, but with 128 GB of RAM it has so far seemed unnecessary. Screenshot 2022-02-22 at 14.37.37.png
Additional information:
  • I am unable to find the tool columnstoreSupport in our system.
  • I think this is gone in current MCS versions (but I may be wrong, and happy to provide any additional data!)
  • The crit.log only has two lines once we rebooted this morning:
    Feb 22 07:17:33 atlas-db-inc ddlpackageproc[1260]: 33.852930 |0|0|0| C 23 CAL0008: DMLProc main process has started
    Feb 22 07:17:33 atlas-db-inc ddlpackageproc[1294]: 33.907095 |0|0|0| C 23 CAL0008: DDLProc main process has started
  • Current systemctl status of mariadb-columnstore: The cropped line says 

    Feb 22 07:17:33  mariadb-columnstore-start.sh[534]: writeengine[1315]: 33.968252 |0|0|0| I 19 CAL0060: dbbuilder system catalog status: System catalog appears to exist.  It will remain intact for reuse.  The database is not recreated.
    Screenshot 2022-02-22 at 14.53.37.png

  • Current status of mariadb:
    We constantly get the aborted-connection message, which ends with (Got an error reading communication packets), and we think it is related to the thread growth we mentioned above. However, we are not sure whether this eventually breaks ColumnStore. Screenshot 2022-02-22 at 14.57.59.png
  • Finally, I am attaching the output of mcsGetConfig -a, in case there are any settings that we could tweak or test.
Thank you very much for taking the time to help us find what is going on.
Ferran

mcsGetConfig.txt

drrtuy

Feb 23, 2022, 2:07:02 PM
to MariaDB ColumnStore
Hey Ferran,

I am glad that you benefit from MCS.

I will start with a suggestion on how to debug the case further so we can help you solve this issue. I think the number of threads indirectly points to a thread leak. We need to find out which process spawns/leaks so many threads. Here is an example command to monitor the per-process thread count: ps -o pid,cmd,thcount -x --sort thcount
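For example, a rough sketch of how I would watch it over time (the exact ps format options can differ slightly between distributions):

    # refresh the per-process thread counts every 10 seconds, biggest offenders last
    watch -n 10 'ps -eo pid,cmd,thcount --sort thcount | tail -n 20'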

I also replied to some questions inline.

On Tuesday, February 22, 2022 at 17:16:55 UTC+3, fg...@g-n.com wrote:
  • I am unable to find the tool columnstoreSupport in our system.

We dropped it in favor of CMAPI. But we will introduce a replacement for this tool.
  • Current systemctl status of mariadb-columnstore: The cropped line says 
JFYI, mariadb-columnstore is a one-shot unit that calls a bash script which starts all the other units: mcs-workernode@1, mcs-controllernode, etc. The mariadb-columnstore journal doesn't have much info.
 

  • Current status of mariadb:
    We constantly get the aborted-connection message, which ends with (Got an error reading communication packets), and we think it is related to the thread growth we mentioned above. However, we are not sure whether this eventually breaks ColumnStore. Screenshot 2022-02-22 at 14.57.59.png

This might be interesting. Are you sure the software doesn't quit early, leaving MDB client connections behind?

Regards,
Roman

Ferran Gil Cuesta

Feb 24, 2022, 5:35:43 AM
to MariaDB ColumnStore
Thank you very much for your reply.

We need to debug the connections issue further; currently we are not sure whether the software is leaving them behind, but it is possible.
What we can clearly see is that the growing threads belong to ExeMgr, as shown here:
Screenshot 2022-02-24 at 11.15.55.png
There were 2792 at screenshot time, but they keep going up to ~5000, until they jump down to ~400 and start growing again.

The command ps -o pid,cmd,thcount -x --sort thcount only lists a value of 1 for each CMD, though.

Running systemctl status mcs-* we can see some more detailed info. We see a line about jemalloc, but we are unsure whether this is something we already have on our system.

Screenshot 2022-02-24 at 11.33.23.png

Thank you in advance for your help.

Best regards,
Ferran

drrtuy

Feb 24, 2022, 9:22:58 AM
to MariaDB ColumnStore
Replied inline.

On Thursday, February 24, 2022 at 13:35:43 UTC+3, fg...@g-n.com wrote:
Thank you very much for your reply.

We need to debug the connections issue further; currently we are not sure whether the software is leaving them behind, but it is possible.
What we can clearly see is that the growing threads belong to ExeMgr, as shown here:
Screenshot 2022-02-24 at 11.15.55.png
There were 2792 at screenshot time, but they keep going up to ~5000, until they jump down to ~400 and start growing again.

This is already something. Given this information, I doubt this is caused by your software leaving connections behind.
The threads of ExeMgr (and other services) have names that describe the execution phase they are in. I usually look at these phases using the thread view of the top utility.
Could I ask you to take a screenshot of the thread view of ExeMgr in top, or otherwise get the names of the threads that sit there?
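For instance, something like this should show the thread names (a sketch; inside top you can also press H to toggle the thread view):

    # interactive thread view of ExeMgr
    top -H -p $(pidof ExeMgr)
    # non-interactive alternative: one line per thread with its name
    ps -T -p $(pidof ExeMgr) -o tid,comm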
 
The command ps -o pid,cmd,thcount -x --sort thcount only lists a value of 1 for each CMD, though.

Good to know; so this is not a universal solution. It works for me, though.
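If thcount shows 1 on your system, a couple of alternatives that should work on Debian (a sketch):

    # one line per ExeMgr thread, counted
    ps -eLf | grep -c '[E]xeMgr'
    # or read the kernel's own counter
    grep Threads /proc/$(pidof ExeMgr)/status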

Running systemctl status mcs-* we can see some more detailed info. We see a line about jemalloc, but we are unsure whether this is something we already have on our system.

Screenshot 2022-02-24 at 11.33.23.png

jemalloc is an important (for MCS) memory allocation library. There might be lots of side effects without jemalloc. Unfortunately, we can't officially add libjemalloc as a dependency of the engine package. Could you please install libjemalloc on your system and restart MCS to see if it makes a difference?
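A quick way to check and install it might look like this (a sketch for Debian 10; I assume the package is named libjemalloc2 there, please double-check):

    # is jemalloc already mapped into the running ExeMgr?
    grep -c jemalloc /proc/$(pidof ExeMgr)/maps
    # if not, install it and restart ColumnStore
    sudo apt-get install libjemalloc2
    sudo systemctl restart mariadb-columnstore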
 
Regards,
Roman

Ferran Gil Cuesta

Feb 24, 2022, 10:21:03 AM
to MariaDB ColumnStore
Dear Roman,

Thanks again for your insights.
Let us add more information:

Screenshot 2022-02-24 at 15.58.33.png
It seems that we have had libjemalloc from the beginning.

Using htop we can also see the names of the threads. I am adding a couple of screenshots.
In the top-right corner we can see there are 3440 threads now.
Screenshot 2022-02-24 at 16.02.50.png

Also, scrolling down we get the long (~3200) list of threads under ExeMgr. Most of them display "Unspecified"; a few others show "Idle", but they are a minority.

Screenshot 2022-02-24 at 16.06.48.png

The command pstree also displays the total threads of each type.
Screenshot 2022-02-24 at 16.08.46.png


With ps -o pid,cmd,thcount -x --sort thcount this is what we see:

Screenshot 2022-02-24 at 16.13.09.png

This is already something. Given this information, I doubt this is caused by your software leaving connections behind.
The server is only running MCS, not even a web server. We fill (and query) the database from externally running Python scripts, so seeing the threads go up like that made us think they were the cause (also, reducing the frequency of the updates slows the thread growth). However, we have no reason to think the threads eventually block the system (or at least it does not happen every time).

Feel free to request anything else that can be of help.
Thanks!
Ferran

drrtuy

Feb 25, 2022, 1:25:21 PM
to MariaDB ColumnStore
I found another issue that looks like yours. Unfortunately, there was no clue in that issue. I bet the cluster eventually fails when the thread count reaches a certain number. From here I see two ways to address the issue: get the threads' call stacks to analyze which threads are stuck and why, or reproduce the issue using your DDL and procedures.
Here is the procedure to get the threads' call stacks (you need the debuginfo symbols package installed for MCS, though):
gdb -p $(pidof ExeMgr) -batch -ex "set logging on" -ex "thr apply all bt" -ex quit
Please note it might take a long time to output the call stacks for all the threads you have now, so it might be better to do this after a restart.
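For convenience, you can also redirect the backtraces straight into a file (the same command; the file path is just an example):

    gdb -p $(pidof ExeMgr) -batch -ex "set logging off" -ex "thread apply all bt" -ex quit > /tmp/exemgr_bt.txt 2>&1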

Please share your DDL/ETL procedures privately if you can, so I can try to reproduce the issue.

Regards,
Roman


On Thursday, February 24, 2022 at 18:21:03 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Mar 3, 2022, 3:09:16 AM
to MariaDB ColumnStore
Dear Roman,

Thanks for your reply.

After installing gdb, I ran the command; the output is long (it printed 3614 threads) but only took a few seconds to produce. It starts with a long list like:

[New LWP 23715]

[New LWP 23970]

[New LWP 24102]

[New LWP 24185]

[New LWP 24222]

[New LWP 24225]

[New LWP 24299]

[New LWP 24492]

and then info about each specific thread, although they all look the same:


Thread 3611 (Thread 0x7f7a8d05d700 (LWP 26218)):

#0  __libc_read (nbytes=1, buf=0x7f88f1bf149f, fd=3439) at ../sysdeps/unix/sysv/linux/read.c:26

#1  __libc_read (fd=3439, buf=0x7f88f1bf149f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

#2  0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#3  0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#4  0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#5  0x0000564d5da060d5 in messageqcpp::IOSocket::read(timespec const*, bool*, messageqcpp::Stats*) const ()

#6  0x0000564d5d9fd7cb in ?? ()

#7  0x00007f8e772135e0 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#8  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#9  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Some of them are a bit different:

Thread 1111 (Thread 0x7f820c1a6700 (LWP 1717)):

#0  futex_reltimed_wait_cancelable (private=0, reltime=0x7f820c1a4220, expected=0, futex_word=0x564d5da23174 <joblist::JobStep::jobstepThreadPool+276>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142

#1  __pthread_cond_wait_common (abstime=0x7f820c1a42f0, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>) at pthread_cond_wait.c:533

#2  __pthread_cond_timedwait (cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, abstime=0x7f820c1a42f0) at pthread_cond_wait.c:667

#3  0x00007f8e77213b51 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#4  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#5  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#6  0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Thread 1110 (Thread 0x7f820c9a7700 (LWP 1716)):

#0  futex_reltimed_wait_cancelable (private=0, reltime=0x7f820c9a5220, expected=0, futex_word=0x564d5da23174 <joblist::JobStep::jobstepThreadPool+276>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142

#1  __pthread_cond_wait_common (abstime=0x7f820c9a52f0, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>) at pthread_cond_wait.c:533

#2  __pthread_cond_timedwait (cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, abstime=0x7f820c9a52f0) at pthread_cond_wait.c:667

#3  0x00007f8e77213b51 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#4  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#5  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#6  0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


And the last 5 are different as well:

Thread 5 (Thread 0x7f8e71fff700 (LWP 23745)):

#0  futex_reltimed_wait_cancelable (private=0, reltime=0x7f8e71ffcf50, expected=0, futex_word=0x7fff053554d8) at ../sysdeps/unix/sysv/linux/futex-internal.h:142

#1  __pthread_cond_wait_common (abstime=0x7f8e71ffd060, mutex=0x7fff05355488, cond=0x7fff053554b0) at pthread_cond_wait.c:533

#2  __pthread_cond_timedwait (cond=0x7fff053554b0, mutex=0x7fff05355488, abstime=0x7f8e71ffd060) at pthread_cond_wait.c:667

#3  0x00007f8e7720e43e in threadpool::ThreadPool::pruneThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#4  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#5  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#6  0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Thread 4 (Thread 0x7f8e72c8a700 (LWP 23743)):

#0  0x00007f8e77311bf0 in __GI___nanosleep (requested_time=0x7f8e72c885e0, remaining=0x7f8e72c885f0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28

#1  0x00007f8e779c2145 in utils::MonitorProcMem::pause_() const () from /lib/x86_64-linux-gnu/libcommon.so

#2  0x0000564d5d9f8848 in ?? ()

#3  0x0000564d5d9f89b0 in ?? ()

#4  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#5  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#6  0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Thread 3 (Thread 0x7f8e73dff700 (LWP 23741)):

#0  __libc_read (nbytes=1, buf=0x7f8e7424f51f, fd=6) at ../sysdeps/unix/sysv/linux/read.c:26

#1  __libc_read (fd=6, buf=0x7f8e7424f51f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

#2  0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#3  0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#4  0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#5  0x00007f8e776e8e79 in messageqcpp::MessageQueueClient::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#6  0x00007f8e784bf6a8 in joblist::DistributedEngineComm::Listen(boost::shared_ptr<messageqcpp::MessageQueueClient>, unsigned int) () from /lib/x86_64-linux-gnu/libjoblist.so

#7  0x00007f8e784bfa33 in ?? () from /lib/x86_64-linux-gnu/libjoblist.so

#8  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#9  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Thread 2 (Thread 0x7f8e735fe700 (LWP 23739)):

#0  __libc_read (nbytes=1, buf=0x7f8e7424f58f, fd=5) at ../sysdeps/unix/sysv/linux/read.c:26

#1  __libc_read (fd=5, buf=0x7f8e7424f58f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

#2  0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#3  0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#4  0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#5  0x00007f8e776e8e79 in messageqcpp::MessageQueueClient::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#6  0x00007f8e784bf6a8 in joblist::DistributedEngineComm::Listen(boost::shared_ptr<messageqcpp::MessageQueueClient>, unsigned int) () from /lib/x86_64-linux-gnu/libjoblist.so

#7  0x00007f8e784bfa33 in ?? () from /lib/x86_64-linux-gnu/libjoblist.so

#8  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#9  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Thread 1 (Thread 0x7f8e74673f40 (LWP 23738)):

#0  0x00007f8e77311667 in __libc_accept (fd=13, addr=..., len=0x7fff053540f4) at ../sysdeps/unix/sysv/linux/accept.c:26

#1  0x00007f8e776ffa98 in messageqcpp::CompressedInetStreamSocket::accept(timespec const*) () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#2  0x00007f8e776e8626 in messageqcpp::MessageQueueServer::accept(timespec const*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#3  0x0000564d5d9fba03 in ?? ()

#4  0x0000564d5d9f6058 in ?? ()

#5  0x00007f8e768f909b in __libc_start_main (main=0x564d5d9f5e70, argc=1, argv=0x7fff05355798, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff05355788) at ../csu/libc-start.c:308

#6  0x0000564d5d9f750a in ?? ()

[Inferior 1 (process 23738) detached]


I can send the full output if needed, but it's just these same messages repeated so many times.

I am unsure whether we have the "debuginfo symbols package installed for MCS" or not, or how we can check or install it.

Thank you very much.

Ferran

drrtuy

Mar 7, 2022, 9:07:11 AM
to MariaDB ColumnStore
I replied inline.


On Thursday, March 3, 2022 at 11:09:16 UTC+3, fg...@g-n.com wrote:
Thread 3611 (Thread 0x7f7a8d05d700 (LWP 26218)):

#0  __libc_read (nbytes=1, buf=0x7f88f1bf149f, fd=3439) at ../sysdeps/unix/sysv/linux/read.c:26

#1  __libc_read (fd=3439, buf=0x7f88f1bf149f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

#2  0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#3  0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#4  0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so

#5  0x0000564d5da060d5 in messageqcpp::IOSocket::read(timespec const*, bool*, messageqcpp::Stats*) const ()

#6  0x0000564d5d9fd7cb in ?? ()

#7  0x00007f8e772135e0 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#8  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#9  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


This call stack tells me that ExeMgr spawns threads which wait for data on the channel from the plugin code (it lives in the MDB runtime) and thus get stuck in ::__libc_read(). Pay attention to the multiple messageqcpp::IOSocket::read frames waiting in __libc_read. This tells me that multiple MDB clients connect to MDB, initiate a query, and then quit. Could you show me SHOW PROCESSLIST output from MDB?
There are two cases I can imagine now: MDB crashes on every client connection and is restarted automatically by systemd (please use systemctl status mariadb to check when MDB was last restarted), or there are multiple MDB client threads, which you can find with either ps (use the same shell as previously to count the number of threads) or SHOW PROCESSLIST (use the MDB client). 
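For example (a sketch; add credentials to the mariadb client call if your setup needs them):

    # when did systemd last (re)start MariaDB?
    systemctl status mariadb | grep Active:
    # how many client connections does MDB hold right now?
    mariadb -e "SHOW FULL PROCESSLIST;" | wc -l
    # and how many OS threads does mariadbd have?
    ps -o nlwp= -p $(pidof mariadbd)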
 
Some of them are a bit different:

Thread 1111 (Thread 0x7f820c1a6700 (LWP 1717)):

#0  futex_reltimed_wait_cancelable (private=0, reltime=0x7f820c1a4220, expected=0, futex_word=0x564d5da23174 <joblist::JobStep::jobstepThreadPool+276>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142

#1  __pthread_cond_wait_common (abstime=0x7f820c1a42f0, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>) at pthread_cond_wait.c:533

#2  __pthread_cond_timedwait (cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, abstime=0x7f820c1a42f0) at pthread_cond_wait.c:667

#3  0x00007f8e77213b51 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#4  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#5  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#6  0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


These are waiting for free threads in the thread pool that is used by the threads stuck reading from the socket. 

Thread 1110 (Thread 0x7f820c9a7700 (LWP 1716)):

#0  futex_reltimed_wait_cancelable (private=0, reltime=0x7f820c9a5220, expected=0, futex_word=0x564d5da23174 <joblist::JobStep::jobstepThreadPool+276>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142

#1  __pthread_cond_wait_common (abstime=0x7f820c9a52f0, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>) at pthread_cond_wait.c:533

#2  __pthread_cond_timedwait (cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, abstime=0x7f820c9a52f0) at pthread_cond_wait.c:667

#3  0x00007f8e77213b51 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so

#4  0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0

#5  0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

#6  0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


And the last 5 are different as well:


These are normal and expected.
 

I can send the full output if needed, but it's just these same messages repeated so many times.

No, I don't need this info yet.

 

I am unsure whether we have the "debuginfo symbols package installed for MCS" or not, or how we can check or install it.

I was talking about the debuginfo package for the corresponding MCS package you use.
You have enough for now, so you don't need to install anything else.

Regards,
Roman
 

Ferran Gil Cuesta

Mar 9, 2022, 6:51:24 AM
to MariaDB ColumnStore
The VM is currently quite idle, but somehow for the last 4 days all database tasks have been slower than usual. I have not (yet) rebooted, since I wanted to provide the updated information.

I don't think that MariaDB crashes after every connection (but I may be wrong). Current uptime says 5 days.
Screenshot 2022-03-09 at 12.13.13.png

And columnstore says 2 weeks:
Screenshot 2022-03-09 at 12.13.23.png

I would say we normally see fewer processes on the list (I will be able to check once I restart mariadb later today).
Screenshot 2022-03-09 at 12.16.47.png

For some reason, for the last 4 days there have been many more slow queries, which can be seen in Munin:
Screenshot 2022-03-09 at 12.37.25.png

I found out that I could obtain a bit more info from the processlist, which may be helpful as it displays memory used, etc. I removed the INFO column, as it contains the query statement and is usually very long.

Screenshot 2022-03-09 at 12.17.31.png

The ETL we are doing into this MariaDB has not changed at all. We (incrementally) add data to many tables every few hours, and there are very few selects at the moment. I manually launch a process every week to aggregate data (it creates a temporary table, inserts data from another table doing SUM() and GROUP BY, renames the tables, and drops the old table). This works fine; there is no problem with that process.

Thank you for the insights into what the outputs we provided mean.
Regards,
Ferran

Ferran Gil Cuesta

Mar 10, 2022, 4:57:37 AM
to MariaDB ColumnStore
Hi again,
We are having another issue that may or may not be related. One of our processes creates a new table with fresh data, then tries to rename the current table to a temporary name and rename the fresh table to the original name. This process usually works (although we don't run it daily). This time, the ALTER query was queued like this:

Screenshot 2022-03-10 at 10.42.55.png

At the same time, we checked the locks, but it said there were none (maybe these are a different type of lock).
Screenshot 2022-03-10 at 10.47.44.png
Eventually we decided to kill that query and then manually rename the table again. It worked just fine and quickly.

alter_tables.jpg

Just mentioning this in case it helps troubleshoot the stability issue, which is the main concern.
Thanks,
Ferran

drrtuy

Mar 11, 2022, 10:29:54 AM
to MariaDB ColumnStore
To answer the question about the slow query rate, you need to identify those slow queries first. You can either query the Information_Schema tables or parse /var/log/mariadb/columnstore/debug.log.

Unfortunately, I_S queries don't accurately report RAM consumption for ColumnStore-related queries. There is a ColumnStore-specific profiling tool [1]. Unfortunately, it doesn't report RAM consumption either, because of ColumnStore's distributed nature.
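For example, a quick way to list statements that have currently been running for a long time (a sketch; the 60-second threshold is arbitrary):

    mariadb -e "SELECT id, user, time, state, LEFT(info, 80) AS query
                FROM information_schema.PROCESSLIST
                WHERE command <> 'Sleep' AND time > 60
                ORDER BY time DESC;"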

Regards,
Roman


On Wednesday, March 9, 2022 at 14:51:24 UTC+3, fg...@g-n.com wrote:

drrtuy

Mar 11, 2022, 10:39:58 AM
to MariaDB ColumnStore
They are different indeed. Let me explain. MDB is a SQL frontend for MCS; it has relatively complex lock machinery of its own, and the 'Waiting for metadata lock' state points to a lock inside MDB. JFYI, viewtablelock shows table locks for MCS, not MDB. Something is fishy. Can I take a look at /var/log/mariadb/columnstore/debug.log, if you can share it with me privately?

Regards,
Roman

On Thursday, March 10, 2022 at 12:57:37 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Mar 14, 2022, 5:56:12 AM
to MariaDB ColumnStore
Thanks Roman,

Last week we rebooted the whole system and the slow queries are gone. Somehow MariaDB was kind of stuck but still working, producing those locks... This seems to be under control again:
Screenshot 2022-03-14 at 10.06.13.png

Also, we discussed internally how to further improve system stability and decided to do a system restart every 3 days and see how stable everything is after that. So far it has restarted once and data integrity is OK (that is our main concern, of course). The restart itself takes only 10 seconds, but there may be a data-update process still running at reboot time. Just FYI.

I could call calGetStats after running a simple query and see some data. Unfortunately, I don't think querying data is the issue here. Long queries that fetch data for 2 years take longer, also depending on the number of GROUP BYs involved, but overall this is not an issue.

Screenshot 2022-03-14 at 10.47.34.png

I can privately share the debug.log so you can see the type of inserts, queries, commits and rollbacks that are listed there. How should I proceed?

Thanks a lot!
Ferran

drrtuy

Mar 18, 2022, 2:10:20 AM
to MariaDB ColumnStore
Greetings,

However, calGetStats would allow us to confirm or deny the theory that the slowdown happens in the MDB plugin code, to further reduce the scope.
JFYI, we were able to reproduce the ExeMgr thread-explosion case. More details here.
Could you confirm whether the script affects you as well?

Regards,
Roman

On Monday, March 14, 2022 at 12:56:12 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Mar 18, 2022, 6:56:12 AM
to MariaDB ColumnStore
Thanks Roman,

If we can give some more information using calGetStats, just let us know and we will report back.

I tried to reproduce the issue by executing the bash script but I was unable to raise the number of (stuck?) threads.

First of all, I changed varchar(250) to text, as I was getting this error:
error_creating_table.jpg
This allowed the table to be created, and I ran the script. There were still many varchar(100) fields and other types, so I guess our change was not relevant.

strace -p $(pidof ExeMgr) was printing repetitive lines, like:

accept(13, {sa_family=AF_INET, sin_port=htons(52196), sin_addr=inet_addr("127.0.0.1")}, [16]) = 66                                                      

sendto(66, "A", 1, 0, NULL, 0)          = 1      

futex(0x7fff9048cba4, FUTEX_WAKE_PRIVATE, 1) = 1                                                                                                 

accept(13, {sa_family=AF_INET, sin_port=htons(52198), sin_addr=inet_addr("127.0.0.1")}, [16]) = 67                                                                     

sendto(67, "A", 1, 0, NULL, 0)          = 1                                                                                                                                      

futex(0x7fff9048cba4, FUTEX_WAKE_PRIVATE, 1) = 1

accept(13, {sa_family=AF_INET, sin_port=htons(52200), sin_addr=inet_addr("127.0.0.1")}, [16]) = 69

sendto(69, "A", 1, 0, NULL, 0)          = 1

futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1

futex(0x7fff9048cb50, FUTEX_WAKE_PRIVATE, 1) = 1

accept(13, {sa_family=AF_INET, sin_port=htons(52202), sin_addr=inet_addr("127.0.0.1")}, [16]) = 15

sendto(15, "A", 1, 0, NULL, 0)          = 1

futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1

accept(13, {sa_family=AF_INET, sin_port=htons(52204), sin_addr=inet_addr("127.0.0.1")}, [16]) = 16

sendto(16, "A", 1, 0, NULL, 0)          = 1

futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1

accept(13, {sa_family=AF_INET, sin_port=htons(52206), sin_addr=inet_addr("127.0.0.1")}, [16]) = 19

sendto(19, "A", 1, 0, NULL, 0)          = 1

futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1


cat /proc/$(pidof PrimProc)/status | grep -i threads stayed stable, displaying 78-79 threads.


With cat /proc/$(pidof ExeMgr)/status | grep -i threads I saw the count go up to 195 while running our own ETL (just updating some client data). But after I stopped our ETL, rebooted mariadb and mariadb-columnstore, and ran only the bash script from the Jira report, the value stayed quite stable at 158.


ps -efT | grep ExeMgr | wc -l displayed 158... then 160, but it stayed stable around this number.


I ran the script every 5 seconds for at least 20 minutes, and it all went OK.

After executing the script, I could briefly see the queries with show processlist; but load on the server was fine.

We did this test in a Google Cloud VM with 16 vCPU and 64GB RAM. OS was Debian 10.

Unfortunately I don't seem to be able to reproduce the issue using the script, although it looks similar to what seems to be happening to us when processing our data.

Thanks

Ferran



drrtuy

Apr 14, 2022, 8:21:46 AM
to MariaDB ColumnStore
I was told that the script was supposed to demonstrate that multiple queries pre-spawn too many threads in ExeMgr, and that the title and description of the issue are really misleading.

We are still struggling to reproduce the hanging-threads issue, with no luck so far.

Regards,
Roman

On Friday, March 18, 2022 at 13:56:12 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Jun 23, 2022, 6:01:36 AM
to MariaDB ColumnStore
Hey Roman,

We are still experiencing the issue where the number of threads increases until it drops back to a minimum. RAM usage follows the same pattern.
Somehow, it seems that ExeMgr stays active even though the process that updates data finished a while ago (we insert new data every hour, into different tables).

Screenshot 2022-06-23 at 11.47.49.png

Screenshot 2022-06-23 at 11.48.58.png

The current VM only has 64 GB RAM, which means the clean-up happens more often than when we had 128 GB RAM (previous setup, same issues).
Screenshot 2022-06-23 at 11.50.55.png

We reviewed the Python script that fetches current data from the DB, fetches data from the API, compares them, and inserts the difference into the DB. We added some missing connection.close() calls, but we don't see any improvement or reduction in the problems.

What is worse, every 2 or 3 days the insert queries (which usually take a second or less) grow in execution time to 30 seconds, then 300... up to a point where we really need to reboot the server completely. We set up a cron job to reboot the VM every 3 nights (roughly as sketched below), but on quite a few occasions we need to do it manually earlier than that.
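Roughly, the crontab entry (in root's crontab) looks like this; the exact time and command are illustrative:

    # reboot the VM in the small hours every third day
    30 4 */3 * * /sbin/shutdown -r now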

Thank you,
Ferran

Roland Noland

Jun 23, 2022, 9:41:07 AM
to Ferran Gil Cuesta, MariaDB ColumnStore
Hey Ferran,

Could you give me the backtraces for all the threads that ExeMgr has when it is becoming unresponsive?
You can use this oneliner:

gdb -p $(pidof ExeMgr) -batch -ex "set logging off"  -ex 'thr apply all bt' -ex 'quit'

Regards,
Roman

On Thu, Jun 23, 2022 at 13:01, Ferran Gil Cuesta <fg...@g-n.com> wrote:

Ferran Gil Cuesta

Jun 23, 2022, 10:05:54 AM
to MariaDB ColumnStore
Hey,

Unfortunately, this is a production server now, and we try our best to avoid the unresponsive state. When it happens, sometimes a restart fixes everything, but in one case we could not keep the read-write mode ON and were forced to bring up a backup instance.

However, I can try on our test database (which suffers from the same issues, but less often, as we also update data less often there). This test DB has the same data as our production DB, the same MariaDB version, etc. I will stress the database a bit (just by running updates as in production) and see how far we can go.

The output of the gdb command right now is this: https://pastebin.com/btS8797N (it was too large to copy here). Just in case it is useful at the moment.

Thanks,
Ferran

drrtuy

Jun 26, 2022, 8:19:45 AM
to MariaDB ColumnStore
I looked at the backtraces and I see there are too many threads waiting for a sync sequence from a TCP socket to establish the data communication stream. The read from the TCP socket can happen when an MDB thread establishes a TCP connection with ExeMgr and then sends nothing over it. Could you check the correlation between the number of threads stuck waiting for a byte from the TCP socket (use grep '__libc_read (nbytes=1,' /tmp/123.txt | wc -l on the file with the backtraces from gdb) and the number of established TCP connections between mariadbd and ExeMgr (you can get it with ss -tenp | grep ExeMgr)? You can also cross-check the fd numbers between the first output and the second. All commands need root privileges.
If the numbers are close, there are multiple MDB threads that misbehave, namely ones stuck in the middle that never establish the data stream after the TCP connection is made.
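Putting both checks together (a sketch; /tmp/exemgr_bt.txt stands for whatever file holds the gdb backtraces):

    # threads blocked on the one-byte sync read
    grep -c '__libc_read (nbytes=1,' /tmp/exemgr_bt.txt
    # established TCP connections held by ExeMgr
    sudo ss -tenp | grep -c ExeMgr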

Regards,
Roman

On Thursday, June 23, 2022 at 17:05:54 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Jun 28, 2022, 10:52:18 AM
to MariaDB ColumnStore
We were able to "hang" MariaDB (insert queries that usually take less than a second were listed in show processlist with times of 5000 seconds, etc.), just by enabling more frequent updates on our test VM. The gdb output at that moment is https://zerobin.net/?18a0890a215d8a6c#4TswmY9UTrPfTFZ2h4vTbGTjIqLvT+T9PgzMsH3JpRI= (I could not paste it into pastebin as it was too large).
The grep '__libc_read (nbytes=1,' count for the pastebin output I sent the other day was 166. For this new paste (when the system was almost stuck) the total is 1825.

Now that the VM was restarted 30 minutes ago, I see a small number (but a few minutes later it is already at 210, etc.; it keeps rising).
Screenshot 2022-06-28 at 16.43.24.png

I think they are correlated, and I think you're right when you say:
If the numbers are close, there are multiple MDB threads that misbehave, namely ones stuck in the middle that never establish the data stream after the TCP connection is made.
However, I have no idea how to debug or solve this further. It seems clear that there is something wrong in the code that connects to the database and updates it, am I right?

Thanks,
Ferran

drrtuy

Jun 30, 2022, 3:54:01 PM
to MariaDB ColumnStore
Do you have the insert cache feature enabled or not?
Let's now take a look at what happens inside MDB and why the threads there get stuck. Could I take a look at the gdb output for MDB when the number of stuck threads in ExeMgr is higher than normal? The sequence is the same.
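You can check the insert cache setting from the MDB client, e.g. (a sketch):

    mariadb -e "SHOW GLOBAL VARIABLES LIKE 'columnstore_cache_inserts';"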

Regards,
Roman
On Tuesday, June 28, 2022 at 17:52:18 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Jul 4, 2022, 5:29:11 AM
to MariaDB ColumnStore
We were not aware of that variable. It was off by default:

Screenshot 2022-07-01 at 17.00.27.png

and it could not be enabled from inside mariadb (our first try):
Screenshot 2022-07-01 at 17.12.52.png

Then we edited my.cnf (and also mariadb.cnf on a second try), and from a previously opened mariadb shell we could see that the variable was set to ON. However, we were unable to open a new mariadb shell; it complained about the variable in my.cnf or mariadb.cnf:
Screenshot 2022-07-01 at 17.14.15.png
We commented out the setting to avoid problems on mariadb or VM reboots (every 3 nights, but we are considering doing them more often, as we are forced to do some extra manual restarts when performance degrades and inserts start taking 10 minutes...).

We looked for documentation but we only found this https://mariadb.com/docs/reference/mdb/system-variables/columnstore_cache_inserts/ (which does not explain a lot).

You wrote: Let's now take a look at what happens inside MDB and why the threads there get stuck. Could I take a look at the gdb output for MDB when the number of stuck threads in ExeMgr is higher than normal? The sequence is the same.
I am not sure what I should do exactly. Our production VM goes up to ~3500 threads, then they reset back to 390 and start rising again. The output of sudo gdb -p $(pidof ExeMgr) -batch -ex "set logging off" -ex 'thr apply all bt' -ex 'quit' when the VM is at 1500 threads is this.

Thank you,
Ferran

drrtuy

Jul 11, 2022, 2:53:31 AM
to MariaDB ColumnStore
I would like to see the output of
 sudo gdb -p $(pidof mariadbd) -batch -ex "set logging off" -ex 'thr apply all bt' -ex 'quit'
plus journalctl -u mariadb for a day, in case it has any error messages.
These two outputs will tell me whether it is MariaDB whose threads go away, leaving threads behind in ExeMgr.
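For example (the output file names are just examples):

    sudo gdb -p $(pidof mariadbd) -batch -ex "set logging off" -ex 'thr apply all bt' -ex 'quit' > /tmp/mariadbd_bt.txt 2>&1
    journalctl -u mariadb --since "1 day ago" > /tmp/mariadb_journal.txt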

Regards,
Roman


On Monday, July 4, 2022 at 12:29:11 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Jul 11, 2022, 11:20:17 AM
to MariaDB ColumnStore
The output for gdb is https://pastebin.com/T1WMZAVV
And the one for journalctl is https://pastebin.com/g79K3N4F. As previously mentioned, there is a list of Aborted connection and Got an error reading communication packets messages, which match the pace of our update scripts, but we are unable to see what could be wrong there.

I rebooted the VM 4 hours ago as it was almost stuck. Since then it has been updating data for several of the tables.
These outputs are from our test VM, which is the same as the production one, just with fewer CPUs and less RAM (but it suffers from the same stability issues).

Thanks for all your help.
Ferran

drrtuy

Jan 17, 2023, 2:45:40 AM
to MariaDB ColumnStore
Hey Ferran,

The community recently contributed a patch that fixes the leaking connections. Here is the link to the Jira issue. I expect the patch to be included in the next Community MDB server release.

Regards,
Roman

On Monday, July 11, 2022 at 18:20:17 UTC+3, fg...@g-n.com wrote:

Ferran Gil Cuesta

Jan 17, 2023, 3:47:09 AM
to MariaDB ColumnStore
Hey Roman,
Thank you for letting us know.
We will test the next release for sure, as we are still facing the hangs (in production!). We restart the server every 3 days and more or less have it under control, but in some cases we need to do it manually, taking care that no locks are left behind, that nothing wants to roll back, etc.

Best regards,
Ferran

Ferran Gil Cuesta

Feb 14, 2023, 5:57:32 AM
to MariaDB ColumnStore
Hey Roman,

I've seen there is a new release of Community Server (6 Feb 2023), but it doesn't look like this bugfix was incorporated.

Do you have any more information on future release dates? 
We are eager to test the fix and provide some feedback.

Thank you very much,
Ferran

Roland Noland

Feb 14, 2023, 6:24:02 AM
to Ferran Gil Cuesta, MariaDB ColumnStore
Hey Ferran,

JFYI, despite the release notes not giving a clue about the current MCS status, ColumnStore was updated, so please check the behavior.
Thanks for the note. I will talk to the server team about the lack of information in the release notes.

Regards,
Roman

On Tue, Feb 14, 2023 at 13:57, Ferran Gil Cuesta <fg...@g-n.com> wrote:

Ferran Gil Cuesta

Feb 14, 2023, 6:47:58 AM
to Roland Noland, MariaDB ColumnStore
Hi Roman,


We found this in the release notes:
When ExeMgr finishes serving a request from MariaDB Server, ExeMgr's TCP connection can remain open, and its thread can continue running, which can cause ColumnStore to use more resources than required.
 
A bit hidden, but glad to see it is in the latest MCS release.

Many thanks!
Ferran

Roland Noland

Feb 14, 2023, 6:54:37 AM
to Ferran Gil Cuesta, MariaDB ColumnStore
22.08.8 is a different release from a different branch, so your original note about the lack of info in the Community Server release notes is still valid.

On Tue, Feb 14, 2023 at 11:47, Ferran Gil Cuesta <fg...@g-n.com> wrote:

Roland Noland

Feb 14, 2023, 7:38:32 AM
to Ferran Gil Cuesta, MariaDB ColumnStore
In your case, you should see 6.4.7 in the columnstore_version session variable.
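Assuming it is exposed like any other system variable, something like this should show it (a sketch):

    mariadb -e "SHOW VARIABLES LIKE 'columnstore_version';"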


On Tue, Feb 14, 2023 at 11:47, Ferran Gil Cuesta <fg...@g-n.com> wrote:
Hi Roman,

Ferran Gil Cuesta

Feb 15, 2023, 5:40:46 AM
to Roland Noland, MariaDB ColumnStore
Dear Roman,

The difference in the latest release is huge. We were using 6.4.1, and now we see 6.4.7 as you said.
We deployed a new VM instance with the same load as our production server, and so far we don't see any spikes in the number of open threads; everything feels much more stable.
Just a few screenshots from Munin, where you can see the old behaviour plus the new one (just a few hours):

image.png

image.png

image.png

image.png
image.png
In the last one, memory usage, the drop at 2:00 am is due to the fact that we still had a cron job to stop mariadb and reboot the server.
We are going to remove this on the new instance and see how it goes.

So far, we can't be happier!

Thank you very much and keep up the good work!

Cheers,
Ferran

drrtuy

Feb 15, 2023, 6:47:14 PM
to MariaDB ColumnStore
Thanks for the graphs you shared. They make the difference obvious. I think I need to highlight again that the original patch came from the community.

Regards,
Roman



On Wednesday, February 15, 2023 at 13:40:46 UTC+3, Ferran Gil Cuesta wrote: