<VersionBufferFileSize>8GB</VersionBufferFileSize> (to mimic old-and-still-working server setting)
AllowDiskBasedAggregation is kept at N, but with the same database tables we can't run an "insert into xxx select xxx from xx group by xxx". This query does work (in ~8 minutes) on the old server. On this server, we can complete the query if we enable disk-based aggregation.
<TotalUmMemory>25%</TotalUmMemory>, <NumBlocksPct>50</NumBlocksPct>
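To compare these values between the old and the new server, something like the following could help. It is only a sketch: it looks up each setting by tag name wherever it is nested, so it makes no assumption about which section of Columnstore.xml a setting lives in. The function names are made up for illustration; the setting names are the ones quoted above.

```python
import xml.etree.ElementTree as ET

# Setting names taken from the message above.
SETTINGS = ["VersionBufferFileSize", "AllowDiskBasedAggregation",
            "TotalUmMemory", "NumBlocksPct"]

def read_settings(xml_text, names=SETTINGS):
    """Return {name: value} using the first element with each tag name, at any depth."""
    root = ET.fromstring(xml_text)
    found = {}
    for name in names:
        elem = next(root.iter(name), None)
        # None means the setting is absent (or empty) in this file.
        found[name] = elem.text.strip() if elem is not None and elem.text else None
    return found

def diff_settings(old_xml, new_xml, names=SETTINGS):
    """Return {name: (old_value, new_value)} for the settings whose values differ."""
    old = read_settings(old_xml, names)
    new = read_settings(new_xml, names)
    return {n: (old[n], new[n]) for n in names if old[n] != new[n]}
```

Feeding it the contents of the Columnstore.xml file from each server gives a short list of the settings that actually differ, which is easier to post here than two full config files.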
Feb 22 07:17:33 mariadb-columnstore-start.sh[534]: writeengine[1315]: 33.968252 |0|0|0| I 19 CAL0060: dbbuilder system catalog status: System catalog appears to exist. It will remain intact for reuse. The database is not recreated.
- I am unable to find the tool columnstoreSupport in our system.
- Current systemctl status of mariadb-columnstore: The cropped line says
- Current status of mariadb:
We constantly get the "Aborted connection" message, which ends with (Got an error reading communication packets), and we think it is related to the thread-count issue we mentioned above. However, we're not sure whether this eventually breaks ColumnStore.
Thank you very much for your reply. We need to debug the connections further; currently we are not sure whether the software is leaving them behind, but it is possible. What we can clearly see is that the threads belong to ExeMgr, as we can see here:
There were 2792 at screenshot time, but the count keeps climbing to ~5000, then drops to about 400 and starts growing again.
The command ps -o pid,cmd,thcount -x --sort thcount only lists a value of 1 for each CMD, though.
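Another way to get the per-process thread count on Linux, independent of ps column options, is to read it from /proc. A minimal sketch follows; the helper names are hypothetical, and it assumes the usual "Threads:" line in /proc/<pid>/status.

```python
import os
import re

def threads_from_status(status_text):
    """Extract the integer from the 'Threads:' line of /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found")
    return int(match.group(1))

def thread_count(pid):
    """Read the kernel's thread count for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return threads_from_status(f.read())

def task_count(pid):
    """Cross-check: each thread has one directory under /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))
```

Calling thread_count() with the ExeMgr PID at a fixed interval and logging the result with a timestamp would show whether the count really climbs and then drops, without relying on a screenshot.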
Running systemctl status mcs-* we can see some more detailed info. We see a line about jemalloc but we are unsure if this is something we already have or not on our system.
[New LWP 23715]
[New LWP 23970]
[New LWP 24102]
[New LWP 24185]
[New LWP 24222]
[New LWP 24225]
[New LWP 24299]
[New LWP 24492]
and then info about each specific thread, although they all look the same:
Thread 3611 (Thread 0x7f7a8d05d700 (LWP 26218)):
#0 __libc_read (nbytes=1, buf=0x7f88f1bf149f, fd=3439) at ../sysdeps/unix/sysv/linux/read.c:26
#1 __libc_read (fd=3439, buf=0x7f88f1bf149f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2 0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#3 0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#4 0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#5 0x0000564d5da060d5 in messageqcpp::IOSocket::read(timespec const*, bool*, messageqcpp::Stats*) const ()
#6 0x0000564d5d9fd7cb in ?? ()
#7 0x00007f8e772135e0 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so
#8 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#9 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Some of them are a bit different:
Thread 1111 (Thread 0x7f820c1a6700 (LWP 1717)):
#0 futex_reltimed_wait_cancelable (private=0, reltime=0x7f820c1a4220, expected=0, futex_word=0x564d5da23174 <joblist::JobStep::jobstepThreadPool+276>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
#1 __pthread_cond_wait_common (abstime=0x7f820c1a42f0, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>) at pthread_cond_wait.c:533
#2 __pthread_cond_timedwait (cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, abstime=0x7f820c1a42f0) at pthread_cond_wait.c:667
#3 0x00007f8e77213b51 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so
#4 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#5 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1110 (Thread 0x7f820c9a7700 (LWP 1716)):
#0 futex_reltimed_wait_cancelable (private=0, reltime=0x7f820c9a5220, expected=0, futex_word=0x564d5da23174 <joblist::JobStep::jobstepThreadPool+276>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
#1 __pthread_cond_wait_common (abstime=0x7f820c9a52f0, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>) at pthread_cond_wait.c:533
#2 __pthread_cond_timedwait (cond=0x564d5da23148 <joblist::JobStep::jobstepThreadPool+232>, mutex=0x564d5da23120 <joblist::JobStep::jobstepThreadPool+192>, abstime=0x7f820c9a52f0) at pthread_cond_wait.c:667
#3 0x00007f8e77213b51 in threadpool::ThreadPool::beginThread() () from /lib/x86_64-linux-gnu/libthreadpool.so
#4 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#5 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
And the last 5 are different as well:
Thread 5 (Thread 0x7f8e71fff700 (LWP 23745)):
#0 futex_reltimed_wait_cancelable (private=0, reltime=0x7f8e71ffcf50, expected=0, futex_word=0x7fff053554d8) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
#1 __pthread_cond_wait_common (abstime=0x7f8e71ffd060, mutex=0x7fff05355488, cond=0x7fff053554b0) at pthread_cond_wait.c:533
#2 __pthread_cond_timedwait (cond=0x7fff053554b0, mutex=0x7fff05355488, abstime=0x7f8e71ffd060) at pthread_cond_wait.c:667
#3 0x00007f8e7720e43e in threadpool::ThreadPool::pruneThread() () from /lib/x86_64-linux-gnu/libthreadpool.so
#4 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#5 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 4 (Thread 0x7f8e72c8a700 (LWP 23743)):
#0 0x00007f8e77311bf0 in __GI___nanosleep (requested_time=0x7f8e72c885e0, remaining=0x7f8e72c885f0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x00007f8e779c2145 in utils::MonitorProcMem::pause_() const () from /lib/x86_64-linux-gnu/libcommon.so
#2 0x0000564d5d9f8848 in ?? ()
#3 0x0000564d5d9f89b0 in ?? ()
#4 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#5 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 3 (Thread 0x7f8e73dff700 (LWP 23741)):
#0 __libc_read (nbytes=1, buf=0x7f8e7424f51f, fd=6) at ../sysdeps/unix/sysv/linux/read.c:26
#1 __libc_read (fd=6, buf=0x7f8e7424f51f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2 0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#3 0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#4 0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#5 0x00007f8e776e8e79 in messageqcpp::MessageQueueClient::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#6 0x00007f8e784bf6a8 in joblist::DistributedEngineComm::Listen(boost::shared_ptr<messageqcpp::MessageQueueClient>, unsigned int) () from /lib/x86_64-linux-gnu/libjoblist.so
#7 0x00007f8e784bfa33 in ?? () from /lib/x86_64-linux-gnu/libjoblist.so
#8 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#9 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 2 (Thread 0x7f8e735fe700 (LWP 23739)):
#0 __libc_read (nbytes=1, buf=0x7f8e7424f58f, fd=5) at ../sysdeps/unix/sysv/linux/read.c:26
#1 __libc_read (fd=5, buf=0x7f8e7424f58f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2 0x00007f8e776f9c22 in messageqcpp::InetStreamSocket::readToMagic(long, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#3 0x00007f8e776fa8fc in messageqcpp::InetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#4 0x00007f8e776ff4e0 in messageqcpp::CompressedInetStreamSocket::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#5 0x00007f8e776e8e79 in messageqcpp::MessageQueueClient::read(timespec const*, bool*, messageqcpp::Stats*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#6 0x00007f8e784bf6a8 in joblist::DistributedEngineComm::Listen(boost::shared_ptr<messageqcpp::MessageQueueClient>, unsigned int) () from /lib/x86_64-linux-gnu/libjoblist.so
#7 0x00007f8e784bfa33 in ?? () from /lib/x86_64-linux-gnu/libjoblist.so
#8 0x00007f8e7761b615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#9 0x00007f8e77307fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f8e769ce4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7f8e74673f40 (LWP 23738)):
#0 0x00007f8e77311667 in __libc_accept (fd=13, addr=..., len=0x7fff053540f4) at ../sysdeps/unix/sysv/linux/accept.c:26
#1 0x00007f8e776ffa98 in messageqcpp::CompressedInetStreamSocket::accept(timespec const*) () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#2 0x00007f8e776e8626 in messageqcpp::MessageQueueServer::accept(timespec const*) const () from /lib/x86_64-linux-gnu/libmessageqcpp.so
#3 0x0000564d5d9fba03 in ?? ()
#4 0x0000564d5d9f6058 in ?? ()
#5 0x00007f8e768f909b in __libc_start_main (main=0x564d5d9f5e70, argc=1, argv=0x7fff05355798, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff05355788) at ../csu/libc-start.c:308
#6 0x0000564d5d9f750a in ?? ()
[Inferior 1 (process 23738) detached]
I can send the full output if needed, but it's just these same messages repeated many times.
I am unsure whether we have the "debuginfo symbols package installed for MCS", or how we can check for or install it.
Thank you very much.
Ferran
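Since the dump is thousands of near-identical stacks, it may be easier to share a summary than the full output. The sketch below groups threads by the function names of their innermost frames and counts each group. It assumes the text is gdb "thread apply all bt" style output like the excerpt above; the function name and the default depth of 4 frames are arbitrary choices for illustration.

```python
import re
from collections import Counter

# A thread header looks like: Thread 1111 (Thread 0x7f82... (LWP 1717)):
THREAD_RE = re.compile(r"^Thread \d+ \(")
# A frame line is '#N', an optional '0xADDR in', then the function name.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")

def summarize_backtrace(text, depth=4):
    """Count threads by the names of their innermost `depth` frames."""
    stacks = []
    for line in text.splitlines():
        line = line.strip()
        if THREAD_RE.match(line):
            stacks.append([])              # start collecting a new thread
        elif stacks:
            frame = FRAME_RE.match(line)
            if frame:
                stacks[-1].append(frame.group(1))
    return Counter(" <- ".join(stack[:depth]) for stack in stacks)
```

Printing summarize_backtrace(dump).most_common() would show at a glance how many threads sit in the socket read path versus how many are idle in the thread pool's timed wait.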
strace -p $(pidof ExeMgr) was printing repetitive lines, like:
accept(13, {sa_family=AF_INET, sin_port=htons(52196), sin_addr=inet_addr("127.0.0.1")}, [16]) = 66
sendto(66, "A", 1, 0, NULL, 0) = 1
futex(0x7fff9048cba4, FUTEX_WAKE_PRIVATE, 1) = 1
accept(13, {sa_family=AF_INET, sin_port=htons(52198), sin_addr=inet_addr("127.0.0.1")}, [16]) = 67
sendto(67, "A", 1, 0, NULL, 0) = 1
futex(0x7fff9048cba4, FUTEX_WAKE_PRIVATE, 1) = 1
accept(13, {sa_family=AF_INET, sin_port=htons(52200), sin_addr=inet_addr("127.0.0.1")}, [16]) = 69
sendto(69, "A", 1, 0, NULL, 0) = 1
futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7fff9048cb50, FUTEX_WAKE_PRIVATE, 1) = 1
accept(13, {sa_family=AF_INET, sin_port=htons(52202), sin_addr=inet_addr("127.0.0.1")}, [16]) = 15
sendto(15, "A", 1, 0, NULL, 0) = 1
futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1
accept(13, {sa_family=AF_INET, sin_port=htons(52204), sin_addr=inet_addr("127.0.0.1")}, [16]) = 16
sendto(16, "A", 1, 0, NULL, 0) = 1
futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1
accept(13, {sa_family=AF_INET, sin_port=htons(52206), sin_addr=inet_addr("127.0.0.1")}, [16]) = 19
sendto(19, "A", 1, 0, NULL, 0) = 1
futex(0x7fff9048cba0, FUTEX_WAKE_PRIVATE, 1) = 1
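To turn a longer strace capture into numbers, a small parser can list every accepted connection with its peer port and the descriptor it was given. This is a sketch that assumes lines in the same format as the excerpt above; the function name is hypothetical.

```python
import re

# Matches successful accept() lines such as:
# accept(13, {sa_family=AF_INET, sin_port=htons(52196), ...}, [16]) = 66
ACCEPT_RE = re.compile(
    r"accept\((\d+), \{.*?sin_port=htons\((\d+)\).*?\}, \[\d+\]\)\s*=\s*(\d+)"
)

def accepted_connections(strace_text):
    """Return (listening_fd, peer_port, new_fd) for each successful accept() line."""
    return [tuple(int(group) for group in match.groups())
            for match in ACCEPT_RE.finditer(strace_text)]
```

Because accept() returns the lowest free descriptor, a new_fd that keeps climbing over time would suggest connections are not being closed, while low numbers reappearing (as with 15, 16 and 19 in the excerpt) suggests at least some are.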
cat /proc/$(pidof PrimProc)/status | grep -i threads stayed stable at 78 or 79 threads.
cat /proc/$(pidof ExeMgr)/status | grep -i threads went up to 195 while I was running our own ETL (just updating some client data). I then stopped our ETL, restarted mariadb and mariadb-columnstore, and ran only the bash script from the Jira report; the value stayed quite stable at 158.
ps -efT | grep ExeMgr | wc -l displayed 158, then 160, and stayed stable at that number.
I ran the script every 5 seconds for at least 20 minutes, and it all went OK.
After executing the script, I could briefly see the queries with show processlist, but the load on the server was fine.
We did this test on a Google Cloud VM with 16 vCPUs and 64GB RAM. The OS was Debian 10.
Unfortunately I don't seem to be able to reproduce the issue using the script, although it looks similar to what seems to be happening to us when processing our data.
Thanks
Ferran
--
You received this message because you are subscribed to a topic in the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mariadb-columnstore/rnMcDpj6gEQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mariadb-columns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/38cc84fc-d11c-4442-aa02-72d69d9eb80en%40googlegroups.com.
When ExeMgr finishes serving a request from MariaDB Server, ExeMgr's TCP connection can remain open, and its thread can continue running, which can cause ColumnStore to use more resources than required.
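One way to check for connections that stay open is to count sockets on ExeMgr's listening port by TCP state. The sketch below parses text in the /proc/net/tcp format (IPv4 only). The port number 8601 used in the usage note is an assumption and should be checked against the local configuration; the function name is hypothetical.

```python
from collections import Counter

# TCP state codes as they appear in /proc/net/tcp (subset).
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT",
              "08": "CLOSE_WAIT", "0A": "LISTEN"}

def socket_states(proc_net_tcp_text, local_port):
    """Count sockets whose local port is local_port, grouped by TCP state."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:    # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is HEXIP:HEXPORT
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            # Unknown codes are reported as the raw hex value.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

For example, socket_states(open("/proc/net/tcp").read(), 8601) sampled over time would show whether the number of ESTABLISHED connections grows together with the ExeMgr thread count.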
Hi Roman,