Locking issue with ProxySQL


Dirk Bonenkamp

Jan 25, 2017, 4:04:15 AM
to proxysql
Hi All,

I'm experiencing a locking issue when using ProxySQL.

We run a 3-node Galera cluster; all non-SELECT queries and all SELECT ... FOR UPDATE queries are routed to 1 node.

We have some transactions which lock a table, do some work and then unlock the table. When running this directly on one of the cluster nodes, it works fine. However, when running it through ProxySQL, we run into a strange problem.
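For illustration, the pattern is roughly like this (a simplified sketch: the values and the INSERT column list are made up, and the real transaction does more work between the lock and the unlock):

LOCK TABLES 01_invoice WRITE;
-- read the current highest journal number for this administration
SELECT MAX(journal_number) FROM 01_invoice WHERE administration_id = 29;
-- compute the next number in the application, then insert the new invoice
INSERT INTO 01_invoice (administration_id, journal_number) VALUES (29, '29700042');
UNLOCK TABLES;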

We see a 'NULL' process, and the other operations are 'waiting for metadata lock':

|   44 | proteion             | 10.0.1.40:35581   | proteion           | Query   |   40 | Waiting for table metadata lock | SELECT SUBSTR(journal_number, 4 , 5) as high  FROM 01_invoice  WHERE (administration_id = "29") AND  |         0 
|   47 | proteion             | 10.0.1.40:35582   | proteion           | Sleep   |   40 |                                 | NULL                                                                                                 |         0 
| 1111 | proteion             | 10.0.1.40:36842   | proteion           | Query   |   40 | Waiting for table metadata lock | LOCK TABLES 01_invoice WRITE                                                                         |         0 

Killing the null process (47) clears things up.

Any ideas on how to solve this issue?

Thanks, kind regards,

Dirk


Dirk Bonenkamp

Jan 25, 2017, 4:29:02 AM
to proxysql
I'd like to add that reverting to 1.3.1 seems to fix this issue... 

Hmm, weird...!

On Wednesday, January 25, 2017 at 10:04:15 UTC+1, Dirk Bonenkamp wrote:

René Cannaò

Jan 25, 2017, 4:42:57 AM
to Dirk Bonenkamp, proxysql
Hi Dirk,

Connection id=47 is perhaps holding a lock on table 01_invoice: the output of SHOW ENGINE INNODB STATUS would have helped here.

In 1.3.2 and 1.3.3 there were 2 bugs fixed related to routing:
* in 1.3.2: queries setting session variables, combined with transaction_persistent==true, were incorrectly disabling query routing (#837)
* in 1.3.3: in some circumstances transaction_persistent could disable routing after commit (#889)
Are you using transaction_persistent? If yes, we should investigate whether one of these bug fixes had a side effect for your specific workload.
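For reference, whether the flag is enabled for a user can be checked from the Admin interface, for example:

-- from Admin: check whether transaction_persistent is set for the user
SELECT username, transaction_persistent FROM mysql_users WHERE username='proteion';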

How easy is it to reproduce this issue? Can you easily reproduce it in a testing environment?



Dirk Bonenkamp

Jan 25, 2017, 6:30:24 AM
to proxysql, dirkbo...@gmail.com
Hi Rene,

We are using transaction_persistent, so I think it has something to do with one of these issues.

I can reproduce it very easily in our test environment, if that is what you mean. Are there any steps I can take to help pinpoint and solve this issue?

Cheers,

Dirk

On Wednesday, January 25, 2017 at 10:42:57 UTC+1, René Cannaò wrote:

René Cannaò

Jan 25, 2017, 7:14:09 AM
to Dirk Bonenkamp, proxysql
OK, so it is probably a side effect of fixing a bug in multiplexing.

Could you please try this in your testing environment:
a) enable logging globally. From Admin:
SET mysql-eventslog_filename='events.log';
SET mysql-eventslog_filesize=10485760;
LOAD MYSQL VARIABLES TO RUNTIME;

b) enable logging for all the queries from user proteion.
This is done in mysql_query_rules. I suggest making it the first rule, so increase all the existing rule_id values first. Something like:
UPDATE mysql_query_rules SET rule_id=rule_id+1;
INSERT INTO mysql_query_rules (rule_id, active, username, log) VALUES (1,1,'proteion',1);
LOAD MYSQL QUERY RULES TO RUNTIME;

c) send me a copy of the query rules:
SELECT * FROM mysql_query_rules;

d) reproduce the issue

e) send me a copy of "SHOW FULL PROCESSLIST" executed in Admin.
Yes, Admin has SHOW FULL PROCESSLIST :)
This step is very important, because queries are written to the events log only when they complete, not before. I need to understand which query is unable to execute when the lock issue happens, and in which connection: I will then be able to correlate it with the rest of the queries in the log.

f) send me the file(s) generated in /var/lib/proxysql/ named events.log.0000000X.
Please note that all the queries from user proteion will be logged, so if they contain sensitive information it may be better not to send them to the mailing list.

Thanks,
René


Dirk Bonenkamp

Jan 25, 2017, 11:51:52 AM
to proxysql, dirkbo...@gmail.com
OK, here we go!

The query rules:

mysql> SELECT * FROM mysql_query_rules\G
*************************** 1. row ***************************
              rule_id: 1
               active: 1
             username: proteion
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: NULL
        match_pattern: NULL
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: NULL
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: 1
                apply: 0
              comment: NULL
*************************** 2. row ***************************
              rule_id: 2
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: NULL
        match_pattern: ^SELECT .* FOR UPDATE$
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 10
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: NULL
                apply: 1
              comment: NULL
*************************** 3. row ***************************
              rule_id: 3
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: NULL
        match_pattern: ^SELECT
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 20
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: NULL
                apply: 1
              comment: NULL
3 rows in set (0.00 sec)

The processlist:

mysql> show full processlist\G
*************************** 1. row ***************************
  ThreadID: 3
 SessionID: 1239
      user: proteion
        db: proteion
  cli_host: 127.0.0.1
  cli_port: 47797
 hostgroup: 20
l_srv_host: 10.0.1.40
l_srv_port: 43750
  srv_host: 10.0.1.23
  srv_port: 3306
   command: Query
   time_ms: 39104
      info: SELECT SUBSTR(journal_number, 4 , 5) as high  FROM 01_invoice  WHERE (administration_id = "29") AND (CHAR_LENGTH(journal_number) = 8) AND (SUBSTR(journal_number, 3 , 1) = "7") GROUP BY id  ORDER BY high DESC  LIMIT 1
*************************** 2. row ***************************
  ThreadID: 3
 SessionID: 1296
      user: proactive-config
        db: proactive-config
  cli_host: 127.0.0.1
  cli_port: 48019
 hostgroup: 20
l_srv_host: 10.0.1.40
l_srv_port: 54772
  srv_host: 10.0.1.25
  srv_port: 3306
   command: Sleep
   time_ms: 23877
      info: 
*************************** 3. row ***************************
  ThreadID: 3
 SessionID: 1299
      user: proteion
        db: proteion
  cli_host: 127.0.0.1
  cli_port: 48025
 hostgroup: 10
l_srv_host: 10.0.1.40
l_srv_port: 43979
  srv_host: 10.0.1.23
  srv_port: 3306
   command: Query
   time_ms: 23692
      info: LOCK TABLES 00_module WRITE, 01_invoice WRITE, 01_invoice_line WRITE, 01_invoice_line_order_line WRITE
*************************** 4. row ***************************
  ThreadID: 1
 SessionID: 1392
      user: apollo
        db: apollo
  cli_host: 127.0.0.1
  cli_port: 48426
 hostgroup: 10
l_srv_host: 10.0.1.40
l_srv_port: 44380
  srv_host: 10.0.1.23
  srv_port: 3306
   command: Sleep
   time_ms: 1580
      info: 
4 rows in set (0.00 sec)

And I've attached the events.log file. No sensitive things in there...

I hope you can find something. If there is anything I can do, please let me know!

Cheers,

Dirk



On Wednesday, January 25, 2017 at 13:14:09 UTC+1, René Cannaò wrote:
events.log.00000001

René Cannaò

Jan 25, 2017, 6:20:22 PM
to Dirk Bonenkamp, proxysql
Dirk,

Thank you for all the output provided.
The query rules look very simple.
I spent a lot of time reading the log file, and I couldn't spot anything odd with regard to routing: it seems correct.

To summarize the findings, there are two connections that issue LOCK/UNLOCK:

./tools/eventslog_reader_sample /mnt/hgfs/-vagrant/events.log.00000001 | egrep -i '^(un|)lock'  -B 1
ProxySQL LOG QUERY: thread_id="1236" username="proteion" schemaname=proteion" client="127.0.0.1:47790" HID=10 server="10.0.1.23:3306" starttime="2017-01-25 16:45:30.538038" endtime="2017-01-25 16:45:30.539384" duration=1346us digest="0x724063EF19842298"
LOCK TABLES 01_invoice WRITE
--
ProxySQL LOG QUERY: thread_id="1236" username="proteion" schemaname=proteion" client="127.0.0.1:47790" HID=10 server="10.0.1.23:3306" starttime="2017-01-25 16:45:30.991545" endtime="2017-01-25 16:45:30.991986" duration=441us digest="0x36792372854ABCB7"
UNLOCK TABLES
ProxySQL LOG QUERY: thread_id="1239" username="proteion" schemaname=proteion" client="127.0.0.1:47797" HID=10 server="10.0.1.23:3306" starttime="2017-01-25 16:45:30.794244" endtime="2017-01-25 16:45:30.992775" duration=198531us digest="0x724063EF19842298"
LOCK TABLES 01_invoice WRITE


So far all seems good. The two connections (1236 and 1239) send the statements on HG=10.

From the processlist, session 1299 is trying to execute LOCK TABLES: this doesn't succeed because the table is locked by session 1239 on HG=10, server 10.0.1.23.

Session 1239 is currently running a query, according to the processlist.
Until that query completes, session 1239 will hold the lock.
The current query from session 1239 is being executed on HG=20: this is correct according to the rules! Due to a recently fixed bug, on 1.3.1 this query would instead be sent to HG=10.

No matter where the SELECT is sent, HG=10 or HG=20, the lock is held by session 1239 until it completes.

According to the event log, this query should be very fast, less than half a second:
./tools/eventslog_reader_sample /mnt/hgfs/-vagrant/events.log.00000001 | grep "GROUP BY id  ORDER BY high DESC" -B 1
ProxySQL LOG QUERY: thread_id="1236" username="proteion" schemaname=proteion" client="127.0.0.1:47790" HID=20 server="10.0.1.25:3306" starttime="2017-01-25 16:45:30.539701" endtime="2017-01-25 16:45:30.969523" duration=429822us digest="0xD637CBCD3D73F43C"

SELECT SUBSTR(journal_number, 4 , 5) as high  FROM 01_invoice  WHERE (administration_id = "29") AND (CHAR_LENGTH(journal_number) = 8) AND (SUBSTR(journal_number, 3 , 1) = "7") GROUP BY id  ORDER BY high DESC  LIMIT 1


But according to the processlist, it has been running for over 39 seconds.
This is the same query reported in the first email, where there is a metadata lock, so it needed further investigation.

After a lot of head scratching, I remembered that Galera does not support LOCK TABLES: http://galeracluster.com/documentation-webpages/limitations.html#table-locking
Therefore it is possible that, because of the bug in ProxySQL 1.3.1, some SELECTs were always being routed to the writer node, not triggering any issue in Galera.
Now that the routing issue is fixed in ProxySQL, these SELECTs no longer go to the writer node, and Galera ends up in a locking issue.

Because LOCK TABLES is not supported in Galera, perhaps you should stop using it.
At the same time, it is possible that your application is using LOCK/UNLOCK TABLES as a poor alternative to transactions.
What I recommend in this case is to use ProxySQL itself to rewrite '^LOCK TABLES.*$' into 'BEGIN' and '^UNLOCK TABLES$' into 'COMMIT', so you will have real transactions.
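A sketch of such rewrite rules, from Admin (rule_id values 4 and 5 and routing the rewritten BEGIN to writer hostgroup 10 are assumptions based on the rules shown earlier in this thread; adjust to your setup):

-- rewrite LOCK TABLES ... into BEGIN, keeping it on the writer hostgroup
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, replace_pattern, destination_hostgroup, apply)
VALUES (4, 1, 'proteion', '^LOCK TABLES.*$', 'BEGIN', 10, 1);
-- rewrite UNLOCK TABLES into COMMIT
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, replace_pattern, apply)
VALUES (5, 1, 'proteion', '^UNLOCK TABLES$', 'COMMIT', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;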

Thanks,
René



René Cannaò

Jan 26, 2017, 3:24:33 AM
to Dirk Bonenkamp, proxysql
Dirk,

I have been advised to also use wsrep_convert_lock_to_trx (http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-convert-lock-to-trx) to automatically convert LOCK/UNLOCK TABLES into BEGIN/COMMIT.
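For example, on each Galera node (assuming your build exposes it as a dynamic variable; otherwise set it in my.cnf and restart):

-- convert LOCK/UNLOCK TABLES into transactions on the Galera side as well
SET GLOBAL wsrep_convert_lock_to_trx = ON;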

Dirk Bonenkamp

Jan 30, 2017, 5:40:35 AM
to proxysql, dirkbo...@gmail.com
Hi René,

First of all, thank you for taking the time to look into this issue. Highly appreciated; this level of support is something that makes ProxySQL stand out!

Of course you are completely right, we shouldn't use LOCK... We never had any problems with it: we started on a single server, then switched to master/slave, then switched to Galera with only 1 active node (HAProxy) and finally switched to ProxySQL, where we were 'protected' by a bug. All these situations allow for the usage of LOCK.

Fortunately, we don't use LOCK a lot and we're currently working on the fixes. It was a bit more complex than just replacing the LOCK with a transaction, but we have a working mock-up. If things go through testing as expected, we should have this in production in 2 weeks.

Again, many thanks for your awesome support!

Cheers,

Dirk


On Thursday, January 26, 2017 at 09:24:33 UTC+1, René Cannaò wrote: