Locking issue with ProxySQL


Dirk Bonenkamp

Jan 25, 2017, 4:04:15 AM
to proxysql
Hi All,

I'm experiencing a locking issue when using ProxySQL.

We run a 3-node Galera cluster; all non-SELECT queries and all SELECT ... FOR UPDATE queries are routed to 1 node.

We have some transactions which lock a table, do some work and then unlock the table. When running this directly on one of the cluster nodes, it works fine. However, when running it through ProxySQL, we run into a strange problem.
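For illustration, the pattern is roughly like this (a simplified sketch: the values and the INSERT column list are made up, and the real transaction does more work between the lock and the unlock):

LOCK TABLES 01_invoice WRITE;
-- read the current highest journal number for this administration
SELECT MAX(journal_number) FROM 01_invoice WHERE administration_id = 29;
-- compute the next number in the application, then insert the new invoice
INSERT INTO 01_invoice (administration_id, journal_number) VALUES (29, '29700042');
UNLOCK TABLES;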

We see a 'NULL' process, and the other operations are 'waiting for metadata lock':

|   44 | proteion             | 10.0.1.40:35581   | proteion           | Query   |   40 | Waiting for table metadata lock | SELECT SUBSTR(journal_number, 4 , 5) as high  FROM 01_invoice  WHERE (administration_id = "29") AND  |         0 
|   47 | proteion             | 10.0.1.40:35582   | proteion           | Sleep   |   40 |                                 | NULL                                                                                                 |         0 
| 1111 | proteion             | 10.0.1.40:36842   | proteion           | Query   |   40 | Waiting for table metadata lock | LOCK TABLES 01_invoice WRITE                                                                         |         0 

Killing the null process (47) clears things up.

Any ideas on how to solve this issue?

Thanks, kind regards,

Dirk


Dirk Bonenkamp

Jan 25, 2017, 4:29:02 AM
to proxysql
I'd like to add that reverting to 1.3.1 seems to fix this issue... 

Hmm, weird...!

On Wednesday, January 25, 2017 at 10:04:15 UTC+1, Dirk Bonenkamp wrote:

René Cannaò

Jan 25, 2017, 4:42:57 AM
to Dirk Bonenkamp, proxysql
Hi Dirk,

Connection id=47 is perhaps holding a lock on table 01_invoice: the output of SHOW ENGINE INNODB STATUS would have helped here.

In 1.3.2 and 1.3.3 there were 2 bugs fixed related to routing:
* in 1.3.2: queries setting session variables, combined with transaction_persistent==true, were incorrectly disabling query routing (#837)
* in 1.3.3: in some circumstances transaction_persistent could disable routing after commit (#889)
Are you using transaction_persistent? If yes, we should investigate whether one of these bug fixes had a side effect for your specific workload.
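For reference, whether the flag is enabled for a user can be checked from the Admin interface, for example:

-- from Admin: check whether transaction_persistent is set for the user
SELECT username, transaction_persistent FROM mysql_users WHERE username='proteion';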

How easy is it to reproduce this issue? Can you easily reproduce it in a testing environment?



Dirk Bonenkamp

Jan 25, 2017, 6:30:24 AM
to proxysql, dirkbo...@gmail.com
Hi Rene,

We are using transaction_persistent, so I think it has something to do with one of these issues.

I can reproduce it very easily in our test environment, if that is what you mean. Are there any steps I can take to help pinpoint and solve this issue?

Cheers,

Dirk

On Wednesday, January 25, 2017 at 10:42:57 UTC+1, René Cannaò wrote:

René Cannaò

Jan 25, 2017, 7:14:09 AM
to Dirk Bonenkamp, proxysql
OK, so it is probably a side effect of fixing a bug in multiplexing.

Could you please try this in your testing environment:
a) enable logging globally. From Admin:
SET mysql-eventslog_filename='events.log';
SET mysql-eventslog_filesize=10485760;
LOAD MYSQL VARIABLES TO RUNTIME;

b) enable logging for all the queries from user proteion.
This is done in mysql_query_rules. I suggest making it the first rule, so increase all the existing rule_id values first. Something like:
UPDATE mysql_query_rules SET rule_id=rule_id+1;
INSERT INTO mysql_query_rules (rule_id, active, username, log) VALUES (1,1,'proteion',1);
LOAD MYSQL QUERY RULES TO RUNTIME;

c) send me a copy of the query rules:
SELECT * FROM mysql_query_rules;

d) reproduce the issue

e) send me a copy of "SHOW FULL PROCESSLIST" executed in Admin.
Yes, Admin has SHOW FULL PROCESSLIST :)
This step is very important, because queries are written to the events log only when they complete, not before. I need to understand which query is unable to execute when the lock issue happens, and in which connection: I will then be able to correlate it with the rest of the queries in the log.

f) send me the file(s) generated in /var/lib/proxysql/ named events.log.0000000X.
Please note that all the queries from user proteion will be logged, so if they contain sensitive information it may be better not to send them to the mailing list.

Thanks,
René


Dirk Bonenkamp

Jan 25, 2017, 11:51:52 AM
to proxysql, dirkbo...@gmail.com
OK, here we go!

The query rules:

mysql> SELECT * FROM mysql_query_rules\G
*************************** 1. row ***************************
              rule_id: 1
               active: 1
             username: proteion
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: NULL
        match_pattern: NULL
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: NULL
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: 1
                apply: 0
              comment: NULL
*************************** 2. row ***************************
              rule_id: 2
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: NULL
        match_pattern: ^SELECT .* FOR UPDATE$
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 10
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: NULL
                apply: 1
              comment: NULL
*************************** 3. row ***************************
              rule_id: 3
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
          client_addr: NULL
           proxy_addr: NULL
           proxy_port: NULL
               digest: NULL
         match_digest: NULL
        match_pattern: ^SELECT
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 20
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
              retries: NULL
                delay: NULL
       mirror_flagOUT: NULL
     mirror_hostgroup: NULL
            error_msg: NULL
                  log: NULL
                apply: 1
              comment: NULL
3 rows in set (0.00 sec)

The processlist:

mysql> show full processlist\G
*************************** 1. row ***************************
  ThreadID: 3
 SessionID: 1239
      user: proteion
        db: proteion
  cli_host: 127.0.0.1
  cli_port: 47797
 hostgroup: 20
l_srv_host: 10.0.1.40
l_srv_port: 43750
  srv_host: 10.0.1.23
  srv_port: 3306
   command: Query
   time_ms: 39104
      info: SELECT SUBSTR(journal_number, 4 , 5) as high  FROM 01_invoice  WHERE (administration_id = "29") AND (CHAR_LENGTH(journal_number) = 8) AND (SUBSTR(journal_number, 3 , 1) = "7") GROUP BY id  ORDER BY high DESC  LIMIT 1
*************************** 2. row ***************************
  ThreadID: 3
 SessionID: 1296
      user: proactive-config
        db: proactive-config
  cli_host: 127.0.0.1
  cli_port: 48019
 hostgroup: 20
l_srv_host: 10.0.1.40
l_srv_port: 54772
  srv_host: 10.0.1.25
  srv_port: 3306
   command: Sleep
   time_ms: 23877
      info: 
*************************** 3. row ***************************
  ThreadID: 3
 SessionID: 1299
      user: proteion
        db: proteion
  cli_host: 127.0.0.1
  cli_port: 48025
 hostgroup: 10
l_srv_host: 10.0.1.40
l_srv_port: 43979
  srv_host: 10.0.1.23
  srv_port: 3306
   command: Query
   time_ms: 23692
      info: LOCK TABLES 00_module WRITE, 01_invoice WRITE, 01_invoice_line WRITE, 01_invoice_line_order_line WRITE
*************************** 4. row ***************************
  ThreadID: 1
 SessionID: 1392
      user: apollo
        db: apollo
  cli_host: 127.0.0.1
  cli_port: 48426
 hostgroup: 10
l_srv_host: 10.0.1.40
l_srv_port: 44380
  srv_host: 10.0.1.23
  srv_port: 3306
   command: Sleep
   time_ms: 1580
      info: 
4 rows in set (0.00 sec)

And I've attached the events.log file. No sensitive things in there...

I hope you can find something. If there is anything I can do, please let me know!

Cheers,

Dirk



On Wednesday, January 25, 2017 at 13:14:09 UTC+1, René Cannaò wrote:
events.log.00000001

René Cannaò

Jan 25, 2017, 6:20:22 PM
to Dirk Bonenkamp, proxysql
Dirk,

Thank you for all the output provided.
The query rules look very simple.
I spent a lot of time reading the log file, and I couldn't spot anything odd with regard to routing: it seems correct.

To summarize the findings, there are two connections that issue LOCK/UNLOCK:

./tools/eventslog_reader_sample /mnt/hgfs/-vagrant/events.log.00000001 | egrep -i '^(un|)lock'  -B 1
ProxySQL LOG QUERY: thread_id="1236" username="proteion" schemaname=proteion" client="127.0.0.1:47790" HID=10 server="10.0.1.23:3306" starttime="2017-01-25 16:45:30.538038" endtime="2017-01-25 16:45:30.539384" duration=1346us digest="0x724063EF19842298"
LOCK TABLES 01_invoice WRITE
--
ProxySQL LOG QUERY: thread_id="1236" username="proteion" schemaname=proteion" client="127.0.0.1:47790" HID=10 server="10.0.1.23:3306" starttime="2017-01-25 16:45:30.991545" endtime="2017-01-25 16:45:30.991986" duration=441us digest="0x36792372854ABCB7"
UNLOCK TABLES
ProxySQL LOG QUERY: thread_id="1239" username="proteion" schemaname=proteion" client="127.0.0.1:47797" HID=10 server="10.0.1.23:3306" starttime="2017-01-25 16:45:30.794244" endtime="2017-01-25 16:45:30.992775" duration=198531us digest="0x724063EF19842298"
LOCK TABLES 01_invoice WRITE


So far all seems good. The two connections (1236 and 1239) send the statements on HG=10.

From the processlist, session 1299 is trying to execute LOCK TABLES: this doesn't succeed because the table is locked by session 1239 on HG=10, server 10.0.1.23.

Session 1239 is currently running a query, according to the processlist.
Until that query completes, session 1239 will hold the lock.
The current query from session 1239 is being executed on HG=20: this is correct according to the rules! Due to a recently fixed bug, on 1.3.1 this query would instead be sent to HG=10.

No matter where the SELECT is sent, HG=10 or HG=20, the lock is held by session 1239 until it completes.

According to the event log, this query should be very fast, less than half a second:
./tools/eventslog_reader_sample /mnt/hgfs/-vagrant/events.log.00000001 | grep "GROUP BY id  ORDER BY high DESC" -B 1
ProxySQL LOG QUERY: thread_id="1236" username="proteion" schemaname=proteion" client="127.0.0.1:47790" HID=20 server="10.0.1.25:3306" starttime="2017-01-25 16:45:30.539701" endtime="2017-01-25 16:45:30.969523" duration=429822us digest="0xD637CBCD3D73F43C"

SELECT SUBSTR(journal_number, 4 , 5) as high  FROM 01_invoice  WHERE (administration_id = "29") AND (CHAR_LENGTH(journal_number) = 8) AND (SUBSTR(journal_number, 3 , 1) = "7") GROUP BY id  ORDER BY high DESC  LIMIT 1


But according to the processlist, it has been running for over 39 seconds.
This is the same query reported in the first email, where there is a metadata lock, so it needed further investigation.

After a lot of head scratching, I remembered that Galera does not support LOCK TABLES: http://galeracluster.com/documentation-webpages/limitations.html#table-locking
Therefore it is possible that, because of the bug in ProxySQL 1.3.1, some SELECTs were always being routed to the writer node, not triggering any issue in Galera.
Now that the routing issue is fixed in ProxySQL, these SELECTs no longer go to the writer node, and Galera ends up in a locking issue.

Because LOCK TABLES is not supported in Galera, perhaps you should stop using it.
At the same time, it is possible that your application is using LOCK/UNLOCK TABLES as a poor alternative to transactions.
What I recommend in this case is to use ProxySQL itself to rewrite '^LOCK TABLES.*$' into 'BEGIN' and '^UNLOCK TABLES$' into 'COMMIT', so you will have real transactions.
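A sketch of such rewrite rules, from Admin (rule_id values 4 and 5 and routing the rewritten BEGIN to writer hostgroup 10 are assumptions based on the rules shown earlier in this thread; adjust to your setup):

-- rewrite LOCK TABLES ... into BEGIN, keeping it on the writer hostgroup
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, replace_pattern, destination_hostgroup, apply)
VALUES (4, 1, 'proteion', '^LOCK TABLES.*$', 'BEGIN', 10, 1);
-- rewrite UNLOCK TABLES into COMMIT
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, replace_pattern, apply)
VALUES (5, 1, 'proteion', '^UNLOCK TABLES$', 'COMMIT', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;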

Thanks,
René



René Cannaò

Jan 26, 2017, 3:24:33 AM
to Dirk Bonenkamp, proxysql
Dirk,

I have been advised to also use wsrep_convert_lock_to_trx (http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-convert-lock-to-trx) to automatically convert LOCK/UNLOCK TABLES into BEGIN/COMMIT.
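For example, on each Galera node (assuming your build exposes it as a dynamic variable; otherwise set it in my.cnf and restart):

-- convert LOCK/UNLOCK TABLES into transactions on the Galera side as well
SET GLOBAL wsrep_convert_lock_to_trx = ON;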

Dirk Bonenkamp

Jan 30, 2017, 5:40:35 AM
to proxysql, dirkbo...@gmail.com
Hi René,

First of all, thank you for taking the time to look into this issue. Highly appreciated; this level of support is something that makes ProxySQL stand out!

Of course you are completely right, we shouldn't use LOCK... We never had any problems with it: we started on a single server, then switched to master/slave, then switched to Galera with only 1 active node (HAProxy) and finally switched to ProxySQL, where we were 'protected' by a bug. All these situations allow for the usage of LOCK.

Fortunately, we don't use LOCK a lot and we're currently working on the fixes. It was a bit more complex than just replacing the LOCK with a transaction, but we have a working mock-up. If things go through testing as expected, we should have this in production in 2 weeks.

Again, many thanks for your awesome support!

Cheers,

Dirk


On Thursday, January 26, 2017 at 09:24:33 UTC+1, René Cannaò wrote: