查看MySQL锁等待的原因

--sys库的介绍

    mysql 5.6也可以有sys库(基于performance_schema的视图)。sys库是一个开源项目,在githup上早就有,是一个DBA的开源工具,后来mysql感觉好,就放在了mysql5.7上。

    下载地址:https://github.com/mysql/mysql-sys

    [tms@m-db3 ~]$ cd mysql-sys-master

    [tms@m-db3 ~]$ mysql < sys_56.sql

    这样,就可以在mysql5.6里面加入sys库了,不过mysql 5.6只有88张表,而mysql 5.7有101张,这是因为Mysql 5.7的performace_schema库里面又多了几张表。

    sys库是performance_schema的视图。

--MySQL锁等待

    当Mysql发生锁等待情况时,可以通过如下语句来在线查看:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
mysql> select from sys.innodb_lock_waits G;
*************************** 1. row ***************************
                wait_started: 2018-07-16 16:25:17  //锁等待开始的时间,16:25开始等待
                    wait_age: 00:10:08    //发现问题时已经等待了10分钟了
               wait_age_secs: 608   //608秒,也就是等10分钟了
                locked_table: `iws`.`busi_reconciliationgbgsinfo_inputdetails` //被锁住的表名
                locked_index: PRIMARY  //被锁住的索引
                 locked_type: RECORD //锁的类型为行锁
              waiting_trx_id: 13666265  //waiting transaction id,正在等待事务的id号
         waiting_trx_started: 2018-07-16 16:24:54 //这个事务是从16:24开始等待
             waiting_trx_age: 00:10:31 //等了10分钟了
     waiting_trx_rows_locked: 1 //正在等待的这个事务锁住了1行记录
   waiting_trx_rows_modified: 0 //正在等待的这个事务修改了0行记录
                 waiting_pid: 441805 //这个等待事务的线程id是多少,通过show processlist 命令可以查到它,结果看到是一个sleep的线程,没有执行具体sql语句,见下
               waiting_query: update busi_reconciliationgbgs ...                where id = 4510 //等待锁释放的语句
             waiting_lock_id: 13666265:2924:21:94 //正在等待的锁id
           waiting_lock_mode: X //等待锁的类型是排它锁
             blocking_trx_id: 13666259 //这个事务id阻塞了waiting lock
                blocking_pid: 441803 阻塞事务的pid
              blocking_query: NULL  //阻塞事务的sql语句
            blocking_lock_id: 13666259:2924:21:94
          blocking_lock_mode: X
        blocking_trx_started: 2018-07-16 16:24:51
            blocking_trx_age: 00:10:34
    blocking_trx_rows_locked: 1
  blocking_trx_rows_modified: 1
     sql_kill_blocking_query: KILL QUERY 441803
sql_kill_blocking_connection: KILL 441803
1 row in set (0.00 sec)
ERROR: 
No query specified

    上面看到输出了很多的东西,看的我都蒙圈了。后来查看mysql官方文档,慢慢的才发现,其实只关注上面的waiting_pid、waiting_query和blocking_pid、blocking_query四个参数即可;其中waiting_pid和blocking_pid两个参数就是通过执行show processlist命令里面输出的线程id号,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql> show full processlist G;
*************************** 8. row ***************************
     Id: 441803
   User: iws
   Host: 172.16.21.7:46121
     db: iws
Command: Sleep
   Time: 655
  State: 
   Info: NULL
*************************** 9. row ***************************
     Id: 441805
   User: iws
   Host: 172.16.21.7:46122
     db: iws
Command: Query
   Time: 652
  State: updating
   Info: update busi_reconciliationgbgsinfo_inputdetails     set                bgs_id = 1622              ,         date '2018-06-24 00:00:00'              ,         awbnumber = '006-85516771'              ,         incidental = 15.00              ,         entry_exit = 23.00              ,         warehousing_fee = 0.00              ,         loading_unloading = 0.00              ,         other = 0.00              ,         total = 38.00                     ,         state = 20              ,         comparison_resultsid = 30              ,         confirmation_method = '人工'              ,         confirmationid = 'root'              ,         confirmationtime = '2018-07-16 16:25:17'              ,         confirmation_note = '.'              ,         createtime = '2018-06-24 20:00:07'                     ,         createrid = '9862ebdbaf3249a88bcaa8f01bde0471'                        where id = 4510

    通过上面两个的输出结果,我们明白了,是441803线程锁住了表,造成线程441805的等待。

    我们看到发生等待的线程441805对应的sql语句是:update busi_reconciliationgbgs ... where id = 4510,但是锁表的线程441803对应的sql语句竟然是Null。这就更让人迷惑了。

    于是我默默的翻开了ysql官方文档,原来里面已经对这个null专门做了说明。

    官方文档说,要想找到这个null值对应的阻塞语句,可以通过下面几个步骤寻找:

    a)、根据锁表的processlist id 441803,运用如下sql,找到null对应的sql语句,如下:

1
2
3
SELECT  SQL_TEXT FROM performance_schema.events_statements_current WHERE THREAD_ID
in 
(SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=441803)

    b)、如果上面找到的sql语句,你还是不能分析出为什么他们会锁表,一直拿着锁不释放,那么你可以查看 performance_schema.events_statements_history表里面最近执行过的10条sql(假设上面查到的thread_id=28):

1
2
3
SELECT EVENT_ID,CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history WHERE THREAD_ID
in 
(SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=441803) order by event_id

   其他:

    上面查询锁的sql可以只关注已下几个列,如下:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT
wait_started,
wait_age,
waiting_pid,
waiting_query,
blocking_trx_id,
blocking_pid,
blocking_query,
blocking_lock_mode,
sql_kill_blocking_query
FROM
sys.innodb_lock_waits

~~~~~~~~~~~~分割线~~~~~~~~~~~~~~~~~

    最近我用python 2.6写了个自动杀锁的脚本,只要发现sys.innodb_lock_waits表里面有锁表的内容,就杀死相应的sql线程,并输出杀死sql的内容到当前目录下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
#!/usr/bin/env python
#-*-coding:utf8-*-
#下载rpm包安装,下载地址:https://dev.mysql.com/downloads/connector/python/,注意mysql-connector-python版本需要是1.1.17的,2.x的版本运行会有问题
from __future__ import print_function
import mysql.connector as mdb
import os
#全局变量
username = 'root'
password = ''
hostname = 'localhost'
database = 'sys'
#配置信息
config = {
    'user': username,
    'password': password,
    'host': hostname,
    'database': database
}
#定义函数,查看锁表的行数
def Get_sys_lock():
    show_locked_num = "select count(*) from sys.innodb_lock_waits"
    cursor.execute(show_locked_num)
    for i in cursor:
        locked_sql_num = i[0]
    return locked_sql_num
     
#定义函数,如果有锁表,就重定向到locked_sql.txt文件里面
def show_locked_sql():
    count = 0
    count1 0
    #如果日志文件存在就删除
    if os.path.isfile('locked_sql.txt'):
        os.remove('locked_sql.txt')
    if os.path.isfile('null_sql.txt'):
        os.remove('null_sql.txt')
    if os.path.isfile('last_10_null_sql.txt'):
        os.remove('last_10_null_sql.txt')
    #引用函数
    locked_sql_num = Get_sys_lock()
    print("锁表的行数是:{0}".format(locked_sql_num))
    if locked_sql_num > 0: #如果有锁表
        show_locked_sql = " SELECT 
            wait_started, 
            wait_age, 
            waiting_pid, 
            waiting_query, 
            blocking_trx_id, 
            blocking_pid, 
            blocking_query, 
            blocking_lock_mode, 
            sql_kill_blocking_query 
            FROM 
            sys.innodb_lock_waits 
            "
        cursor.execute(show_locked_sql)
        for i in cursor:
            wait_started = i[0]
            wait_age = i[1]
            waiting_pid = i[2]
            waiting_query = i[3]
            blocking_trx_id = i[4]
            blocking_pid = i[5]
            blocking_query = i[6]
            blocking_lock_mode = i[7]
            sql_kill_blocking_query = i[8]
            if not str(blocking_query).strip(): #如果blocking_query字符串为Null
                #import pdb;pdb.set_trace()
                show_null_sql = "SELECT  SQL_TEXT FROM performance_schema.events_statements_current WHERE THREAD_ID in (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=%s)" % blocking_pid
                conn = mdb.connect(**config)
                cursor1 = conn.cursor()
                cursor1.execute(show_null_sql)
                #print(cursor1.fetchall())
                for j in cursor1:
                    SQL_TEXT = j[0]
                    print(SQL_TEXT)
                cursor1.close
                try:
                    count1 += 1
                    f = open('null_sql.txt','a') #a表示追加
                    f.write (
                        '##########' 'The ' + str(count1) + ' rows ' 'Blocking null query对应的具体sql为########## ' +
                        'blocking_pid: ' + str(blocking_pid) + ' '
                        'sql_text: ' + str(SQL_TEXT) + ' '
                    )
                except OSError as reason:
                    print('出错了:' + str(reason))
                finally:
                    f.close
                #再查看null对应的最后10条sql
                show_last_10_null_sql = "SELECT EVENT_ID,CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history WHERE THREAD_ID in (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID=%s) order by event_id" % blocking_pid
                cursor2 = conn.cursor()
                cursor2.execute(show_last_10_null_sql)
                cursor2.close
                #print(cursor1.fetchall())
                count2 0
                for j in cursor2:
                    EVENT_ID = j[0]
                    CURRENT_SCHEMA = j[1]
                    SQL_TEXT = j[2]
                    try:
                        count2 += 1
                        f = open('last_10_null_sql.txt','a') #a表示追加
                        f.write (
                            '##########' 'The ' + str(count2) + ' rows ' 'laster blocking null query对应的具体sql为########## ' +
                            'blocking_pid: ' + str(blocking_pid) + ' '
                            'EVENT_ID: ' + str(EVENT_ID) + ' '
                            'CURRENT_SCHEMA: ' + str(CURRENT_SCHEMA) + ' '
                            'SQL_TEXT: ' + str(SQL_TEXT) + ' '
                        )   
                    except OSError as reason:
                        print('出错了:' + str(reason))
                    finally:
                        f.close
            #把锁表的情况重定向到一个locked_sql.txt文件里面
            try:
                count += 1
                f = open('locked_sql.txt','a') #a表示追加
                f.write('##########' 'The ' + str(count) + ' rows' '########### ')
                f.write (
                    'wait_started: ' + str(wait_started) + ' ' +
                    'wait_age: ' + str(wait_age) + ' ' 
                    'waiting_pid: ' + str(waiting_pid )  + ' ' +
                    'waiting_query: ' + str(waiting_query) + ' ' +
                    'blocking_trx_id: ' + str(blocking_trx_id) + ' ' +
                    'blocking_pid: ' + str(blocking_pid) + ' ' +
                    'blocking_query: ' + str(blocking_query) + ' ' +
                    'blocking_lock_mode: ' + str(blocking_lock_mode)  + ' ' +
                    'sql_kill_blocking_query: ' + str(sql_kill_blocking_query) + ' '
                )
                '''
                f.write (
                    '##########' 'Blocking null query对应的具体sql为########## ' +
                    'blocking_pid:' + str(blocking_pid) +
                    'sql_text:' + str(SQL_TEXT)
                     
                )
                '''
            except OSError as reason:
                print('出错了:' + str(reason))
            finally:
                f.close
#定义函数,列出当前所有执行的sql线程
def show_processlist():
    count = 0
    #如果日志文件存在就删除
    if os.path.isfile('show_processlist.txt'):
        os.remove('show_processlist.txt')
    #引用函数
    locked_sql_num = Get_sys_lock()
    #print("锁表的行数是:{0}".format(locked_sql_num))
    if locked_sql_num > 0: #如果有锁表
        show_processlist = "select 
            id, 
            user, 
            host, 
            db, 
            time, 
            state, 
            info 
            from information_schema.`PROCESSLIST`  order  by time desc 
            "
        cursor.execute(show_processlist)
        for i in cursor:
            id = i[0]
            user = i[1]
            host = i[2]
            db = i[3]
            time = i[4]
            state = i[5]
            info = i[6]
            #把锁表的情况重定向到一个show_processlist.txt文件里面
            try:
                count += 1
                f = open('show_processlist.txt','a') #a表示追加
                f.write('##########' 'The ' + str(count) + ' rows' '########### ')
                f.write (
                    'id: ' + str(id) + ' ' +
                    'user: ' + str(user) + ' ' 
                    'host: ' + str(host)  + ' ' +
                    'db: ' + str(db) + ' ' +
                    'time: ' + str(time) + ' ' +
                    'state: ' + str(state) + ' ' +
                    'info: ' + str(info) + ' ' 
                )
            except OSError as reason:
                print('出错了:' + str(reason))
            finally:
                f.close
#定义函数,如果有锁表,就杀死
def kill_locked_sql():
    #引用函数
    locked_sql_num = Get_sys_lock()
    #print("锁表的行数是:{0}".format(locked_sql_num))
    if locked_sql_num > 0: #如果有锁表
        execute_locked_sql = " SELECT 
            sql_kill_blocking_query 
            FROM 
            sys.innodb_lock_waits 
            "
        cursor.execute(execute_locked_sql)
        for i in cursor:
            sql_kill_blocking_query = i[0]
            conn = mdb.connect(**config)
            cursor1 = conn.cursor()
            try:
                cursor1.execute(sql_kill_blocking_query)
            except:
                print('出错了')
            cursor1.close
         
#主程序
conn = mdb.connect(**config)
cursor = conn.cursor()
show_locked_sql()
show_processlist()
kill_locked_sql()
cursor.close
conn.close

 --参考文档:

 https://dev.mysql.com/doc/refman/8.0/en/sys-innodb-lock-waits.html

 https://dev.mysql.com/doc/refman/8.0/en/innodb-information-schema-examples.html#innodb-information-schema-examples-null-blocking-query