MySQL索引使用查询优化

问题描述:

我有以下MySQL(MyISAM)表,大约有300万行。

I have the following MySQL (MyISAM) table with about 3 Million rows.

CREATE TABLE `tasks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `node` smallint(6) NOT NULL,
  `pid` int(11) NOT NULL,
  `job` int(11) NOT NULL,
  `a_id` int(11) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `state` int(11) NOT NULL,
  `start_time` int(11) NOT NULL,
  `end_time` int(11) NOT NULL,
  `stop_time` int(11) NOT NULL,
  `end_stream` int(11) NOT NULL,
  `message` varchar(255) DEFAULT NULL,
  `rate` float NOT NULL,
  `exiting` int(11) NOT NULL DEFAULT '0',
  `bytes` int(11) NOT NULL,
  `motion` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `a_id` (`a_id`),
  KEY `job` (`job`),
  KEY `state` (`state`),
  KEY `end_time` (`end_time`),
  KEY `start_time` (`start_time`),
) ENGINE=MyISAM AUTO_INCREMENT=100 DEFAULT CHARSET=utf8;

现在当我运行以下查询时,MySQL只使用a_id索引并需要扫描一些千行。

Now when I run the following query, MySQL is only using the a_id index and needs to scan a few thousand rows.

SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' ) 
AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';

当我添加额外的索引KEY newkey a_id start_time ),MySQL仍然是试图只使用a_id而不是newkey。只有在查询中使用提示/强制索引时,才会使用它。更改查询中的字段无济于事。

When I add an additional index KEY newkey (a_id,state,start_time), MySQL is still trying to use a_id only and not newkey. Only when using the hint / force index in the query, it's been used. Changing the fields in the query around does not help.

有什么想法吗?我不一定要在我的陈述中提示。 MySQL没有这样做的事实自动向我表明我的表,密钥或查询存在问题。任何帮助都非常感谢。

Any ideas? I don't necessarily want hints in my statements. The fact that MySQL is not doing this automatically indicates to me that there is an issue with my table, keys or query somewhere. Any help is highly appreciated.

其他信息:

mysql> show index in tasks;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasks |          0 | PRIMARY   |            1 | id          | A         |     3130554 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | a_id      |            1 | a_id        | A         |        2992 |     NULL | NULL   | YES  | BTREE      |         |               |
| tasks |          1 | job       |            1 | job         | A         |           5 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | state     |            1 | state       | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | end_time  |            1 | end_time    | A         |     1565277 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | newkey    |            1 | a_id        | A         |        2992 |     NULL | NULL   | YES  | BTREE      |         |               |
| tasks |          1 | newkey    |            2 | state       | A         |        8506 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | newkey    |            3 | start_time  | A         |     3130554 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

包含和不包含引号的EXPLAIN:

EXPLAIN with and without quotes:

mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' )  AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys              | key       | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | tasks | ref  | a_id,job,state,newkey      | a_id      | 5       | const |  740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.10 sec)

mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job=1 OR job=3 )  AND a_id = 614 AND state > 80 AND state < 100 AND start_time >= 1386538013;
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys              | key       | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | tasks | ref  | a_id,job,state,newkey      | a_id      | 5       | const |  740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.01 sec)


一些事情......我会在
(a_id,job,state,start_time)上有一个SINGLE复合索引

A few things... I would have a SINGLE compound index on ( a_id, job, state, start_time )

这有助于优化所有标准的查询,我认为是最好的调整序列。单个A_ID,然后是两个作业,一个小的状态范围,然后基于时间。接下来,注意没有引号......看起来你正在将数字转换为字符串比较,将它们保留为数字用于比较 - 比字符串更快。

This to help optimize the query on all the criteria, in what I believe is the best tuned sequence. A single "A_ID", then two jobs, a small state range, then time based. Next, notice no quotes... It appears you were converting numeric to string comparisons, leave them as numeric for compare -- faster than strings.

此外,通过使用它们作为索引的一部分,它是一个COVERING索引,这意味着它不必转到原始页面数据来获取其他值来测试合格记录是否包含。

Also, by having them all as part of the index, it is a COVERING index meaning it does NOT have to go to the raw page data to get the other values to test the qualifying records to include or not.

SELECT 
      count(*) AS tries 
   FROM 
      tasks
   WHERE 
          a_id = 614
      AND job IN ( 1, 3 ) 
      AND state > 80 AND state < 100 
      AND start_time >= 1386538013;

现在,索引的原因......考虑以下情况。你有两个有盒子的房间......在第一个房间里,每个盒子都是一个a_id,其中就是按顺序排列的工作,每个工作都是状态范围,最后是开始时间。

Now, the why the index... consider the following scenario. You have two rooms that have boxes... In the first room, each box is an "a_id", within that are the jobs in order, within each job are the state ranges, and finally by start time.

在另一个房间,您的方框按开始时间排序,a_id排序,最后状态。

In another room, your boxes are sorted by start time, within that a_id are sorted, and finally state.

这将是更容易找到你需要的东西。这就是你应该如何考虑索引。我宁愿去一个方框A_ID = 614,然后跳到Job 1,另一个用于Job 3.在每个Job 1,Job 3中,抓80-100,然后是时间。但是,您可以在每个标准考虑因素中更好地了解您的数据和数量,并且可以进行调整。

Which would be easier to find what you need. That is how you should think on the indexes. I would rather go to one box for "A_ID = 614", then jump to Job 1 and another for Job 3. Within each Job 1, Job 3, grab 80-100, then time. You however know better your data and volume in each criteria consideration and may adjust.

最后,计数(ID)与计数(*)。我所关心的只是一个合格的记录。我不需要知道实际的ID,因为过滤标准已经被认定为包括或不包括,为什么看(在这种情况下)实际的ID。

Finally, the count(ID) vs count(*). All I care about is a record qualified. I don't need to know the actual ID as the filtering criteria already qualified as include or not, why look (in this case) for the actual "ID".