MySQL EXISTS vs IN - correlated subquery vs subquery?

Question:

I'm curious how EXISTS() is supposed to execute faster than IN().

I was answering a question when Bill Karwin brought up a good point: when you use EXISTS(), it uses a correlated (dependent) subquery, whereas IN() uses only a plain subquery.

EXPLAIN shows that EXISTS and NOT EXISTS both use a DEPENDENT SUBQUERY, while IN / NOT IN use just a SUBQUERY. So I'm curious: how can a correlated subquery be faster than a plain subquery?

I've used EXISTS before, and it does execute faster than IN, which is why I'm confused.

Here is the SQLFIDDLE along with the EXPLAIN output:

EXPLAIN SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE EXISTS
(   SELECT 1 
    FROM table2 t2
    WHERE t2.table1_id <=> t1.table1_id
);

+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
| ID    |   SELECT_TYPE         |   TABLE   | TYPE  | POSSIBLE_KEYS |   KEY     |KEY_LEN |  REF                     |   ROWS |  EXTRA                       |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
|  1    |   PRIMARY             |   t1      | index | (null)        |   PRIMARY |   4    | (null)                   |   4    |  Using where; Using index    |
|  2    |   DEPENDENT SUBQUERY  |   t2      | ref   | table1_id     |  table1_id|   4    | db_9_15987.t1.table1_id  |   1    |  Using where; Using index    |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+


EXPLAIN SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE NOT EXISTS
(   SELECT 1 
    FROM table2 t2
    WHERE t2.table1_id = t1.table1_id
);
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
| ID    |   SELECT_TYPE         |   TABLE   | TYPE  | POSSIBLE_KEYS |   KEY     |KEY_LEN |  REF                     |   ROWS |  EXTRA                       |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
|  1    |   PRIMARY             |   t1      | index | (null)        |   PRIMARY |   4    | (null)                   |   4    |  Using where; Using index    |
|  2    |   DEPENDENT SUBQUERY  |   t2      | ref   | table1_id     |  table1_id|   4    | db_9_15987.t1.table1_id  |   1    |  Using index                 |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+


EXPLAIN SELECT COUNT(t1.table1_id) 
FROM table1 t1 
WHERE t1.table1_id NOT IN 
(   SELECT t2.table1_id 
    FROM table2 t2
);
+-------+-------------------+-----------+-------+---------------+-----------+--------+----------+--------+------------------------------+
| ID    |   SELECT_TYPE     |   TABLE   | TYPE  | POSSIBLE_KEYS |   KEY     |KEY_LEN |  REF     |   ROWS |  EXTRA                       |
+-------+-------------------+-----------+-------+---------------+-----------+--------+----------+--------+------------------------------+
|  1    |   PRIMARY         |   t1      | index | (null)        |   PRIMARY |   4    | (null)   |   4    |  Using where; Using index    |
|  2    |   SUBQUERY        |   t2      | index | (null)        |  table1_id|   4    | (null)   |   2    |  Using index                 |
+-------+-------------------+-----------+-------+---------------+-----------+--------+----------+--------+------------------------------+

A few questions:

In the EXPLAINs above, why does EXISTS show both Using where and Using index in Extra, while NOT EXISTS shows only Using index, without Using where?

How is a correlated subquery faster than a subquery?

Answer:

This is an RDBMS-agnostic answer, but it may help nonetheless. In my understanding, the correlated (aka dependent) subquery is perhaps the culprit most often falsely accused of causing bad performance.

The problem (as it is most often described) is that the inner query is processed once for every row of the outer query. Therefore, if the outer query returns 1,000 rows and the inner query returns 10,000, your query has to slog through 10,000,000 rows (outer × inner) to produce a result. Compared to the 11,000 rows (outer + inner) from a non-correlated query over the same result sets, that ain't good.
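
A minimal sketch of that arithmetic in Python. The 1,000 and 10,000 row counts are the hypothetical figures from the paragraph above, "cost" just means rows touched (no I/O, caching or optimizer behaviour is modelled), and the uncorrelated case assumes each outer-row check against the materialized inner result is constant-time, which is what the outer + inner figure implies:

```python
# Hypothetical row counts taken from the example in the text.
OUTER_ROWS = 1_000
INNER_ROWS = 10_000


def correlated_full_scan_cost(outer_rows: int, inner_rows: int) -> int:
    """Worst case: the inner query is re-scanned in full for every outer row."""
    return outer_rows * inner_rows


def uncorrelated_cost(outer_rows: int, inner_rows: int) -> int:
    """The inner query runs once; each outer row is then checked once
    (assumed constant-time per check)."""
    return outer_rows + inner_rows


print(correlated_full_scan_cost(OUTER_ROWS, INNER_ROWS))  # 10000000
print(uncorrelated_cost(OUTER_ROWS, INNER_ROWS))          # 11000
```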

However, this is just the worst-case scenario. In many cases, the DBMS will be able to exploit indexes to drastically reduce the row count. Even if only the inner query can use an index, the 10,000-row scan per outer row becomes ~13 seeks (log2(10,000) ≈ 13.3), which drops the total down to about 1,000 × 13 = 13,000.
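
The ~13 figure corresponds to a binary search over 10,000 sorted keys. The sketch below reproduces it, assuming that binary-search model; it is not how any particular engine counts work. A real B+tree index (such as InnoDB's) has a much higher fan-out and reads far fewer pages per lookup, so treat this as the text's rough model only. `count_probes` is a hypothetical helper that counts how many keys a lower-bound binary search inspects:

```python
import math

# Hypothetical row counts taken from the example in the text.
OUTER_ROWS = 1_000
INNER_ROWS = 10_000


def estimated_seeks(inner_rows: int) -> int:
    """Rough seeks per index lookup, modelled as a binary search: ~log2(n)."""
    return round(math.log2(inner_rows))


def indexed_correlated_cost(outer_rows: int, inner_rows: int) -> int:
    """One index lookup into the inner table per outer row."""
    return outer_rows * estimated_seeks(inner_rows)


def count_probes(sorted_keys: list[int], target: int) -> int:
    """Lower-bound binary search for target; returns how many keys it inspected."""
    lo, hi = 0, len(sorted_keys)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return probes


print(estimated_seeks(INNER_ROWS))                      # 13
print(indexed_correlated_cost(OUTER_ROWS, INNER_ROWS))  # 13000

keys = list(range(INNER_ROWS))
print(count_probes(keys, 4_242))  # 13 or 14, depending on the key
```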

The EXISTS operator can stop processing inner rows after the first match, cutting the query cost further, especially when most outer rows match at least one inner row.
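
To show why the early exit helps most when outer rows do match, here is a small Python sketch over an unindexed inner list. `exists_scan` and `count_matches` are hypothetical helpers, not database APIs. Note that a miss still scans every inner row, and a miss is exactly the case NOT EXISTS has to confirm:

```python
def exists_scan(inner_keys: list[int], key: int) -> tuple[bool, int]:
    """EXISTS-style check over an unindexed inner list.

    Returns (found, rows_scanned); the scan stops at the first match.
    """
    scanned = 0
    for inner_key in inner_keys:
        scanned += 1
        if inner_key == key:
            return True, scanned
    return False, scanned


def count_matches(inner_keys: list[int], key: int) -> tuple[int, int]:
    """Counting every match (no early exit) always scans the whole list.

    Returns (match_count, rows_scanned).
    """
    return sum(1 for k in inner_keys if k == key), len(inner_keys)


inner = [7, 3, 3, 9, 3, 1]
print(exists_scan(inner, 3))    # (True, 2): stopped at the first 3
print(count_matches(inner, 3))  # (3, 6): had to look at all six rows
print(exists_scan(inner, 4))    # (False, 6): a miss still scans everything
```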

In some rare cases, I have seen SQL Server 2008R2 optimise correlated subqueries into a merge join (which traverses each set only once, the best possible scenario) where a suitable index can be found for both the inner and outer queries.
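
A generic merge semi-join in Python illustrates the "each set only once" property. This is a textbook sketch, not SQL Server's actual operator. Sorted Python lists stand in for rows read in index order, and `merge_semi_join` is a hypothetical name:

```python
def merge_semi_join(outer_sorted: list[int], inner_sorted: list[int]) -> list[int]:
    """Outer keys with at least one equal inner key (EXISTS semantics).

    Both inputs must be sorted ascending. Each index only moves forward, so
    the loop runs at most len(outer_sorted) + len(inner_sorted) times.
    """
    result = []
    i = j = 0
    while i < len(outer_sorted) and j < len(inner_sorted):
        outer_key, inner_key = outer_sorted[i], inner_sorted[j]
        if outer_key < inner_key:
            i += 1  # no inner match is possible for this outer key
        elif outer_key > inner_key:
            j += 1  # this inner key is too small for every remaining outer key
        else:
            result.append(outer_key)
            i += 1  # keep j: the next outer key may be a duplicate of this one
    return result


print(merge_semi_join([1, 2, 2, 3, 5], [2, 2, 4, 5]))  # [2, 2, 5]
```

Duplicate inner keys never duplicate an outer row, which matches EXISTS: an outer row is returned at most once, however many inner rows match it.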

The real culprit for bad performance is not necessarily correlated subqueries, but nested scans.