MySQL - Query a column for duplicates and return both the original and duplicate rows

Problem description:

I have a table that I use to store some systematically chosen "serial numbers" for each product that is bought...

The problem is, a CSV was uploaded that I believe contained some duplicate "serial numbers", which means that when the application tries to modify a row, it may not be modifying the correct one.

I need to be able to query the database and get all rows whose serial_number value appears more than once. It should look something like this:

ID, serial_number, meta1, meta2, meta3
3, 123456, 0, 2, 4
55, 123456, 0, 0, 0
6, 345678, 0, 1, 2
99, 345678, 0, 1, 2

So as you can see, I need to be able to see both the original row and the duplicate row and all of its columns of data ... this is so I can compare them and determine what data is now inconsistent.

Some versions of MySQL implement IN with a subquery very inefficiently. A safe alternative is a join:

select t.*
from t
join (select serial_number, count(*) as cnt
      from t
      group by serial_number
     ) tsum
     on tsum.serial_number = t.serial_number and tsum.cnt > 1
order by t.serial_number;

Another alternative is to use an exists clause:

select t.*
from t
where exists (select * from t t2 where t2.serial_number = t.serial_number and t2.id <> t.id)
order by t.serial_number;

Both these queries (as well as the one proposed by @fthiella) are standard SQL. Both would benefit from an index on (serial_number, id).
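
To try the join and exists variants side by side, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server), loads the sample rows from the question plus one hypothetical non-duplicate row, and creates the suggested index. The index name idx_serial_id is made up for illustration, and id is added to the ORDER BY so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, serial_number INTEGER, "
    "meta1 INTEGER, meta2 INTEGER, meta3 INTEGER)"
)
# Sample rows from the question, plus one hypothetical unique row (id 7).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3, 123456, 0, 2, 4),
        (55, 123456, 0, 0, 0),
        (6, 345678, 0, 1, 2),
        (99, 345678, 0, 1, 2),
        (7, 999999, 1, 1, 1),
    ],
)
# Composite index on (serial_number, id); the index name is hypothetical.
conn.execute("CREATE INDEX idx_serial_id ON t (serial_number, id)")

# Join against a per-serial_number count and keep groups with more than one row.
join_sql = """
SELECT t.*
FROM t
JOIN (SELECT serial_number, COUNT(*) AS cnt
      FROM t
      GROUP BY serial_number) tsum
  ON tsum.serial_number = t.serial_number AND tsum.cnt > 1
ORDER BY t.serial_number, t.id
"""

# Keep a row if another row with the same serial_number but a different id exists.
exists_sql = """
SELECT t.*
FROM t
WHERE EXISTS (SELECT 1 FROM t t2
              WHERE t2.serial_number = t.serial_number AND t2.id <> t.id)
ORDER BY t.serial_number, t.id
"""

join_rows = conn.execute(join_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
for row in join_rows:
    print(row)
```

On this sample data both queries return the same four rows (ids 3, 55, 6, 99) and leave out the unique row. Query plans and performance on a real MySQL server will differ from SQLite, so this only checks that the two forms agree on results.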

SELECT *
FROM
  yourtable
WHERE
  serial_number IN (SELECT serial_number
                    FROM yourtable
                    GROUP BY serial_number
                    HAVING COUNT(*)>1)
ORDER BY
  serial_number, id
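
Since the goal in the question is to compare the duplicated rows and spot inconsistent data, the following is a rough sketch of one way to do that comparison on the client side. It again uses Python's sqlite3 module as a stand-in for MySQL (an assumption), runs the IN query above against the sample rows from the question, and then groups the result by serial_number to flag groups whose meta columns differ. The variable names are illustrative only.

```python
import sqlite3
from itertools import groupby

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, serial_number INTEGER, "
    "meta1 INTEGER, meta2 INTEGER, meta3 INTEGER)"
)
# Sample rows taken from the question.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (3, 123456, 0, 2, 4),
        (55, 123456, 0, 0, 0),
        (6, 345678, 0, 1, 2),
        (99, 345678, 0, 1, 2),
    ],
)

in_sql = """
SELECT *
FROM yourtable
WHERE serial_number IN (SELECT serial_number
                        FROM yourtable
                        GROUP BY serial_number
                        HAVING COUNT(*) > 1)
ORDER BY serial_number, id
"""
rows = conn.execute(in_sql).fetchall()

# groupby only merges adjacent rows, so it relies on the ORDER BY above.
inconsistent = []
for serial, group in groupby(rows, key=lambda r: r[1]):
    metas = {r[2:] for r in group}  # distinct (meta1, meta2, meta3) tuples
    if len(metas) > 1:
        inconsistent.append(serial)

print(inconsistent)
```

With the sample data, serial number 123456 is flagged because its two rows carry different meta values, while 345678 is duplicated but consistent.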