MySQL: select the latest row for a specific value

Problem description:

I'm battling to wrap my head around producing a single MySQL query that would yield the correct results.

I've got a table that is structured as follows:

workflow_status_history:

id    reference         status
 1    308ffn3oneb    Lead Received
 2    308ffn3oneb    Quoted
 3    308ffn3oneb    Invoiced
 4    853442ec2fc    Lead Received

As you can see, the workflow_status_history table keeps a history of all the statuses of each workflow on our system, rather than replacing or overwriting the previous status with the new one. This helps with in-depth reporting and auditing. A workflow will always have a starting status of Lead Received.

The problem, however, is that I need to select the reference of each workflow whose latest (or only) status is Lead Received. So in the example above, row 4 would be returned, but rows 1, 2 and 3 would not, because the latest status for that workflow reference is Invoiced. And if 853442ec2fc (row 4) later gets a new status other than Lead Received, it should no longer be returned the next time the query runs.

My current query is as follows:

SELECT *, MAX(id) FROM workflow_status_history WHERE `status` = 'Lead Received' GROUP BY reference LIMIT 20

This, of course, doesn't return the desired result, because the WHERE clause matches every row that has a Lead Received status, irrespective of whether it is the latest status or not. So it will always return the first 20 grouped workflow references in the table.
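To illustrate with the sample rows above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (that substitution is an assumption made only so the example is self-contained). It runs a simplified form of the query, selecting reference and MAX(id) instead of *, and shows that 308ffn3oneb is still returned even though its latest status is Invoiced.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_status_history ("
    "id INTEGER PRIMARY KEY, reference TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO workflow_status_history (id, reference, status) VALUES (?, ?, ?)",
    [
        (1, "308ffn3oneb", "Lead Received"),
        (2, "308ffn3oneb", "Quoted"),
        (3, "308ffn3oneb", "Invoiced"),
        (4, "853442ec2fc", "Lead Received"),
    ],
)

# The WHERE clause filters rows before grouping, so every reference that
# ever had a 'Lead Received' row survives, whatever its latest status is.
rows = conn.execute(
    "SELECT reference, MAX(id) FROM workflow_status_history "
    "WHERE status = 'Lead Received' "
    "GROUP BY reference ORDER BY reference"
).fetchall()

print(rows)
```

Both references come back, and MAX(id) for 308ffn3oneb is 1, which is only the highest id among its Lead Received rows, not among all of its rows.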

How would I go about producing the correct query to return the desired results?

Thanks for your help.

This is a case for a left join of the table with itself. The idea of the query is: select all references with status 'Lead Received' that do not have another row with the same reference and a higher id. I assume you only use the id to determine which status is 'newer' (no timestamp, etc.).

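SELECT
  DISTINCT h1.reference
FROM
  workflow_status_history h1
  -- h2 matches any newer row (higher id) for the same reference
  LEFT JOIN workflow_status_history h2 ON
    h1.reference = h2.reference AND
    h1.id < h2.id
WHERE
  h1.status = 'Lead Received' AND
  -- no newer row exists, so h1 is the latest status for its reference
  h2.id IS NULL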
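
As a quick check of the query above against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL (an assumption made so the example runs on its own; the query text itself is unchanged). The helper name latest_leads and the extra row with id 5 are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_status_history ("
    "id INTEGER PRIMARY KEY, reference TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO workflow_status_history (id, reference, status) VALUES (?, ?, ?)",
    [
        (1, "308ffn3oneb", "Lead Received"),
        (2, "308ffn3oneb", "Quoted"),
        (3, "308ffn3oneb", "Invoiced"),
        (4, "853442ec2fc", "Lead Received"),
    ],
)

# Self left join: keep h1 rows for which no newer row (h2) exists.
LATEST_IS_LEAD = """
SELECT DISTINCT h1.reference
FROM workflow_status_history h1
LEFT JOIN workflow_status_history h2
       ON h1.reference = h2.reference AND h1.id < h2.id
WHERE h1.status = 'Lead Received'
  AND h2.id IS NULL
"""


def latest_leads(connection):
    """Return references whose newest row has status 'Lead Received'."""
    return [row[0] for row in connection.execute(LATEST_IS_LEAD)]


before = latest_leads(conn)

# Hypothetical follow-up: 853442ec2fc moves on to a newer status.
conn.execute(
    "INSERT INTO workflow_status_history (id, reference, status) "
    "VALUES (5, '853442ec2fc', 'Quoted')"
)
after = latest_leads(conn)

print(before, after)
```

Before the extra row is inserted only 853442ec2fc is returned; afterwards nothing is, because every reference now has a newer status than Lead Received.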