SQL 查询,如带有 OR 条件的 GROUP BY
我会尽量描述真实情况.在我们公司,我们有一个带桌子的预订系统,我们称之为 Customers,其中电子邮件和电话联系人与每个传入的订单一起保存 - 这是我无法更改的系统的一部分.我面临着如何获得唯一客户数量的问题.对于独特客户,我指的是拥有相同电子邮件地址或相同电话号码的一群人.
I'll try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where e-mail and phone contacts are saved with each incoming order - that's the part of a system I can't change. I'm facing the problem how to get count of unique customers. With the unique customer I mean group of people who has either the same e-mail or same phone number.
示例 1:在现实生活中,您可以想象已婚的汤姆和桑德拉.Tom 订购了 4 种产品,当其中一个与 Sandra 共享(作为家庭电话)时,他在我们的预订系统中填写了 3 个不同的电子邮件地址和 2 个不同的电话号码,因此我可以假设它们以某种方式连接.除了这个共享电话号码外,Sandra 还填写了她的私人电话号码,而且对于这两个订单,她只使用一个电子邮件地址.对我来说,这意味着将所有的以下行算作一个唯一客户.所以事实上这个独特的客户可能会成长为整个家庭.
Example 1: From the real life you can imagine Tom and Sandra who are married. Tom, who ordered 4 products, filled in our reservation system 3 different e-mail addresses and 2 different phone numbers when one of them shares with Sandra (as a homephone) so I can presume they are connected somehow. Sandra except this shared phone number filled also her private one and for both orders she used only one e-mail address. For me this means to count all of the following rows as one unique customer. So in fact this unique customer may grow up into the whole family.
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 tom@email.com +44 111 111 First row
1 tommy@email.com +44 111 111 Same phone, different e-mail
2 thomas@email.com +44 111 111 Same phone, different e-mail
3 thomas@email.com +44 222 222 Same e-mail, different phone
4 sandra@email.com +44 222 222 Same phone, different e-mail
5 sandra@email.com +44 333 333 Same e-mail, different phone
正如 ypercube 所说,我可能需要递归计算所有这些唯一客户.
As ypercube said I will probably need a recursion to count all of these unique customers.
示例 2:这是我想要做的示例.
是否可以在不使用递归的情况下获取唯一客户数使用游标或其他东西的实例还是需要递归?
Example 2: Here is the example of what I want to do.
Is it possible to get count of unique customers without using recursion for instance by using cursor or something or is the recursion necessary ?
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 linsey@email.com +44 111 111 ─┐
1 louise@email.com +44 111 111 ├─ 1. unique customer
2 louise@email.com +44 222 222 ─┘
---- ------------------- -------------- ------------------------------
3 steven@email.com +44 333 333 ─┐
4 steven@email.com +44 444 444 ├─ 2. unique customer
5 sandra@email.com +44 444 444 ─┘
---- ------------------- -------------- ------------------------------
6 george@email.com +44 555 555 ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7 xavier@email.com +44 666 666 ─┐
8 xavier@email.com +44 777 777 ├─ 4. unique customer
9 xavier@email.com +44 888 888 ─┘
---- ------------------- -------------- ------------------------------
10 robert@email.com +44 999 999 ─┐
11 miriam@email.com +44 999 999 ├─ 5. unique customer
12 sherry@email.com +44 999 999 ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result ∑ = 5 unique customers
----------------------------------------------------------------------
我已尝试使用 GROUP BY 进行查询,但我不知道如何按第一列或第二列对结果进行分组.我正在寻找让我们说类似的事情
I've tried a query with GROUP BY but I don't know how to group the result by either first or second column. I'm looking for let's say something like
SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
再次感谢您的建议
P.S.在完全改写之前,我非常感谢这个问题的答案.现在此处的答案可能与更新不符,因此如果您打算这样做,请不要在这里投票(当然问题除外:).我完全重写了这篇文章.
感谢并抱歉我的错误开始.
P.S.
I really appreciate the answers for this question before the complete rephrase. Now the answers here may not correspond to the update so please don't downvote here if you're going to do it (except the question of course :). I completely rewrote this post.
Thanks and sorry for my wrong start.
这是使用递归 CTE 的完整解决方案.
Here is a full solution using a recursive CTE.
;WITH Nodes AS
(
SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
, [ID]
FROM
(
SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
FROM dbo.Customer
UNION ALL
SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
FROM dbo.Customer
) A
),
Links AS
(
SELECT DISTINCT A.Id, B.Id LinkedId
FROM Nodes A
JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
SELECT DISTINCT Id, Id LinkedId
FROM dbo.Customer
UNION ALL
SELECT DISTINCT Id, LinkedId
FROM Links
UNION ALL
SELECT A.Id, B.LinkedId
FROM Links A
JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
SELECT Id, Id LinkedId
FROM Links
UNION
SELECT LinkedId Id, LinkedId
FROM Links
UNION
SELECT Id, LinkedId
FROM Routes
),
UniqueCustomers AS
(
SELECT Id, MIN(LinkedId) UniqueCustomerId
FROM TransitiveClosure
GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id