Get the best-selling product for each country from the NORTHWIND database
Good day guys, I've been struggling with this for the past day and I just can't seem to figure it out.
My task is to derive the most sold product for each country from the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com
I was able to get to this stage, here is my code in SQL Server:
--Get most sold product for each country
WITH TotalProductsSold AS
(
    SELECT od.ProductID, SUM(od.Quantity) AS TotalSold
    FROM [Order Details] AS od
    GROUP BY od.ProductID
)
SELECT MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity, s.Country --,p.ProductName
FROM Products AS p
INNER JOIN TotalProductsSold
    ON TotalProductsSold.ProductID = p.ProductID
INNER JOIN Suppliers AS s
    ON s.SupplierID = p.SupplierID
GROUP BY s.Country
ORDER BY MostSoldQuantity DESC
This gives me the following result:
That's all good, but I'd also like to get the name of the product behind each MostSoldQuantity.
Thank you very much!
P.S. I left a commented-out --p.ProductName where I thought it would work, but it didn't. If someone could explain why GROUP BY doesn't automatically let me get the product name for that row, that would be great.
First, start with the quantity sold per country per product, not just per product. Then rank the products within each country and keep only those at RANK = 1. Something like...
WITH
    ProductQuantityByCountry AS
    (
        -- Total quantity sold per (supplier) country per product
        SELECT
            s.Country,
            p.ProductID,
            p.ProductName,
            SUM(od.Quantity) AS Quantity
        FROM
            [Order Details] AS od
        INNER JOIN
            Products AS p
                ON p.ProductID = od.ProductID
        INNER JOIN
            Suppliers AS s
                ON s.SupplierID = p.SupplierID
        GROUP BY
            s.Country,
            p.ProductID,
            p.ProductName
    ),
    RankedProductQuantityByCountry
    AS
    (
        -- Rank the products within each country, highest quantity first
        SELECT
            RANK() OVER (PARTITION BY Country ORDER BY Quantity DESC) AS countryRank,
            *
        FROM
            ProductQuantityByCountry
    )
SELECT
    *
FROM
    RankedProductQuantityByCountry
WHERE
    countryRank = 1
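A minimal runnable sketch of the query above, assuming Python's built-in sqlite3 module and an in-memory SQLite database (version 3.25 or later, for window functions) as a stand-in for SQL Server. The four products, two countries and quantities are hypothetical toy data in the shape of the Northwind Suppliers, Products and [Order Details] tables, not real Northwind rows. The USA tie is deliberate, to show RANK() returning two rows for one country.

```python
import sqlite3

# Hypothetical toy data shaped like three Northwind tables (not real Northwind rows).
SETUP = """
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, SupplierID INTEGER);
CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Suppliers VALUES (1, 'UK'), (2, 'USA');
INSERT INTO Products VALUES (1, 'Alpha', 1), (2, 'Bravo', 1), (3, 'Charlie', 2), (4, 'Delta', 2);
INSERT INTO [Order Details] VALUES
    (10, 1, 5), (11, 1, 7),  -- Alpha (UK): 12
    (12, 2, 20),             -- Bravo (UK): 20
    (13, 3, 8),              -- Charlie (USA): 8
    (14, 4, 8);              -- Delta (USA): 8, tied with Charlie
"""

# Same two-level shape as the answer: totals per country per product, then rank.
QUERY = """
WITH ProductQuantityByCountry AS (
    SELECT s.Country, p.ProductID, p.ProductName, SUM(od.Quantity) AS Quantity
    FROM [Order Details] AS od
    INNER JOIN Products AS p ON p.ProductID = od.ProductID
    INNER JOIN Suppliers AS s ON s.SupplierID = p.SupplierID
    GROUP BY s.Country, p.ProductID, p.ProductName
),
RankedProductQuantityByCountry AS (
    SELECT RANK() OVER (PARTITION BY Country ORDER BY Quantity DESC) AS countryRank, *
    FROM ProductQuantityByCountry
)
SELECT Country, ProductName, Quantity
FROM RankedProductQuantityByCountry
WHERE countryRank = 1
ORDER BY Country, ProductName
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('UK', 'Bravo', 20)
# ('USA', 'Charlie', 8)
# ('USA', 'Delta', 8)
```

The final ORDER BY is only there to make the output order predictable; the ranking itself does not depend on it.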
Note that within one country, two different products can have identical total quantities, in which case both get countryRank = 1. Look into ROW_NUMBER() and DENSE_RANK() for behaviours that are similar to, but different from, RANK().
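A small sketch comparing the three functions on a tie, assuming Python's sqlite3 with SQLite 3.25 or later. The table is a hypothetical pre-aggregated stand-in for the ProductQuantityByCountry CTE, with made-up product names and quantities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated totals, standing in for the ProductQuantityByCountry CTE.
conn.executescript("""
CREATE TABLE ProductQuantityByCountry (Country TEXT, ProductName TEXT, Quantity INTEGER);
INSERT INTO ProductQuantityByCountry VALUES
    ('USA', 'Charlie', 8), ('USA', 'Delta', 8), ('USA', 'Echo', 5);
""")

rows = conn.execute("""
SELECT ProductName, Quantity,
       RANK()       OVER (PARTITION BY Country ORDER BY Quantity DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY Country ORDER BY Quantity DESC) AS dense_rnk,
       -- ProductName breaks the tie so ROW_NUMBER() is deterministic
       ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Quantity DESC, ProductName) AS row_num
FROM ProductQuantityByCountry
ORDER BY Quantity DESC, ProductName
""").fetchall()

for row in rows:
    print(row)
# ('Charlie', 8, 1, 1, 1)
# ('Delta', 8, 1, 1, 2)
# ('Echo', 5, 3, 2, 3)
```

Filtering on rnk = 1 or dense_rnk = 1 keeps both tied products; filtering on row_num = 1 keeps exactly one per country, chosen by whatever tie-breaker you put in its ORDER BY. RANK() and DENSE_RANK() only differ from the second position onwards (3 versus 2 for Echo here).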
A simple thought exercise that shows why SQL doesn't let you put p.ProductName in your final query is to ask a question.
What should SQL do in this case?
SELECT
    MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity,
    MIN(TotalProductsSold.TotalSold) AS LeastSoldQuantity,
    s.Country,
    p.ProductName
FROM
    blahblahblah
GROUP BY
    s.Country
ORDER BY
    MostSoldQuantity DESC
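The presence of both a MIN and a MAX makes things ambiguous: should p.ProductName come from the row with the highest total, from the row with the lowest, or from some other row in the group?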
You may be clear that you want to perform an operation by country, and that the operation is to pick the product with the highest sales volume in that country. But the query doesn't actually say that, and small changes to the query could have very confusing consequences for any inferred behaviour. Instead, SQL's declarative syntax provides a clear, explicit, deterministic description of the problem to be solved.
If an expression isn't mentioned in the GROUP BY clause, you can't SELECT it without aggregating it. This ensures there is no ambiguity about what is meant or what the SQL engine is supposed to do.
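To make the rule concrete, here is a sketch of the two forms it does allow, assuming Python's sqlite3 and a hypothetical ProductTotals table invented for illustration. SQLite itself is more permissive than SQL Server and accepts "bare" non-aggregated columns in an aggregate query, so the sketch sticks to forms that SQL Server would also accept. Neither form answers the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-product totals for a single country.
conn.executescript("""
CREATE TABLE ProductTotals (Country TEXT, ProductName TEXT, TotalSold INTEGER);
INSERT INTO ProductTotals VALUES ('UK', 'Alpha', 20), ('UK', 'Zulu', 3);
""")

# Form 1: aggregate the name. Legal, but MAX() on text returns the
# alphabetically greatest name, independently of MAX(TotalSold).
aggregated = conn.execute("""
SELECT Country, MAX(TotalSold) AS MostSoldQuantity, MAX(ProductName) AS SomeName
FROM ProductTotals
GROUP BY Country
""").fetchall()
print(aggregated)  # [('UK', 20, 'Zulu')]

# Form 2: add the name to GROUP BY. Also legal, but now there is one row per
# country *and product*, so "one row per country" is lost.
grouped = conn.execute("""
SELECT Country, ProductName, MAX(TotalSold) AS MostSoldQuantity
FROM ProductTotals
GROUP BY Country, ProductName
ORDER BY ProductName
""").fetchall()
print(grouped)  # [('UK', 'Alpha', 20), ('UK', 'Zulu', 3)]
```

Aggregating the name yields a value unrelated to the best seller, and grouping by it changes the grain of the result, which is why "the name of the best seller per country" needs a second level such as the RANK() query in the answer.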
By requiring you to stipulate "get the total sales per country per product" at one level of the query, you can then cleanly state "and then pick the highest ranked per country" at another level of the query.
This can feel like you end up with queries that are longer than they "should" need to be. But it also produces queries that are completely unambiguous, both for compiling the query down to an execution plan and for other coders who will read your code in the future.
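The same level-by-level idea can also be written without window functions: compute each country's maximum at one level, then join it back to the per-country-per-product totals. This join-back pattern is a swapped-in alternative, not the method from the answer above. The sketch assumes Python's sqlite3 and hypothetical toy data shaped like the Northwind Suppliers, Products and [Order Details] tables (made-up names and quantities). Like RANK(), it returns every product tied for the top spot.

```python
import sqlite3

# Hypothetical toy data shaped like three Northwind tables (not real Northwind rows).
SETUP = """
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, SupplierID INTEGER);
CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Suppliers VALUES (1, 'UK'), (2, 'USA');
INSERT INTO Products VALUES (1, 'Alpha', 1), (2, 'Bravo', 1), (3, 'Charlie', 2), (4, 'Delta', 2);
INSERT INTO [Order Details] VALUES (10, 1, 5), (11, 1, 7), (12, 2, 20), (13, 3, 8), (14, 4, 8);
"""

QUERY = """
WITH ProductQuantityByCountry AS (
    -- Level 1: total sales per country per product
    SELECT s.Country, p.ProductName, SUM(od.Quantity) AS Quantity
    FROM [Order Details] AS od
    INNER JOIN Products AS p ON p.ProductID = od.ProductID
    INNER JOIN Suppliers AS s ON s.SupplierID = p.SupplierID
    GROUP BY s.Country, p.ProductID, p.ProductName
),
MaxQuantityByCountry AS (
    -- Level 2: the highest total in each country
    SELECT Country, MAX(Quantity) AS MaxQuantity
    FROM ProductQuantityByCountry
    GROUP BY Country
)
-- Level 3: keep the products whose total equals their country's maximum
SELECT pq.Country, pq.ProductName, pq.Quantity
FROM ProductQuantityByCountry AS pq
INNER JOIN MaxQuantityByCountry AS mq
    ON mq.Country = pq.Country AND mq.MaxQuantity = pq.Quantity
ORDER BY pq.Country, pq.ProductName
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('UK', 'Bravo', 20)
# ('USA', 'Charlie', 8)
# ('USA', 'Delta', 8)
```

It is longer than a single GROUP BY, but each level states exactly one thing, which is the trade-off described above.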