MySQL使用sum和left join将结果翻倍

问题描述:

我有三个表设置:孩子,玩具和游戏,每个都有唯一的主键:id_kid,id_toy和id_game.每个孩子可以拥有多个玩具和游戏,但是每个玩具或游戏只能由一个孩子拥有.

I have a three table setup : kids, toys and games, each with unique primary keys : id_kid, id_toy and id_game. Each kid can have multiple toys and games, but each toy or game is owned by only one kid.

玩具和游戏的已购买专栏包含3个州:-1,0,1 表结构是这样的:

The toys and games have a bought column with 3 states : -1,0,1 The table structure is something like this :

kids

id_kid
kid_name
etc

games

id_game
id_kid_games --> links with id_kid in kids_table (maybe not the best name, I know)
game_name
bought --> can be -1,0,1

toys

id_toy
id_kid_toys --> links with id_kid in kids_table
toy_name
bought --> can be -1,0,1

对于每个孩子,我尝试使用下面的查询来获取购买和不购买的玩具和游戏的总和,但是结果却翻了一番:

For each kid i'm trying to get a total of toys and games, bought and not bought, using the query below, however the results are doubled :

SELECT kids.*, 

COUNT(DISTINCT toys.id_toy) AS total_toys, 
SUM(CASE toys.bought WHEN 1 THEN 1 ELSE 0 END) AS toys_bought, 
SUM(CASE toys.bought WHEN -1 THEN 1 ELSE 0 END) AS toys_not_bought, 

COUNT(DISTINCT games.id_game) AS total_games, 
SUM(CASE games.bought WHEN 1 THEN 1 ELSE 0 END) AS games_bought, 
SUM(CASE games.bought WHEN -1 THEN 1 ELSE 0 END) AS games_not_bought 

FROM kids as k 
LEFT JOIN toys t ON k.id_kid = t.id_kid_toys
LEFT JOIN games g ON k.id_kid = g.id_kid_games
GROUP BY k.id_kid
ORDER BY k.name ASC

一个孩子有2个玩具和4个游戏,都买了,结果是总共2个玩具(正确),总共4个游戏(正确),购买了8个玩具,购买了8个游戏. (都错了)

One kid has 2 toys and 4 games, all bought, and the results are 2 total toys (correct), 4 total games (correct), 8 toys bought, 8 games bought. (both wrong)

请提供答案-如果可能的话-请勿使用子选择. 谢谢.

Please help with an answer - if possible - without using subselects. Thank you.

当您从两个不相关的关系(孩子加入玩具,孩子加入游戏)中选择数据时,子查询是自然的方式.由于可以使用不相关的子查询,因此这应该不会特别慢.

As you are selecting data from two unrelated relations (kids joined to toys, and kids joined to games), subqueries are the natural way of doing it. As uncorrelated subqueries may be used, this should not be particularly slow.

尝试此查询是否足够有效:

Try if this query is sufficiently efficient:

与原始查询相比,它基本上只是颠倒了联接和分组的顺序.

Compared to your original query, it basically just reverses the order of joinining and grouping.

SELECT kids.*, t.total_toys, t.toys_bought, t.toys_not_bought,
               g.total_games, g.games_bought, g.games_not_bought
FROM kids
LEFT JOIN (SELECT id_kids_toys,
                  COUNT(*) AS total_toys,
                  SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) as toys_bought,
                  SUM(CASE bought WHEN -1 THEN 1 ELSE 0 END) as toys_not_bought
           FROM toys
           GROUP BY id_kids_toys) AS t
ON t.id_kids_toys = kids.id_kid
LEFT JOIN (SELECT id_kids_games,
                  COUNT(*) AS total_games,
                  SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) as games_bought,
                  SUM(CASE bought WHEN -1 THEN 1 ELSE 0 END) as games_not_bought
           FROM games
           GROUP BY id_kids_games) AS g
ON g.id_kids_games = kids.id_kid
ORDER by kids.name;

如果您坚持要避免子查询,那么查询效率可能会大大降低:

If you insist on avoiding subqueries, this, probably far less efficient, query might do:

SELECT kids.*, 
COUNT(DISTINCT toys.id_toy) AS total_toys, 

-- sum only toys joined to first game
SUM(IF(g2.id_game IS NULL AND bought = 1, 1, 0)) AS toys_bought, 
SUM(IF(g2.id_game IS NULL AND bought = -1, 1, 0)) AS toys_not_bought, 

-- sum only games joined to first toy
COUNT(DISTINCT games.id_game) AS total_games, 
SUM(IF(t2.id_toy IS NULL AND bought = 1, 1, 0)) AS games_bought, 
SUM(IF(t2.id_toy IS NULL AND bought = -1, 1, 0)) AS games_not_bought 

FROM kids as k 
LEFT JOIN toys t ON k.id_kid = t.id_kid_toys
LEFT JOIN games g ON k.id_kid = g.id_kid_games

-- select only rows where either game or toy is the first one for this kid
LEFT JOIN toys t2 on k.id_kid = t.id_kid_toys AND t2.id_toy < t.id_toy
LEFT JOIN games g2 ON k.id_kid = g.id_kid_games AND g2.id_game < g.id_game 
WHERE t2.id_toy IS NULL OR g2.id_game IS NULL

GROUP BY k.id_kid
ORDER BY k.name ASC

通过确保对于每个孩子,只计算与第一个玩具关联的游戏,并且仅对与第一个玩具关联的玩具进行计数.

It works by ensuring that for each kid, only the games joined to the first toy is counted, and only the toys joined to the first game is counted.