求sql语句,有些复杂,先合并再分组再按条件只取其中一条记录
求sql语句,有点复杂,先合并再分组再按条件只取其中一条记录
1,先合并,
2,再分组
3,再只取一条记录
例如,表 T1
name start_date end_date
A 2014.1.1 2014.3.3
A 2014.5.5 2014.6.6
A 2014.3.3 2014.5.5
B 2014.1.1 2014.3.3
B 2014.5.5 2014.6.6
B 2014.6.6 2014.7.7
按name分组,先合并,(稍微有点复杂,我尽量描述的清晰点)
1,
name=A的记录有三条,
如果这三条记录的start_date 和 end_date能接起来(例如第三条的end_date>=第一条记录的start_date,之所以没用第二条记录跟第一条记录比较,是因为第三条记录的sart_date和第一条记录的end_date最接近),那么把这两条记录合并为一条记录,也就是变成下面这样(两条记录),
name start_date end_date
A 2014.1.1 2014.5.5
A 2014.5.5 2014.6.6
然后同理,剩下这两条记录也能接起来,变成一条记录
name start_date end_date
A 2014.1.1 2014.6.6
name=A的记录合并完成,因为最后只剩下一条记录,所以不用分组,
2.
但是name=B的记录有点复杂,因为并不是所有记录都能接起来
a. 先找到日期最早的记录,即,第一条记录,那么和他最接近的记录是第二条,但是第二条和第一条记录不满足条件(第二条的end_date 和 第一条记录的start_date之间有空隙),所以不能合并,
b. 那么第二条记录和第三条记录之间能合并,所以最后变成了两条记录:
name start_date end_date
B 2014.1.1 2014.3.3
B 2014.5.5 2014.7.7
因为最后针对每个name只想取一条记录(end_date最大的),其实到这里我就知道了,可以用row_number() over(partition by name order by end_date desc) rowNumber,然后取rowNumber=1的记录。
请大牛们支招(实在没分了,谢谢各位)
------解决思路----------------------
------解决思路----------------------
------解决思路----------------------
用以下方法测测
1,先合并,
2,再分组
3,再只取一条记录
例如,表 T1
name start_date end_date
A 2014.1.1 2014.3.3
A 2014.5.5 2014.6.6
A 2014.3.3 2014.5.5
B 2014.1.1 2014.3.3
B 2014.5.5 2014.6.6
B 2014.6.6 2014.7.7
按name分组,先合并,(稍微有点复杂,我尽量描述的清晰点)
1,
name=A的记录有三条,
如果这三条记录的start_date 和 end_date能接起来(例如第三条的end_date>=第一条记录的start_date,之所以没用第二条记录跟第一条记录比较,是因为第三条记录的sart_date和第一条记录的end_date最接近),那么把这两条记录合并为一条记录,也就是变成下面这样(两条记录),
name start_date end_date
A 2014.1.1 2014.5.5
A 2014.5.5 2014.6.6
然后同理,剩下这两条记录也能接起来,变成一条记录
name start_date end_date
A 2014.1.1 2014.6.6
name=A的记录合并完成,因为最后只剩下一条记录,所以不用分组,
2.
但是name=B的记录有点复杂,因为并不是所有记录都能接起来
a. 先找到日期最早的记录,即,第一条记录,那么和他最接近的记录是第二条,但是第二条和第一条记录不满足条件(第二条的end_date 和 第一条记录的start_date之间有空隙),所以不能合并,
b. 那么第二条记录和第三条记录之间能合并,所以最后变成了两条记录:
name start_date end_date
B 2014.1.1 2014.3.3
B 2014.5.5 2014.7.7
因为最后针对每个name只想取一条记录(end_date最大的),其实到这里我就知道了,可以用row_number() over(partition by name order by end_date desc) rowNumber,然后取rowNumber=1的记录。
请大牛们支招(实在没分了,谢谢各位)
------解决思路----------------------
WITH t1([name],start_date,end_date) AS (
SELECT 'A','2014.1.1','2014.3.3' UNION ALL
SELECT 'A','2014.5.5','2014.6.6' UNION ALL
SELECT 'A','2014.3.3','2014.5.5' UNION ALL
SELECT 'B','2014.1.1','2014.3.3' UNION ALL
SELECT 'B','2014.5.5','2014.6.6' UNION ALL
SELECT 'B','2014.6.6','2014.7.7'
)
,t2 AS (
SELECT [name],
start_date,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_date) n
FROM t1
WHERE NOT EXISTS (SELECT *
FROM t1 t
WHERE t.[name] = t1.[name]
AND t.end_date = t1.start_date
)
)
,t3 AS (
SELECT [name],
end_date,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_date) n
FROM t1
WHERE NOT EXISTS (SELECT *
FROM t1 t
WHERE t.[name] = t1.[name]
AND t.start_date = t1.end_date
)
)
SELECT t2.[name],
t2.start_date,
t3.end_date
FROM t2
JOIN t3
ON t3.[name] = t2.[name]
AND t3.n = t2.n
name start_date end_date
---- ---------- --------
A 2014.1.1 2014.6.6
B 2014.1.1 2014.3.3
B 2014.5.5 2014.7.7
------解决思路----------------------
;WITH t1(id,[name],start_date,end_date) AS ( -- 要求有ID,用来避免自指
SELECT 1,'A','2014.1.1','2014.3.3' UNION ALL
SELECT 2,'A','2014.5.1','2014.6.6' UNION ALL
SELECT 3,'A','2014.3.1','2014.5.5' UNION ALL
SELECT 4,'B','2014.1.1','2014.3.3' UNION ALL
SELECT 5,'B','2014.5.1','2014.6.6' UNION ALL
SELECT 6,'B','2014.5.1','2014.6.3' UNION ALL -- 加这条会让 BETWEEN 判断错误
SELECT 7,'B','2014.6.1','2014.7.7'
)
,s1 AS (
SELECT [name],
MAX(end_date) AS end_date
FROM t1
GROUP BY [name]
)
,s2 AS (
SELECT s1.[name], s.start_date
FROM s1
CROSS APPLY (
SELECT top 1 start_date
FROM t1
WHERE t1.[name] = S1.[name]
AND NOT EXISTS (SELECT *
FROM t1 t
WHERE t.id <> t1.id
AND t.[name] = t1.[name]
-- AND t1.start_date BETWEEN t.start_date AND t.end_date
-- 如下判断连续
AND t.start_date < t1.start_date
AND t1.start_date <= t.end_date
)
ORDER BY start_date DESC
) s
)
SELECT s1.[name],
s2.start_date,
s1.end_date
FROM s1
JOIN s2
ON s1.[name] = s2.[name]
------解决思路----------------------
用以下方法测测
use Tempdb
go
--> -->
if not object_id(N'Tempdb..#T1') is null
drop table #T1
Go
Create table #T1([name] nvarchar(1),[start_date] Datetime,[end_date] Datetime)
Insert #T1
select N'A','2014.1.1','2014.3.3' union all
select N'A','2014.5.1','2014.6.6' union all
select N'A','2014.3.1','2014.5.5' union all
select N'B','2014.1.1','2014.3.3' union all
select N'B','2014.5.1','2014.6.6' union all
select N'B','2014.6.1','2014.7.7'
Go
SELECT a.[name]
,[start_date]=MAX(a.[start_date])
,[end_date] = MAX(b.[end_date])
FROM ( SELECT [name]
,[start_date]
FROM #T1 AS a
WHERE NOT EXISTS ( SELECT 1
FROM #T1 AS b
WHERE [name] = a.[name]
AND a.[start_date] BETWEEN [start_date] AND [end_date] AND [start_date]<a.[start_date] )
) AS a
INNER JOIN ( SELECT [name]
,[end_date]
FROM #T1 AS a
WHERE NOT EXISTS ( SELECT 1
FROM #T1 AS b
WHERE [name] = a.[name]
AND (end_date BETWEEN a.[start_date] AND a.[end_date] ) AND [start_date]>a.[start_date] )
) AS b ON a.[name] = b.[name]
AND a.[start_date] <= b.[end_date]
GROUP BY a.[name]
/*
name start_date end_date
A 2014-01-01 00:00:00.000 2014-06-06 00:00:00.000
B 2014-05-01 00:00:00.000 2014-07-07 00:00:00.000
*/