如果时间间隔重叠,则对行进行分组
我需要一个 SQL 查询来创建一个新字段 group_id
来标识每个 ID
中具有重叠 start_time
和 end_time 的记录
间隔.一个可接受的解决方案将为每个 ID
和重叠的一组间隔具有唯一的 group_id
.例子:计算了 group_id 的 times
表示例
I need an SQL query to create a new field group_id
which identifies records within each ID
that have overlapping start_time
and end_time
intervals. An acceptable solution will have a unique group_id
for each ID
and overlapping set of intervals.
Example:
sample times
table with group_id computed
ID START_TIME END_TIME GROUP_ID
100 10:00:00 12:00:00 1
100 10:15:00 12:30:00 1
100 12:15:00 12:45:00 1
100 13:00:00 14:00:00 2
101 09:00:00 13:00:00 1
101 09:30:00 13:30:00 1
101 10:00:00 10:20:00 1
101 10:19:59 11:15:00 1
101 10:21:00 10:30:00 1
101 11:00:00 12:30:00 1
101 11:30:00 12:35:00 1
102 10:01:00 11:25:00 1
102 11:01:00 11:30:00 1
105 10:00:00 10:20:00 1
105 10:21:00 10:30:00 2
105 10:30:01 11:00:00 3
106 10:00:00 10:22:00 1
107 10:19:57 10:20:01 1
108 10:01:01 10:16:59 1
附加信息:对于给定的ID
,如果它的任何区间重叠,则相应的记录属于同一组,因此应该具有相同的group_id
.当 A 的 start_time
和/或 end_time
在 B 的 start_time
和 end_time
之间时,A 记录与另一条记录 B 重叠.
Additional Info: For a given ID
, if any of its intervals overlap then the corresponding records belong to the same group, and thus should have the same group_id
. A record A overlaps another record B when A’s start_time
and/or end_time
is between B’s start_time
and end_time
.
在示例中,ID
= 100 有四个间隔.前三个重叠 => 第二个记录与第一个重叠(10:15 的 start_time
在 10 的 start_time
和 end_time
之间:00 到 12:00),第三个与第二个重叠(12:15 的 start_time
在 10 的 start_time
和 end_time
之间:15 至 12:30).因此,它们都具有相同的 group_id
1.ID
= 100 的第四个区间不与该 ID内的任何其他区间重叠code>,因此它成为自己的组,并带有新的
group_id
.最后一条记录有一个完全不同的ID
,所以它开始了第三组,也有一个新的group_id
.
In the example, ID
= 100 has four intervals. The first three overlap => the second record overlaps with the first (the start_time
of 10:15 is between the start_time
and end_time
of 10:00 to 12:00) and the third overlaps with the second (the start_time
of 12:15 is between the start_time
and end_time
of 10:15 to 12:30). Because of this, they all have the same group_id
of 1. The fourth interval for ID
= 100 does not overlap any of the other intervals within that ID
, and so it becomes its own group with a new group_id
. The last record has a completely different ID
and so it starts a third group also with a new group_id
.
我试过这个 MYSQL 脚本.输出不会重置组 ID 并按串行顺序继续.想知道哪些更改可以使其发挥作用.
edit: I've tried this MYSQL script. The output does not reset the group ID and continues in the serial order. Would like to know what changes can make it work.
WITH C1 AS (
SELECT *,
CASE
WHEN start_time <= MAX(IFnull(end_time,'9999-12-31 00:00:00.000')) OVER(
partition by id
ORDER BY start_time
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS isstart
FROM activity
)
SELECT ID,start_time,end_time,
SUM(isstart) OVER(ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;
Query Output:
100 10:00:00 12:00:00 1
100 10:15:00 12:30:00 1
100 12:15:00 12:45:00 1
100 13:00:00 14:00:00 2
101 09:00:00 13:00:00 3
101 09:30:00 13:30:00 3
101 10:00:00 10:20:00 3
101 10:19:59 11:15:00 3
101 10:21:00 10:30:00 3
101 11:00:00 12:30:00 3
101 11:30:00 12:35:00 3
102 10:01:00 11:25:00 4
102 11:01:00 11:30:00 4
105 10:00:00 10:20:00 5
105 10:21:00 10:30:00 6
105 10:30:01 11:00:00 7
106 10:00:00 10:22:00 8
107 10:19:57 10:20:01 9
108 10:01:01 10:16:59 10
(去掉mysql-server标签)
(Removing the mysql-server tag)
WITH C1 AS (
SELECT *,
CASE
WHEN start_time <= MAX(IFnull(end_time,'9999-12-31 00:00:00.000')) OVER(
partition by id
ORDER BY start_time
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS isstart
FROM activity
)
SELECT ID,start_time,end_time,
SUM(isstart) OVER(partition by id ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;
这应该对你有用