SQL-从MySQL数据(带有时间戳和状态消息的行集)中查找所有停机时间和停机时间长度

问题描述:

我已经开始使用循环PHP脚本监视ISP的停机时间,该脚本每5秒自动检查一次连接并将结果存储在MySQL数据库中.脚本检查它是否能够访问几个远程网站并记录结果.检查的时间和状态始终存储在数据库中.

I have started monitoring my ISP's downtimes with a looping PHP script which checks the connection automatically every 5 seconds and stores the result in MySQL database. The scripts checks if it's able to reach a couple of remote websites and logs the result. The time and status of the check are always stored in the database.

表的结构如下:

id (auto increment)
time (time stamp)
status (varchar)

现在是我的问题.

我有数据,但是我不知道如何使用它来获得我想要的结果.基本上,我想查找连接断开的所有时间段以及连接断开的时间.

I have the data, but I don't know how to use it to achieve the result I would like to get. Basically I would like to find all the periods of time when the connection was down and for how long the connection was down.

例如,如果我们有10行包含以下数据

For instance if we have 10 rows with following data

0 | 2012-07-24 22:23:00 | up
1 | 2012-07-24 22:23:05 | up
2 | 2012-07-24 22:23:10 | down
3 | 2012-07-24 22:23:16 | down
4 | 2012-07-24 22:23:21 | up
5 | 2012-07-24 22:23:26 | down
6 | 2012-07-24 22:23:32 | down
7 | 2012-07-24 22:23:37 | up
8 | 2012-07-24 22:23:42 | up
9 | 2012-07-24 22:23:47 | up

查询应返回句点(从22:23:10到22:23:21,以及从22:23:26到22:23:37).因此,查询应该始终查找到第一次断开连接到第一次重新建立连接之间的时间.

the query should return the periods (from 22:23:10 to 22:23:21, and from 22:23:26 to 22:23:37). So the query should find always the time between the first time the connection goes down, and the first time the connection is up again.

我认为一种可行的方法是查找连接断开或上升的所有行,但是如何找到这些行呢?还有比这更好的解决方案吗?

One method I thought could work was finding all the rows where the connection goes down or up, but how could I find these rows? And is there some better solution than this?

我真的不知道查询应该是什么样子,因此将非常感谢您的帮助.

I really don't know what the query should look like, so the help would be highly appreciated.

谢谢拉西(Lassi)

这是一种方法.

首先按时间戳顺序获取状态行(内联视图别名为s).然后,在处理每一行时,请使用MySQL用户变量保留前几行的值.

Start by getting the status rows in order by timestamp (inline view aliased as s). Then use MySQL user variables to keep the values from previous rows, as you process through each row.

我们真正想要的是紧随一系列下降"状态之后的上升"状态.当我们找到具有"up"状态的行时,我们真正需要的是来自先前"down"状态系列的最早时间戳.

What we're really looking for is an 'up' status that immediately follows a sequence of 'down' status. And when we find that row with the 'up' status, what we really need is the earliest timestamp from the preceding series of 'down' status.

因此,类似这样的方法将起作用:

So, something like this will work:

SELECT d.start_down
     , d.ended_down
  FROM (SELECT @i := @i + 1 AS i
             , @start := IF(s.status = 'down' AND (@status = 'up' OR @i = 1), s.time, @start) AS start_down
             , @ended := IF(s.status = 'up' AND @status = 'down', s.time, NULL) AS ended_down
             , @status := s.status
         FROM (SELECT t.time
                    , t.status
                 FROM mydata t
                WHERE t.status IN ('up','down')
                ORDER BY t.time ASC, t.status ASC
              ) s
         JOIN (SELECT @i := 0, @status := 'up', @ended := NULL, @start := NULL) i
      ) d
WHERE d.start_down IS NOT NULL
  AND d.ended_down IS NOT NULL

这适用于您显示的特定数据集.

This works for the particular data set you show.

这不能处理的(不返回的内容)是尚未结束的下降"时间段,即一系列下降"状态,后面没有上升"状态.

What this doesn't handle (what it doesn't return) is a 'down' period that is not yet ended, that is, a sequence of 'down' status with no following 'up' status.

为避免文件排序操作按顺序返回行,您需要在(time,status)上包含覆盖索引.该查询将生成一个临时(MyISAM)表,以实现别名为d的内联视图.

To avoid a filesort operation to return the rows in order, you'll want a covering index on (time,status). This query will generate a temporary (MyISAM) table to materialize the inline view aliased as d.

注意:要了解此查询的功能,请剥离该最外面的查询,然后仅对别名为d的内联视图运行查询(可以将s.time添加到选择中列表.)

NOTE: To understand what this query is doing, peel off that outermost query, and run just the query for the inline view aliased as d (you can add s.time to the select list.)

此查询获取的每一行都具有"up"或"down"状态. 技巧"是它仅在结束下降"时间段的行上分配开始"和结束"时间(标记下降时间). (也就是说,第一行的状态为"up",紧随其后的状态为"down".)这是完成实际工作的地方,最外面的查询只是过滤掉该结果集中的所有多余"行(不需要.)

This query is getting every row with an 'up' or 'down' status. The "trick" is that it is assigning both a "start" and "end" time (marking a down period) on only the rows that end a 'down' period. (That is, the first row with an 'up' status following rows with a 'down' status.) This is where the real work is done, the outermost query just filters out all the "extra" rows in this resultset (that we don't need.)

SELECT @i := @i + 1 AS i
     , @start := IF(s.status = 'down' AND (@status = 'up' OR @i = 1), s.time, @start) AS start_down
     , @ended := IF(s.status = 'up' AND @status = 'down', s.time, NULL) AS ended_down
     , @status := s.status
     , s.time
  FROM (SELECT t.time
             , t.status
          FROM mydata t
         WHERE t.status IN ('up','down')
         ORDER BY t.time ASC, t.status ASC
       ) s
  JOIN (SELECT @i := 0, @status := 'up', @ended := NULL, @start := NULL) i

内联视图别名为s的目的是获取按时间戳记值排序的行,因此我们可以按顺序处理它们.别名为i的内联视图就在那里,因此我们可以在查询开始时初始化一些用户变量.

The purpose of inline view aliased as s is to get the rows ordered by timestamp value, so we can process them in sequence. The inline view aliased as i is just there so we can initialize some user variables at the start of the query.

如果我们在Oracle或SQL Server上运行,则可以使用解析函数"或排序函数"(分别命名).MySQL没有提供类似的功能,因此我们必须自己滚".

If we were running on Oracle or SQL Server, we could make use of "analytic functions" or "ranking functions" (as they are named, respectively.) MySQL doesn't provide anything like that, so we have to "roll our own".