如何在Pandas数据框中找到5分钟的时间间隔?

问题描述:

我有一个看起来像这样的熊猫数据框:

I have pandas dataframe that looks similar to this:

                     TIMESTAMP  EVENT_COUNT
0          2014-07-23 04:28:23            1 
1          2014-07-23 04:28:24            1
2   2014-07-23 04:28:25.999000            4
3          2014-07-23 04:28:27            1
4   2014-07-23 04:28:28.999000            2
5          2014-07-23 04:28:30            1
6          2014-07-23 04:29:31            7
7          2014-07-23 04:29:33            1
8          2014-07-23 04:29:34            1
9          2014-07-23 04:29:36            1
10         2014-07-23 04:40:37            2
11         2014-07-23 04:40:39            1
12         2014-07-23 04:40:40            1
13         2014-07-23 04:40:42            1
14         2014-07-23 04:40:43            1
15  2014-07-23 04:40:44.999000            4
16         2014-07-23 04:41:46            1
17         2014-07-23 04:41:47            1
18         2014-07-23 04:41:49            1
19         2014-07-23 04:41:50            1
20         2014-07-23 04:50:52            9
21         2014-07-23 04:50:53            4
22         2014-07-23 04:50:55            6
23         2014-07-27 01:12:13            1

我的最终目标是能够找到超过5分钟的差距.因此,从上面,我会发现:

My end goal is to be able to find gaps in this that exceed 5 minutes. So, from above, I'd find a gap between:

2014-07-23 04:29:36 and 2014-07-23 04:40:37
2014-07-23 04:41:50 and 2014-07-23 04:50:52
2014-07-23 04:50:55 and 2014-07-27 01:12:13

小于5分钟的间隙无需识别.因此以下内容不会被视为空白".

Gaps of less than 5 minutes do not need to be identified. So the following wouldn't be recognized as a "gap".

2014-07-23 04:28:30 and 2014-07-23 04:29:31    (Only 61 seconds)
2014-07-23 04:40:37 and 2014-07-23 04:40:39    (Only 2 seconds)
2014-07-23 04:40:44.999000 and 2014-07-23 04:41:46 (Just over 61 seconds)

如何找到上述差距?当我尝试此 answer 中提到的解决方案时,似乎没有任何改变.我使用了以下命令:

How can I find the gaps mentioned above? When I tried the solution mentioned in this answer, nothing seems to have changed. I used the following command:

df.reindex(pd.date_range(min(df['TIMESTAMP']), max(df['TIMESTAMP']), freq='5min')).fillna(0)

运行此命令后,数据帧看起来相同.

The dataframe looks the same after this command is run.

IIUC,只要dtype已经是datetime64,那么您就可以使用 dt.seconds :

IIUC so long as the dtype are datetime64 already then you can just use diff which will create a timedelta and then call the attribute dt.seconds:

In [8]:
df['OVER 5 MINS'] = (df['TIMESTAMP'].diff()).dt.seconds > 300
df

Out[8]:
                    TIMESTAMP  EVENT_COUNT OVER 5 MINS
INDEX                                                 
0     2014-07-23 04:28:23.000            1       False
1     2014-07-23 04:28:24.000            1       False
2     2014-07-23 04:28:25.999            4       False
3     2014-07-23 04:28:27.000            1       False
4     2014-07-23 04:28:28.999            2       False
5     2014-07-23 04:28:30.000            1       False
6     2014-07-23 04:29:31.000            7       False
7     2014-07-23 04:29:33.000            1       False
8     2014-07-23 04:29:34.000            1       False
9     2014-07-23 04:29:36.000            1       False
10    2014-07-23 04:40:37.000            2        True
11    2014-07-23 04:40:39.000            1       False
12    2014-07-23 04:40:40.000            1       False
13    2014-07-23 04:40:42.000            1       False
14    2014-07-23 04:40:43.000            1       False
15    2014-07-23 04:40:44.999            4       False
16    2014-07-23 04:41:46.000            1       False
17    2014-07-23 04:41:47.000            1       False
18    2014-07-23 04:41:49.000            1       False
19    2014-07-23 04:41:50.000            1       False
20    2014-07-23 04:50:52.000            9        True
21    2014-07-23 04:50:53.000            4       False
22    2014-07-23 04:50:55.000            6       False
23    2014-07-27 01:12:13.000            1        True