计算分组序列中两个值之间的差异

问题描述：

This is a follow-up question for this post: Loop through dataframe in R and measure time difference between two values

以下代码已经为我提供了出色的帮助计算特定刺激和下一个响应之间的时间差（分钟）：

I already got excellent help with the following code to calculate the time difference in minutes between a certain Stimuli and the next Response:

df$Date <- as.POSIXct(strptime(df$Date,"%d.%m.%Y %H:%M"))
df %>%
  arrange(User,Date)%>%
  mutate(difftime= difftime(lead(Date),Date, units = "mins") ) %>%
  group_by(User)%>%
  filter((StimuliA==1 | StimuliB==1) & lead(Responses)==1)`

数据集：

structure(list(User = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), Date = c("24.11.2015 20:39", 
"25.11.2015 11:42", "11.01.2016 22:46", "26.11.2015 22:42", "04.03.2016 05:45", 
"24.11.2015 13:13", "25.11.2015 13:59", "27.11.2015 12:18", "28.05.2016 06:49", 
"06.07.2016 09:46", "03.12.2015 09:32", "07.12.2015 08:18", "08.12.2015 19:40", 
"08.12.2015 19:40", "22.12.2015 08:50", "22.12.2015 08:52", "22.12.2015 08:52", 
"22.12.2015 20:46"), StimuliA = c(1L, 0L, 0L, 1L, 1L, 1L, 0L, 
1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), StimuliB = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L), Responses = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 
0L, 1L, 0L, 1L, 1L, 1L, 1L)), .Names = c("User", "Date", "StimuliA", 
"StimuliB", "Responses"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L), spec = structure(list(cols = structure(list(
    User = structure(list(), class = c("collector_integer", "collector"
    )), Date = structure(list(), class = c("collector_character", 
    "collector")), StimuliA = structure(list(), class = c("collector_integer", 
    "collector")), StimuliB = structure(list(), class = c("collector_integer", 
    "collector")), Responses = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB", 
"Responses")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

目标/问题 线索帮助确定激励== 1（A或B）与下一个响应[按日期/时间排序]（响应== 1）之间的时间差。我将如何更改该代码以找到此顺序中刺激A或B与 LAST 响应之间的时间差。（直到下一次刺激发生）

Goal/Question The lead arugment helped to determine the time difference between a Stimuli == 1 (A or B) and the next response [sorted by date/time] (Response == 1). How would i change that code to find the time difference between the Stimuli A or B and the LAST Response in this sequence. (until the next Stimuli occurs)

所需的输出：

User    Date        StimuliA    StimuliB    Responses   time diff  Seq_ID
1   24.11.2015 20:39    1           0           0                  1_1_0
1   25.11.2015 11:42    0           0           1                  1_1_0
1   11.01.2016 22:46    0           0           1       69247      1_1_0
2   26.11.2015 22:42    1           0           0                  2_1_0
2   04.03.2016 05:45    0           1           0                  2_1_1
3   24.11.2015 13:13    1           0           0                  3_1_0
3   25.11.2015 13:59    0           0           1       1486       3_1_0
3   27.11.2015 12:18    1           0           0                  3_2_0
3   28.05.2016 06:49    0           0           1                  3_2_0
3   06.07.2016 09:46    0           0           1       319528     3_2_0
4   03.12.2015 09:32    1           0           0                  4_1_0
4   07.12.2015 08:18    1           0           0                  4_2_0
4   08.12.2015 19:40    0           0           1       2122       4_1_0
4   08.12.2015 19:40    0           1           0                  4_2_1
4   22.12.2015 08:50    0           0           1       19510      4_2_1
5   22.12.2015 08:52    0           0           1                  5_0_0
5   22.12.2015 08:52    0           0           1                  5_0_0
5   22.12.2015 20:46    0           0           1                  5_0_0

对于刺激A，这意味着值c（69247， 319528，2122）和B c（1486，19510）。

For Stimuli A this meant the values c(69247, 319528, 2122) and B c(1486, 19510).

答

尝试一下。

# df$Date <- as.POSIXct(strptime(df$Date,"%d.%m.%Y %H:%M"))
df %>% 
  arrange(User, Date) %>% 
  group_by(User) %>%
  mutate(
    last.date = Date[which(StimuliA == 1L)[c(1,1:sum(StimuliA == 1L))][cumsum(StimuliA == 1L)+ 1]]
  ) %>%
  mutate(
    timesince = ifelse(Responses == 1L, Date - last.date, NA)
  )

通过首先创建一个记录最后刺激数据的列，然后使用 ifelse 和 lag 来获取当前日期和最后一个刺激日期之间的差额。您可以过滤器仅提取LAST响应。

This works by first creating a column that records the data of last stimuli, and then using ifelse and lag to get the difference between the current date and the last stimuli date. You can filter to extract only the LAST response.

有一种更干净的方法可以执行 last.date使用 zoo.na.locf 进行操作，但我不想假设您可以接受其他软件包依赖项。

There is a cleaner way to do the "last.date" operation with zoo.na.locf, but I didn't want to assume you were ok with another package dependency.

编辑要确定序列（如果我正确理解了序列的意思），请继续使用

EDIT To identify the sequence (if I correctly understand what you mean by "sequence"), continue the chain with

%>% mutate(sequence = cumsum(StimuliA))

鉴定定义为阳性刺激后观察结果的序列。要过滤出序列的最后一个响应，请继续使用

to identify sequences defined as observations following a positive Stimuli. To filter out the last response of a sequence, continue the chain with

%>% group_by(User, sequence) %>%
  filter(timesince == max(timesince, na.rm = TRUE))

按序列（和用户）分组，然后提取与每个序列相关的最大时间差（这将对应于序列的最后一个正响应）。

to group by sequence (and user) and then extract the maximum time difference associated with each sequence (which will correspond to the last positive response of a sequence).

计算分组序列中两个值之间的差异

相关推荐