Find the closest matching time for each patient
I have two datasets.
The first:
patient<-c("A","A","B","B","C","C","C","C")
arrival<-c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")
lastRow<-c("","Yes","","Yes","","","","Yes")
data1<-data.frame(patient,arrival,lastRow)
The second:
patient<-c("A","A","A","A","B","B","B","C","C","C")
availableSlot<-c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00")
data2<-data.frame(patient, availableSlot)
I want to add a column to the first dataset such that, for the last row of each patient, it shows the available slot that is closest to the arrival time.
The result would be:
patient arrival lastRow availableSlot
A       11:00
A       11:00   Yes     11:15
B       13:00
B       13:00   Yes     12:55
C       14:00
C       14:00
C       14:00
C       14:00   Yes     14:00
I would appreciate it if anyone could tell me how to implement this in R.
I'd use data.table, first cleaning up by converting the time columns to ITime and dropping the redundant rows (only the last row per patient is needed):
library(data.table)
setDT(data1)[, arrival := as.ITime(as.character(arrival))]
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))]
DT1 = unique(data1, by="patient", fromLast=TRUE)
Then you can do a "rolling join":
res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest",
.(patient, availableSlot = x.availableSlot)]
# patient availableSlot
# 1: A 11:15:00
# 2: B 12:55:00
# 3: C 14:00:00
How it works
The syntax is x[i, on=, roll=, j].
- on= are the merge-by columns.
- It's a join: for each row of i, we are looking for matches in x.
- With roll="nearest", the final column in the on= is "rolled" to its nearest match.
- The on= columns in the original tables can be referenced with x.* and i.* prefixes.
- The j argument should give a list of columns, and .() is an alias for list() here.
Check out the package's introductory materials at http://r-datatable.com/Getting-started and type ?data.table for the docs relevant to rolling joins.
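To see the rolling behaviour in isolation, here is a minimal sketch on toy data. The tables slots and query, and the columns id, t and t0, are made up for illustration and are not part of the question; integers stand in for times to keep the arithmetic obvious.

```r
library(data.table)

# Hypothetical toy tables (not from the question)
slots <- data.table(id = c("p", "p", "p"), t = c(10L, 20L, 40L))
query <- data.table(id = c("p", "p"), t0 = c(12L, 33L))

# For each row of query (i), find the row of slots (x) with the same id
# whose t is nearest to t0. x.t is the matched value from slots,
# i.t0 is the value we asked about.
out <- slots[query, on = .(id, t = t0), roll = "nearest",
             .(id, asked = i.t0, matched = x.t)]
print(out)
# asked = 12 rolls to matched = 10; asked = 33 rolls to matched = 40
```

Only the last column named in on= is rolled; id still has to match exactly.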
I would stop at res, but if you really want it back in your original table...
# a very nonstandard step:
data1[lastRow == "Yes", availableSlot := res$availableSlot ]
# patient arrival lastRow availableSlot
# 1: A 11:00:00 <NA>
# 2: A 11:00:00 Yes 11:15:00
# 3: B 13:00:00 <NA>
# 4: B 13:00:00 Yes 12:55:00
# 5: C 14:00:00 <NA>
# 6: C 14:00:00 <NA>
# 7: C 14:00:00 <NA>
# 8: C 14:00:00 Yes 14:00:00
Now, data1 has availableSlot in a new column, similar to when you do data1$col <- val.
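The assignment from res$availableSlot relies on the rows of res being in the same order as the lastRow == "Yes" rows of data1. If that feels fragile, one possible alternative is to do the rolling join inside the assignment, so that each flagged row looks up its own slot. This is only a sketch on a cut-down copy of the question's data; the names d1 and d2 are placeholders I introduced here.

```r
library(data.table)

# Abbreviated copies of the question's tables (patients A and B only)
d1 <- data.table(
  patient = c("A", "A", "B", "B"),
  arrival = as.ITime(c("11:00", "11:00", "13:00", "13:00")),
  lastRow = c("", "Yes", "", "Yes")
)
d2 <- data.table(
  patient       = c("A", "A", "B", "B"),
  availableSlot = as.ITime(c("11:15", "11:35", "12:55", "13:55"))
)

# .SD is the subset of d1 selected by lastRow == "Yes"; joining d2 to it
# returns one nearest slot per flagged row, in the order of those rows.
d1[lastRow == "Yes",
   availableSlot := d2[.SD, on = .(patient, availableSlot = arrival),
                       roll = "nearest", x.availableSlot]]
print(d1)
# rows 2 and 4 get 11:15:00 and 12:55:00; the other rows stay NA
```

The result should be the same as the two-step version; the difference is that nothing depends on res and data1 happening to be sorted the same way.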