Select the row with the maximum value in each group

Question:

I have a dataset with multiple observations for each subject. For each subject, I want to select the row that has the maximum value of 'pt'. For example, with the following dataset:

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)

group <- data.frame(Subject=ID, pt=Value, Event=Event)
#   Subject pt Event
# 1       1  2     1
# 2       1  3     1
# 3       1  5     2 # max 'pt' for Subject 1
# 4       2  2     1
# 5       2  5     2
# 6       2  8     1
# 7       2 17     2 # max 'pt' for Subject 2
# 8       3  3     2
# 9       3  5     2 # max 'pt' for Subject 3

Subjects 1, 2, and 3 have maximum pt values of 5, 17, and 5, respectively.

How can I find the maximum pt value for each subject and then put the corresponding observation into another data frame? The resulting data frame should contain only the row with the maximum pt value for each subject.

Here's a data.table solution:

library(data.table) # written against data.table 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to the maximum value of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2
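
To see why this works, it may help to split the one-liner into its two steps. The sketch below rebuilds the example data so it runs on its own; the name idx is only an illustrative choice, and it assumes pt contains no NA values.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
group <- data.table(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

# Step 1: within each Subject, .I holds the row numbers of that group's rows
# in the full table; keep only those where pt equals the group maximum.
# The unnamed result column is called V1 by default.
idx <- group[, .I[pt == max(pt)], by = Subject]
idx$V1  # row numbers 3, 7 and 9

# Step 2: subset the original table by those row numbers
group[idx$V1]
```

Collecting row numbers first and subsetting once keeps the grouped step small, since only an integer vector is built per group rather than a copy of every column.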

If you'd like just the first row with the maximum value of pt in each group:

group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

In this case it makes no difference, because no group in your data has more than one row with the maximum value.
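
The two variants do diverge once a group has ties. As a rough illustration, here is a small made-up table (tied is hypothetical and not part of the question's data) in which Subject 1 reaches its maximum pt on two rows:

```r
library(data.table)

# Hypothetical data: Subject 1 has pt == 5 on its first and third rows
tied <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(5, 3, 5, 2, 8),
  Event   = c(1, 1, 2, 1, 2)
)

# Keep every row that attains the group maximum
all_max <- tied[tied[, .I[pt == max(pt)], by = Subject]$V1]

# Keep only the first row that attains the group maximum
first_max <- tied[tied[, .I[which.max(pt)], by = Subject]$V1]

nrow(all_max)    # 3: both tied rows for Subject 1, plus one row for Subject 2
nrow(first_max)  # 2: exactly one row per Subject
first_max$Event  # 1 2: Subject 1 keeps its first tied row (Event 1)
```

Which variant is appropriate depends on whether tied observations should all be kept or whether exactly one row per subject is required.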