How to select rows with specific values within groups in R
I am teaching myself loops and functions in R (but am at a really basic level at the moment). For a recent study, I need to prepare my data as follows:
I have a data set that looks like this:
dd <- read.table(text="
event.timeline.ys ID year group
1 2 800033 2008 A
2 1 800033 2009 A
3 0 800033 2010 A
4 -1 800033 2011 A
5 -2 800033 2012 A
15 0 800076 2008 B
16 -1 800076 2009 B
17 5 800100 2014 C
18 4 800100 2015 C
19 2 800100 2017 C
20 1 800100 2018 C
30 0 800125 2008 A
31 -1 800125 2009 A
32 -2 800125 2010 A", header=TRUE)
For each person, I would like to keep only the last row with event.timeline.ys >= 0 (row 3 for ID 800033) and the first row with event.timeline.ys < 0 (row 4 for ID 800033). All other rows should be deleted, so the final data frame contains at most two rows per ID.
The person with ID = 800100 does not have any negative values of event.timeline.ys. In this case, I would like to keep only the last row with event.timeline.ys >= 0.
The final data set would then look like this:
event.timeline.ys ID year group
3 0 800033 2010 A
4 -1 800033 2011 A
15 0 800076 2008 B
16 -1 800076 2009 B
20 1 800100 2018 C
30 0 800125 2008 A
31 -1 800125 2009 A
I thought about using a for-loop to check, within each ID, which row is the last with event.timeline.ys >= 0 and which is the first with event.timeline.ys < 0. However, I could not get a working implementation in R.
Does anyone have any advice? I am also very open to solutions that are not based on for-loops or similar constructs.
Here's one option making use of group_by in dplyr:
library(dplyr)
# Group by ID and by sign (non-negative vs. negative), then keep the row closest to zero in each group
dd %>% group_by(ID, category = event.timeline.ys >= 0) %>%
filter(abs(event.timeline.ys) == min(abs(event.timeline.ys))) %>%
ungroup() %>% # drop the grouping first, otherwise select() keeps category
dplyr::select(-category) %>%
as.data.frame
event.timeline.ys ID year group
1 0 800033 2010 A
2 -1 800033 2011 A
3 0 800076 2008 B
4 -1 800076 2009 B
5 1 800100 2018 C
6 0 800125 2008 A
7 -1 800125 2009 A
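If you would rather pick the rows by position (literally "the last row with event.timeline.ys >= 0" and "the first row with event.timeline.ys < 0") instead of by smallest absolute value, a base R sketch could look like the one below. It assumes the rows are already in chronological order within each ID, and keep_rows is just a helper name I made up for illustration:

```r
# Example data from the question
dd <- read.table(text = "
event.timeline.ys ID year group
1 2 800033 2008 A
2 1 800033 2009 A
3 0 800033 2010 A
4 -1 800033 2011 A
5 -2 800033 2012 A
15 0 800076 2008 B
16 -1 800076 2009 B
17 5 800100 2014 C
18 4 800100 2015 C
19 2 800100 2017 C
20 1 800100 2018 C
30 0 800125 2008 A
31 -1 800125 2009 A
32 -2 800125 2010 A", header = TRUE)

# Hypothetical helper: for one ID, keep the last non-negative row
# and the first negative row (if there is one)
keep_rows <- function(g) {
  nonneg <- which(g$event.timeline.ys >= 0)
  neg    <- which(g$event.timeline.ys < 0)
  # tail()/head() return an empty vector when nothing matches,
  # so an ID without negative values simply contributes one row
  g[c(tail(nonneg, 1), head(neg, 1)), ]
}

# Split by ID, apply the helper to each piece, and stack the results
result <- do.call(rbind, lapply(split(dd, dd$ID), keep_rows))
rownames(result) <- NULL
print(result)
```

Note that split() returns the groups sorted by ID, which happens to match the original order here. No loop bookkeeping is needed because each ID is handled as its own small data frame.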