How to select rows with specific values within groups in R

Question:

I am training myself in loops and functions in R (but am at a really basic level at the moment). For a recent study, I need to prepare my data as follows:

I have a data set that looks like this:

dd <- read.table(text="
    event.timeline.ys     ID     year    group
1                   2     800033 2008    A
2                   1     800033 2009    A   
3                   0     800033 2010    A   
4                  -1     800033 2011    A   
5                  -2     800033 2012    A   
15                  0     800076 2008    B
16                 -1     800076 2009    B
17                  5     800100 2014    C     
18                  4     800100 2015    C   
19                  2     800100 2017    C   
20                  1     800100 2018    C   
30                  0     800125 2008    A    
31                 -1     800125 2009    A    
32                 -2     800125 2010    A", header=TRUE)

For each person, I would like to keep only the last row with event.timeline.ys >= 0 (this would be row 3 for ID 800033) and the first row with event.timeline.ys < 0 (this would be row 4 for ID 800033). All other rows should be deleted. My final data frame should therefore contain at most two rows per ID.

The person with ID = 800100 does not have any negative values of event.timeline.ys. In this case, I would like to keep only the last row with event.timeline.ys >= 0.

The final data set would then look like this:

    event.timeline.ys     ID     year    group  
3                   0     800033 2010    A   
4                  -1     800033 2011    A      
15                  0     800076 2008    B
16                 -1     800076 2009    B 
20                  1     800100 2018    C   
30                  0     800125 2008    A    
31                 -1     800125 2009    A

I thought about using a for-loop to find, within each ID, the last row with event.timeline.ys >= 0 and the first row with event.timeline.ys < 0. However, I could not get a working implementation in R.

Does anyone have any advice? I am also very open to other solutions that are not based on for-loops or similar constructs.

Answer:

Here's one option that uses group_by in dplyr:

library(dplyr)
# Group by ID and by sign (>= 0 vs. < 0), then keep the row closest to zero in each group
dd %>% group_by(ID, category = event.timeline.ys >= 0) %>% 
  filter(abs(event.timeline.ys) == min(abs(event.timeline.ys))) %>% 
  ungroup() %>%   # ungroup first: select() cannot drop a grouping column
  dplyr::select(-category) %>%
  as.data.frame()

  event.timeline.ys     ID year group
1                 0 800033 2010     A
2                -1 800033 2011     A
3                 0 800076 2008     B
4                -1 800076 2009     B
5                 1 800100 2018     C
6                 0 800125 2008     A
7                -1 800125 2009     A
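
For comparison, here is a minimal base R sketch that selects rows by position instead of by the smallest absolute value. It assumes the rows within each ID are already in chronological order, as in the example data. The helper name pick_rows is hypothetical, and dd is rebuilt inline only so the snippet runs on its own.

```r
# Rebuild the example data from the question so the snippet is self-contained
dd <- data.frame(
  event.timeline.ys = c(2, 1, 0, -1, -2, 0, -1, 5, 4, 2, 1, 0, -1, -2),
  ID    = rep(c(800033, 800076, 800100, 800125), times = c(5, 2, 4, 3)),
  year  = c(2008:2012, 2008:2009, 2014, 2015, 2017, 2018, 2008:2010),
  group = rep(c("A", "B", "C", "A"), times = c(5, 2, 4, 3))
)

# Hypothetical helper: for the row indices of one ID, return the last index
# with a non-negative value and the first index with a negative value.
# If one of the two does not exist, it is simply left out.
pick_rows <- function(idx, ys) {
  nonneg <- idx[ys[idx] >= 0]
  neg    <- idx[ys[idx] < 0]
  c(tail(nonneg, 1), head(neg, 1))
}

# Split the row indices by ID, pick the rows per ID, and subset once
idx_by_id <- split(seq_len(nrow(dd)), dd$ID)
keep <- unlist(lapply(idx_by_id, pick_rows, ys = dd$event.timeline.ys),
               use.names = FALSE)
result <- dd[sort(keep), ]
rownames(result) <- NULL
print(result)
```

On the example data this keeps the same seven rows as the desired output in the question. One difference from the filter() approach: if an ID had several rows tied for the smallest absolute value, filter() would keep all of them, whereas this version keeps exactly one row per sign.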
