R: remove duplicates based on other columns

Problem description:

I want to remove duplicate rows based on whether the values in other columns match or differ.

Every duplicated ID should be removed completely, but only if its rows have DIFFERENT colours. It doesn't matter whether the subgroups differ as well. If the rows have the same ID AND the same colour, only the first one should be kept.

In the end I want a list of all IDs that are single-colour only (regardless of subgroup). All multicoloured IDs should be removed.

Here is an example:

   id colour   subgroup
1   1    red   lightred
2   2   blue  lightblue
3   2   blue   darkblue
4   3    red   lightred
5   4    red    darkred
6   4    red    darkred
7   4   blue  lightblue
8   5  green  darkgreen
9   5  green  darkgreen
10  5  green lightgreen
11  6    red    darkred
12  6   blue   darkblue
13  6  green lightgreen

In the end it should look like this:

  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen

The data I used for this example:

id = c(1,2,2,3,4,4,4,5,5,5,6,6,6)
colour = c("red","blue","blue","red","red","red","blue","green","green","green","red","blue","green")
subgroup = c("lightred","lightblue","darkblue","lightred","darkred","darkred","lightblue","darkgreen","darkgreen","lightgreen","darkred","darkblue","lightgreen")
data = data.frame(cbind(id,colour,subgroup))

Thanks for your help!

library(tidyverse)
data %>%
  group_by(id) %>%
  # keep only ids with exactly one distinct colour, and only the first row of each
  filter(length(unique(colour)) == 1, !duplicated(colour))
# A tibble: 4 x 3
# Groups:   id [4]
  id    colour subgroup 
  <fct> <fct>  <fct>    
1 1     red    lightred 
2 2     blue   lightblue
3 3     red    lightred 
4 5     green  darkgreen

Using base R:

subset(data, as.logical(ave(colour, id, FUN = function(x) length(unique(x)) == 1 & !duplicated(x))))
  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen
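
A step-by-step variant of the same idea, in case the one-liner is hard to follow. This is only a sketch: the names n_colours, single_ids and result are made up for illustration, and the data frame is built directly with data.frame() rather than via cbind(), so id stays numeric. It first counts the distinct colours per id, then keeps the first row of every id that has exactly one colour.

```r
# Example data from the question, built without cbind() so id stays numeric
data <- data.frame(
  id = c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  colour = c("red", "blue", "blue", "red", "red", "red", "blue",
             "green", "green", "green", "red", "blue", "green"),
  subgroup = c("lightred", "lightblue", "darkblue", "lightred", "darkred",
               "darkred", "lightblue", "darkgreen", "darkgreen", "lightgreen",
               "darkred", "darkblue", "lightgreen")
)

# Number of distinct colours for each id (as.character() guards against factors)
n_colours <- tapply(as.character(data$colour), data$id,
                    function(x) length(unique(x)))

# ids that have exactly one colour (tapply names are character strings)
single_ids <- names(n_colours)[n_colours == 1]

# Keep rows whose id is single-colour, and only the first row of each id.
# Within a single-colour id every row has the same colour, so the first
# row per id is also the first row per id/colour combination.
result <- data[as.character(data$id) %in% single_ids & !duplicated(data$id), ]
result
```

On the example data this keeps rows 1, 2, 4 and 8, the same rows as the desired output in the question.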