Subset a data frame using dplyr with a vector

Problem description:

I know how to use dplyr, but here I'm stuck.

I have a vector such as:

v <- c("A","B","C")

and a data frame such as:

Groups letters 
G1 A
G1 B
G1 C
G1 C
G2 A
G2 C
G3 A
G3 A
G3 C
G4 C
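
To make the example reproducible, the table above can be rebuilt as a data frame. This is only a sketch: it assumes the data frame is called filtred_df (the name used in the code in this post) and that both columns are plain character vectors.

```r
# Example vector of required letters
v <- c("A", "B", "C")

# Rebuild the example table; the name filtred_df and the character
# column types are assumptions, not something stated in the post
filtred_df <- data.frame(
  Groups  = c("G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G4"),
  letters = c("A", "B", "C", "C", "A", "C", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)
```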

I would like to keep only the Groups that contain all the letters in v.

In this example, only G1 should be kept, because it contains A, B and C, i.e. every letter present in v.

I tried:

filtred_df2=filtred_df %>%
  group_by(Groups) %>%
  filter(all(letters %in% v))

There's probably a shorter way, but this should work. First, limit the data to rows whose letter is in v, then count how many distinct letters each group has and compare that to the number of unique letters in v. Finally, join back to the original data to keep only the groups that have all the letters.

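filtred_df %>%
  filter(letters %in% v) %>%            # only care about letters that are in v
  distinct(Groups, letters) %>%         # one row per group/letter pair
  count(Groups) %>%                     # distinct matching letters per group
  filter(n == length(unique(v))) %>%    # groups that contain every letter of v
  select(-n) %>%
  left_join(filtred_df, by = "Groups")  # bring back all rows of those groups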
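
As for a shorter way, one possibility is to keep the grouped filter from the question but reverse the test: all(letters %in% v) asks whether every letter of the group is in v, whereas the goal is that every element of v appears in the group. The following is a minimal sketch, assuming the data frame is named filtred_df and has character columns as in the example.

```r
library(dplyr)

v <- c("A", "B", "C")

# Example data from the question (name and column types are assumptions)
filtred_df <- data.frame(
  Groups  = c("G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G4"),
  letters = c("A", "B", "C", "C", "A", "C", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Keep a group only if every element of v occurs among its letters
result <- filtred_df %>%
  group_by(Groups) %>%
  filter(all(v %in% letters)) %>%
  ungroup()

print(result)
```

With the example data, only the four G1 rows should remain. Like the join-based version, this keeps every row of a qualifying group, including any rows whose letter is not in v.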