R - Identify and remove duplicate rows based on two columns

Question:

I have some data that looks like this:

Course_ID   Text_ID
33          17
33          17
58          17
5           22
8           22
42          25
42          25
17          26
17          26
35          39
51          39

Not having a background in programming, I'm finding it tricky to articulate my question, but here goes: I only want to keep rows where Course_ID varies but where Text_ID is the same. So for example, the final data would look something like this:

Course_ID   Text_ID
5           22
8           22
35          39
51          39

As you can see, Text_ID 22 and 39 are the only ones that have different Course_ID values. I suspect subsetting the data would be the way to go, but as I said, I'm quite a novice at this kind of thing and would really appreciate any advice on how to approach this.

Answer:

Select those Text_ID groups in which no Course_ID is repeated, i.e. where the number of distinct Course_ID values equals the number of rows in the group.

In dplyr you can write this as -

library(dplyr)
# Keep only the Text_ID groups in which every Course_ID is distinct
df %>% group_by(Text_ID) %>% filter(n_distinct(Course_ID) == n()) %>% ungroup()

#  Course_ID Text_ID
#      <int>   <int>
#1         5      22
#2         8      22
#3        35      39
#4        51      39

And with data.table -

library(data.table)
setDT(df)[, .SD[uniqueN(Course_ID) == .N], Text_ID]
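
If neither package is available, the same condition can be sketched in base R. This is only an illustration, not part of the original answer: the data frame below is retyped from the sample in the question, and the names n_unique, n_rows and result are made up for the example.

```r
# Sample data retyped from the question (assumed to be integer columns)
df <- data.frame(
  Course_ID = c(33L, 33L, 58L, 5L, 8L, 42L, 42L, 17L, 17L, 35L, 51L),
  Text_ID   = c(17L, 17L, 17L, 22L, 22L, 25L, 25L, 26L, 26L, 39L, 39L)
)

# For each row, the number of distinct Course_ID values in its Text_ID group
n_unique <- ave(df$Course_ID, df$Text_ID, FUN = function(x) length(unique(x)))

# For each row, the number of rows in its Text_ID group
n_rows <- ave(df$Course_ID, df$Text_ID, FUN = length)

# Keep rows from groups where every Course_ID is distinct
result <- df[n_unique == n_rows, ]
print(result)
```

Text_ID 17 is dropped here even though it contains two different Course_ID values (33 and 58), because 33 appears twice. That is the same behaviour as the dplyr and data.table versions above.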
