How do I merge multiple data frames that have the same column names?

Problem description:

I have a "master" dataframe that has the following columns:

userid, condition

Since there are four experiment conditions, I also have four dataframes that carry answer information, with the following columns:

userid, condition, answer1, answer2

Now, I'd like to join these, so all combinations of user IDs, conditions and their answers to these conditions are merged. Each condition should only have the correct answer in the appropriate column, per row.

master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))

How do I merge all conditions into the master, so that the master table looks as follows?

  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B       3       4
3    bar         C       5       6
4    bar         D       7       8
5    foo         A       1       2
6    foo         B       3       4
7    foo         C       5       6
8    foo         D       7       8

I tried the following:

temp = merge(master, cond_a, all.x=TRUE)

Which gives me:

  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B    <NA>    <NA>
3    bar         C    <NA>    <NA>
4    bar         D    <NA>    <NA>
5    foo         A       1       2
6    foo         B    <NA>    <NA>
7    foo         C    <NA>    <NA>
8    foo         D    <NA>    <NA>

But as soon as I do this…

merge(temp, cond_b, all.x=TRUE)

There are no values for condition B. How come?

  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B    <NA>    <NA>
3    bar         C    <NA>    <NA>
4    bar         D    <NA>    <NA>
5    foo         A       1       2
6    foo         B    <NA>    <NA>
7    foo         C    <NA>    <NA>
8    foo         D    <NA>    <NA>
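
As for why condition B stays empty: when no by argument is given, merge() joins on every column name the two data frames share. After the first merge, temp already contains answer1 and answer2, so the second merge tries to match on all four columns, and the NA values in the B rows never equal the "3" and "4" in cond_b. The following is a minimal sketch of that behaviour using the question's data; the names shared and second are only illustrative.

```r
# Data from the question (only the pieces needed here)
master <- data.frame(userid = rep(c("foo", "bar"), each = 4),
                     condition = rep(c("A", "B", "C", "D"), times = 2))
cond_a <- data.frame(userid = c("foo", "bar"), condition = "A",
                     answer1 = c("1", "1"), answer2 = c("2", "2"))
cond_b <- data.frame(userid = c("foo", "bar"), condition = "B",
                     answer1 = c("3", "3"), answer2 = c("4", "4"))

temp <- merge(master, cond_a, all.x = TRUE)

# merge() defaults to by = intersect(names(x), names(y))
shared <- intersect(names(temp), names(cond_b))
shared
# "userid" "condition" "answer1" "answer2"

# The B rows of temp hold NA in answer1/answer2, so no row of cond_b
# matches on all four keys; all.x = TRUE just keeps temp's rows as they are.
second <- merge(temp, cond_b, all.x = TRUE)
second[second$condition == "B", ]
```

With all = TRUE instead of all.x = TRUE, the unmatched rows of cond_b are appended rather than dropped, which is why the approach below ends up with extra NA rows that need filtering.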

You can use Reduce() and complete.cases() as follows:

merged <- Reduce(function(x, y) merge(x, y, all=TRUE), 
                 list(master, cond_a, cond_b, cond_c, cond_d))
merged[complete.cases(merged), ]
#    userid condition answer1 answer2
# 1     bar         A       1       2
# 2     bar         B       3       4
# 4     bar         C       5       6
# 6     bar         D       7       8
# 8     foo         A       1       2
# 9     foo         B       3       4
# 11    foo         C       5       6
# 13    foo         D       7       8

Reduce() might take some getting accustomed to. You define your function, and then provide a list of objects to repeatedly apply the function to. Thus, that statement is like doing:

temp1 <- merge(master, cond_a, all=TRUE)
temp2 <- merge(temp1, cond_b, all=TRUE)
temp3 <- merge(temp2, ....)

Or, equivalently, as one nested call:

merge(merge(merge(master, cond_a, all=TRUE), cond_b, all=TRUE), cond_c, all=TRUE)
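
To make the folding order concrete, here is a small sketch that replaces merge() with a toy function which only records how it was called. The function name show_call is made up for illustration; Reduce() and its accumulate argument are base R.

```r
# Toy two-argument function: returns a string describing the call it received
show_call <- function(x, y) paste0("f(", x, ", ", y, ")")

# Reduce() applies the function left to right, feeding each result back in as x
nested <- Reduce(show_call, c("a", "b", "c", "d"))
nested
# "f(f(f(a, b), c), d)"

# accumulate = TRUE keeps every intermediate result, like temp1, temp2, ...
steps <- Reduce(show_call, c("a", "b", "c", "d"), accumulate = TRUE)
steps
# "a"  "f(a, b)"  "f(f(a, b), c)"  "f(f(f(a, b), c), d)"
```

The nesting mirrors the merge(merge(merge(...))) form above, with the first list element playing the role of master.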

complete.cases() creates a logical vector of whether the specified columns are "complete" or not; this logical vector can be used to subset from the merged data.frame.
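
As a small illustration on made-up data (the data frame df and its columns are hypothetical, not from the question), complete.cases() can check either every column or only the ones you pass in:

```r
df <- data.frame(id    = c("a", "b", "c"),
                 score = c(10, NA, 30),
                 grade = c("x", "y", NA))

# TRUE only for rows with no NA in any column
keep <- complete.cases(df)
keep
# TRUE FALSE FALSE

# Use the logical vector to subset rows
df[keep, ]

# Restrict the check to selected columns
keep_score <- complete.cases(df[, c("id", "score")])
keep_score
# TRUE FALSE TRUE
```

In the merged result above, the check over all columns is what drops the placeholder rows whose answer1 and answer2 are NA.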