What is the difference between join and merge in Pandas?

Question:

Suppose I have two DataFrames like so:

import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})

I want to merge them, so I try something like this:

pd.merge(left, right, left_on='key1', right_on='key2')

and I'm happy:

    key1    lval    key2    rval
0   foo     1       foo     4
1   bar     2       bar     5

But I'm trying to use the join method, which I've been led to believe is pretty similar:

left.join(right, on=['key1', 'key2'])

and I get this:

//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _validate_specification(self)
    406             if self.right_index:
    407                 if not ((len(self.left_on) == self.right.index.nlevels)):
--> 408                     raise AssertionError()
    409                 self.right_on = [None] * n
    410         elif self.right_on is not None:

AssertionError: 

What am I missing?

I always use join on indices:

import pandas as pd
left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]}).set_index('key')
right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]}).set_index('key')
left.join(right, lsuffix='_l', rsuffix='_r')

     val_l  val_r
key            
foo      1      4
bar      2      5
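
For comparison, here is a minimal sketch of that same index-based join spelled out with merge. It relies on join defaulting to how='left' and on merge's left_index/right_index flags; the names joined and merged are only for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]}).set_index('key')
right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]}).set_index('key')

# join aligns the two frames on their indexes and defaults to a left join
joined = left.join(right, lsuffix='_l', rsuffix='_r')

# merge has to be told explicitly to align on the indexes
merged = left.merge(right, left_index=True, right_index=True,
                    how='left', suffixes=('_l', '_r'))

# Compare the two results
print(joined.equals(merged))
```

So join reads mostly as a shorthand for an index-on-index merge, with lsuffix/rsuffix taking the place of suffixes.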

The same functionality can be had by using merge on the columns, as follows:

left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]})
left.merge(right, on='key', suffixes=('_l', '_r'))

   key  val_l  val_r
0  foo      1      4
1  bar      2      5
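
Applied to the frames from the question, a sketch of a join that should work looks like this. In join, the on argument names a column of the calling frame and is matched against the index of the other frame, so on=['key1', 'key2'] asks for two keys against a one-level index, which is the length check that fails in the traceback. Moving key2 into the index first is one way around that; the name result is only for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})

# `on` refers to a column of `left`; it is matched against the index
# of the other frame, so key2 has to become that index first
result = left.join(right.set_index('key2'), on='key1')

print(result)
```

Unlike the pd.merge call in the question, this keeps only key1 in the output, because key2 was consumed as the index of right.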