Join two pandas dataframes based on multiple conditions

Problem description:

df_a and df_b are two dataframes that look like the following:

df_a
A   B       C      D     E
x1  Apple   0.3   0.9    0.6
x1  Orange  0.1   0.5    0.2
x2  Apple   0.2   0.2    0.1
x2  Orange  0.3   0.4    0.9
x2  Mango   0.1   0.2    0.3
x3  Orange  0.3   0.1    0.2


df_b
A   B_new   F    
x1  Apple   0.3  
x1  Mango   0.2  
x1  Orange  0.1   
x2  Apple   0.2   
x2  Orange  0.3     
x2  Mango   0.1  
x3  Orange  0.3  
x3  Mango   0.2  
x3  Apple   0.1  

I want my final_df to contain all the rows of df_a, each matched to the row of df_b for which both df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'] hold.

I tried doing an outer join and then dropping duplicates with respect to columns A and B in final_df, but the value of B_new was not retained.

This is how I want my result_df to look:

result_df

A   B       C     D     E     B_new   F
x1  Apple   0.3   0.9   0.6   Apple   0.3
x1  Orange  0.1   0.5   0.2   Orange  0.1
x2  Apple   0.2   0.2   0.1   Apple   0.2
x2  Orange  0.3   0.4   0.9   Orange  0.3
x2  Mango   0.1   0.2   0.3   Mango   0.1
x3  Orange  0.3   0.1   0.2   Orange  0.3

I also tried a left outer join:

final_df = pd.merge(df_a, df_b, how="left", on=['A'])

The resulting dataframe is much larger than df_a: because the join is on A alone, every row of df_a is paired with every row of df_b that shares its A value, which is not what I want.

Any suggestions are appreciated.

You need an inner merge that specifies both merge columns on each side, since the second key is named B in df_a and B_new in df_b:

res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

print(res)

    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
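
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the two dataframes from the question and compares the join on A alone with the join on both keys. The variable names on_a_only, inner and left are only illustrative. The left-merge variant with validate="many_to_one" is an optional extra, not part of the answer above: it assumes you want every row of df_a kept even when df_b has no match, and that each (A, B_new) pair occurs at most once in df_b.

```python
import pandas as pd

# Data copied from the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Joining on A alone pairs each df_a row with all df_b rows sharing that A.
on_a_only = pd.merge(df_a, df_b, how="left", on=["A"])

# Joining on both keys keeps only the matching (A, B) == (A, B_new) pairs.
inner = df_a.merge(df_b, how="inner",
                   left_on=["A", "B"], right_on=["A", "B_new"])

# Optional variant: keep every df_a row (unmatched rows would get NaN in
# B_new and F) and raise if a key pair is duplicated in df_b.
left = df_a.merge(df_b, how="left",
                  left_on=["A", "B"], right_on=["A", "B_new"],
                  validate="many_to_one")

print(len(on_a_only), len(inner), len(left))
print(inner)
```

With this data the inner and left merges give the same six rows, because every (A, B) pair in df_a has a match in df_b. They would differ only if df_a contained a pair that is missing from df_b.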