Join two pandas dataframes based on multiple conditions
df_a and df_b are two dataframes that look like the following:
df_a
A B C D E
x1 Apple 0.3 0.9 0.6
x1 Orange 0.1 0.5 0.2
x2 Apple 0.2 0.2 0.1
x2 Orange 0.3 0.4 0.9
x2 Mango 0.1 0.2 0.3
x3 Orange 0.3 0.1 0.2
df_b
A B_new F
x1 Apple 0.3
x1 Mango 0.2
x1 Orange 0.1
x2 Apple 0.2
x2 Orange 0.3
x2 Mango 0.1
x3 Orange 0.3
x3 Mango 0.2
x3 Apple 0.1
I want my final_df to contain all the rows of df_a, each matched to the df_b row with the same unique combination of keys, i.e. df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].
I tried doing an outer join and then dropping duplicates with respect to columns A and B in final_df, but the value of B_new is not retained.
This is how I want my result_df to look:
result_df
A B C D E B_new F
x1 Apple 0.3 0.9 0.6 Apple 0.3
x1 Orange 0.1 0.5 0.2 Orange 0.1
x2 Apple 0.2 0.2 0.1 Apple 0.2
x2 Orange 0.3 0.4 0.9 Orange 0.3
x2 Mango 0.1 0.2 0.3 Mango 0.1
x3 Orange 0.3 0.1 0.2 Orange 0.3
I also tried a left outer join:
final_df = pd.merge(df_a, df_b, how="left", on=['A'])
The resulting dataframe is far larger than df_a, because every row of df_a is paired with every row of df_b that shares the same A, which is not what I want.
Any suggestions are appreciated.
You need an inner merge, specifying both merge columns on each side:
res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])
print(res)
A B C D E B_new F
0 x1 Apple 0.3 0.9 0.6 Apple 0.3
1 x1 Orange 0.1 0.5 0.2 Orange 0.1
2 x2 Apple 0.2 0.2 0.1 Apple 0.2
3 x2 Orange 0.3 0.4 0.9 Orange 0.3
4 x2 Mango 0.1 0.2 0.3 Mango 0.1
5 x3 Orange 0.3 0.1 0.2 Orange 0.3
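For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the two frames from the tables in the question and compares the left merge on A alone with the inner merge on both keys. The name left_on_a and the validate="one_to_one" argument are my additions, not part of the original code. validate only makes pandas raise an error if a key pair is duplicated on either side, and it can be dropped if your real data is not expected to be unique.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange",
              "Mango", "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Merging on A alone pairs each df_a row with every df_b row sharing that A.
left_on_a = pd.merge(df_a, df_b, how="left", on=["A"])

# Merging on both key pairs keeps exactly one df_b row per df_a row.
# validate="one_to_one" is an optional safety check (my assumption that
# the key pairs are unique in both frames, which holds for this sample).
res = df_a.merge(
    df_b,
    how="inner",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="one_to_one",
)

print(len(left_on_a))  # 18
print(len(res))        # 6
```

If some rows of df_a might have no match in df_b and you still want to keep them, how='left' with the same left_on and right_on should do that, leaving B_new and F as NaN for the unmatched rows.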