Python Pandas: Concat two data frames with different numbers of rows and columns

Problem description:

I have two data frames with different numbers of rows and columns. Both tables have a few common columns, including "Customer ID". The tables have sizes of 11697 rows × 15 columns and 385839 rows × 6 columns, respectively. Customer ID might repeat in the second table. I want to concat both tables and merge the similar columns using Customer ID. How can I do that with Python pandas? One table looks like this -

and the other one looks like this -

I am using the following code -

 pd.concat([df1, df2], sort=False)

I just want to make sure that I am not losing any information. How can I check whether there are multiple entries for one ID, and how can I combine them into one result?

Edit -

When I use the above code, these are the NA counts in the dataset before and after -

Can someone tell me where I went wrong?

I believe that DataFrame.merge would work in this case:

# use how='outer' to preserve all information from both DataFrames
df1.merge(df2, how='outer', on='customer_id')
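
As a rough illustration of why the two calls behave differently, here is a minimal sketch on toy data. The columns name and amount and all values are made up for the example; only customer_id comes from the snippet above. pd.concat stacks the rows of both frames, so every column that exists in only one frame is filled with NaN for the other frame's rows, while an outer merge lines rows up on the key. Passing indicator=True adds a _merge column showing which frame each row came from.

```python
import pandas as pd

# Toy stand-ins for the two tables; column names other than
# customer_id and all values are hypothetical.
df1 = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cid"],
})
df2 = pd.DataFrame({
    "customer_id": [1, 1, 4],  # ID 1 repeats, as in the second table
    "amount": [10.0, 20.0, 5.0],
})

# concat stacks rows: 3 + 3 = 6 rows, with NaN wherever a column
# is missing from the frame a row came from.
stacked = pd.concat([df1, df2], sort=False)
print(stacked.isna().sum())

# An outer merge aligns rows on customer_id instead of stacking them.
merged = df1.merge(df2, how="outer", on="customer_id", indicator=True)
print(merged)
print(merged["_merge"].value_counts())
```

In the merged result, a repeated ID in the second frame yields one output row per repeat, and NaN appears only where an ID is genuinely absent from one of the frames.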

DataFrame.join could also work if both DataFrames had their indexes set to customer_id (it is also simpler):

df1 = df1.set_index('customer_id')
df2 = df2.set_index('customer_id')
df1.join(df2, how='outer')
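
To check whether an ID has multiple entries and to combine them into one row before joining, something along these lines might help. This is only a sketch on toy data: the name and amount columns are hypothetical, and summing amount is an assumption about what "combine" should mean. Pick whatever aggregation suits the real columns.

```python
import pandas as pd

# Toy data; column names other than customer_id are hypothetical.
df1 = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cid"],
})
df2 = pd.DataFrame({
    "customer_id": [1, 1, 4],
    "amount": [10.0, 20.0, 5.0],
})

# True if any customer_id appears more than once in df2.
has_dupes = df2["customer_id"].duplicated().any()

# Row count per ID, keeping only the IDs that repeat.
counts = df2["customer_id"].value_counts()
repeated = counts[counts > 1]
print(repeated)

# Collapse df2 to one row per ID (summing is an assumption).
collapsed = df2.groupby("customer_id", as_index=False).agg(
    amount=("amount", "sum")
)

# Outer join on the index keeps every ID from either frame.
result = df1.set_index("customer_id").join(
    collapsed.set_index("customer_id"), how="outer"
)
print(result)
```

Without the groupby step, the join would still work, but each repeated ID would produce one output row per repeat.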