Set Pandas DataFrame columns equal to values in another DataFrame based on index

Problem description:

I have the following 3 dataframes:

df1

id     first_name      surname     state
1                  
88     
190
2509  
....  

df2

id     given_name      surname     state    street_num
17     John            Doe         NY       5
88     Tom             Murphy      CA       423
190    Dave            Casey       KY       250
....

df3

id     first_name      family_name state    car
1      John            Woods       NY       ford
74     Tom             Kite        FL       vw
2509   Mike            Johnson     KY       toyota

Some ids from df1 are in df2 and others are in df3. There are also ids in df2 and df3 that are not in df1.

Edit: there are also some ids in df1 that are not in either df2 or df3.

I want to fill the columns in df1 with the values from whichever dataframe contains the id. However, I do not want all of the columns (so I think merge is not suitable). I tried the isin function, but that way I could not update records individually and I got an error. This was my attempt using isin:

df1.loc[df1.index.isin(df2.index), 'first_name'] = df2.given_name

Is there an easy way to do this without iterating through the dataframes and checking whether each index matches?

I think you first need to rename the columns so the DataFrames align in concat, and then reindex to filter by df1.index and df1.columns:

df21 = df2.rename(columns={'given_name':'first_name'})
df31 = df3.rename(columns={'family_name':'surname'})
df = pd.concat([df21, df31]).reindex(index=df1.index, columns=df1.columns)
print (df)
     first_name  surname state
id                            
1          John    Woods    NY
88          Tom   Murphy    CA
190        Dave    Casey    KY
2509       Mike  Johnson    KY
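
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the same rename, concat and reindex steps. It assumes that id is the index of all three frames (as the isin attempt in the question suggests) and that no id appears in both df2 and df3, because reindex needs unique labels. The names combined and result are only illustrative.

```python
import pandas as pd

# df1 has the target ids and columns but no values yet (assumed layout).
df1 = pd.DataFrame(
    columns=["first_name", "surname", "state"],
    index=pd.Index([1, 88, 190, 2509], name="id"),
)
df2 = pd.DataFrame(
    {
        "given_name": ["John", "Tom", "Dave"],
        "surname": ["Doe", "Murphy", "Casey"],
        "state": ["NY", "CA", "KY"],
        "street_num": [5, 423, 250],
    },
    index=pd.Index([17, 88, 190], name="id"),
)
df3 = pd.DataFrame(
    {
        "first_name": ["John", "Tom", "Mike"],
        "family_name": ["Woods", "Kite", "Johnson"],
        "state": ["NY", "FL", "KY"],
        "car": ["ford", "vw", "toyota"],
    },
    index=pd.Index([1, 74, 2509], name="id"),
)

# Align the column names with df1 before stacking the two sources.
df21 = df2.rename(columns={"given_name": "first_name"})
df31 = df3.rename(columns={"family_name": "surname"})
combined = pd.concat([df21, df31])

# Keep only the ids and columns that df1 asks for.
result = combined.reindex(index=df1.index, columns=df1.columns)
print(result)
```

If an id could occur in both df2 and df3, the duplicates would have to be dropped before reindexing, for example with combined[~combined.index.duplicated()], which keeps the first occurrence.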

Edit: If you only need the intersection of the indices:

df4 = pd.concat([df21, df31])
df = df4.reindex(index=df1.index.intersection(df4.index), columns=df1.columns)
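
To see what the intersection changes, here is a small sketch with made-up data. The id 9999 is a hypothetical id that exists in df1 but in neither source frame, and the names full and common are only illustrative. A plain reindex keeps that row and fills it with NaN, while reindexing on the intersection drops it.

```python
import pandas as pd

cols = ["first_name", "surname", "state"]

# Hypothetical target index: 9999 is not present in the combined source data.
df1_index = pd.Index([1, 88, 9999], name="id")

# Stand-in for the concatenated, renamed source frames.
df4 = pd.DataFrame(
    {
        "first_name": ["John", "Tom"],
        "surname": ["Woods", "Murphy"],
        "state": ["NY", "CA"],
    },
    index=pd.Index([1, 88], name="id"),
)

# Plain reindex: the unmatched id stays as an all-NaN row.
full = df4.reindex(index=df1_index, columns=cols)

# Intersection: only ids present on both sides survive.
common = df4.reindex(index=df1_index.intersection(df4.index), columns=cols)

print(full)
print(common)
```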