How to "fuzzy" match strings when merging two dataframes in pandas



import pandas as pd

df1 = pd.DataFrame({'Name': ['Adam Smith', 'Anne Kim', 'John Weber', 'Ian Ford'],
                    'Age': [43, 21, 55, 24]})
df2 = pd.DataFrame({'Name': ['adam Smith', 'Annie Kim', 'John  Weber', 'Ian Ford'],
                    'gender': ['M', 'F', 'M', 'M']})


I need to join these two dataframes with pandas.merge on the column Name. However, as you can see, there are slight differences between the Name values in the two dataframes (letter case, spelling, and extra whitespace). Let's assume each pair refers to the same person. If I simply do:

pd.merge(df1, df2, how='inner', on='Name')


I get back a dataframe with only one row, 'Ian Ford'.


Does anyone know how to merge these two dataframes? I guess this is a pretty common situation when joining two tables on a string column. I have no idea how to handle it. Thanks a lot in advance.


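from fuzzywuzzy import process

# For each name in df2, use the best-scoring name in df1 as the join key.
df2['key'] = df2.Name.apply(lambda x: process.extract(x, df1.Name, limit=1)[0][0])

# Merge on the fuzzy-matched key.
df2.merge(df1, left_on='key', right_on='Name')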

        Name_x gender         key  Age      Name_y
0   adam Smith      M  Adam Smith   43  Adam Smith
1    Annie Kim      F    Anne Kim   21    Anne Kim
2  John  Weber      M  John Weber   55  John Weber
3     Ian Ford      M    Ian Ford   24    Ian Ford
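
If you would rather not depend on fuzzywuzzy, a similar key can be built with the standard library's difflib. This is only a minimal sketch, not the method used above: normalize and closest_name are hypothetical helper names, and the 0.8 cutoff is an assumed threshold you would need to tune for your data. Unlike process.extract with limit=1, which always returns the best candidate however poor its score, this version returns None when nothing is similar enough, so unrelated names are not forced into a match.

```python
import difflib

def normalize(name):
    # Lowercase and collapse runs of whitespace, so 'John  Weber' -> 'john weber'.
    return " ".join(name.lower().split())

def closest_name(name, choices, cutoff=0.8):
    """Return the entry of `choices` most similar to `name`,
    or None if no entry reaches `cutoff` (an assumed threshold)."""
    # Map normalized spellings back to the original spellings.
    lookup = {normalize(c): c for c in choices}
    hits = difflib.get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

names1 = ['Adam Smith', 'Anne Kim', 'John Weber', 'Ian Ford']
names2 = ['adam Smith', 'Annie Kim', 'John  Weber', 'Ian Ford']
keys = [closest_name(n, names1) for n in names2]

# With the dataframes from the question, the same helper could be applied as:
#   df2['key'] = df2['Name'].apply(lambda n: closest_name(n, df1['Name']))
#   df2.merge(df1, left_on='key', right_on='Name')
```

Rows whose key is None can then be inspected by hand instead of being silently joined to the wrong person.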