How to replace duplicate values with multiple unique strings in Pandas?
import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
Let's say I have a dataframe that looks like this. I am trying to figure out how to check the Name column for the value 'Tom': the first time I find it, I want to replace it with 'FirstTom', and the second time it appears, with 'SecondTom'. How do you accomplish this? I've used the replace method before, but only to replace every 'Tom' with a single value. I don't want to append a 1 to the end of the value, but to change the string completely to something else.
If the df looked more like the one below, how would we check for 'Tom' in both the first and the second column, and then replace the first instance with 'FirstTom' and the second instance with 'SecondTom'?
data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':['Tom', 'John', 'Bob', 'Steve']}
Just adding to the existing solutions: you can use inflect to build the ordinal labels dynamically.
import inflect
p = inflect.engine()
df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)
        Name  Age
0    Tom_1st   20
1    Tom_2nd   21
2   Jack_1st   19
3  Terry_1st   18
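Note that the snippet above appends an ordinal suffix to every name, whereas the question asks for 'FirstTom' and 'SecondTom' and nothing else changed. Here is a minimal sketch of that variant, under a few assumptions of mine: the helper name label_occurrences and the prefixes tuple are hypothetical, and for the two-column case occurrences are counted row by row, left to right, which the question does not actually specify.

```python
import pandas as pd

def label_occurrences(df, columns, target='Tom',
                      prefixes=('First', 'Second', 'Third')):
    """Return a copy of df in which the n-th occurrence of `target`
    within `columns` becomes prefixes[n] + target.

    Assumption: occurrences are counted row by row, left to right.
    """
    out = df.copy()
    # Boolean 2D mask marking the cells that equal the target string.
    mask = out[columns].eq(target).to_numpy()
    n_hits = int(mask.sum())
    if n_hits > len(prefixes):
        raise ValueError(f"{n_hits} occurrences but only {len(prefixes)} prefixes")
    values = out[columns].to_numpy(dtype=object)
    # Boolean-mask assignment fills matching cells in row-major order.
    values[mask] = [prefixes[i] + target for i in range(n_hits)]
    out[columns] = values
    return out

# Single-column case from the question.
df = pd.DataFrame({'Name': ['Tom', 'Tom', 'Jack', 'Terry'],
                   'Age': [20, 21, 19, 18]})
print(label_occurrences(df, ['Name']))

# Two-column case from the question.
df2 = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Jack', 'Terry'],
                    'OtherName': ['Tom', 'John', 'Bob', 'Steve']})
print(label_occurrences(df2, ['Name', 'OtherName']))
```

If you would rather count down one column before moving to the next, transpose the mask and values before assigning. The prefixes could also be generated instead of hard-coded, but that goes beyond what the question asks.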