How to update values in a specific row in a Python Pandas DataFrame?
With the nice indexing methods in Pandas I have no problems extracting data in various ways. On the other hand I am still confused about how to change data in an existing DataFrame.
In the following code I have two DataFrames and my goal is to update values in a specific row in the first df from values of the second df. How can I achieve this?
import pandas as pd
df = pd.DataFrame({'filename' : ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n' : [None, None]})
df2 = pd.DataFrame({'filename' : 'test2.dat', 'n':16}, index=[0])
# this overwrites the first row but we want to update the second
# df.update(df2)
# this does not update anything
df.loc[df.filename == 'test2.dat'].update(df2)
print(df)
gives
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13  None
[2 rows x 3 columns]
but how can I achieve this:
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
[2 rows x 3 columns]
So first of all, pandas updates using the index. When an update command does not update anything, check the index on both the left-hand side and the right-hand side: here the row you want to change has index 1 in df, while the only row of df2 has index 0. If for some reason you are too lazy to update the indices to follow your identification logic, you can do something along the lines of
>>> df.loc[df.filename == 'test2.dat', 'n'] = df2[df2.filename == 'test2.dat'].loc[0]['n']
>>> df
Out[331]:
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
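To illustrate the point about index alignment, here is a minimal sketch (not the only way to do it) that relabels df2 so its row carries the same index label as the matching row in df, after which update() hits the intended row. The names target_index and df2_aligned are introduced here for illustration only, and the sketch assumes exactly one row of df matches the filename.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Index labels of the rows in df that we want to update (here: label 1).
# Assumption: exactly one row matches, so it lines up with df2's single row.
target_index = df.index[df['filename'] == 'test2.dat']

# Work on a copy of df2 and give it the matching index label,
# so that update() aligns it with the second row of df.
df2_aligned = df2.copy()
df2_aligned.index = target_index

# update() modifies df in place and only overwrites with non-NA values.
df.update(df2_aligned)
print(df)
```

Because update() aligns purely on index labels (and shared column names), the filename column in df2 plays no role in the matching; it is only used above to look up the label.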
If you want to do this for the whole table, I suggest a method I believe is superior to the previously mentioned ones: since your identifier is filename, set filename as your index, and then use update() as you wanted to. Both merge and the apply() approach contain unnecessary overhead:
>>> df.set_index('filename', inplace=True)
>>> df2.set_index('filename', inplace=True)
>>> df.update(df2)
>>> df
Out[292]:
            m     n
filename
test0.dat  12  None
test2.dat  13    16
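If you would rather keep filename as an ordinary column and leave the original frames untouched, a possible variant of the same idea is to do the indexing on temporary copies and reset the index afterwards. This is only a sketch; the names indexed and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# set_index() without inplace=True returns a new frame, so df is not modified.
indexed = df.set_index('filename')

# update() aligns on the filename index and changes `indexed` in place.
indexed.update(df2.set_index('filename'))

# Turn filename back into a regular column with a default integer index.
result = indexed.reset_index()
print(result)
```

This assumes the filename values are unique in both frames; with duplicate labels the index alignment that update() relies on is no longer unambiguous.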