Collapsing rows with NaN entries in a pandas DataFrame
Question:
I have a pandas DataFrame with rows of data:
# objectID grade OS method
object_id_0001 AAA Mac organic
object_id_0001 AAA Mac NA
object_id_0001 AAA NA organic
object_id_0002 NA NA NA
object_id_0002 ABC Win NA
That is, there are often multiple entries for the same objectID, but the entries frequently contain NAs.
So I'm looking for a way to combine rows on objectID and report the non-NA entries. For example, the rows above collapse down to:
object_id_0001 AAA Mac organic
object_id_0002 ABC Win NA
Answer
Quick and Dirty
This works, and has for a long time. However, some argue that this behavior is a bug that may eventually be fixed. As currently implemented, first returns the first non-null element of each column within a group, if one exists.
df.groupby('objectID', as_index=False).first()
objectID grade OS method
0 object_id_0001 AAA Mac organic
1 object_id_0002 ABC Win NaN
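For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes the NA cells in the question are real missing values (np.nan), and the name collapsed is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; np.nan stands in for the NA cells.
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', np.nan, 'ABC'],
    'OS': ['Mac', 'Mac', np.nan, np.nan, 'Win'],
    'method': ['organic', np.nan, 'organic', np.nan, np.nan],
})

# GroupBy.first skips nulls column by column, so each group collapses
# to one row holding the first non-null value of every column.
collapsed = df.groupby('objectID', as_index=False).first()
print(collapsed)
```

A column that is entirely null within a group (method for object_id_0002) stays null in the result.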
pd.concat
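# DataFrame.lookup was removed in pandas 2.0, so read each column's
# first non-null value with .at instead.
pd.concat([
    pd.DataFrame([[d.at[i, c] for c, i in d.notna().idxmax().items()]], columns=d.columns)
    for _, d in df.groupby('objectID')
], ignore_index=True)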
objectID grade OS method
0 object_id_0001 AAA Mac organic
1 object_id_0002 ABC Win NaN
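The key step in this approach is d.notna().idxmax(), which gives, for each column, the row label of the first non-null value. The sketch below shows that intermediate result for a single group; the frame d and its index labels 3 and 4 are made up here to mimic the object_id_0002 group.

```python
import numpy as np
import pandas as pd

# One group, as it would arrive from df.groupby('objectID'); labels 3 and 4 are assumed.
d = pd.DataFrame({
    'objectID': ['object_id_0002', 'object_id_0002'],
    'grade': [np.nan, 'ABC'],
    'OS': [np.nan, 'Win'],
    'method': [np.nan, np.nan],
}, index=[3, 4])

# notna() gives a boolean frame; idxmax() returns the label of the first True
# in each column, falling back to the first label when every entry is False.
first_valid = d.notna().idxmax()
print(first_valid.to_dict())
```

Because an all-null column falls back to the first row, the value picked for it is simply that row's NaN, which is why method comes out as NaN for object_id_0002.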
stack
df.set_index('objectID').stack().groupby(level=[0, 1]).head(1).unstack()
grade OS method
objectID
object_id_0001 AAA Mac organic
object_id_0002 ABC Win None
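The stack approach depends on nulls being dropped from the stacked Series before the first value per (objectID, column) pair is taken. Since the default null handling of stack has changed between pandas versions, the sketch below adds an explicit dropna() so it does not depend on that default. This is an adaptation, not the original one-liner, and the names stacked and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', np.nan, 'ABC'],
    'OS': ['Mac', 'Mac', np.nan, np.nan, 'Win'],
    'method': ['organic', np.nan, 'organic', np.nan, np.nan],
})

# Long format: one entry per (objectID, column) pair, with nulls removed explicitly.
stacked = df.set_index('objectID').stack().dropna()

# Keep the first surviving value for each pair, then pivot back to wide format.
out = stacked.groupby(level=[0, 1]).head(1).unstack()
print(out)
```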
If those happen to be the string 'NA' rather than real nulls:
df.mask(df.astype(str).eq('NA')).groupby('objectID', as_index=False).first()
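As a runnable sketch of the string case, assume the frame was loaded with the literal text 'NA' in place of missing values. The names raw, cleaned, and collapsed are illustrative.

```python
import pandas as pd

# Same data as the question, but the missing cells hold the literal string 'NA'.
raw = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', 'NA', 'ABC'],
    'OS': ['Mac', 'Mac', 'NA', 'NA', 'Win'],
    'method': ['organic', 'NA', 'organic', 'NA', 'NA'],
})

# mask() replaces every cell where the condition is True with a null,
# turning the 'NA' strings into real missing values.
cleaned = raw.mask(raw.astype(str).eq('NA'))

# From here the quick approach applies unchanged.
collapsed = cleaned.groupby('objectID', as_index=False).first()
print(collapsed)
```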