How to flatten rows that share the same index in pandas?
Question:
I have a DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[np.array([5,6]),6,np.array([8,10]),7],'b':[np.array([7,8]),9,np.array([15,10]),7]},index=[0,0,1,1])
         a         b
0   [5, 6]    [7, 8]
0        6         9
1  [8, 10]  [15, 10]
1        7         7
When I try groupby with apply:
df.groupby(level=0).apply(lambda x: pd.Series(x.values.flatten()))
         0         1  2  3
0   [5, 6]    [7, 8]  6  9
1  [8, 10]  [15, 10]  7  7
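To see why this yields four columns instead of two, here is a minimal sketch that looks at a single group (the names `g0` and `flat` are illustrative only). `x.values` is a 2x2 object array, and `flatten()` reads it row by row, so the `a` and `b` cells alternate and each cell stays a separate element instead of being merged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [np.array([5, 6]), 6, np.array([8, 10]), 7],
     "b": [np.array([7, 8]), 9, np.array([15, 10]), 7]},
    index=[0, 0, 1, 1],
)

# The group for index label 0: both rows labeled 0
g0 = df.groupby(level=0).get_group(0)

# .values is a 2x2 object array; flatten() walks it row-major (C order),
# giving [a0, b0, a1, b1], i.e. four separate cells -> four output columns
flat = g0.values.flatten()
print(list(flat))
```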
So how can I use apply so that the cells sharing the same index label are flattened together under the same column, giving:
            a            b
0   [5, 6, 6]    [7, 8, 9]
1  [8, 10, 7]  [15, 10, 7]
Answer:
This is a job for numpy.hstack, which concatenates its inputs horizontally and treats a scalar such as 6 as a 1-element array. However, getting the output of a groupby into a DataFrame is always a bit tricky when the values are multidimensional. Fitting things into a Series usually works:
df.groupby(level=0).apply(lambda g: pd.Series({
'a': np.hstack(g['a'].values),
'b': np.hstack(g['b'].values)
}))
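A self-contained version of the two-column snippet, with the imports and sample frame from the question, might look like this. It assumes `a` and `b` are object columns as in the question; `result` is just an illustrative name. Each group returns one Series indexed by `a` and `b`, and `apply` stacks those Series into the rows of a DataFrame:

```python
import numpy as np
import pandas as pd

# Sample frame from the question: object columns mixing arrays and scalars
df = pd.DataFrame(
    {"a": [np.array([5, 6]), 6, np.array([8, 10]), 7],
     "b": [np.array([7, 8]), 9, np.array([15, 10]), 7]},
    index=[0, 0, 1, 1],
)

# np.hstack promotes the scalar 6 to a 1-element array, giving [5, 6, 6]
print(np.hstack([np.array([5, 6]), 6]))

# One Series per index label; apply turns them into the rows of a DataFrame
result = df.groupby(level=0).apply(lambda g: pd.Series({
    "a": np.hstack(g["a"].values),
    "b": np.hstack(g["b"].values),
}))
print(result)
```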
Of course, building the dictionary programmatically instead of listing each column by hand would be nicer...
For n columns, a dict comprehension is better, i.e.:
df.groupby(level=0).apply(lambda g: pd.Series({i: np.hstack(g[i].values) for i in df.columns}))
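The dict comprehension can be wrapped in a small reusable function. This is a sketch under assumptions: `flatten_by_index` is a hypothetical helper name, and the extra column `c` is made-up sample data to show that any number of columns works. Using `g.columns` instead of the outer `df.columns` keeps the function independent of a global variable:

```python
import numpy as np
import pandas as pd


def flatten_by_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each index label, concatenate every column's
    cells (arrays or scalars) into one 1-D array with np.hstack."""
    return frame.groupby(level=0).apply(
        # g.columns rather than an outer df.columns, so no global is needed
        lambda g: pd.Series({col: np.hstack(g[col].to_numpy()) for col in g.columns})
    )


# Question's frame plus a hypothetical third column "c"
df = pd.DataFrame(
    {"a": [np.array([5, 6]), 6, np.array([8, 10]), 7],
     "b": [np.array([7, 8]), 9, np.array([15, 10]), 7],
     "c": [1, 2, 3, np.array([4, 5])]},
    index=[0, 0, 1, 1],
)
out = flatten_by_index(df)
print(out)
```

The flattened arrays may differ in length between columns (here `c` gives two elements for label 0), since each cell simply holds whatever `np.hstack` returns.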