How to flatten rows that share the same index in pandas?

Question:

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[np.array([5,6]),6,np.array([8,10]),7],'b':[np.array([7,8]),9,np.array([15,10]),7]},index=[0,0,1,1])


        a         b
0   [5, 6]    [7, 8]
0        6         9
1  [8, 10]  [15, 10]
1        7         7

When I try this groupby:

df.groupby(level=0).apply(lambda x: pd.Series(x.values.flatten()))


         0         1  2  3
0   [5, 6]    [7, 8]  6  9
1  [8, 10]  [15, 10]  7  7
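The columns get interleaved because, inside the lambda, x.values for each group is a 2×2 object array (rows are the two index entries, columns are a and b), and flatten() reads it row by row. A minimal check, rebuilding the question's DataFrame and inspecting group 0 by hand via df.loc[0] (which returns both rows because the label 0 is duplicated); the variable names here are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [np.array([5, 6]), 6, np.array([8, 10]), 7],
     'b': [np.array([7, 8]), 9, np.array([15, 10]), 7]},
    index=[0, 0, 1, 1],
)

# Both rows labelled 0: the same frame groupby(level=0) passes to the lambda
group0 = df.loc[0]
vals = group0.values  # 2x2 object array: rows are index entries, columns are a and b

# Row-major flatten: row 0 (a, b), then row 1 (a, b), so a and b get interleaved
row_major = vals.flatten()

# Transposing first walks column by column instead: all of a, then all of b
col_major = vals.T.flatten()

print(vals.shape)
print(row_major)
print(col_major)
```

Even the column-major order only yields one flat sequence per group rather than one cell per column, which is why the answer handles each column separately.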

So how can I use apply so that, for each index label, the cells in the same column are flattened into a single list? The desired result is:


            a            b
0   [5, 6, 6]    [7, 8, 9]
1  [8, 10, 7]  [15, 10, 7]

Answer:

This is a job for numpy.hstack. However, getting the output of a groupby back into a DataFrame is always a bit tricky when the values are multidimensional (arrays rather than scalars). Returning a Series from the applied function usually works:

df.groupby(level=0).apply(lambda g: pd.Series({
    'a': np.hstack(g['a'].values), 
    'b': np.hstack(g['b'].values)
}))
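A self-contained run of the answer above, assuming the question's example DataFrame. np.hstack promotes scalars such as 6 to length-1 arrays before concatenating, so a column mixing arrays and plain numbers is fine; each group's Series (keys a and b) then becomes one row of the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [np.array([5, 6]), 6, np.array([8, 10]), 7],
     'b': [np.array([7, 8]), 9, np.array([15, 10]), 7]},
    index=[0, 0, 1, 1],
)

# np.hstack turns the scalar 6 into a length-1 array, then concatenates
print(np.hstack([np.array([5, 6]), 6]))  # [5 6 6]

# One Series per group; apply stacks them into a DataFrame with columns a and b
result = df.groupby(level=0).apply(lambda g: pd.Series({
    'a': np.hstack(g['a'].values),
    'b': np.hstack(g['b'].values),
}))

print(result)
```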

Of course, generating the dictionary from the columns, rather than writing out each key by hand, would be nicer...

For n columns, a dict comprehension is better, i.e.:

df.groupby(level=0).apply(lambda g: pd.Series({i: np.hstack(g[i].values) for i in df.columns}))
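The same idea wrapped as a reusable function, sketched with a hypothetical third column c to check that it scales to n columns. The name flatten_by_index is made up for this example; it iterates over g.columns instead of df.columns so the lambda does not depend on a global variable:

```python
import numpy as np
import pandas as pd


def flatten_by_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per index label, hstack each column's cells into one array."""
    return frame.groupby(level=0).apply(
        lambda g: pd.Series({col: np.hstack(g[col].values) for col in g.columns})
    )


df = pd.DataFrame(
    {'a': [np.array([5, 6]), 6, np.array([8, 10]), 7],
     'b': [np.array([7, 8]), 9, np.array([15, 10]), 7],
     'c': [1, np.array([2, 3]), 4, 5]},  # hypothetical extra column
    index=[0, 0, 1, 1],
)

result = flatten_by_index(df)
print(result)
```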
