pandas - Create a difference matrix from a data frame

Question:

I'm trying to create a matrix that shows the differences between the rows of a Pandas data frame.

import pandas as pd

data = {'Country':['GB','JP','US'],'Values':[20.2,-10.5,5.7]}
df = pd.DataFrame(data)

I want this:

  Country  Values
0      GB    20.2
1      JP   -10.5
2      US     5.7

To become something like this (differences going vertically):

  Country     GB     JP     US
0      GB    0.0  -30.7  -14.5
1      JP   30.7    0.0   16.2
2      US   14.5  -16.2    0.0

Is this achievable with a built-in function, or would I need to build a loop to get the desired output? Thanks for your help!

Answer:

This can be done with numpy broadcasting. We access the underlying numpy array with the values attribute, and [:, None] introduces a new axis so that the result is two-dimensional.

You can concatenate this with your original Country Series:

arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out: 
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0
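
To see what the new axis in [:, None] does, here is a minimal sketch using the same three values. The names values, column and diff are only illustrative and do not come from the original answer.

```python
import numpy as np

values = np.array([20.2, -10.5, 5.7])   # shape (3,)
column = values[:, None]                # shape (3, 1): one value per row

# Broadcasting stretches (3,) against (3, 1) into a (3, 3) result,
# where diff[i, j] == values[j] - values[i].
diff = values - column
print(diff.shape)
print(diff.round(1))
```

Each row i holds every value minus the value of row i, which is why the diagonal is zero and the matrix is antisymmetric.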

The array can also be generated with the following, thanks to @Divakar:

import numpy as np
arr = np.subtract.outer(*[df['Values'].values]*2).T

Here we are calling .outer on the subtract ufunc, and it applies the subtraction to all pairs of its inputs.
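
As a rough check that the two approaches agree, the sketch below builds the array both ways and compares them. It assumes the same sample data as the question and converts the column to a plain NumPy array with to_numpy() first. The names vals, by_broadcast, by_outer and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'],
                   'Values': [20.2, -10.5, 5.7]})
vals = df['Values'].to_numpy()  # plain NumPy array of floats

by_broadcast = vals - vals[:, None]
by_outer = np.subtract.outer(vals, vals).T

# Both arrays hold vals[j] - vals[i] at position [i, j].
same = np.array_equal(by_broadcast, by_outer)
print(same)

# Optional variant: label rows as well as columns by country.
result = pd.DataFrame(by_outer, index=df['Country'], columns=df['Country'])
print(result.round(1))
```

Using the country codes as the index is a design choice rather than part of the original answer; it allows lookups such as result.loc['JP', 'GB'] instead of positional access.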
