pandas - Create a difference matrix from a data frame
I'm trying to create a matrix to show the differences between the rows in a Pandas data frame.
import pandas as pd
data = {'Country':['GB','JP','US'],'Values':[20.2,-10.5,5.7]}
df = pd.DataFrame(data)
I want this:
Country Values
0 GB 20.2
1 JP -10.5
2 US 5.7
To become something like this (differences going vertically):
Country GB JP US
0 GB 0.0 -30.7 -14.5
1 JP 30.7 0.0 16.2
2 US 14.5 -16.2 0.0
Is this achievable with a built-in function, or would I need to build a loop to get the desired output? Thanks for your help!
This can be done with numpy broadcasting. We access the underlying numpy array with the values attribute, and [:, None] introduces a new axis, so the subtraction broadcasts to a two-dimensional result in which entry [i, j] is value j minus value i.
You can concat this with your original Series:
arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out:
Country GB JP US
0 GB 0.0 -30.7 -14.5
1 JP 30.7 0.0 16.2
2 US 14.5 -16.2 0.0
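To make the broadcasting step concrete, here is a self-contained sketch using the same sample data that prints the shapes involved. Putting Country on the index instead of concatenating it as a column is just one alternative layout, assumed here for illustration and not part of the original answer.

```python
import numpy as np
import pandas as pd

data = {'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]}
df = pd.DataFrame(data)

v = df['Values'].to_numpy()   # shape (3,)
col = v[:, None]              # shape (3, 1): the new axis turns it into a column
arr = v - col                 # (3,) broadcast against (3, 1) gives (3, 3)

print(v.shape, col.shape, arr.shape)  # (3,) (3, 1) (3, 3)

# arr[i, j] == v[j] - v[i]: column value minus row value
matrix = pd.DataFrame(arr, index=df['Country'], columns=df['Country'])
print(matrix.round(1))
```

Because arr[i, j] = v[j] - v[i], the matrix is antisymmetric with zeros on the diagonal, which is a quick sanity check on any result.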
The array can also be generated with the following, thanks to @Divakar:
import numpy as np
arr = np.subtract.outer(*[df['Values'].to_numpy()] * 2).T  # pass the same NumPy array twice
Here we call .outer on the subtract ufunc, which applies it to every pair of its inputs, so entry [i, j] is Values[i] - Values[j]; the trailing .T transposes this to match the layout above.
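A small sketch checking that the outer-product version and the broadcasting version give identical arrays. It passes plain NumPy arrays rather than the Series itself; the assumption is that recent pandas releases may not support ufunc.outer called directly on a Series, so converting with to_numpy() first is the safer choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['GB', 'JP', 'US'], 'Values': [20.2, -10.5, 5.7]})
v = df['Values'].to_numpy()

# np.subtract.outer(a, b)[i, j] == a[i] - b[j]
outer = np.subtract.outer(v, v)

# Transposing gives v[j] - v[i], the same layout as the broadcasting version
arr_outer = outer.T
arr_broadcast = v - v[:, None]

print(np.array_equal(arr_outer, arr_broadcast))  # True
```

Both approaches perform the same element-wise subtractions, so the results match exactly, not just approximately.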