用Python中的二维数组总结ndarray
我想使用包含在 2d 数组 idx
中的索引来总结一个 3d 数组 dat
.
I want to summarize a 3d array dat
using indices contained in a 2d array idx
.
考虑下面的例子.对于沿 dat[:, :, i]
的每个边距,我想根据某个索引 idx
计算中位数.所需的输出 (out
) 是一个二维数组,其行记录索引,列记录边距.以下代码有效,但效率不高.有什么建议吗?
Consider the example below. For each margin along dat[:, :, i]
, I want to compute the median according to some index idx
. The desired output (out
) is a 2d array, whose rows record the index and columns record the margin. The following code works but is not very efficient. Any suggestions?
import numpy as np
dat = np.arange(12).reshape(2, 2, 3)
idx = np.array([[0, 0], [1, 2]])
out = np.empty((3, 3))
for i in np.unique(idx):
out[i,] = np.median(dat[idx==i], axis = 0)
print(out)
输出:
[[ 1.5 2.5 3.5]
[ 6. 7. 8. ]
[ 9. 10. 11. ]]
为了更好地可视化问题,我将数组的 2x2 维称为行和列,将 3 维称为深度.我将沿第 3 维的向量称为像素"(像素的长度为 3),沿前两个维度的平面称为通道".
To visualize the problem better, I will refer to the 2x2 dimensions of the array as the rows and columns, and the 3 dimension as depth. I will refer to vectors along the 3rd dimension as "pixels" (pixels have length 3), and planes along the first two dimensions as "channels".
您的循环正在累积由掩码 idx == i
选择的一组像素,并取该组中每个通道的中值.结果是一个 Nx3 数组,其中 N 是您拥有的不同事件的数量.
Your loop is accumulating a set of pixels selected by the mask idx == i
, and taking the median of each channel within that set. The result is an Nx3 array, where N is the number of distinct incides that you have.
有一天,广义ufuncs将在 numpy 中无处不在,并且 np.median
就是这样一个函数.那天,你将可以使用reduceat
magic1 做类似的事情
One day, generalized ufuncs will be ubiquitous in numpy, and np.median
will be such a function. On that day, you will be able to use reduceat
magic1 to do something like
unq, ind = np.unique(idx, return_inverse=True)
np.median.reduceat(dat.reshape(-1, dat.shape[-1]), np.r_[0, np.where(np.diff(unq[ind]))[0]+1])
1请参阅将操作应用于 numpy 数组的不均匀分割部分以了解更多信息有关特定魔法类型的信息.
1 See Applying operation to unevenly split portions of numpy array for more info on the specific type of magic.
由于目前无法实现,您可以使用 scipy.ndimage.median
代替.此版本允许您计算数组中一组标记区域的中位数,这正是您使用 idx
所拥有的.此方法假定您的索引数组包含 N 个密集打包的值,所有这些值都在 range(N)
内.否则整形操作将无法正常进行.
Since this is not currently possible, you can use scipy.ndimage.median
instead. This version allows you to compute medians over a set of labeled areas in an array, which is exactly what you have with idx
. This method assumes that your index array contains N densely packed values, all of which are in range(N)
. Otherwise the reshaping operations will not work properly.
如果不是这种情况,请从转换 idx
开始:
If that is not the case, start by transforming idx
:
_, ind = np.unique(idx, return_inverse=True)
idx = ind.reshape(idx.shape)
或
idx = np.unique(idx, return_inverse=True)[1].reshape(idx.shape)
由于您实际上是为每个区域和通道计算单独的中值,因此您需要为每个通道设置一组标签.充实 idx
以便为每个通道设置一组不同的索引:
Since you are actually computing a separate median for each region and channel, you will need to have a set of labels for each channel. Flesh out idx
to have a distinct set of indices for each channel:
chan = dat.shape[-1]
offset = idx.max() + 1
index = np.stack([idx + i * offset for i in range(chan)], axis=-1)
现在 index
在每个通道中定义了一组相同的区域,您可以在 scipy.ndimage.median
中使用:
Now index
has an identical set of regions defined in each channel, which you can use in scipy.ndimage.median
:
out = scipy.ndimage.median(dat, index, index=range(offset * chan)).reshape(chan, offset).T
输入标签必须从零到 offset * chan
密集打包,index=range(offset * chan)
才能正常工作,并且 reshape
操作以获得正确数量的元素.最后的转置只是标签排列方式的产物.
The input labels must be densely packed from zero to offset * chan
for index=range(offset * chan)
to work properly, and the reshape
operation to have the right number of elements. The final transpose is just an artifact of how the labels are arranged.
这是完整的产品,以及结果的 IDEOne 演示:
Here is the complete product, along with an IDEOne demo of the result:
import numpy as np
from scipy.ndimage import median
dat = np.arange(12).reshape(2, 2, 3)
idx = np.array([[0, 0], [1, 2]])
def summarize(dat, idx):
idx = np.unique(idx, return_inverse=True)[1].reshape(idx.shape)
chan = dat.shape[-1]
offset = idx.max() + 1
index = np.stack([idx + i * offset for i in range(chan)], axis=-1)
return median(dat, index, index=range(offset * chan)).reshape(chan, offset).T
print(summarize(dat, idx))