Converting a 1D array with coordinates to a 2D array in numpy

Problem description:

I have an array of values arr with shape (N,) and an array of coordinates coords with shape (2,N), where coords[0] holds the row indices and coords[1] the column indices. I want to represent this as an (M,M) array grid such that grid is 0 at coordinates that do not appear in coords, and at each coordinate that does appear it stores the sum of all values in arr that have that coordinate. So if M=3, arr = np.arange(4)+1, and coords = np.array([[0,0,1,2],[0,0,2,2]]), then grid should be:

array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
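
To pin down the intended semantics, here is a slow but unambiguous loop version. It is a minimal sketch that assumes coords has shape (2, N) as in the example; the name accumulate_loop is hypothetical. It is useful as a correctness reference for the vectorized solutions, not for speed.

```python
import numpy as np

def accumulate_loop(coords, arr, M):
    """Reference version: grid[r, c] = sum of arr[i] over all i with coords[:, i] == (r, c)."""
    grid = np.zeros((M, M))
    # coords is assumed to be (2, N): iterate over its columns as (row, col) pairs
    for (r, c), value in zip(coords.T, arr):
        grid[r, c] += value
    return grid

M = 3
arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])
print(accumulate_loop(coords, arr, M))
```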

This is nontrivial because I need to repeat this step many times, the values in arr change each time, and so can the coordinates. Ideally I am looking for a vectorized solution. I suspect I might be able to use np.where somehow, but it's not immediately obvious how.
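
The core difficulty is repeated coordinates. A minimal sketch of the obvious vectorized attempt shows why: with buffered fancy-index assignment, NumPy applies a repeated index only once, so values sharing a coordinate are not summed (the NumPy documentation for np.add.at describes this buffering behaviour).

```python
import numpy as np

arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])

grid = np.zeros((3, 3))
# Buffered fancy-index update: (0, 0) appears twice, but only one of its
# two values ends up in the cell instead of their sum.
grid[tuple(coords)] += arr
print(grid)
```

Cells with a single contributing value come out right; only the duplicated cell (0, 0) is wrong, which is exactly the case the accumulation-based answers below handle.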

Timing the solutions

I have timed the solutions posted so far. The np.bincount method (accumulate_arr) appears slightly faster than the sparse matrix method, while the np.add.at method (accumulate_arr_v2) is the slowest, for the reasons explained in the comments. Each %timeit loop below runs a method 100 times on fresh random data:

import numpy as np
from scipy import sparse

%timeit for x in range(100): accumulate_arr(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): accumulate_arr_v2(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): sparse.coo_matrix((np.random.normal(0,1,10000),np.random.randint(100,size=(2,10000))),(100,100)).toarray()
47.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
103 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
48.2 ms ± 36 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
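
The loops above also time the np.random data generation on every iteration. A sketch of a harness that generates the inputs once, so that only the accumulation step is measured, is below. It uses the standard-library timeit module and NumPy's default_rng; the seed and sizes are arbitrary choices, and the two functions are copied from the answer so the snippet runs on its own. No timings are claimed here, since they depend on hardware and NumPy version.

```python
import timeit
import numpy as np

def accumulate_arr(coords, arr):
    m, n = coords.max(1) + 1
    lidx = np.ravel_multi_index(tuple(coords), (m, n))
    return np.bincount(lidx, arr, minlength=m * n).reshape(m, n)

def accumulate_arr_v2(coords, arr):
    m, n = coords.max(1) + 1
    out = np.zeros((m, n), dtype=arr.dtype)
    np.add.at(out, tuple(coords), arr)
    return out

rng = np.random.default_rng(0)
# Inputs are generated once, outside the timed region
coords = rng.integers(100, size=(2, 10000))
arr = rng.normal(0, 1, 10000)

for fn in (accumulate_arr, accumulate_arr_v2):
    total = timeit.timeit(lambda: fn(coords, arr), number=100)
    print(f"{fn.__name__}: {total / 100 * 1e6:.1f} µs per call")
```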


Using np.bincount -

import numpy as np

def accumulate_arr(coords, arr):
    # Infer the output shape from the largest row and column index
    m,n = coords.max(1)+1

    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index(coords, (m,n))
    # Or lidx = coords[0]*(coords[1].max()+1) + coords[1]

    # Accumulate arr with IDs from lidx
    return np.bincount(lidx,arr,minlength=m*n).reshape(m,n)
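
accumulate_arr infers the grid shape from the largest coordinate present, so if no point falls in the last row or column the result is smaller than (M, M). Since the question asks for a fixed (M, M) grid across repeated calls, a variant that takes M explicitly may be preferable. This is a sketch; the name accumulate_arr_fixed is hypothetical, and it assumes every coordinate lies in [0, M).

```python
import numpy as np

def accumulate_arr_fixed(coords, arr, M):
    """Like accumulate_arr, but always returns an (M, M) grid.

    ravel_multi_index raises ValueError if any coordinate is outside [0, M).
    """
    lidx = np.ravel_multi_index(tuple(coords), (M, M))
    # minlength guarantees M*M bins even if the largest index is smaller
    return np.bincount(lidx, weights=arr, minlength=M * M).reshape(M, M)

# Points only in the top-left corner still yield a 4x4 grid
print(accumulate_arr_fixed(np.array([[0, 1], [0, 1]]), np.array([5.0, 7.0]), 4))
```

If the coordinates stay the same between iterations, lidx can be computed once and only the np.bincount call repeated with the new values.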

Sample run -

In [58]: arr
Out[58]: array([1, 2, 3, 4])

In [59]: coords
Out[59]: 
array([[0, 0, 1, 2],
       [0, 0, 2, 2]])

In [60]: accumulate_arr(coords, arr)
Out[60]: 
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
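
To see the intermediate steps for this sample: the coordinate pairs are first flattened to row-major linear indices (row * n + col, with n = 3 here), and np.bincount then sums the arr weights that share each index before the result is reshaped to 3x3.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
coords = np.array([[0, 0, 1, 2],
                   [0, 0, 2, 2]])

# Row-major linear index on a 3x3 grid: row * 3 + col
lidx = np.ravel_multi_index(tuple(coords), (3, 3))
print(lidx)   # [0 0 5 8]

# Weights with the same linear index are summed into one bin
flat = np.bincount(lidx, weights=arr, minlength=9)
print(flat)   # [3. 0. 0. 0. 0. 3. 0. 0. 4.]
```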

Another approach along similar lines uses np.add.at, which may be easier to follow -

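def accumulate_arr_v2(coords, arr):
    # Infer the output shape from the largest row and column index
    m,n = coords.max(1)+1
    out = np.zeros((m,n), dtype=arr.dtype)
    # Unbuffered in-place add: every repeated coordinate is accumulated
    np.add.at(out, tuple(coords), arr)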
    return out
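
One practical difference between the two versions: np.bincount with weights always returns float64, while the np.add.at version keeps the dtype of arr, so integer inputs give an integer grid. This is why the sample output of accumulate_arr above is shown as floats. A small standalone check on the sample data:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])

# np.add.at accumulates into an output that keeps arr's integer dtype
out = np.zeros((3, 3), dtype=arr.dtype)
np.add.at(out, tuple(coords), arr)

# np.bincount with weights produces float64 regardless of arr's dtype
flat = np.bincount(np.ravel_multi_index(tuple(coords), (3, 3)), weights=arr, minlength=9)

print(out.dtype, flat.dtype)
print(np.array_equal(out, flat.reshape(3, 3)))  # True
```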