Convert a 1D array with coordinates into a 2D array in numpy
I have an array of values arr with shape (N,) and an array of coordinates coords with shape (2,N). I want to represent this in an (M,M) array grid such that grid takes the value 0 at coordinates that are not in coords, and, for the coordinates that are included, stores the sum of all values in arr that have that coordinate. So if M=3, arr = np.arange(4)+1, and coords = np.array([[0,0,1,2],[0,0,2,2]]), then grid should be:
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
The reason this is nontrivial is that I need to repeat this step many times, and the values in arr change each time, as can the coordinates. Ideally I am looking for a vectorized solution. I suspect that I might be able to use np.where somehow, but it's not immediately obvious how.
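For reference, the requested behaviour can be pinned down with a plain Python loop. This is only a slow baseline sketch for checking vectorized answers against, not a proposed solution. The name accumulate_loop and the explicit M argument are assumptions made for illustration, and coords is taken to be laid out as in the example (row indices in coords[0], column indices in coords[1]).

```python
import numpy as np

def accumulate_loop(coords, arr, M):
    # Hypothetical helper: slow reference implementation of the requested grid.
    grid = np.zeros((M, M))
    for r, c, v in zip(coords[0], coords[1], arr):
        grid[r, c] += v  # values sharing a coordinate are summed
    return grid

arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])
grid = accumulate_loop(coords, arr, 3)
print(grid)
```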
Timing the solutions
I have timed the solutions posted so far, and it appears that the accumulator method is slightly faster than the sparse matrix method, with the second accumulation method being the slowest, for the reasons explained in the comments:
%timeit for x in range(100): accumulate_arr(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): accumulate_arr_v2(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): sparse.coo_matrix((np.random.normal(0,1,10000),np.random.randint(100,size=(2,10000))),(100,100)).A
47.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
103 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
48.2 ms ± 36 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
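Note that the measurements above regenerate the random inputs inside every timed iteration, so part of each figure is the cost of np.random. A minimal sketch of a variant that builds the inputs once and times only the accumulation might look like the following. The function names, the explicit shape argument, and the use of the standard-library timeit module are assumptions for illustration; the sparse matrix variant is left out to keep the sketch NumPy-only, and no timings are claimed for it.

```python
import timeit
import numpy as np

def accumulate_bincount(coords, arr, shape):
    # Flatten (row, col) pairs to linear indices, then sum weights per index.
    lidx = np.ravel_multi_index(coords, shape)
    return np.bincount(lidx, arr, minlength=shape[0] * shape[1]).reshape(shape)

def accumulate_add_at(coords, arr, shape):
    # Unbuffered in-place addition, so repeated coordinates accumulate.
    out = np.zeros(shape, dtype=float)
    np.add.at(out, tuple(coords), arr)
    return out

rng = np.random.default_rng(0)
shape = (100, 100)
coords = rng.integers(0, 100, size=(2, 10000))
arr = rng.normal(0, 1, 10000)

# Sanity check: both approaches should produce the same grid.
same = np.allclose(accumulate_bincount(coords, arr, shape),
                   accumulate_add_at(coords, arr, shape))

for fn in (accumulate_bincount, accumulate_add_at):
    t = timeit.timeit(lambda: fn(coords, arr, shape), number=100)
    print(f"{fn.__name__}: {t * 1e3:.1f} ms per 100 calls")
```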
Using np.bincount -
import numpy as np

def accumulate_arr(coords, arr):
    # Get output array shape
    m,n = coords.max(1)+1
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index(coords, (m,n))
    # Or lidx = coords[0]*(coords[1].max()+1) + coords[1]
    # Accumulate arr with IDs from lidx
    return np.bincount(lidx,arr,minlength=m*n).reshape(m,n)
Sample run -
In [58]: arr
Out[58]: array([1, 2, 3, 4])
In [59]: coords
Out[59]:
array([[0, 0, 1, 2],
       [0, 0, 2, 2]])
In [60]: accumulate_arr(coords, arr)
Out[60]:
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
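To make the intermediate steps of the bincount approach visible, here is a short sketch that prints the linear indices and the flat accumulated vector for the same inputs. The one deliberate difference from the function above is that the output shape is passed in rather than derived from coords.max(1)+1; that is an assumption about how one might call it when M is known, and it keeps the grid at (M,M) even when no point falls in the last row or column.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
coords = np.array([[0, 0, 1, 2],
                   [0, 0, 2, 2]])
shape = (3, 3)  # assumed known up front instead of coords.max(1) + 1

# Each (row, col) pair becomes row * n_cols + col.
lidx = np.ravel_multi_index(coords, shape)
print(lidx)  # [0 0 5 8]

# Sum the values of arr that share a linear index; pad to the full grid size.
flat = np.bincount(lidx, weights=arr, minlength=shape[0] * shape[1])
print(flat)  # [3. 0. 0. 0. 0. 3. 0. 0. 4.]

grid = flat.reshape(shape)
```

Because bincount is given weights, the result is a float array even when arr holds integers, which matches the float output in the sample run.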
Another with np.add.at, along similar lines, which might be easier to follow -
def accumulate_arr_v2(coords, arr):
    m,n = coords.max(1)+1
    out = np.zeros((m,n), dtype=arr.dtype)
    np.add.at(out, tuple(coords), arr)
    return out
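It may help to see why np.add.at is used here rather than a plain indexed +=. With fancy indexing, out[idx] += arr is buffered, so a coordinate that appears more than once is written once instead of being accumulated. The sketch below contrasts the two on the inputs from the question; the variable names buffered and unbuffered are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
coords = np.array([[0, 0, 1, 2],
                   [0, 0, 2, 2]])

# Buffered: the repeated (0, 0) coordinate is not summed.
buffered = np.zeros((3, 3))
buffered[tuple(coords)] += arr
print(buffered[0, 0])  # not 3.0: one of the duplicate contributions is lost

# Unbuffered: every occurrence of a coordinate contributes.
unbuffered = np.zeros((3, 3))
np.add.at(unbuffered, tuple(coords), arr)
print(unbuffered[0, 0])  # 3.0
```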