Stacking NumPy arrays of different lengths using padding
a = np.array([1,2,3])
b = np.array([4,5])
l = [a,b]
I want a function stack_padding such that:
assert np.array_equal(stack_padding(l), np.array([[1,2,3],[4,5,0]]))
Is there a standard way in numpy of achieving this?
l may have more elements.
If you don't want to use itertools and column_stack, numpy.ndarray.resize will also do the job: it resizes an array in place and fills any new entries with zeros. As mentioned by jtweeder, you just need to know the resulting size of each row. The advantage of using resize is that a numpy.ndarray is contiguous in memory. Resizing is faster when the rows differ a lot in size. The performance difference between the two approaches is observable.
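As a small illustration of the padding behaviour this answer relies on, the sketch below resizes a single array. It also contrasts the method numpy.ndarray.resize with the free function np.resize, which fills differently. The variable names are only for illustration.

```python
import numpy as np

a = np.array([4, 5])

# ndarray.resize works in place and fills the new trailing entries with zeros.
padded = np.array(a)   # copy, so the original array is left untouched
padded.resize(3)

# The free function np.resize instead repeats the input to fill the new shape.
repeated = np.resize(a, 3)

print(padded)    # [4 5 0]
print(repeated)  # [4 5 4]
```

Only the in-place method gives zero padding, which is why the code below copies each row and calls resize on the copy.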
import numpy as np
import timeit
import itertools

def stack_padding(it):
    def resize(row, size):
        # Copy the row, then zero-pad the copy in place to the target length
        new = np.array(row)
        new.resize(size)
        return new

    # find longest row length
    row_length = max(len(row) for row in it)
    mat = np.array([resize(row, row_length) for row in it])
    return mat

def stack_padding1(l):
    # column_stack expects a sequence, so materialize the zip_longest iterator
    return np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))

if __name__ == "__main__":
    n_rows = 200
    row_lengths = np.random.randint(30, 50, size=n_rows)
    mat = [np.random.randint(0, 100, size=s) for s in row_lengths]

    def test_stack_padding():
        stack_padding(mat)

    def test_itertools():
        stack_padding1(mat)

    t1 = timeit.timeit(test_stack_padding, number=1000)
    t2 = timeit.timeit(test_itertools, number=1000)
    print('With ndarray.resize: ', t1)
    print('With itertool and vstack: ', t2)
The resize method wins in the above comparison:
>>> With ndarray.resize: 0.30080295499647036
>>> With itertool and vstack: 1.0151802329928614
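To tie the answer back to the question, here is a minimal self-contained check that both approaches give the requested result on the question's example. The function names pad_with_resize and pad_with_zip_longest are hypothetical stand-ins for the two functions above, and np.array_equal is used because a plain == on arrays cannot be passed directly to assert.

```python
import itertools
import numpy as np

def pad_with_resize(rows):
    # Hypothetical name; mirrors the resize-based approach above.
    width = max(len(row) for row in rows)
    out = []
    for row in rows:
        new = np.array(row)   # copy so the caller's array is not modified
        new.resize(width)     # zero-pads the copy in place
        out.append(new)
    return np.array(out)

def pad_with_zip_longest(rows):
    # Hypothetical name; mirrors the itertools/column_stack approach above.
    return np.column_stack(list(itertools.zip_longest(*rows, fillvalue=0)))

l = [np.array([1, 2, 3]), np.array([4, 5])]
expected = np.array([[1, 2, 3], [4, 5, 0]])

assert np.array_equal(pad_with_resize(l), expected)
assert np.array_equal(pad_with_zip_longest(l), expected)
```

This only checks correctness on a tiny input. It says nothing about speed, for which the timings above are the reference.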