Stacking NumPy arrays of different lengths using padding

Question:

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

I would like a function stack_padding such that:

assert np.array_equal(stack_padding(l), np.array([[1,2,3],[4,5,0]]))

Is there a standard way in numpy of achieving this?

l may have many more elements.

Answer:

If you don't want to use itertools and column_stack, numpy.ndarray.resize also does the job. As jtweeder mentioned, you only need to know the resulting size of each row. The advantage of resize is that a numpy.ndarray is contiguous in memory, and enlarging one with resize fills the new entries with zeros. Resizing is faster when the rows differ a lot in size, and the performance difference between the two approaches is measurable.

import numpy as np
import timeit
import itertools

def stack_padding(it):

    def resize(row, size):
        new = np.array(row)
        new.resize(size)
        return new

    # find longest row length
    row_length = max(len(row) for row in it)
    mat = np.array( [resize(row, row_length) for row in it] )

    return mat

def stack_padding1(l):
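    # zip_longest yields one padded tuple per column; column_stack needs a
    # sequence (list or tuple), not a bare iterator
    return np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))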


if __name__ == "__main__":
    n_rows = 200
    row_lengths = np.random.randint(30, 50, size=n_rows)
    mat = [np.random.randint(0, 100, size=s) for s in row_lengths]

    def test_stack_padding():
        global mat
        stack_padding(mat)

    def test_itertools():
        global mat
        stack_padding1(mat)

    t1 = timeit.timeit(test_stack_padding, number=1000)
    t2 = timeit.timeit(test_itertools, number=1000)
    print('With ndarray.resize: ', t1)
    print('With itertools and column_stack: ', t2)

The resize method wins in the above comparison:

With ndarray.resize:  0.30080295499647036
With itertools and column_stack:  1.0151802329928614

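As a quick sanity check, both approaches can be run on the two arrays from the question. This is only a minimal sketch: the names pad_with_resize and pad_with_zip_longest are made up for illustration, and passing refcheck=False to resize is my own assumption, added so the call does not fail in interactive sessions that hold extra references to the array.

```python
import itertools
import numpy as np

def pad_with_resize(rows):
    # Hypothetical helper mirroring the resize approach:
    # copy each row, then grow the copy in place.
    width = max(len(row) for row in rows)
    padded = []
    for row in rows:
        new = np.array(row)                 # fresh copy that owns its data
        new.resize(width, refcheck=False)   # new trailing entries are zeros
        padded.append(new)
    return np.array(padded)

def pad_with_zip_longest(rows):
    # Hypothetical helper mirroring the itertools approach:
    # zip_longest yields one tuple per column, padded with fillvalue.
    columns = list(itertools.zip_longest(*rows, fillvalue=0))
    return np.column_stack(columns)

a = np.array([1, 2, 3])
b = np.array([4, 5])
expected = np.array([[1, 2, 3], [4, 5, 0]])

result_resize = pad_with_resize([a, b])
result_zip = pad_with_zip_longest([a, b])

# Compare whole arrays; a plain == would give an element-wise result.
print(np.array_equal(result_resize, expected))
print(np.array_equal(result_zip, expected))
```

Note that the comparison uses np.array_equal, since asserting on an element-wise == between arrays raises a ValueError about ambiguous truth values.
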