Computing a subset of a matrix multiplication

Problem description:

When I have two non-sparse matrices A and B, is there a way to efficiently calculate C=A.T.dot(B) when I only want a subset of the elements of C? I have the desired indices of C stored in CSC format which is specified here.
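To make the goal concrete: each wanted entry C[i, j] is just the dot product of column i of A with column j of B. A minimal baseline sketch, looping over the coordinates in Python, might look like the following. The function name `subset_product_loop` and the list-of-tuples input are assumptions made for illustration, not something from the question.

```python
import numpy as np

def subset_product_loop(A, B, coords):
    """Return C[i, j] for C = A.T.dot(B), only at the given (i, j) pairs."""
    # C[i, j] is the dot product of column i of A with column j of B.
    return np.array([np.dot(A[:, i], B[:, j]) for i, j in coords])

A = np.arange(12).reshape(3, 4)
B = np.arange(15).reshape(3, 5)
coords = [(0, 0), (1, 2), (2, 4), (3, 3)]
print(subset_product_loop(A, B, coords))  # [100 145 202 208]
```

This never forms the full product, but it pays Python-level overhead for every requested entry.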

Instead of iterating on the coordinates using Python (GaryBishop's answer), you can have numpy do the looping, which constitutes a substantial speed-up (timings below):

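import numpy as np

def sparse_mult(a, b, coords):
    # Split the (row, col) pairs into two parallel sequences.
    rows, cols = zip(*coords)
    # Deduplicate; r_idx/c_idx map each request back into the unique arrays.
    rows, r_idx = np.unique(rows, return_inverse=True)
    cols, c_idx = np.unique(cols, return_inverse=True)
    # Multiply only the needed rows of a by the needed columns of b.
    C = np.dot(a[rows, :], b[:, cols])
    return C[r_idx, c_idx]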

>>> A = np.arange(12).reshape(3, 4)
>>> B = np.arange(15).reshape(3, 5)
>>> np.dot(A.T, B)
array([[100, 112, 124, 136, 148],
       [115, 130, 145, 160, 175],
       [130, 148, 166, 184, 202],
       [145, 166, 187, 208, 229]])
>>> sparse_mult(A.T, B, [(0, 0), (1, 2), (2, 4), (3, 3)])
array([100, 145, 202, 208])

sparse_mult returns a flattened array of the values at the coordinates you provide as a list of tuples. I am not very familiar with sparse matrix formats, so I don't know how to define CSC from the above data, but the following works:

>>> from scipy import sparse
>>> coords = [(0, 0), (1, 2), (2, 4), (3, 3)]
>>> sparse.coo_matrix((sparse_mult(A.T, B, coords), tuple(zip(*coords)))).tocsc()
<4x5 sparse matrix of type '<type 'numpy.int32'>'
    with 4 stored elements in Compressed Sparse Column format>
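If the wanted positions already exist as raw CSC arrays, the coordinates can be recovered without building a list of tuples. In CSC, `indices` holds the row of each stored entry and `indptr[j]:indptr[j+1]` delimits the entries of column j. The sketch below assumes those two arrays are available as plain sequences. The function name `product_on_csc_pattern` is hypothetical, and it uses `np.einsum` for per-pair dot products rather than the `sparse_mult` function above.

```python
import numpy as np

def product_on_csc_pattern(A, B, indices, indptr):
    """Values of C = A.T.dot(B) at the positions of a CSC pattern, in CSC order."""
    indices = np.asarray(indices)
    indptr = np.asarray(indptr)
    rows = indices
    # Column j owns indptr[j+1] - indptr[j] entries, so repeat j that many times.
    cols = np.repeat(np.arange(len(indptr) - 1), np.diff(indptr))
    # A[:, rows] and B[:, cols] are both (k, n); sum over k for each pair n.
    return np.einsum('kn,kn->n', A[:, rows], B[:, cols])

A = np.arange(12).reshape(3, 4)
B = np.arange(15).reshape(3, 5)
# CSC pattern of the 4x5 example: entries at (0, 0), (1, 2), (3, 3), (2, 4).
indices = [0, 1, 3, 2]
indptr = [0, 1, 1, 2, 3, 4]
vals = product_on_csc_pattern(A, B, indices, indptr)
print(vals)  # [100 145 208 202]
```

Because the values come back in the same order as `indices`, they should be usable directly as the data array of the result, for example with `scipy.sparse.csc_matrix((vals, indices, indptr), shape=(4, 5))`.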

Here are timings for the various options:

>>> import timeit
>>> a = np.random.rand(2000, 3000)
>>> b = np.random.rand(3000, 5000)
>>> timeit.timeit('np.dot(a,b)[[0, 0, 1999, 1999], [0, 4999, 0, 4999]]', 'from __main__ import np, a, b', number=1)
5.848562187263569
>>> timeit.timeit('sparse_mult(a, b, [(0, 0), (0, 4999), (1999, 0), (1999, 4999)])', 'from __main__ import np, a, b, sparse_mult', number=1)
0.0018596387374678613
>>> np.dot(a,b)[[0, 0, 1999, 1999], [0, 4999, 0, 4999]]
array([ 758.76351111,  750.32613815,  751.4614542 ,  758.8989648 ])
>>> sparse_mult(a, b, [(0, 0), (0, 4999), (1999, 0), (1999, 4999)])
array([ 758.76351111,  750.32613815,  751.4614542 ,  758.8989648 ])
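One caveat worth keeping in mind: sparse_mult computes the full block of unique rows times unique columns, so if the requested coordinates touch many different rows and columns, that block can approach the size of the whole product. A possible variant, sketched here under the assumption that per-pair dot products are acceptable, uses `np.einsum` to compute exactly one value per coordinate. The name `sparse_mult_pairs` is made up for this example, and no timings were measured for it.

```python
import numpy as np

def sparse_mult_pairs(a, b, coords):
    """Return np.dot(a, b)[i, j] for each (i, j) in coords, one dot product per pair."""
    rows, cols = (np.asarray(ix) for ix in zip(*coords))
    # a[rows] has shape (n, k) and b[:, cols] has shape (k, n);
    # 'nk,kn->n' pairs row n with column n and sums over k.
    return np.einsum('nk,kn->n', a[rows], b[:, cols])

A = np.arange(12).reshape(3, 4)
B = np.arange(15).reshape(3, 5)
coords = [(0, 0), (1, 2), (2, 4), (3, 3)]
print(sparse_mult_pairs(A.T, B, coords))  # [100 145 202 208]
```

The trade-off is that rows or columns shared between coordinates are gathered once per pair, so when the coordinates cluster on a few rows and columns, the block approach above may well be the better choice.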