为什么scipy csr矩阵的行索引比numpy数组慢
我不确定自己在做什么错,但似乎索引scipy csr_matrix
的行比numpy数组慢大约2倍(请参见下面的代码).
I'm not sure what I am doing wrong but it seems that row indexing a scipy csr_matrix
is approximately 2 folds slower compared to numpy arrays (see code below).
csr矩阵的行索引是否应该比密集矩阵快,因为像下面的情况一样,只有很少的非零元素被提取出来?
Shouldn't row indexing of csr matrices be faster than dense matrices because only few non-zero elements are extracted like in the case below ?
是否有技巧使scipy csr矩阵的行索引更快?
Are there tricks to make row indexing faster for scipy csr matrices ?
import numpy as np
import timeit
from scipy.sparse import csr_matrix
# Generate random matrix
A = np.random.rand(5000, 1000)
# Make A sparse
A[:, 4:] =0
# Create sparse matrix
A_sparse = csr_matrix(A)
# Create row indexing functions
def index_A_dense():
A[4]
def index_A_dense_copy():
a = A[4].copy()
def index_A_sparse():
A_sparse[4]
timeit.timeit(index_A_sparse, number=10000)
Out: 0.6668063667304978
timeit.timeit(index_A_dense, number=10000)
Out: 0.0029007720424942818
timeit.timeit(index_A_dense_copy, number=10000)
Out: 0.00979283193282754
提前谢谢!
我在下面演示的简短答案是,构造新的稀疏矩阵非常昂贵.开销很大,不依赖于行数或特定行中非零元素的数量.
The short answer that I demonstrate below is that constructing a new sparse matrix is expensive. There's a significant overhead that is not dependent on the number of rows or the number on nonzero elements in a particular row.
稀疏矩阵的数据表示形式与密集数组的数据表示形式完全不同.数组将数据存储在一个连续的缓冲区中,并有效地使用shape
和strides
遍历选定的值.这些值加上索引定义了将在缓冲区中找到数据的确切位置.将这些N
个字节从一个位置复制到另一个字节是整个操作中相对较小的部分.
Data representation for sparse matrices is quite different from that for dense array. Arrays store the data in one one contiguous buffer, and efficiently use the shape
and strides
to iterate over selected values. Those values, plus the index, define exactly were in the buffer it will find the data. Copying those N
bytes from location to another is a relatively minor part of the whole operation.
稀疏矩阵将数据存储在几个包含索引和数据的数组(或其他结构)中.然后,选择一行需要查找相关索引,并使用选定的索引和数据构建一个新的稀疏矩阵.稀疏包中有已编译的代码,但是底层代码不如numpy数组那么多.
A sparse matrix stores the data in several arrays (or other structures), containing the indexes and the data. Selecting a row then requires looking up the relevant indices, and constructing a new sparse matrix with selected indices and data. There is compiled code in the sparse package, but not nearly as much low level code as with numpy arrays.
为了说明这一点,我将制作一个小的矩阵,而不是那么密集,所以我们没有很多空行:
To illustrate I'll make small matrix, and not so dense, so we don't have a lot of empty rows:
In [259]: A = (sparse.rand(5,5,.4,'csr')*20).floor()
In [260]: A
Out[260]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
密集的等价物,以及一行副本:
The dense equivalent, and a row copy:
In [262]: Ad=A.A
In [263]: Ad
Out[263]:
array([[ 0., 0., 0., 0., 10.],
[ 0., 0., 0., 0., 0.],
[ 17., 16., 14., 19., 6.],
[ 0., 0., 1., 0., 0.],
[ 14., 0., 9., 0., 0.]])
In [264]: Ad[4,:]
Out[264]: array([ 14., 0., 9., 0., 0.])
In [265]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.58 µs per loop
矩阵行:
In [266]: A[4,:]
Out[266]:
<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
查看此csr
矩阵的3个1d数组的数据表示形式:
Look at the data representation for this csr
matrix, 3 1d arrays:
In [267]: A.data
Out[267]: array([ 0., 10., 17., 16., 14., 19., 6., 1., 14., 9.])
In [268]: A.indices
Out[268]: array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=int32)
In [269]: A.indptr
Out[269]: array([ 0, 2, 2, 7, 8, 10], dtype=int32)
以下是选择行的方式(但在已编译的代码中):
Here's how the row is selected (but in compiled code):
In [270]: A.indices[A.indptr[4]:A.indptr[5]]
Out[270]: array([0, 2], dtype=int32)
In [271]: A.data[A.indptr[4]:A.indptr[5]]
Out[271]: array([ 14., 9.])
行"是另一个稀疏矩阵,具有相同类型的数据数组:
The 'row' is another sparse matrix, with the same sort of data arrays:
In [272]: A[4,:].indptr
Out[272]: array([0, 2])
In [273]: A[4,:].indices
Out[273]: array([0, 2])
In [274]: timeit A[4,:]
是的,稀疏矩阵的时序很慢.我不知道实际选择数据要花费多少时间,以及构造新矩阵要花费多少时间.
Yes, timing for the sparse matrix is slow. I don't know how much of the time is spent in actually selecting the data, and how much is spent constructing the new matrix.
10000 loops, best of 3: 145 µs per loop
In [275]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.56 µs per loop
lil
格式可能更容易理解,因为数据和索引存储在子列表中,每行一个.
lil
format may easier to understand, since the data and indices are stored in sublists, one per row.
In [276]: Al=A.tolil()
In [277]: Al.data
Out[277]: array([[0.0, 10.0], [], [17.0, 16.0, 14.0, 19.0, 6.0], [1.0], [14.0, 9.0]], dtype=object)
In [278]: Al.rows
Out[278]: array([[3, 4], [], [0, 1, 2, 3, 4], [2], [0, 2]], dtype=object)
In [279]: Al[4,:].data
Out[279]: array([[14.0, 9.0]], dtype=object)
In [280]: Al[4,:].rows
Out[280]: array([[0, 2]], dtype=object)
这样的速度比较在处理紧密的编译代码时是有意义的,在这种情况下,将字节从内存的一部分移动到另一个内存是大量的时间消耗者.在numpy
和scipy
中混合使用Python和已编译的代码,您不能仅仅计算O(n)
操作.
Speed comparisons like this make some sense when dealing with tight compiled code, where movements for bytes from part of memory to another are significant time consumers. With the mix of Python and compiled code in numpy
and scipy
you can't just count O(n)
operations.
============================
=============================
这里是从A
中选择一行所需的时间,以及返回一个新的稀疏矩阵所需的时间:
Here's an estimate of time it takes to selected a row from A
, and time it takes to return a new sparse matrix:
只需获取数据:
In [292]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
.....:
100000 loops, best of 3: 4.92 µs per loop
加上制作矩阵所需的时间:
plus the time it takes to make a matrix:
In [293]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.csr_matrix((d1,([0,0],i1)),shape=(1,5))
.....:
1000 loops, best of 3: 445 µs per loop
尝试使用更简单的coo
矩阵
In [294]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.coo_matrix((d1,([0,0],i1)),shape=(1,5))
.....:
10000 loops, best of 3: 135 µs per loop