What is the recommended way to iterate over the rows of a matrix?
Given a matrix m = [10i+j for i=1:3, j=1:4], I can iterate over its rows by slicing the matrix:
for i = 1:size(m, 1)
    print(m[i, :])
end
Is this the only possibility? Is it the recommended way?
And what about comprehensions? Is slicing the only way to iterate over the rows of a matrix there too?
[ sum(m[i,:]) for i=1:size(m,1) ]
The solution you listed yourself, as well as mapslices, both work fine. But if by "recommended" you really mean "high-performance", then the best answer is: don't iterate over rows.
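For reference, here is a minimal sketch of the mapslices route. It assumes Julia 1.0 or later, where the dimension is passed as the dims keyword; the variable name row_sums is just for illustration.

```julia
m = [10i + j for i = 1:3, j = 1:4]

# Apply `sum` to each row: with dims=2, every slice passed to `sum`
# runs along the second dimension, i.e. it is one row of `m`.
row_sums = mapslices(sum, m, dims=2)   # a 3×1 Matrix, one entry per row

# Flatten to a plain vector if that is more convenient.
row_sums_vec = vec(row_sums)
```

This is convenient for arbitrary functions, though it is not necessarily the fastest choice for something as simple as a sum.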
The problem is that Julia arrays are stored in column-major order, so for anything other than a small matrix you'll end up with a poor cache hit ratio if you traverse the array in row-major order.
As pointed out in an excellent blog post, if you want to sum over rows, your best bet is to do something like this:
msum = zeros(eltype(m), size(m, 1))
for j = 1:size(m, 2)
    for i = 1:size(m, 1)
        msum[i] += m[i, j]
    end
end
We traverse both m and msum in their native storage order, so each time we load a cache line we use all the values in it, yielding a cache hit ratio close to 1. You might naively think it's better to traverse the matrix in row-major order and accumulate the result in a tmp variable, but on any modern machine a cache miss is much more expensive than the msum[i] lookup.
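If you want to check this on your own machine, a rough sketch along these lines compares the two traversal orders. The function names rowsums_colmajor and rowsums_rowmajor and the 2000×2000 size are made up for illustration, and the timings will vary with hardware and Julia version, so treat this as an experiment rather than a benchmark.

```julia
# Column-major traversal: the inner loop walks down a column,
# which matches how Julia lays the matrix out in memory.
function rowsums_colmajor(m::AbstractMatrix)
    msum = zeros(eltype(m), size(m, 1))
    for j in 1:size(m, 2)
        for i in 1:size(m, 1)
            msum[i] += m[i, j]
        end
    end
    return msum
end

# Row-major traversal: the inner loop walks along a row and
# accumulates into a temporary, striding through memory.
function rowsums_rowmajor(m::AbstractMatrix)
    msum = zeros(eltype(m), size(m, 1))
    for i in 1:size(m, 1)
        tmp = zero(eltype(m))
        for j in 1:size(m, 2)
            tmp += m[i, j]
        end
        msum[i] = tmp
    end
    return msum
end

big = rand(2000, 2000)

# Both versions compute the same row sums (this call also compiles them).
@assert rowsums_colmajor(big) ≈ rowsums_rowmajor(big)

@time rowsums_colmajor(big)
@time rowsums_rowmajor(big)
```

For more careful measurements, a dedicated benchmarking package would be a better fit than @time.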
Many of Julia's internal algorithms that take a region parameter, like sum(m, 2), handle this for you. (In Julia 1.0 and later the region is passed as the dims keyword, as in sum(m, dims=2).)
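As a small sketch of what that looks like in current Julia (assuming version 1.1 or later, since eachrow was added in 1.1 and is not part of the original answer):

```julia
m = [10i + j for i = 1:3, j = 1:4]

# Built-in reduction over the second dimension: one sum per row.
row_sums = sum(m, dims=2)        # a 3×1 Matrix
row_sums_vec = vec(row_sums)     # the same values as a plain vector

# If you do need the rows themselves, `eachrow` iterates over views
# of the rows instead of copying each slice.
for row in eachrow(m)
    println(row)
end

# The comprehension from the question, written with `eachrow`.
comp_sums = [sum(row) for row in eachrow(m)]
```

Note that eachrow avoids the copies that m[i, :] makes, but it still walks memory row by row, so the cache argument above applies to large matrices.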