What is the recommended way to iterate a matrix over rows?

Question:

Given a matrix m = [10i+j for i=1:3, j=1:4], I can iterate over its rows by slicing the matrix:

for i = 1:size(m, 1)
    println(m[i, :])
end

Is this the only possibility? Is it the recommended way?

And what about comprehensions? Is slicing the only possibility to iterate over the rows of a matrix?

[ sum(m[i,:]) for i=1:size(m,1) ]

Answer:

The solution you listed yourself, as well as mapslices, both work fine. But if by "recommended" what you really mean is "high-performance", then the best answer is: don't iterate over rows.
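For reference, here is a minimal sketch of the row-wise options mentioned so far, plus eachrow, which is not part of the original answer and assumes Julia 1.1 or later. The mapslices call uses the dims keyword form from Julia 1.0 onward.

```julia
m = [10i + j for i = 1:3, j = 1:4]

# eachrow yields a view of each row, so no row is copied (assumes Julia 1.1+).
for row in eachrow(m)
    println(row)
end

# Row sums via a comprehension over the row views.
row_sums = [sum(row) for row in eachrow(m)]

# Same result with mapslices; the 3×1 matrix it returns is flattened with vec.
row_sums_mapslices = vec(mapslices(sum, m; dims=2))
```

Both variants are convenient, but they still walk the data row by row, so the performance caveat in this answer applies to them as well.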

The problem is that since arrays are stored in column-major order, for anything other than a small matrix you'll end up with a poor cache hit ratio if you traverse the array in row-major order.
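A small sketch can make the storage order visible. It uses the matrix from the question; the variable names are only illustrative.

```julia
m = [10i + j for i = 1:3, j = 1:4]

# vec exposes the underlying memory order: each column is stored contiguously.
flat = vec(m)

# strides gives the distance in elements between neighbours along each dimension:
# 1 when moving down a column, size(m, 1) when moving along a row.
st = strides(m)

println(flat)
println(st)
```

Moving along a row jumps size(m, 1) elements at a time, which is why row-wise traversal of a large matrix touches a new cache line on almost every access.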

As pointed out in an excellent blog post, if you want to sum over rows, your best bet is to do something like this:

msum = zeros(eltype(m), size(m, 1))
for j = 1:size(m,2)
    for i = 1:size(m,1)
        msum[i] += m[i,j]
    end
end

We traverse both m and msum in their native storage order, so each time we load a cache line we use all the values, yielding a cache hit ratio of 1. You might naively think it's better to traverse it in row-major order and accumulate the result to a tmp variable, but on any modern machine the cache miss is much more expensive than the msum[i] lookup.
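To check this on your own machine, the two strategies can be wrapped in functions and timed. This is a rough sketch, not a rigorous benchmark: the function names and the matrix size are arbitrary choices, and the actual timings depend on hardware and Julia version.

```julia
# Column-major traversal, accumulating directly into the output vector.
function rowsums_colmajor(a)
    out = zeros(eltype(a), size(a, 1))
    for j in axes(a, 2)
        for i in axes(a, 1)
            out[i] += a[i, j]
        end
    end
    return out
end

# Row-major traversal with a temporary accumulator (the "naive" alternative).
function rowsums_rowmajor(a)
    out = zeros(eltype(a), size(a, 1))
    for i in axes(a, 1)
        tmp = zero(eltype(a))
        for j in axes(a, 2)
            tmp += a[i, j]
        end
        out[i] = tmp
    end
    return out
end

a = rand(1:100, 2000, 2000)  # integers, so both orders give identical sums

# Call each function once first so compilation is not included in the timing.
rowsums_colmajor(a)
rowsums_rowmajor(a)

t_col = @elapsed rowsums_colmajor(a)
t_row = @elapsed rowsums_rowmajor(a)
println("column-major: ", t_col, " s, row-major: ", t_row, " s")
```

For more reliable numbers, a dedicated benchmarking package that runs many samples would be a better choice than a single @elapsed call.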

Many of Julia's built-in algorithms that take a region parameter, like sum(m, 2), handle this for you. (Since Julia 1.0 the region is passed as a keyword argument: sum(m, dims=2).)
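As a short illustration on the matrix from the question, using the dims keyword form that Julia 1.0 and later require:

```julia
m = [10i + j for i = 1:3, j = 1:4]

# Reducing over dimension 2 (the columns) leaves one sum per row, as a 3×1 matrix.
s = sum(m, dims=2)

# vec turns the 3×1 result into a plain vector when that is more convenient.
row_sums = vec(s)
```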
