How to efficiently compute the string lengths of a cell array of strings

Question:

I have a cell array in MATLAB:

strings = {'one', 'two', 'three'};

How can I efficiently calculate the lengths of all three strings? Right now I use a for loop:

lengths = zeros(3,1);
for i = 1:3
    lengths(i) = length(strings{i});
end

However, this is unusably slow when you have a large number of strings (I have 480,863 of them). Any suggestions?

Answer:

You can also use:

cellfun(@length, strings)

It will not be faster, but it makes the code clearer.
Regarding the slowness, you should first run the profiler to check where the bottleneck is. Only then should you optimize.

Edit: I just recalled that 'length' used to be a built-in function in cellfun in older MATLAB versions, so it might actually be faster. Try:

 cellfun('length',strings)
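
As a quick sanity check, both cellfun forms should return the same lengths as the loop. A minimal sketch using the three strings from the question (the variable names loopLengths, nameLengths, and handleLengths are only illustrative):

```matlab
strings = {'one', 'two', 'three'};

% Loop version from the question, kept for reference
loopLengths = zeros(1, numel(strings));
for i = 1:numel(strings)
    loopLengths(i) = length(strings{i});
end

% Legacy function-name syntax and function-handle syntax
nameLengths   = cellfun('length', strings);
handleLengths = cellfun(@length, strings);

% Displays 1 (true) when all three results agree
disp(isequal(loopLengths, nameLengths, handleLengths));
```

Note that cellfun returns an array with the same shape as the input cell array, so a 1-by-3 cell gives a 1-by-3 result here.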

Edit(2): I have to admit that my first answer was a wild guess. Following @Rodin's comment, I decided to check the speedup.

Here is the benchmark code.

First, the code that generates a lot of strings and saves them to disk:

function GenerateCellStrings()
    strs = cell(1,10000);
    for i=1:10000
        strs{i} = GenerateRandomString();
    end
    save strs;
end

function st = GenerateRandomString()
    MAX_STR_LENGTH = 1000;
    n = randi(MAX_STR_LENGTH);
    st = char(randi([97 122], 1,n ));

end

Then, the benchmark itself:

 function CheckRunTime()
    load strs;
    tic;
    disp('Loop:');
    for i=1:numel(strs)
        n = length(strs{i});
    end
    toc;

    disp('cellfun (String):');
    tic;
    cellfun('length',strs);
    toc;

    disp('cellfun (function handle):');
    tic;
    cellfun(@length,strs);
    toc;

end

The results are:

Loop:
Elapsed time is 0.010663 seconds.
cellfun (String):
Elapsed time is 0.000313 seconds.
cellfun (function handle):
Elapsed time is 0.006280 seconds.

Wow! The 'length' syntax is about 30 times faster than the loop. I can only guess why it is so fast. Maybe cellfun recognizes length specifically, or it might be JIT optimization.

Edit(3): I found the reason for the speedup. It is indeed that cellfun recognizes 'length' specifically. Thanks to @reve_etrange for the info.
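
A single tic/toc run can be noisy, so the comparison could also be repeated with timeit (available in MATLAB R2013b and later), which calls the function several times. This is only a sketch under the same random-string setup as the generator above; the names tName and tHandle are illustrative, and the absolute timings and the ratio between them will depend on the machine and the MATLAB release:

```matlab
% Build 10,000 random lowercase strings, 1 to 1000 characters each
strs = cell(1, 10000);
for i = 1:numel(strs)
    strs{i} = char(randi([97 122], 1, randi(1000)));
end

% Time the legacy function-name syntax against the function-handle syntax
tName   = timeit(@() cellfun('length', strs));
tHandle = timeit(@() cellfun(@length, strs));

fprintf('cellfun(''length''): %g s\n', tName);
fprintf('cellfun(@length):  %g s\n', tHandle);
```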
