How to efficiently compute the string lengths of a cell array of strings
I have a cell array in Matlab:
strings = {'one', 'two', 'three'};
How can I efficiently calculate the length of all three strings? Right now I use a for loop:
lengths = zeros(3,1);
for i = 1:3
    lengths(i) = length(strings{i});
end
However, this is unusably slow when you have a large number of strings (I have 480,863 of them). Any suggestions?
You can also use:
cellfun(@length, strings)
It will not be faster, but makes the code clearer.
Regarding the slowness, you should first run the profiler to check where the bottleneck is. Only then should you optimize.
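As a minimal sketch of that profiling step, the loop from the question can be wrapped in `profile on` / `profile off`. The data here is a hypothetical stand-in (the question's three strings repeated), not the asker's real 480,863 strings.

```matlab
% Hypothetical stand-in data: the question's three strings repeated many times
strings = repmat({'one', 'two', 'three'}, 1, 10000);

profile on                      % start collecting timing data
lengths = zeros(numel(strings), 1);
for i = 1:numel(strings)
    lengths(i) = length(strings{i});
end
profile off                     % stop collecting

stats = profile('info');        % struct holding the collected results
% profile viewer                % uncomment to open the interactive report
```

If the loop itself turns out to take only a small share of the total time, the bottleneck is probably somewhere else, for example in how the strings are read or built.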
Edit: I just recalled that 'length' used to be a built-in function in cellfun in older Matlab versions, so it might actually be faster! Try:
cellfun('length',strings)
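Before comparing speed, it may be worth confirming that the three variants agree. The following sketch uses the cell array from the question. The variable names `lengthsLoop`, `lengthsHandle` and `lengthsLegacy` are made up for illustration.

```matlab
% Cell array from the question
strings = {'one', 'two', 'three'};

% Loop version, with the preallocation sized from the input
lengthsLoop = zeros(numel(strings), 1);
for i = 1:numel(strings)
    lengthsLoop(i) = length(strings{i});
end

% cellfun with a function handle
lengthsHandle = cellfun(@length, strings);

% cellfun with the function name given as a character vector
lengthsLegacy = cellfun('length', strings);

% The loop result is a column and cellfun follows the shape of the input,
% so compare them as flattened vectors
sameResult = isequal(lengthsLoop(:), lengthsHandle(:), lengthsLegacy(:));
```

Note that cellfun returns an array shaped like its input (1-by-3 here), while the loop fills a 3-by-1 column.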
Edit(2): I have to admit that my first answer was a wild guess. Following @Rodin's comment, I decided to check out the speedup.
Here is the benchmark code.
First, the code that generates a lot of strings and saves to disk:
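function GenerateCellStrings()
    % Build 10,000 random strings and save the workspace (including strs) to strs.mat
    strs = cell(1,10000);
    for i=1:10000
        strs{i} = GenerateRandomString();
    end
    save strs;
end
function st = GenerateRandomString()
    % Random lowercase string (a-z) of length 1..MAX_STR_LENGTH
    MAX_STR_LENGTH = 1000;
    n = randi(MAX_STR_LENGTH);
    st = char(randi([97 122], 1,n ));
end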
Then, the benchmark itself:
function CheckRunTime()
    load strs;
    tic;
    disp('Loop:');
    for i=1:numel(strs)
        n = length(strs{i});
    end
    toc;
    disp('cellfun (String):');
    tic;
    cellfun('length',strs);
    toc;
    disp('cellfun (function handle):');
    tic;
    cellfun(@length,strs);
    toc;
end
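A single tic/toc measurement can be noisy, and in the loop case above the timed region also includes the disp call. As a hedged alternative, here is a sketch using `timeit` (available in newer MATLAB releases, R2013b and later as far as I know), which calls the function several times and returns a typical time. The data is regenerated in memory rather than loaded from disk, and the names `tLegacy` and `tHandle` are only illustrative.

```matlab
% Stand-in data: 10,000 random lowercase strings of length 1..1000,
% generated the same way as in GenerateRandomString above
numStrings = 10000;
maxLen = 1000;
strs = cell(1, numStrings);
for i = 1:numStrings
    strs{i} = char(randi([97 122], 1, randi(maxLen)));
end

% timeit runs each handle repeatedly and reports a typical run time in seconds
tLegacy = timeit(@() cellfun('length', strs));
tHandle = timeit(@() cellfun(@length, strs));

fprintf('cellfun (string):          %g s\n', tLegacy);
fprintf('cellfun (function handle): %g s\n', tHandle);
fprintf('ratio handle/string:       %g\n', tHandle / tLegacy);
```

The for loop cannot be placed in an anonymous function, so to time it the same way it would have to be wrapped in its own function first. The absolute numbers will differ between machines and MATLAB versions.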
The results are:
Loop:
Elapsed time is 0.010663 seconds.
cellfun (String):
Elapsed time is 0.000313 seconds.
cellfun (function handle):
Elapsed time is 0.006280 seconds.
Wow! The 'length' syntax is about 30 times faster than the loop! I can only guess why it is so fast. Maybe cellfun recognizes length specifically. It might be JIT optimization.
Edit(3): I found the reason for the speedup. It is indeed the recognition of length specifically. Thanks to @reve_etrange for the info.
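For context, 'length' is not the only name handled this way. As far as I know from the cellfun documentation, a small set of function names is still accepted as character vectors for backward compatibility. The sketch below exercises a few of them on a made-up cell array `c`.

```matlab
% Made-up example data: two strings, an empty char array, and a numeric matrix
c = {'one', '', 'three', [1 2; 3 4]};

lens   = cellfun('length', c);           % largest dimension of each cell
empt   = cellfun('isempty', c);          % true where the cell is empty
nels   = cellfun('prodofsize', c);       % number of elements in each cell
isChar = cellfun('isclass', c, 'char');  % true where the cell holds a char array
```

One caveat, as I understand it: these name-based forms do not call overloaded versions of the function, so for cells holding objects of a class that redefines length, cellfun('length', ...) and cellfun(@length, ...) may give different answers. For plain character arrays, as in the question, both forms should agree.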