Ruby regular expressions: scan vs. =~

Question:

The Ruby (1.9.3) documentation seems to imply that scan is equivalent to =~ except that

scan returns multiple matches, while =~ returns only the first match, and
scan returns the matched data, while =~ returns the index.

However, in the following example, the two methods seem to return different results for the same string and expression. Why is that?

1.9.3p0 :002 > str = "Perl and Python - the two languages"
 => "Perl and Python - the two languages" 
1.9.3p0 :008 > exp = /P(erl|ython)/
 => /P(erl|ython)/ 
1.9.3p0 :009 > str =~ exp
 => 0 
1.9.3p0 :010 > str.scan exp
 => [["erl"], ["ython"]]

If the index of the first match is 0, shouldn't scan return "Perl" and "Python" instead of "erl" and "ython"?

Thanks

Answer:

When given a regular expression without capturing groups, scan returns an array of strings, where each string is one match of the regular expression. If you use scan(/P(?:erl|ython)/) (which is the same as your regex except without capturing groups), you'll get ["Perl", "Python"], which is what you expect.
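As a quick check of the difference, here is a minimal sketch using the string from the question. The variable names full_matches and captures are only illustrative.

```ruby
str = "Perl and Python - the two languages"

# Non-capturing group (?:...): scan returns the full text of each match.
full_matches = str.scan(/P(?:erl|ython)/)
p full_matches  # => ["Perl", "Python"]

# Capturing group (...): scan returns only the captures of each match.
captures = str.scan(/P(erl|ython)/)
p captures      # => [["erl"], ["ython"]]
```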

However, when given a regex with capturing groups, scan returns an array of arrays, where each sub-array contains the captures of one match. So if you have, for example, the regex (\w*):(\w*), you'll get an array of arrays where each sub-array contains two strings: the part before the colon and the part after the colon. In your example, each sub-array contains one string: the part matched by (erl|ython).
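A small sketch of both points follows. The input "key:value other:thing" and the variable names are made up for illustration. The second half shows that the overall match really does start at index 0 and cover "Perl", and that wrapping the whole pattern in an extra group is one way (not the only one) to get the full match alongside the inner capture.

```ruby
# Two capturing groups: each sub-array holds the text before and after the colon.
# The input string is a hypothetical sample.
pairs = "key:value other:thing".scan(/(\w*):(\w*)/)
p pairs  # => [["key", "value"], ["other", "thing"]]

str = "Perl and Python - the two languages"
exp = /P(erl|ython)/

index = (str =~ exp)  # position where the first match starts
m = str.match(exp)    # MatchData for the same first match
p index               # => 0
p m[0]                # => "Perl" (the whole match)
p m[1]                # => "erl"  (the first capture)

# An extra outer group makes scan report the whole match as the first capture.
both = str.scan(/(P(erl|ython))/)
p both  # => [["Perl", "erl"], ["Python", "ython"]]
```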
