Ruby regex: scan vs. =~
The Ruby (1.9.3) documentation seems to imply that scan is equivalent to =~ except that
- scan returns multiple matches, while =~ returns only the first match, and
- scan returns the matched data, while =~ returns the index.
However, in the following example, the two methods seem to return different results for the same string and expression. Why is that?
1.9.3p0 :002 > str = "Perl and Python - the two languages"
=> "Perl and Python - the two languages"
1.9.3p0 :008 > exp = /P(erl|ython)/
=> /P(erl|ython)/
1.9.3p0 :009 > str =~ exp
=> 0
1.9.3p0 :010 > str.scan exp
=> [["erl"], ["ython"]]
If the index of the first match is 0, shouldn't scan return "Perl" and "Python" instead of "erl" and "ython"?
Thanks
When given a regular expression without capturing groups, scan returns an array of strings, where each string is one match of the regular expression. If you use scan(/P(?:erl|ython)/) (the same as your regex, but with a non-capturing group), you'll get ["Perl", "Python"], which is what you expect.
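As a quick check of the non-capturing case, here is a minimal sketch using the string from the question. The variable names (whole_matches, first_index) are only illustrative and are not part of the original post.

```ruby
str = "Perl and Python - the two languages"

# (?:...) groups the alternation without capturing it,
# so scan returns the full text of each match.
whole_matches = str.scan(/P(?:erl|ython)/)
p whole_matches

# =~ still reports the index where the first whole match starts.
first_index = str =~ /P(?:erl|ython)/
p first_index
```

Both calls look at the same whole matches here: scan lists their text, and =~ gives the position of the first one.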
However, when given a regex with capturing groups, scan returns an array of arrays, where each sub-array contains the captures of one match. For example, with the regex (\w*):(\w*) you'll get an array of arrays where each sub-array contains two strings: the part before the colon and the part after it. In your example, each sub-array contains one string: the part matched by (erl|ython).
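The capturing case can be sketched the same way. The sample string "key:value name:ruby" and the variable names below are made up for illustration. The last part shows one possible way to keep the capture group and still collect the whole matches, by reading Regexp.last_match inside a scan block.

```ruby
str = "Perl and Python - the two languages"
exp = /P(erl|ython)/

# With one capture group, each match yields a one-element sub-array.
captures = str.scan(exp)
p captures

# With two capture groups, each sub-array holds both captures.
pairs = "key:value name:ruby".scan(/(\w*):(\w*)/)
p pairs

# The first match really does start at index 0: group 0 is the whole
# match ("Perl"), group 1 is only the captured part ("erl").
m = exp.match(str)
p [m.begin(0), m[0], m[1]]

# Keeping the capture group but collecting whole matches:
# inside the block, Regexp.last_match refers to the current match.
full = []
str.scan(exp) { full << Regexp.last_match(0) }
p full
```

So the two methods agree on where the matches are; scan simply reports the captures instead of the whole match once the pattern contains a group.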