How to extract all matches with a Tcl regular expression?
Hi everybody, I'd like a solution for this regular expression. My problem is extracting all the hex numbers of the form H'xxxx. I used the regexp below, but I don't get all the hex values, only one number. How do I get every hex number from this string?
set hex "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set res [regexp -all {H'([0-9A-Z]+)&} $hex match hexValues]
puts "$res H$hexValues"
The output I get is: 5 H4D52
On -all -inline
From the documentation:
-all: Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
-inline: Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression.
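To illustrate the two options, here is a minimal sketch on a made-up string "a1 a2 a3" (the pattern and data are illustrative only, not from the question). With match variables, -all still counts every match, but the variables only keep the last one:

```tcl
set s "a1 a2 a3"

# -all with match variables: returns the number of matches,
# but whole/digit only hold the data of the LAST match.
set count [regexp -all {a(\d)} $s whole digit]
puts "$count $whole $digit"
# -> 3 a3 3

# -all -inline: a flat list with every whole match followed by its capture.
set all [regexp -all -inline {a(\d)} $s]
puts $all
# -> a1 1 a2 2 a3 3
```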
Thus, to return all matches (including the captures by groups) as a flat list in Tcl, you can write:
set matchTuples [regexp -all -inline $pattern $text]
If the pattern has groups 0…N-1 (group 0 being the whole match), then each match is an N-tuple in the list. Thus the number of actual matches is the length of this list divided by N. You can then use foreach with N variables to iterate over each tuple of the list.
For example, if N = 2, you have:
set numMatches [expr {[llength $matchTuples] / 2}]
foreach {group0 group1} $matchTuples {
    # ... use $group0 and $group1 here
}
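The same approach scales to more groups. A minimal sketch with two capturing groups (so N = 3), using a hypothetical key=value string that is not from the question:

```tcl
set text "width=10 height=20 depth=5"

# Two capturing groups, so each match contributes 3 elements:
# the whole match, the key, and the value.
set matchTuples [regexp -all -inline {(\w+)=(\d+)} $text]

set numMatches [expr {[llength $matchTuples] / 3}]
puts $numMatches
# -> 3

# One loop variable per element of the tuple.
foreach {whole key value} $matchTuples {
    puts "$key -> $value"
}
# -> width -> 10
# -> height -> 20
# -> depth -> 5
```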
Here's a solution for this specific problem, annotated with output as comments (see also on ideone.com):
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}
set matchTuples [regexp -all -inline $pattern $text]
puts $matchTuples
# H'22EF 22EF H'2354 2354 H'4BD4 4BD4 H'4C4B 4C4B H'4D52 4D52 H'4DC9 4DC9
# \_________/ \_________/ \_________/ \_________/ \_________/ \_________/
# 1st match 2nd match 3rd match 4th match 5th match 6th match
puts [llength $matchTuples]
# 12
set numMatches [expr {[llength $matchTuples] / 2}]
puts $numMatches
# 6
foreach {whole hex} $matchTuples {
puts $hex
}
# 22EF
# 2354
# 4BD4
# 4C4B
# 4D52
# 4DC9
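If you want the hex values collected into a list rather than printed, here is a sketch using lmap (assuming Tcl 8.6 or later, where lmap was introduced) on the same text and pattern. lmap accepts the same multi-variable form as foreach, so each pair is split into the whole match and the capture:

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}

# Walk the flat list two elements at a time and keep only the capture.
set hexValues [lmap {whole hex} [regexp -all -inline $pattern $text] {set hex}]

puts $hexValues
# -> 22EF 2354 4BD4 4C4B 4D52 4DC9
```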
On the pattern
Note that I've changed the pattern slightly:
- Instead of [0-9A-Z]+, I used [0-9A-F]{4}, which is more specific: it matches exactly 4 hexadecimal digits.
- If you insist on matching the &, then the last hex string (H'4DC9 in your input) cannot be matched, because nothing follows it.
  - This explains why you get 4D52 in the original script: with -all, the match variables only hold the last match, and H'4D52& is the last substring that ends with &.
  - Maybe get rid of the &, or use (&|$) instead, i.e. either a & or the end of the string $.
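A sketch of the last suggestion, keeping your original [0-9A-Z]+ but accepting either & or the end of the string. It assumes Tcl's advanced regular expression syntax, where (?:...) is a non-capturing group; with a plain (&|$) each match would become a 3-tuple instead of a pair:

```tcl
set hex "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# (?:&|$) matches a literal & or the end of the string
# without adding an extra capture group, so each match stays a pair.
set matchTuples [regexp -all -inline {H'([0-9A-Z]+)(?:&|$)} $hex]

puts [expr {[llength $matchTuples] / 2}]
# -> 6

foreach {whole value} $matchTuples {
    puts "H'$value"
}
```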