MATLAB regex: words beginning and ending with a space, "\<\s.*\s\>"

Question:

In MATLAB, I try to find words that start and end with a space by using '\<\s.*\s\>'.

Command:

str = 'A body or collection of such stories s@@5%%suchstro end';

regexp(str, '\<\s.*\s\>', 'match')

This returns nothing.

However, the same command in Octave returns: ' body or collection of such stories s@@5%%suchstro '

The lazy version '\<\s.*?\s\>' also works in Octave, but not in MATLAB.

Any ideas? Thanks.

Answer:

\<\s.*?\s\> reads as: beginning of word, whitespace, anything, whitespace, end of word. But a word cannot begin with whitespace (nor end with it), so this pattern does not match anything.
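To see why nothing can match, here is a rough Python translation of the pattern (a sketch, not MATLAB itself). Python's re module has no \< or \>, so they are emulated with lookarounds; the names WORD_START and WORD_END are hypothetical, and the sketch assumes ASCII text, where \w covers [A-Za-z0-9_] as in MATLAB.

```python
import re

# Hypothetical stand-ins for MATLAB's word anchors (Python's re has neither).
WORD_START = r"(?<!\w)(?=\w)"  # like MATLAB \<: previous char is not a word char, next one is
WORD_END = r"(?<=\w)(?!\w)"    # like MATLAB \>: previous char is a word char, next one is not

s = "A body or collection of such stories s@@5%%suchstro end"

# \<\s.*\s\> : WORD_START demands a word character next, but \s demands
# whitespace at that same spot, so the search fails at every position.
pattern = WORD_START + r"\s.*\s" + WORD_END
print(re.findall(pattern, s))  # []
```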

The pattern \s\<.*\>\s returns

` body or collection of such stories s@@5%%suchstro `

which is probably not what you wanted. This is not a collection of words but one long match, because .* is greedy. Make it lazy:

regexp(str, '\s\<.*?\>\s', 'match')

' body '    ' collection '    ' such '    ' s@@5%%suchstro '
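The greedy/lazy difference can be reproduced in a Python sketch, again assuming the hypothetical lookaround emulations of \< and \> on ASCII text. It also shows why "or", "of" and "stories" are missing from the lazy result: each match consumes its trailing space, so the next word has no leading space left to match.

```python
import re

WORD_START = r"(?<!\w)(?=\w)"  # emulates MATLAB \< (assumption, see above)
WORD_END = r"(?<=\w)(?!\w)"    # emulates MATLAB \>

s = "A body or collection of such stories s@@5%%suchstro end"

# Greedy .* runs on to the LAST word end that is followed by a space.
greedy = re.findall(r"\s" + WORD_START + r".*" + WORD_END + r"\s", s)
# Lazy .*? stops at the FIRST word end that is followed by a space.
lazy = re.findall(r"\s" + WORD_START + r".*?" + WORD_END + r"\s", s)

print(greedy)  # [' body or collection of such stories s@@5%%suchstro ']
print(lazy)    # [' body ', ' collection ', ' such ', ' s@@5%%suchstro ']
```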

Also, you probably don't want to capture the spaces themselves. Use lookbehind and lookahead for them:

regexp(str, '(?<=\s)\<.*?\>(?=\s)', 'match')

'body'    'or'    'collection'    'of'    'such'    'stories'    's@@5%%suchstro'
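A Python sketch of the lookaround version, under the same assumptions (hypothetical emulations of \< and \>, ASCII text). Because (?<=\s) and (?=\s) only check for the spaces without consuming them, one space can serve as the right edge of one word and the left edge of the next, so every space-delimited token is found.

```python
import re

WORD_START = r"(?<!\w)(?=\w)"  # emulates MATLAB \< (assumption)
WORD_END = r"(?<=\w)(?!\w)"    # emulates MATLAB \>

s = "A body or collection of such stories s@@5%%suchstro end"

# Zero-width checks for the surrounding spaces: they are not part of the match.
words = re.findall(r"(?<=\s)" + WORD_START + r".*?" + WORD_END + r"(?=\s)", s)
print(words)
# ['body', 'or', 'collection', 'of', 'such', 'stories', 's@@5%%suchstro']
```

"A" is skipped because nothing precedes it, and "end" because nothing follows it.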

Finally... s@@5%%suchstro is probably not a word, is it? Maybe you need \w (word characters) instead of the dot:

regexp(str, '(?<=\s)\<\w*?\>(?=\s)', 'match')

'body'    'or'    'collection'    'of'    'such'    'stories'
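In a Python sketch (same hypothetical anchor emulations, ASCII text assumed), replacing . with \w drops the token with punctuation: \w*? cannot step over "@", and the word end after "s" is followed by "@" rather than a space, so the lookahead fails.

```python
import re

WORD_START = r"(?<!\w)(?=\w)"  # emulates MATLAB \< (assumption)
WORD_END = r"(?<=\w)(?!\w)"    # emulates MATLAB \>

s = "A body or collection of such stories s@@5%%suchstro end"

# \w*? only spans word characters, so "s@@5%%suchstro" can never be matched whole.
words = re.findall(r"(?<=\s)" + WORD_START + r"\w*?" + WORD_END + r"(?=\s)", s)
print(words)  # ['body', 'or', 'collection', 'of', 'such', 'stories']
```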

In this form, the lazy/greedy distinction is no longer an issue, so the expression can be simplified to (?<=\s)\<\w*\>(?=\s) or even to (?<=\s)\w*(?=\s) since spaces provide word boundaries.
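The claimed equivalence can be checked in a Python sketch, assuming the same hypothetical emulations of \< and \> on ASCII text: the lazy, greedy and anchor-free forms give the same list for this string.

```python
import re

WORD_START = r"(?<!\w)(?=\w)"  # emulates MATLAB \< (assumption)
WORD_END = r"(?<=\w)(?!\w)"    # emulates MATLAB \>

s = "A body or collection of such stories s@@5%%suchstro end"

lazy = re.findall(r"(?<=\s)" + WORD_START + r"\w*?" + WORD_END + r"(?=\s)", s)
greedy = re.findall(r"(?<=\s)" + WORD_START + r"\w*" + WORD_END + r"(?=\s)", s)
bare = re.findall(r"(?<=\s)\w*(?=\s)", s)  # the spaces act as the word boundaries

print(lazy == greedy == bare)  # True
```

One caveat for this Python sketch: \w* also matches the empty string, so in text with two consecutive spaces the bare form returns '' entries; \w+ avoids that.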
