How to find all spaces except those between quotes?
I need to split a string on spaces, but a phrase in quotes should be kept intact. Example:
word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5
After preg_split, this should produce the following array:
array(
[0] => 'word1',
[1] => 'word2',
[2] => 'this is a phrase',
[3] => 'word3',
[4] => 'word4',
[5] => 'this is a second phrase',
[6] => 'word5'
)
How should I write my regexp to do that?
PS. There is a related question, but I don't think it applies to my case: the accepted answer provides a regexp that finds words rather than whitespace.
With the help of user MizardX from the #regex IRC channel (irc.freenode.net), a solution was found. It even supports single quotes.
$str = 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5 word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
$regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/';
$arr = preg_split($regexp, $str);
print_r($arr);
Result:
Array (
[0] => word1
[1] => word2
[2] => 'this is a phrase'
[3] => word3
[4] => word4
[5] => "this is a second phrase"
[6] => word5
[7] => word1
[8] => word2
[9] => "this is a phrase"
[10] => word3
[11] => word4
[12] => "this is a second phrase"
[13] => word5
)
PS. The only drawback is that this regexp works only with PCRE 7 or later.
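To make the pattern easier to follow, here is a sketch of the same regexp written with the x (extended) flag, so that each part can carry a comment. The short test string and the variable name $parts are only illustrative and are not part of the original solution.

```php
<?php
// Same pattern as above, spread over several lines with the x flag.
// Comments inside the pattern avoid quotes and slashes on purpose,
// since those would end the PHP string or the regexp delimiter.
$regexp = '/
    \G                  # continue where the previous match ended (or at the start)
    (?:                 # skip over any number of tokens without splitting them:
        "[^"]*"         #   a double-quoted phrase
      | \'[^\']*\'      #   a single-quoted phrase
      | [^"\'\s]+       #   a run of characters that are neither quotes nor whitespace
    )*
    \K                  # drop everything consumed so far from the reported match
    \s+                 # what remains is the whitespace used as the delimiter
/x';

// Illustrative input (an assumption, shorter than the original example).
$str = 'a "b c" \'d e\' f';
$parts = preg_split($regexp, $str);
print_r($parts);
```

With this input the split should give four pieces: a, "b c", 'd e' and f, with the quote characters still attached, just as in the result above.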
It turned out that my production server does not support PCRE 7; only PCRE 6 is installed there. A regexp that works there, although it is less flexible than the PCRE 7 one above, is the following (with \G and \K removed):
/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/
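For the given input, the result is the same as above. Note that this pattern matches the tokens themselves rather than the whitespace between them, so it has to be used with preg_match_all instead of preg_split.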
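A minimal sketch of how the PCRE 6 pattern might be applied, assuming it is run through preg_match_all (it matches tokens, not delimiters). The final loop that strips the surrounding quotes is a hypothetical extra step, not part of the original answer; it is included only because the question asked for phrases without their quotes. The variable names $tokens and $unquoted are illustrative.

```php
<?php
$str = 'word1 word2 \'this is a phrase\' word3 "this is a second phrase" word5';
$regexp = '/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/';

// Collect every token; $matches[0] holds the full matches in order.
preg_match_all($regexp, $str, $matches);
$tokens = $matches[0];

// Hypothetical extra step: remove one pair of matching surrounding quotes.
// Tokens that are not wrapped in quotes are left unchanged.
$unquoted = array();
foreach ($tokens as $token) {
    $unquoted[] = preg_replace('/^(["\'])(.*)\1$/s', '$2', $token);
}

print_r($tokens);
print_r($unquoted);
```

A plain foreach loop is used instead of a closure so that the sketch does not depend on newer PHP features, which seems appropriate for a server old enough to ship PCRE 6.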