PHP regex to match double- and/or single-quoted strings within a string
I'm working on a template class and have run into a problem parsing a list of quoted strings out of a string argument list. Take, for example, the string:
$string = 'VAR_SELECTED, \'Hello m\'lady\', "null"';
I'm having trouble writing a regex that extracts the strings "Hello m'lady" and "null". The closest I have got is:
$string = 'VAR_SELECTED, \'Hello m\'lady\', "null", \'TE\'ST\'';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);
Which outputs:
Array
(
[0] => Array
(
[0] => VAR_SELECTED,
[1] => 'Hello m'lady',
[2] => "null",
[3] => 'TE'ST'
)
)
However, a more complex case such as:
$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);
outputs:
Array
(
[0] => Array
(
[0] => VAR_SELECTED,
[1] => 'Hello
[2] => "Father"
[3] => ',
[4] => "Hello
[5] => 'Luke'
[6] => "
)
)
Can anyone help me solve this problem? Are multiple regexes the way forward?
Edit: Maybe it would be easier to replace the commas within the strings with a placeholder and then split the list with explode()?
Edit 2: I just thought of a simple but insecure option (which I am not going to use), although it generates an E_NOTICE error:
$string = 'return array(VAR_SELECTED, \'Hello , "Father"\', "Hello \'Luke\'4");';
$string = eval($string);
print_r($string);
Try this:
/(?<=^|[\s,])(?:(['"]).*?\1|[^\s,'"]+)(?=[\s,]|$)/
Or, as a PHP single-quoted string literal:
'/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/'
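A minimal sketch of applying that pattern with preg_match_all(), assuming you want the values back without their surrounding quotes; extractArgs() is a hypothetical helper name. Capture group 1 holds the opening quote, so it is only set when the match is a quoted value:

```php
<?php
// Sketch: run the lookaround regex and strip the outer quotes from quoted matches.
// extractArgs() is a hypothetical helper, not part of PHP.
function extractArgs(string $input): array
{
    $pattern = '/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $args = [];
    foreach ($matches as $m) {
        if (isset($m[1]) && $m[1] !== '') {
            // Quoted value: group 1 is the opening quote, so drop the first and last character.
            $args[] = substr($m[0], 1, -1);
        } else {
            // Bare word such as VAR_SELECTED: keep as-is.
            $args[] = $m[0];
        }
    }
    return $args;
}

print_r(extractArgs('VAR_SELECTED, \'Hello m\'lady\', "null"'));
print_r(extractArgs('VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"'));
```

Using PREG_SET_ORDER keeps each match together with its own capture group, which makes the quoted/bare distinction a simple per-match check.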
That regex yields the desired result, but I think you're going about this wrong. Usually, if a quoted string needs to contain a literal quote character, the quote is escaped, either with a backslash or with another quote. You aren't doing that, so I had to use a fragile hack based on lookarounds. Are you sure the data isn't supposed to look like this?
$string = 'VAR_SELECTED, \'Hello m\\\'lady\', "null"';
$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \\\'Luke\\\'"';
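If the data were backslash-escaped like that, the lookarounds become unnecessary: the usual pattern captures the opening quote, then repeats "an escaped character, or any character that is not the closing quote", then requires the same quote via the back reference \1. A sketch assuming backslash is the only escape character; extractQuoted() is a hypothetical name, and it returns only the quoted values, not bare words like VAR_SELECTED:

```php
<?php
// Sketch: match '...' or "..." where a backslash escapes the next character.
// Assumes backslash escaping; extractQuoted() is a hypothetical helper.
function extractQuoted(string $input): array
{
    // (['"])             opening quote, captured so \1 demands the same closing quote
    // (?:\\.|(?!\1).)*   an escaped char, or any char that is not the closing quote
    $pattern = '/([\'"])((?:\\\\.|(?!\1).)*)\1/s';
    preg_match_all($pattern, $input, $matches);
    // Group 2 is the body between the quotes; stripslashes() removes the escapes.
    return array_map('stripslashes', $matches[2]);
}

print_r(extractQuoted('VAR_SELECTED, \'Hello m\\\'lady\', "null"'));
print_r(extractQuoted('VAR_SELECTED, \'Hello "Father"\', "Hello \\\'Luke\\\'"'));
```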
Come to think of it, doesn't PHP have built-in support for CSV data?
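It does: str_getcsv() parses a single CSV line. It understands only one enclosure character (a double quote by default), and a literal enclosure inside a value is written by doubling it, so the sketch below assumes the argument list is re-encoded that way; it will not parse the original mixed single/double-quote format:

```php
<?php
// Sketch: treat the argument list as one CSV line.
// Assumes only double quotes delimit values and a literal " inside a value is doubled ("").
$line = 'VAR_SELECTED, "Hello ""Father""", "Hello \'Luke\'"';

// Arguments: separator, enclosure, escape. An empty escape (PHP 7.4+) disables
// PHP's non-standard backslash escape, so only "" acts as an escaped quote.
$fields = str_getcsv($line, ',', '"', '');

print_r($fields);
```

Whitespace between a comma and an opening quote is skipped by the parser, so the spaces after the commas are harmless here; unquoted values keep any leading spaces.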
You want to use a back reference in the match string.
preg_match_all('@([\'"]).*[^\\\\]\1@', $string, $matches);
This will start matching at the first " or ' and then match the longest string that ends with a matching " or ' that isn't escaped. For the question's first example string (the one ending in 'TE'ST'), that gives a single match:
Array (
[0] => Array
(
[0] => 'Hello m'lady', "null", 'TE'ST'
)
[1] => Array
(
[0] => '
)
)
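Making the quantifier lazy (.*? instead of .*) does split the result into several matches, but because the quotes inside the values are not escaped, each match now ends at the first apostrophe. A short sketch using the same example string:

```php
<?php
// Sketch: the back-reference pattern from this answer, with a lazy .*? instead of .*
$string = 'VAR_SELECTED, \'Hello m\'lady\', "null", \'TE\'ST\'';
preg_match_all('@([\'"]).*?[^\\\\]\1@', $string, $matches);

// Each match ends at the first non-escaped matching quote, so the apostrophes
// in m'lady and TE'ST close the strings too early.
print_r($matches[0]);
```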
Here's how I would do it:
Break the task down into the component steps you want to take:
1.) Explode the string on commas.
For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello m\'lady\'"
[2]=>" "null""
For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello "Father"\'"
[2]=>" "Hello \'Luke\'""
2.) Run trim() on all three to get rid of any surrounding whitespace.
For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello m\'lady\'"
[2]=>""null""
For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello "Father"\'"
[2]=>""Hello \'Luke\'""
3.) Run str_replace('\\', '', $text) to get rid of the backslashes (only needed if they are actually in the data; the \' in a PHP single-quoted literal is source-code escaping and never reaches the runtime string).
For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello m'lady'"
[2]=>""null""
For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello "Father"'"
[2]=>""Hello 'Luke'""
4.) Run trim again, this time as trim($text, "'\"") so that it strips the quote characters.
For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello m'lady"
[2]=>"null"
For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
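[0]=>"VAR_SELECTED"
[1]=>"Hello "Father"
[2]=>"Hello 'Luke"
(trim() strips every listed character from both ends, so the quote that legitimately ends "Father" and 'Luke' is removed as well; stripping exactly one matching pair of outer quotes, e.g. with substr($text, 1, -1) after checking the first and last characters, avoids this.)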
I haven't tested this, but the logic is sound. A quick and dirty way to test 98% of all regexes (in my experience) is to use http://rubular.com/, which is a great site. Usually, if it starts to choke on a regex, that's my first sign that I should break the problem down more. (That's just opinion ~dons flameproof suit~)
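The four steps above can be sketched in PHP roughly as follows. Assumptions: splitArgs() is a hypothetical helper name; no value contains a comma (step 1 splits on every comma, so Edit 2's 'Hello , "Father"' would break); and step 4 uses substr() to remove exactly one matching pair of outer quotes rather than trim() with a quote character list:

```php
<?php
// Sketch of the explode/trim approach. splitArgs() is a hypothetical helper.
// Assumes no value contains a comma, since step 1 splits on every comma.
function splitArgs(string $input): array
{
    $args = [];
    foreach (explode(',', $input) as $part) {   // 1. explode on commas
        $part = trim($part);                    // 2. trim surrounding whitespace
        $part = str_replace('\\', '', $part);   // 3. drop any literal backslashes
        // 4. remove one pair of matching outer quotes, if present
        $len = strlen($part);
        if ($len >= 2 && ($part[0] === "'" || $part[0] === '"') && $part[$len - 1] === $part[0]) {
            $part = substr($part, 1, -1);
        }
        $args[] = $part;
    }
    return $args;
}

print_r(splitArgs('VAR_SELECTED, \'Hello m\'lady\', "null"'));
print_r(splitArgs('VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"'));
```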