PHP regex to match double-quoted and/or single-quoted strings within a string


Problem description:

I'm working on a template class and I've an issue when trying to parse out a list of quoted strings from a string argument list. Take for example the string:

$string = 'VAR_SELECTED, \'Hello m\'lady\', "null"';

I'm having a problem coming up with a regex that extracts the string "Hello m'lady" and "null". The closest I have got is

$string = 'VAR_SELECTED, \'Hello m\'lady\', "null", \'TE\'ST\'';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);

Which outputs:

Array
(
    [0] => Array
        (
            [0] => VAR_SELECTED, 
            [1] => 'Hello m'lady', 
            [2] => "null", 
            [3] => 'TE'ST'
        )

)

However a more complex case of:

$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);  

outputs:

Array
(
    [0] => Array
        (
            [0] => VAR_SELECTED, 
            [1] => 'Hello 
            [2] => "Father"
            [3] => ', 
            [4] => "Hello 
            [5] => 'Luke'
            [6] => "
        )

)

Can anyone help me solve this problem? Are multiple regexes the way forward?

Edit: Maybe it would be easier to replace the commas within the strings with a placeholder and then split the string with explode()?

Edit 2: I just thought of a simple, insecure option (that I am not going to use), but it generates an E_NOTICE error.

$string = 'return array(VAR_SELECTED, \'Hello , "Father"\', "Hello \'Luke\'4");';
$string = eval($string);
print_r($string);


Try this:

/(?<=^|[\s,])(?:(['"]).*?\1|[^\s,'"]+)(?=[\s,]|$)/

Or, as a PHP single-quoted string literal:

'/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/'
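As a quick check, here is a minimal sketch that applies the pattern to both strings from the question. The function name extract_tokens is made up for illustration. Note that the unquoted VAR_SELECTED token is matched as well, and that the quoted tokens keep their outer quotes.

```php
<?php
// Hypothetical helper (not part of the original answer): returns every
// token the pattern matches, with outer quotes still attached.
function extract_tokens(string $input): array
{
    $pattern = '/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/';
    preg_match_all($pattern, $input, $matches);
    // $matches[0] holds the full matches; $matches[1] holds the opening
    // quote of each quoted token (empty string for unquoted tokens).
    return $matches[0];
}

$inputs = [
    'VAR_SELECTED, \'Hello m\'lady\', "null"',
    'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"',
];

foreach ($inputs as $input) {
    print_r(extract_tokens($input));
}
```

The lookahead is what lets the lazy .*? skip past the apostrophe in m'lady: a closing quote only counts if it is followed by whitespace, a comma, or the end of the string.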

That regex yields the desired result, but I think you're going about this wrong. Usually, if a quoted string needs to contain a literal quote character, the quote is escaped, either with a backslash or with another quote. You aren't doing that, so I had to use a fragile hack based on lookarounds. Are you sure the data isn't supposed to look like this?

$string = 'VAR_SELECTED, \'Hello m\\\'lady\', "null"';

$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \\\'Luke\\\'"';
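If the data did backslash-escape its inner quotes, as this answer suggests, the lookaround hack would no longer be needed. The following is only a sketch under that assumption; the function name extract_quoted is hypothetical. It pairs each opening quote with its closing quote through a backreference and treats a backslash plus the next character as one unit.

```php
<?php
// Assumption: inner quotes are backslash-escaped in the data itself.
// Hypothetical helper name, used here for illustration only.
function extract_quoted(string $input): array
{
    // (['"])            opening quote, remembered as \1
    // (?:\\.|(?!\1).)*  an escaped character, or any character that is
    //                   not the opening quote
    // \1                the matching closing quote
    $pattern = '/([\'"])((?:\\\\.|(?!\1).)*)\1/s';
    preg_match_all($pattern, $input, $matches);
    // Group 2 is the body without outer quotes; drop the escaping backslashes.
    return array_map('stripslashes', $matches[2]);
}

print_r(extract_quoted('VAR_SELECTED, \'Hello m\\\'lady\', "null"'));
print_r(extract_quoted('VAR_SELECTED, \'Hello "Father"\', "Hello \\\'Luke\\\'"'));
```

Unlike the lookaround version, this returns only the quoted strings, so VAR_SELECTED is not part of the result.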

Come to think of it, doesn't PHP have built-in support for CSV data?

You want to use a back reference in the match string.

preg_match_all('@([\'"]).*[^\\\\]\1@', $string, $matches);

This will start matching with the first instance of " or ' and then match the longest string that ends with a matching " or ' that isn't escaped.

Array
(
    [0] => Array
        (
            [0] => 'Hello m'lady', "null", 'TE'ST'
        )

    [1] => Array
        (
            [0] => '
        )

)

Here's how I would do it:

Break the task down into the component steps you want to take:

1.) Explode the string on commas.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello m\'lady\'"
[2]=>" "null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello "Father"\'"
[2]=>" "Hello \'Luke\'""

2.) Run trim() on each element to get rid of any surrounding whitespace.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello m\'lady\'"
[2]=>""null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello "Father"\'"
[2]=>""Hello \'Luke\'""

3.) Run str_replace('\\', '', $text) to get rid of the backslashes (that is, replace a single backslash with an empty string).

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello m'lady'"
[2]=>""null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello "Father"'"
[2]=>""Hello 'Luke'""

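4.) Strip the outer quotes. Note that trim($text, "'\"") removes every leading and trailing quote character of either kind, so it would also eat an inner quote that sits directly next to the outer one (as in 'Hello "Father"'). Removing only the first and last character of the quoted entries, e.g. with substr($text, 1, -1), avoids that.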

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello m'lady"
[2]=>"null"

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello "Father""
[2]=>"Hello 'Luke'"

I haven't tested this, but the logic is sound. A quick and dirty way to test 98% of all regexes (in my experience) is to use http://rubular.com/. It's a great site. Usually, if it starts to choke on a regex, that's my first sign that I should break the problem down further. (That's just my opinion ~dons flameproof suit~)
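The four steps can be sketched roughly as follows. The function name split_arguments is made up for illustration, and the sketch inherits the limitation of step 1: a comma inside a quoted string would split that string in two. Step 4 removes one matching pair of outer quotes instead of calling trim() with both quote characters, because trim() would turn 'Hello "Father"' into Hello "Father.

```php
<?php
// Hypothetical helper name; a sketch of the explode/trim steps, not a
// full parser. Commas inside quoted strings are NOT handled.
function split_arguments(string $input): array
{
    $result = [];
    // Step 1: split on commas.
    foreach (explode(',', $input) as $part) {
        // Step 2: remove surrounding whitespace.
        $part = trim($part);
        // Step 3: remove backslashes.
        $part = str_replace('\\', '', $part);
        // Step 4: strip one matching pair of outer quotes, if present.
        $first = substr($part, 0, 1);
        $isQuote = ($first === "'" || $first === '"');
        if (strlen($part) >= 2 && $isQuote && substr($part, -1) === $first) {
            $part = substr($part, 1, -1);
        }
        $result[] = $part;
    }
    return $result;
}

print_r(split_arguments('VAR_SELECTED, \'Hello m\'lady\', "null"'));
print_r(split_arguments('VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"'));
```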