PHP regex for matching double-quoted and/or single-quoted strings within a string

Category: Technical Q&A • 2022-03-12 17:40:05

Question:

I'm working on a template class and I have an issue when trying to parse a list of quoted strings out of a string argument list. Take for example the string:

$string = 'VAR_SELECTED, \'Hello m\'lady\', "null"';

I'm having a problem coming up with a regex that extracts the strings "Hello m'lady" and "null". The closest I have got is:

$string = 'VAR_SELECTED, \'Hello m\'lady\', "null", \'TE\'ST\'';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);

Which outputs:

Array
(
    [0] => Array
        (
            [0] => VAR_SELECTED, 
            [1] => 'Hello m'lady', 
            [2] => "null", 
            [3] => 'TE'ST'
        )

)

However a more complex case of:

$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);

outputs:

Array
(
    [0] => Array
        (
            [0] => VAR_SELECTED, 
            [1] => 'Hello 
            [2] => "Father"
            [3] => ', 
            [4] => "Hello 
            [5] => 'Luke'
            [6] => "
        )

)

Can anyone help me solve this problem? Are multiple regexes the way forward?

Edit: Maybe it would be easier to replace the commas within the strings with a placeholder and then break the strings apart with explode()?

Edit 2: I just thought of a simple, insecure option (which I am not going to use), but it generates an E_NOTICE error.

$string = 'return array(VAR_SELECTED, \'Hello , "Father"\', "Hello \'Luke\'4");';
$string = eval($string);
print_r($string);


Answer

Try this:

/(?<=^|[\s,])(?:(['"]).*?\1|[^\s,'"]+)(?=[\s,]|$)/

Or, as a PHP single-quoted string literal:

'/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/'
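For reference, here is a minimal sketch of how the pattern above might be applied with preg_match_all(). The helper name extractTokens() is made up for illustration, and stripping the outer quotes via capture group 1 is an addition of this sketch; the pattern itself returns quoted tokens with their quotes.

```php
<?php
// Pattern copied from the answer, as a single-quoted PHP literal.
$pattern = '/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/';

// extractTokens() is a hypothetical helper name, not part of the original post.
function extractTokens(string $pattern, string $input): array
{
    // PREG_SET_ORDER groups the results per match: [0] is the full token,
    // [1] is the opening quote character when the token was quoted.
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        $quoted = isset($m[1]) && $m[1] !== '';
        // Drop one outer pair of quotes from quoted tokens only.
        $tokens[] = $quoted ? (string) substr($m[0], 1, -1) : $m[0];
    }
    return $tokens;
}

print_r(extractTokens($pattern, 'VAR_SELECTED, \'Hello m\'lady\', "null"'));
print_r(extractTokens($pattern, 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"'));
```

Both sample strings from the question should come back as three tokens, with the unquoted VAR_SELECTED included; filter on group 1 if only the quoted strings are wanted.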

That regex yields the desired result, but I think you're going about this wrong. Usually, if a quoted string needs to contain a literal quote character, the quote is escaped, either with a backslash or with another quote. You aren't doing that, so I had to use a fragile hack based on lookarounds. Are you sure the data isn't supposed to look like this?

$string = 'VAR_SELECTED, \'Hello m\\\'lady\', "null"';

$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \\\'Luke\\\'"';
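If the data does use backslash escapes like that, a simpler pattern without lookarounds becomes possible. The following is only a sketch under that assumption; extractQuoted() is a hypothetical helper name, and the nowdoc is used just to show the raw data without PHP-level escaping.

```php
<?php
// Hypothetical helper: pulls out '...' and "..." strings in which a
// backslash escapes the following character.
function extractQuoted(string $input): array
{
    // Each branch reads: opening quote, then any run of either
    // "not the quote and not a backslash" or "backslash plus any char",
    // then the closing quote.
    $pattern = '/\'(?:[^\'\\\\]|\\\\.)*\'|"(?:[^"\\\\]|\\\\.)*"/s';
    preg_match_all($pattern, $input, $matches);

    $result = [];
    foreach ($matches[0] as $m) {
        // Remove the outer quotes, then unescape \' and \\ inside.
        $result[] = stripslashes((string) substr($m, 1, -1));
    }
    return $result;
}

$input = <<<'TXT'
VAR_SELECTED, 'Hello "Father"', "Hello \'Luke\'"
TXT;

print_r(extractQuoted($input));
```

Unquoted tokens such as VAR_SELECTED are skipped by this pattern, which matches what the question asks for.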

Come to think of it, doesn't PHP have built-in support for CSV data?

Answer

You want to use a back reference in the match string.

preg_match_all('@([\'"]).*[^\\\\]\1@', $string, $matches);

This will start matching with the first instance of " or ' and then match the longest string that ends with a matching " or ' that isn't escaped.

Array
(
    [0] => Array
        (
            [0] => 'Hello m'lady', "null", 'TE'ST'
        )

    [1] => Array
        (
            [0] => '
        )

)
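Because .* is greedy, everything from the first quote to the last matching quote comes back as a single match. As a variation (not the code from this answer), the sketch below makes the quantifier lazy and assumes inner quotes are backslash-escaped in the data, so each quoted string is matched separately.

```php
<?php
// Same pattern with .* changed to the lazy .*? (an assumption of this sketch).
$pattern = '@([\'"]).*?[^\\\\]\1@';

// Raw data with a backslash-escaped inner quote; the nowdoc avoids
// a second layer of PHP escaping.
$input = <<<'TXT'
VAR_SELECTED, 'Hello m\'lady', "null"
TXT;

preg_match_all($pattern, $input, $matches);

// $matches[0] holds the full matches, quotes and backslashes included.
print_r($matches[0]);
```

Note that [^\\\\] needs at least one character before the closing quote, so an empty string such as '' is not handled correctly by this form.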

Answer

Here's how I would do it:

Break the task down into the component steps you want to take:

1.) Explode the string on commas.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello m\'lady\'"
[2]=>" "null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello "Father"\'"
[2]=>" "Hello \'Luke\'""

2.) Run trim() on each element to get rid of any surrounding whitespace.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello m\'lady\'"
[2]=>""null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello "Father"\'"
[2]=>""Hello \'Luke\'""

3.) Run str_replace('\\', '', $text) to get rid of the backslashes, that is, replace each backslash with an empty string.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello m'lady'"
[2]=>""null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello "Father"'"
[2]=>""Hello 'Luke'""

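4.) Strip the surrounding quotes. trim($text, '\'"') does this for the first sample, but trim() removes every leading and trailing character in the list, so on "Hello 'Luke'" it would also eat the inner closing quote. Dropping only the first and last character of a quoted element, for example with substr($text, 1, -1), avoids that.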

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello m'lady"
[2]=>"null"

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello "Father""
[2]=>"Hello 'Luke'"

I haven't tested this, but the logic should hold. A quick and dirty way to test most regexes (98% of them, in my experience) is http://rubular.com/. It's a great site. Usually, if it starts to choke on a regex, that's my first sign that I should break the problem down further. (That's just my opinion. ~dons flameproof suit~)
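The four steps can be sketched roughly as follows. The function name splitArgs() is hypothetical, and step 4 removes a single matching pair of outer quotes rather than calling trim() with a character list, because trim() would also strip a quote that belongs to the inner text.

```php
<?php
// splitArgs() is a hypothetical name; it follows the four steps described above.
function splitArgs(string $input): array
{
    $out = [];
    foreach (explode(',', $input) as $part) {        // step 1: split on commas
        $part = trim($part);                          // step 2: strip whitespace
        $part = str_replace('\\', '', $part);         // step 3: drop backslashes

        // Step 4: remove one pair of outer quotes, only if both ends match.
        $len = strlen($part);
        if ($len >= 2
            && ($part[0] === "'" || $part[0] === '"')
            && $part[$len - 1] === $part[0]
        ) {
            $part = (string) substr($part, 1, -1);
        }
        $out[] = $part;
    }
    return $out;
}

print_r(splitArgs('VAR_SELECTED, \'Hello m\'lady\', "null"'));
print_r(splitArgs('VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"'));
```

One caveat, which the question's own edit hints at: explode() also splits on commas inside a quoted string, so this approach only works when the quoted values contain no commas.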
