Reliably convert a string containing PHP array information into an array [duplicate]

Problem description:

Possible Duplicate:
Simulate php array language construct or parse with regexp?

Suppose I have the string

$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string',array('asdf' => 'lalal')";

and I try to explode it on commas, so that the desired end result is

$explode[0] =  array(1,3,4);
$explode[1] = array(array(4,5,6));
$explode[2] = 'this is a comma , inside a string';
$explode[3] = array('asdf' => 'lalal');

Simply calling explode(',',$str) is not going to cut it, since there are also commas within those chunks...

Is there a way to explode this reliably even if there are commas inside the desired chunks?

Is there a way to explode this reliably even if there are commas inside the desired chunks?

PHP does not provide such a function out of the box. However, your string contains a compact subset of PHP, and PHP offers tools for exactly that: a PHP tokenizer (token_get_all) and a PHP parser (eval).

So for your string format you can write a helper function that first validates the input against a whitelist of allowed tokens and then parses it:

$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string', array('asdf' => 'lalal')";

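function explode_string($str)
{
    // validate string: only array literals, integers, quoted strings and "=>" may appear
    $tokens = token_get_all(sprintf('<?php %s', $str));
    array_shift($tokens); // drop the opening <?php tag
    // use the T_* constants, their numeric values differ between PHP versions
    $valid = array(T_LNUMBER, T_CONSTANT_ENCAPSED_STRING, T_DOUBLE_ARROW, T_ARRAY, T_WHITESPACE, '(', ')', ',');
    $isValid = TRUE;
    foreach($tokens as $token)
    {
        // array tokens carry their id at index 0, single-character tokens are plain strings
        list($index) = (array) $token;
        if (!in_array($index, $valid, TRUE))
        {
            $isValid = FALSE;
            break;
        }
    }
    if (!$isValid)
        throw new InvalidArgumentException('Invalid string.');

    // parse string (safe only because of the whitelist check above)
    $return = eval(sprintf('return array(%s);', $str));

    return $return;
}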

echo $str, "\n";

$result = explode_string($str);

print_r($result);

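The tokens used are listed below. The numbers in parentheses are the values from the PHP version this was written on; they differ between PHP versions, so compare against the T_* constants rather than the raw numbers:

T_LNUMBER (305)
T_CONSTANT_ENCAPSED_STRING (315)
T_DOUBLE_ARROW (358)
T_ARRAY (360)
T_WHITESPACE (371)

A token index number can be turned back into its token name with token_name().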

Running the script gives you (Demo):

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 3
            [2] => 4
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => 4
                    [1] => 5
                    [2] => 6
                )

        )

    [2] => this is a comma , inside a string
    [3] => Array
        (
            [asdf] => lalal
        )

)
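
To check which tokens a given input produces, and how the names in the token list map to numbers on your own PHP version, something like the following may help. It is only a small sketch: the sample input string and the variable names ($allowed, $names) are made up for illustration and are not part of the answer above.

```php
<?php
// Whitelist built from the tokenizer constants instead of hard-coded numbers.
$allowed = array(T_LNUMBER, T_CONSTANT_ENCAPSED_STRING, T_DOUBLE_ARROW, T_ARRAY, T_WHITESPACE);

// Print each token name next to its numeric value on this PHP version.
foreach ($allowed as $id) {
    printf("%-28s %d\n", token_name($id), $id);
}

// Collect the token names (or the literal character) for a small sample input.
$names = array();
foreach (token_get_all("<?php array('a' => 1)") as $token) {
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}
echo implode(' ', $names), "\n";
```

Any name that shows up in the second listing but is missing from the whitelist (apart from the opening tag, which the validator shifts off) would make the validation step reject the input.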

You can write a simple parser:

function explode_str_arr($str) {
    $str.=',';                  // sentinel comma so the last chunk gets flushed
    $escape_char = '';          // quote character of the string we are inside, if any
    $str_len = strlen($str);
    $cur_value = '';
    $return_arr = array();
    $cur_bracket_level = 0;     // parenthesis nesting depth
    for ($i = 0; $i < $str_len; $i++) {
        if ($escape_char) {
            // inside a quoted string: copy everything up to the closing quote
            if ($str[$i] === $escape_char) {
                $escape_char = '';
            }
            $cur_value.=$str[$i];
            continue;
        }

        switch ($str[$i]) {
            case '\'':
            case '"':
                $escape_char = $str[$i];
                break;
            case '(':
                $cur_bracket_level++;
                break;
            case ')':
                $cur_bracket_level--;
                break;
            case ',':
                // only a comma outside all parentheses ends a chunk
                if (!$cur_bracket_level) {
                    $return_arr[] = $cur_value;
                    $cur_value = '';
                    continue 2;
                }
        }
        $cur_value.=$str[$i];
    }
    return $return_arr;     // each chunk is returned as a string
}

It is ugly, fast, byte-by-byte code that is not multibyte-aware and ignores backslash-escaped quotes inside strings, but I think you get the idea.
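
If the quoted chunks can contain backslash-escaped quotes (for example 'it\'s'), the same idea can be extended to skip over the character after a backslash while inside a string. The following is a minimal sketch under that assumption, not a full PHP parser; the function name split_top_level is hypothetical, and like the parser above it returns each chunk as a string rather than a PHP value.

```php
<?php
// Hypothetical helper: split $str on top-level commas, honouring quotes,
// backslash escapes inside quotes, and nested parentheses.
function split_top_level($str)
{
    $chunks = array();
    $current = '';
    $quote = '';   // the open quote character, or '' when outside a string
    $depth = 0;    // parenthesis nesting level
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];

        if ($quote !== '') {
            $current .= $ch;
            if ($ch === '\\' && $i + 1 < $len) {
                // keep the escaped character and skip over it
                $current .= $str[++$i];
            } elseif ($ch === $quote) {
                $quote = '';
            }
            continue;
        }

        if ($ch === "'" || $ch === '"') {
            $quote = $ch;
        } elseif ($ch === '(') {
            $depth++;
        } elseif ($ch === ')') {
            $depth--;
        } elseif ($ch === ',' && $depth === 0) {
            // a comma outside quotes and parentheses ends the current chunk
            $chunks[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $ch;
    }

    // flush the last chunk, if any
    if (trim($current) !== '') {
        $chunks[] = trim($current);
    }
    return $chunks;
}

$str = "array(1,3,4),array(array(4,5,6)),'it\\'s a comma , here', array('asdf' => 'lalal')";
print_r(split_top_level($str));
```

Trimming each chunk is a design choice made here so that whitespace after a separating comma does not end up in the result; drop the trim() calls if the exact bytes matter.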