Reliably convert a string containing PHP array info to an array [duplicate]
Possible Duplicate:
Simulate php array language construct or parse with regexp?
Suppose I have the string
$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string',array('asdf' => 'lalal')";
and I want to split it on commas into an array, so that the desired end result is
$explode[0] = array(1,3,4);
$explode[1] = array(array(4,5,6));
$explode[2] = 'this is a comma , inside a string';
$explode[3] = array('asdf' => 'lalal');
Simply calling explode(',', $str) is not going to cut it, since there are also commas within those chunks.
Is there a way to explode this reliably even if there are commas inside the desired chunks?
Is there a way to explode this reliably even if there are commas inside the desired chunks?
PHP does not provide such a function out of the box. However, your string contains a compact subset of PHP, and PHP offers two tools for that: a PHP tokenizer (token_get_all) and a PHP parser (eval).
So for your string format you can write a helper function that first validates the input against a list of allowed tokens and only then parses it:
$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string', array('asdf' => 'lalal')";

function explode_string($str)
{
    // validate string: only a small whitelist of tokens may appear
    $isValid = TRUE;
    $tokens = token_get_all(sprintf('<?php %s', $str));
    array_shift($tokens); // drop the T_OPEN_TAG token
    // use the token constants, the numeric IDs differ between PHP versions
    $valid = array(T_LNUMBER, T_CONSTANT_ENCAPSED_STRING, T_DOUBLE_ARROW, T_ARRAY, T_WHITESPACE, '(', ')', ',');
    foreach ($tokens as $token)
    {
        // array tokens are array(id, text, line); single characters are plain strings
        list($index) = (array) $token;
        if (!in_array($index, $valid, TRUE))
        {
            $isValid = FALSE;
            break;
        }
    }
    if (!$isValid)
        throw new InvalidArgumentException('Invalid string.');

    // parse string: eval is acceptable here only because of the validation above
    return eval(sprintf('return array(%s);', $str));
}

echo $str, "\n";
$result = explode_string($str);
print_r($result);
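The tokens allowed are:
T_LNUMBER (305)
T_CONSTANT_ENCAPSED_STRING (315)
T_DOUBLE_ARROW (358)
T_ARRAY (360)
T_WHITESPACE (371)
The numbers in parentheses are the token IDs of the PHP version this was written against; they differ between PHP versions, so the T_* constants are the portable way to refer to them. A token ID can be turned into its token name with token_name.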
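To see which tokens a given input produces (and so which ones belong on the whitelist), you can dump the token names. This is only a minimal sketch; the sample string and the $names variable are made up for illustration and are not part of the answer's code:

```php
<?php
// List the name of every token in a small sample expression.
$sample = "array('asdf' => 1)";
$tokens = token_get_all('<?php ' . $sample);
array_shift($tokens); // drop the T_OPEN_TAG token

$names = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // array(id, text, line): resolve the ID by name instead of hard-coding it
        $names[] = token_name($token[0]);
    } else {
        // single-character tokens such as '(' or ',' are plain strings
        $names[] = $token;
    }
}
echo implode(' ', $names), "\n";
```

Running this against your real input on the PHP version you deploy on is a quick way to check the whitelist without relying on numeric IDs.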
Which gives you (Demo):
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 3
            [2] => 4
        )
    [1] => Array
        (
            [0] => Array
                (
                    [0] => 4
                    [1] => 5
                    [2] => 6
                )
        )
    [2] => this is a comma , inside a string
    [3] => Array
        (
            [asdf] => lalal
        )
)
You can write a simple parser:
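function explode_str_arr($str) {
    $str .= ',';              // sentinel comma so the last chunk gets flushed
    $escape_char = '';        // quote character of the string we are inside, or ''
    $str_len = strlen($str);
    $cur_value = '';
    $return_arr = array();
    $cur_bracket_level = 0;   // nesting depth of parentheses
    for ($i = 0; $i < $str_len; $i++) {
        if ($escape_char) {
            // inside a quoted string: copy a backslash-escaped character verbatim
            if ($str[$i] === '\\' && $i + 1 < $str_len) {
                $cur_value .= $str[$i] . $str[++$i];
                continue;
            }
            if ($str[$i] === $escape_char) {
                $escape_char = ''; // closing quote found
            }
            $cur_value .= $str[$i];
            continue;
        }
        switch ($str[$i]) {
            case '\'':
            case '"':
                $escape_char = $str[$i];
                break;
            case '(':
                $cur_bracket_level++;
                break;
            case ')':
                $cur_bracket_level--;
                break;
            case ',':
                if (!$cur_bracket_level) {
                    // top-level comma: finish the current chunk
                    $return_arr[] = trim($cur_value);
                    $cur_value = '';
                    continue 2;
                }
        }
        $cur_value .= $str[$i];
    }
    return $return_arr;
}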
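It is quick, ugly code that scans byte by byte and is not multibyte-aware, and it returns each chunk as a string rather than as a parsed array, but I think you get the idea.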
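If you only need the chunks and would rather not hand-roll the quote handling, the tokenizer can do the string detection for you. The following is a rough sketch, not something from either answer: split_top_level is a hypothetical name, and it assumes the input contains no double-quoted strings with variable interpolation.

```php
<?php
// Hypothetical helper: split a PHP expression list at top-level commas,
// letting the tokenizer decide what is a string literal.
function split_top_level($str)
{
    $tokens = token_get_all('<?php ' . $str);
    array_shift($tokens); // drop the T_OPEN_TAG token

    $chunks = array();
    $current = '';
    $depth = 0; // nesting depth of ( ) and [ ]
    foreach ($tokens as $token) {
        if (is_array($token)) {
            // array(id, text, line): literals, keywords, whitespace; copy the text as is
            $current .= $token[1];
            continue;
        }
        // plain strings are single-character punctuation tokens
        if ($token === '(' || $token === '[') {
            $depth++;
        } elseif ($token === ')' || $token === ']') {
            $depth--;
        } elseif ($token === ',' && $depth === 0) {
            $chunks[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $token;
    }
    if (trim($current) !== '') {
        $chunks[] = trim($current);
    }
    return $chunks;
}

$parts = split_top_level("array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string', array('asdf' => 'lalal')");
print_r($parts);
```

Like the character scanner, this returns the chunks as source strings; turning them into real arrays still needs a validated eval or a proper parser.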