Split a string with multiple delimiters
I have seen many questions on how to do this (before you go flagging this as a duplicate), but for some reason my output isn't working:
// $delimiters wanted: ', ' | '; ' | ',' | ';' | ' , ' | ', and ' | ' and ' | ',and '
$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';
$delimiter = array(
', ',
'; ',
';',
',',
' , ',
', and ',
' and ',
',and '
);
$str_new = explode( $delimiter[0], str_replace($delimiter, $delimiter[0], $str) );
However, when I output the array, I get this:
<?php foreach($str_new as $new) { echo 'a' . $new; } ?>
Array (
[0] => Name 1
[1] => Name 2
[2] => Name 3
[3] => // WHY IS THIS EMPTY?
[4] => Name 4
...
)
So is there a better way to match the delimiters I have listed?
I'd use regexp like this in your case:
preg_split('/,? ?and | ?[,;] ?/', $str)
You may also want to replace the spaces with \s if other whitespace characters may appear (a TAB, for example), or even use \s* instead of ' ?' to cover the case of multiple spaces.
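For reference, here is the pattern above run against the string from the question, next to a whitespace-tolerant variant. The second pattern and the PREG_SPLIT_NO_EMPTY flag are my own assumptions, not part of the suggestion above, so treat this as a sketch rather than a drop-in answer:

```php
<?php
$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';

// Pattern from the answer: the "and" forms are tried first,
// then a bare comma or semicolon with optional single spaces.
$parts = preg_split('/,? ?and | ?[,;] ?/', $str);

// Assumed variant: \s* tolerates tabs and repeated spaces, and a bare "and"
// only counts as a delimiter when it has whitespace on both sides.
$loosePattern = '/\s*[,;]\s*(?:and\s+)?|\s+and\s+/';
$partsLoose = preg_split($loosePattern, $str, -1, PREG_SPLIT_NO_EMPTY);

// Caveat of the first pattern: "and " is also matched inside a word.
$inWord      = preg_split('/,? ?and | ?[,;] ?/', 'Roland Smith');
$inWordLoose = preg_split($loosePattern, 'Roland Smith', -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
print_r($inWord);      // the name is cut in two
print_r($inWordLoose); // the name stays whole
```

Both patterns give the nine names for the sample string. The difference only shows up on names that happen to contain "and" followed by a space, where the first pattern splits in the middle of the word.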
Have you tried something like this from php.net?
<?php
// $delimiters has to be an array
// $string has to be a string
function multiexplode ($delimiters,$string) {
$ready = str_replace($delimiters, $delimiters[0], $string);
$launch = explode($delimiters[0], $ready);
return $launch;
}
$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);
print_r($exploded);
?>
Or something like Split String by Multiple Delimiters in PHP
In your code, in 'Name 6, and Name 7', first the ',' gets replaced, then the ' and '.
Therefore you end up with this string:
Name 1, Name 2, Name 3, Name4, Name 5, Name 6, , Name 7, Name 8, Name 9
Hence, the empty value...
Clean your result array before outputting and you should be fine:
$str_out = array_filter($str_new);
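Two things to keep in mind with that cleanup: array_filter() without a callback drops every falsy value (so a name that is literally '0' would disappear too), and the keys are not renumbered. If you would rather not produce the empty element in the first place, one option is to replace the longest delimiters first and to use a placeholder that is not itself part of any delimiter. A minimal sketch, assuming PHP 7.4+ for the arrow functions; the helper name split_on_any and the "\x1F" placeholder are made up for illustration:

```php
<?php
// Hypothetical helper: split $str on any of the given delimiters.
function split_on_any(array $delimiters, string $str): array
{
    // Longest first, so ', and ' is consumed before ',' or ' and ' can touch it.
    usort($delimiters, fn($a, $b) => strlen($b) <=> strlen($a));

    // Assumption: the unit separator character never occurs in the input.
    $placeholder = "\x1F";
    $parts = explode($placeholder, str_replace($delimiters, $placeholder, $str));

    // Trim leftovers and drop only truly empty strings, then renumber the keys.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, fn($p) => $p !== ''));
}

$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';
$delimiter = array(', ', '; ', ';', ',', ' , ', ', and ', ' and ', ',and ');

$names = split_on_any($delimiter, $str);
print_r($names);
```

The order of the delimiter array you pass in no longer matters, because the helper sorts its own copy.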
The problem with your approach is that you are trying to solve the problem the wrong way. Even if you manage to create a list of delimiters, what happens if you later need to separate the words by another character, let's say a '$' sign?
You should implement a tokenizer/lexer which reads the input char by char and distinguishes between whitespace, terminal and non-terminal symbols/characters. The lexer would then generate a sequence of tokens, e.g.
STRING-SYMBOL:'NAME1'
COMMA-SYMBOL
AND-SYMBOL
STRING-SYMBOL:'NAME2'
SEMICOLON-SYMBOL
STRING-SYMBOL:'NAME3'
AND-SYMBOL
...
EOF-SYMBOL
You then simply filter out any non-STRING-SYMBOL symbols (or you combine strings using the AND-SYMBOL). This is (imho) the only rock-solid solution. It is also very easy to extend and to generalize: once you have written a nice tokenizer/lexer, you can use this approach for almost any string-analysis problem.
Writing a tokenizer is generally very simple: It scans the input char by char and first categorizes the char. It implements a simple state machine to collect characters which will form a symbol.
You may try to implement this using a regex, which should be possible as well. Either way, the tokenizer will generate a list of tokens (or will retrieve the next one upon request). The last token it retrieves is the EOF-SYMBOL, indicating that the input sequence has been fully traversed.
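To make the idea concrete, here is a rough sketch of such a tokenizer for the string in the question. Everything in it is an assumption made for illustration: the function name tokenize, the token names, and the rule that a standalone lowercase word "and" is a symbol rather than part of a name. It needs PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical tokenizer: returns a list of [type, value] pairs.
function tokenize(string $input): array
{
    $tokens = [];
    $word = '';   // characters of the word currently being read
    $name = [];   // completed words of the current STRING token

    // Emit the collected words as one STRING token, if there are any.
    $flushName = function () use (&$tokens, &$name) {
        if ($name) {
            $tokens[] = ['STRING', implode(' ', $name)];
            $name = [];
        }
    };
    // A finished word is either the AND symbol or part of a name.
    $flushWord = function () use (&$tokens, &$word, &$name, $flushName) {
        if ($word === 'and') {
            $flushName();
            $tokens[] = ['AND', null];
        } elseif ($word !== '') {
            $name[] = $word;
        }
        $word = '';
    };

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === ',' || $char === ';') {
            // Terminal symbol: close whatever was being collected.
            $flushWord();
            $flushName();
            $tokens[] = [$char === ',' ? 'COMMA' : 'SEMICOLON', null];
        } elseif (ctype_space($char)) {
            $flushWord();
        } else {
            $word .= $char;
        }
    }
    $flushWord();
    $flushName();
    $tokens[] = ['EOF', null];
    return $tokens;
}

$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';
$tokens = tokenize($str);

// Keep only the STRING tokens and take their values.
$names = array_column(array_filter($tokens, fn($t) => $t[0] === 'STRING'), 1);
print_r($names);
```

Supporting another separator such as '$' then means adding one more branch in the loop instead of reworking a delimiter list.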