PHP: Best way to split a string into alphabetic and numeric components
I have several strings of the format
AA11 AAAAAA1111111 AA1111111
Which is the best (most efficient) way to separate the alphabetic and numeric components of the string?
If every string is a run of letters followed by a run of digits, with no non-alphanumeric characters, then sscanf() is probably more efficient than a regular expression:
$example = 'AAA11111';
list($alpha,$numeric) = sscanf($example, "%[A-Z]%d");
var_dump($alpha);
var_dump($numeric);
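One thing to watch with the snippet above: %d converts the digits to an integer, so leading zeros are dropped. A minimal sketch, assuming you want the numeric part kept as a string (the sample value 'AA007' is made up for illustration):

```php
<?php
$example = 'AA007';

// %d converts the digits to an integer, so the leading zeros are lost.
list($alpha, $asInt) = sscanf($example, '%[A-Z]%d');

// %[0-9] captures the digits as a string, leading zeros intact.
list($alpha2, $asString) = sscanf($example, '%[A-Z]%[0-9]');

var_dump($asInt);    // int(7)
var_dump($asString); // string(3) "007"
```

If the letters can be lowercase too, the first set would need to be %[A-Za-z].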
preg_split should do the job fine.
preg_split('/(\d+)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
The preg library is surprisingly efficient at handling strings, so I would expect it to be faster than anything you write by hand with more primitive string functions. But run a test and see for yourself.
Instead of reaching for a regex straight away, you can first check for the trivial cases, for example:
if (ctype_alpha($testcase)) {
    // Only letters: return the value as is
} elseif (ctype_digit($testcase)) {
    // Only digits: return the value as is
} else {
    // Mixed: use a regex to split the letters from the digits
}
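Filled in, the check above might look something like this. It is only a sketch: the function name splitPieces and the choice to return a list of pieces are my assumptions, and the regex fallback is just one of the preg_split variants from this thread.

```php
<?php
// Hypothetical helper: returns the string's pieces as a list.
function splitPieces(string $testcase): array
{
    // Fast path: only letters or only digits, so there is nothing to split.
    if (ctype_alpha($testcase) || ctype_digit($testcase)) {
        return [$testcase];
    }
    // Mixed input: split on runs of digits, keeping the digits as pieces
    // and dropping the empty strings preg_split would otherwise return.
    return preg_split('/(\d+)/', $testcase, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

var_dump(splitPieces('AAAA'));  // one piece: "AAAA"
var_dump(splitPieces('AA11'));  // two pieces: "AA", "11"
```

Whether the extra ctype calls pay off depends on how many of your strings are letters-only or digits-only, so it is worth measuring on real data.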
EDIT: My answer gave no evidence of which approach performs better, so I ran a test that produced the following result:
- preg_split took 5.3319189548492 seconds
- sscanf took 3.4432129859924 seconds
So the answer should have been sscanf.
Here's the code that produced the result:
$string = "AAAAAAAAAA111111111111111";
$count = 1000000;

function prSplit($string) {
    return preg_split('/([A-Za-z]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

function sScanfTest($string) {
    return sscanf($string, "%[A-Z]%[0-9]");
}

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

$startTime1 = microtime_float();
for ($i = 0; $i < $count; ++$i) {
    prSplit($string);
}
$time1 = microtime_float() - $startTime1;
echo '1. preg_split took '.$time1.' seconds<br />';

$startTime2 = microtime_float();
for ($i = 0; $i < $count; ++$i) {
    sScanfTest($string);
}
$time2 = microtime_float() - $startTime2;
echo '2. sscanf took '.$time2.' seconds';
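If you want to rerun the comparison yourself, the timing loop can be wrapped in a small helper. This is a sketch, not the code that produced the numbers above: timeIt is a made-up name, it assumes PHP 7.4+ (hrtime() and arrow functions), and the iteration count is lowered so it finishes quickly.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; ++$i) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$string = 'AAAAAAAAAA111111111111111';
$count = 100000;

$pregTime = timeIt(
    fn() => preg_split('/([A-Za-z]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY),
    $count
);
$scanTime = timeIt(fn() => sscanf($string, '%[A-Z]%[0-9]'), $count);

printf("preg_split took %.4f seconds\n", $pregTime);
printf("sscanf took %.4f seconds\n", $scanTime);
```

The absolute numbers will differ by machine and PHP version, so only the relative difference is worth reading.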
Here is a working example using preg_split():
$strs = array('AA11', 'AAAAAA1111111', 'AA1111111');
foreach ($strs as $str) {
    foreach (preg_split('/([A-Za-z]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) as $temp) {
        var_dump($temp);
    }
}
This outputs:
string(2) "AA"
string(2) "11"
string(6) "AAAAAA"
string(7) "1111111"
string(2) "AA"
string(7) "1111111"
This seems to work, but it fails when you pass something like "111111".
In my application I expect several scenarios, and what seems to do the trick is this:
$referenceNumber = "AAA12132";
$splited = preg_split('/(\d+)/', $referenceNumber, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
var_dump($splited);
Note:
- An array of 2 elements means the 0th index holds the letters and the 1st holds the digits (assuming the string starts with letters).
- An array of just 1 element means the string was all digits or all letters; ctype_digit() on the 0th element tells you which.
- More than 2 elements means the string has a format like "AAA1323SDC".
So given the above, you can play around with it based on your use case.
Cheers!
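If counting array elements feels too fragile, the same scenarios can be told apart with one anchored preg_match instead of preg_split. A minimal sketch, assuming the only accepted format is optional letters followed by optional digits (splitReference is a hypothetical name):

```php
<?php
// Hypothetical helper: returns [letters, digits], or null for any other format.
function splitReference(string $ref): ?array
{
    // The D modifier makes $ match only at the very end of the string,
    // not before a trailing newline.
    if (preg_match('/^([A-Za-z]*)(\d*)$/D', $ref, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

var_dump(splitReference('AAA12132'));   // letters "AAA", digits "12132"
var_dump(splitReference('111111'));     // letters "", digits "111111"
var_dump(splitReference('AAA1323SDC')); // NULL
```

Here a missing part comes back as an empty string, so the letters are always at index 0 and the digits at index 1.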