How do I check whether a string consists only of letters and digits? (PHP)

Problem description:

Title says it all: I am checking whether a user's username contains anything that isn't a number or a letter, such as €{¥]^}+<€, punctuation, spaces or even things like âæłęč. Is this possible in PHP?

You can use the ctype_alnum() function in PHP.

From the manual:

Check for alphanumeric character(s)
Returns TRUE if every character in text is either a letter or a digit, FALSE otherwise.

var_dump(ctype_alnum("æøsads")); // false
var_dump(ctype_alnum("123asd")); // true
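
For a username check, the call can be wrapped in a small helper. This is only a sketch: the function name isAlnumUsername is made up for illustration, and it assumes the default "C" locale, where only ASCII letters and digits count as alphanumeric.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of PHP).
// Returns true only when every character is an ASCII letter or digit.
function isAlnumUsername(string $name): bool
{
    // ctype_alnum() already returns false for an empty string,
    // so no separate length check is needed.
    return ctype_alnum($name);
}

var_dump(isAlnumUsername('user123'));   // bool(true)
var_dump(isAlnumUsername('user 123'));  // bool(false): contains a space
var_dump(isAlnumUsername(''));          // bool(false): empty string
```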

PHP does REGEX

What you want to do is fairly trivial: PHP has a number of regex functions.

Testing a String For a Character

If all you want is to know IF a string contains non-alphanumeric characters, then just use preg_match():

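preg_match( '/[^A-Za-z0-9]/', $userName );

This returns 1 if the username contains any character other than A-Z, a-z or 0-9, and 0 if every character is alphanumeric. Note that the character class has no trailing *: with one, the pattern would also match an empty string, so preg_match() would return 1 for every input. The ranges are also written without |, because inside brackets a | is a literal pipe character, not an OR.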
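
As a rough sketch, the check can be wrapped in a helper that returns a boolean. The name hasNonAlnum is made up for illustration. The character class is written without a quantifier, so an all-alphanumeric name produces no match.

```php
<?php
// Hypothetical helper (the name is an assumption).
// Reports whether $userName contains any character outside A-Z, a-z, 0-9.
function hasNonAlnum(string $userName): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/[^A-Za-z0-9]/', $userName) === 1;
}

var_dump(hasNonAlnum('JohnDoe42'));  // bool(false)
var_dump(hasNonAlnum('John Doe'));   // bool(true): the space matches
var_dump(hasNonAlnum('Svéñ'));       // bool(true): accented letters match
```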

Regex Pattern Elements

PCRE regex patterns open and close with a delimiter such as a slash /, and the whole pattern is passed as a quoted string: '/myPattern/'. Some other key features are:

[ brackets contain match sets ]
[a-z] // means match any lowercase letter
This pattern checks the current character in the string against the set inside the brackets, in this case any lowercase letter a to z.

^ Caret (Meta-Character)
[^a-z] // means no lowercase letters
If the caret ^ (aka hat) is the first character inside brackets, it NEGATES the set inside the brackets, so [^A7] means match anything EXCEPT uppercase A and the numeral 7. (Note: when outside brackets, the caret ^ means the start of the string.)

\w \W \d \D \s \S . Meta-Characters (Wildcards)
\w // match any word character
An escaped (i.e. preceded by a backslash \) lowercase w matches any "word" character, i.e. a letter, a digit or the underscore _. It is shorthand for [A-Za-z0-9_]. The uppercase \W is the NOT-word character, equivalent to [^A-Za-z0-9_] or [^\w]. Inside brackets the ranges are simply written one after another; a | there would be a literal pipe character.

.   // (dot) match ANY single character except a newline
\w  // match any word character [A-Za-z0-9_]
\W  // NOT any word character [^A-Za-z0-9_]
\d  // match any digit [0-9]
\D  // NOT any digit [^0-9]
\s  // match any whitespace (tab, space, newline)
\S  // NOT any whitespace
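
A few of these can be tried directly with preg_match(). The snippet below is just an illustration; the variable names and sample strings are invented, and the patterns run without the u modifier.

```php
<?php
// Each call returns 1 if the pattern matches somewhere in the subject, 0 otherwise.
$hasDigit        = preg_match('/\d/', 'abc7');         // 1: "7" is a digit
$hasWhitespace   = preg_match('/\s/', 'no_spaces');    // 0: no whitespace present
$underscoreIsNon = preg_match('/\W/', 'under_score');  // 0: "_" is a word character
$spaceIsNonWord  = preg_match('/\W/', 'two words');    // 1: the space is a non-word character
$dotOnNewline    = preg_match('/./', "\n");            // 0: dot does not match a newline

var_dump($hasDigit, $hasWhitespace, $underscoreIsNon, $spaceIsNonWord, $dotOnNewline);
```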

* + ? | Meta-Characters (Quantifiers and Alternation)
The quantifiers * + ? modify how often the preceding element may repeat; | separates alternatives.

*   // match previous character or [set] zero or more times,
    // so .* means match everything (including nothing) up to the next newline.
+   // match previous one or more times.
?   // match previous zero or one time (i.e. optional).
|   // means logical OR between alternatives, so:
    cat|dog   // means match either "cat" or "dog"
    // Inside brackets | is a literal pipe, so list the ranges side by side:
    [A-CX-Z]  // means match any of A,B,C,X,Y, or Z
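
The difference between these operators is easy to see with a few calls. This is only a sketch with invented sample strings; it also shows why a trailing * makes a pattern match every input, and that a | inside brackets is treated as a literal character.

```php
<?php
$m = [];
$star      = preg_match('/a*/', 'bbb', $m);         // 1: zero "a"s still counts as a match
$starText  = $m[0];                                 // '': the match is the empty string
$plus      = preg_match('/a+/', 'bbb');             // 0: at least one "a" is required
$optionalA = preg_match('/^colou?r$/', 'color');    // 1: the "u" is optional
$optionalB = preg_match('/^colou?r$/', 'colour');   // 1
$either    = preg_match('/^(cat|dog)$/', 'dog');    // 1: alternation outside brackets
$pipeInSet = preg_match('/[a|b]/', '|');            // 1: inside brackets | is a literal pipe

var_dump($star, $starText, $plus, $optionalA, $optionalB, $either, $pipeInSet);
```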

Not shown: capture groups, backreferences, substitution (the real power of regex). See https://www.phpliveregex.com/#tab-preg-match for more, including a live pattern-match playground that is based on the PHP functions and delivers results as arrays.

Back To Your String Cleaning

So for your pattern, to match any character that is not a letter or a digit (the underscore counts as a match too), you need either '/[^A-Za-z0-9]/' or '/[\W_]/'.

Strip Search

If instead you want to STRIP all the non-alphanumeric characters from a string, then use preg_replace( $Regex, $Replacement, $StringToClean ):

<?php
    $username = 'Svéñ déGööfinøff';
    // Remove every run of non-word characters and underscores.
    echo preg_replace('/[\W_]+/', '', $username);
?>

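The output is: SvdGfinff

The accented letters are removed along with the space because, without the u modifier, \W treats only ASCII letters, digits and the underscore as word characters. If you'd prefer to replace certain accented letters with plain Latin ones to keep the names reasonably readable, then I believe you'd need a lookup table (array). There is one ready to use at the PHP site.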
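
A lookup-table approach might look roughly like this. The table below is a deliberately tiny, made-up example covering only the letters in the sample name (it is not the table from the PHP site), and the variable names are assumptions. The source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical, minimal lookup table; a real one would cover many more characters.
$translit = ['é' => 'e', 'ñ' => 'n', 'ö' => 'o', 'ø' => 'o'];

$username = 'Svéñ déGööfinøff';

// strtr() with an array replaces each key with its value in a single pass.
$ascii = strtr($username, $translit);

// Then strip whatever non-alphanumeric characters remain (here, the space).
$clean = preg_replace('/[\W_]+/', '', $ascii);

echo $clean; // SvendeGoofinoff
```

Transliterating first and stripping afterwards keeps the name readable; doing it the other way round would delete the accented letters before they could be mapped.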