Handling Danish special characters

Problem description:

I am trying to parse a string by splitting it on anything that is not a letter or a number:

$parse_query_arguments = preg_split("/[^a-z0-9]+/i", 'København');

and then construct a MySQL query from the pieces. Even if I skip the preg_split and enter the string directly, it is broken into 2 different strings, 'K' and 'benhavn'.

How can I deal with these issues?

If you're using literal ranges like a-z, they won't match accented characters. You might want to use the character classes that are available to do more generic matching:

/[[:alpha:][:digit:]]/
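
As a minimal sketch of the split from the question, the snippet below compares the original ASCII-only pattern with a Unicode-aware one. It swaps in the Unicode property escapes \p{L} (letters) and \p{N} (numbers) together with the u modifier rather than the POSIX classes, and it assumes that both the source file and the input string are UTF-8 encoded. The variable names are illustrative only.

```php
<?php
// Assumption: this file and the input string are UTF-8 encoded.
$input = 'København';

// Original pattern: a-z is an ASCII range, so the two bytes that
// encode "ø" in UTF-8 are treated as a separator.
$ascii = preg_split('/[^a-z0-9]+/i', $input);

// \p{L} matches any Unicode letter, \p{N} any Unicode number.
// The u modifier makes PCRE read pattern and subject as UTF-8.
// PREG_SPLIT_NO_EMPTY drops empty pieces from the result.
$unicode = preg_split('/[^\p{L}\p{N}]+/u', $input, -1, PREG_SPLIT_NO_EMPTY);

print_r($ascii);   // → K, benhavn
print_r($unicode); // → København
```

If the word still arrives split after this change, the string was probably not UTF-8 when it reached preg_split, which is worth checking before touching the pattern again.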

The [:alpha:] set is much broader in scope than a-z. Remember that character matching is based on character codes, so a-z literally means the characters whose codes lie between those of a and z. Characters like ø lie outside this range even if they would fall inside it alphabetically. In PHP's PCRE functions, these classes generally only cover non-ASCII letters such as ø when the pattern also carries the u (UTF-8) modifier.
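
To make the point about character codes concrete, here is a small sketch that prints the code points involved. It assumes a UTF-8 source file and the mbstring extension (mb_ord is available from PHP 7.2).

```php
<?php
// Assumptions: UTF-8 source file, mbstring extension loaded (PHP 7.2+).
foreach (['a', 'z', 'ø'] as $ch) {
    // Print each character with its Unicode code point in hex.
    printf("%s U+%04X\n", $ch, mb_ord($ch, 'UTF-8'));
}
// → a U+0061
// → z U+007A
// → ø U+00F8

// Without the u modifier PCRE works on bytes. In UTF-8, "ø" is the
// two bytes C3 B8, and neither lies in the a-z range (61 to 7A).
echo bin2hex('ø'), "\n"; // → c3b8
```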

Computers work in ASCII-abetical (UNICODEical?) order.