PHP: How to match a range of Unicode paired-surrogate emoticons/emoji?


Question:

anubhava's answer about matching ranges of Unicode characters led me to a regex for stripping a specific range of single-code-point characters. With it, I can now match all the miscellaneous symbols in this list (which includes emoticons) with this simple expression:

preg_replace('/[\x{2600}-\x{26FF}]/u', '', $str);

However, I also want to match the emoji in this list of paired (double) surrogates, but as nhahtdh explained in a comment:

There is a range from d800 to dfff to specify surrogates in UTF-16 to allow for more characters to be specified. A single surrogate is not a valid character in UTF-16 (a pair is necessary to specify a valid character).
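To make the pairing concrete, here is a minimal sketch of the standard UTF-16 arithmetic that turns a high/low surrogate pair into the single code point it encodes. The function name `surrogatePairToCodePoint` is hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
function surrogatePairToCodePoint(int $high, int $low): int
{
    // High surrogates occupy D800-DBFF, low surrogates occupy DC00-DFFF.
    if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
        throw new InvalidArgumentException('Not a valid surrogate pair');
    }
    // Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

printf("U+%X\n", surrogatePairToCodePoint(0xD83D, 0xDE00)); // U+1F600
```

So the pair D83D DE00 is just the UTF-16 spelling of U+1F600, a code point that UTF-8 encodes directly, without surrogates.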

So, for example, when I try this:

preg_replace('/\x{D83D}\x{DE00}/u', '', $str);

For replacing only the first of the paired surrogates on this list, i.e.:


revo's comment above was very helpful in finding a solution:

If your PHP isn't shipped with a PCRE build for UTF-16 then you can't perform such a match. From PHP 7.0 on, you're able to use Unicode code points following this syntax \u{XXXX} e.g. preg_replace("~\u{1F600}~", '', $str); (Mind the double quotes)

Since I am using PHP 7, echo "\u{1F602}"; outputs 😂.
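Building on that, a range of such emoji can be matched by their code points rather than by their surrogate halves. The following is a sketch, not a definitive solution: the function names are hypothetical, and the range U+1F600 to U+1F64F (the Unicode "Emoticons" block) is an assumption chosen for illustration.

```php
<?php
// Hypothetical helper: strip code points U+1F600..U+1F64F from a UTF-8 string.
function stripEmoticons(string $str): string
{
    // With the /u modifier, \x{...} accepts code points above U+FFFF directly,
    // so no surrogate halves are needed. Single quotes keep \x{...} for PCRE.
    return preg_replace('/[\x{1F600}-\x{1F64F}]/u', '', $str) ?? $str;
}

// PHP 7.0+ variant: "\u{...}" inside double quotes is expanded by PHP itself
// into the UTF-8 bytes before PCRE sees the pattern. /u is still required so
// the class is read as a range of characters, not of bytes.
function stripEmoticonsPhp7(string $str): string
{
    return preg_replace("/[\u{1F600}-\u{1F64F}]/u", '', $str) ?? $str;
}

echo stripEmoticons("Hi \u{1F600}there\u{1F602}!"), "\n"; // Hi there!
```

Both variants fall back to the input unchanged if `preg_replace` returns `null`, for example when the subject is not valid UTF-8.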