Regex to remove non-UTF-8 characters but keep new lines

Problem description:

I have a string that contains line feeds and some non-UTF-8 characters. I'm trying to write a regex that removes the non-UTF-8 characters but keeps the line endings.

Below is what I have in PHP:

preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

It strips the non-UTF-8 characters, but it also strips the line endings, and I can't work out how to keep them.

I've tried /[\x00-\x1F\x80-\xFF\^\n]/, but it hasn't worked.
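The following sketch, using an arbitrary sample string, illustrates why the pattern also deletes line endings: LF and CR are bytes 0x0A and 0x0D, so both fall inside \x00-\x1F. It also shows that a caret which is escaped (or not the first character of a character class) is just a literal ^ added to the set, so it cannot negate anything.

```php
<?php
// Sketch: why '/[\x00-\x1F\x80-\xFF]/' also deletes line endings.

// "\n" (LF) is byte 0x0A and "\r" (CR) is byte 0x0D; both lie in \x00-\x1F.
printf("LF = 0x%02X, CR = 0x%02X\n", ord("\n"), ord("\r")); // LF = 0x0A, CR = 0x0D

// Arbitrary sample input: CRLF, a BEL control byte (\x07), then LF.
$input = "line one\r\nline two\x07\n";

// The question's pattern removes the BEL byte but also every CR and LF.
$stripped = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $input);
var_dump($stripped); // string(16) "line oneline two"

// Inside a character class, ^ negates only when it is the first character;
// escaped as \^ (or placed later) it is a literal caret added to the set.
var_dump(preg_match('/[a\^]/', '^')); // int(1): the caret is matched literally
```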

Answer

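The range \x00-\x1F includes the line feed (\x0A) and the carriage return (\x0D), which is why your line endings are stripped. Add a negative lookahead at the start of the pattern; the character class can then never match a newline character: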

preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $string);
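A minimal runnable sketch of the lookahead approach, assuming PHP 7 or later for the type declarations. The function names strip_keep_lf and strip_keep_lf_class and the sample input are hypothetical. The second function is an equivalent alternative that simply leaves 0x0A out of the range instead of using a lookahead.

```php
<?php
// Hypothetical helper: remove control bytes and bytes >= 0x80, but keep "\n".
// (?!\n) fails wherever the next character is a newline, so the character
// class can never consume "\n".
function strip_keep_lf(string $s): string
{
    return preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $s);
}

// Hypothetical equivalent without a lookahead: the control range is split
// so that 0x0A (LF) is simply not part of the set.
function strip_keep_lf_class(string $s): string
{
    return preg_replace('/[\x00-\x09\x0B-\x1F\x80-\xFF]/', '', $s);
}

// Sample input with a BEL (\x07) and an ESC (\x1B) control byte.
$input = "first\x07 line\nsecond\x1B line\n";

// json_encode makes the surviving newlines visible as \n.
echo json_encode(strip_keep_lf($input)), "\n";       // "first line\nsecond line\n"
echo json_encode(strip_keep_lf_class($input)), "\n"; // "first line\nsecond line\n"
```

Note that \r is still removed by this pattern, so Windows \r\n line endings collapse to a bare \n.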

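To keep carriage returns as well (so Windows \r\n line endings survive), exclude both characters in the lookahead:

preg_replace('/(?![\r\n])[\x00-\x1F\x80-\xFF]/', '', $string);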
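A similar sketch for the [\r\n] variant; the function name strip_keep_crlf and the sample input are hypothetical. It also illustrates a caveat: without the u modifier, [\x80-\xFF] matches individual bytes, so every non-ASCII byte is removed, including both bytes of a valid UTF-8 character such as é (\xC3\xA9), not only invalid sequences. What remains is 7-bit ASCII with the C0 control characters other than CR and LF removed.

```php
<?php
// Hypothetical helper: like the LF-only version, but keeps CR and LF,
// so "\r\n" line endings survive intact.
function strip_keep_crlf(string $s): string
{
    return preg_replace('/(?![\r\n])[\x00-\x1F\x80-\xFF]/', '', $s);
}

// Sample input: valid UTF-8 "é" (\xC3\xA9), a Latin-1 "ï" (\xEF, invalid
// as UTF-8 here), a NUL byte, and CRLF line endings.
$input = "caf\xC3\xA9\r\nna\xEFve\x00\r\n";

// Both the valid and the invalid non-ASCII bytes are removed; CRLF is kept.
echo json_encode(strip_keep_crlf($input)), "\n"; // "caf\r\nnave\r\n"
```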
