Regex to remove non-UTF-8 characters but keep new lines

Problem description:

I have a string that contains line feeds and some non-UTF-8 characters. I'm trying to write a regex that strips the non-UTF-8 characters but keeps the line endings.

Below is what I have in PHP:

preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

It strips the non-UTF-8 characters, but it also strips the line endings, and I can't figure out how to keep them.

I've tried /[\x00-\x1F\x80-\xFF\^ ]/ but it hasn't worked.

Add a negative lookahead at the start of the pattern. The lookahead rejects any position where the next character is a newline, so the character class no longer matches it:

preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $string);

or

preg_replace('/(?![\n])[\x00-\x1F\x80-\xFF]/', '', $string);
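To see the lookahead in action, here is a minimal sketch run against a made-up sample string (the input and the variable names `$viaLookahead` and `$viaRanges` are assumptions for illustration, not part of the original question). It also shows an alternative that should behave the same way: splitting the control-character range so that `\x0A` (line feed) is simply left out. Note that both patterns remove every byte from `\x80` upward, so valid multi-byte UTF-8 sequences are removed as well.

```php
<?php
// Hypothetical sample input: a control byte, line feeds, a carriage
// return, and two high bytes that are not valid UTF-8 on their own.
$string = "abc\x01\ndef\xFF\x80ghi\r\n";

// Negative lookahead: skip any position where the next byte is a line feed.
$viaLookahead = preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $string);

// Alternative without a lookahead: split the control range around \x0A.
$viaRanges = preg_replace('/[\x00-\x09\x0B-\x1F\x80-\xFF]/', '', $string);

var_dump($viaLookahead === "abc\ndefghi\n"); // bool(true)
var_dump($viaLookahead === $viaRanges);      // bool(true)
```

The carriage return (`\x0D`) is still stripped here. If Windows-style `\r\n` endings need to survive intact, the lookahead's character class can be widened to `[\r\n]`.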