Regex to remove non-UTF-8 characters but keep newlines
Problem description:
I have a string which contains a new line feed and some non-utf8 characters. I'm trying to write some regex that will replace non utf-8 characters but it should keep the line endings.
Below is what I have in PHP:
preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
It's stripping the non utf-8 characters but it's also stripping the new line endings and I can't find out how to do this.
I've tried /[\x00-\x1F\x80-\xFF\^\n]/ but it hasn't worked.
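Answer
Add a negative lookahead at the start of the pattern. The character class is then only tried at positions where the next character is not a newline, so newlines are left in place.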
preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $string);
or
preg_replace('/(?![\r\n])[\x00-\x1F\x80-\xFF]/', '', $string);
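
To check the lookahead approach, here is a minimal sketch. It assumes the characters to preserve are \r and \n; the sample string and the variable names $viaLookahead and $viaRanges are made up for illustration. It also shows an alternative that should behave the same way: leaving \x0A and \x0D out of the ranges instead of using a lookahead.

```php
<?php
// Hypothetical sample input: a high byte (\xE9), a control byte (\x01),
// and both LF and CRLF line endings that should survive.
$string = "caf\xE9\x01 one\nline two\r\nline three";

// Negative lookahead: skip any position where the next byte is \r or \n,
// then remove the remaining control bytes and bytes >= 0x80.
$viaLookahead = preg_replace('/(?![\r\n])[\x00-\x1F\x80-\xFF]/', '', $string);

// Alternative without a lookahead: exclude \x0A (LF) and \x0D (CR)
// from the ranges themselves.
$viaRanges = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]/', '', $string);

var_dump($viaLookahead === "caf one\nline two\r\nline three");
var_dump($viaLookahead === $viaRanges);
```

Note that both patterns still remove tabs (\x09), and that \x80-\xFF removes every non-ASCII byte, including the bytes of valid multi-byte UTF-8 characters.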