去除换行符的正则表达式
我是 Python 的完全新手,我遇到了正则表达式问题.我试图删除文本文件中每一行末尾的换行符,但前提是它跟在小写字母之后,即 [a-z]
.如果行尾以小写字母结尾,我想用空格替换换行符/换行符.
I am a complete newbie to Python, and I'm stuck with a regex problem. I'm trying to remove the line break character at the end of each line in a text file, but only if it follows a lowercase letter, i.e. [a-z]
. If the end of the line ends in a lower case letter, I want to replace the line break/newline character with a space.
这是我目前得到的:
import re
import sys
textout = open("output.txt","w")
textblock = open(sys.argv[1]).read()
textout.write(re.sub("[a-z]\z","[a-z] ", textblock, re.MULTILINE) )
textout.close()
尝试
re.sub(r"(?<=[a-z])\r?\n"," ", textblock)
\Z
只匹配字符串的末尾,在最后一个换行符之后,所以它绝对不是你在这里需要的.\z
不被 Python 正则表达式引擎识别.
\Z
only matches at the end of the string, after the last linebreak, so it's definitely not what you need here. \z
is not recognized by the Python regex engine.
(?<=[az])
是一个 正向后视断言,用于检查当前位置之前的字符是否为小写 ASCII 字符.只有这样,正则表达式引擎才会尝试匹配换行符.
(?<=[a-z])
is a positive lookbehind assertion that checks if the character before the current position is a lowercase ASCII character. Only then the regex engine will try to match a line break.
此外,始终使用带有正则表达式的原始字符串.使反斜杠更容易处理.
Also, always use raw strings with regexes. Makes backslashes easier to handle.