Python:替换为正则表达式
问题描述:
我需要替换字符串的一部分.我正在浏览 Python 文档并找到了 re.sub.
I need to replace part of a string. I was looking through the Python documentation and found re.sub.
import re
s = '<textarea id="Foo"></textarea>'
output = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)
print output
>>>'Bar'
我原以为这会打印 '
而不是 'bar'.
I was expecting this to print '<textarea id="Foo">Bar</textarea>'
and not 'bar'.
谁能告诉我我做错了什么?
Could anybody tell me what I did wrong?
答
您可以捕获要保留的部分,而不是捕获要替换的部分,然后然后使用引用 \1
引用它们以将它们包含在替换字符串中.
Instead of capturing the part you want to replace you can capture the parts you want to keep and then refer to them using a reference \1
to include them in the substituted string.
试试这个:
output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)
此外,假设这是 HTML,您应该考虑为此任务使用 HTML 解析器,例如 Beautiful Soup.
Also, assuming this is HTML you should consider using an HTML parser for this task, for example Beautiful Soup.