[D] How to write a regular expression in Python to extract multiple segments of a string

How should I write a Python regular expression to extract several segments from a string like this:
<html>
<head>

</head>

<body>.....

<li>......</li>
<h2>
content I need
</h2>
<p>
content I need
</p>
<h3>
content I need
</h3>

......

</body>
</html>
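How should I write the regular expression to extract the content I need?
Any pointers would be greatly appreciated.
What I had before, m=re.findall(r'(?<=<p>).+?(?=</p>)',ss,re.S), only extracts the content between <p> and </p>.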

------Solution--------------------
m = re.findall(r'(?<=<p>).+?(?=</p>)|(?<=<h2>).+?(?=</h2>)|(?<=<h3>).+?(?=</h3>)', ss, re.S)
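Two details seem to explain why this reply spells out three separate lookbehinds, and why the flag matters. Python's re module only accepts fixed-width lookbehind, so a single (?<=<(?:h2|h3|p)>) is rejected, because <p> is shorter than <h2>. And since each piece of content sits on its own lines, . has to be allowed to match newlines with re.S, otherwise nothing matches. A minimal sketch of both points, using a made-up sample string ss shaped like the question's HTML:

```python
import re

# Hypothetical sample shaped like the question's HTML.
ss = """<body>
<li>......</li>
<h2>
title
</h2>
<p>
paragraph
</p>
<h3>
subtitle
</h3>
</body>"""

# One lookbehind covering tags of different lengths is not allowed:
# re only supports fixed-width lookbehind, so compiling this raises re.error.
try:
    re.compile(r'(?<=<(?:h2|h3|p)>).+?(?=</(?:h2|h3|p)>)')
except re.error as exc:
    print('rejected:', exc)

# One fixed-width lookbehind per tag compiles fine; re.S lets . match newlines.
pattern = r'(?<=<p>).+?(?=</p>)|(?<=<h2>).+?(?=</h2>)|(?<=<h3>).+?(?=</h3>)'
m = re.findall(pattern, ss, re.S)

# texts holds the matches in document order, stripped of surrounding newlines.
texts = [t.strip() for t in m]
print(texts)
```

Lookaheads have no such width restriction, which is why only the lookbehind part has to be repeated per tag.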
------Solution--------------------

Python code

m = re.findall(r'<(h2|h3|p)>(.+?)</\1>', ss, re.S)
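In this pattern \1 is a backreference to whatever the first group captured, so an element opened with <h2> can only be closed by </h2>. Because the pattern has two groups, re.findall returns (tag, content) tuples rather than plain strings, and re.S is again needed for the multi-line content. A small sketch, assuming a made-up sample string:

```python
import re

# Hypothetical sample shaped like the question's HTML.
ss = """<li>......</li>
<h2>
title
</h2>
<p>
paragraph
</p>
<h3>
subtitle
</h3>"""

# \1 is a backreference: the closing tag must repeat the opening tag name.
# re.S lets (.+?) run across the newlines inside each element.
pairs = re.findall(r'<(h2|h3|p)>(.+?)</\1>', ss, re.S)

# findall returns one (tag, content) tuple per element, in document order.
for tag, content in pairs:
    print(tag, '->', content.strip())
```

Keeping the tag in the result is handy when the h2, p and h3 text must be treated differently afterwards.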

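------Solution--------------------
I was reading the standard library docs a few days ago and happened to come across named groups in regular expressions.
Just to chip in: you can use (?P<name>...) to give each pattern you want its own name. After search() finds a match, the match object's groupdict() returns a dict
whose keys are the names you chose and whose values are the matched text. When there are several patterns, this is fairly friendly to the programmer.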

Python code


import re

s = '''<html>
<head>

</head>

<body>.....

<li>......</li>
<h2>
content I need h2
</h2>
<p>
content I need p
</p>
<h3>
content I need h3
</h3>'''

# (?P<name>...) names each group; re.S lets .*? cross the newlines.
res = r'<h2>(?P<H2>.*?)</h2>.*?<p>(?P<P>.*?)</p>.*?<h3>(?P<H3>.*?)</h3>'
target = re.compile(res, re.S)
match = target.search(s)

if match:
    # groupdict() maps each group name to the text it captured
    for name, value in match.groupdict().items():
        print(name, ':', value.strip())
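The named-group pattern in this reply expects h2, p and h3 to appear exactly once each and in that order. If the order can vary, one option (an alternative combining this reply with the backreference one, not what either reply wrote) is to put the names inside the backreference pattern and loop with re.finditer; (?P=tag) is the named form of a backreference. A sketch under those assumptions, with a made-up sample whose tags are reordered:

```python
import re

# Hypothetical sample: the tags appear in a different order than in the question.
s = """<h3>
subtitle
</h3>
<p>
first paragraph
</p>
<h2>
title
</h2>
<p>
second paragraph
</p>"""

# (?P<tag>...) names the tag group; (?P=tag) requires the same name in the closing tag.
pattern = re.compile(r'<(?P<tag>h2|h3|p)>(?P<content>.*?)</(?P=tag)>', re.S)

results = []
for match in pattern.finditer(s):
    # groupdict() gives {'tag': ..., 'content': ...} for each element
    d = match.groupdict()
    results.append((d['tag'], d['content'].strip()))

for tag, content in results:
    print(tag, ':', content)
```

Each element yields its own dict, so repeated tags such as the two p elements are all kept.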
