Regex: extracting specific strings from a web page
I want to extract the following from the page source below:
['http://down.qnwz.cn/uploads/media/broadcast/storymagazine/陷阱.mp3', 'http://down.qnwz.cn/uploads/media/broadcast/storymagazine/视界决定行为.mp3', 'http://down.qnwz.cn/uploads/media/broadcast/storymagazine/苦难人生中的那朵草莓.mp3', 'http://down.qnwz.cn/uploads/media/broadcast/storymagazine/当群众演员的日子.mp3']
The regex I wrote is:
import re
pattern = re.compile(r"""embed src="(http.+?mp3)" """, re.I | re.M)
pattern.findall(text)
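For comparison, a self-contained sketch of the same idea in Python 3. Assumptions: `html` is a made-up, shortened excerpt of the page (not the full source), and the pattern is a slight variant of the one above. `[^"]+?` keeps the capture inside the attribute value. The trailing space before the closing `"""` is dropped, so the match no longer depends on a space following the quote. `re.M` is left out because the pattern has no `^` or `$` for it to affect.

```python
import re

# Hypothetical, shortened excerpt of the page: one <img> and two <embed> tags.
html = '''<p>
<img src="http://down.qnwz.cn/uploads/allimg/120920/107864-12092015541E24.jpg" alt="" />
</p>
<p>
<embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/陷阱.mp3" type="video/x-ms-asf-plugin" />
</p>
<p>
<EMBED SRC="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/视界决定行为.mp3" type="video/x-ms-asf-plugin" />
</p>'''

# Assumed variant of the question's pattern: anchor on the embed tag's src
# attribute; [^"]+? cannot run past the closing quote, and re.I also accepts
# upper-case tag/attribute names.
pattern = re.compile(r'<embed\s+src="(http[^"]+?\.mp3)"', re.I)

urls = pattern.findall(html)
for url in urls:
    print(url)
```

Run as-is, it prints the two `.mp3` URLs. The upper-case `<EMBED SRC=...>` tag is matched only because of `re.I`, and the `<img>` URL is skipped because the pattern requires `<embed`.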
Is there a better way to write this?
<p style="text-indent:5.25pt;">
<span><b><span style="font-family:宋体;color:#17365d;">请点击播放按钮在</span><span style="color:#17365d;">IE</span></b></span><span><span><b><span style="font-family:宋体;color:#17365d;">浏览器下收听……</span><span style="color:#17365d;"></span></b></span></span>
</p>
<p>
<img src="http://down.qnwz.cn/uploads/allimg/120920/107864-12092015541E24.jpg" alt="" />
</p>
<p>
<embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/陷阱.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
</p>
<p>
<img src="http://down.qnwz.cn/uploads/allimg/120920/107864-1209201556353P.jpg" alt="" />
</p>
<p>
<br />
</p>
<p>
<embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/视界决定行为.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
</p>
<p>
<img src="http://down.qnwz.cn/uploads/allimg/120920/107864-120920155J2V6.jpg" alt="" />
</p>
<p>
<embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/苦难人生中的那朵草莓.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
</p>
<p>
<br />
</p>
<p>
<img src="http://down.qnwz.cn/uploads/allimg/120920/107864-120920155R53Y.jpg" alt="" />
</p>
<p>
<embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/当群众演员的日子.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
</p>
<p>
<br />
</p>
<p>
<span style="font-family:宋体;">———————————————</span>
</p>
<p>
------Solution--------------------
Wouldn't the simplest be r'http.*?\.mp3'?
Test:
>>> import re
>>> s = '''<p style="text-indent:5.25pt;">
... <span><b><span style="font-family:宋体;color:#17365d;">请点击播放按钮在</span><span style="color:#17365d;">IE</span></b></span><span><span><b><span style="font-family:宋体;color:#17365d;">浏览器下收听……</span><span style="color:#17365d;"></span></b></span></span>
... </p>
... <p>
... <img src="http://down.qnwz.cn/uploads/allimg/120920/107864-12092015541E24.jpg" alt="" />
... </p>
... <p>
... <embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/陷阱.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
... </p>
... <p>
... <img src="http://down.qnwz.cn/uploads/allimg/120920/107864-1209201556353P.jpg" alt="" />
... </p>
... <p>
... <br />
... </p>
... <p>
... <embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/视界决定行为.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
... </p>
... <p>
... <img src="http://down.qnwz.cn/uploads/allimg/120920/107864-120920155J2V6.jpg" alt="" />
... </p>
... <p>
... <embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/苦难人生中的那朵草莓.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
... </p>
... <p>
... <br />
... </p>
... <p>
... <img src="http://down.qnwz.cn/uploads/allimg/120920/107864-120920155R53Y.jpg" alt="" />
... </p>
... <p>
... <embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/当群众演员的日子.mp3" type="video/x-ms-asf-plugin" width="300" height="60" autostart="false" loop="true" />
... </p>
... <p>
... <br />
... </p>
... <p>
... <span style="font-family:宋体;">———————————————</span>
... </p>
... <p>'''
>>> res = r'http.*?\.mp3'
>>> m = re.findall(res, s)
>>> len(m)
4
>>> for c in m: print(c)
...
http://down.qnwz.cn/uploads/media/broadcast/storymagazine/陷阱.mp3
http://down.qnwz.cn/uploads/media/broadcast/storymagazine/视界决定行为.mp3
http://down.qnwz.cn/uploads/media/broadcast/storymagazine/苦难人生中的那朵草莓.mp3
http://down.qnwz.cn/uploads/media/broadcast/storymagazine/当群众演员的日子.mp3
>>>
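One caveat about `r'http.*?\.mp3'`: the session above works because `s` keeps its line breaks and `.` does not match `\n`. A match attempt that starts at an image URL therefore fails at the end of that line. If the same markup were on a single line, the lazy `.*?` would start at the image URL and run on to the next `.mp3`. A minimal sketch with a made-up one-line excerpt; the tighter `http[^"\s]*?\.mp3` is an assumed variant, not part of the original answer:

```python
import re

# Hypothetical one-line excerpt: an <img> URL appears before the first .mp3 link.
one_line = ('<img src="http://down.qnwz.cn/uploads/allimg/120920/107864-12092015541E24.jpg" alt="" /> '
            '<embed src="http://down.qnwz.cn/uploads/media/broadcast/storymagazine/陷阱.mp3" />')

# Loose pattern from the answer: with no newline to stop it, the match starts
# at the image URL and swallows the markup up to the first ".mp3".
loose = re.findall(r'http.*?\.mp3', one_line)

# Assumed tighter variant (not from the original answer): forbid quotes and
# whitespace inside the URL, so a match cannot cross an attribute boundary.
tight = re.findall(r'http[^"\s]*?\.mp3', one_line)

print(loose)
print(tight)
```

Here `loose` holds a single string running from the `.jpg` URL through the `<embed` markup, while `tight` holds only the `陷阱.mp3` URL. Anchoring on `embed src="`, as the question's pattern does, avoids the problem as well.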