Python: get image link from HTML
Question:
From an HTML/RSS snippet like this:
[...]<div class="..." style="..."></div><p><a href="..."
<img alt="" height="" src="http://link.to/image"
width="" /></a><span style="">[...]
I want to get the image src link "http://link.to/image.jpg". How can I do this in Python? Thanks.
Answer:
lxml is the tool for the job.
Scraping all the image links from a webpage is as simple as this:
import lxml.html

# Parse the page, then collect the src attribute of every <img> element
tree = lxml.html.parse("http://example.com")
images = tree.xpath("//img/@src")
print(images)
This gives:
['/_img/iana-logo-pageheader.png', '/_img/icann-logo-micro.png']
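The same XPath should also work on an HTML string you already have, such as the fragment in the question, rather than a URL. Here is a minimal sketch, assuming lxml is installed. The `fragment` value is a made-up stand-in for the elided snippet, and `image_sources` is a hypothetical helper name.

```python
import lxml.html

# Hypothetical fragment modelled on the snippet in the question;
# the class, style and href values are placeholders.
fragment = (
    '<div class="c" style="s"></div>'
    '<p><a href="http://link.to/page">'
    '<img alt="" height="" src="http://link.to/image.jpg" width="" />'
    '</a><span style="">text</span></p>'
)


def image_sources(html_text):
    """Return the src attribute of every <img> found in an HTML string."""
    root = lxml.html.fromstring(html_text)
    # XPath attribute results are str subclasses; convert to plain str.
    return [str(src) for src in root.xpath("//img/@src")]


print(image_sources(fragment))  # ['http://link.to/image.jpg']
```

If you only need the first image, index the returned list after checking that it is not empty.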
If it were an RSS feed, you'd want to parse it with lxml.etree.
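For the RSS case, one possible approach is sketched below. It assumes the feed keeps each item's HTML as escaped text inside `<description>`, which is common but not universal. The feed content and the `feed_image_sources` helper are invented for illustration.

```python
import lxml.etree
import lxml.html

# Hypothetical RSS 2.0 feed whose <description> holds escaped HTML.
feed = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <description>&lt;p&gt;&lt;a href="http://link.to/page"&gt;&lt;img alt="" src="http://link.to/image.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def feed_image_sources(feed_bytes):
    """Collect <img> src values from the HTML inside each item's description."""
    root = lxml.etree.fromstring(feed_bytes)
    sources = []
    for item in root.iter("item"):
        description = item.findtext("description") or ""
        if not description.strip():
            continue  # lxml.html.fromstring rejects empty input
        html_root = lxml.html.fromstring(description)
        sources.extend(str(src) for src in html_root.xpath("//img/@src"))
    return sources


print(feed_image_sources(feed))  # ['http://link.to/image.jpg']
```

The feed is passed as bytes because lxml.etree refuses a str that carries an XML encoding declaration. Feeds that put images in `<enclosure>` or namespaced media elements would need a different XPath.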