吴裕雄 -- 天生自然 Python Study Notes: Writing a Web Crawler to Download Images from a Specified Website

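We often search for and download images online, but downloading them one at a time is tedious. This example uses a web crawler to download all the images on a site's page in one pass and save them.
Downloading and saving a site's images
Download every .jpg and .png image from the specified site and save them in a newly created local images folder.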
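The "select only .jpg and .png" step can be sketched on its own as follows. This is a minimal illustration, not the script's own method: the helper name `image_filename` and the constant `IMAGE_EXTENSIONS` are hypothetical. It checks the extension of the URL path rather than searching the whole link for a substring, so a query string such as `?v=2` does not affect the result.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical constant: the extensions this example wants to keep
IMAGE_EXTENSIONS = {'.jpg', '.png'}


def image_filename(link):
    """Return the file name if link points to a .jpg/.png file, else None."""
    path = urlparse(link).path              # drops any ?query or #fragment
    filename = posixpath.basename(path)     # URL paths always use '/'
    ext = posixpath.splitext(filename)[1].lower()
    return filename if ext in IMAGE_EXTENSIONS else None


print(image_filename('https://www.tooopen.com/static/image/tooopen-2w.png'))
print(image_filename('http://www.tooopen.com/img/87.aspx'))
```

Comparing the parsed extension is stricter than a substring test: a link that merely contains `.jpg` somewhere in its text is not treated as an image.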


import os
from urllib.request import urlopen

import requests
from bs4 import BeautifulSoup

url = 'http://www.tooopen.com/img/87.aspx'

html = requests.get(url)
html.encoding = "utf-8"

sp = BeautifulSoup(html.text, 'html.parser')
# Create the images directory that will hold the downloads
# (backslashes are doubled; "E:\images\" would escape the closing quote)
images_dir = "E:\\images\\"
if not os.path.exists(images_dir):
    os.mkdir(images_dir)
# Collect every <a> and <img> tag
all_links = sp.find_all(['a', 'img'])

for link in all_links:
    # Read the src and href attribute values
    src = link.get('src')
    href = link.get('href')
    attrs = [src, href]
    for attr in attrs:
        # Keep only .jpg and .png files
        if attr is not None and ('.jpg' in attr or '.png' in attr):
            # Full path of the image file
            full_path = attr
            filename = full_path.split('/')[-1]  # image file name
            ext = filename.split('.')[-1]  # extension
            filename = filename.split('.')[-2]  # base name without extension
            if 'jpg' in ext:
                filename = filename + '.jpg'
            else:
                filename = filename + '.png'
            print(attr)
            # Save the image
            try:
                image = urlopen(full_path)
                with open(os.path.join(images_dir, filename), 'wb') as f:
                    f.write(image.read())
            except Exception:
                print("{} could not be read!".format(filename))
print("Finished downloading the images on the current page")
Output from a run in which attrs was [src, src], so each image URL is listed twice:
/static/image/logo.png
logo.png could not be read!
/static/image/logo.png
logo.png could not be read!
https://www.tooopen.com/static/ad/1500X50-viw.png
https://www.tooopen.com/static/ad/1500X50-viw.png
https://www.tooopen.com/static/ad/1500X50-too.png
https://www.tooopen.com/static/ad/1500X50-too.png
http://img08.tooopen.com/20190807/tooopen_wk_131356135671827.jpg
tooopen_wk_131356135671827.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131356135671827.jpg
tooopen_wk_131356135671827.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131356135689691.jpg
tooopen_wk_131356135689691.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131356135689691.jpg
tooopen_wk_131356135689691.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131355135547978.jpg
tooopen_wk_131355135547978.jpg could not be read!
http://img08.tooopen.com/20190807/tooopen_wk_131355135547978.jpg
tooopen_wk_131355135547978.jpg could not be read!
http://img08.tooopen.com/20191204/tooopen_sl_135027502737700.jpg
tooopen_sl_135027502737700.jpg could not be read!
http://img08.tooopen.com/20191204/tooopen_sl_135027502737700.jpg
tooopen_sl_135027502737700.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102334233455130.jpg
tooopen_sl_102334233455130.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102334233455130.jpg
tooopen_sl_102334233455130.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_095522552259405.jpg
tooopen_sl_095522552259405.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_095522552259405.jpg
tooopen_sl_095522552259405.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093830383053575.jpg
tooopen_sl_093830383053575.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093830383053575.jpg
tooopen_sl_093830383053575.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093534353474034.jpg
tooopen_sl_093534353474034.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093534353474034.jpg
tooopen_sl_093534353474034.jpg could not be read!
http://img08.tooopen.com/20191205/tooopen_sl_134926492663201.jpg
tooopen_sl_134926492663201.jpg could not be read!
http://img08.tooopen.com/20191205/tooopen_sl_134926492663201.jpg
tooopen_sl_134926492663201.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102328232897349.jpg
tooopen_sl_102328232897349.jpg could not be read!
http://img08.tooopen.com/20191122/tooopen_sl_102328232897349.jpg
tooopen_sl_102328232897349.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_162428242838278.jpg
tooopen_sl_162428242838278.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_162428242838278.jpg
tooopen_sl_162428242838278.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093827382762634.jpg
tooopen_sl_093827382762634.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093827382762634.jpg
tooopen_sl_093827382762634.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093529352941470.jpg
tooopen_sl_093529352941470.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093529352941470.jpg
tooopen_sl_093529352941470.jpg could not be read!
http://img08.tooopen.com/20191006/tooopen_sl_09550855847444.jpg
tooopen_sl_09550855847444.jpg could not be read!
http://img08.tooopen.com/20191006/tooopen_sl_09550855847444.jpg
tooopen_sl_09550855847444.jpg could not be read!
http://img08.tooopen.com/20191119/tooopen_sl_115948594813304.jpg
tooopen_sl_115948594813304.jpg could not be read!
http://img08.tooopen.com/20191119/tooopen_sl_115948594813304.jpg
tooopen_sl_115948594813304.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_16270727715545.jpg
tooopen_sl_16270727715545.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_16270727715545.jpg
tooopen_sl_16270727715545.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093822382227436.jpg
tooopen_sl_093822382227436.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093822382227436.jpg
tooopen_sl_093822382227436.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092538253863445.jpg
tooopen_sl_092538253863445.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092538253863445.jpg
tooopen_sl_092538253863445.jpg could not be read!
http://img08.tooopen.com/20190924/tooopen_sl_095323532347706.jpg
tooopen_sl_095323532347706.jpg could not be read!
http://img08.tooopen.com/20190924/tooopen_sl_095323532347706.jpg
tooopen_sl_095323532347706.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_101744174437980.jpg
tooopen_sl_101744174437980.jpg could not be read!
http://img08.tooopen.com/20191121/tooopen_sl_101744174437980.jpg
tooopen_sl_101744174437980.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_094151415155508.jpg
tooopen_sl_094151415155508.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_094151415155508.jpg
tooopen_sl_094151415155508.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093819381985689.jpg
tooopen_sl_093819381985689.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_093819381985689.jpg
tooopen_sl_093819381985689.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092534253435074.jpg
tooopen_sl_092534253435074.jpg could not be read!
http://img08.tooopen.com/20191115/tooopen_sl_092534253435074.jpg
tooopen_sl_092534253435074.jpg could not be read!
https://www.tooopen.com/static/image/tooopen-2w.png
https://www.tooopen.com/static/image/tooopen-2w.png
Finished downloading the images on the current page
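
In the output above, /static/image/logo.png fails because urlopen only accepts absolute URLs, and a site-relative path has no scheme or host. The sketch below shows one possible way to handle this: resolve each link against the page URL with urljoin and skip links already seen. The cause of the img08.tooopen.com failures is not visible in the output; one possibility, assumed here and not verified, is that the server rejects requests lacking browser-like headers, so the sketch also sends User-Agent and Referer headers. The names `PAGE_URL`, `HEADERS`, `absolute_urls` and `download`, and the header values, are hypothetical.

```python
import os
from urllib.parse import urljoin
from urllib.request import Request, urlopen

PAGE_URL = 'http://www.tooopen.com/img/87.aspx'
# Assumed header values; the site may need different ones, or none at all
HEADERS = {'User-Agent': 'Mozilla/5.0', 'Referer': PAGE_URL}


def absolute_urls(page_url, links):
    """Resolve links against page_url and drop duplicates, keeping order."""
    seen = set()
    result = []
    for link in links:
        full = urljoin(page_url, link)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


def download(full_url, images_dir, timeout=10):
    """Save one image into images_dir; return True on success."""
    req = Request(full_url, headers=HEADERS)
    try:
        with urlopen(req, timeout=timeout) as resp:
            data = resp.read()
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        print("{} could not be read: {}".format(full_url, exc))
        return False
    filename = full_url.split('/')[-1]
    with open(os.path.join(images_dir, filename), 'wb') as f:
        f.write(data)
    return True


links = ['/static/image/logo.png', '/static/image/logo.png',
         'https://www.tooopen.com/static/image/tooopen-2w.png']
print(absolute_urls(PAGE_URL, links))
```

Printing the exception alongside the URL shows why a download failed (for example an HTTP status code), which the bare "could not be read" message hides. Calling `download(u, images_dir)` for each resolved URL would replace the urlopen block in the script.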
