HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 404: Not Found

Problem description:


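from urllib import parse, request
import os

import requests

# Example: decoding one of the URL-encoded image fields returned by the API
# result = parse.unquote("https%3A%2F%2Fshp%2Eqpic%2Ecn%2Fishow%2F2735111614%2F1637043312%5F84828260%5F13173%5FsProdImgNo%5F8%2Ejpg%2F200")
# print(result)

# Wallpaper list endpoint; sDataType=JSON asks it to return JSON
url = "https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page=0&iOrder=0&iSortNumClose=1&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1637242615737"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
    "Referer": "https://pvp.qq.com/"
}


def exact_url(data):
    """Return the URLs of the 8 images (sProdImgNo_1 .. sProdImgNo_8) of one wallpaper entry."""
    url_list = []
    for i in range(1, 9):
        image_url = parse.unquote(data['sProdImgNo_%d' % i])
        # Swap only the trailing "/200" size segment for "/0"; a plain
        # .replace('200', '0') would also rewrite any "200" elsewhere in the URL.
        if image_url.endswith('/200'):
            image_url = image_url[:-len('200')] + '0'
        url_list.append(image_url)
    return url_list


def main():
    result = requests.get(url, headers=headers).json()
    datas = result['List']
    for i, data in enumerate(datas):
        url_list = exact_url(data)
        name = parse.unquote(data["sProdName"])
        # "六周年庆壁纸" (6th-anniversary wallpaper) presumably appears under several
        # entries, so a number is appended to keep the folder names distinct
        if name == "六周年庆壁纸":
            name = name + ("%d" % (i - 3))
        dirpath = os.path.join("images", name)
        # makedirs also creates "images" and does not fail if the folder already exists
        os.makedirs(dirpath, exist_ok=True)
        for index, image_url in enumerate(url_list):
            request.urlretrieve(image_url, os.path.join(dirpath, "%d.jpg" % (index + 1)))
            print("%s downloaded!" % image_url)


if __name__ == "__main__":
    main()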

I tested it myself and the code is basically fine. Catch the exception yourself and check which URL returns the 404:


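        for index, image_url in enumerate(url_list):
            try:
                request.urlretrieve(image_url, os.path.join(dirpath, "%d.jpg" % (index + 1)))
                print("%s downloaded!" % image_url)
            except Exception as e:
                # Print which entry/URL failed and why (e.g. "HTTP Error 404: Not Found")
                print(name, image_url, e)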
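A slightly more targeted sketch of the same idea: catch `urllib.error.HTTPError` (raised by `urlretrieve` for 4xx/5xx responses) separately from other network errors and collect the failing URLs with their status codes, so the loop keeps going and you get a list of exactly which links returned 404. The function name `download_all` and its return format are hypothetical, not part of the original code.

```python
import os
import urllib.error
from urllib import request


def download_all(url_list, dirpath):
    """Download each URL into dirpath as 1.jpg, 2.jpg, ...

    Returns a list of (url, status_code_or_reason) for every URL that failed,
    so the caller can see which links returned 404 (or another error).
    """
    os.makedirs(dirpath, exist_ok=True)
    failures = []
    for index, image_url in enumerate(url_list):
        target = os.path.join(dirpath, "%d.jpg" % (index + 1))
        try:
            request.urlretrieve(image_url, target)
            print("%s downloaded!" % image_url)
        except urllib.error.HTTPError as e:
            # The server answered, but with a 4xx/5xx status (e.g. 404 Not Found).
            # HTTPError is a subclass of URLError, so it must be caught first.
            failures.append((image_url, e.code))
        except urllib.error.URLError as e:
            # No HTTP response at all: DNS failure, refused connection, etc.
            failures.append((image_url, str(e.reason)))
    return failures
```

Inside `main()` you could call `failures = download_all(url_list, dirpath)` and print `failures` for each entry instead of aborting on the first 404.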

Add the exception handling yourself.

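The web is full of dead links (404), access-restricted resources (403), server-side errors (500) and so on. When scraping, only process responses with a 2xx status and treat everything else as an error.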
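A minimal sketch of the "only process 2xx" rule using only the standard library (the question already uses `urllib.request`). The helper name `fetch_if_2xx` and its `(status, body)` return shape are hypothetical choices for illustration: the body comes back only for 2xx responses, and any other status is reported instead of being saved as an image.

```python
import urllib.error
from urllib import request


def fetch_if_2xx(url, timeout=10):
    """Return (status, body); body is the response bytes only for 2xx.

    For any other status (403, 404, 500, ...) body is None; for a network
    error with no HTTP response, both values are None.
    """
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            status = resp.status  # final status after urllib follows redirects
            if 200 <= status < 300:
                return status, resp.read()
            return status, None
    except urllib.error.HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the status code
        return e.code, None
    except urllib.error.URLError:
        # DNS failure, refused connection, ...: no status code available
        return None, None
```

Typical use in the download loop: call `status, body = fetch_if_2xx(image_url)`, write `body` to the file only when it is not `None`, and otherwise log `image_url` together with `status`.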

The requested URL is wrong, so it cannot be accessed (that is what 404 Not Found means).
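One possible way the address ends up wrong in the question's original code (a guess, not a confirmed diagnosis): `.replace('200','0')` rewrites every occurrence of "200" in the decoded URL, not just the trailing "/200" size segment. The URL below is hypothetical, built from the commented-out example in the question but with "200" placed inside the numeric part; `to_full_size` is likewise a hypothetical helper that only touches the suffix.

```python
def to_full_size(thumb_url):
    """Swap only the trailing "/200" size segment for "/0".

    str.replace('200', '0') would also rewrite any "200" inside the image ID
    or timestamp, producing a URL that does not exist (and therefore a 404).
    """
    if thumb_url.endswith("/200"):
        return thumb_url[: -len("200")] + "0"
    return thumb_url  # leave URLs without that suffix unchanged


# Hypothetical URL whose path contains "200" before the size segment
thumb = "https://shp.qpic.cn/ishow/2735111614/1637200312_84828260_13173_sProdImgNo_8.jpg/200"
print(thumb.replace("200", "0"))  # "1637200312" is mangled as well
print(to_full_size(thumb))        # only the final "/200" changes
```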