Max retries exceeded with URL in requests
I am trying to get the content of App Store > Business:
import requests
from lxml import html
page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)
When I try range(0, 2) it works, but when I put the range in the 100s it shows this error:
Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
What happened here is that the iTunes server refused your connection (you are sending too many requests from the same IP address in a short period of time):
Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8
The error trace is misleading; it should be something like "No connection could be made because the target machine actively refused it".
There is an issue about this for the python requests lib on GitHub; check it out here.
To overcome this issue (not so much an issue as a misleading debug trace), you should catch connection-related exceptions like so:
try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    # No response object exists when the connection fails, so mark the result explicitly
    page1 = None
Another way to overcome this problem is to leave a long enough time gap between the requests you send to the server. This can be achieved with the sleep(timeinsec) function in Python (don't forget to import sleep):
from time import sleep
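The two suggestions above (catching the connection error and sleeping between requests) can be combined. The following is only a rough sketch, not something the requests library provides: the helper name fetch_all, its parameters, and the default delay and retry count are all made up for illustration, and suitable values depend on how the server throttles clients.

```python
import time


def fetch_all(urls, fetch, retry_on, delay=1.0, retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and retrying failures.

    Hypothetical helper for illustration only.
    fetch    -- callable that takes a URL and returns a response
    retry_on -- exception class (or tuple of classes) that triggers a retry
    Returns a dict mapping each URL to its response, or to None if every
    attempt for that URL failed.
    """
    results = {}
    for url in urls:
        results[url] = None
        for attempt in range(retries):
            try:
                results[url] = fetch(url)
                break
            except retry_on:
                # Wait longer after each failure: delay, 2*delay, 4*delay, ...
                sleep(delay * (2 ** attempt))
        # Pause before the next URL so requests are spread out over time.
        sleep(delay)
    return results
```

With the code from the question, this might be called as fetch_all(app, requests.get, requests.exceptions.ConnectionError), where app is the list of links returned by tree.xpath. URLs that still fail after all retries map to None instead of raising, so the loop keeps going.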
All in all, requests is an awesome Python lib; hope that solves your problem.