Problem getting data from FlightRadar24 using urllib2

Problem description:

I'm trying to get data from FlightRadar24 using the script below, which is based on this answer for handling cookies. When I type the URL into a browser, I get a long JSON document (a dictionary) that includes a list of lat/long/alt updates. When I run the code below, however, I get the error message shown after it.

What do I need to do to successfully read the JSON into Python?

NOTE: the link may stop working in a week or two; they don't keep the data available forever.

import urllib2 
import cookielib

jar = cookielib.FileCookieJar("cookies")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"

response = opener.open(url)
print response.headers
print "Got page"
print "Currently have %d cookies" % len(jar)
print jar

Traceback (most recent call last):
  File "[mypath]/test v00.py", line 8, in <module>
    response = opener.open(link)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

I am not sure what you need cookies for, but the issue is that the web server is blocking requests based on the User-Agent that urllib2 sends in the request header (something like 'Python-urllib/2.7').
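To see which User-Agent the library sends by default, you can inspect the headers on a freshly built opener. The sketch below assumes Python 3, where urllib2 became urllib.request; the helper name default_user_agent is made up for illustration. On Python 2.7, urllib2.build_opener().addheaders holds the same kind of list.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent string a plain urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples added to every request
    # made through this opener.
    headers = dict(opener.addheaders)
    return headers["User-agent"]


if __name__ == "__main__":
    # Prints something like "Python-urllib/3.x", depending on the interpreter.
    print(default_user_agent())
```

A server that rejects this string is the likely reason for the 403, while the same URL works in a browser.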

You should add a valid browser User-Agent to the request headers to get the correct data. Example:

import urllib2
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"
req = urllib2.Request(url, headers={"Connection":"keep-alive", "User-Agent":"Mozilla/5.0"})
response = urllib2.urlopen(req)
jsondata = response.read()
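For reference, here is a rough Python 3 sketch of the same idea that also decodes the response into a Python object, which is what the question asks for. It assumes the server returns UTF-8 encoded JSON. The helper names (build_request, parse_json_body, fetch_json) are made up for illustration, and the cookie jar is kept only because the question used one.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

# A browser-like User-Agent, as in the answer above.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url, headers=BROWSER_HEADERS):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers=dict(headers))


def parse_json_body(raw_bytes, encoding="utf-8"):
    """Decode the raw response bytes and parse them as JSON."""
    return json.loads(raw_bytes.decode(encoding))


def fetch_json(url, timeout=10):
    """Open the URL through a cookie-aware opener and return the parsed JSON."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(build_request(url), timeout=timeout) as response:
        return parse_json_body(response.read())
```

Calling fetch_json(url) with the URL from the question should return a dictionary if the server accepts the request; it is not called here because that link may no longer work.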