20.6. urllib2 - extensible library for opening URLs
Note
The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.
The urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.
The urllib2 module defines the following functions:
- urllib2.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])
Open the URL url, which can be either a string or a Request object.
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. The urllib2 module sends HTTP/1.1 requests with a Connection:close header included.
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This only works for HTTP, HTTPS and FTP connections.
If context is specified, it must be an ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.
The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().
The cadefault parameter is ignored.
This function returns a file-like object with three additional methods:
- geturl(): return the URL of the resource retrieved, commonly used to determine if a redirect was followed
- info(): return the meta-information of the page, such as headers, in the form of a mimetools.Message instance (see Quick Reference to HTTP Headers)
- getcode(): return the HTTP status code of the response.
Raises URLError on errors.
Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).
In addition, if proxy settings are detected (for example, when a *_proxy environment variable is set), ProxyHandler is installed by default and makes sure the requests are handled through the proxy.
Changed in version 2.6: timeout was added.
Changed in version 2.7.9: cafile, capath, cadefault, and context were added.
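As a rough illustration of the file-like return value described above, the sketch below opens a local file: URL so that it runs without network access. The import falls back to urllib.request on Python 3, in line with the note at the top of this page. The helper name read_url and the temporary file are assumptions made for this example, not part of the module.

```python
import os
import tempfile

try:
    import urllib2  # Python 2
    from urllib import pathname2url
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API
    from urllib.request import pathname2url


def read_url(url, timeout=10):
    """Open url and return (final_url, status_code, body_bytes).

    Hypothetical helper for illustration only.
    """
    response = urllib2.urlopen(url, timeout=timeout)
    try:
        # geturl() reports the URL actually retrieved (after any redirects).
        # getcode() is the HTTP status; for non-HTTP schemes it may be None.
        return response.geturl(), response.getcode(), response.read()
    finally:
        response.close()


# Demonstrate with a temporary local file served through the file: scheme.
handle, path = tempfile.mkstemp()
os.write(handle, b"hello from urlopen")
os.close(handle)
try:
    final_url, status, body = read_url("file:" + pathname2url(path))
finally:
    os.remove(path)
```

For an HTTP URL the same helper would return the server's status code in the second position; the timeout argument only matters for the network-backed schemes listed above.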
- urllib2.install_opener(opener)
Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.
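The remark that any class with the appropriate interface will work can be sketched as follows. RecordingOpener is a hypothetical stand-in that records calls instead of performing I/O, which is one way such a substitution might be used in tests. Resetting the global opener by installing None is an assumption about implementation behaviour rather than a documented feature.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class RecordingOpener(object):
    """Hypothetical stand-in opener: records calls instead of doing I/O."""

    def __init__(self):
        self.calls = []

    def open(self, url, data=None, timeout=None):
        # urlopen() forwards its url, data and timeout arguments here.
        self.calls.append((url, data))
        return "stub response for %s" % url


recorder = RecordingOpener()
urllib2.install_opener(recorder)
try:
    result = urllib2.urlopen("http://www.example.com/")
finally:
    # Assumption: installing None lets the next urlopen() call rebuild the
    # default opener. This leans on implementation behaviour.
    urllib2.install_opener(None)
```

No network request is made here, because urlopen() simply delegates to the installed object's open() method.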
- urllib2.build_opener([handler, ...])
Return an OpenerDirector instance, which chains the handlers in the order given. handlers can be either instances of BaseHandler, or subclasses of BaseHandler (in which case it must be possible to call the constructor without any parameters). Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler (if proxy settings are detected), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.
Beginning in Python 2.3, a BaseHandler subclass may also change its handler_order attribute to modify its position in the handlers list.
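The replacement rule described above (a supplied subclass suppresses the corresponding default handler) can be observed by inspecting the opener's handlers list. This is a minimal sketch; QuietRedirectHandler is a hypothetical subclass that adds no behaviour of its own.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class QuietRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; its presence replaces the default handler."""


# A class is instantiated by build_opener(); an instance is used as given.
opener = urllib2.build_opener(QuietRedirectHandler,
                              urllib2.HTTPCookieProcessor())

# __class__ is used instead of type() so this also works with the
# old-style handler classes of Python 2.
handler_names = [h.__class__.__name__ for h in opener.handlers]
```

After this runs, handler_names contains QuietRedirectHandler and HTTPCookieProcessor alongside the remaining defaults such as HTTPHandler, but no plain HTTPRedirectHandler.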
The following exceptions are raised as appropriate:
- exception urllib2.URLError
The handlers raise this exception (or derived exceptions) when they run into a problem. It is a subclass of IOError.
  - reason
  The reason for this error. It can be a message string or another exception instance (socket.error for remote URLs, OSError for local URLs).
- exception urllib2.HTTPError
Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.
  - code
  An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in BaseHTTPServer.BaseHTTPRequestHandler.responses.
  - reason
  The reason for this error. It can be a message string or another exception instance.
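Because HTTPError is a subclass of URLError, an except clause for HTTPError has to come first or it will never be reached. The sketch below shows that ordering and also reads the error body through the file-like interface. The exceptions are constructed by hand so the example runs offline; describe_failure and the two raise_* functions are hypothetical names used only here.

```python
import io

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


def describe_failure(action):
    """Run action() and summarise any urllib2 error it raises.

    Hypothetical helper for illustration only.
    """
    try:
        action()
    except urllib2.HTTPError as err:
        # HTTPError must be caught first because it subclasses URLError.
        # It is also file-like, so the error body can be read.
        return "HTTP %d: %s" % (err.code, err.read().decode("ascii"))
    except urllib2.URLError as err:
        return "failed: %s" % (err.reason,)
    return "ok"


def raise_http_error():
    # Built by hand so the sketch runs offline; normally urlopen() raises it.
    raise urllib2.HTTPError("http://www.example.com/missing", 404,
                            "Not Found", {}, io.BytesIO(b"no such page"))


def raise_url_error():
    raise urllib2.URLError("connection refused")


http_summary = describe_failure(raise_http_error)
url_summary = describe_failure(raise_url_error)
```

In real use, action would typically be a call that wraps urlopen(), and the reason attribute of a URLError may be an exception instance rather than a string.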
The following classes are provided:
- class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
This class is an abstraction of a URL request.
url should be a string containing a valid URL.
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header value, which is used by a browser to identify itself; some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib2’s default user agent string is "Python-urllib/2.6" (on Python 2.6).
The final two arguments are only of interest for correct handling of third-party HTTP cookies:
origin_req_host should be the request-host of the origin transaction, as defined by RFC 2965.
unverifiable should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.
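The effect of the data and headers arguments can be checked without sending anything, as in the sketch below. The endpoint, form fields and user agent string are hypothetical values chosen for illustration. The import fallback covers Python 3, where urlencode lives in urllib.parse and data is expected to be bytes.

```python
try:
    import urllib2  # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API
    from urllib.parse import urlencode

# Hypothetical endpoint and form fields, used only for illustration.
form = urlencode([("name", "spam"), ("count", "2")])
post_request = urllib2.Request(
    "http://www.example.com/submit",
    data=form.encode("ascii"),  # Python 3 expects bytes here
    headers={"User-Agent": "example-client/0.1"},
)
get_request = urllib2.Request("http://www.example.com/")

# Supplying data switches the request from GET to POST.
post_method = post_request.get_method()
get_method = get_request.get_method()

# Header names are stored capitalized, hence "User-agent" for the lookup.
user_agent = post_request.get_header("User-agent")
```

Either Request object can then be passed to urlopen() or to an opener's open() method in place of a URL string.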
- class urllib2.OpenerDirector
The OpenerDirector class opens URLs via BaseHandlers chained together. It manages the chaining of handlers, and recovery from errors.
- class urllib2.BaseHandler
This is the base class for all registered handlers, and handles only the simple mechanics of registration.
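To make the relationship between the two classes more concrete, the following sketch registers a single custom handler on a bare OpenerDirector. It relies on the convention that an opener dispatches to handler methods named after the URL scheme followed by _open. The "echo" scheme and EchoHandler are invented for this example and have no counterpart in the module.

```python
import io

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up "echo" scheme."""

    def echo_open(self, req):
        # OpenerDirector dispatches to <scheme>_open methods by name.
        payload = req.get_full_url().split(":", 1)[1]
        return io.BytesIO(payload.encode("ascii"))


# A bare OpenerDirector has no handlers until they are added explicitly.
opener = urllib2.OpenerDirector()
opener.add_handler(EchoHandler())
echoed = opener.open("echo:hello").read()
```

In practice build_opener() is the more usual entry point, since it also installs the default handlers; the bare OpenerDirector is used here only to keep the chain down to one handler.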
- class urllib2.HTTPDefaultErrorHandler
A class which defines a default handler for HTTP error responses; all responses are turned into HTTPError exceptions.
- class urllib2.HTTPRedirectHandler
A class to handle redirections.
- class urllib2.HTTPCookieProcessor([cookiejar])
A class to handle HTTP Cookies.
class
urllib2.
ProxyHandler
([proxies]) -
Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables Note
HTTP_PROXY
will be ignored if a variableREQUEST_METHOD
is set; see the documentation ongetproxies()
.