HTTP Requests
JSON
JSON stands for JavaScript Object Notation. It’s a way of representing hierarchical data.
JSON can represent:
objects (like dictionaries)
arrays (like lists)
strings
booleans
numbers (parsed as int or float in Python)
null (like None)
We can use loads (which stands for “load string”) to parse a JSON string into native Python objects:
>>> import json
>>> json.loads("""
... {"things": ["shoe", "ball"], "number": 5.5, "a string": "hello", "nothing": null, "a bool": true}
... """)
{'a string': 'hello', 'nothing': None, 'a bool': True, 'number': 5.5, 'things': ['shoe', 'ball']}
We can use dumps (which stands for “dump string”) to take Python objects and convert them to JSON:
>>> import json
>>> json.dumps([
... {"state": "California", "capital": "Sacramento"},
... {"state": "Texas", "capital": "Austin"},
... ])
'[{"capital": "Sacramento", "state": "California"}, {"capital": "Austin", "state": "Texas"}]'
>>> json.dumps(5)
'5'
>>> json.dumps(True)
'true'
>>> json.dumps(None)
'null'
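For these basic types, dumps and loads are inverses, so data survives a round trip. A quick sketch of this, including one caveat (dictionary keys are coerced to strings during encoding):

```python
import json

# Encode to a JSON string and decode back again
data = {"things": ["shoe", "ball"], "number": 5.5, "nothing": None}
round_tripped = json.loads(json.dumps(data))
assert round_tripped == data

# Caveat: non-string dictionary keys become strings
assert json.loads(json.dumps({1: "one"})) == {"1": "one"}
```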
You can also customize how JSON data is encoded and decoded, both to support custom data types when encoding and to choose which Python types are used when decoding:
>>> import json
>>> from decimal import Decimal
>>> json.loads('{"count": 5.6}')
{'count': 5.6}
>>> json.loads('{"count": 5.6}', parse_float=Decimal)
{'count': Decimal('5.6')}
There are also third-party libraries, such as orjson and ujson, that can act as faster replacements for the built-in json module.
urllib
There is a urllib.request module included in the Python standard library for performing HTTP requests. Its urlopen function returns a file-like object which returns byte strings when read:
>>> from urllib.request import urlopen
>>> response = urlopen('http://pseudorandom.name')
>>> name = response.read()
>>> name
b'Kyle Gee\n'
>>> response = urlopen('http://pseudorandom.name')
>>> name = response.read()
>>> name
b'Kimberly Foley\n'
>>> response = urlopen('http://pseudorandom.name')
>>> name = response.read()
>>> name
b'Patricia Puig\n'
This module works well when you need a simple HTTP request, but when you need custom SSL certificates or complex requests you may want to look into the third-party requests library, which is a bit easier to use.
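For instance, attaching a custom header with urllib means building a Request object first. A sketch (the User-Agent value here is made up for illustration):

```python
from urllib.request import Request, urlopen

# Build a GET request carrying a custom header;
# urlopen(request) would send it just like urlopen(url)
request = Request('http://pseudorandom.name',
                  headers={'User-Agent': 'my-example-client/1.0'})

# urllib normalizes header names to Capitalized-dashed form internally
request.get_header('User-agent')
```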
requests
We have already seen urllib.request for performing HTTP requests:
>>> from urllib.request import urlopen
>>> with urlopen('http://example.com') as response:
... print(response.read())
...
b'<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset="utf-8" />\n <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n <meta name="viewport" content="width=device-width, initial-scale=1" />\n <style type="text/css">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 50px;\n background-color: #fff;\n border-radius: 1em;\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n body {\n background-color: #fff;\n }\n div {\n width: auto;\n margin: 0 auto;\n border-radius: 0;\n padding: 1em;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission.</p>\n <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
The built-in urllib.request module works well, but there is a popular third-party library called requests that many Python programmers prefer to use instead.
A few cool features of the requests library:
Browser-style SSL certificate verification (this is important!)
Supports HTTP authentication
Automatically decodes Unicode response bodies
Let’s install requests:
$ python3 -m pip install requests
Collecting requests
Using cached requests-2.20.1-py2.py3-none-any.whl
Installing collected packages: requests
Successfully installed requests-2.20.1
Now let’s try it out:
>>> import requests
>>> response = requests.get('http://example.com')
>>> response.text
'<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset="utf-8" />\n <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n <meta name="viewport" content="width=device-width, initial-scale=1" />\n <style type="text/css">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 50px;\n background-color: #fff;\n border-radius: 1em;\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n body {\n background-color: #fff;\n }\n div {\n width: auto;\n margin: 0 auto;\n border-radius: 0;\n padding: 1em;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission.</p>\n <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
>>> type(response.text)
<class 'str'>
Notice that the response text was automatically decoded from a UTF-8 byte string to a unicode string. Cool!
Requests will construct a URL query string for us if we pass query parameters to our GET request:
>>> request_url = 'https://httpbin.org/get'
>>> params = {'query': "python", 'page': 1}
>>> response = requests.get(request_url, params=params)
>>> response.url
'https://httpbin.org/get?query=python&page=1'
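Under the hood, this is the same query string that the standard library's urllib.parse.urlencode function would build:

```python
from urllib.parse import urlencode

# The same params dict we passed to requests.get
params = {'query': 'python', 'page': 1}
query_string = urlencode(params)
assert query_string == 'query=python&page=1'
```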
Requests can also decode JSON responses for us:
>>> response.text
'{\n "args": {\n "page": "1", \n "query": "python"\n }, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.20.1"\n }, \n "origin": "99.95.174.219", \n "url": "https://httpbin.org/get?query=python&page=1"\n}\n'
>>> response.json()
{'headers': {'Accept': '*/*', 'User-Agent': 'python-requests/2.20.1', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org'}, 'url': 'https://httpbin.org/get?query=python&page=1', 'origin': '99.95.174.219', 'args': {'query': 'python', 'page': '1'}}
>>> ip_addr = response.json()['origin']
>>> ip_addr
'99.95.174.219'
To raise an exception when an HTTP error occurs, we can use the raise_for_status method:
>>> response = requests.get('https://httpbin.org/status/418')
>>> response.status_code
418
>>> print(response.text)
-=[ teapot ]=-
_...._
.' _ _ `.
| ."` ^ `". _,
\_;`"---"`|//
| ;/
\_ _/
`"""`
>>> response.raise_for_status()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "site-packages/requests/models.py", line 837, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 418 Client Error: I'M A TEAPOT for url: https://httpbin.org/status/418
Requests also works with authentication:
>>> response = requests.get('https://httpbin.org/basic-auth/alina/secret')
>>> response.status_code
401
>>> response = requests.get('https://httpbin.org/basic-auth/alina/secret', auth=('alina', 'secret'))
>>> response.status_code
200
>>> response.json()
{'authenticated': True, 'user': 'alina'}
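That auth tuple becomes a standard Basic authentication header: requests base64-encodes "user:password" and sends it as the Authorization header. The header can be sketched with the standard library:

```python
import base64

# Basic auth is just "user:password", base64-encoded
credentials = base64.b64encode(b'alina:secret').decode('ascii')
auth_header = 'Basic ' + credentials
assert auth_header == 'Basic YWxpbmE6c2VjcmV0'
```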
Requests also automatically handles decompressing gzipped encodings:
>>> response = requests.get('https://httpbin.org/gzip')
>>> response.json()
{'headers': {'Accept': '*/*', 'User-Agent': 'python-requests/2.20.1', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org'}, 'origin': '99.95.174.219', 'method': 'GET', 'gzipped': True}
To achieve the same effect with urllib.request, we would need to wrap the response in gzip.GzipFile and then decode the JSON ourselves:
>>> from urllib.request import urlopen
>>> from gzip import GzipFile
>>> import json
>>> with urlopen('https://httpbin.org/gzip') as response:
... gzip_file = GzipFile(fileobj=response)
... data = json.loads(gzip_file.read().decode('utf-8'))
...
>>> print(data)
{'headers': {'User-Agent': 'Python-urllib/3.4', 'Accept-Encoding': 'identity', 'Host': 'httpbin.org'}, 'origin': '99.95.174.219', 'method': 'GET', 'gzipped': True}
If we tried to read the raw data with urllib.request.urlopen we would just see compressed bytes:
>>> from urllib.request import urlopen
>>> with urlopen('https://httpbin.org/gzip') as response:
... print(response.read())
...
b"\x1f\x8b\x08\x00W/\x1fV\x02\xff=\x8eA\x0e\x820\x10E\xf7\x9c\xa2\x99\xb5\xd4\xa0\x18\x83;\x16D\x97.\xf4\x00B'e\x12l\x9b2,\x90pw\xa7\x92\xb8\x9c\xf7\xfe\xcf\x9f%S\n\xec\x87B@\x03\x17\xc5q\xc2\x9dJ\xac\xc7\x97\xc18\n[\xe4\x14Pw\x1d\x06\xce\x1b\xd7yC\xce\x8a\x002\xe8\x98x\x86_E27?r\x12=sh\xc9i\x1f\xed\xdf=G\x8cym\xa5\x91\x12\xf7\x99{\xef\xf2)\x0e\x03\xb5\xfb\xa3.AR\xeb6\xfdFq\xe9\x1b\xb86\x8f\xad\x0f>\x92%\x97XU\xe9\xea\xa4\x8bs\xa9\x0fE\x05\xd9\x9a}\x01u.\xc2g\xc3\x00\x00\x00"
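The decompression step that urllib leaves to us can be seen in isolation with an offline round trip through the standard gzip module:

```python
import gzip
import json

# Simulate a gzipped JSON response body
payload = json.dumps({'gzipped': True}).encode('utf-8')
compressed = gzip.compress(payload)

# The same decoding steps we performed on the HTTP response above
data = json.loads(gzip.decompress(compressed).decode('utf-8'))
assert data == {'gzipped': True}
```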