File Objects

Fake Files

What if we want to read CSV data from a string?

>>> import csv
>>> csv_data = "1,2\n3,4"
>>> csv_reader = csv.reader(csv_data)
>>> for data in csv_reader:
...     print(data)
...
['1']
['', '']
['2']
[]
['3']
['', '']
['4']

That’s not what we want. The CSV reader expects an iterable of strings and treats each one as a line of the CSV file. Looping over a string gives one character at a time, so every character was parsed as its own line.

So we could split our data by lines:

>>> csv_data = "purple,0.15\nindigo,0.25\nred,0.3\nblue,0.05\ngreen,0.25"
>>> csv_reader = csv.reader(csv_data.splitlines())
>>> colors = list(csv_reader)
>>> colors
[['purple', '0.15'], ['indigo', '0.25'], ['red', '0.3'], ['blue', '0.05'], ['green', '0.25']]

Neat!

What if we want to use the CSV writer to create CSV data in a string?

Unfortunately CSV writer needs a file object to write to. Fortunately, we can make a file-like object in Python.

>>> import csv
>>> from io import StringIO
>>> colors = [["purple", "0.15"], ["indigo", "0.25"], ["red", "0.3"], ["blue", "0.05"], ["green", "0.25"]]
>>> csv_file = StringIO()
>>> csv_writer = csv.writer(csv_file)
>>> for line in colors:
...     csv_writer.writerow(line)
...
13
13
9
11
12
>>> csv_data = csv_file.getvalue()
>>> print(csv_data)  
purple,0.15
indigo,0.25
red,0.3
blue,0.05
green,0.25

>>>

Success! We have tricked Python into writing our CSV data into a file which isn’t really a file but is actually an in-memory file-like object.
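The steps above can be wrapped in a small helper. This is only a sketch: the function name rows_to_csv is made up for illustration, and passing newline='' to StringIO follows the csv module's general advice for file objects rather than anything this example strictly requires.

```python
import csv
from io import StringIO


def rows_to_csv(rows):
    """Return the given rows as a single CSV-formatted string."""
    # newline='' leaves the line endings written by the csv module untouched.
    csv_file = StringIO(newline='')
    csv.writer(csv_file).writerows(rows)
    return csv_file.getvalue()


colors = [["purple", "0.15"], ["indigo", "0.25"]]
csv_text = rows_to_csv(colors)
print(repr(csv_text))  # the csv module ends each row with '\r\n' by default
```

Using repr here makes the row terminators visible, which a plain print would hide.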

StringIO objects support many of the methods that file objects support:

>>> from io import StringIO
>>> fake_file = StringIO("hello")
>>> fake_file.read()
'hello'
>>> fake_file.write(' world')
6
>>> fake_file.seek(0)
0
>>> fake_file.read()
'hello world'
>>> fake_file.close()
>>> fake_file.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
>>> fake_file = StringIO("line 1\nline 2\n line 3")
>>> fake_file.readline()
'line 1\n'
>>> fake_file.readline()
'line 2\n'

StringIO objects are basically secret agent in-memory strings, pretending to be files.
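Because a StringIO object acts like a file, it can also be handed to the CSV reader instead of a list of lines. The sketch below uses made-up sample data in which one quoted field contains a line break, a case where splitting the string on lines first can go wrong.

```python
import csv
from io import StringIO

# Hypothetical sample data: the second row has a quoted field with a line break in it.
csv_data = 'name,note\nAnn,"line one\nline two"\n'

# newline='' leaves line endings untouched so the csv module can interpret them itself.
fake_file = StringIO(csv_data, newline='')
rows = list(csv.reader(fake_file))

print(rows)  # [['name', 'note'], ['Ann', 'line one\nline two']]
```

The reader keeps reading lines from the fake file until the quoted field is closed, so the embedded line break stays inside a single field.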

Standard Input/Output

We’ve learned that you can print to a file. We can also print to a StringIO object. It works just like a file this way.

>>> my_file = StringIO()
>>> print("hello world!", file=my_file)
>>> my_file.getvalue()
'hello world!\n'

What “file” does print write to by default? The answer is sys.stdout.

>>> print("hello world")
hello world
>>> import sys
>>> print("hello world", file=sys.stdout)
hello world
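Since print looks up sys.stdout each time it is called, we can temporarily point sys.stdout at a StringIO object to capture printed output. A minimal sketch using contextlib.redirect_stdout from the standard library (the variable names are just for illustration):

```python
from contextlib import redirect_stdout
from io import StringIO

captured = StringIO()
with redirect_stdout(captured):
    print("hello world")  # written into captured instead of the terminal

output = captured.getvalue()
print(repr(output))  # 'hello world\n'
```

Once the with block ends, sys.stdout is restored and print writes to the terminal again.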

There are three streams our program has to work with, two of them writable and one of them readable.

There’s sys.stdin which we can read from:

>>> import sys
>>> sys.stdin.readline()  
hello (we're typing this right now)
"hello (we're typing this right now)\n"

This is the standard input stream. Our program uses it to get input from the user, or from files or streams that are piped into it on the command line.

There’s sys.stdout which we can write to:

>>> sys.stdout.write('hello\n')
hello
6

This is the standard output stream which our program writes output to by default.

There’s also sys.stderr which we can write to:

>>> sys.stderr.write('hello\n')  
hello
6

This is the standard error stream, which our program should write errors to. Keeping error messages out of standard output is useful when that output is being used for something else, like being piped to a file.
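To see the two output streams stay separate, here is a small sketch. The report function is hypothetical, and redirect_stdout and redirect_stderr are used only so we can inspect what went to each stream.

```python
import sys
from contextlib import redirect_stderr, redirect_stdout
from io import StringIO


def report(total):
    """Print the result to stdout and any warning to stderr."""
    if total < 0:
        print("warning: total is negative", file=sys.stderr)
    print(total)


out, err = StringIO(), StringIO()
with redirect_stdout(out), redirect_stderr(err):
    report(-5)

stdout_text = out.getvalue()  # only the result
stderr_text = err.getvalue()  # only the warning
```

If the output of a program like this were piped to a file, the file would contain just the result and the warning would still show up in the terminal.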

What would happen if we tried to write to stdin? Or read from stdout?

>>> import sys
>>> sys.stdin.write("hello world")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not writable
>>> sys.stdout.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not readable

We can’t write to standard input and we can’t read from standard output or standard error.

The standard streams, like files opened with open, have a mode attribute that tells us whether they were opened for reading or writing. Not every file-like object has one, though.

>>> sys.stdin.mode
'r'
>>> sys.stdout.mode
'w'
>>> sys.stderr.mode
'w'

Files and file-like objects also have readable and writable methods that we can use to determine whether we can read from or write to a file.

>>> sys.stdin.readable()
True
>>> sys.stdin.writable()
False
>>> sys.stdout.readable()
False
>>> sys.stdout.writable()
True
>>> sys.stderr.readable()
False
>>> sys.stderr.writable()
True

These methods work on other file objects too:

>>> from io import StringIO
>>> fake_file = StringIO()
>>> fake_file.readable()
True
>>> fake_file.writable()
True
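Here is a sketch of both approaches side by side: asking a file whether it is writable, and simply trying the write and catching io.UnsupportedOperation. The try_write helper is made up for illustration, and os.devnull is used only as a file that is safe to open on any system.

```python
import io
import os


def try_write(file, text):
    """Write text to file if possible; return True on success, False otherwise."""
    try:
        file.write(text)
    except io.UnsupportedOperation:
        return False
    return True


with open(os.devnull) as read_only:  # opened in the default 'r' mode
    can_write = read_only.writable()
    wrote_read_only = try_write(read_only, "hello")

wrote_fake = try_write(io.StringIO(), "hello")
print(can_write, wrote_read_only, wrote_fake)  # False False True
```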

What makes a file?

>>> import sys
>>> import io
>>> fake_file = io.StringIO()
>>> my_file = open('greetings.py')
>>> isinstance(my_file, io.TextIOBase)
True
>>> isinstance(fake_file, io.TextIOBase)
True
>>> isinstance(sys.stdout, io.TextIOBase)
True

Files we open from the file system, standard input and output streams, and StringIO objects all inherit from the io.TextIOBase class.
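That statement applies to text files. As a hedged aside, binary file objects such as io.BytesIO sit in a different branch of the io class hierarchy, which this sketch checks:

```python
import io

text_file = io.StringIO("hello")
binary_file = io.BytesIO(b"hello")

checks = {
    "StringIO is text": isinstance(text_file, io.TextIOBase),
    "BytesIO is text": isinstance(binary_file, io.TextIOBase),
    "BytesIO is buffered binary": isinstance(binary_file, io.BufferedIOBase),
    "both are IOBase": isinstance(text_file, io.IOBase)
    and isinstance(binary_file, io.IOBase),
}

for description, result in checks.items():
    print(description, result)
```

Only the second check is False: a BytesIO object is a file-like object, but it is not a text file.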

Let’s see what’s in this class:

>>> help(io.TextIOBase)  

So at this point you might assume that for an object to act like a file, it needs to inherit from io.TextIOBase. This is incorrect.

Let’s make a class that implements the bare minimum needed for Python’s print function to accept it as a file.

class FakeFile:
    """Don't actually use this... StringIO is better."""

    def __init__(self):
        self.contents = ""

    def write(self, data):
        self.contents += data

Now let’s import this and try it out:

>>> from fake_file import FakeFile
>>> fake = FakeFile()
>>> print("hello world", file=fake)
>>> fake.contents
'hello world\n'

Python doesn’t check the type of the object we pass to print; it just calls that object’s write method. This is duck typing: if an object acts like a file, we can treat it as a file.
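To watch duck typing at work, here is another made-up class (ChunkRecorder is hypothetical, not part of Python) that records every string passed to its write method:

```python
class ChunkRecorder:
    """Hypothetical file-like object that remembers each string written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)
        return len(data)  # real text files return the number of characters written


recorder = ChunkRecorder()
print("hello", "world", file=recorder)

text = "".join(recorder.chunks)
print(repr(text))  # 'hello world\n'
```

Looking at recorder.chunks afterwards shows the separate pieces that print handed to write.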

Our fake file object even works with csv.writer:

>>> import csv
>>> from fake_file import FakeFile
>>> fake = FakeFile()
>>> csv_writer = csv.writer(fake)
>>> colors = [("purple", "0.15"), ("indigo", "0.25"), ("red", "0.3"), ("blue", "0.05"), ("green", "0.25")]
>>> for line in colors:
...     csv_writer.writerow(line)
...
>>> print(fake.contents)  
purple,0.15
indigo,0.25
red,0.3
blue,0.05
green,0.25

If you look through the help output for io.TextIOBase, you’ll see some obviously file-related methods.

We’ve already learned about:

  • close: close a file

  • read: read contents from a file

  • write: write to a file

All files have these methods. All files can also be looped over.
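Looping works on fake files as well. A quick sketch with a StringIO object, where each step of the loop gives us one line with its line break still attached:

```python
from io import StringIO

fake_file = StringIO("line 1\nline 2\nline 3")
lines = [line for line in fake_file]

print(lines)  # ['line 1\n', 'line 2\n', 'line 3']
```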

Files also have other methods for reading and writing:

  • readable: returns True if the file can be read

  • readline: reads and returns characters up to and including the next line break

  • writable: returns True if the file can be written to

Files also have methods for changing the current position that we’re reading from in the file:

  • seekable: returns True if the file read position can be changed with seek

  • seek: change the current position we’re reading from in the file

  • tell: return the current position we’re reading from (as a number)

  • truncate: resize the file stream to a given size
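The position methods listed above can be tried out on a StringIO object. A short sketch, with variable names chosen just for this example:

```python
from io import StringIO

fake_file = StringIO("hello world")

can_seek = fake_file.seekable()   # True for StringIO objects
fake_file.seek(6)                 # jump to just after "hello "
position = fake_file.tell()       # 6
rest = fake_file.read()           # 'world'

fake_file.truncate(5)             # keep only the first 5 characters
remaining = fake_file.getvalue()  # 'hello'
```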

Other File-Like Objects

Here are a couple of other file-like things in the Python standard library.

HTTP responses:

>>> from urllib.request import urlopen
>>> response = urlopen('http://pseudorandom.name')
>>> name = response.read()
>>> name
b'Kyle Gee\n'
>>> response = urlopen('http://pseudorandom.name')
>>> name = response.read()
>>> name
b'Kimberly Foley\n'
>>> response = urlopen('http://pseudorandom.name')
>>> name = response.read()
>>> name
b'Patricia Puig\n'
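In Python 3, reading from a urlopen response gives bytes rather than text. One way to get text is to wrap the binary file-like object in io.TextIOWrapper. In this sketch a BytesIO object stands in for the HTTP response so that it runs without a network connection, and the UTF-8 encoding is an assumption about the data.

```python
import io

# Stand-in for an HTTP response body: any readable binary file-like object works here.
fake_response = io.BytesIO(b"Kyle Gee\n")

# TextIOWrapper decodes the bytes as they are read (UTF-8 is assumed here).
text_stream = io.TextIOWrapper(fake_response, encoding="utf-8")
name = text_stream.read()

print(repr(name))  # 'Kyle Gee\n'
```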

GZip files:

>>> import gzip
>>> with gzip.open('hello.txt.gz', mode='wt') as gzip_file:
...     gzip_file.write("hello world")
...
11
>>> with open('hello.txt.gz', mode='rb') as gzip_file:  
...     print(gzip_file.read())
...
b'\x1f\x8b\x08\x08\xd2\xa2\x16V\x02\xffhello.txt\x00\xcaH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x00\x00\xff\xff\x03\x00\x85\x11J\r\x0b\x00\x00\x00'
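Opening the file with plain open shows the compressed bytes. To get the original text back, we can open it with gzip.open in text mode instead. The sketch below writes to a temporary directory so that it does not leave files behind; the path and variable names are only for illustration.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'hello.txt.gz')

    # Write compressed text, just like the example above.
    with gzip.open(path, mode='wt', encoding='utf-8') as gzip_file:
        gzip_file.write("hello world")

    # Reading through gzip.open decompresses the data for us.
    with gzip.open(path, mode='rt', encoding='utf-8') as gzip_file:
        text = gzip_file.read()

    # Reading with plain open gives the raw compressed bytes.
    with open(path, mode='rb') as raw_file:
        magic = raw_file.read(2)

print(repr(text))  # 'hello world'
print(magic)       # b'\x1f\x8b', the two bytes every gzip file starts with
```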

File Exercises

Country Capitals CSV

This is the country_capitals.py exercise in the modules directory. Create the file country_capitals.py in the modules sub-directory of the exercises directory. To test it, run python test.py country_capitals.py from your exercises directory.

Download this country capitals file.

Write a program country_capitals.py that opens the file, extracts the country name, capital city, and population from each row, and writes a new file to disk in the following format:

country,capital,population
China,Beijing,1330044000
India,New Delhi,1173108018
United States,Washington,310232863

The country rows should be sorted by largest population first.

GZip Fetch

This is the gzip_fetch.py exercise in the modules directory. Create the file gzip_fetch.py in the modules sub-directory of the exercises directory. To test it, run python test.py gzip_fetch.py from your exercises directory.

Write a program that downloads gzipped data from the Internet, extracts it, and saves it on disk, all without using a temporary file.

You can use this gzipped response: https://httpbin.org/gzip

Example:

$ python gzip_fetch.py https://httpbin.org/gzip data.json