Generator Expressions

Sum

Python has a number of built-in functions that act on iterables. A list comprehension returns a list, which is an iterable, so we can pass a list comprehension straight into one of these functions.

Let’s try out the sum function:

>>> numbers = [2, 1, 3, 4]
>>> sum(numbers)
10

Let’s sum the squares of all of the numbers in our numbers list. We can use a list comprehension:

>>> sum([n**2 for n in numbers])
30

Cool!

Generators

We can use sum with tuples, sets, and any other iterable:

>>> sum((8, 9, 7))
24
>>> sum({8, 9, 7})
24

Sometimes we don’t really care if a list comprehension returns a list, or some other kind of iterable. When we passed a list comprehension into sum, we only really needed to pass in an iterable, not necessarily a list.

Let’s use a generator expression instead of a list comprehension. We can make a generator expression like this:

>>> squares = (n**2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x...>

We can use a generator in our sum call like this:

>>> sum((n**2 for n in numbers))
30

When a generator expression is the only argument in a function call, the call’s own parentheses are enough, so we can leave off the redundant inner parentheses:

>>> sum(n**2 for n in numbers)
30
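
One caveat worth noting: the inner parentheses can only be dropped when the generator expression is the sole argument. Here is a small sketch of that rule. It uses sum's optional start value as a second argument, chosen only for illustration:

```python
numbers = [2, 1, 3, 4]

# Sole argument: the call's own parentheses are enough.
total = sum(n**2 for n in numbers)

# With a second argument (here sum's optional start value), the
# generator expression needs its own parentheses. Leaving them
# off is a SyntaxError.
total_from_100 = sum((n**2 for n in numbers), 100)

print(total)           # 30
print(total_from_100)  # 130
```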

Generators don’t work like the other iterables we’ve learned about so far.

>>> squares[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

You cannot use item indexes to get values from generators.

You also can’t ask generators for their length:

>>> len(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

You can loop over generators:

>>> for s in squares:
...     print(s)
...
4
1
9
16

But only once:

>>> for s in squares:
...     print(s)
...
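
If we need to loop more than once, index, or ask for a length, one option is to build a list instead. Here is a minimal sketch of the difference, written as a script rather than a REPL session:

```python
numbers = [2, 1, 3, 4]
squares = (n**2 for n in numbers)

first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # nothing is left to consume

print(first_pass)   # [4, 1, 9, 16]
print(second_pass)  # []

# A list can be looped over repeatedly, indexed, and measured.
square_list = [n**2 for n in numbers]
print(square_list[0], len(square_list))  # 4 4
```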

Generators are single-use iterables. You can also think of them as “lazy iterables”, because they don’t compute each item until it is requested. You can get items from a generator one at a time by using the built-in next function:

>>> squares = (n**2 for n in numbers)
>>> next(squares)
4
>>> next(squares)
1
>>> next(squares)
9
>>> next(squares)
16
>>> next(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
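
To see the laziness directly, we can record when each item actually gets computed. This is only a sketch: noisy_square and the calls list are made up for the demonstration.

```python
def noisy_square(n, log):
    """Square n, recording each call in log (hypothetical demo helper)."""
    log.append(n)
    return n**2


numbers = [2, 1, 3, 4]
calls = []

# Creating the generator does not call noisy_square at all.
squares = (noisy_square(n, calls) for n in numbers)
print(calls)          # []

# Each next call computes exactly one more item.
print(next(squares))  # 4
print(calls)          # [2]
```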

So why are generator expressions called generator expressions? Why not generator comprehensions? I don’t know.

Calling them generator comprehensions is fine because people will know what you mean.

We’ll go into more detail about generators in the future.

Iteration Tools

Let’s learn some more built-in functions for working with iterables.

If we want to make sure everything in our list conforms to a certain rule, we can use the all function for that.

>>> all(n > 1 for n in numbers)
False
>>> all(n > 0 for n in numbers)
True

If we only want to make sure that at least one item in our list conforms to a certain rule, we can use the any function.

>>> any(n > 2 for n in numbers)
True
>>> any(n < 1 for n in numbers)
False
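
Generator expressions pair well with all and any because both functions stop looping as soon as the answer is known. Here is a sketch of that behavior. The is_positive helper and the checked lists exist only to record which items were examined:

```python
def is_positive(n, checked):
    """Report whether n > 0, recording n in checked (hypothetical helper)."""
    checked.append(n)
    return n > 0


mixed = [2, -1, 3, 4]

# With a generator expression, all stops at the first failing item.
checked = []
result = all(is_positive(n, checked) for n in mixed)
print(result)   # False
print(checked)  # [2, -1]

# With a list comprehension, every item is checked before all runs.
checked_all = []
all([is_positive(n, checked_all) for n in mixed])
print(checked_all)  # [2, -1, 3, 4]
```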

If we want to find the smallest or largest value in a collection, we can use min or max:

>>> min(numbers)
1
>>> max(numbers)
4
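
Like sum, min and max accept any iterable, including a generator expression. Here is a short sketch that also shows their optional key and default arguments. The words list is just sample data:

```python
numbers = [2, 1, 3, 4]

# Largest square, computed without building a list first.
largest_square = max(n**2 for n in numbers)
print(largest_square)  # 16

# The key argument compares items by a computed value instead.
words = ["kiwi", "fig", "banana"]
shortest = min(words, key=len)
longest = max(words, key=len)
print(shortest, longest)  # fig banana

# An empty iterable raises ValueError unless a default is given.
nothing = max((n for n in numbers if n > 10), default=None)
print(nothing)  # None
```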

Generator Expression Exercises

Sum Timestamps

Let’s revisit the sum_timestamps exercise that we looked at in the “Loops” section.

Edit the function sum_timestamps in loops.py so that it takes a list of timestamp strings of “minute:seconds”, and returns a “minute:seconds” timestamp string that represents the total time from the list of timestamps.

Use a generator expression to sum the given timestamps.

Feel free to use the parse_time and format_time functions from the “Function Exercises” section.

>>> from loops import sum_timestamps
>>> times = ["1:10", "0:12", "4:03", "2:45"]
>>> sum_timestamps(times)
'8:10'
>>> times = ["0:55", "0:55", "0:55", "0:55", "0:55"]
>>> sum_timestamps(times)
'4:35'
>>> sum_timestamps(["0:00"])
'0:00'

Here are the parse_time and format_time functions in case you need them:

def parse_time(time_string):
    """Return total seconds from string of minutes:seconds."""
    sections = time_string.split(':')
    return int(sections[0]) * 60 + int(sections[1])

def format_time(seconds):
    """Return a minutes:seconds string based on input seconds."""
    sections = divmod(seconds, 60)
    return f"{sections[0]}:{sections[1]:02d}"
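
As a quick check of how the two helpers fit together, here is a sketch that converts a timestamp to seconds and back. The helpers are repeated only so the snippet runs on its own:

```python
def parse_time(time_string):
    """Return total seconds from string of minutes:seconds."""
    sections = time_string.split(':')
    return int(sections[0]) * 60 + int(sections[1])


def format_time(seconds):
    """Return a minutes:seconds string based on input seconds."""
    sections = divmod(seconds, 60)
    return f"{sections[0]}:{sections[1]:02d}"


seconds = parse_time("4:03")
print(seconds)               # 243
print(format_time(seconds))  # 4:03
print(format_time(490))      # 8:10
```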

Join

Edit the function join in the generators.py file so it accepts an iterable and an optional sep argument and returns a new string with all the objects in the iterable joined together by the separator.

>>> from generators import join
>>> join([3.5, 2.4, 6], sep=", ")
'3.5, 2.4, 6'
>>> join([1, 2, 3, 4], sep="\n")
'1\n2\n3\n4'
>>> print(join([1, 2, 3, 4], sep="\n"))
1
2
3
4

The separator (sep) should default to a space character.

>>> join([1, 2, 3, 4])
'1 2 3 4'

Strip Lines

Edit the function strip_lines in the generators.py file so that it accepts either a list of lines or a file object and returns an iterator of the same lines but with newline characters removed from the end of each line.

>>> from generators import strip_lines
>>> lines = ["line 1\n", "line 2\n"]
>>> list(strip_lines(lines))
['line 1', 'line 2']

For the bonus, make sure your strip_lines function accepts any iterable of lines and returns an iterator instead of a list. The returned iterator should loop over the given lines iterable lazily (it shouldn’t be looped over all at once).

>>> stripped_lines = strip_lines(["line 1\n", "line 2\n"])
>>> next(stripped_lines)
'line 1'
>>> next(stripped_lines)
'line 2'

Hint

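Use the rstrip("\n") string method to remove newline characters from the end of each line (strip() also works if removing other surrounding whitespace is acceptable), and create a generator expression or generator function to return an iterator that processes lines lazily.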
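
The string stripping methods differ in how much they remove, which matters if lines may have meaningful leading or trailing spaces. Here is a small sketch of the difference, using a made-up sample line:

```python
line = "  indented line \n"

# strip() removes all whitespace from both ends.
stripped = line.strip()
print(repr(stripped))       # 'indented line'

# rstrip("\n") removes only newline characters from the end.
newline_free = line.rstrip("\n")
print(repr(newline_free))   # '  indented line '
```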

All Together

Edit the function all_together in the generators.py file so that it takes any number of iterables and returns a generator that yields their elements sequentially. Use a generator expression to do it.

Example:

>>> from generators import all_together
>>> list(all_together([1, 2], (3, 4), "hello"))
[1, 2, 3, 4, 'h', 'e', 'l', 'l', 'o']
>>> nums = all_together([1, 2], (3, 4))
>>> list(all_together(nums, nums))
[1, 2, 3, 4]

Translate

This is the translate exercise we’ve seen before in dictionaries.py.

The function translate takes a string in one language and translates each word into another language, returning the resulting string.

Here is an (over-simplified) example translation dictionary you can use for translating from Spanish to English:

>>> words = {'esta': 'is', 'la': 'the', 'en': 'in', 'gato': 'cat', 'casa': 'house', 'el': 'the'}

Convert the translate function to use a generator comprehension. An example of how this function should work:

>>> from dictionaries import translate
>>> translate("el gato esta en la casa")
'the cat is in the house'

Parse Number Ranges

Edit the parse_ranges function in the generators.py file so that it accepts a string containing ranges of numbers and returns a generator of the actual numbers contained in the ranges. The range numbers are inclusive.

It should work like this:

>>> from generators import parse_ranges
>>> list(parse_ranges('1-2,4-4,8-10'))
[1, 2, 4, 8, 9, 10]
>>> list(parse_ranges('0-0, 4-8, 20-21, 43-45'))
[0, 4, 5, 6, 7, 8, 20, 21, 43, 44, 45]
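
Because the ranges are inclusive and Python's range function excludes its stop value, the end of each range needs 1 added to it. Here is a sketch for a single range string. This is one possible approach, not the full solution:

```python
# Split one "start-end" piece into its two numbers.
start, end = "8-10".split("-")

# range stops before its second argument, so add 1 to include end.
values = list(range(int(start), int(end) + 1))
print(values)  # [8, 9, 10]
```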

Find TODOs

This is the todos.py exercise in the modules directory. Create the file todos.py in the modules sub-directory of the exercises directory. To test it, run python test.py todos.py from your exercises directory.

Write a program that prints out every line in a file that contains the text TODO (I add TODO notes in my files to note to-dos I need to handle). Also print the line number before the line. The line numbers should be padded with zeros so that all the printed numbers are 3 digits long.

Example:

If workshop.rst contains:

This is how you make a list::

    >>> numbers = [1, 2, 3]

.. TODO explain more about what lists are!

.. TODO add section on slicing

This is how you make a tuple::

    >>> numbers = (1, 2, 3)

.. TODO explain more about tuples!

Running:

$ python todos.py workshop.rst

Should print out:

005 .. TODO explain more about what lists are!
007 .. TODO add section on slicing
013 .. TODO explain more about tuples!
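
Two built-in features may help here: enumerate can count lines starting from 1, and the 03d format specification pads a number with zeros to 3 digits. Here is a sketch using made-up lines. The single space between the number and the line is an assumption about the expected format:

```python
lines = ["first line\n", "second line\n"]

# enumerate with start=1 pairs each line with a 1-based line number.
for number, line in enumerate(lines, start=1):
    # 03d pads the number with zeros to a width of 3 digits.
    print(f"{number:03d} {line.rstrip()}")

# Prints:
# 001 first line
# 002 second line
```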

Primality

This is the is_prime exercise in ranges.py.

Edit the function is_prime so that it returns True if a number is prime and False otherwise. Use a generator expression.

Example:

>>> from ranges import is_prime
>>> is_prime(21)
False
>>> is_prime(23)
True

Hint

You might want to use any or all for this.

Sum All

This is the sum_all function in the loops.py file that is in the exercises directory.

Edit the function sum_all so that it accepts a list of lists of numbers and returns the sum of all of the numbers. Use a generator expression.

>>> from loops import sum_all
>>> matrix = [[1, 2, 3], [4, 5, 6]]
>>> sum_all(matrix)
21
>>> sum_all([[0, 1], [4, 2], [3, 1]])
11

Deep Add

This is the deep_add exercise in exception.py.

Write a function deep_add that sums up all values given to it, including summing up the values of any contained collections.

>>> from exception import deep_add
>>> deep_add([1, 2, 3, 4])
10
>>> deep_add([(1, 2), [3, {4, 5}]])
15

Primes Over

Edit the function first_prime_over in the generators.py file so that it returns the first prime number over a given number.

Example:

>>> from generators import first_prime_over
>>> first_prime_over(1000000)
1000003

Head

This is the head exercise in generators.py.

Make a head function that lazily gives the first n items of a given iterable.

>>> from generators import head
>>> numbers = [2, 1, 3, 4, 7, 11, 18, 29]
>>> squares = (n**2 for n in numbers)
>>> list(head(numbers, n=2))
[2, 1]

Note that your head function should not retrieve more than n items from the iterable. For example, here we partially consume the squares generator object using head and then continue consuming it:

>>> numbers = [2, 1, 3, 4, 7, 11, 18, 29]
>>> squares = (n**2 for n in numbers)
>>> list(head(squares, 5))
[4, 1, 9, 16, 49]
>>> list(squares)
[121, 324, 841]

Total Air Travel

This is the total_air_travel.py exercise in the modules directory.

Note

If you’ve already solved this exercise, try to refactor it to use a generator expression.

Create the file total_air_travel.py in the modules sub-directory of the exercises directory. To test it, run python test.py total_air_travel.py from your exercises directory.

To test this manually, use the file expenses.csv.

Given a CSV file containing expenses by category, I’d like you to calculate how much money was spent on the category “Air Travel”.

The file is formatted like this:

Date,Merchant,Cost,Category
1/05/2017,American Airlines,519.25,Air Travel
1/12/2017,Southwest Airlines,298.90,Air Travel
1/17/2017,Mailchimp,19.80,Software
2/01/2017,Zapier,15.00,Software
2/05/2017,Lyft,24.24,Ground Transport
2/06/2017,Hattie Bs,18.13,Food
2/06/2017,Lyft,15.65,Ground Transport

The columns are:

The date of the expense
Merchant
Amount paid
Category

Your program should accept a single CSV file as input and it should print a floating point number representing the sum of the amounts paid for all “Air Travel” category expenses.

To test it manually, cd to the modules directory and run:

$ python total_air_travel.py expenses.csv
818.15
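
One way to read the rows is with csv.DictReader from the standard library, which turns each row into a dictionary keyed by the header names. Here is a sketch that totals a different category ("Software") from a few of the sample rows. It reads from an in-memory string rather than a file so it runs on its own:

```python
import csv
import io

# A few rows from the sample above, held in memory for the demo.
sample = """\
Date,Merchant,Cost,Category
1/17/2017,Mailchimp,19.80,Software
2/01/2017,Zapier,15.00,Software
2/05/2017,Lyft,24.24,Ground Transport
"""

reader = csv.DictReader(io.StringIO(sample))

# The generator expression filters and converts rows lazily.
software_total = sum(
    float(row["Cost"])
    for row in reader
    if row["Category"] == "Software"
)
print(f"{software_total:.2f}")  # 34.80
```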