Generator Expressions

Sum

Python has a number of functions that act on iterables. A list comprehension returns a list, which is an iterable, so we can pass list comprehensions straight into one of these functions.

Let’s try out the sum function:

>>> numbers = [2, 1, 3, 4]
>>> sum(numbers)
10

Let’s sum the squares of all of the numbers in our numbers list. We can use a list comprehension:

>>> sum([n**2 for n in numbers])
30

Cool!

Generators

We can use sum with tuples, sets, and any other iterable:

>>> sum((8, 9, 7))
24
>>> sum({8, 9, 7})
24

Sometimes we don’t really care if a list comprehension returns a list, or some other kind of iterable. When we passed a list comprehension into sum, we only really needed to pass in an iterable, not necessarily a list.

Let’s use a generator expression instead of a list comprehension. We can make a generator expression like this:

>>> squares = (n**2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x...>

We can use a generator in our sum call like this:

>>> sum((n**2 for n in numbers))
30

When a generator expression is the only argument in a function call, we can leave off the redundant parentheses:

>>> sum(n**2 for n in numbers)
30
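
The extra parentheses can only be dropped when the generator expression is the sole argument. As a small sketch (the start value of 100 is only an illustration), passing a second argument to sum means the generator expression needs its own parentheses again:

```python
numbers = [2, 1, 3, 4]

# Sole argument: the call's parentheses double as the generator's.
total = sum(n**2 for n in numbers)

# With a second argument (sum's optional start value), the generator
# expression must be wrapped in its own parentheses; leaving them off
# is a SyntaxError.
total_from_100 = sum((n**2 for n in numbers), 100)

print(total)           # 30
print(total_from_100)  # 130
```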

Generators don’t work like the other iterables we’ve learned about so far.

>>> squares[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

You cannot use indexes to get values from generators.

You also can’t ask generators for their length:

>>> len(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

You can loop over generators:

>>> for s in squares:
...     print(s)
...
4
1
9
16

But only once:

>>> for s in squares:
...     print(s)
...
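
If the values are needed more than once, or you need indexing or a length, one option is to build a list from the generator first. A minimal sketch:

```python
numbers = [2, 1, 3, 4]
squares = (n**2 for n in numbers)

# list() consumes the generator and stores every item it produces.
square_list = list(squares)

print(square_list)       # [4, 1, 9, 16]
print(len(square_list))  # 4
print(square_list[0])    # 4

# The generator itself is now exhausted, so a second pass finds nothing.
leftover = list(squares)
print(leftover)          # []
```

The trade-off is that the list holds every item in memory at once, which the generator avoided.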

Generators are single-use iterables. You can also think of them as “lazy iterables”, because they don’t compute an item until it is requested. You can get items from a generator one at a time by using the built-in next function:

>>> squares = (n**2 for n in numbers)
>>> next(squares)
4
>>> next(squares)
1
>>> next(squares)
9
>>> next(squares)
16
>>> next(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
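
To see the laziness directly, here is a small sketch that records each call to a helper function. The square helper and the calls list are hypothetical names used only for this illustration:

```python
calls = []

def square(n):
    """Square n, recording the call so we can see when it happens."""
    calls.append(n)
    return n**2

numbers = [2, 1, 3, 4]

# Creating the generator does not square anything yet.
squares = (square(n) for n in numbers)
calls_after_creation = list(calls)

# Each next() call computes exactly one item.
first = next(squares)
calls_after_first = list(calls)

# Looping (here via list) computes the remaining items.
rest = list(squares)

# Passing a default to next() returns it instead of raising
# StopIteration once the generator is exhausted.
after_end = next(squares, None)

print(calls_after_creation)  # []
print(first)                 # 4
print(calls_after_first)     # [2]
print(rest)                  # [1, 9, 16]
print(after_end)             # None
```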

So why are generator expressions called generator expressions? Why not generator comprehensions? I don’t know.

Calling them generator comprehensions is fine because people will know what you mean.

We’ll go into more detail about generators in the future.

Iteration Tools

Let’s learn some more built-in functions for working with iterables.

If we want to make sure everything in our list conforms to a certain rule, we can use the all function for that.

>>> all(n > 1 for n in numbers)
False
>>> all(n > 0 for n in numbers)
True

If we only want to make sure that at least one item in our list conforms to a certain rule, we can use the any function.

>>> any(n > 2 for n in numbers)
True
>>> any(n < 1 for n in numbers)
False
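
Because generator expressions are lazy, all and any can stop as soon as the answer is known. The sketch below records which numbers actually get tested; greater_than_one and seen are hypothetical names used only for illustration:

```python
numbers = [2, 1, 3, 4]
seen = []

def greater_than_one(n):
    """Test n, recording it so we can see how far all() got."""
    seen.append(n)
    return n > 1

# all() stops at the first False result, so 3 and 4 are never tested.
result = all(greater_than_one(n) for n in numbers)

print(result)  # False
print(seen)    # [2, 1]
```

With a list comprehension instead, every number would be tested before all even started looking at the results.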

If we want to find the smallest or largest value in a collection, we can use min or max:

>>> min(numbers)
1
>>> max(numbers)
4
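
min and max accept generator expressions as well. A short sketch, including the optional default argument that is returned when the iterable turns out to be empty:

```python
numbers = [2, 1, 3, 4]

largest_square = max(n**2 for n in numbers)
smallest_square = min(n**2 for n in numbers)

# No number here is over 10, so the generator yields nothing.
# Without default, max() would raise a ValueError on an empty iterable.
largest_over_ten = max((n for n in numbers if n > 10), default=None)

print(largest_square)    # 16
print(smallest_square)   # 1
print(largest_over_ten)  # None
```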

Generator Expression Exercises

Sum Timestamps

Let’s revisit the sum_timestamps exercise that we looked at in the “Loops” section.

Edit the function sum_timestamps in loops.py so that it takes a list of timestamp strings of “minutes:seconds”, and returns a “minutes:seconds” timestamp string that represents the total time from the list of timestamps.

Use a generator expression to sum the given timestamps.

Feel free to use the parse_time and format_time functions from the “Function Exercises” section.

>>> from loops import sum_timestamps
>>> times = ["1:10", "0:12", "4:03", "2:45"]
>>> sum_timestamps(times)
'8:10'
>>> times = ["0:55", "0:55", "0:55", "0:55", "0:55"]
>>> sum_timestamps(times)
'4:35'
>>> sum_timestamps(["0:00"])
'0:00'

Here are the parse_time and format_time functions in case you need them:

def parse_time(time_string):
    """Return total seconds from string of minutes:seconds."""
    sections = time_string.split(':')
    return int(sections[0]) * 60 + int(sections[1])

def format_time(seconds):
    """Return a minutes:seconds string based on input seconds."""
    sections = divmod(seconds, 60)
    return f"{sections[0]}:{sections[1]:02d}"
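
As a quick check of how the two helpers behave on their own (the functions are repeated here only so the snippet runs by itself):

```python
def parse_time(time_string):
    """Return total seconds from string of minutes:seconds."""
    sections = time_string.split(':')
    return int(sections[0]) * 60 + int(sections[1])

def format_time(seconds):
    """Return a minutes:seconds string based on input seconds."""
    sections = divmod(seconds, 60)
    return f"{sections[0]}:{sections[1]:02d}"

# One minute and ten seconds is 70 seconds.
seconds = parse_time("1:10")

# The :02d format pads the seconds part to two digits.
short = format_time(5)
longer = format_time(490)

print(seconds)  # 70
print(short)    # 0:05
print(longer)   # 8:10
```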

Join

Edit the function join in the generators.py file so it accepts an iterable and an optional sep argument and returns a new string with all the objects in the iterable joined together by the separator.

>>> from generators import join
>>> join([3.5, 2.4, 6], sep=", ")
'3.5, 2.4, 6'
>>> join([1, 2, 3, 4], sep="\n")
'1\n2\n3\n4'
>>> print(join([1, 2, 3, 4], sep="\n"))
1
2
3
4

The separator (sep) should default to a space character.

>>> join([1, 2, 3, 4])
'1 2 3 4'

Strip Lines

Edit the function strip_lines in the generators.py file so that it accepts either a list of lines or a file object and returns an iterator of the same lines but with newline characters removed from the end of each line.

>>> from generators import strip_lines
>>> lines = ["line 1\n", "line 2\n"]
>>> list(strip_lines(lines))
['line 1', 'line 2']

For the bonus, make sure your strip_lines function accepts any iterable of lines and returns an iterator instead of a list. The returned iterator should loop over the given lines iterable lazily (it shouldn’t be looped over all at once).

>>> stripped_lines = strip_lines(["line 1\n", "line 2\n"])
>>> next(stripped_lines)
'line 1'
>>> next(stripped_lines)
'line 2'

Hint

Use the strip() method to remove newline characters, and create a generator expression or generator function to return an iterator that processes lines lazily.
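
Keep in mind that strip() with no arguments removes all leading and trailing whitespace, not just newlines. If only the trailing newline should go, rstrip("\n") is one alternative. A small comparison, using a made-up line:

```python
line = "    indented line \n"

# strip() removes whitespace from both ends, including the indentation.
fully_stripped = line.strip()

# rstrip("\n") removes only newline characters from the right end.
newline_stripped = line.rstrip("\n")

print(repr(fully_stripped))    # 'indented line'
print(repr(newline_stripped))  # '    indented line '
```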

All Together

Edit the function all_together in the generators.py file so that it takes any number of iterables and returns a generator that yields their elements sequentially. Use a generator expression to do it.

Example:

>>> from generators import all_together
>>> list(all_together([1, 2], (3, 4), "hello"))
[1, 2, 3, 4, 'h', 'e', 'l', 'l', 'o']
>>> nums = all_together([1, 2], (3, 4))
>>> list(all_together(nums, nums))
[1, 2, 3, 4]
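
The last result is [1, 2, 3, 4] rather than eight items because nums is a generator: it is already exhausted by the time it is reached a second time. The same effect can be seen without all_together, in this sketch using a plain generator expression:

```python
nums = (n for n in [1, 2, 3, 4])

# The first pass consumes every item.
first_pass = list(nums)

# The second pass finds the generator already exhausted.
second_pass = list(nums)

print(first_pass)                # [1, 2, 3, 4]
print(second_pass)               # []
print(first_pass + second_pass)  # [1, 2, 3, 4]
```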

Translate

This is the translate exercise we’ve seen before in dictionaries.py.

The function translate takes a string in one language and translates each word into another language, returning the resulting string.

Here is an (over-simplified) example translation dictionary you can use for translating from Spanish to English:

>>> words = {'esta': 'is', 'la': 'the', 'en': 'in', 'gato': 'cat', 'casa': 'house', 'el': 'the'}

Convert the translate function to use a generator comprehension. An example of how this function should work:

>>> from dictionaries import translate
>>> translate("el gato esta en la casa")
'the cat is in the house'

Parse Number Ranges

Edit the parse_ranges function in the generators.py file so that it accepts a string containing ranges of numbers and returns a generator of the actual numbers contained in the ranges. The range numbers are inclusive.

It should work like this:

>>> from generators import parse_ranges
>>> list(parse_ranges('1-2,4-4,8-10'))
[1, 2, 4, 8, 9, 10]
>>> list(parse_ranges('0-0, 4-8, 20-21, 43-45'))
[0, 4, 5, 6, 7, 8, 20, 21, 43, 44, 45]

Find TODOs

This is the todos.py exercise in the modules directory. Create the file todos.py in the modules sub-directory of the exercises directory. To test it, run python test.py todos.py from your exercises directory.

Write a program that prints out every line in a file that contains the text TODO (I add TODO notes in my files to note to-dos I need to handle). Also print the line number before the line. The line numbers should be padded with zeros so that all the printed numbers are 3 digits long.

Example:

If workshop.rst contains:

This is how you make a list::

    >>> numbers = [1, 2, 3]

.. TODO explain more about what lists are!

.. TODO add section on slicing

This is how you make a tuple::

    >>> numbers = (1, 2, 3)

.. TODO explain more about tuples!

Running:

$ python todos.py workshop.rst

Should print out:

005 .. TODO explain more about what lists are!
007 .. TODO add section on slicing
013 .. TODO explain more about tuples!
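
Two building blocks may help with the numbering: enumerate with start=1 counts lines from 1, and the 03d format specification pads a number with zeros to 3 digits. A sketch on a made-up list of lines (not a full solution):

```python
lines = ["first line", "second line", "third line"]

# enumerate(..., start=1) pairs each line with a 1-based line number;
# :03d pads that number with leading zeros to a width of 3.
numbered = [
    f"{number:03d} {line}"
    for number, line in enumerate(lines, start=1)
]

print(numbered)
# ['001 first line', '002 second line', '003 third line']
```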

Primality

This is the is_prime exercise in ranges.py.

Edit the function is_prime so that it returns True if a number is prime and False otherwise. Use a generator expression.

Example:

>>> from ranges import is_prime
>>> is_prime(21)
False
>>> is_prime(23)
True

Hint

You might want to use any or all for this.

Sum All

This is the sum_all function in the loops.py file that is in the exercises directory.

Edit the function sum_all so that it accepts a list of lists of numbers and returns the sum of all of the numbers. Use a generator expression.

>>> from loops import sum_all
>>> matrix = [[1, 2, 3], [4, 5, 6]]
>>> sum_all(matrix)
21
>>> sum_all([[0, 1], [4, 2], [3, 1]])
11

Deep Add

This is the deep_add exercise in exception.py.

Write a function deep_add that sums up all values given to it, including summing up the values of any contained collections.

>>> from exception import deep_add
>>> deep_add([1, 2, 3, 4])
10
>>> deep_add([(1, 2), [3, {4, 5}]])
15

Primes Over

Edit the function first_prime_over in the generators.py file so that it returns the first prime number over a given number.

Example:

>>> from generators import first_prime_over
>>> first_prime_over(1000000)
1000003

Total Air Travel

This is the total_air_travel.py exercise in the modules directory.

Note

If you’ve already solved this exercise, try to refactor it to use a generator expression.

Create the file total_air_travel.py in the modules sub-directory of the exercises directory. To test it, run python test.py total_air_travel.py from your exercises directory.

To test this manually, use the file expenses.csv.

Given a CSV file containing expenses by category, I’d like you to calculate how much money was spent on the category “Air Travel”.

The file is formatted like this:

Date,Merchant,Cost,Category
1/05/2017,American Airlines,519.25,Air Travel
1/12/2017,Southwest Airlines,298.90,Air Travel
1/17/2017,Mailchimp,19.80,Software
2/01/2017,Zapier,15.00,Software
2/05/2017,Lyft,24.24,Ground Transport
2/06/2017,Hattie Bs,18.13,Food
2/06/2017,Lyft,15.65,Ground Transport

The columns are:

  1. The date of the expense

  2. Merchant

  3. Amount paid

  4. Category

Your program should accept a single CSV file as input and it should print a floating point number representing the sum of the amounts paid for all “Air Travel” category expenses.

To test it manually, cd to the modules directory and run:

$ python total_air_travel.py expenses.csv
818.15
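
One possible way to read rows in this format is the standard library's csv module; whether the exercise expects it is an assumption. The sketch below only shows how rows come back as dictionaries keyed by the header line. It uses io.StringIO and a shortened copy of the sample data in place of a real file, and it leaves the filtering and summing to you:

```python
import csv
import io

# A shortened copy of the sample data, standing in for expenses.csv.
sample = """Date,Merchant,Cost,Category
1/05/2017,American Airlines,519.25,Air Travel
1/17/2017,Mailchimp,19.80,Software
"""

# DictReader uses the first row as keys for every following row.
rows = list(csv.DictReader(io.StringIO(sample)))
first = rows[0]

# Values are read as strings, so costs need converting before any math.
cost = float(first["Cost"])

print(first["Merchant"])  # American Airlines
print(first["Category"])  # Air Travel
print(cost)               # 519.25
```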