Generator Expressions
Sum
Python has a number of functions that act on iterables. A list comprehension returns a list, which is an iterable, so we can pass a list comprehension straight into one of these functions.
Let’s try out the sum function:
>>> numbers = [2, 1, 3, 4]
>>> sum(numbers)
10
Let’s sum the squares of all of the numbers in our numbers list. We can use a list comprehension:
>>> sum([n**2 for n in numbers])
30
Cool!
Generators
We can use sum with tuples, sets, and any other iterable:
>>> sum((8, 9, 7))
24
>>> sum({8, 9, 7})
24
Sometimes we don’t really care if a list comprehension returns a list, or some other kind of iterable. When we passed a list comprehension into sum, we only really needed to pass in an iterable, not necessarily a list.
Let’s use a generator expression instead of a list comprehension. We can make a generator expression like this:
>>> squares = (n**2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x...>
We can use a generator in our sum call like this:
>>> sum((n**2 for n in numbers))
30
When a generator expression is the only argument in a function call, the call’s own parentheses are enough, so we can leave off the redundant parentheses:
>>> sum(n**2 for n in numbers)
30
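The parentheses can only be dropped when the generator expression is the sole argument. As a small sketch (the start value of 100 is just an illustrative choice), passing a second argument to sum means the generator expression needs its own parentheses again:

```python
numbers = [2, 1, 3, 4]

# Sole argument: the call's parentheses are enough.
total = sum(n**2 for n in numbers)

# With a second argument (sum's optional start value), the generator
# expression must be wrapped in its own parentheses.
total_from_100 = sum((n**2 for n in numbers), 100)

print(total)           # 30
print(total_from_100)  # 130
```

Leaving the inner parentheses off in the second call is a syntax error.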
Generators don’t work like the other iterables we’ve learned about so far.
>>> squares[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
You cannot use indexes to get items from a generator.
You also can’t ask generators for their length:
>>> len(squares)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
You can loop over generators:
>>> for s in squares:
...     print(s)
...
4
1
9
16
But only once:
>>> for s in squares:
...     print(s)
...
Generators are single-use iterables.
You can also think of them as “lazy iterables”, because they don’t compute an item until that item is requested.
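One way to see this laziness is to record when the work actually happens. The sketch below uses a made-up helper, noisy_square, that notes each number it squares; the helper and the computed list are illustrative names, not part of the workshop code:

```python
numbers = [2, 1, 3, 4]
computed = []  # records which numbers have actually been squared


def noisy_square(n):
    """Hypothetical helper: square n and record that the work was done."""
    computed.append(n)
    return n**2


lazy_squares = (noisy_square(n) for n in numbers)
print(computed)  # [] (creating the generator computed nothing)

for s in lazy_squares:
    if s == 1:
        break  # stop early, so 3 and 4 are never squared

print(computed)  # [2, 1]
```

A list comprehension in the same place would have squared all four numbers before the loop even started.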
You can get items from a generator by using the built-in next function:
>>> squares = (n**2 for n in numbers)
>>> next(squares)
4
>>> next(squares)
1
>>> next(squares)
9
>>> next(squares)
16
>>> next(squares)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
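If you would rather not handle StopIteration, next also accepts an optional second argument that is returned once the generator is exhausted. A minimal sketch, using None as an arbitrary default:

```python
numbers = [2, 1, 3, 4]
squares = (n**2 for n in numbers)

first = next(squares)            # 4
rest = list(squares)             # consumes the remaining items: [1, 9, 16]
fallback = next(squares, None)   # exhausted, so the default is returned

print(first, rest, fallback)
```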
So why are generator expressions called generator expressions? Why not generator comprehensions? I don’t know.
Calling them generator comprehensions is fine because people will know what you mean.
We’ll go into more detail about generators in the future.
Iteration Tools
Let’s learn some more built-in functions for working with iterables.
If we want to make sure everything in our list conforms to a certain rule, we can use the all function for that.
>>> all(n > 1 for n in numbers)
False
>>> all(n > 0 for n in numbers)
True
If we want to only make sure that some of our list conforms to a certain rule, we can use the any function.
>>> any(n > 2 for n in numbers)
True
>>> any(n < 1 for n in numbers)
False
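Both any and all stop looping as soon as the answer is known, which pairs well with lazy generator expressions. The following sketch keeps the generator in a variable (checks is just an illustrative name) so we can see what is left over after any returns:

```python
numbers = [2, 1, 3, 4]
checks = (n > 2 for n in numbers)

found = any(checks)       # stops at 3, the first number greater than 2
leftover = list(checks)   # only the check for 4 was never consumed

print(found)     # True
print(leftover)  # [True]
```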
If we want to find the smallest or largest value in a collection, we can use min or max:
>>> min(numbers)
1
>>> max(numbers)
4
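Like sum, min and max accept any iterable, so they work with generator expressions too. Here is a short sketch; the words list is made-up sample data, and the key argument is a standard option of min and max that the text above does not cover:

```python
numbers = [2, 1, 3, 4]
words = ["pear", "fig", "banana"]  # made-up sample data

smallest_square = min(n**2 for n in numbers)
longest_length = max(len(w) for w in words)
longest_word = max(words, key=len)  # compares by length, returns the word itself

print(smallest_square)  # 1
print(longest_length)   # 6
print(longest_word)     # banana
```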
Generator Expression Exercises
Sum Timestamps
Let’s revisit the sum_timestamps exercise that we looked at in the “Loops” section.
Edit the function sum_timestamps in loops.py so that it takes a list of timestamp strings of “minute:seconds”, and returns a “minute:seconds” timestamp string that represents the total time from the list of timestamps.
Use a generator expression to sum the given timestamps.
Feel free to use the parse_time and format_time functions from the “Function Exercises” section.
>>> from loops import sum_timestamps
>>> times = ["1:10", "0:12", "4:03", "2:45"]
>>> sum_timestamps(times)
'8:10'
>>> times = ["0:55", "0:55", "0:55", "0:55", "0:55"]
>>> sum_timestamps(times)
'4:35'
>>> sum_timestamps(["0:00"])
'0:00'
Here are the parse_time and format_time functions in case you need them:
def parse_time(time_string):
    """Return total seconds from string of minutes:seconds."""
    sections = time_string.split(':')
    return int(sections[0]) * 60 + int(sections[1])
def format_time(seconds):
    """Return a minutes:seconds string based on input seconds."""
    sections = divmod(seconds, 60)
    return f"{sections[0]}:{sections[1]:02d}"
Join
Edit the function join in the generators.py file so it accepts an iterable and an optional sep argument and returns a string with all the objects in the iterable joined together (by the separator) into a new string.
>>> from generators import join
>>> join([3.5, 2.4, 6], sep=", ")
'3.5, 2.4, 6'
>>> join([1, 2, 3, 4], sep="\n")
'1\n2\n3\n4'
>>> print(join([1, 2, 3, 4], sep="\n"))
1
2
3
4
The separator (sep) should default to a space character.
>>> join([1, 2, 3, 4])
'1 2 3 4'
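One thing to keep in mind while working on this: the built-in str.join method only accepts strings, so joining numbers directly fails. The sketch below just demonstrates the error (the variable names are illustrative); each item needs to be converted to a string before joining.

```python
values = [3.5, 2.4, 6]

try:
    ", ".join(values)
except TypeError as error:
    # str.join refuses non-string items such as floats
    message = str(error)

print(message)
```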
Strip Lines
Edit the function strip_lines in the generators.py file so that it accepts either a list of lines or a file object and returns an iterator of the same lines but with newline characters removed from the end of each line.
>>> from generators import strip_lines
>>> lines = ["line 1\n", "line 2\n"]
>>> list(strip_lines(lines))
['line 1', 'line 2']
For the bonus, make sure your strip_lines function accepts any iterable of lines and returns an iterator instead of a list. The returned iterator should loop over the given lines iterable lazily (it shouldn’t be looped over all at once).
>>> stripped_lines = strip_lines(["line 1\n", "line 2\n"])
>>> next(stripped_lines)
'line 1'
>>> next(stripped_lines)
'line 2'
Hint
Use the strip() method to remove newline characters, and create a generator expression or generator function to return an iterator that processes lines lazily.
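Note that strip() with no arguments removes whitespace from both ends of a string, not only the trailing newline. If leading indentation matters for your lines, rstrip("\n") may be the safer choice. A quick comparison on a made-up line:

```python
line = "    indented line\n"

stripped = line.strip()        # removes whitespace on both sides
trimmed = line.rstrip("\n")    # removes only trailing newline characters

print(repr(stripped))  # 'indented line'
print(repr(trimmed))   # '    indented line'
```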
All Together
Edit the function all_together in the generators.py file so that it takes any number of iterables and returns a generator that yields their elements sequentially. Use a generator expression to do it.
Example:
>>> from generators import all_together
>>> list(all_together([1, 2], (3, 4), "hello"))
[1, 2, 3, 4, 'h', 'e', 'l', 'l', 'o']
>>> nums = all_together([1, 2], (3, 4))
>>> list(all_together(nums, nums))
[1, 2, 3, 4]
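The last result may look surprising, but it follows from generators being single-use: the first pass over nums consumes it, so the second pass finds nothing left. A minimal sketch of the same effect, using a plain generator expression as a stand-in for the value returned by all_together:

```python
nums = (n for n in [1, 2, 3, 4])  # stand-in for all_together([1, 2], (3, 4))

first_pass = list(nums)
second_pass = list(nums)

print(first_pass)   # [1, 2, 3, 4]
print(second_pass)  # []
```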
Translate
This is the translate exercise we’ve seen before in dictionaries.py.
The function translate takes a string in one language and translates each word into another language, returning the resulting string.
Here is an (over-simplified) example translation dictionary you can use for translating from Spanish to English:
>>> words = {'esta': 'is', 'la': 'the', 'en': 'in', 'gato': 'cat', 'casa': 'house', 'el': 'the'}
Convert the translate function to use a generator comprehension. An example of how this function should work:
>>> from dictionaries import translate
>>> translate("el gato esta en la casa")
'the cat is in the house'
Parse Number Ranges
Edit the parse_ranges function in the generators.py file so that it accepts a string containing ranges of numbers and returns a generator of the actual numbers contained in the ranges.
The range numbers are inclusive.
It should work like this:
>>> from generators import parse_ranges
>>> list(parse_ranges('1-2,4-4,8-10'))
[1, 2, 4, 8, 9, 10]
>>> list(parse_ranges('0-0, 4-8, 20-21, 43-45'))
[0, 4, 5, 6, 7, 8, 20, 21, 43, 44, 45]
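Because the ranges are inclusive while Python’s range excludes its stop value, the end of each range needs a +1. Here is a hedged sketch for a single range only (not the whole exercise); the variable names are illustrative:

```python
text = "4-8"  # one piece of the comma-separated input

start, end = text.split("-")
numbers = list(range(int(start), int(end) + 1))  # +1 makes the end inclusive

print(numbers)     # [4, 5, 6, 7, 8]
print(int(" 20"))  # 20 (int ignores surrounding whitespace)
```

The second print is relevant to inputs like the one above where a space follows each comma.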
Find TODOs
This is the todos.py exercise in the modules directory. Create the file todos.py in the modules sub-directory of the exercises directory. To test it, run python test.py todos.py from your exercises directory.
Write a program that prints out every line in a file that contains the text TODO (I add TODO notes in my files to note to-dos I need to handle). Also print the line number before the line. The line numbers should be padded with zeros so that all the printed numbers are 3 digits long.
Example:
If workshop.rst contains:
This is how you make a list::

    >>> numbers = [1, 2, 3]

.. TODO explain more about what lists are!

.. TODO add section on slicing

This is how you make a tuple::

    >>> numbers = (1, 2, 3)

.. TODO explain more about tuples!
Running:
$ python todos.py workshop.rst
Should print out:
005 .. TODO explain more about what lists are!
007 .. TODO add section on slicing
013 .. TODO explain more about tuples!
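Two standard tools may help here: enumerate can count lines starting from 1, and the 03d format specification pads a number with zeros to three digits. A small sketch on made-up lines (it numbers every line rather than filtering for TODO):

```python
lines = ["first line", "second TODO line", "third line"]  # made-up sample lines

formatted = [
    f"{number:03d} {line}"
    for number, line in enumerate(lines, start=1)  # count from 1, not 0
]

for entry in formatted:
    print(entry)
# 001 first line
# 002 second TODO line
# 003 third line
```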
Primality
This is the is_prime exercise in ranges.py.
Edit the function is_prime so that it returns True if a number is prime and False otherwise.
Use a generator expression.
Example:
>>> from ranges import is_prime
>>> is_prime(21)
False
>>> is_prime(23)
True
Hint
You might want to use any or all for this.
Sum All
This is the sum_all function in the loops.py file that is in the exercises directory.
Edit the function sum_all so that it accepts a list of lists of numbers and returns the sum of all of the numbers.
Use a generator expression.
>>> from loops import sum_all
>>> matrix = [[1, 2, 3], [4, 5, 6]]
>>> sum_all(matrix)
21
>>> sum_all([[0, 1], [4, 2], [3, 1]])
11
Deep Add
This is the deep_add exercise in exception.py.
Write a function deep_add that sums up all values given to it, including summing up the values of any contained collections.
>>> from exception import deep_add
>>> deep_add([1, 2, 3, 4])
10
>>> deep_add([(1, 2), [3, {4, 5}]])
15
Primes Over
Edit the function first_prime_over in the generators.py file so that it returns the first prime number over a given number.
Example:
>>> from generators import first_prime_over
>>> first_prime_over(1000000)
1000003
Head
This is the head exercise in generators.py.
Make a head function that lazily gives the first n items of a given iterable.
>>> numbers = [2, 1, 3, 4, 7, 11, 18, 29]
>>> squares = (n**2 for n in numbers)
>>> list(head(numbers, n=2))
[2, 1]
Note that your head function should not retrieve more than n items from the iterable.
For example, here we partially consume the squares generator object using head and then continue consuming it:
>>> numbers = [2, 1, 3, 4, 7, 11, 18, 29]
>>> squares = (n**2 for n in numbers)
>>> list(head(squares, 5))
[4, 1, 9, 16, 49]
>>> list(squares)
[121, 324, 841]
Total Air Travel
This is the total_air_travel.py exercise in the modules directory.
Note
If you’ve already solved this exercise, try to refactor it to use a generator expression.
Create the file total_air_travel.py in the modules sub-directory of the exercises directory. To test it, run python test.py total_air_travel.py from your exercises directory.
To test this manually, use the file expenses.csv.
Given a CSV file containing expenses by category, I’d like you to calculate how much money was spent on the category “Air Travel”.
The file is formatted like this:
Date,Merchant,Cost,Category
1/05/2017,American Airlines,519.25,Air Travel
1/12/2017,Southwest Airlines,298.90,Air Travel
1/17/2017,Mailchimp,19.80,Software
2/01/2017,Zapier,15.00,Software
2/05/2017,Lyft,24.24,Ground Transport
2/06/2017,Hattie Bs,18.13,Food
2/06/2017,Lyft,15.65,Ground Transport
The columns are:
The date of the expense
Merchant
Amount paid
Category
Your program should accept a single CSV file as input and it should print a floating point number representing the sum of the amounts paid for all “Air Travel” category expenses.
To test it manually, cd to the modules directory and run:
$ python total_air_travel.py expenses.csv
818.15
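If you have not used the standard library’s csv module before, the sketch below shows one possible way to read rows by column name with csv.DictReader. It reads from an in-memory string (an assumption made so the example is self-contained) and looks at the “Software” category rather than solving the exercise:

```python
import csv
import io

# In-memory stand-in for expenses.csv, using rows from the sample above.
sample = io.StringIO(
    "Date,Merchant,Cost,Category\n"
    "1/17/2017,Mailchimp,19.80,Software\n"
    "2/01/2017,Zapier,15.00,Software\n"
    "2/05/2017,Lyft,24.24,Ground Transport\n"
)

rows = list(csv.DictReader(sample))  # each row is a dict keyed by the header names
software_rows = [row for row in rows if row["Category"] == "Software"]
first_cost = float(software_rows[0]["Cost"])  # CSV values are strings until converted

print(len(software_rows))  # 2
print(first_cost)          # 19.8
```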