itertools
chain
The chain function is useful when you have several iterables and want to loop over all of their items in order, without writing nested loops or comprehensions.
Let’s say we have two lists and we want to print out every item in both. We can use chain for this:
>>> from itertools import chain
>>> numbers = [1, 2, 3, 4, 5]
>>> letters = ['x', 'y', 'z']
>>> for item in chain(numbers, letters):
...     print(item)
...
1
2
3
4
5
x
y
z
Of course, if we’re only working with lists, we could just use +:
>>> for item in (numbers + letters):
...     print(item)
...
1
2
3
4
5
x
y
z
The chain function accepts any number of iterables of any type, and it does not build a new list in memory.
So chain is the more general, lazy version of list concatenation.
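To make the difference from + concrete, here is a small sketch that chains a list, a tuple, a string, and a generator expression. The variable names are only illustrative. It also shows chain.from_iterable, the variant of chain that takes a single iterable of iterables.

```python
from itertools import chain

# chain accepts any iterables, not just lists: here a list, a tuple,
# a string, and a generator expression.
mixed = list(chain([1, 2], (3, 4), "ab", (n * 10 for n in range(2))))
print(mixed)  # [1, 2, 3, 4, 'a', 'b', 0, 10]

# With +, mixing a list and a tuple raises TypeError.
try:
    [1, 2] + (3, 4)
except TypeError as error:
    print("TypeError:", error)

# chain.from_iterable takes one iterable of iterables instead of
# separate arguments.
nested = [[1, 2], [3], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```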
count
The count function produces consecutive numbers, starting from 0 and stepping by 1 unless you pass different start and step arguments.
If you want to print out every number for the rest of time, do this:
>>> from itertools import count
>>> for x in count():
...     print(x)
...
You have to be careful with count, or it will go on forever. Make sure there is some way of stopping the loop.
For example, if you pass count straight to filter (or to a generator expression with an if clause), looping over the result never finishes, even when no further number can match, because filter has no way of knowing that and keeps testing numbers forever.
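A few ways of giving count a stopping point can be sketched as follows. The limits (5 items, squares under 50) are arbitrary values chosen for illustration.

```python
from itertools import count

# count(start, step) counts from start in increments of step.
evens = count(0, 2)
first_evens = [next(evens) for _ in range(5)]
print(first_evens)  # [0, 2, 4, 6, 8]

# An explicit break gives the loop a way to stop.
squares_under_50 = []
for n in count(1):
    if n * n >= 50:
        break
    squares_under_50.append(n * n)
print(squares_under_50)  # [1, 4, 9, 16, 25, 36, 49]

# zip stops at its shortest input, so pairing count with a finite
# iterable is also safe.
numbered = list(zip(count(1), ['x', 'y', 'z']))
print(numbered)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```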
repeat
The repeat function returns an iterator that produces the same value over and over. It keeps going forever unless the optional times argument is provided or the consuming loop is ended some other way.
from itertools import repeat
for i, string in enumerate(repeat('over-and-over', 5)):
    print(i, string)
Output:
$ python itertools_repeat_count.py
0 over-and-over
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over
This is basically the lazy, iterator version of list multiplication:
>>> ['over-and-over'] * 5
['over-and-over', 'over-and-over', 'over-and-over', 'over-and-over', 'over-and-over']
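Unlike list multiplication, repeat does not need a length up front. A common use, sketched here with the built-in pow, is supplying a constant argument to map: map stops when its shortest input runs out, so the infinite repeat is harmless.

```python
from itertools import repeat

# With times, repeat is finite and can be turned into a list.
fives = list(repeat(5, 3))
print(fives)  # [5, 5, 5]

# Without times, repeat is infinite; map stops when range(4)
# runs out, so this terminates.
squares = list(map(pow, range(4), repeat(2)))
print(squares)  # [0, 1, 4, 9]
```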
cycle
The cycle function returns an iterator that repeats the items of its argument endlessly. Like count, it goes on forever unless you’re careful to stop it.
>>> from itertools import cycle
>>> result = cycle(['A', 'B', 'C'])
>>> next(result)
'A'
>>> next(result)
'B'
>>> next(result)
'C'
>>> next(result)
'A'
>>> next(result)
'B'
Because strings are iterable, cycle will loop through the characters of a single input string:
>>> from itertools import cycle
>>> result = cycle('ABC')
>>> next(result)
'A'
>>> next(result)
'B'
>>> next(result)
'C'
>>> next(result)
'A'
>>> next(result)
'B'
>>> next(result)
'C'
>>> next(result)
'A'
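Calling next by hand is one way to stop a cycle. Two others are sketched below: capping it with islice (covered in the next section) and pairing it with a finite iterable through zip. The task and person names are made up for the example.

```python
from itertools import cycle, islice

# islice caps an infinite cycle at a fixed number of items.
seven = list(islice(cycle('ABC'), 7))
print(seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# zip stops at the shorter input, so cycling names over a finite
# list of tasks assigns them round-robin.
tasks = ['wash', 'dry', 'fold', 'iron', 'store']
assigned = list(zip(tasks, cycle(['Ann', 'Bob'])))
print(assigned[:3])  # [('wash', 'Ann'), ('dry', 'Bob'), ('fold', 'Ann')]
```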
islice
If we wanted to take the first 10 things from a sequence (like a list or a string), we might try to do this:
>>> numbers = list(range(100))
>>> numbers[:10]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
What if we want to take the first 10 things from any iterable, say a generator?
>>> cubes = (n**3 for n in count())
>>> cubes[:10]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
Generators are not indexable and therefore not sliceable.
Any clever ideas for getting the first 10 items from any iterable?
What about this?
>>> [c for i, c in enumerate(cubes) if i < 10]
When we try to run this, it doesn’t stop. Why not?
It doesn’t stop because the if condition is checked for every item. The comprehension doesn’t stop looping once the index i reaches 10; it just keeps going until the end of the iterable. But count gives us an infinite iterable, so it will never stop.
When we need to slice an iterator or any generic iterable that might not be a sequence, we can use islice:
>>> from itertools import count, islice
>>> cubes = (n**3 for n in count())
>>> list(islice(cubes, 10))
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
Let’s find the first 10 cubes that are also perfect squares:
>>> import math
>>> def is_perfect_square(n):
...     return math.sqrt(n).is_integer()
...
>>> cubes = (n**3 for n in count())
>>> perfect_square_cubes = (n for n in cubes if is_perfect_square(n))
>>> list(islice(perfect_square_cubes, 10))
[0, 1, 64, 729, 4096, 15625, 46656, 117649, 262144, 531441]
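islice also accepts start, stop, and step arguments, much like slice notation (though negative values are not allowed). The sketch below shows that form, and also that islice consumes items from the underlying iterator, so the generator carries on from where the slice stopped.

```python
from itertools import count, islice

# islice(iterable, start, stop, step) picks indexes 2, 5, and 8 here.
cubes = (n**3 for n in count())
middle = list(islice(cubes, 2, 10, 3))
print(middle)  # [8, 125, 512]

# Slicing consumes the iterator: after taking the first three cubes
# from a fresh generator, the next item is the fourth cube.
cubes = (n**3 for n in count())
first_three = list(islice(cubes, 3))
following = next(cubes)
print(first_three, following)  # [0, 1, 8] 27
```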
takewhile
takewhile returns an iterator that yields items from the given iterable until the predicate fails for the first time.
There is also a similar function, dropwhile, which skips items from the start of the iterable for as long as its predicate is true. Once the predicate fails, it stops testing and yields all of the remaining items.
Using takewhile and dropwhile:
>>> from itertools import takewhile
>>> numbers = range(10, 20)
>>> mean = sum(numbers) / len(numbers)
>>> list(takewhile(lambda n: n < mean, numbers))
[10, 11, 12, 13, 14]
>>> from itertools import dropwhile
>>> numbers = [0, 0, 1, 2]
>>> list(dropwhile(lambda n: not n, numbers))
[1, 2]
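Two details are worth seeing in code: takewhile can terminate on an infinite iterator (where filter would search forever), and neither function tests any item after the first failure. The following is a small sketch with arbitrary example values.

```python
from itertools import count, dropwhile, takewhile

# takewhile stops at the first failing item, so it terminates even
# on an infinite iterator.
squares = (n * n for n in count())
small_squares = list(takewhile(lambda s: s < 30, squares))
print(small_squares)  # [0, 1, 4, 9, 16, 25]

# Items after the first failure (9) are not re-tested, so the
# trailing 1 and 0 are not taken by takewhile and are kept by dropwhile.
values = [1, 2, 9, 1, 0]
taken = list(takewhile(lambda n: n < 5, values))
dropped = list(dropwhile(lambda n: n < 5, values))
print(taken)    # [1, 2]
print(dropped)  # [9, 1, 0]
```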
itertools Exercises
Head
This is the head exercise in generators.py.
If you have already solved this exercise with a generator expression, try refactoring it to use something from the itertools module.
Make a head function that lazily gives the first n items of a given iterable.
>>> list(head([1, 2, 3, 4, 5], n=2))
[1, 2]
>>> first_4 = head([1, 2, 3, 4, 5], n=4)
>>> list(zip(first_4, first_4))
[(1, 2), (3, 4)]
All Together
This is the all_together exercise in generators.py.
Write a function that takes any number of iterables and returns an iterator which will loop over each of them in order.
Example:
>>> from generators import all_together
>>> list(all_together([1, 2], (3, 4), "hello"))
[1, 2, 3, 4, 'h', 'e', 'l', 'l', 'o']
>>> nums = all_together([1, 2], (3, 4))
>>> list(all_together(nums, nums))
[1, 2, 3, 4]
lstrip
This is the lstrip exercise in iteration.py.
Edit the lstrip function so that it accepts an iterable and an object and returns an iterator over the items of the original iterable, skipping any items at the beginning that are equal to the given object.
Look through the itertools library for something very useful for this.
Example:
>>> list(lstrip([0, 0, 1, 0, 2, 3], 0))
[1, 0, 2, 3]
>>> list(lstrip(' hello ', ' '))
['h', 'e', 'l', 'l', 'o', ' ']
>>> x = lstrip([0, 1, 2, 3], 0)
>>> list(x)
[1, 2, 3]
>>> list(x)
[]
Big Primes
This is the get_primes_over exercise in generators.py.
Write a function that returns an iterator that yields a specified number of prime numbers greater than 999,999. The function’s argument is the number of primes the iterator should produce.
Try doing this without using while loops or for loops.
You can use this function to determine whether a number is prime:
def is_prime(candidate):
    """Return True if candidate number is prime."""
    for n in range(2, candidate):
        if candidate % n == 0:
            return False
    return True
Total Length
This is the total_length exercise in iteration.py.
Make a function total_length that calculates the total length of all given iterables.
Example:
>>> total_length([1, 2, 3])
3
>>> total_length()
0
>>> total_length([1, 2, 3], [4, 5], iter([6, 7]))
7
Compact
This is the compact exercise in iteration.py.
Write a function compact that takes an iterable and lazily returns the elements of the iterable, with any adjacent duplicates removed.
Hint
Try using itertools.groupby with a generator expression to solve it.
It should work like this:
>>> from iteration import compact
>>> list(compact([1, 1, 1]))
[1]
>>> list(compact([1, 1, 2, 2, 3, 2]))
[1, 2, 3, 2]
>>> list(compact([]))
[]
>>> c = compact(n**2 for n in [1, 2, 2])
>>> iter(c) is c
True
>>> list(c)
[1, 4]
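If groupby is new to you, this short sketch shows how it behaves on its own, as a run-length count rather than a solution to the exercise. It starts a new group each time the value changes, so equal items that are not adjacent end up in separate groups.

```python
from itertools import groupby

# groupby yields (key, group) pairs; each group is an iterator over
# one run of adjacent equal items.
letters = 'aaabccaa'
runs = [(key, len(list(group))) for key, group in groupby(letters)]
print(runs)  # [('a', 3), ('b', 1), ('c', 2), ('a', 2)]
```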
Stop On
This is the stop_on exercise in iteration.py.
Write a generator function stop_on that accepts an iterable and a value and yields items from the given iterable until the given value is reached. The value itself should not be yielded.
Example:
>>> list(stop_on([1, 2, 3], 3))
[1, 2]
>>> next(stop_on([1, 2, 3], 1), 0)
0
Random Number
This is the random_number_generator exercise in iteration.py.
Make an inexhaustible generator that always provides 4 as the next number.
Hint
Try using a tool from itertools to solve this problem.
Example:
>>> number_generator = random_number_generator()
>>> next(number_generator)
4
>>> next(number_generator)
4
>>> next(number_generator)
4
>>> iter(number_generator) is number_generator
True
Running Mean
This is the running_mean exercise in iteration.py.
Create a running_mean function that takes an iterable and yields the current running mean.
Try doing this without a for loop (using only itertools functions and generator expressions).
For example:
>>> numbers = [8, 4, 3, 1, 3, 5]
>>> list(running_mean(numbers))
[8.0, 6.0, 5.0, 4.0, 3.8, 4.0]