Comments:"A look at some of Python's useful itertools"
URL:http://naiquevin.github.io/a-look-at-some-of-pythons-useful-itertools.html
Couple of months back I enrolled for theFunctional Programming Principles in Scala course taught by Martin Odersky on Coursera. While solving one of the early assignments, I remember searching for "Scala equivalent to Python's range". By the end of the course, I was searching for python equivalents for some of the methods and operations on scala collections. There is no doubt that learning new and different languages help you as a programmer in more than one way. Ok, I am digressing, but what that search led me to was the insanely usefulitertools module and it left me wondering - why didn't I take a real look at it earlier!
In this article I am going to show some of the must-know itertools that will make your everyday code more memory efficient, elegant and concise. Instead of focusing on one function/class from the module at a time, it would be helpful to see some real world (and not so real world) use cases and examples of itertools.
But before that, I should mention that I have tried the examples on Python 2.7.3 and would recommend you use 2.7.x to follow along, although they should mostly work on Python 3.x too.
Taking "lazy" to the next level
In anearlier blog post, I had covered generators and how lazy evaluation can be used to write memory efficient code. Itertools let you do more with the lazily evaluated objects.
You may already know that the map
and filter
BIFs can accept
not just a list but any iterator in general, which means we can also
pass them a generator. But doing so doesn't give us truly lazy
behaviour. Let's see an example to understand this.
defis_even(x):print'is_even called for %d'%(x,)returnx%2==0defeven():returnfilter(is_even,(iforiinxrange(20)))
The above code defines a function is_even
to check if an integer
is even. In even
, we pass is_even
as a predicate to filter
thus returning all even integers between 0 to 20. To get first four of
such integers, we can use the slice operator.
>>> even()[:4] is_even called for 0 is_even called for 1 is_even called for 2 is_even called for 3 is_even called for 4 is_even called for 5 is_even called for 6 is_even called for 7 is_even called for 8 is_even called for 9 is_even called for 10 is_even called for 11 is_even called for 12 is_even called for 13 is_even called for 14 is_even called for 15 is_even called for 16 is_even called for 17 is_even called for 18 is_even called for 19>>> [0, 2, 4, 6]
While the result that we obtained is correct, our code is doing a lot
of unnecessary work. We can see this from the statements that are
getting printed every time is_even
is called (20
times). Wait.. but we just need first 4 items right, why is it even
getting called that many times? The problem is that although we
started with a generator, filter
needs to run each item through
the predicate and so the generator ends up getting consumed. Finally,
we get a list from which we slice out the first 4 items.
>>> type(even())<type 'list'>
How can we do better? The idea is, we need a way to delay the
application of filter on the generator. itertools.ifilter
does
exactly that by returning a lazy object instead of a list. Items will
be filtered whenever some other function or a for loop consumes
them. Lets define another function lazy_even
,
importitertoolsdeflazy_even():returnitertools.ifilter(is_even,(iforiinxrange(20)))
>>> nums = lazy_even()>>> type(nums)<type 'itertools.ifilter'>
Ok, this gives us a lazy object. But the problem now is, we still need
a way to slice out the first 4 items without converting the entire
thing to a list. Itertools provides islice
for this which takes
the iterator, a start index and a stop index.
>>> first_four = itertools.islice(nums, 0, 4)>>> type(first_four)<type 'itertools.islice'> for i in first_four: ... print i ... is_even called for 0 0 is_even called for 1 is_even called for 2 2 is_even called for 3 is_even called for 4 4 is_even called for 5 is_even called for 6 6
As you can see, now is_even
is called only while it's required.
Counting infinitely
Let's take another example. This time we need to find out 3 smallest
numbers that are greater than 1000 and powers of 2. In the eariler
blog post on generators, there was an example of counting infinitely
using the yield
keyword. Here we need to do something similar to
count integers and test whether or not each one is a power of 2. The
catch is that since we need to return 3 such numbers, we don't know
when to stop counting. Ok, probably in this case it's easy to
pre-calculate or guess, but what if the predicate function is a bit
more complex for eg. a check for primality? With itertools, we can use
the same technique as in the previous example. As a bonus, the module
comes with a count
function so we don't need to write our own.
>>> import math>>> from itertools import count, islice, ifilter>>> def is_pow_two(x): ... ln = math.log(x, 2) ... return math.floor(ln) == ln>>> list(islice(ifilter(is_pow_two, count(1000)), 0, 3)) [1024, 2048, 4096]
is_pow_two
will be called no more than 4096 times = win!
Grouping things in style
If you often use Python to write scripts for extracting data,
eg. scraping web pages or parsing log files, then you may have come
across this common pattern -- reading content from some source,
extracting useful data out of it and saving the extracted data in some
other form (or directly using it). And quite often, we need to group
data into parts say for eg. parsing web server access logs and group
incoming requests by their response status
codes. itertools.groupby
makes this very easy. It takes an
iterable and a key
function. The key
function is used to group
items with consecutively similar key values together. Here is an
example where given a list of integers we separate them into "even"
and "odd" groups.
fromitertoolsimportgroupbydefgroupby_even_odd(items):f=lambdax:'even'ifx%2==0else'odd'gb=groupby(items,f)fork,itemsingb:print'%s: %s'%(k,','.join(map(str,items)))
>>> groupby_even_odd([1, 3, 4, 5, 6, 8, 9, 11]) odd: 1,3 even: 4 odd: 5 even: 6,8 odd: 9,11
But something is strange, isn't it? The integers are indeed grouped but there are many "even" and "odd" groups. The reason behind this is, it only groups consecutive items together. To get around this, we can simply provide it a sorted iterable.
defgroupby_even_odd(items):f=lambdax:'even'ifx%2==0else'odd'gb=groupby(sorted(items,key=f),f)fork,itemsingb:print'%s: %s'%(k,','.join(map(str,items)))
>>> groupby_even_odd([1, 3, 4, 5, 6, 8, 9, 11]) even: 4,6,8 odd: 1,3,5,9,11
And now the grouping happens the way we want it. An important thing to
note is that the key function used to sort the list must be same as
the one which is going to be passed to groupby
.
Just flatmap that shit!
Flatmap is a commonly used pattern in functional programming where mapping a function to a list results in a list of lists that then needs to be flattened. For eg. given a list of directories, we want to get the names of all their first level children as a list.
>>> import os>>> dirs = ['project1/', 'project2/', 'project3/']>>> map(os.listdir, dirs)>>> [['settings.py', 'wsgi.py', 'templates'], ['app.py', 'templates'], ['index.html, 'config.json']]
This gives us a list of lists and we still need to flatten it. There
are of course many ways to do this, one way is to use reduce
. But
here is an elegant way using itertools.chain
. chain
takes many
iterables as arguments and chains or appends them at ends. (Edit:
As correctly pointed out bymasklinn on hacker news,
it's better to use imap
instead of map
and also the alternatechain
constructor chain.from_iterable
to avoid passing the
lazy object as *args since the former will result in eager
evaluation). Let's define a function flatmap
that will map a
function to a list of items and flatten the resulting list of lists.
fromitertoolsimportchain,imapdefflatmap(f,items):returnchain.from_iterable(imap(f,items))
And now we replace the call to map
with flatmap
,
>>> list(flatmap(os.listdir, dirs))>>> ['settings.py', 'wsgi.py', 'templates', 'app.py', 'templates', 'index.html, 'config.json']
Itertools for everything (for fun and learning!)
As a final example, let's see how we can compose an elegant solution entirely using our newly acquired utility belt. The problem is to find a set of common factors of a list of integers.
fromitertoolsimportifilter,takewhile,countdeffactors(n):returnifilter(lambdax:n%x==0,takewhile(lambday:y<=n,count(1)))
>>> set(flatmap(factors, [9, 15, 16, 23, 76, 101])) set([1, 2, 3, 4, 5, 38, 8, 9, 76, 15, 16, 19, 23, 101])
That's all for now. If you reached this far, thanks for reading and hope you found it helpful. Itertools obviously has many more useful functions and classes. I just skipped them since I failed to come up with good examples and use cases for them. May be I will do a part II some time later.