Comments:"Teaching, Playing, and Programming: Ten Years of Purely Functional Data Structures"
URL:http://okasaki.blogspot.ru/2008/02/ten-years-of-purely-functional-data.html
In 1998, I published a book called Purely Functional Data Structures. Ten years later, the book is still selling well. (Every time I get a royalty check, my wife the skeptic says “People are still buying that?!”) The ten-year anniversary seems like a good time to reflect back on my experience with this book.
Origins
My first introduction to functional programming was at Carnegie Mellon University, where I worked for Peter Lee and Phil Koopman on an implementation of a subset of Haskell (called “Eddie”), written in Standard ML. So right from the very beginning, I was working with a mix of strict and lazy evaluation.
I've always been fascinated by data structures, and I soon realized that something was strange about data structures in these two languages. On the one hand, when they worked, data structures were more pleasant to work with than in any other languages I had ever used before. On the other hand, sometimes they just didn't work.
The issue was immutability. In pure functional programming, all data structures are immutable, meaning that they cannot be changed once created. Of course, data structures frequently need to be changed, so what happens is that you create a new copy of the data structure that incorporates the change, without actually modifying the old copy. This can often be done efficiently by having the new structure share large chunks of the old one. For example, stacks are trivial to implement this way. Tree-like structures such as binary search trees and many kinds of priority queues are almost as easy.
However, some common data structures, such as FIFO queues and arrays, can actually be quite difficult to implement this way. Some people take this as a sign that functional languages are impractical. However, it's not the fault of functional programming per se, but rather of immutability. Immutable data structures are just as awkward to implement in conventional languages such as Java. (Of course, immutable data structures also offer huge advantages, which is why, for example, strings are immutable in Java.)
In 1993, I ran across an interesting paper by Tyng-Ruey Chuang and Benjamin Goldberg, showing how to implement efficient purely functional queues and double-ended queues. Their result was quite nice, but rather complicated, so I started looking for a simpler approach. Eventually, I came up a much simpler approach based on lazy evaluation, leading to my first paper in this area. (When I say simpler here, I mean simpler to implement—the new structure was actually harder to analyze than Chuang and Goldberg's.)
Almost immediately after my paper on queues, I worked on a paper about functional arrays, here called random-access lists because they combined the extensibility of lists with the random access of arrays. (Because the lead time on journal papers is longer than the lead time for conferences, this paper actually appeared before the paper on queues, even though it was written later.) The approach I used, which turned out to be very similar to one used earlier by Gene Myers, was based on properties of an obscure number system called skew binary numbers. This marked the beginning of a second theme of my work, that of designing implementations of data structures in analogy to number systems. For example, inserting an element into a data structure is similar to incrementing a number, and merging two data structures is similar to adding two numbers. I called this approach numerical representations. The idea was to choose or create a number system with the properties you want (such as constant-time addition), and then build the data structure on top of that. The advantage of this approach is that you can often do the hard work in a simplified setting, where you don't have to worry about carrying around the actual data in your data structure.
Amusingly, this paper marked the second round of a circle dance between Rob Hoogerwoord, Tyng-Ruey Chuang, and me. First, Hoogerwoord published a paper on functional queues, then Chuang, then me. And again with functional arrays, first Hoogerwoord published a paper, then Chuang, then me.
Around this time, I was looking for a thesis topic (that period that John Reynolds describes as “making even the best student look bad”). Fortunately, my advisor Peter Lee had patience with me during this search. I considered a bunch of topics related to functional programming, but none of them seemed right. I was enjoying the work I was doing with functional data structures, but there didn't really seem to be enough meat there for a dissertation. That changed one day during a walk in the snow.
I had become fascinated by the idea of concatenation (appending two lists). This is trivial to do in conventional languages using, say, doubly-linked lists. But it's much harder to do efficiently with immutable lists. I thought that I had come up with a new approach, so I made an appointment to talk to Danny Sleator about it. He had been involved in some of the early work in this topic (along with Neil Sarnak and Bob Tarjan), and his office was just a few floors below mine. The day before the meeting, I realized that my approach was completely broken. Afraid of looking like an idiot, I went into a frenzy, playing with different variations, and a few hours later I came up with an approach again based on lazy evaluation. I was sure that this approach worked, but I had no idea how to prove it. (Again, this was a case where the implementation was simple, but the analysis was a bear.) My wife had our car somewhere else that day, so I ended up walking home. The walk usually took about 45 minutes, but it was snowing pretty hard, so it took about twice that long. The whole way home I thought about nothing but how to analyze my data structure. I knew it had something to do with amortization, but the usual techniques of amortization weren't working. About halfway home, I came up with the idea of using debits instead of credits, and by the time I got home the entire framework of how to analyze data structures involving lazy evaluation had crystallized in my head.
With this framework in mind, I now believed that there was enough meat there for a dissertation, and I sailed through my thesis proposal and, about a year later, my thesis defense.
The Book
After my defense, Peter Lee suggested that I try to publish my dissertation as a book. He contacted two publishers he had worked with before, and one of them, Lauren Cowles of Cambridge University Press, bit.
The process of turning my dissertation into a book was pure joy. I thought that the basic organization of my dissertation was pretty solid, so mostly I was able to focus on adding and adjusting things to make it work better as a book. For example, I no longer had the constraint from my dissertation of having to focus on original work, so I was free to add data structures that had been developed by other people.
The main additions were expanded introductory material (such as my simplification of red-black trees, which was developed a few weeks after my thesis defense in a series of emails with Richard Bird), exercises, and an appendix including all the source code in Haskell (the main text used source code in Standard ML).
After several months of hacking, I was able to ship off a giant Postscript file to Lauren Cowles, and that was that.
Well, not quite. Lauren asked me what I wanted for cover art. My wife was an avid quilter, and I liked the idea of having a suitably geeky quilt pattern on the cover. I adapted a quilt pattern called Flying Geese, and sent a sketch to Lauren, who got a graphic artist to draw up the final version. I was quite happy with it. Later, I found out that Flying geese was actually the name of a different pattern, and to this day, I don't know the correct name of the pattern. It looks a little different on the book than in quilts because quilters usually only use two levels of the recursion, but I used three.
Reactions
Initial reviews were positive and the hardcover edition sold out pretty quickly, so Cambridge University Press released a paperback edition the following year. I don't know of any courses that used it as the main textbook (it would have to be a pretty specialized course to do that), but several mainstream courses used it as a supplementary textbook.
Sales were steady but trending downward, until 2004, when a book review by Andrew Cooke appeared on Slashdot. After that, sales doubled and have stayed at that level for several years. Of course, I can't be sure this increase was because of Slashdot, but it seems likely. Thanks, Andrew!
The most interesting place the book has shown up was in the data of a TopCoder programming contest problem involving sorting books on a bookshelf. It was part of a list of CS textbooks. (And, no, I wasn't the author of the problem, although I have written some problems for TopCoder.)
The funniest moment about the book was when I worked at Columbia University. A potential grad student from an unrelated field in CS was visiting from Germany, and I was talking to him about the department. When I told him about my research, he said, “My roommate has been reading a book about something like that. I don't remember the name, but do you know which book I mean?” I reached over and pulled my book off the shelf and said, “Do you mean this one?” His mouth fell open.
The most flattering review of the book appeared a few years ago in a blog by José Antonio Ortega Ruiz, who said
“The exposition is extremely elegant and synthetic, and i’ve spent many, many hours immersed in its 220 pages (i can’t think of any other programming book with a better signal to noise ratio, really).” Thanks, jao!Complaints
Of course, you can't please everybody. The most common complaints I've heard are
- No hash tables. This was mentioned in the first review of the book on Amazon, which called the lack of hash tables a “major omission, likely because they don't fit well into a functional environment.” More than a year later, a second reviewer mentioned that hash tables are in fact in the book, but buried in an exercise. Both reviewers are correct. I only gave a very short description of hash tables in an exercise, because there just wasn't that much to say. A hash table is basically the composition of two finite maps, one approximate and one exact. The choice of what structures to use for those finite maps are up to you. Typically, an array is used for the approximate map, but, as the first reviewer noted, arrays do not fit well into a functional environment.
- No delete for red-black trees. I present my simplified form of red-black trees very early in the book as an introduction to functional data structures, but I only describe insertions, not deletions. There were two main reasons for that omission. First, deletions are much messier than insertions, and the introduction did not seem like the right place for a lot of messy details. Second, and more importantly, probably 95+% of uses of functional red-black trees don't need deletions anyway. Because the trees are immutable, when you want to delete something, you can simply revert back to a previous version of the tree from before the unwanted item was inserted. This is almost always both the easiest and the right thing to do. The exception is when you inserted something else that you want to keep after the item to be deleted, but this is extremely rare.
- Why Standard ML? In the last 10 years, Haskell has become steadily more popular, while Standard ML has largely been replaced by OCaml. A frequent complaint has been people wishing that Haskell was in the main text, with Standard ML (or OCaml) in the appendix, instead of the reverse. Even at the time, I debated with myself about the choice of Standard ML, particularly because in some chapters I had to fake extensions to the language, such as lazy evaluation and polymorphic recursion. I went with Standard ML for two main reasons. First, I wanted to be able to distinguish between lazy evaluation and strict evaluation, which was much more difficult in Haskell. Second, I really wanted to be able to give module signatures for all the common structures (stacks, queues, etc.) that I was going to be revisiting over and over. You can sort of do this in Haskell, but not cleanly.
- Too esoteric/impractical. Guilty as charged! I did try to point out a few data structures along the way that were especially practical, but that was not my main focus. If it was, I certainly would not have included umpteen different implementations of queues! Instead, I was trying to focus on design techniques, so that readers could use those techniques to design their own data structures. Many of the example data structures were included as illustrations of those design techniques, not because the examples were particularly practical or valuable by themselves.
The Future?
People sometimes ask me if I'm planning a second edition. The answer is maybe, but not anytime soon. I'm still pretty happy with what I said and how I said it, so I don't feel a burning need to go back and correct things. If I wrote a second edition, it would be to modernize the presentation and to add some new data structures and techniques. For example, I might include the finger trees of Ralf Hinze and Ross Paterson.
In terms of the presentation, however, the biggest question would be what language to use. Standard ML probably doesn't make sense any more, but I'm still not sure Haskell is ready to take over, for the same reasons mentioned above. I suppose I could switch to OCaml, but I'm not sure that's enough of a change to warrant a new edition. I have faith that the vast majority of my readers are perfectly capable of making the translation from Standard ML to OCaml themselves.