2. Stream Fusion Background
We begin by providing the background necessary for understanding stream fusion. There is no new material here; it is all derived from Coutts et al. [6]. However, we describe fusion for functions of vectors of unboxed values, as implemented in the vector [18] library, rather than fusion for functions over lists. Some of the implementation details are elided, but the essential aspects of stream fusion as we describe them are faithful to the implementation.
2.1 The key insight
The big idea behind stream fusion is to rewrite recursive functions, which are difficult for a compiler to automatically optimize, as non-recursive functions. The abstraction that accomplishes this is the Stream data type:
  data Stream a where
    Stream :: (s → Step s a) → s → Int → Stream a

  data Step s a = Yield a s
                | Skip s
                | Done
A stream is a triple of values: an existentially-quantified state, represented by the type variable s in the above definition, a size, and a step function that, when given a state, produces a Step. A Step may be Done, indicating that there are no more values in the Stream, it may Yield a value and a new state, or it may produce a new state but Skip producing a value. The presence of Skip allows us to easily express functions like filter within the stream fusion framework.
To see concretely how this helps us avoid recursive functions, let us write map for vectors using streams:
  map :: (a → b) → Vector a → Vector b
  map f = unstream ◦ map_s f ◦ stream
The functions stream and unstream convert a Vector to and from a stream. A Vector is converted to a stream whose state is an integer index and whose step function yields the value at the current index, which is incremented at each step. To convert a stream back into a Vector, unstream allocates memory for a new vector and writes each element to the vector as it is yielded by the stream, so unstream embodies a recursive loop. Though imperative, the allocation and writing of the vector are safely embedded in pure Haskell using the ST monad [17].
The real work is done by map_s, which is happily non-recursive.
  map_s :: (a → b) → Stream a → Stream b
  map_s f (Stream step s n) = Stream step′ s n
    where
      step′ s = case step s of
                  Yield x s′ → Yield (f x) s′
                  Skip s′    → Skip s′
                  Done       → Done
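The conversions and the role of Skip described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the vector library's code: a plain list stands in for Vector (so the stream state is the remaining list rather than an integer index), unstream builds a list instead of writing into memory in the ST monad, and filter_s is a hypothetical name chosen to match map_s.

```haskell
{-# LANGUAGE GADTs #-}

-- One step of a stream: yield a value with the next state,
-- move to the next state without a value, or stop.
data Step s a = Yield a s | Skip s | Done

-- A step function, an initial state of hidden type s, and a size hint.
data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

-- ASSUMPTION: a list stands in for Vector, so the state is the
-- remaining list rather than an integer index.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys

-- The consumer: the only recursive loop in this sketch lives here.
unstream :: Stream a -> [a]
unstream (Stream step s0 _) = go s0
  where
    go s = case step s of
      Yield x s' -> x : go s'
      Skip s'    -> go s'
      Done       -> []

-- Non-recursive filter: a rejected element becomes a Skip, so the
-- step function never has to loop looking for the next match.
-- The size field is only an upper bound after filtering.
filter_s :: (a -> Bool) -> Stream a -> Stream a
filter_s p (Stream step s n) = Stream step' s n
  where
    step' t = case step t of
      Yield x t'
        | p x       -> Yield x t'
        | otherwise -> Skip t'
      Skip t'       -> Skip t'
      Done          -> Done
```

Loaded into GHCi, unstream (filter_s even (stream [1 .. 10])) evaluates to [2,4,6,8,10]. Without Skip, the step function of filter_s would need its own loop to find the next element that satisfies the predicate, which is exactly the recursion the framework tries to avoid.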
With this definition, the equational rule mentioned in the Introduction, map f ◦ map g ≡ map (f ◦ g), falls out automatically. To see this, let us first inline our new definition of map in the expression map f ◦ map g.
  map f ◦ map g ≡ unstream ◦ map_s f ◦ stream ◦ unstream ◦ map_s g ◦ stream
Given this form, we can immediately spot where an intermediate structure is formed: by the composition stream ◦ unstream. We can also see that this composition is the identity function, so we should be able to eliminate it entirely! Rewrite rules [27] enable programmers to express algebraic identities such as stream ◦ unstream = id in a form that GHC can understand and automatically apply. Stream fusion relies critically on this ability, and the vector library includes exactly this rule. With the rule in place, GHC transforms our original composition of maps into
  map f ◦ map g ≡ unstream ◦ map_s f ◦ map_s g ◦ stream
Conceptually, stream fusion pushes all recursive loops into the final consumer. The two composed invocations of map become a composition of two non-recursive calls to map_s. The inliner is now perfectly capable of combining map_s f ◦ map_s g into a single Stream function. Stream fusion gives us the equational rule map f ◦ map g ≡ map (f ◦ g) for free.
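As an illustration of how such an identity can be handed to GHC, the following sketch states stream ◦ unstream = id as a RULES pragma. It rests on several assumptions: lists stand in for Vector, mapV is a hypothetical name chosen to avoid clashing with the Prelude's map, and the rule name is made up. The vector library phrases its own rule over its own types, and GHC applies rewrite rules only when optimization is enabled.

```haskell
{-# LANGUAGE GADTs #-}

data Step s a = Yield a s | Skip s | Done

data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

-- ASSUMPTION: lists stand in for Vector.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys
{-# INLINE [1] stream #-}    -- delay inlining so the rule below can match

unstream :: Stream a -> [a]
unstream (Stream step s0 _) = go s0
  where
    go s = case step s of
      Yield x s' -> x : go s'
      Skip s'    -> go s'
      Done       -> []
{-# INLINE [1] unstream #-}  -- delay inlining so the rule below can match

-- Hypothetical rule name; the identity is the one stated in the text.
{-# RULES
"sketch: stream/unstream" forall s. stream (unstream s) = s
  #-}

-- Non-recursive map on streams, as defined in the text.
map_s :: (a -> b) -> Stream a -> Stream b
map_s f (Stream step s n) = Stream step' s n
  where
    step' t = case step t of
      Yield x t' -> Yield (f x) t'
      Skip t'    -> Skip t'
      Done       -> Done
{-# INLINE map_s #-}

-- Hypothetical name for the list-level map built from stream pieces.
mapV :: (a -> b) -> [a] -> [b]
mapV f xs = unstream (map_s f (stream xs))
{-# INLINE mapV #-}
```

Whether or not the rule fires, mapV f (mapV g xs) and mapV (f . g) xs compute the same list; the rule only removes the intermediate list between the two calls. Compiling with -O and -ddump-rule-firings is one way to check whether GHC applied it in a given program.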
2.2 Fusing the vector dot product
The motivating example we will use for the rest of the paper is the vector dot product. A high-level implementation of this function in Haskell might be written as follows:
  dotp :: Vector Double → Vector Double → Double
  dotp v w = sum (zipWith (∗) v w)
It seems that this implementation will suffer from severe inefficiency: the call to zipWith produces an unnecessary intermediate vector that is immediately consumed by the function sum. In expressing dotp as a composition of collective operations, we have perhaps gained a bit of algorithmic clarity but in turn we have incurred a performance hit.
We have already seen how stream fusion eliminates intermediate structures in the case of a composition of two calls to map. Previous fusion frameworks could handle that example, but were stymied by the presence of a zipWith. However, stream fusion has no problem fusing zipWith, which we can see by applying the transformations we saw in Section 2.1 to dotp.
The first step is to re-express each Vector operation as the composition of a Stream operation and appropriate conversions between Vectors and Streams at the boundaries. The functions zipWith and sum are expressed in this form as follows.
  zipWith :: (a → b → c) → Vector a → Vector b → Vector c
  zipWith f v w = unstream (zipWith_s f (stream v) (stream w))

  sum :: Num a ⇒ Vector a → a
  sum v = foldl_s 0 (+) (stream v)
It is now relatively straightforward to transform dotp to eliminate the intermediate structure.
  dotp :: Vector Double → Vector Double → Double
  dotp v w ≡ sum (zipWith (∗) v w)
           ≡ foldl_s 0 (+) (stream (unstream (zipWith_s (∗) (stream v) (stream w))))
           ≡ foldl_s 0 (+) (zipWith_s (∗) (stream v) (stream w))
This transformation again consists of inlining a few definitions, something that GHC can easily perform, and rewriting the composition stream ◦ unstream to the identity function. After this transformation, the production (by zipWith) and following consumption (by sum) of an intermediate Vector become the composition of non-recursive functions on streams.
We can see how iteration is once again pushed into the final consumer by looking at the implementations of foldl_s and zipWith_s. The final consumer in dotp is foldl_s, which is implemented by an explicit loop that consumes stream values and combines the yielded values with the accumulator z using the function f (the call to seq guarantees that the accumulator is strictly evaluated).
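The shape of the fused dot product can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: lists stand in for Vector, the state layout chosen for zipWith_s (two states plus an optional pending left element) is one possible encoding, and foldl_s takes its arguments in the Prelude's foldl order (function first, then initial value), whereas the call sites above write the initial value first.

```haskell
{-# LANGUAGE GADTs #-}

data Step s a = Yield a s | Skip s | Done

data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

-- ASSUMPTION: lists stand in for Vector.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys

-- Non-recursive zipWith on streams. The state holds both input
-- states and, once the left stream has yielded, the pending left value.
zipWith_s :: (a -> b -> c) -> Stream a -> Stream b -> Stream c
zipWith_s f (Stream stepa sa na) (Stream stepb sb nb) =
    Stream step (sa, sb, Nothing) (min na nb)
  where
    step (ta, tb, Nothing) = case stepa ta of
      Yield x ta' -> Skip (ta', tb, Just x)
      Skip ta'    -> Skip (ta', tb, Nothing)
      Done        -> Done
    step (ta, tb, Just x) = case stepb tb of
      Yield y tb' -> Yield (f x y) (ta, tb', Nothing)
      Skip tb'    -> Skip (ta, tb', Just x)
      Done        -> Done

-- The final consumer: an explicit loop over the stream. The seq
-- forces the accumulator z before each step is taken.
-- ASSUMPTION: argument order follows the Prelude's foldl.
foldl_s :: (b -> a -> b) -> b -> Stream a -> b
foldl_s f z0 (Stream step s0 _) = loop z0 s0
  where
    loop z s = z `seq` case step s of
      Yield x s' -> loop (f z x) s'
      Skip s'    -> loop z s'
      Done       -> z

-- Fused dot product: no intermediate list is built between
-- zipWith_s and foldl_s.
dotp :: [Double] -> [Double] -> Double
dotp v w = foldl_s (+) 0 (zipWith_s (*) (stream v) (stream w))
```

For example, dotp [1, 2, 3] [4, 5, 6] evaluates to 32.0. The only recursion is the loop inside foldl_s; zipWith_s merely builds a new step function from the two it is given.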
2013/3/29