URL:http://www.scribd.com/vacuum?url=http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf


 

2. Stream Fusion Background

We begin by providing the background necessary for understanding stream fusion. There is no new material here; it is all derived from Coutts et al. [6]. However, we describe fusion for functions of vectors of unboxed values, as implemented in the vector [18] library, rather than fusion for functions over lists. Some of the implementation details are elided, but the essential aspects of stream fusion as we describe them are faithful to the implementation.

2.1 The key insight

The big idea behind stream fusion is to rewrite recursive functions, which are difficult for a compiler to automatically optimize, as non-recursive functions. The abstraction that accomplishes this is the Stream data type:

    data Stream a where
      Stream :: (s -> Step s a) -> s -> Int -> Stream a

    data Step s a = Yield a s
                  | Skip s
                  | Done

A stream is a triple of values: an existentially-quantified state, represented by the type variable s in the above definition, a size, and a step function that, when given a state, produces a Step. A Step may be Done, indicating that there are no more values in the Stream, it may Yield a value and a new state, or it may produce a new state but Skip producing a value. The presence of Skip allows us to easily express functions like filter within the stream fusion framework.

To see concretely how this helps us avoid recursive functions, let us write map for vectors using streams:

    map :: (a -> b) -> Vector a -> Vector b
    map f = unstream . map_s f . stream

The functions stream and unstream convert a Vector to and from a stream. A Vector is converted to a stream whose state is an integer index and whose step function yields the value at the current index, which is incremented at each step. To convert a stream back into a Vector, unstream allocates memory for a new vector and writes each element to the vector as it is yielded by the stream; unstream embodies a recursive loop. Though imperative, the allocation and writing of the vector are safely embedded in pure Haskell using the ST monad [17].

The real work is done by map_s, which is happily non-recursive.

    map_s :: (a -> b) -> Stream a -> Stream b
    map_s f (Stream step s n) = Stream step' s n   -- map preserves the size n
      where
        step' s = case step s of
                    Yield x s' -> Yield (f x) s'
                    Skip s'    -> Skip s'
                    Done       -> Done
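The definitions so far can be assembled into a small runnable sketch. It rests on one loud assumption: an ordinary list stands in for an unboxed Vector, so the stream state is the remaining list rather than an integer index, and unstream builds a list instead of writing into memory in the ST monad. The name mapV is hypothetical and is chosen only to avoid clashing with the Prelude's map.

```haskell
{-# LANGUAGE GADTs #-}

-- A stream: a step function, an initial state, and a size.
data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

data Step s a
  = Yield a s
  | Skip s
  | Done

-- ASSUMPTION: a list stands in for Vector. The state is the
-- remaining list, not an integer index into a vector.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys

-- The only recursive loop: run the step function until Done.
unstream :: Stream a -> [a]
unstream (Stream step s0 _) = go s0
  where
    go s = case step s of
      Yield x s' -> x : go s'
      Skip s'    -> go s'
      Done       -> []

-- Non-recursive: wraps the step function without looping.
map_s :: (a -> b) -> Stream a -> Stream b
map_s f (Stream step s n) = Stream step' s n
  where
    step' st = case step st of
      Yield x st' -> Yield (f x) st'
      Skip st'    -> Skip st'
      Done        -> Done

-- Hypothetical name for the vector-level map of the text.
mapV :: (a -> b) -> [a] -> [b]
mapV f = unstream . map_s f . stream
```

Loaded into GHCi, mapV (+1) [1,2,3] evaluates to [2,3,4]. The recursion lives entirely in unstream; map_s only wraps one step function in another.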

With this definition, the equational rule mentioned in the Introduction, map f . map g ≡ map (f . g), falls out automatically. To see this, let us first inline our new definition of map in the expression map f . map g.

    map f . map g ≡ unstream . map_s f . stream . unstream . map_s g . stream

Given this form, we can immediately spot where an intermediate structure is formed: by the composition stream . unstream. We can also see that this composition is the identity function, so we should be able to eliminate it entirely! Rewrite rules [27] enable programmers to express algebraic identities such as stream . unstream = id in a form that GHC can understand and automatically apply. Stream fusion relies critically on this ability, and the vector library includes exactly this rule. With the rule in place, GHC transforms our original composition of maps into

    map f . map g ≡ unstream . map_s f . map_s g . stream
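For illustration, the identity stream . unstream = id can be written as a GHC RULES pragma roughly as follows. This is a sketch under assumptions, not the rule as it appears in the vector library: lists again stand in for Vector, the rule name "stream/unstream" is our own, and the phase annotations are one plausible way to keep stream and unstream from being inlined before the rule has had a chance to match.

```haskell
{-# LANGUAGE GADTs #-}

data Step s a = Yield a s | Skip s | Done

data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

-- ASSUMPTION: a list stands in for Vector.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys
-- Delay inlining until phase 1 so the rule below can match first.
{-# INLINE [1] stream #-}

unstream :: Stream a -> [a]
unstream (Stream step s0 _) = go s0
  where
    go s = case step s of
      Yield x s' -> x : go s'
      Skip s'    -> go s'
      Done       -> []
{-# INLINE [1] unstream #-}

-- The algebraic identity, stated in applied form: converting a
-- stream to a list and straight back yields the original stream.
{-# RULES
"stream/unstream" forall s. stream (unstream s) = s
  #-}
```

GHC applies rewrite rules only when optimization is enabled (for example with -O); compiling with -ddump-rule-firings reports which rules fired. Without optimization the program behaves the same but still builds the intermediate list.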

Conceptually, stream fusion pushes all recursive loops into the final consumer. The two composed invocations of map become a composition of two non-recursive calls to map_s. The inliner is now perfectly capable of combining map_s f . map_s g into a single Stream function. Stream fusion gives us the equational rule map f . map g ≡ map (f . g) for free.
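The role of Skip mentioned earlier in this section can be made concrete with filter. The following is a minimal sketch, assuming the same list-backed stand-in for Vector; filter_s and filterV are hypothetical names that follow the naming pattern of map_s, and neither is taken from the vector library.

```haskell
{-# LANGUAGE GADTs #-}

data Step s a = Yield a s | Skip s | Done

data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

-- ASSUMPTION: a list stands in for Vector.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys

unstream :: Stream a -> [a]
unstream (Stream step s0 _) = go s0
  where
    go s = case step s of
      Yield x s' -> x : go s'
      Skip s'    -> go s'
      Done       -> []

-- Non-recursive filter: a rejected element becomes a Skip, so the
-- step function never has to loop in search of the next match.
-- After filtering, the size n is only an upper bound.
filter_s :: (a -> Bool) -> Stream a -> Stream a
filter_s p (Stream step s n) = Stream step' s n
  where
    step' st = case step st of
      Yield x st'
        | p x       -> Yield x st'
        | otherwise -> Skip st'
      Skip st'      -> Skip st'
      Done          -> Done

-- Hypothetical vector-level wrapper.
filterV :: (a -> Bool) -> [a] -> [a]
filterV p = unstream . filter_s p . stream
```

Without Skip, the step function would have to call itself until it found an element satisfying the predicate, which would reintroduce exactly the recursion that the Stream representation is meant to avoid.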

2.2 Fusing the vector dot product

The motivating example we will use for the rest of the paper is the vector dot product. A high-level implementation of this function in Haskell might be written as follows:

    dotp :: Vector Double -> Vector Double -> Double
    dotp v w = sum (zipWith (*) v w)

It seems that this implementation will suffer from severe inefficiency: the call to zipWith produces an unnecessary intermediate vector that is immediately consumed by the function sum. In expressing dotp as a composition of collective operations, we have perhaps gained a bit of algorithmic clarity, but in turn we have incurred a performance hit.

We have already seen how stream fusion eliminates intermediate structures in the case of a composition of two calls to map. Previous fusion frameworks could handle that example, but were stymied by the presence of a zipWith. However, stream fusion has no problem fusing zipWith, which we can see by applying the transformations we saw in Section 2.1 to dotp.

The first step is to re-express each Vector operation as the composition of a Stream operation and appropriate conversions between Vectors and Streams at the boundaries. The functions zipWith and sum are expressed in this form as follows.

    zipWith :: (a -> b -> c) -> Vector a -> Vector b -> Vector c
    zipWith f v w = unstream (zipWith_s f (stream v) (stream w))

    sum :: Num a => Vector a -> a
    sum v = foldl_s (+) 0 (stream v)

It is now relatively straightforward to transform dotp to eliminate the intermediate structure.

    dotp :: Vector Double -> Vector Double -> Double
    dotp v w ≡ sum (zipWith (*) v w)
             ≡ foldl_s (+) 0 (stream (unstream (zipWith_s (*) (stream v) (stream w))))
             ≡ foldl_s (+) 0 (zipWith_s (*) (stream v) (stream w))

This transformation again consists of inlining a few definitions, something that GHC can easily perform, and rewriting the composition stream . unstream to the identity function. After this transformation, the production (by zipWith) and subsequent consumption (by sum) of an intermediate Vector becomes the composition of non-recursive functions on streams.

We can see how iteration is once again pushed into the final consumer by looking at the implementations of foldl_s and zipWith_s. The final consumer in dotp is foldl_s, which is implemented by an explicit loop that consumes stream values and combines the yielded values with the accumulator z using the function f (the call to seq guarantees that the accumulator is strictly evaluated).
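To make the shape of the fused dot product concrete, here is a minimal runnable sketch of foldl_s, zipWith_s, and the final form of dotp. It is written under stated assumptions and is not the vector library's code: lists stand in for Vector Double, foldl_s takes its arguments in the order function, accumulator, stream, and zipWith_s uses one possible state layout (both input states plus an optionally buffered left element).

```haskell
{-# LANGUAGE GADTs #-}

data Step s a = Yield a s | Skip s | Done

data Stream a where
  Stream :: (s -> Step s a) -> s -> Int -> Stream a

-- ASSUMPTION: a list stands in for Vector.
stream :: [a] -> Stream a
stream xs = Stream step xs (length xs)
  where
    step []       = Done
    step (y : ys) = Yield y ys

-- The final consumer and the only recursive loop. seq forces the
-- accumulator z at every iteration so that no thunks build up.
foldl_s :: (b -> a -> b) -> b -> Stream a -> b
foldl_s f z0 (Stream step s0 _) = loop z0 s0
  where
    loop z s = z `seq` case step s of
      Yield x s' -> loop (f z x) s'
      Skip s'    -> loop z s'
      Done       -> z

-- Non-recursive zip. The state holds both input states and, once
-- the left stream has yielded, the element waiting for a partner.
zipWith_s :: (a -> b -> c) -> Stream a -> Stream b -> Stream c
zipWith_s f (Stream stepa sa0 na) (Stream stepb sb0 nb) =
  Stream step (sa0, sb0, Nothing) (min na nb)
  where
    step (sa, sb, Nothing) = case stepa sa of
      Yield x sa' -> Skip (sa', sb, Just x)
      Skip sa'    -> Skip (sa', sb, Nothing)
      Done        -> Done
    step (sa, sb, Just x) = case stepb sb of
      Yield y sb' -> Yield (f x y) (sa, sb', Nothing)
      Skip sb'    -> Skip (sa, sb', Just x)
      Done        -> Done

-- The fused form derived above: no intermediate structure is built.
dotp :: [Double] -> [Double] -> Double
dotp v w = foldl_s (+) 0 (zipWith_s (*) (stream v) (stream w))
```

For example, dotp [1,2,3] [4,5,6] evaluates to 32.0. Note that zipWith_s relies on Skip in the same way filter does: after pulling an element from the left stream it skips, so each call to the step function advances at most one input stream and never loops.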

2013/3/29

