PRESENTATION
Don Stewart of Galois spoke about "Designing, visualizing and
benchmarking data structures in Haskell".
These notes attempt to summarize and add content beyond what's included
in Don's posts and slides. However, I'm not not fluent in this domain,
so any errors are mine and you're encouraged to post corrections.
In a nutshell, Don's solution for improving performance was to create
specialized types and functions, which use more efficient data
structures representations and use more optimized algorithms to operate
on them.
You can find Don's original posts and slides on the topic at:
* "Self-optimizing data structures: using types to make lists faster" --
http://bit.ly/2Rxo0R
* "Fast mutable collections for Haskell: more benchmarks" --
http://bit.ly/13GEUK
* "Improving Data Structures: Visualization and Specialization" --
http://www.scribd.com/doc/19637058/Improving-Data-Structures
Notes:
* We want to be able to write the fastest, most concise programs.
Haskell does pretty well in comparisons to many other languages, but its
performance could be better.
* What we can do?
- Use Fusion, it alters code and data by eliminating the
intermediate data structures in function compositions. See also:
-
http://stackoverflow.com/questions/578063/what-is-haskells-stream-fusion
-
http://donsbot.wordpress.com/2009/09/07/stream-fusion-for-haskell-arrays/
- http://www.fing.edu.uy/inco/proyectos/fusion/tool/
- Constructor specialization: skip unnecessary steps and structures
by writing smarter rules.
- Domain-specific rewrite rules: write optimizing rules in your
libraries.
- More and better parallelism: this has gotten much better and easier.
- Smarter Haskell runtime: better compiler, smarter threads,
constructor tag bits (remembers values past pointers), etc make code run
faster for you automagically.
- Special-purpose libraries: optimized libs make things faster.
* What about making regular data types faster?
- Custom types and optimizations are great, but how to leverage
standard types to preserve existing code and libraries:
- To do this, need to see how data structures are represented...
* A secret primop: unpackClosure#
- Smuggles runtime reflection into Haskell via the GHCi debugger
- unpackClosure# :: a (# Addr#, Array# b, ByteArray# #)
- Can write interesting things:
- sizeOfClosure :: a -> Int
- hasUnboxedFields :: a -> Bool
- view :: a -> Graph
* Everyday data types:
- By visualizing how data structures are represented, we can
identify inefficiencies and come up with better data structures.
- Visualizer package and sample code in "vacuum-cairo":
http://hackage.haskell.org/package/vacuum-cairo
- Sample uses to display data structure visualization of item:
- view "hello"
- view [1..5]
- See Don's slides, starting with page 11 for the pretty
visualizations: http://www.scribd.com/doc/19637058/Improving-Data-Structures
* Opportunties for improvement
- Many parametric polymorphic container types are in common use
- All have slow uniform representations, e.g.:
data Maybe a = Nothing | Just a
- However, we know the payload already statically, e.g.:
readInt :: ByteString -> Maybe Int
- How much faster if we could provide specialized data types for
each use, and apply it to the many places these containers are used?
* Challenge
- Make parametric polymorphic types as efficient as monomorphic ones
when we know the type.
- Remove the uniform representation penalty.
* Inspiration: Habit and DPH
- "Data Parallel Haskell" (DPH)
- List-like interface
- Radical restructuring under hood
- "Habit", PSU/Galois "Systems Haskell" has per-constructor
representation allocations
- Whole program compilers, like JHC, although GHC inliner is sort of
there already
* Goals: uniform improvements for polymorphic containers
- Specialized polymorphic containers for each element types
- Retain user interface of regular parametric polymorphism, so
libraries remain mostly unchanged
- Setup uniform rules for specialization, sacrifice laziness for speed.
- Open extension: allow ad hoc representations
* Mechanism:
- Need custom monomorphics for each type and functions to use it.
- Type classes and custom functions:
- Make decisions on per-type basis
- Open, can add new instances and behaviors later
- Ad hoc, add your own rules for each type
- Class-association data types and custom structures:
- Per-instance actual data types
- Representation types added separately to function on those types
* Self-optimizing tuples, which will optimize based on what you put in it:
{ュ# LANGUAGE TypeFamilies, MultiParamTypeClasses #ュ}
ュュ data (,) a b = (a, b)
class AdaptPair a b where
data Pair a b ュュ no representation yet
curry :: (Pair a b ュ> c) ュ> a ュ> b ュ> c
fst :: Pair a b ュ> a
snd :: Pair a b ュ> b
* Functions on adaptive types
-- uncurry :: (a ュ> b ュ> c) ュ> ((a, b) ュ> c)
-- uncurry f p = f (fst p) (snd p)
uncurry :: AdaptPair a b
=> (a ュ> b ュ> c) ュ> (Pair a b ュ> c)
uncurry f p = f (fst p) (snd p)
pair :: AdaptPair a b
=> a ュ> b ュ> Pair a b
pair = curry id
* Plugging in representation
instance AdaptPair Int Double where
-- We can use our unpacking tricks per type
data Pair Int Double
= PairIntDouble {ュ# UNPACK #ュ}!Int
{ュ# UNPACK #ュ}!Double
-- boilerplate views
fst (PairIntDouble a _) = a
snd (PairIntDouble _ b) = b
curry f x y = f (PairIntDouble x y)
* Joy
- Greater data density thanks to straight and inlined code
- More strictness info for common types
- More (Constructed Product Result) analysis possible
- Removes heap checks
- Interacts well with other optimizers, like Fusion
- Can do ad hoc representation decisions
* Careful, smart compiler wanted
- GHC is already fairly clever and we don't want to fight it.
- Don't want dictionaries left over, GHC can remove them.
- Will rely heavily on inlining, fake the whole program compiler
- Need lots of instances to wrap, which increases compile time and size.
- Libraries are now templates, not directly reused -- they're
inlined, then specialized.
* Lists
class AdaptList a where
data List a
empty :: List a
cons :: a ュ> List a ュ> List a
null :: List a ュ> Bool
head :: List a ュ> a
tail :: List a ュ> List a
* List functions
fromList :: AdaptList a => [a] ュ> List a
fromList [] = empty
fromList (x:xs) = x `cons` fromList xs
(++) :: AdaptList a
=> List a ュ> List a ュ> List a
(++) xs ys
| null xs = ys
| otherwise = head xs `cons` tail xs ++ ys
* List functions
map :: (AdaptList a, AdaptList b)
=> (a ュ> b) ュ> List a ュ> List b
map f as = go as
where
go xs
| null xs = empty
| otherwise = f (head xs) `cons` go (tail xs)
* List instances
instance AdaptList Int where
data List Int
= EmptyInt
| ConsInt {ュ# UNPACK #ュ}!Int (List Int)
empty = EmptyInt
cons = ConsInt
null EmptyInt = True
null _ = False
head EmptyInt = errorEmptyList "head"
head (ConsInt x _) = x
* Benchmarks:
- Against regular Data.List type and functions, this specialized
list map made code up to 30% faster and required up to 40% less garbage
collection. The downside is this uses more code, which may be an issue
in some cases -- e.g., complex financial apps using this approach may
require multiday compiles to deal with the template-generated
specialized code.
* Non-uniform polymorphic containers
- If possible, is there a generic approach for this?
- Can we do this automatically? How to avoid combinatorial
explosion to avoid too many things being specialized? How to pick only
right things to specialize?
- Rules based on how strict?
- CPR (Constructed Product Result) analysis enabled, can we use it
on sum types?
- Treat inliner as poor man's whole program optimizer?
- Use compiler extensions in the language?
* Where is it?
- cabal install vacuum-cairo
- cabal install adaptive-containers
- cabal install fingertree
- Look for other adaptive-* types
* Actual benchmarking process:
- Don then showed us how he benchmarked the changes using Criterion,
a neat library by Brian O'Sullivan that bootstraps a benchmark and runs
it a meaningful number of times and scales it up to get rid of noise
using statistical methods, discards outliers, and displays pretty graphs:
- Description:
http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-library-for-haskell/
- Package: http://hackage.haskell.org/package/criterion