I was reading Wikipedia, and I found this:
http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
I decided to sit down and see what that looks like in Haskell. I came up
with this:
module BWT where
import Data.List
rotate1 :: [x] -> [x]
rotate1 [] = []
rotate1 xs = last xs : init xs
rotate :: [x] -> [[x]]
rotate xs = take (length xs) (iterate rotate1 xs)
bwt :: (Ord x) => [x] -> [x]
bwt = map last . sort . rotate
step :: (Ord x) => [x] -> [[x]] -> [[x]]
step xs = zipWith (:) xs . sort
inv_bwt :: (Ord x) => x -> [x] -> [x]
inv_bwt mark xs =
head . filter (\xs -> head xs == mark) .
head . drop ((length xs) - 1) . iterate (step xs) .
map (\x -> [x]) $ xs
My my, isn't that SO much shorter? I love Haskell! :-D
Unfortunately, the resident C++ expert fails to grasp the concept of
"example code", and insists on comparing the efficiency of this program
to the C one on the website.
Fact is, he's translated the presented C into C++, and it can apparently
transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code
above, however, took about 11 seconds to transform 4 KB of text, and
that required about 60 MB of RAM. (I tried larger, but the OS killed the
process for comsuming too much RAM.)
Well anyway, the code was written for simplicity, not efficiency. I've
tried to explain this, but apparently that is beyond his comprehension.
So anyway, it looks like we have a race on. >:-D
The first thing I did was the optimisation mentioned on Wikipedia: you
don't *need* to build a list of lists. You can just throw pointers
around. So I arrived at this:
module BWT2 (bwt) where
import Data.List
rotate :: Int -> [x] -> Int -> [x]
rotate l xs n = (drop (l-n) xs) ++ (take (l-n) xs)
bwt xs =
let l = length xs
ys = rotate l xs
in map (last . rotate l xs) $
sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
This is indeed *much* faster. With this, I can transform 52 KB of text
in 9 minutes + 60 MB RAM. The previous version seemed to have quadratic
memory usage, whereas this one seems to be linear. 52 KB would have
taken many months with the first version!
Still, 9 minutes (for a file 3 times smaller) is nowhere near 8 seconds.
So we must try harder... For my next trick, ByteStrings! (Never used
them before BTW... this is my first try!)
module BWT3 (bwt) where
import Data.List
import qualified Data.ByteString as Raw
rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)
bwt xs =
let l = Raw.length xs
ys = rotate l xs
in Raw.pack $
map (Raw.last . rotate l xs) $
sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near
C++, but a big improvement none the less.
Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
even get to Task Manager fast enough to *check* the RAM usage! Blimey...
OK, just tried the 145 KB test file that Mr C++ used. That took 2
seconds + 43 MB RAM. Ouch.
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Mr. C++ apparently isn't a very good C++ programmer, since his best
effort absolutely *pales* in comparison to Julian Seward's BWT:
stefan@stefans:/usr/local/src/hpaste$ head -c 135000 /usr/share/dict/words | (time bzip2 -vvv) > /dev/null
(stdin):
block 1: crc = 0x25a18961, combined CRC = 0x25a18961, size = 135000
0 work, 135000 block, ratio 0.00
135000 in block, 107256 after MTF & 1-2 coding, 61+2 syms in use
initial group 6, [0 .. 0], has 20930 syms (19.5%)
initial group 5, [1 .. 1], has 4949 syms ( 4.6%)
initial group 4, [2 .. 2], has 20579 syms (19.2%)
initial group 3, [3 .. 4], has 17301 syms (16.1%)
initial group 2, [5 .. 10], has 24247 syms (22.6%)
initial group 1, [11 .. 62], has 19250 syms (17.9%)
pass 1: size is 127140, grp uses are 339 550 192 440 12 613
pass 2: size is 51693, grp uses are 321 440 288 316 139 642
pass 3: size is 51358, grp uses are 329 387 376 304 122 628
pass 4: size is 51302, grp uses are 298 421 397 304 125 601
bytes: mapping 21, selectors 433, code lengths 110, codes 51297
final combined CRC = 0x25a18961
2.602:1, 3.075 bits/byte, 61.57% saved, 135000 in, 51887 out.
real 0m0.165s
user 0m0.044s
sys 0m0.012s
Yup, does slightly more work (huffman coding) in 1/200 the time :)
(Note, on my system .Lazy BWT3 takes 5.3s on the same input)
Stefan
..
> module BWT3 (bwt) where
>
> import Data.List
> import qualified Data.ByteString as Raw
>
> rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
> rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)
>
> bwt xs =
> let l = Raw.length xs
> ys = rotate l xs
> in Raw.pack $
> map (Raw.last . rotate l xs) $
> sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
>
>
> Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near
> C++, but a big improvement none the less.
The trouble is that Raw.append is an O(N) operation, making the computation
O(N^2) where it ought to be O(NlogN).
> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
> Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
> even get to Task Manager fast enough to *check* the RAM usage! Blimey...
>
> OK, just tried the 145 KB test file that Mr C++ used. That took 2
> seconds + 43 MB RAM. Ouch.
In this case append is an O(1) operation. But you're still getting killed
on prefactors, because you're generating a list of size N and then sorting
it. Lists are just not nice data structures to sort, nor are they nice to
have for large N.
To get better speed and memory use, I think you'd want to avoid the
intermediate list in favor of some sort of strict array, but that'd be
ugly.
--
David Roundy
Department of Physics
Oregon State University
> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast
> speed increases... Jeepers, I can transform 52 KB so fast I can't even get to
> Task Manager fast enough to *check* the RAM usage! Blimey...
>
> OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43
> MB RAM. Ouch.
>
A note re RAM usage: it behaves differently in a GCed environment, you
might want to see if it runs with a smaller max heap size. Obviously
you'll spend more time GCing.
"My religion says so" explains your beliefs. But it doesn't explain
why I should hold them as well, let alone be restricted by them.
..OK...so how do I make Haskell go faster still?
Presumably by transforming the code into an ugly mess that nobody can
read any more...?
http://haskell.org/haskellwiki/Performance
-- Don
Well, it actually takes O(N^2 log N) time: N log N comparisons each
taking O(N) time. With O(1) append, each comparison still has a
worst-case bound of O(N) (take the degenerate input (repeat 'a')) but it
will probably be much faster in practice. The log N factor can be
squeezed to a constant by using a simple trie, but to be even faster,
you need a clever suffix tree/array. IIRC, O(N) is the best worst case
bound for the burrows wheeler transform although they're said to be not
so good in practice.
Note that the one usually adds an "end of string" character $ in the
Burrows-Wheeler transform for compression such that sorting rotated
strings becomes sorting suffices.
Concerning the algorithm at hand, you can clearly avoid calculating
Raw.append over and over:
bwt :: Raw.ByteString -> Raw.ByteString
bwt xs = Raw.pack . map (Raw.last) . sort $ rotations
where
n = length xs
rotations = take n . map (take n) . tails $ xs `Raw.append` xs
assuming that take n is O(1).
>> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
>> Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
>> even get to Task Manager fast enough to *check* the RAM usage! Blimey...
>>
>> OK, just tried the 145 KB test file that Mr C++ used. That took 2
>> seconds + 43 MB RAM. Ouch.
>
> In this case append is an O(1) operation. But you're still getting killed
> on prefactors, because you're generating a list of size N and then sorting
> it. Lists are just not nice data structures to sort, nor are they nice to
> have for large N.
>
> To get better speed and memory use, I think you'd want to avoid the
> intermediate list in favor of some sort of strict array, but that'd be
> ugly.
Eh? You can't really avoid the list for sorting, in the sense that
before trying an array/in-place sort, you'd better use a trie.
Regards,
apfelmus
Yeah, I noticed that the output from by program can never actually be
reverted to its original form. ;-) Maybe it I alter the code to stick a
0-byte in there or something...
(Hmm, I wonder how Mr C++ managed it? Heh. If I could read C++, maybe
I'd know...)
> Concerning the algorithm at hand, you can clearly avoid calculating
> Raw.append over and over:
>
> bwt :: Raw.ByteString -> Raw.ByteString
> bwt xs = Raw.pack . map (Raw.last) . sort $ rotations
> where
> n = length xs
> rotations = take n . map (take n) . tails $ xs `Raw.append` xs
>
> assuming that take n is O(1).
>
..which is amusing because that's what the *first* implementation did. ;-)
I was trying to avoid O(n^2) RAM usage. :-}
Saturday, June 23, 2007, 11:21:26 AM, you wrote:
> ...OK...so how do I make Haskell go faster still?
> Presumably by transforming the code into an ugly mess that nobody can
> read any more...?
bwt transformation is very good researched area, so probably you will
not get decent performance (megabytes per second) without lot of work.
and of course, no Haskell at all. take look at
http://darchiver.narod.ru/dark/Archon3fs.zip
--
Best regards,
Bulat mailto:Bulat.Z...@gmail.com
Well it can, but that's a different story told in
Richard S. Bird and Shin-Cheng Mu.
Inverting the Burrows-Wheeler transform.
http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html
#DBLP:journals/jfp/BirdM04
Oh, and you had a function inv_bwt, right?
>> Concerning the algorithm at hand, you can clearly avoid calculating
>> Raw.append over and over:
>>
>> bwt :: Raw.ByteString -> Raw.ByteString
>> bwt xs = Raw.pack . map (Raw.last) . sort $ rotations
>> where
>> n = length xs
>> rotations = take n . map (take n) . tails $ xs `Raw.append` xs
>>
>> assuming that take n is O(1).
>
> I was trying to avoid O(n^2) RAM usage. :-}
Note that for ByteStrings, this takes only O(n) RAM because the
substrings are shared. But for lists, this would take O(n^2) RAM since
(take n) cannot share hole sublists. An O(n) choice for lists that
doesn't recalculate ++ all the time would be
bwt :: Ord a => [a] -> [a]
bwt xs = map last . sortBy (compare `on` (take n)) $ rotations
where
n = length xs
rotations = take n . tails $ xs ++ xs
with the well-known
on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c)
on g f x y = g (f x) (f y)
Regards,
apfelmus
Implementation #1 had an inverse - but it works by finding an unused
character and prepending that to the input before doing the forward BWT. ;-)
For the other implementations, there is no inverse at all.
>> I was trying to avoid O(n^2) RAM usage. :-}
>>
>
> Note that for ByteStrings, this takes only O(n) RAM because the
> substrings are shared. But for lists, this would take O(n^2) RAM since
> (take n) cannot share hole sublists.
Presumably that's only true for *strict* ByteStrings?
(I still don't actually understand how lazy bytestrings are any
different to normal lists - or why they produced such a massive
performance increase...)
> An O(n) choice for lists that
> doesn't recalculate ++ all the time would be
>
> bwt :: Ord a => [a] -> [a]
> bwt xs = map last . sortBy (compare `on` (take n)) $ rotations
> where
> n = length xs
> rotations = take n . tails $ xs ++ xs
>
> with the well-known
>
> on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c)
> on g f x y = g (f x) (f y)
>
Interesting... But how does it perform? ;-)
Hey, I'm just glad I managed to get within striking distance of Mr C++.
So much for Haskell being "inherently less performant". :-P
> and of course, no Haskell at all. take look at
> http://darchiver.narod.ru/dark/Archon3fs.zip
>
Mmm, ok...
Saturday, June 23, 2007, 2:45:01 PM, you wrote:
> Hey, I'm just glad I managed to get within striking distance of Mr C++.
> So much for Haskell being "inherently less performant". :-P
my little analysis says that it's probably due to different sort()
implementations, so this says nothing about general case
--
Best regards,
Bulat mailto:Bulat.Z...@gmail.com
_______________________________________________
The point being that lots of people look at Haskell and go "oh, that's
very cute for writing trivial example code, but it can never be fast;
for that you must use C or C++".
Well, I altered the code, and it's *still* very short and very readable,
and it's just as fast as the (3 pages long) C++ version. >:-D
Indeed. The performance of modern functional programming languages never
ceases to amaze me. INRIA typically claim performance within 2x of C for
OCaml but there are very few programs where OCaml isn't within 50% now, at
least on AMDs.
Most of the pedagogical examples of functional programming (e.g. n-queens) are
too academic for general consumption but there are plenty of excellent
examples. Burrows-Wheeler is a great one. Ray tracing is another. Sudoku
solving is fairy good, albeit unusually well suited to arrays.
I also like to compare the performance of term-level interpreters or rewriters
implemented in different languages. I think it would be interesting to
compare Scheme/Mathematica interpreters written in OCaml, SML (MLton) and
Haskell, for example. I've benchmarked some interpreters in OCaml and SML
(MLton) and MLton's whole-program optimizations are a huge benefit here,
making it several times faster than OCaml. I'd like to know how well Haskell
would do.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e
Saturday, June 23, 2007, 4:02:54 PM, you wrote:
> The point being that lots of people look at Haskell and go "oh, that's
> very cute for writing trivial example code, but it can never be fast;
> for that you must use C or C++".
and that's true :) as i said, your C++ code is very far from perfect.
by comparing with bad code you can find that even Logo is faster than
asm
--
Best regards,
Bulat mailto:Bulat.Z...@gmail.com
_______________________________________________
To recover the original string you either need to store the offset to
the first character or add a sentinel character to mark the string end.
One advantage of the sentinel character approach is that it's sufficient
to sort just the suffixes of the text. A disadvantage is that you need
an additional character. The code below reserves \0 for this purpose.
>>> Code: >>>
"bwt" implements a variation of the Burrows-Wheeler transform, using
\0 as a sentinel character for simplicity. The sentinel has to be smaller
than all other characters in the string.
> bwt xs = let
> suffixes = [(a,as) | a:as <- tails ('\0':xs)]
> in
> map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes
"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.
> rbwt xs = let
> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
> in
> tail . map snd . zip xs $ head res
"zipWith'" is a variant of zipWith that asserts that the third argument
has the same shape as the second one.
> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
<<< End Code <<<
Both transforms use linear memory (but quite a lot, admittedly).
regards,
Bertram
Bertram Felgenhauer wrote:
>>>> Code: >>>
>
> "bwt" implements a variation of the Burrows-Wheeler transform, using
> \0 as a sentinel character for simplicity. The sentinel has to be smaller
> than all other characters in the string.
>
>> bwt xs = let
>> suffixes = [(a,as) | a:as <- tails ('\0':xs)]
>> in
>> map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes
>
> "rbwt" implements the corresponding inverse BWT. It's a fun knot tying
> exercise.
>
>> rbwt xs = let
>> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
>> in
>> tail . map snd . zip xs $ head res
>
> "zipWith'" is a variant of zipWith that asserts that the third argument
> has the same shape as the second one.
>
>> zipWith' f [] _ = []
>> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
>
> <<< End Code <<
I did not like the look of (map snd . zip xs) since it looks like a no-op (that
constructs a useless (,) which may or may not be elided by a sufficiently smart
compiler). But it is using the fact that xs is finite and (head res) is not to
do "take (length xs) $ head res" without the extra traversal and math. But one
can abuse (zipWith' (flip const)) for a 'rbwt' appeals to me more:
> import Data.List
>
> f `on` g = \x y -> (g x) `f` (g y)
>
> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
>
> bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)
>
> rbwt xs = tail $ zipWith' (flip const) xs $ head res
> where res = sortBy (compare `on` head) (zipWith' (:) xs res)
While I was removing (,) from the 'rbwt' I went ahead and removed it from 'bwt'
as well. This was, of course, pointless...
Thanks for the fun example,
Chris
Indeed it's very fun =). Although I must admit that, even after
staring at the code for some time, I still don't see how it works. Is
it too much to ask for a brief explanation?
I see that zipWith' creates all the lazy list without requiring any
knowledge of the second list, which means that the list to be sorted
is created nicely. But when sortBy needs to compare, say, the first
with the second items, it forces the thunk of the lazy list. But that
thunk depends on the sorted list itself!
What am I missing? ;)
Thanks in advance,
--
Felipe.
I just read and understood it. I'll walk through my version:
> import Data.List
>
> f `on` g = \x y -> (g x) `f` (g y)
>
> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
>
> bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)
>
> rbwt xs = tail $ zipWith' (flip const) xs $ head res
> where res = sortBy (compare `on` head) (zipWith' (:) xs res)
The key is that sort and sortBy are very very lazy.
'res' is a finite list of infinite strings.
Consider sorting such a finite list of infinite lists without
the recursive definition:
> example =
> let a = [1,3..]
> b = [1,2..]
> c = [2,4..]
> d = [1,4..]
> sorted = sort [a,b,c,d]
> firstFiveOfEach = map (take 5) sorted
> in mapM_ print firstFiveOfEach
Which in ghci produces:
> *Main> example
> [1,2,3,4,5]
> [1,3,5,7,9]
> [1,4,7,10,13]
> [2,4,6,8,10]
Now for a sidetrack:
> example =
> let a = [1,3..]
> b = [1,2..]
> c = [2,4..]
> d = [1,4..]
> sorted = sort [a,b,c,d,c]
> firstFiveOfEach = map (take 5) sorted
> in mapM_ print firstFiveOfEach
Produces the first three elements before 'c' ....
> *Main> example
> [1,2,3,4,5]
> [1,3,5,7,9]
> [1,4,7,10,13]
..and then hangs because "compare c c" never terminates.
This lazy sorting is also why "head . sort $ foo" is an O(N) way to get the
minimum of a list "foo" in Haskell.
So the code for rbwt depends on this lazy sorting behavior. The recursive
nature of the definition is the difficult to (directly) invent part.
The result of (zipWith' (:) xs ___) is a list the same length as 'xs' and where
each item is a list with the head item taken from 'xs'. So it has "primed the
pump" of the recursive definition by giving the first Char in each of the strings.
Compare with the much simpler 'ones' and 'odds':
ones = 1 : ones
odds = 1 : map (2+) odds
What about the tails of all these lists? They come from res. And res comes
from sorting the results of zipWith'. But the sorting is done only by looking
at the head element, which is the one that comes from xs. That is the "sortBy
(\(a:as) (b:bs) -> a `compare` b)" or "sortBy (compare `on` head)".
So the sorting routine does not care about the ___ in (zipWith' (:) xs ___) at
all, since it only examines the head which comes from xs.
Mainly the precise effect of the irrefutable pattern in zipWith',
I think:
> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
The ~(y:ys) does not pattern match immediately; instead it asserts
that the argument will have a (:) constructor later, when either
y or ys are forced. The last line is equivalent to
> zipWith' f (x:xs) ys' = let (y:ys) = ys' in f x y : zipWith' f xs ys
which may be easier to understand.
The end effect is that
> map head $ zipWith' (:) [1..3] undefined
returns [1..3], never touching the "undefined" value at all.
Trying
> zipWith' [1..] [[]]
is also instructive:
[[1],[2*** Exception: bwt.hs:(18,0)-(19,51): Irrefutable pattern failed for pattern (y : ys)
Note the point where the error happens.
The second thing is that the comparison in "rbwt" only compares
the first character from the result lists - that is, it really
only sorts the list xs, annotated with some additional information
that is not available yet (isn't lazy evaluation great?).
Once the sorting is done, the ~(y:ys) pattern matches of "zipWith'"
all succeed. The effect is that a single circular list is formed
from the characters in xs, with each rotation of the list occuring
exactly once. In fact we get all rotations of the original text, in
sorted order. This is by no means obvious, but is how the inverse
BWT works.
Hope that helps,
Bertram
P.S. Thanks Chris for your improvements :)