[Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

Andrew Coppin

unread,

Jun 22, 2007, 4:26:24 PM6/22/07

to haskel...@haskell.org

OK, so I *started* writing a request for help, but... well I answered my
own question! See the bottom...

I was reading Wikipedia, and I found this:

http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

I decided to sit down and see what that looks like in Haskell. I came up
with this:

module BWT where

import Data.List

rotate1 :: [x] -> [x]
rotate1 [] = []
rotate1 xs = last xs : init xs

rotate :: [x] -> [[x]]
rotate xs = take (length xs) (iterate rotate1 xs)

bwt :: (Ord x) => [x] -> [x]
bwt = map last . sort . rotate

step :: (Ord x) => [x] -> [[x]] -> [[x]]
step xs = zipWith (:) xs . sort

inv_bwt :: (Ord x) => x -> [x] -> [x]
inv_bwt mark xs =
head . filter (\xs -> head xs == mark) .
head . drop ((length xs) - 1) . iterate (step xs) .
map (\x -> [x]) $ xs

My my, isn't that SO much shorter? I love Haskell! :-D

Unfortunately, the resident C++ expert fails to grasp the concept of
"example code", and insists on comparing the efficiency of this program
to the C one on the website.

Fact is, he's translated the presented C into C++, and it can apparently
transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code
above, however, took about 11 seconds to transform 4 KB of text, and
that required about 60 MB of RAM. (I tried larger, but the OS killed the
process for comsuming too much RAM.)

Well anyway, the code was written for simplicity, not efficiency. I've
tried to explain this, but apparently that is beyond his comprehension.
So anyway, it looks like we have a race on. >:-D

The first thing I did was the optimisation mentioned on Wikipedia: you
don't *need* to build a list of lists. You can just throw pointers
around. So I arrived at this:

module BWT2 (bwt) where

import Data.List

rotate :: Int -> [x] -> Int -> [x]
rotate l xs n = (drop (l-n) xs) ++ (take (l-n) xs)

bwt xs =
let l = length xs
ys = rotate l xs
in map (last . rotate l xs) $
sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]

This is indeed *much* faster. With this, I can transform 52 KB of text
in 9 minutes + 60 MB RAM. The previous version seemed to have quadratic
memory usage, whereas this one seems to be linear. 52 KB would have
taken many months with the first version!

Still, 9 minutes (for a file 3 times smaller) is nowhere near 8 seconds.
So we must try harder... For my next trick, ByteStrings! (Never used
them before BTW... this is my first try!)

module BWT3 (bwt) where

import Data.List
import qualified Data.ByteString as Raw

rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)

bwt xs =
let l = Raw.length xs
ys = rotate l xs
in Raw.pack $
map (Raw.last . rotate l xs) $
sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]

Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near
C++, but a big improvement none the less.

Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
even get to Task Manager fast enough to *check* the RAM usage! Blimey...

OK, just tried the 145 KB test file that Mr C++ used. That took 2
seconds + 43 MB RAM. Ouch.

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Stefan O'Rear

unread,

Jun 22, 2007, 4:47:19 PM6/22/07

to Andrew Coppin, haskel...@haskell.org

On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:
> OK, so I *started* writing a request for help, but... well I answered my
> own question! See the bottom...

..

> Unfortunately, the resident C++ expert fails to grasp the concept of
> "example code", and insists on comparing the efficiency of this program
> to the C one on the website.

..

> Fact is, he's translated the presented C into C++, and it can apparently
> transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code

..

> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
> Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
> even get to Task Manager fast enough to *check* the RAM usage! Blimey...
>
> OK, just tried the 145 KB test file that Mr C++ used. That took 2
> seconds + 43 MB RAM. Ouch.

Mr. C++ apparently isn't a very good C++ programmer, since his best
effort absolutely *pales* in comparison to Julian Seward's BWT:

stefan@stefans:/usr/local/src/hpaste$ head -c 135000 /usr/share/dict/words | (time bzip2 -vvv) > /dev/null
(stdin):
block 1: crc = 0x25a18961, combined CRC = 0x25a18961, size = 135000
0 work, 135000 block, ratio 0.00
135000 in block, 107256 after MTF & 1-2 coding, 61+2 syms in use
initial group 6, [0 .. 0], has 20930 syms (19.5%)
initial group 5, [1 .. 1], has 4949 syms ( 4.6%)
initial group 4, [2 .. 2], has 20579 syms (19.2%)
initial group 3, [3 .. 4], has 17301 syms (16.1%)
initial group 2, [5 .. 10], has 24247 syms (22.6%)
initial group 1, [11 .. 62], has 19250 syms (17.9%)
pass 1: size is 127140, grp uses are 339 550 192 440 12 613
pass 2: size is 51693, grp uses are 321 440 288 316 139 642
pass 3: size is 51358, grp uses are 329 387 376 304 122 628
pass 4: size is 51302, grp uses are 298 421 397 304 125 601
bytes: mapping 21, selectors 433, code lengths 110, codes 51297
final combined CRC = 0x25a18961
2.602:1, 3.075 bits/byte, 61.57% saved, 135000 in, 51887 out.

real 0m0.165s
user 0m0.044s
sys 0m0.012s

Yup, does slightly more work (huffman coding) in 1/200 the time :)

(Note, on my system .Lazy BWT3 takes 5.3s on the same input)

Stefan

David Roundy

unread,

Jun 22, 2007, 5:01:38 PM6/22/07

to haskel...@haskell.org

On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:

> OK, so I *started* writing a request for help, but... well I answered my
> own question! See the bottom...

..

> module BWT3 (bwt) where
>
> import Data.List
> import qualified Data.ByteString as Raw
>
> rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
> rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)
>
> bwt xs =
> let l = Raw.length xs
> ys = rotate l xs
> in Raw.pack $
> map (Raw.last . rotate l xs) $
> sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
>
>
> Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near
> C++, but a big improvement none the less.

The trouble is that Raw.append is an O(N) operation, making the computation
O(N^2) where it ought to be O(NlogN).

> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
> Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
> even get to Task Manager fast enough to *check* the RAM usage! Blimey...
>
> OK, just tried the 145 KB test file that Mr C++ used. That took 2
> seconds + 43 MB RAM. Ouch.

In this case append is an O(1) operation. But you're still getting killed
on prefactors, because you're generating a list of size N and then sorting
it. Lists are just not nice data structures to sort, nor are they nice to
have for large N.

To get better speed and memory use, I think you'd want to avoid the
intermediate list in favor of some sort of strict array, but that'd be
ugly.
--
David Roundy
Department of Physics
Oregon State University

Philippa Cowderoy

unread,

Jun 22, 2007, 5:30:13 PM6/22/07

to Andrew Coppin, haskel...@haskell.org

On Fri, 22 Jun 2007, Andrew Coppin wrote:

> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast
> speed increases... Jeepers, I can transform 52 KB so fast I can't even get to
> Task Manager fast enough to *check* the RAM usage! Blimey...
>
> OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43
> MB RAM. Ouch.
>

A note re RAM usage: it behaves differently in a GCed environment, you
might want to see if it runs with a smaller max heap size. Obviously
you'll spend more time GCing.

--
fli...@flippac.org

"My religion says so" explains your beliefs. But it doesn't explain
why I should hold them as well, let alone be restricted by them.

Andrew Coppin

unread,

Jun 23, 2007, 3:21:12 AM6/23/07

to haskel...@haskell.org

..OK...so how do I make Haskell go faster still?

Presumably by transforming the code into an ugly mess that nobody can
read any more...?

Donald Bruce Stewart

unread,

Jun 23, 2007, 3:25:31 AM6/23/07

to Andrew Coppin, haskel...@haskell.org

> ...OK...so how do I make Haskell go faster still?

>
> Presumably by transforming the code into an ugly mess that nobody can
> read any more...?
>

http://haskell.org/haskellwiki/Performance

-- Don

apfelmus

unread,

Jun 23, 2007, 5:30:48 AM6/23/07

to haskel...@haskell.org

David Roundy wrote:
> On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:
>> OK, so I *started* writing a request for help, but... well I answered my
>> own question! See the bottom...
>

> ....

>
>> module BWT3 (bwt) where
>>
>> import Data.List
>> import qualified Data.ByteString as Raw
>>
>> rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
>> rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)
>>
>> bwt xs =
>> let l = Raw.length xs
>> ys = rotate l xs
>> in Raw.pack $
>> map (Raw.last . rotate l xs) $
>> sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
>

> The trouble is that Raw.append is an O(N) operation, making the computation
> O(N^2) where it ought to be O(NlogN).

Well, it actually takes O(N^2 log N) time: N log N comparisons each
taking O(N) time. With O(1) append, each comparison still has a
worst-case bound of O(N) (take the degenerate input (repeat 'a')) but it
will probably be much faster in practice. The log N factor can be
squeezed to a constant by using a simple trie, but to be even faster,
you need a clever suffix tree/array. IIRC, O(N) is the best worst case
bound for the burrows wheeler transform although they're said to be not
so good in practice.

Note that the one usually adds an "end of string" character $ in the
Burrows-Wheeler transform for compression such that sorting rotated
strings becomes sorting suffices.

Concerning the algorithm at hand, you can clearly avoid calculating
Raw.append over and over:

bwt :: Raw.ByteString -> Raw.ByteString
bwt xs = Raw.pack . map (Raw.last) . sort $ rotations
where
n = length xs
rotations = take n . map (take n) . tails $ xs `Raw.append` xs

assuming that take n is O(1).

>> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM!
>> Vast speed increases... Jeepers, I can transform 52 KB so fast I can't
>> even get to Task Manager fast enough to *check* the RAM usage! Blimey...
>>
>> OK, just tried the 145 KB test file that Mr C++ used. That took 2
>> seconds + 43 MB RAM. Ouch.
>
> In this case append is an O(1) operation. But you're still getting killed
> on prefactors, because you're generating a list of size N and then sorting
> it. Lists are just not nice data structures to sort, nor are they nice to
> have for large N.
>
> To get better speed and memory use, I think you'd want to avoid the
> intermediate list in favor of some sort of strict array, but that'd be
> ugly.

Eh? You can't really avoid the list for sorting, in the sense that
before trying an array/in-place sort, you'd better use a trie.

Regards,
apfelmus

Andrew Coppin

unread,

Jun 23, 2007, 5:40:39 AM6/23/07

to haskel...@haskell.org

apfelmus wrote:
> Note that the one usually adds an "end of string" character $ in the
> Burrows-Wheeler transform for compression such that sorting rotated
> strings becomes sorting suffices.
>

Yeah, I noticed that the output from by program can never actually be
reverted to its original form. ;-) Maybe it I alter the code to stick a
0-byte in there or something...

(Hmm, I wonder how Mr C++ managed it? Heh. If I could read C++, maybe
I'd know...)

> Concerning the algorithm at hand, you can clearly avoid calculating
> Raw.append over and over:
>
> bwt :: Raw.ByteString -> Raw.ByteString
> bwt xs = Raw.pack . map (Raw.last) . sort $ rotations
> where
> n = length xs
> rotations = take n . map (take n) . tails $ xs `Raw.append` xs
>
> assuming that take n is O(1).
>

..which is amusing because that's what the *first* implementation did. ;-)

I was trying to avoid O(n^2) RAM usage. :-}

Bulat Ziganshin

unread,

Jun 23, 2007, 6:03:33 AM6/23/07

to Andrew Coppin, haskel...@haskell.org

Hello Andrew,

Saturday, June 23, 2007, 11:21:26 AM, you wrote:
> ...OK...so how do I make Haskell go faster still?

> Presumably by transforming the code into an ugly mess that nobody can
> read any more...?

bwt transformation is very good researched area, so probably you will
not get decent performance (megabytes per second) without lot of work.
and of course, no Haskell at all. take look at
http://darchiver.narod.ru/dark/Archon3fs.zip

--
Best regards,
Bulat mailto:Bulat.Z...@gmail.com

apfelmus

unread,

Jun 23, 2007, 6:22:05 AM6/23/07

to haskel...@haskell.org

Andrew Coppin wrote:
> apfelmus wrote:
>> Note that the one usually adds an "end of string" character $ in the
>> Burrows-Wheeler transform for compression such that sorting rotated
>> strings becomes sorting suffices.
>
> Yeah, I noticed that the output from by program can never actually be
> reverted to its original form.

Well it can, but that's a different story told in

Richard S. Bird and Shin-Cheng Mu.
Inverting the Burrows-Wheeler transform.
http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html
#DBLP:journals/jfp/BirdM04

Oh, and you had a function inv_bwt, right?

>> Concerning the algorithm at hand, you can clearly avoid calculating
>> Raw.append over and over:
>>
>> bwt :: Raw.ByteString -> Raw.ByteString
>> bwt xs = Raw.pack . map (Raw.last) . sort $ rotations
>> where
>> n = length xs
>> rotations = take n . map (take n) . tails $ xs `Raw.append` xs
>>
>> assuming that take n is O(1).
>

> I was trying to avoid O(n^2) RAM usage. :-}

Note that for ByteStrings, this takes only O(n) RAM because the
substrings are shared. But for lists, this would take O(n^2) RAM since
(take n) cannot share hole sublists. An O(n) choice for lists that
doesn't recalculate ++ all the time would be

bwt :: Ord a => [a] -> [a]
bwt xs = map last . sortBy (compare `on` (take n)) $ rotations

where
n = length xs

rotations = take n . tails $ xs ++ xs

with the well-known

on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c)
on g f x y = g (f x) (f y)

Regards,
apfelmus

Andrew Coppin

unread,

Jun 23, 2007, 6:40:48 AM6/23/07

to haskel...@haskell.org

apfelmus wrote:

> Andrew Coppin wrote:
>
>> Yeah, I noticed that the output from by program can never actually be
>> reverted to its original form.
>>
>
> Well it can, but that's a different story told in
>
> Richard S. Bird and Shin-Cheng Mu.
> Inverting the Burrows-Wheeler transform.
> http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html
> #DBLP:journals/jfp/BirdM04
>
> Oh, and you had a function inv_bwt, right?
>

Implementation #1 had an inverse - but it works by finding an unused
character and prepending that to the input before doing the forward BWT. ;-)

For the other implementations, there is no inverse at all.

>> I was trying to avoid O(n^2) RAM usage. :-}
>>
>
> Note that for ByteStrings, this takes only O(n) RAM because the
> substrings are shared. But for lists, this would take O(n^2) RAM since
> (take n) cannot share hole sublists.

Presumably that's only true for *strict* ByteStrings?

(I still don't actually understand how lazy bytestrings are any
different to normal lists - or why they produced such a massive
performance increase...)

> An O(n) choice for lists that
> doesn't recalculate ++ all the time would be
>
> bwt :: Ord a => [a] -> [a]
> bwt xs = map last . sortBy (compare `on` (take n)) $ rotations
> where
> n = length xs
> rotations = take n . tails $ xs ++ xs
>
> with the well-known
>
> on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c)
> on g f x y = g (f x) (f y)
>

Interesting... But how does it perform? ;-)

Andrew Coppin

unread,

Jun 23, 2007, 6:44:45 AM6/23/07

to haskel...@haskell.org

Bulat Ziganshin wrote:
> Hello Andrew,

>
> bwt transformation is very good researched area, so probably you will
> not get decent performance (megabytes per second) without lot of work.
>

Hey, I'm just glad I managed to get within striking distance of Mr C++.
So much for Haskell being "inherently less performant". :-P

> and of course, no Haskell at all. take look at
> http://darchiver.narod.ru/dark/Archon3fs.zip
>

Mmm, ok...

Bulat Ziganshin

unread,

Jun 23, 2007, 7:39:21 AM6/23/07

to Andrew Coppin, haskel...@haskell.org

Hello Andrew,

Saturday, June 23, 2007, 2:45:01 PM, you wrote:

> Hey, I'm just glad I managed to get within striking distance of Mr C++.
> So much for Haskell being "inherently less performant". :-P

my little analysis says that it's probably due to different sort()
implementations, so this says nothing about general case

--
Best regards,
Bulat mailto:Bulat.Z...@gmail.com

_______________________________________________

Andrew Coppin

unread,

Jun 23, 2007, 8:02:35 AM6/23/07

to haskel...@haskell.org

Bulat Ziganshin wrote:
> Hello Andrew,
>
> Saturday, June 23, 2007, 2:45:01 PM, you wrote:
>
>
>> Hey, I'm just glad I managed to get within striking distance of Mr C++.
>> So much for Haskell being "inherently less performant". :-P
>>
>
> my little analysis says that it's probably due to different sort()
> implementations, so this says nothing about general case
>

The point being that lots of people look at Haskell and go "oh, that's
very cute for writing trivial example code, but it can never be fast;
for that you must use C or C++".

Well, I altered the code, and it's *still* very short and very readable,
and it's just as fast as the (3 pages long) C++ version. >:-D

Jon Harrop

unread,

Jun 23, 2007, 9:35:39 AM6/23/07

to haskel...@haskell.org

On Saturday 23 June 2007 13:02:54 Andrew Coppin wrote:
> Well, I altered the code, and it's *still* very short and very readable,
> and it's just as fast as the (3 pages long) C++ version. >:-D

Indeed. The performance of modern functional programming languages never
ceases to amaze me. INRIA typically claim performance within 2x of C for
OCaml but there are very few programs where OCaml isn't within 50% now, at
least on AMDs.

Most of the pedagogical examples of functional programming (e.g. n-queens) are
too academic for general consumption but there are plenty of excellent
examples. Burrows-Wheeler is a great one. Ray tracing is another. Sudoku
solving is fairy good, albeit unusually well suited to arrays.

I also like to compare the performance of term-level interpreters or rewriters
implemented in different languages. I think it would be interesting to
compare Scheme/Mathematica interpreters written in OCaml, SML (MLton) and
Haskell, for example. I've benchmarked some interpreters in OCaml and SML
(MLton) and MLton's whole-program optimizations are a huge benefit here,
making it several times faster than OCaml. I'd like to know how well Haskell
would do.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e

Bulat Ziganshin

unread,

Jun 23, 2007, 10:45:23 AM6/23/07

to Andrew Coppin, haskel...@haskell.org

Hello Andrew,

Saturday, June 23, 2007, 4:02:54 PM, you wrote:
> The point being that lots of people look at Haskell and go "oh, that's
> very cute for writing trivial example code, but it can never be fast;
> for that you must use C or C++".

and that's true :) as i said, your C++ code is very far from perfect.
by comparing with bad code you can find that even Logo is faster than
asm

--
Best regards,
Bulat mailto:Bulat.Z...@gmail.com

_______________________________________________

Bertram Felgenhauer

unread,

Jun 23, 2007, 5:29:28 PM6/23/07

to haskel...@haskell.org

Andrew Coppin wrote:
> apfelmus wrote:
> >Note that the one usually adds an "end of string" character $ in the
> >Burrows-Wheeler transform for compression such that sorting rotated
> >strings becomes sorting suffices.
> >
>
> Yeah, I noticed that the output from by program can never actually be
> reverted to its original form. ;-) Maybe it I alter the code to stick a
> 0-byte in there or something...

To recover the original string you either need to store the offset to
the first character or add a sentinel character to mark the string end.
One advantage of the sentinel character approach is that it's sufficient
to sort just the suffixes of the text. A disadvantage is that you need
an additional character. The code below reserves \0 for this purpose.

>>> Code: >>>

"bwt" implements a variation of the Burrows-Wheeler transform, using
\0 as a sentinel character for simplicity. The sentinel has to be smaller
than all other characters in the string.

> bwt xs = let
> suffixes = [(a,as) | a:as <- tails ('\0':xs)]
> in
> map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes

"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.

> rbwt xs = let
> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
> in
> tail . map snd . zip xs $ head res

"zipWith'" is a variant of zipWith that asserts that the third argument
has the same shape as the second one.

> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

<<< End Code <<<

Both transforms use linear memory (but quite a lot, admittedly).

regards,

Bertram

Chris Kuklewicz

unread,

Jun 23, 2007, 10:24:59 PM6/23/07

to Bertram Felgenhauer, haskel...@haskell.org

I enjoy code like this that requires laziness. My modified version of your code
is below...

Bertram Felgenhauer wrote:
>>>> Code: >>>
>
> "bwt" implements a variation of the Burrows-Wheeler transform, using
> \0 as a sentinel character for simplicity. The sentinel has to be smaller
> than all other characters in the string.
>
>> bwt xs = let
>> suffixes = [(a,as) | a:as <- tails ('\0':xs)]
>> in
>> map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes
>
> "rbwt" implements the corresponding inverse BWT. It's a fun knot tying
> exercise.
>
>> rbwt xs = let
>> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
>> in
>> tail . map snd . zip xs $ head res
>
> "zipWith'" is a variant of zipWith that asserts that the third argument
> has the same shape as the second one.
>
>> zipWith' f [] _ = []
>> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
>
> <<< End Code <<

I did not like the look of (map snd . zip xs) since it looks like a no-op (that
constructs a useless (,) which may or may not be elided by a sufficiently smart
compiler). But it is using the fact that xs is finite and (head res) is not to
do "take (length xs) $ head res" without the extra traversal and math. But one
can abuse (zipWith' (flip const)) for a 'rbwt' appeals to me more:

> import Data.List
>
> f `on` g = \x y -> (g x) `f` (g y)

>
> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
>

> bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)
>
> rbwt xs = tail $ zipWith' (flip const) xs $ head res
> where res = sortBy (compare `on` head) (zipWith' (:) xs res)

While I was removing (,) from the 'rbwt' I went ahead and removed it from 'bwt'
as well. This was, of course, pointless...

Thanks for the fun example,
Chris

Felipe Almeida Lessa

unread,

Jun 23, 2007, 10:26:56 PM6/23/07

to Bertram Felgenhauer, haskel...@haskell.org

On 6/23/07, Bertram Felgenhauer <bertram.f...@googlemail.com> wrote:
> "rbwt" implements the corresponding inverse BWT. It's a fun knot tying
> exercise.
>
> > rbwt xs = let
> > res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
> > in
> > tail . map snd . zip xs $ head res

Indeed it's very fun =). Although I must admit that, even after
staring at the code for some time, I still don't see how it works. Is
it too much to ask for a brief explanation?

I see that zipWith' creates all the lazy list without requiring any
knowledge of the second list, which means that the list to be sorted
is created nicely. But when sortBy needs to compare, say, the first
with the second items, it forces the thunk of the lazy list. But that
thunk depends on the sorted list itself!

What am I missing? ;)

Thanks in advance,

--
Felipe.

Chris Kuklewicz

unread,

Jun 23, 2007, 10:56:30 PM6/23/07

to Felipe Almeida Lessa, haskel...@haskell.org

Felipe Almeida Lessa wrote:
> On 6/23/07, Bertram Felgenhauer <bertram.f...@googlemail.com> wrote:
>> "rbwt" implements the corresponding inverse BWT. It's a fun knot tying
>> exercise.
>>
>> > rbwt xs = let
>> > res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs
>> res)
>> > in
>> > tail . map snd . zip xs $ head res
>
> Indeed it's very fun =). Although I must admit that, even after
> staring at the code for some time, I still don't see how it works. Is
> it too much to ask for a brief explanation?
>
> I see that zipWith' creates all the lazy list without requiring any
> knowledge of the second list, which means that the list to be sorted
> is created nicely. But when sortBy needs to compare, say, the first
> with the second items, it forces the thunk of the lazy list. But that
> thunk depends on the sorted list itself!
>
> What am I missing? ;)
>
> Thanks in advance,
>

I just read and understood it. I'll walk through my version:

> import Data.List
>
> f `on` g = \x y -> (g x) `f` (g y)
>

> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
>

> bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)
>
> rbwt xs = tail $ zipWith' (flip const) xs $ head res

> where res = sortBy (compare `on` head) (zipWith' (:) xs res)

The key is that sort and sortBy are very very lazy.

'res' is a finite list of infinite strings.

Consider sorting such a finite list of infinite lists without
the recursive definition:

> example =
> let a = [1,3..]
> b = [1,2..]
> c = [2,4..]
> d = [1,4..]
> sorted = sort [a,b,c,d]
> firstFiveOfEach = map (take 5) sorted
> in mapM_ print firstFiveOfEach

Which in ghci produces:

> *Main> example
> [1,2,3,4,5]
> [1,3,5,7,9]
> [1,4,7,10,13]
> [2,4,6,8,10]

Now for a sidetrack:

> example =
> let a = [1,3..]
> b = [1,2..]
> c = [2,4..]
> d = [1,4..]
> sorted = sort [a,b,c,d,c]
> firstFiveOfEach = map (take 5) sorted
> in mapM_ print firstFiveOfEach

Produces the first three elements before 'c' ....

> *Main> example
> [1,2,3,4,5]
> [1,3,5,7,9]
> [1,4,7,10,13]

..and then hangs because "compare c c" never terminates.

This lazy sorting is also why "head . sort $ foo" is an O(N) way to get the
minimum of a list "foo" in Haskell.

So the code for rbwt depends on this lazy sorting behavior. The recursive
nature of the definition is the difficult to (directly) invent part.

The result of (zipWith' (:) xs ___) is a list the same length as 'xs' and where
each item is a list with the head item taken from 'xs'. So it has "primed the
pump" of the recursive definition by giving the first Char in each of the strings.

Compare with the much simpler 'ones' and 'odds':

ones = 1 : ones
odds = 1 : map (2+) odds

What about the tails of all these lists? They come from res. And res comes
from sorting the results of zipWith'. But the sorting is done only by looking
at the head element, which is the one that comes from xs. That is the "sortBy
(\(a:as) (b:bs) -> a `compare` b)" or "sortBy (compare `on` head)".

So the sorting routine does not care about the ___ in (zipWith' (:) xs ___) at
all, since it only examines the head which comes from xs.

Bertram Felgenhauer

unread,

Jun 24, 2007, 1:22:45 AM6/24/07

to haskel...@haskell.org

Felipe Almeida Lessa wrote:
> On 6/23/07, Bertram Felgenhauer <bertram.f...@googlemail.com> wrote:
> >"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
> >exercise.
> >
> >> rbwt xs = let
> >> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
> >> in
> >> tail . map snd . zip xs $ head res
>
> Indeed it's very fun =). Although I must admit that, even after
> staring at the code for some time, I still don't see how it works. Is
> it too much to ask for a brief explanation?
>
> I see that zipWith' creates all the lazy list without requiring any
> knowledge of the second list, which means that the list to be sorted
> is created nicely. But when sortBy needs to compare, say, the first
> with the second items, it forces the thunk of the lazy list. But that
> thunk depends on the sorted list itself!
>
> What am I missing? ;)

Mainly the precise effect of the irrefutable pattern in zipWith',
I think:

> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

The ~(y:ys) does not pattern match immediately; instead it asserts
that the argument will have a (:) constructor later, when either
y or ys are forced. The last line is equivalent to

> zipWith' f (x:xs) ys' = let (y:ys) = ys' in f x y : zipWith' f xs ys

which may be easier to understand.

The end effect is that

> map head $ zipWith' (:) [1..3] undefined

returns [1..3], never touching the "undefined" value at all.

Trying

> zipWith' [1..] [[]]

is also instructive:

[[1],[2*** Exception: bwt.hs:(18,0)-(19,51): Irrefutable pattern failed for pattern (y : ys)

Note the point where the error happens.

The second thing is that the comparison in "rbwt" only compares
the first character from the result lists - that is, it really
only sorts the list xs, annotated with some additional information
that is not available yet (isn't lazy evaluation great?).

Once the sorting is done, the ~(y:ys) pattern matches of "zipWith'"
all succeed. The effect is that a single circular list is formed
from the characters in xs, with each rotation of the list occuring
exactly once. In fact we get all rotations of the original text, in
sorted order. This is by no means obvious, but is how the inverse
BWT works.

Hope that helps,

Bertram

P.S. Thanks Chris for your improvements :)