# Space leak

91 views

### Paul Rubin

Dec 12, 2012, 5:36:43 PM12/12/12
to
I had a space leak in a program that I distilled down to the following:

main = print \$ length (replicateM 15 "012")

This should generate all the 15-digit ternary numbers (with leading 0's)
so the length of the list is 3**15=14348907. It gets that result
correctly, but uses almost a gigabyte of ram in computing it.

As an exercise I wrote my own version of the List datatype and the monad
instance for it doing the obvious things, and the same behavior
happened. Writing a strict version of the length function using foldl1'
didn't help either.

It's not real obvious to me what's going on. Is there a simple
explanation why replicateM doesn't generate values lazily that are
consumed by length?

Thanks.

### Piotr Kalinowski

Dec 13, 2012, 7:58:52 AM12/13/12
to
replicateM could produce result lazily, but length still needs to
traverse the whole list in order to determine its length. This process
of traversing the list forces evaluation of all its contents.

In other words, length does not consume values returned by replicateM.
It consumes a single value: a list, and it needs to fully evaluate it's
structure.

Regards,
Piotr Kalinowski

### Paul Rubin

Dec 13, 2012, 11:40:41 AM12/13/12
to
Piotr Kalinowski <pit...@gmail.com> writes:
> In other words, length does not consume values returned by replicateM.
> It consumes a single value: a list, and it needs to fully evaluate it's
> structure.

I don't understand, the list it receives is in WHNF so it should just
be a single (:) constructor with a value and a thunk.

If I type "length [1..]" into ghci, it burns cpu but doesn't expand
in memory. What is different?

### Dirk Thierbach

Dec 13, 2012, 6:07:00 PM12/13/12
to
I didn't verify this in detail, but if you look at the definitions:

replicateM n x = sequence (replicate n x)

sequence ms = foldr k (return []) ms where
k m m' = do { x <- m; xs <- m'; return (x:xs) }

m >>= k = foldr ((++) . k) [] m

you see that there's a lot of (++) in the monadic composition in addition
to sharing of the result from m'. So my intuition is that all these
pile up as unevaluated expressions and use up memory.

If you'd implement the same algorithm more directly, something
along the lines of

count :: Int -> [a] -> [a] -> [[a]]
count 0 _ _ = [[]]
count k [] ys = []
count k (x:xs) ys = map (x:) (count (k-1) ys ys) ++ count k xs ys

count' k xs = count k xs xs

main = print \$ length \$ count' 15 "012"

i.e. without (++) and not sharing the recursive call, you should see
a decrease of memory usage. As I said, untested and just my intuition.
You're welcome to test and maybe add some strictness till it works
correctly. :-)

- Dirk

### Piotr Kalinowski

Dec 17, 2012, 8:18:24 AM12/17/12
to
Paul Rubin <no.e...@nospam.invalid> writes:

> If I type "length [1..]" into ghci, it burns cpu but doesn't expand
> in memory. What is different?

Hmm, true. I might be wrong. I'm not an expert here.

Regards,
Piotr Kalinowski

### Albert Y. C. Lai

Dec 24, 2012, 6:51:40 PM12/24/12
to
On 12-12-12 05:36 PM, Paul Rubin wrote:
> I had a space leak in a program that I distilled down to the following:
>
> main = print \$ length (replicateM 15 "012")
>
> This should generate all the 15-digit ternary numbers (with leading 0's)
> so the length of the list is 3**15=14348907. It gets that result
> correctly, but uses almost a gigabyte of ram in computing it.

This is very interesting. Experiment environment:

Ubuntu 12.04 x86 32-bit, GHC 7.4.2 from GHC website, use command line
option "+RTS -t" to count memory footprint.

The following program seems to take exponential memory when optimized,
constant memory when unoptimized (yes yes should be linear, but the
slope is so small):

import System.Environment(getArgs)
main = do
print (length (mysequence (replicate n "012")))
mysequence [] = return []
mysequence (m:ms) = do
x <- m
xs <- mysequence ms
return (x:xs)

(Commands:
ghc -fforce-recomp f1.hs
./f1 10 +RTS -t
play with other numbers

ghc -fforce-recomp -O f1.hs
./f1 10 +RTS -t
play with other numbers)

The following program seems to take exponential memory whether optimized
or not:

import System.Environment(getArgs)
main = do
print (length (mysequence (replicate n "012")))
mysequence ms = foldr k (return []) ms where
k m m' = do
x <- m
xs <- m'
return (x:xs)

Supplying homebrew foldrs does not seem to make a difference.

You are right in that the program should and can take just linear memory
under a standard lazy evaluation model.

### Wolfgang Ehrlich

Jan 8, 2013, 9:15:34 AM1/8/13
to
Just guessing: It may be the algorithm that causes the space leak.

To create the list of 2^15 elements the algorithm creates a sublist of 2^14 elements and uses this sublist three times to build the requested list. It is the sublist that occupies the memory.

Just a simple example for illustration:

burn_memory :: Int -> Int
burn_memory n =
let sublist = [1 .. n]
in length (sublist ++ sublist)

### Wolfgang Ehrlich

Jan 9, 2013, 3:15:00 AM1/9/13
to
Oops, I got something wrong: "2^15" should be "3^15" and
"2^14" should be "3^14". Sorry for the inconvenience.

Back to the problem. Calling

with different values of *n* yields:

n=0: [""]
n=1: ["0","1","2"]
n=2: ["00","01","02","10","11","12","20","21","22"]
n=3:
["000","001","002","010","011","012","020","021","022"
,"100","101","102","110","111","112","120","121","122"
,"200","201","202","210","211","212","220","221","222"
]

This clearly shows how each list is built by using the previous
list three times.

As a remedy, you might consider to devise an algorithm that
builds the lists differently, avoiding the multiple use of
complete intermediate lists, e.g. by using each list element
multiple times. The resulting lists would be as shown below:

n=0: [""]
n=1: ["0","1","2"]
n=2: ["00","10","20","01","11","21","02","12","22"]
n=3:
["000","100","200","010","110","210","020","120","220"
,"001","101","201","011","111","211","021","121","221"
,"002","102","202","012","112","212","022","122","222"
]