Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Space leak

92 views
Skip to first unread message

Paul Rubin

unread,
Dec 12, 2012, 5:36:43 PM12/12/12
to
I had a space leak in a program that I distilled down to the following:

main = print $ length (replicateM 15 "012")

This should generate all the 15-digit ternary numbers (with leading 0's)
so the length of the list is 3**15=14348907. It gets that result
correctly, but uses almost a gigabyte of ram in computing it.

As an exercise I wrote my own version of the List datatype and the monad
instance for it doing the obvious things, and the same behavior
happened. Writing a strict version of the length function using foldl1'
didn't help either.

It's not real obvious to me what's going on. Is there a simple
explanation why replicateM doesn't generate values lazily that are
consumed by length?

Thanks.

Piotr Kalinowski

unread,
Dec 13, 2012, 7:58:52 AM12/13/12
to
replicateM could produce result lazily, but length still needs to
traverse the whole list in order to determine its length. This process
of traversing the list forces evaluation of all its contents.

In other words, length does not consume values returned by replicateM.
It consumes a single value: a list, and it needs to fully evaluate it's
structure.

Regards,
Piotr Kalinowski

Paul Rubin

unread,
Dec 13, 2012, 11:40:41 AM12/13/12
to
Piotr Kalinowski <pit...@gmail.com> writes:
> In other words, length does not consume values returned by replicateM.
> It consumes a single value: a list, and it needs to fully evaluate it's
> structure.

I don't understand, the list it receives is in WHNF so it should just
be a single (:) constructor with a value and a thunk.

If I type "length [1..]" into ghci, it burns cpu but doesn't expand
in memory. What is different?

Dirk Thierbach

unread,
Dec 13, 2012, 6:07:00 PM12/13/12
to
I didn't verify this in detail, but if you look at the definitions:

replicateM n x = sequence (replicate n x)

sequence ms = foldr k (return []) ms where
k m m' = do { x <- m; xs <- m'; return (x:xs) }

instance Monad [] where
m >>= k = foldr ((++) . k) [] m

you see that there's a lot of (++) in the monadic composition in addition
to sharing of the result from m'. So my intuition is that all these
pile up as unevaluated expressions and use up memory.

If you'd implement the same algorithm more directly, something
along the lines of

count :: Int -> [a] -> [a] -> [[a]]
count 0 _ _ = [[]]
count k [] ys = []
count k (x:xs) ys = map (x:) (count (k-1) ys ys) ++ count k xs ys

count' k xs = count k xs xs

main = print $ length $ count' 15 "012"

i.e. without (++) and not sharing the recursive call, you should see
a decrease of memory usage. As I said, untested and just my intuition.
You're welcome to test and maybe add some strictness till it works
correctly. :-)

- Dirk

Piotr Kalinowski

unread,
Dec 17, 2012, 8:18:24 AM12/17/12
to
Paul Rubin <no.e...@nospam.invalid> writes:

> If I type "length [1..]" into ghci, it burns cpu but doesn't expand
> in memory. What is different?

Hmm, true. I might be wrong. I'm not an expert here.

Regards,
Piotr Kalinowski

Albert Y. C. Lai

unread,
Dec 24, 2012, 6:51:40 PM12/24/12
to
On 12-12-12 05:36 PM, Paul Rubin wrote:
> I had a space leak in a program that I distilled down to the following:
>
> main = print $ length (replicateM 15 "012")
>
> This should generate all the 15-digit ternary numbers (with leading 0's)
> so the length of the list is 3**15=14348907. It gets that result
> correctly, but uses almost a gigabyte of ram in computing it.

This is very interesting. Experiment environment:

Ubuntu 12.04 x86 32-bit, GHC 7.4.2 from GHC website, use command line
option "+RTS -t" to count memory footprint.

The following program seems to take exponential memory when optimized,
constant memory when unoptimized (yes yes should be linear, but the
slope is so small):

import System.Environment(getArgs)
main = do
n <- (read . head) `fmap` getArgs
print (length (mysequence (replicate n "012")))
mysequence [] = return []
mysequence (m:ms) = do
x <- m
xs <- mysequence ms
return (x:xs)

(Commands:
ghc -fforce-recomp f1.hs
./f1 10 +RTS -t
play with other numbers

ghc -fforce-recomp -O f1.hs
./f1 10 +RTS -t
play with other numbers)

The following program seems to take exponential memory whether optimized
or not:

import System.Environment(getArgs)
main = do
n <- (read . head) `fmap` getArgs
print (length (mysequence (replicate n "012")))
mysequence ms = foldr k (return []) ms where
k m m' = do
x <- m
xs <- m'
return (x:xs)

Supplying homebrew foldrs does not seem to make a difference.

You are right in that the program should and can take just linear memory
under a standard lazy evaluation model.

Wolfgang Ehrlich

unread,
Jan 8, 2013, 9:15:34 AM1/8/13
to
Just guessing: It may be the algorithm that causes the space leak.

To create the list of 2^15 elements the algorithm creates a sublist of 2^14 elements and uses this sublist three times to build the requested list. It is the sublist that occupies the memory.

Just a simple example for illustration:

burn_memory :: Int -> Int
burn_memory n =
let sublist = [1 .. n]
in length (sublist ++ sublist)

Wolfgang Ehrlich

unread,
Jan 9, 2013, 3:15:00 AM1/9/13
to
Oops, I got something wrong: "2^15" should be "3^15" and
"2^14" should be "3^14". Sorry for the inconvenience.

Back to the problem. Calling

Control.Monad.replicateM n "012"

with different values of *n* yields:

n=0: [""]
n=1: ["0","1","2"]
n=2: ["00","01","02","10","11","12","20","21","22"]
n=3:
["000","001","002","010","011","012","020","021","022"
,"100","101","102","110","111","112","120","121","122"
,"200","201","202","210","211","212","220","221","222"
]

This clearly shows how each list is built by using the previous
list three times.

As a remedy, you might consider to devise an algorithm that
builds the lists differently, avoiding the multiple use of
complete intermediate lists, e.g. by using each list element
multiple times. The resulting lists would be as shown below:

n=0: [""]
n=1: ["0","1","2"]
n=2: ["00","10","20","01","11","21","02","12","22"]
n=3:
["000","100","200","010","110","210","020","120","220"
,"001","101","201","011","111","211","021","121","221"
,"002","102","202","012","112","212","022","122","222"
]
0 new messages