Transparent results cache

Mark T. B. Carroll

Nov 18, 2009, 1:29:07 PM

I wrote something that is possibly a bit horrifying. I seem to have got
away with it for now. I don't normally use things like unsafePerformIO
but it occurred to me that I could write a kind of transparent results
cache for functions. I'd certainly be interested in comments about how
I screwed it up or shouldn't even be thinking such awful things. At
least, given its apparent generality, it seemed worth sharing to find
out how others had handled this kind of issue.

The Nothing case in useCache may not be self-explanatory. When it
comes time to write a new argument into the round-robin array of
arguments we've seen, by which we expire things in a sort of FIFO way,
we also remove the old cached result from the results map. fromMaybe
may be too opaque a way to achieve the conditional removal.
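
To make the conditional removal concrete, here is a minimal sketch of
that one step as a standalone pure function. The names dropExpired and
dropExpired' are made up for illustration and are not part of the
posted code; the first spells out both cases, the second is a shorter
equivalent using maybe.

```haskell
import qualified Data.Map as Map

-- Hypothetical helper (not in the posted code): given the argument that
-- previously occupied the round-robin slot, if any, drop its cached result.
dropExpired :: Ord a => Maybe a -> Map.Map a b -> Map.Map a b
dropExpired Nothing        m = m                     -- slot was empty: nothing to evict
dropExpired (Just oldArgs) m = Map.delete oldArgs m  -- slot was in use: evict its result

-- The same function in one line; it behaves like
-- fromMaybe id (liftM Map.delete expiredArgs).
dropExpired' :: Ord a => Maybe a -> Map.Map a b -> Map.Map a b
dropExpired' = maybe id Map.delete
```

Either form could stand in for the fromMaybe expression, with
expiredArgs as the first argument and results cache as the second.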

(I suppose it could be more keen to retain recently-consulted results,
but that might slow it down too much. And perhaps I could have been
cleverer with the counter to avoid the Maybe. Also, it probably acts
poorly with parallelism. I wonder if the NOINLINE should instead be
on the user of cached ... this is a bit of a new direction for me.)
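
On the parallelism worry, one possible direction is to make the
lookup-and-insert a single atomic step with atomicModifyIORef' from
Data.IORef. This is a sketch under assumptions, not something tested
against the code in this post: memoAtomic is a hypothetical name, and
the sketch drops the size bound and the expiry array entirely, so it
is an unbounded cache rather than a replacement for cached.

```haskell
import Data.IORef
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical unbounded memoizer (no expiry, unlike `cached`).
-- The read of the map and the insertion of a new result happen inside one
-- atomicModifyIORef' call, so two threads cannot overwrite each other's
-- insertions between a read and a write.
memoAtomic :: Ord a => (a -> b) -> a -> b
memoAtomic f = unsafePerformIO $ do
    ref <- newIORef Map.empty
    return $ \x -> unsafePerformIO $
        atomicModifyIORef' ref $ \m ->
            case Map.lookup x m of
              Just y  -> (m, y)                          -- hit: map unchanged
              Nothing -> let y = f x
                         in (Map.insert x y m, y)        -- miss: store and return
```

Keeping the round-robin array would need more than this, since an
IOArray cannot be updated inside atomicModifyIORef'; guarding the whole
cache with an MVar would be one alternative.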

Mark

import Control.Monad (liftM)
import Data.Array.IO
import Data.IORef
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)
import System.IO.Unsafe (unsafePerformIO)

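data Cache a b = Cache
    {
      counter :: Int,                    -- next slot of expiry to overwrite
      size    :: Int,                    -- maximum number of cached results
      expiry  :: IOArray Int (Maybe a),  -- round-robin record of cached arguments
      results :: Map.Map a b             -- cached results, keyed by argument
    }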

{-# NOINLINE cached #-}

cached :: Ord a => (a -> b) -> Int -> a -> b

cached f size =
    unsafePerformIO $
    do expiry <- newArray (0, pred size) Nothing
       cacheRef <- newIORef (Cache 0 size expiry Map.empty)
       return $ useCache cacheRef f

useCache :: Ord a => IORef (Cache a b) -> (a -> b) -> a -> b

useCache cacheRef f args =
    unsafePerformIO $
    do cache <- readIORef cacheRef
       case Map.lookup args (results cache) of
         Just result ->
             return result
         Nothing ->
             -- Cache miss: take over the next round-robin slot, evicting
             -- the result of whichever argument occupied it before.
             do expiredArgs <- readArray (expiry cache) (counter cache)
                writeArray (expiry cache) (counter cache) (Just args)
                let expiredResults = fromMaybe id (liftM Map.delete expiredArgs) $ results cache
                let result = f args
                writeIORef cacheRef $ cache { counter = mod (succ (counter cache)) (size cache),
                                              results = Map.insert args result expiredResults }
                return result

Mark T. B. Carroll

Nov 20, 2009, 11:14:36 AM

"Mark T. B. Carroll" <Mark.C...@Aetion.com> writes:

> I wonder if the NOINLINE should instead be on the user of cached

> [ snip code ]

I did indeed grow doubtful about where I'd put it and have tried moving
it around and thinking about it more.

Still, now I'm getting the occasional segmentation fault from my code.
I'm not knowingly using threading or concurrency, unless
Control.Parallel.Strategies.rnf does it behind the scenes in a way that
doesn't need RTS options to activate, so I wonder if I've somehow
misused unsafePerformIO or IORef here.

Mark

Dan Doel

Nov 20, 2009, 5:13:34 PM

to

Mark T. B. Carroll

Nov 20, 2009, 5:23:42 PM

Dan Doel <dan....@gmail.com> writes:

> You may be interested in this paper:
>
> http://research.microsoft.com/en-us/um/people/simonpj/papers/weak.ps.gz

Huh, yes, I seem good at reinventing wheels. (The lessons stick with me
better that way though!) Thank you. Some of that code looks rather familiar!

> And, I do think you probably want NOINLINE on the call of cached (otherwise
> the call may be inlined, and you may end up with multiple separately
> memoized functions), but I'm not 100% sure. I'm not sure why that'd be
> causing segfaults.

Mmm. I'm hoping that it doesn't become enough of a problem that I have
to try to put together a simple test case that evinces the bug.
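
For reference, the placement being suggested (NOINLINE on the binding
that calls the memoizer, rather than on the memoizer itself) might look
like the sketch below. memo is a hypothetical, unbounded stand-in for
cached, kept short so that the pragma is the focus; square and
cachedSquare are likewise made-up names.

```haskell
import Data.IORef
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical stand-in for `cached`: no size limit, no expiry.
memo :: Ord a => (a -> b) -> a -> b
memo f = unsafePerformIO $ do
    ref <- newIORef Map.empty
    return $ \x -> unsafePerformIO $ do
        m <- readIORef ref
        case Map.lookup x m of
          Just y  -> return y
          Nothing -> do
              let y = f x
              writeIORef ref (Map.insert x y m)
              return y

square :: Integer -> Integer
square n = n * n

-- The pragma sits on the user of the memoizer. cachedSquare is written
-- without an argument on the left of (=), so `memo square` is a single
-- top-level value; NOINLINE asks GHC not to copy that expression into call
-- sites, where each copy would set up its own separate cache.
{-# NOINLINE cachedSquare #-}
cachedSquare :: Integer -> Integer
cachedSquare = memo square

-- By contrast, a definition such as
--     cachedSquare' x = memo square x
-- may allocate a new, empty cache on every call.
```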

Mark

Mark T. B. Carroll

Jan 22, 2010, 2:10:53 PM

I /think/ what's going on is that it happens when I cache a thunk
instead of a fully evaluated value. (Some of the cached functions use
unsafePerformIO themselves.) The catch is that, for some of the
functions whose values I want to cache, fully evaluating the value is
much more expensive, and often only a small part of it is consumed by
users, though I suppose that still beats segfaults. I don't have a good
story yet as to why this happens, though: it's not always at the same
point in the run.
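
One middle ground between caching a raw thunk and fully evaluating the
value is to force only the outermost constructor (weak head normal
form) before storing it. The sketch below shows that with evaluate
from Control.Exception on a plain IORef-and-Map cache;
lookupOrInsertWHNF is a hypothetical name, and nothing in this thread
establishes whether forcing to this depth would avoid the segfaults.

```haskell
import Control.Exception (evaluate)
import Data.IORef
import qualified Data.Map as Map

-- Hypothetical helper: look up x in the cache, computing and storing f x
-- on a miss. The result is forced to weak head normal form before it is
-- stored, so the map never holds a completely unevaluated `f x`, but the
-- inside of the value stays lazy.
lookupOrInsertWHNF :: Ord a => IORef (Map.Map a b) -> (a -> b) -> a -> IO b
lookupOrInsertWHNF ref f x = do
    m <- readIORef ref
    case Map.lookup x m of
      Just y  -> return y
      Nothing -> do
          y <- evaluate (f x)               -- outermost constructor only
          writeIORef ref (Map.insert x y m)
          return y
```

Because only the head is forced, this still works on results that are
large or even infinite, such as a lazily produced list; forcing deeper
(for example with rnf) would trade that laziness for full evaluation.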

Mark