[racket] Randomized bisimulation tests and Metropolis-Hastings random walk

Neil Toronto

unread,

Jan 4, 2013, 4:45:39 PM1/4/13

to us...@racket-lang.org

I get excited about applying statistics to programming. Here's something
exciting I found today.

I'm working on a Typed Racket implementation of Chris Okasaki's purely
functional random-access lists, which are O(1) for `cons', `first' and
`rest', and basically O(log(n)) for random access. I wanted solid
randomized tests, and I figured the best way would be bisimulation: do
exactly the same, random thing to a (Listof Integer) and an (RAList
Integer), then ensure the lists are the same. Iterate N times, each time
using the altered lists for the next bisimulation step.

There's a problem with this, though: the test runtimes are all over the
place for any fixed N, and are especially wild with large N. Sometimes
it first does a bunch of `cons' operations in a row. Because each
bisimulation step is O(n) in the list length (to check equality), this
makes the remaining steps very slow. Sometimes it follows up with a
bunch of `rest' operations, which makes the remaining steps very fast.

More precisely, random bisimulation does a "simple random walk" over
test list lengths, where `cons' is a step from n to n+1 and `rest' is a
step from n to n-1. Unfortunately, the n's stepped on in a simple random
walk have no fixed probability distribution, and thus no fixed average
or variance. That means each bisimulation step has no fixed average
runtime or fixed variation in runtime. Wildness ensues.

One possible solution is to generate fresh test lists from a fixed
distribution, for each bisimulation step. This is a bad solution,
though, because I want to see whether RAList instances behave correctly
*over time* as they're operated on.

The right solution is to use a Metropolis-Hastings random walk instead
of a simple random walk, to sample list lengths from a fixed
distribution. Then, the list lengths will have a fixed average, meaning
that each bisimulation step will have a fixed average runtime, and my
overall average test runtime will be O(N) instead of wild and crazy.

First, I choose a distribution. The geometric family is good because
list lengths are nonnegative. Geometric with p = 0.05 means list lengths
should be (1-p)/p = 19 on average, but are empty with probability 0.05
and very large with very low probability.

So I add these two lines:

(require math/distributions)

(define length-dist (geometric-dist 0.05))

In a Metropolis-Hastings random walk, you step from n to n' with
probability min(1,P(n')/P(n)); this is called "accepting" the step.
Otherwise, you "reject" the step by staying in the same spot. Adding
this new test takes only two defines and a `cond':

(define r (random 5)) ; Randomly choose an operation
(cond
[(= r 0)
;; New! Only cons with probability P(n+1)/P(n)
(define n (length lst))
(define accept (/ (pdf length-dist (+ n 1))
(pdf length-dist n)))
(cond [((random) . < . accept)
.... Accept: cons and ensure continued equality ....]
[else
.... Reject: return the same lists ....])]
[(= r 1)
.... Accept: `rest' each list and ensure continued equality ....]
.... Other random operations ....)

I left off the acceptance test for `rest' because P(n-1) > P(n) ==>
P(n-1)/P(n) > 1, so the test would always pass. (This is because the
geometric distribution's pdf is monotone. If I had chosen a Poisson
distribution, which has a bump in the middle, I'd have to do the
acceptance test for both `cons' and `rest'.)

I instrument the test loop to record list lengths, verify that their
mean is about 19, and do some density plots vs. the geometric
distribution. It all looks good: the randomized bisimulation test does
indeed operate on lists whose length is distributed Geometric(0.05), so
the overall average test runtime is O(N), and it still tests them as
they're operated on over time.

Neil ⊥
____________________
Racket Users list:
http://lists.racket-lang.org/users

Sam Tobin-Hochstadt

unread,

Jan 4, 2013, 4:54:39 PM1/4/13

to Neil Toronto, us...@racket-lang.org

Have you tried testing your implementation against either David Van
Horn's implementation (not in TR) [1], or Hari's implementation (in
TR) [2]? Or against VLists (also in Hari's PFDs package)?

[1] https://github.com/dvanhorn/ralist
[2] https://github.com/takikawa/tr-pfds/tree/master/data/ralist

On Fri, Jan 4, 2013 at 4:45 PM, Neil Toronto <neil.t...@gmail.com> wrote:
>
> I'm working on a Typed Racket implementation of Chris Okasaki's purely
> functional random-access lists, which are O(1) for `cons', `first' and
> `rest', and basically O(log(n)) for random access. I wanted solid randomized
> tests, and I figured the best way would be bisimulation: do exactly the
> same, random thing to a (Listof Integer) and an (RAList Integer), then
> ensure the lists are the same. Iterate N times, each time using the altered
> lists for the next bisimulation step.

Robby Findler

unread,

Jan 4, 2013, 5:06:33 PM1/4/13

to Neil Toronto, us...@racket-lang.org

Very cool! I've run into this problem a few times.

And here's an example where it happens right now ....

http://drdr.racket-lang.org/26021/collects/tests/future/random-future.rkt

Robby

Neil Toronto

unread,

Jan 4, 2013, 5:42:51 PM1/4/13

to Robby Findler, us...@racket-lang.org

On 01/04/2013 03:06 PM, Robby Findler wrote:
> Very cool! I've run into this problem a few times.
>
> And here's an example where it happens right now ....
>
> http://drdr.racket-lang.org/26021/collects/tests/future/random-future.rkt

Yeah, your problem is very similar. The difference is that in my case, I
had averages and variance that changed with the number of tests I wanted
to run. If I understand correctly how Redex generates terms, you're
getting independent, random programs from a fixed distribution. It just
happens that the distribution of program sizes has a *lot* of variance -
it has a "fat tail".

I'd try simple rejection sampling first:

(define gen-exp (let ([f (generate-term fut exp)])
(λ ()
(define t (f 10))
(cond [(> (length (flatten t)) n) (gen-exp)]
[else t]))))

When I had n = 100 and a `printf' that printed expression sizes, I got
output like this from running (gen-exp):

375
5092
45318
2331
2184
20

I didn't see more than 13 rejections in about 30 samples (most were
around 3-6), so the rejection loop seems fast enough.

If you actually want unbounded program sizes, there are other rejection
methods that'll thin the tail of your distribution without zeroing it
out. But you probably don't want that.

Robby Findler

unread,

Jan 4, 2013, 5:56:13 PM1/4/13

to Neil Toronto, us...@racket-lang.org

So maybe I should just have a smaller size bound? (the 10 in the 'gen-exp' function)

Really, perhaps, what I should do is not have a smaller size bound but instead just make the number of tests I do (32 in the currently pushed thing) depend on how long I've been testing.

Robby

Ray Racine

unread,

Jan 4, 2013, 7:50:16 PM1/4/13

to Neil Toronto, us...@racket-lang.org

MCMC with Gibbs Sampling and MH for straightforward stuff is straightforward, but the subtitles of underflow, use log space or not etc are something you guys know quite a bit more about than I do.

FWIW, a few months ago I was doing some custom Racket coded for straightforward Gibbs (mainly) and MH in one case for customer related data analysis, but the water gets deep quickly beyond the straightforward, and one quickly questions the validity of their custom sampler against considerations of a proven MCMC library. I did a brief amount research on the current state of things with regards to WinBugs, OpenBugs, Jags, ... After a quick overview, I sort of narrowed things down to a relatively new arrival, STAN. https://code.google.com/p/stan/

It's a new rewrite, based on a new sampling algorithm variation where existent libraries are getting long in the tooth. It generates the sampling code rather then interprets BUGS DSL code and claims to take some pains to support FFI binding (R of course) and embedding.

Just wanted to mention STAN if you haven't run across it yet, as opposed to well know standby MCMC libs.

One day ... an FFI Racket <-> STAN would be very cool.

Neil Toronto

unread,

Jan 4, 2013, 8:19:16 PM1/4/13

to Ray Racine, us...@racket-lang.org

My doctoral research is to beat the crap out of STAN, WinBUGS, etc.,
etc., by leveraging information you can only get by writing a compiler
for a language with a formal definition.

There's no Bayesian DSL that can answer queries with arbitrary
conditions like "X^2+Y^2 = 1.2". MCMC fails spectacularly. In fact, you
can't even reason about those using Bayes' law for densities. But I've
got a backend implementation based on a measure-theoretic semantics that
can sample from distributions with conditions like that. :D

I needed a Typed Racket implementation of random-access lists to make it
scale well for models with more than 20 random variables or so.

Neil ⊥

On 01/04/2013 05:50 PM, Ray Racine wrote:
> MCMC with Gibbs Sampling and MH for straightforward stuff is
> straightforward, but the subtitles of underflow, use log space or not
> etc are something you guys know quite a bit more about than I do.
>
> FWIW, a few months ago I was doing some custom Racket coded for
> straightforward Gibbs (mainly) and MH in one case for customer related
> data analysis, but the water gets deep quickly beyond the
> straightforward, and one quickly questions the validity of their custom
> sampler against considerations of a proven MCMC library. I did a brief
> amount research on the current state of things with regards to WinBugs,
> OpenBugs, Jags, ... After a quick overview, I sort of narrowed things
> down to a relatively new arrival, STAN. https://code.google.com/p/stan/
>
> It's a new rewrite, based on a new sampling algorithm variation
> where existent libraries are getting long in the tooth. It generates
> the sampling code rather then interprets BUGS DSL code and claims to
> take some pains to support FFI binding (R of course) and embedding.
>
> Just wanted to mention STAN if you haven't run across it yet, as
> opposed to well know standby MCMC libs.
>
> One day ... an FFI Racket <-> STAN would be very cool.
>
>
> On Jan 4, 2013 4:47 PM, "Neil Toronto" <neil.t...@gmail.com

> http://lists.racket-lang.org/__users
> <http://lists.racket-lang.org/users>

Jeremiah Willcock

unread,

Jan 4, 2013, 8:39:58 PM1/4/13

to Neil Toronto, us...@racket-lang.org

How long are these lists going to be, and how many insertions/deletions
will occur relative to the number of element lookups and in-place updates?
Would it make sense to consider other random-access data structures (in
particular, growable vectors) instead of lists?

-- Jeremiah Willcock

Neil Toronto

unread,

Jan 4, 2013, 8:44:24 PM1/4/13

to Jeremiah Willcock, us...@racket-lang.org

I expect lookups and in-place updates with the most frequency. The other
operation I need to be fast is head or tail insertion (it doesn't matter
which), but not nearly as much as the other two. Have I picked the right
data structure?

Neil ⊥

Jeremiah Willcock

unread,

Jan 4, 2013, 8:50:20 PM1/4/13

to Neil Toronto, us...@racket-lang.org

On Fri, 4 Jan 2013, Neil Toronto wrote:

> I expect lookups and in-place updates with the most frequency. The other
> operation I need to be fast is head or tail insertion (it doesn't matter
> which), but not nearly as much as the other two. Have I picked the right data
> structure?

How long are the lists going to be? What about a double-ended queue
(deque), which doesn't appear to be in Racket or Planet, or even something
similar to the two-stack structure used to simulate Turing machine tapes
on pushdown automata with two stacks (i.e., have two growable vectors, one
for the first part of the list backwards, and a second for the second part
forward)?

-- Jeremiah Willcock

Neil Toronto

unread,

Jan 5, 2013, 7:31:11 PM1/5/13

to Sam Tobin-Hochstadt, us...@racket-lang.org

Thanks for the pointers!

I can't use David Van Horn's because it's untyped, and I need to use
this in Typed Racket. (No polymorphic, opaque types, and if I use the
`struct' form in `require/typed', I'd get the performance problems I
just wrote about on the dev list.)

Last time I checked Hari's RAList implementation, it was broken, so I
started my own from scratch (which is what I worked from just now). He's
fixed it, though. I could run some benchmarks, but I'm sure mine's a
little faster. (I use index types for static guarantees and to get TR to
optimize fixnum ops.) His has more functions, though...

Would a VList have fast functional update? I don't see a `list-set'
exported, and I'm not familiar with the data structure.

Neil ⊥

Sam Tobin-Hochstadt

unread,

Jan 6, 2013, 10:38:55 AM1/6/13

to Neil Toronto, us...@racket-lang.org

On Sat, Jan 5, 2013 at 7:31 PM, Neil Toronto <neil.t...@gmail.com> wrote:
> Thanks for the pointers!
>
> I can't use David Van Horn's because it's untyped, and I need to use this in
> Typed Racket.

Well, you might consider porting his code to Typed Racket. I hear it
was designed for that. ;)

> Last time I checked Hari's RAList implementation, it was broken, so I
> started my own from scratch (which is what I worked from just now). He's
> fixed it, though. I could run some benchmarks, but I'm sure mine's a little
> faster. (I use index types for static guarantees and to get TR to optimize
> fixnum ops.) His has more functions, though...

Is the use of index types a fundamental refactoring, or something that
could be added to Hari's code?

I ask because having 3+ different functional random-access data
structures seems like a loss for Racket code interoperability over
all.

> Would a VList have fast functional update? I don't see a `list-set'
> exported, and I'm not familiar with the data structure.

Certainly the data structure has fast functional update, I'm not
certain about the code in `tr-pfds`. The data structure is described
here: http://en.wikipedia.org/wiki/VList; see in particular the paper
by Bagwell cited in the references. It might also be worth looking at
the Hash-array Mapped Trie, currently used in Scala and Clojure, and
the RRB-tree, described here:
http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf .

Sam

Matthias Felleisen

unread,

Jan 6, 2013, 12:46:29 PM1/6/13

to Sam Tobin-Hochstadt, us...@racket-lang.org

On Jan 6, 2013, at 10:38 AM, Sam Tobin-Hochstadt wrote:

> Is the use of index types a fundamental refactoring, or something that
> could be added to Hari's code?
>
> I ask because having 3+ different functional random-access data
> structures seems like a loss for Racket code interoperability over
> all.

Amen.

Neil Toronto

unread,

Jan 7, 2013, 12:26:31 PM1/7/13

to Sam Tobin-Hochstadt, us...@racket-lang.org

On 01/06/2013 08:38 AM, Sam Tobin-Hochstadt wrote:
> On Sat, Jan 5, 2013 at 7:31 PM, Neil Toronto <neil.t...@gmail.com> wrote:
>> Last time I checked Hari's RAList implementation, it was broken, so I
>> started my own from scratch (which is what I worked from just now). He's
>> fixed it, though. I could run some benchmarks, but I'm sure mine's a little
>> faster. (I use index types for static guarantees and to get TR to optimize
>> fixnum ops.) His has more functions, though...
>
> Is the use of index types a fundamental refactoring, or something that
> could be added to Hari's code?

No, not fundamental. Their use isn't always straightforward, though.
(The tricky changes are things like (<= i n) to (let ([new-i (- i n)])
(<= new-i 0)).) I'll send a patch or a pull request.

> I ask because having 3+ different functional random-access data
> structures seems like a loss for Racket code interoperability over
> all.