[A bit of background: the whole point of "minimal changes to execution traces" is to increase acceptance probabilities of MCMC moves. There's nothing magical about this idea; it's just the observation that for most programs, it's going to be hard to craft complex proposals that change many variables at once and still maintain high acceptance rates. Reusing as much randomness as possible is an easy way to create a proposal that's a "small" change, which is likely to be accepted.]
A couple of thoughts on your example. First of all, there's nothing wrong with your
reasoning -- but I think it would be useful to separate a couple of concepts here. As written, your program definitely won't mix with a single-site MCMC sampler, as you've already noted. There are a couple of ways around that: 1) do
something more complex than single-site MCMC [blocked Gibbs, delayed
rejection, constraint propagation, some other complex proposal mechanism, etc.], or 2) soften the constraint that A+B=1, possibly by
adding noise (ie, condition on A+B+noise=1).
Solution #1 is a statement about an inference algorithm, not about how
you manage traces in a probabilistic program. Let's take the example of
block Gibbs, and suppose that what you've written below is just a small
part of a larger program. The idea of "minimal changes" still applies
in the block setting -- you make proposals to A and B simultaneously,
keep all of the other variables in the program the same, and then
accept/reject. So in that sense, the complexity of the proposal
mechanism is independent from the idea of re-running a trace and trying
to keep as much of the trace similar as possible.
Solution #2 is a statement about the conditioner. In general, hard
conditioning statements like the one you've written are what make
inference difficult. You've basically written a simple SAT problem (and
in fact, it's very easy to write SAT as a probabilistic program).
Sometimes, it's possible to take the constraints implied by the
conditioner and propagate them backwards through a program (we call this
"constraint propagation."). It's hard to do in general languages, and
even in languages like Lisp it isn't trivial. In your example program,
once you've resampled A, the constraint A+B=1 implies a value for B --
so you could just set B to be the correct value. You've reused "as much
randomness as possible", which is none -- but notice that you didn't
*sample* a new value for B. It was deterministically implied.
Constraint propagation is hard. Automatically identifying good
candidate block moves is hard. The "lightweight implementations" paper deals with neither issue;
those are largely language specific, and not the real point of our
work. But they're essential for real implementations.
By the way: several probabilistic programming language developers feel that constraint propagation won't ever really work for general programs, and that the only reliable solution is to "noise up" the model by ensuring that you never create hard conditioners. You can almost always accomplish this by adding some sort of fudge factor at the "bottom" of your program; this essentially has the effect of creating the "axially connected" regions of probability mass you alluded to.
Oh, and it's an interesting (and current) research question to figure out how to automatically self-relax programs -- ie, take a program with a hard conditioner and automatically inject noise to improve mixability...
HTH,
David