systematic biases from DiagCovGaussian prior?


hawk...@gmail.com

May 16, 2020, 7:03:27 AM
to webppl-dev
Hello! 

I've run into some very puzzling webppl behavior, and I was wondering if anyone had insights?

I was running variational inference and noticed that certain responses were consistently preferred across different runs, even when they weren't supported by the data. After tracing it back, I realized that this systematic bias was there even before running Optimize -- it was somehow coming from the prior, and was exactly the same across lots of random seeds!

This was hard for me to understand, because the prior is just a DiagCovGaussian over matrix elements, which should place independent Gaussian priors on each of the elements. The values of interest are deterministic functions of multiple matrix elements (specifically, normalizing over several possible conjunctions of matrix elements), so in expectation, the prior predictive over these downstream values ought to be uniform, not systematically skewed.

The most puzzling part is that the expectations of the individual matrix cells don't differ dramatically under the prior, and neither do the expectations of the individual conjunctions. It's only after you normalize that the expectations become systematically biased. The bias also gets bigger when you increase the sigma on the DiagCovGaussian prior. At first I thought the expectation of the normalized values was just noisier, but it comes out almost exactly the same across runs (and I'm taking a lot of samples in the expectation!)

I've got a minimal working example here in <50 lines of code, which can be run on webppl.org, and would be grateful for any insights!

https://gist.github.com/hawkrobe/4c04410f7142d642f1abea98de6be499
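To give a flavor of the program without clicking through, here's a boiled-down sketch of the same idea (not the gist itself -- the variable names here are made up): two matrix cells get independent Gaussian priors, their sum plays the role of a conjunction, and everything is pushed through a softmax as in normalizeRow.

```
var sigma = 3;

var model = function() {
  // independent Gaussian prior on two matrix cells
  var cells = sample(DiagCovGaussian({
    mu: Vector([0, 0]),
    sigma: Vector([sigma, sigma])
  }));
  var x = T.get(cells, 0);
  var y = T.get(cells, 1);
  // softmax over the two primitives and their conjunction x+y
  var exps = map(function(s) { return Math.exp(s); }, [x, y, x + y]);
  return map(function(e) { return e / sum(exps); }, exps);
};

// prior-predictive expectations of the three normalized entries
var dist = Infer({method: 'forward', samples: 20000}, model);
display(map(function(i) {
  return expectation(dist, function(row) { return row[i]; });
}, [0, 1, 2]));
```

Even in this tiny version, the third (conjunction) entry comes out reliably below the other two.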

Thanks in advance,
Robert

PS if the model seems unmotivated, it may help to explain that it's an extremely generic-ified version of a pragmatic language model, where the matrix elements represent how different words load onto different features, and the conjunctions are objects with multiple features. The normalizeRow step represents a listener agent deciding which of the objects to pick after hearing a word. But none of that matters for the behavior. 

null-a

May 16, 2020, 11:48:36 AM
to webppl-dev
Hi Robert. I took a quick look, and honestly, I'm not sure what's going on. I don't see any problems with the code, and I noticed that switching `DiagCovGaussian` for `TensorGaussian` produces the same results. Another thing I noticed was that there are plenty of settings of `conjunctions` that do produce uniform results from `expectedProb`, e.g. `conjunctions = ['0_1', '2_3', '4_5']`.

> so in expectation, the prior predictive over these downstream values ought to be uniform, not systematically skewed.

It's not obvious to me that this is correct, though I admit I'm not sure I've fully understood what's happening. It feels like, depending on exactly how columns are shared between `conjunctions`, we might not necessarily expect `expectedProb` to be uniform across all `conjunctions`.

Robert Hawkins

May 16, 2020, 3:50:15 PM
to null-a, webppl-dev
Hi Paul -- thanks very much for your help! I think you're right that the non-uniformity is (somehow) coming from the sharing of columns, but I was having difficulty finding the precise pattern when playing with different sets of conjunctions. I'll try to work out the expected predictives analytically and see if that sheds any light!


Robert Hawkins

May 17, 2020, 4:35:45 AM
to Robert Hawkins, null-a, webppl-dev
Just as an update, in case anyone is curious: this was a very surprising property of the program (to me), but mathematically it turns out to follow pretty directly from the expectations!

Let C = {x, y, x+y} be a set of 'conjunctions', where x and y are any two primitive elements of the matrix and x+y is their conjunction, and let f be the softmax: f(c) = P(c; C) = e^c / \sum_{c' \in C} e^{c'}. Note that 0 <= f(c) <= 1 for all c \in C.

Then, by the definition of the softmax and the independence of x and y, it follows that E[f(x+y)] = E[f(x) * f(y)] = E[f(x)] * E[f(y)] <= E[f(x)], where the last step holds because 0 <= E[f(y)] <= 1. Combinations with more primitives follow by induction: each extra primitive multiplies in another factor that is at most 1, so the expected probability of a conjunction can only shrink as it grows.
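Here's a quick forward-sampling check of this (the same toy softmax as the sketch above, not the gist code; expectedProbs is a made-up helper): the conjunction's expected probability sits below the primitives', and the gap widens as sigma grows.

```
var expectedProbs = function(sigma) {
  var dist = Infer({method: 'forward', samples: 20000}, function() {
    var x = gaussian(0, sigma);
    var y = gaussian(0, sigma);
    var exps = map(function(s) { return Math.exp(s); }, [x, y, x + y]);
    return map(function(e) { return e / sum(exps); }, exps);
  });
  // expected normalized probabilities of x, y, and their conjunction x+y
  return map(function(i) {
    return expectation(dist, function(row) { return row[i]; });
  }, [0, 1, 2]);
};

display(expectedProbs(1));  // already skewed away from [1/3, 1/3, 1/3]
display(expectedProbs(5));  // conjunction pushed further below 1/3
```

Intuitively, as sigma grows the softmax approaches an argmax, and x+y is the largest of {x, y, x+y} only when x and y are both positive (probability 1/4), while each primitive alone wins with probability 3/8 -- so the bias has to get worse with sigma.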

mystery solved! thanks again for helping me think through this!

-- robert

