Confused about modeling the gender height problem

Hakan Kjellerstrand

Aug 25, 2020, 5:47:57 AM
to webppl-dev
I'm porting my BLOG models (http://hakank.org/blog_ppl/) to webppl, and so far it has been quite easy. webppl has a lot of nice features and I like it very much.

But now I'm stuck porting the gender height problem, i.e. identifying a gender given a height. Here's a simplified BLOG version of the problem: http://hakank.org/blog_ppl/gender_height_simple.blog

I thought it would be easy, but in order to observe() the height (a Gaussian), one has to use the "proper" distributions (i.e. "Gaussian" and not "gaussian"), and this is where I'm stuck: how to define and use "height" and "gender".

Below is my model in webppl; it's not correct. I've tried a couple of variants, but all of them give errors or strange results.

"""
var model = function() {
  var genderList = ["male","female"];

  var gender = function() { return Categorical({ps:[0.5,0.5],vs:genderList}); };
  var g = gender();

  var height = function() {
    if (sample(gender()) == "male") {
      return Gaussian({mu:181.5,sigma:50});
    } else {
      return Gaussian({mu:166.8,sigma:50});
    }
  };
  var h = height();

  // condition(height=="female");
  // observe(h,160.0);

  return { height:h, gender:g };
}

var d = Infer(model);
"""


Running it yields the uninformative "Not implemented" error:
"""
Error: Not implemented at /home/hakank/.nvm/versions/node/v10.22.0/lib/node_modules/webppl/src/dists/base.js:10
9| toJSON: function () {
10| throw new Error('Not implemented');
--------------^
11| },
at gender_height2.wppl:41
40|
41| var d = Infer(model);
"""

So, what is the best way of modeling this problem in webppl, including observing either the height or gender?

Best regards

Hakan

null-a

Aug 25, 2020, 8:57:43 AM
to webppl-dev
I think the high-level confusion here is that your model is returning distributions rather than samples from those distributions. When you're writing a model in WebPPL, keep in mind that you're writing a "function" that draws a single sample from the prior. For example, we might write:

var model = function() {
  var gender = sample(<some_distribution>);
  var height = sample(<some_other_distribution>);
  return {gender: gender, height: height};
}

You can actually run this as a sampler simply by calling `model()`. Once that's working, you can replace any `sample` statements with `observe` statements as appropriate:

var model = function() {
  var gender = sample(<some_distribution>);
  var height = observe(<some_other_distribution>,  <observed_height>);
  return {gender: gender, height: height};
}
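The two schematic versions above can be mimicked in plain JavaScript to see what the switch from `sample` to `observe` changes. This is an illustrative sketch with hypothetical helper names, not WebPPL code or WebPPL's implementation, and it borrows the parameters of the full implementation below (means 181.5 and 166.8, sigma 50, observed height 160). In the second version the run keeps the observed height and carries the log density of that value as a weight, roughly what `observe(dist, val)` does in WebPPL, i.e. `factor(dist.score(val))`. Averaging runs by those weights (likelihood weighting) is one simple way to turn such a model into a posterior estimate.

```javascript
// Log density of N(mu, sigma^2) at x (the analogue of Gaussian(...).score(x)).
function gaussianLogPdf(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
}

// Box-Muller transform: one draw from N(mu, sigma^2).
function sampleGaussian(mu, sigma) {
  const u1 = 1 - Math.random(); // in (0, 1], so Math.log(u1) is finite
  const u2 = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

const MU = { male: 181.5, female: 166.8 };
const SIGMA = 50;

// First version: every call draws one (gender, height) sample from the prior.
function priorModel() {
  const gender = Math.random() < 0.5 ? "male" : "female";
  return { gender: gender, height: sampleGaussian(MU[gender], SIGMA) };
}

// Second version: the height is observed, so instead of sampling it the run
// carries the log density of the observed value as its log weight.
function observedModel(observedHeight) {
  const gender = Math.random() < 0.5 ? "male" : "female";
  return { gender: gender, logWeight: gaussianLogPdf(observedHeight, MU[gender], SIGMA) };
}

// Likelihood weighting: weighted fraction of runs in which gender is female.
function posteriorFemale(observedHeight, numRuns) {
  let wFemale = 0;
  let wTotal = 0;
  for (let i = 0; i < numRuns; i++) {
    const run = observedModel(observedHeight);
    const w = Math.exp(run.logWeight);
    wTotal += w;
    if (run.gender === "female") wFemale += w;
  }
  return wFemale / wTotal;
}

console.log(priorModel());                // one random prior sample
console.log(posteriorFemale(160, 20000)); // roughly 0.52
```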

Here's one way of implementing your model in full:

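var model = function() {
  // Sample a gender from a 50/50 prior.
  var gender = sample(Categorical({ps: [0.5, 0.5], vs: ['m', 'f']}));
  // Choose the height distribution's parameters for that gender.
  var lParams = gender === 'm' ? {mu: 181.5, sigma: 50}
                               : {mu: 166.8, sigma: 50};
  // Condition on an observed height of 160.
  observe(Gaussian(lParams), 160);
  return gender;
};

Infer({model: model, method: 'MCMC', samples: 10000});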

I note, though, that this doesn't produce anything like the same result as the page you linked to. Hopefully I've understood the gist of what you're trying to do, and this helps you figure out the rest?
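For intuition about what `method: 'MCMC'` is doing with this model, here is a minimal Metropolis-Hastings sketch in plain JavaScript. It is an assumption-level illustration, not WebPPL's actual kernel: the only latent variable is the gender, proposals are fresh draws from the 50/50 prior, and with that proposal the prior and proposal terms cancel, so the acceptance ratio reduces to the ratio of the likelihoods of the observed height. The parameters are those of the implementation above (observed height 160, sigma 50).

```javascript
// Log density of N(mu, sigma^2) at x.
function gaussianLogPdf(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
}

// Log likelihood of the observed height h for a given gender ('m' or 'f').
function logLik(gender, h) {
  return gender === "m" ? gaussianLogPdf(h, 181.5, 50) : gaussianLogPdf(h, 166.8, 50);
}

// Independence Metropolis-Hastings over gender, proposing from the prior.
function mhPosteriorFemale(h, numSteps) {
  let current = Math.random() < 0.5 ? "m" : "f";
  let femaleCount = 0;
  for (let i = 0; i < numSteps; i++) {
    const proposal = Math.random() < 0.5 ? "m" : "f";
    const logAccept = logLik(proposal, h) - logLik(current, h);
    // Accept with probability min(1, exp(logAccept)).
    if (Math.log(Math.random()) < logAccept) current = proposal;
    if (current === "f") femaleCount++;
  }
  return femaleCount / numSteps;
}

console.log(mhPosteriorFemale(160, 10000)); // roughly 0.52
```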

Hakan Kjellerstrand

Aug 25, 2020, 1:25:47 PM
to webppl-dev
Hi.

Thanks for trying to help me understand why it's so hard (for me at least) to model this quite simple problem.

Unfortunately I'm still confused. I thought I understood the general principle that one works with samples and not distributions. With discrete values, "condition(...)" works for observing a value (e.g. condition(gender=="female")), but with continuous distributions one has to use observe with a distribution rather than a sample (e.g. observe(distribution, value)).
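A sketch of why the discrete/continuous split shows up: in log-weight terms, `condition(pred)` contributes 0 when `pred` holds and minus infinity otherwise (a hard filter), while observing a continuous value contributes its log density (a soft weight). The plain-JavaScript check below uses hypothetical helper names and is not WebPPL internals; it shows that a hard equality test on a continuously distributed height essentially never passes, so filtering on `height == 160` would reject every run, whereas the density at 160 is a perfectly usable finite weight.

```javascript
// Hard conditioning as a log weight: 0 if the predicate holds, -Infinity otherwise.
function conditionLogWeight(pred) {
  return pred ? 0 : -Infinity;
}

// Soft conditioning on a continuous observation: the log density.
function gaussianLogPdf(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
}

// Box-Muller transform: one draw from N(mu, sigma^2).
function sampleGaussian(mu, sigma) {
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Draw heights and try to "condition" on an exact value.
let accepted = 0;
for (let i = 0; i < 100000; i++) {
  const h = sampleGaussian(166.8, 50);
  if (conditionLogWeight(h === 160) === 0) accepted++;
}

console.log(accepted);                       // 0: exact float equality essentially never occurs
console.log(gaussianLogPdf(160, 166.8, 50)); // a finite log weight
```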

I'm not sure why this quite simple problem (reasoning from height to gender as well as the other way around) is so hard to model in webppl when it's quite easy in both BLOG and the PSI system (see http://hakank.org/psi_ppl/gender_height.psi). Perhaps my real confusion is that I assume webppl works in much the same way as these systems. As mentioned before, porting about 30 models from BLOG to webppl has been quite easy so far; see my WebPPL page (http://hakank.org/webppl/) for these models.

Well, I'll keep reading the documentation and porting models. Hopefully I'll soon understand why I don't understand this...

Best,

Hakan

Hakan Kjellerstrand

Aug 28, 2020, 2:17:56 PM
to webppl-dev
Well, I'm less confused now. :-)

The huge difference in probabilities between the BLOG and webppl models was simply because BLOG's Gaussian takes the variance as its second parameter, while webppl's Gaussian takes the standard deviation (sigma).
A silly mistake on my part.
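The size of that difference is easy to check in closed form: with only two genders, the posterior is the 50/50 prior times the Gaussian likelihood, normalized. The plain-JavaScript sketch below (helper names are illustrative) uses the simplified BLOG parameters (means 181.5 and 166.8, second parameter 50) and an observed height of 160, reading 50 once as a standard deviation, as webppl's `sigma` does, and once as a variance, i.e. `sigma = Math.sqrt(50)`.

```javascript
// Log density of N(mu, sigma^2) at x.
function gaussianLogPdf(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
}

// Exact P(female | height = h) with a 50/50 prior, enumerating both genders.
function posteriorFemale(h, sigma) {
  const logMale = Math.log(0.5) + gaussianLogPdf(h, 181.5, sigma);
  const logFemale = Math.log(0.5) + gaussianLogPdf(h, 166.8, sigma);
  return 1 / (1 + Math.exp(logMale - logFemale));
}

const asStdDev = posteriorFemale(160, 50);              // 50 read as sigma
const asVariance = posteriorFemale(160, Math.sqrt(50)); // 50 read as variance

console.log(asStdDev.toFixed(3), asVariance.toFixed(3)); // 0.521 0.985
```

With sigma = 50 the two height distributions overlap so much that a height of 160 barely moves the 50/50 prior, which is why the first webppl port looked wrong.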

Here is a model with more realistic standard deviations and slightly different mu values:

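var model = function() {
  var genderList = ["male","female"];
  // Sample the gender (lowercase categorical() returns a sample, not a distribution).
  var gender = categorical({ps:[0.5,0.5],vs:genderList});
  // height is a distribution object (capital G), so it can be passed to observe().
  var height = gender == "male" ? Gaussian({mu:178,sigma:7.7}) : Gaussian({mu:163,sigma:7.3});

  // condition(gender=="female");
  // observe(height,190.0);
  observe(height,160.0);
  // observe(height,170.0);

  // Return a sampled height (not the distribution itself) so Infer can tally it.
  return { height:sample(height),
           gender:gender
         };
}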

Now the marginal for gender, when observing a height of 160 cm, is much closer to the BLOG version:
   "female" : 0.943
   "male" : 0.05699999999999997

It also works to reason from gender to height. With the single condition:

    condition(gender=="female");

the model gives the expected height as

    'height', 162.93212150908056
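Conditioning on the discrete gender can be mimicked by rejection: draw (gender, height) pairs from the prior, keep only the female ones, and average their heights. The plain-JavaScript sketch below uses a hypothetical Box-Muller helper in place of WebPPL's Gaussian sampler. Given gender "female" the height is Gaussian with mu 163, so the exact conditional expectation is 163, and the 162.93 above presumably differs from it only by sampling noise.

```javascript
// Box-Muller transform: one draw from N(mu, sigma^2).
function sampleGaussian(mu, sigma) {
  const u1 = 1 - Math.random(); // in (0, 1], so Math.log(u1) is finite
  const u2 = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// One joint sample from the prior, with this model's parameters.
function priorSample() {
  const gender = Math.random() < 0.5 ? "male" : "female";
  const height = gender === "male" ? sampleGaussian(178, 7.7) : sampleGaussian(163, 7.3);
  return { gender: gender, height: height };
}

// Rejection sampling: the analogue of condition(gender == "female").
function expectedHeightGivenFemale(numDraws) {
  const kept = Array.from({ length: numDraws }, priorSample)
    .filter(s => s.gender === "female");
  return kept.reduce((sum, s) => sum + s.height, 0) / kept.length;
}

console.log(expectedHeightGivenFemale(20000)); // close to 163
```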


It's great that this simple model now works as expected. :-)

/Hakan