Is there a simple way to query a Graphical Model in Stan?


Erikson Kaszubowski

Jun 18, 2015, 5:28:45 PM
to stan-...@googlegroups.com
Dear Stan users,

I've been playing around with the 'abn' R package for DAG structure discovery using GLM and GLMM parametrization. Simply put, the package has algorithms for discovering directed graph structures where each node is a Gaussian, binomial, or Poisson variable and each arc represents a regression between variables (more information here).

It's easy to represent the discovered DAG in Stan and fit it to the data. My problem now is: having a fitted model, how can I query it? Meaning, how can I use the model to retrieve the marginals of a node in light of some evidence, as classical expert systems do (Netica, BayesNet, Bayesia, etc.)? I know Stan is not built for this kind of query, but is there a simple way to do it?

How I am doing it now with a simple classifier:

Using the Iris dataset and the discovered DAG as a classifier, I exploited the DAG structure and used the sum-product rule to compute the marginals directly in Stan (in 'generated quantities'). The results are fine for this data set: with a random 2/3 train, 1/3 test split, the classifier hardly ever misses.

The attached code should give a rough idea how to discover the model with abn and fit it with Stan. Warning: the code in 'generated quantities' is ugly. Any suggestions on how to improve it?

Thanks!
Erikson


dag.R
dag.stan

Bob Carpenter

Jun 18, 2015, 7:17:04 PM
to stan-...@googlegroups.com

> On Jun 18, 2015, at 5:28 PM, Erikson Kaszubowski <erik...@gmail.com> wrote:
>
> Dear Stan users,
>
> I've been playing around with the 'abn' R package for DAG structure discovery using GLM and GLMM parametrization. Simply put, the package has algorithms for discovering directed graph structures where each node is a Gaussian, binomial, or Poisson variable and each arc represents a regression between variables (more information here).
>
> It's easy to represent the discovered DAG in Stan and fit it to the data. My problem now is: having a fitted model, how can I query it? Meaning, how can I use the model to retrieve the marginals of a node in light of some evidence, as classical expert systems do (Netica, BayesNet, Bayesia, etc.)? I know Stan is not built for this kind of query, but is there a simple way to do it?

Actually, that's all Stan is built for:

Specify a joint log probability function p(theta, y),
observe y, and take a sample from the posterior p(theta | y).

(We can also do optimization, but I don't think that's relevant here.)

So I'm not quite sure what you're asking because the language
isn't familiar.

The best way to simplify your code would be with functions for
common operations.

But you should never ever use things like

exp(log(1 - inv_logit(sVir_Intercept)))

Just replace exp(log(...)) with (...)!

There's also a log_inv_logit function, and a log1m_inv_logit
for log(1 - inv_logit(x)).
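
For example, instead of exp(log(1 - inv_logit(sVir_Intercept))), just
write 1 - inv_logit(sVir_Intercept), or stay on the log scale until the
very end. A rough sketch (lp and prob are local reals, and the second
term is made up just to show the accumulation):

real lp;
real prob;
lp <- log1m_inv_logit(sVir_Intercept)    // log(1 - inv_logit(...)), stable
      + log_inv_logit(sVer_Intercept);   // plus whatever other log terms
prob <- exp(lp);                         // one exp, at the very end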

Also, you probably don't want priors as wide as normal(0, 100),
depending on the scale --- but if that's the scale, then you probably
don't want cauchy(0, 5) on the sd.

> How I am doing it now with a simple classifier:
>
> Using the Iris dataset and the discovered DAG as a classifier, I exploited the DAG structure and used the sum-product rule to compute the marginals directly in Stan (in 'generated quantities').

Which marginals are you trying to compute? It's hard to tell
from the code.

> The results are fine for this data set: with a random 2/3 train, 1/3 test split, the classifier hardly ever misses.

Is there a reason you can't just code the classifier up
in Stan directly?

> The attached code should give a rough idea how to discover the model with abn and fit it with Stan. Warning: the code in 'generated quantities' is ugly. Any suggestions on how to improve it?

Functions. But I think the whole thing could probably be
rewritten much more straightforwardly if you're just
trying to do classification. I think what you're doing is the
same as assigning responsibilities in a mixture model --- at least
if you're doing the usual trick of turning a mixture model into
a classifier.

- Bob

Erikson Kaszubowski

Jun 18, 2015, 10:03:28 PM
to stan-...@googlegroups.com
Dear Bob,

I'm interested in making a query in the sense in which Bayesian networks are used in expert systems (like the 'Asia' BUGS model, HUGIN-style, or the classical Sprinkler-Rain graphical model), as discussed in Koller and Friedman's "Probabilistic Graphical Models", chapter 9.
In my case, instead of using conditional distribution tables, as usual in those models, I'm modeling conditional distributions as GLMs, as suggested by the 'additive bayesian network' package.

I'm using Stan to represent and fit the parameters of the DAG discovered by the 'abn' package. I want to use the fitted parameters to query the marginal probabilities of specific nodes, given the evidence in other nodes. E.g., using a sub-graph from the attached model:

Speciesversicolor(SVE) -> Petal.Width(PW) <- Speciesvirginica(SVI)

In this simple sub-graph, the joint distribution is given by:

P(SVE, PW, SVI) = P(SVE)P(SVI)P(PW|SVE,SVI)

If I have a piece of evidence about petal width, I would like to know the updated probabilities of the parent nodes. Suppose I observe a petal width at the mean value (0 for standardized data). What is the probability it's an Iris versicolor?

The priors on SVE and SVI are given by the first-level logistic regression. As parent nodes, those regressions have just an intercept: -.49 and -.72 in one version of the fitted model, or P(SVI=1) = inv_logit(-.49) = 0.38; P(SVE=0) = 1 - inv_logit(-.72) = 0.67.
The conditional probability for the child node is given by the linear regression PW = Intercept + B1*SVE + B2*SVI + error, so: P(PW=0|SVE=1,SVI=0) = dnorm(0, mean=-1.26 + 2.37*1 + 1.44*0, sd=0.27). So the joint is:

P(SVE=1, PW=0) = (0.38 * 0.67 * 0.00031) + (0.38 * 0.33 * 0) ### Marginalizing over SVI, normalize for P(SVE=1|PW=0)

I don't know if I made it clearer now.

Thanks for the modeling tips! I'm using exp(log()) because I'm summing over several log-probabilities and only then exponentiating. I want to compute the marginals for the nodes that represent the species, using the evidence from the other nodes.

I understand that it would be easier to just use a classifier for this purpose (in this example, a simple linear discriminant analysis or a naive bayes), but I'm doing it this way to understand the 'additive bayesian networks' better.

Erikson

Bob Carpenter

Jun 19, 2015, 3:55:07 PM
to stan-...@googlegroups.com

> On Jun 18, 2015, at 10:03 PM, Erikson Kaszubowski <erik...@gmail.com> wrote:
>
> Dear Bob,
>
> I'm interested in making a query in the sense in which Bayesian networks are used in expert systems (like the 'Asia' BUGS model, HUGIN-style, or the classical Sprinkler-Rain graphical model), as discussed in Koller and Friedman's "Probabilistic Graphical Models", chapter 9.
> In my case, instead of using conditional distribution tables, as usual in those models, I'm modeling conditional distributions as GLMs, as suggested by the 'additive bayesian network' package.

I'm afraid I don't know any of this material.

> I'm using Stan to represent and fit the parameters of the DAG discovered by the 'abn' package. I want to use the fitted parameters to query the marginal probabilities of specific nodes, given the evidence in other nodes.

> E.g., using a sub-graph from the attached model:
>
> Speciesversicolor(SVE) -> Petal.Width(PW) <- Speciesvirginica(SVI)

I don't understand this notation. What does Speciesversicolor() do
as a wrapper?

> In this simple sub-graph, the joint distribution is given by:
>
> P(SVE, PW, SVI) = P(SVE)P(SVI)P(PW|SVE,SVI)

> If I have a piece of evidence about petal width, I would like to know the updated probabilities of the parent nodes. Suppose I observe a petal width at the mean value (0 for standardized data). What is the probability it's an Iris versicolor?

I'm a bit confused, though, because SVE and SVI look like
species indicators. How can they both be variables?

I'd think there'd be one boolean variable z[i] that would
indicate with z[i] = 0 if item i is plant type 1 and z[i] = 1
if item i is plant type 2. Then you'd condition things like
the observables on the type of plant. Then the problem's
a simple classification.

> The priors on SVE and SVI

I don't know what that means, either.

> are given by the first-level logistic regression. As parent nodes, those regressions have just an intercept: -.49 and -.72 in one version of the fitted model, or P(SVI=1) = inv_logit(-.49) = 0.38; P(SVE=0) = 1 - inv_logit(-.72) = 0.67.
> The conditional probability for the child node is given by the linear regression PW = Intercept + B1*SVE + B2*SVI + error, so: P(PW=0|SVE=1,SVI=0) = dnorm(0, mean=-1.26 + 2.37*1 + 1.44*0, sd=0.27). So the joint is:
>
> P(SVE=1, PW=0) = (0.38 * 0.67 * 0.00031) + (0.38 * 0.33 * 0) ### Marginalizing over SVI, normalize for P(SVE=1|PW=0)

> I don't know if I made it clearer now.

A bit, but I'm still very confused. Is it consistent to have
SVE=1 and SVI=1? Or SVE=0 and SVI=0? And what are these B1 and B2 variables
that get introduced?

> Thanks for the modeling tips! I'm using exp(log()) because I'm summing over several log-probabilities and only then exponentiating. I want to compute the marginals for the nodes that represent the species, using the evidence from the other nodes.

No matter what your intent, you never ever ever want to use exp(log())
or log(exp()) or any other pair of inverse functions applied to each other.
It'll only hurt your speed and arithmetic precision and robustness.

> I understand that it would be easier to just use a classifier for this purpose (in this example, a simple linear discriminant analysis or a naive bayes), but I'm doing it this way to understand the 'additive bayesian networks' better.

I just meant build the classifier in Stan. I think that's
what you're trying to do, we're just not communicating notationally.

What I want to see is:

data y: variable types and dimensions and constraints

parameters theta: ditto

joint probability function: p(y, theta)

At that point, all the inference you want to do is turn-the-crank.

- Bob

Erikson Kaszubowski

Jun 20, 2015, 6:53:12 PM
to stan-...@googlegroups.com
Bob,

Thanks for your patience. I will try another approach to give a clearer picture.


> In my case, instead of using conditional distribution tables, as usual in those models, I'm modeling conditional distributions as GLMs, as suggested by the 'additive bayesian network' package.

I'm afraid I don't know any of this material.

I will try to explain the idea using the 'Sprinkler-Rain-Grass' DAG from Wikipedia (https://en.wikipedia.org/wiki/Bayesian_network) in a moment.
 
> Speciesversicolor(SVE) -> Petal.Width(PW) <- Speciesvirginica(SVI)

I don't understand this notation.  What does Speciesversicolor() do
as a wrapper?

Sorry for the terrible notation. It's not a wrapper; SVE is just an abbreviation for 'Speciesversicolor'.
 
> In this simple sub-graph, the joint distribution is given by:
>
> P(SVE, PW, SVI) = P(SVE)P(SVI)P(PW|SVE,SVI)

> If I have a piece of evidence about petal width, I would like to know the updated probabilities of the parent nodes. Suppose I observe a petal width at the mean value (0 for standardized data). What is the probability it's an Iris versicolor?

I'm a bit confused, though, because SVE and SVI look like
species indicators.  How can they both be variables?  

Indeed, they are species indicators. This is a limitation of the 'abn' package: multinomial variables must be split into binomials, one for each class.
 
I'd think there'd be one boolean variable z[i] that would
indicate with z[i] = 0 if item i is plant type 1 and z[i] = 1
if item i is plant type 2.  Then you'd condition things like
the observables on the type of plant.  Then the problem's
a simple classification.

> The priors on SVE and SVI

I don't know what that means, either.

> The conditional probability for the child node is given by the linear regression PW = Intercept + B1*SVE + B2*SVI + error, so: P(PW=0|SVE=1,SVI=0) = dnorm(0, mean=-1.26 + 2.37*1 + 1.44*0, sd=0.27). So the joint is:

A bit, but I'm still very confused.  Is it consistent to have
SVE=1 and SVI=1?  Or SVE=0 and SVI=0?  And what are these B1 and B2 variables
that get introduced?


OK, I'll use the 'Sprinkler-Rain-Grass' example now. In this example, we already have a fitted model for the dependencies between variables in the form of conditional distribution tables (or just prior probabilities for the parent nodes). The Wikipedia article shows an example of a query on this simple model: given the evidence that the grass is wet, what is the probability that it rained? Using the information from the CDT and the sum-product rule, we can compute the query P(R=T|G=W) by summing out the 'Sprinkler' variable.

Nothing new here, just a simple application of Bayes' rule over a graph.

BUT! Suppose I don't have CDTs for my variables; let's say I want to represent this information using GLMs. So, e.g., the 'Rain' node could be represented using a logistic regression with an intercept only. The variable's probability table is simply (0.8, 0.2), so the model for this variable could be: Pr(Rain=T) = inv_logit(-1.38). The same thing could be done for the Sprinkler variable, but now we must condition on the value of 'Rain': Pr(Sprinkler=T | Rain) = inv_logit(-0.4 - 4.1 * Rain). The information is (almost) exactly the same, but I'm encoding it using GLMs instead of the traditional CDT. The main advantage is that, by using a parameterized model, I can represent any probability distribution I want, e.g. a conditional Gaussian for continuous variables.
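
(The conversion is just the logit of each table entry: logit(0.2) = log(0.2/0.8) ≈ -1.39, hence the -1.38 intercept above; for the Sprinkler node, logit(0.4) ≈ -0.4 gives the intercept, and logit(0.01) - logit(0.4) ≈ -4.6 + 0.4 = -4.2 gives the slope on Rain, close to the -4.1 above.)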

I hope I made it clearer! In this case, the 'B1' and 'B2' parameters introduced in my example are the regression parameters that allow me to compute P(X | Y), as explained above. And you are right, it's not consistent to have both indicator variables == 1, but this knowledge is not encoded in the graph (OK, I know it's dumb, but please bear with me!)

So, I'm using Stan to fit the GLMs that represent the (conditional) probability distributions of the variables of my full model. But what I truly want is to use the fitted model to make a query, in the sense of the above example. My question is: is there a way to do this directly in Stan?

As I said, right now I'm using the sum-product rule over the fitted model to find the marginal probabilities of each indicator variable. But I could query the network using MCMC, if I could fix the evidence and sample from the queried node. I'm just lost as to how to do this directly in Stan. Maybe if I encode the fitted model using custom probability functions, like this (for the Wikipedia example)?

functions {
  // point-estimate conditional distributions from the Wikipedia example,
  // encoded as logistic regressions; y > 0.5 is treated as "true"
  real rain_log(real y) {
    return if_else(y > 0.5, log_inv_logit(-1.38), log1m_inv_logit(-1.38));
  }
  real sprinkler_log(real y, real rain) {
    return if_else(y > 0.5,
                   log_inv_logit(-0.4 - 4.1 * round(rain)),
                   log1m_inv_logit(-0.4 - 4.1 * round(rain)));
  }
  real grass_log(real y, real sprinkler, real rain) {
    return if_else(y > 0.5,
                   log_inv_logit(-10 + 12.2 * round(sprinkler)
                                 + 11.39 * round(rain)
                                 - 8.99 * round(rain) * round(sprinkler)),
                   log1m_inv_logit(-10 + 12.2 * round(sprinkler)
                                   + 11.39 * round(rain)
                                   - 8.99 * round(rain) * round(sprinkler)));
  }
}

transformed data {
  real g;     // evidence: the grass is wet
  g <- 1.0;
}

parameters {
  // continuous relaxations of the boolean Sprinkler and Rain nodes
  real<lower=0, upper=1> s;
  real<lower=0, upper=1> r;
}

model {
  g ~ grass(s, r);
  s ~ sprinkler(r);
  r ~ rain();
}

generated quantities {
  real truSpr;
  real truRain;

  truSpr <- round(s);      // snap the relaxed values back to 0/1
  truRain <- round(r);
}



(Well, it feels like using a cannon to kill a fly -- and it takes too many iterations for a good estimate! And I'm using point estimates for the user-defined priors and likelihoods.)

 
> I understand that it would be easier to just use a classifier for this purpose (in this example, a simple linear discriminant analysis or a naive bayes), but I'm doing it this way to understand the 'additive bayesian networks' better.

I just meant build the classifier in Stan.  I think that's
what you're trying to do, we're just not communicating notationally.

What I want to see is:

  data y:  variable types and dimensions and constraints

  parameters theta:  ditto

  joint probability function: p(y, theta)

At that point, all the inference you want to do is turn-the-crank.

- Bob
Again, thanks for your patience and for all the tips. I understand there are better options than what I'm doing, but... It's for the sake of learning!


Erikson

Seth Flaxman

Jun 23, 2015, 6:31:41 PM
to stan-...@googlegroups.com
Hi. If I understand this correctly, you have a graphical model with
edges parameterized using logistic regressions. And you want to know
how to calculate the marginal probabilities: P(Sprinkler), P(Rain),
P(Grass wet)? Recall that the whole idea of using MCMC is that we're
going to get a full probability distribution for each of these, by
generating draws.

To accomplish that, I think you should just be able to take the code
you wrote above and add it to the full model (i.e. the dag.stan code
you sent) in a generated quantities block. Instead of hardcoding in
those point estimates, you'd simply use the current draw of the
parameters, transforming them as needed as you've done in the code you
sent...
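
I.e., roughly (a sketch, not your actual code -- 'rain_b0' here is a
stand-in for whatever your fitted intercept for the Rain node is called):

generated quantities {
  real p_rain;                   // Pr[Rain = T] under the current draw
  p_rain <- inv_logit(rain_b0);  // the sampled intercept, not a hard-coded -1.38
}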

But maybe I'm missing something...can you say exactly what
P(Sprinkler) is in terms of the variables in your model? Once you have
that, we should be able to help you calculate it given the current
draws of the model parameters...

Seth

Bob Carpenter

Jun 23, 2015, 6:39:37 PM
to stan-...@googlegroups.com
Thanks, Seth --- that's exactly what I've been trying to say!

- Bob

Andrew Gelman

Jun 23, 2015, 6:52:45 PM
to stan-...@googlegroups.com
I just have to say, for the record, that I hate these sorts of examples, as they have nothing to do with the kinds of causal problems I work on in social or environmental science. In my world, every possible intervention can affect everything else; it's none of this sprinkler, rain, grass bullshit.

Bob Carpenter

Jun 23, 2015, 9:09:37 PM
to stan-...@googlegroups.com
Andrew --- do we have to send you to the corner of the list
until you can play nicely with other users?

It's very hard to bridge these conceptual and terminological
gaps, so I think we need to have some patience.

As a computer scientist, I find simple examples much easier
to start with in my quest for understanding and I don't think we
should discourage people from exploring them. Technically, I think
Seth and I have been trying to say the same thing --- marginalization
is trivial (not just mathematician trivial, but really trivial) with MCMC.

- Bob

Erikson Kaszubowski

Jun 24, 2015, 10:05:03 AM
to stan-...@googlegroups.com
Dear Bob, Seth and Andrew,

Seth is spot on! My problem is: to simulate draws from the queried node, conditional on the evidence, I need to compute the node's Markov blanket, right? This should be done manually in the 'generated quantities' block, then, e.g., using a Bernoulli RNG with parameter equal to the probability of success given the blanket.

Seth, I'm using a general Bayesian network for classification (an example comparing Naïve Bayes, TAN, BAN and general Bayesian Networks here). I want to evaluate the network performance for predicting the class of held-out data. Each class is a separate discrete node in the network.

In summary:

1) I have a large dataset with many variables;
2) I'm doing DAG structure discovery using exact search based on topological ordering, as implemented in the 'abn' R package;
3) I implement the DAG in Stan to estimate its parameters using a training dataset (the discovered DAG structure is, in fact, just a set of generalized linear models);
4) I want to evaluate the network performance for a held-out test dataset, considering uncertainty in the parameter estimates.

I want to use Stan for two different steps: a) estimate the model parameters for a given network structure and training dataset (easy); b) use the fitted DAG to compute the probability of each class for held-out data (my question). I have seen the code for Naïve Bayes classification in the Stan manual, but my network is not of the naïve kind and the class variable is split into several nodes with different edges. I will try to implement simulating variable values using Markov blanket information; I will post it here later.

Andrew, I understand how you feel about those silly examples of Bayesian reasoning. My statistics classes only taught 'Bayesian Statistics' up to those simple examples, and we all had the impression that Bayes seemed like a nice trick without practical research applications. But Bob pointed out the reason for bringing the example to this discussion -- to help us arrive at some mutual understanding! The data I'm interested in modeling is much more complex; the 'abn' package, in fact, has been used for modeling complex observational data in epidemiology (like this), allowing complex dependency structures between variables.

Erikson

Michael Betancourt

Jun 24, 2015, 10:25:17 AM
to stan-...@googlegroups.com

we all had the impression that Bayes seemed like a nice trick without practical research applications

Hahahahahahahaha.  You must have had a _terrible_ teacher or you are
deliberately trying to troll a few of us.

I think Andrew’s aggression here is with this idea of just building up a DAG
connecting all of the variables and trying to infer correlations from the data
(edge strengths in the DAG) without making any attempt to actually model
the problem at hand.  If so, I wholeheartedly agree with his sentiment.  DAGs
are useful objects for presenting models, but they are fundamentally limited
as a way to construct complex models.

To be clear, I’m not against building up this example in Stan.  Just warning
that it’s not the best strategy for analyzing a challenging statistical problem.

Bob Carpenter

Jun 24, 2015, 11:36:37 AM
to stan-...@googlegroups.com

> On Jun 24, 2015, at 10:05 AM, Erikson Kaszubowski <erik...@gmail.com> wrote:
>
> Dear Bob, Seth and Andrew,
>
> Seth is spot on! My problem is: to simulate draws from the queried node, conditional on the evidence, I need to compute the node's Markov blanket, right?

For continuous nodes, this is automatic. Check out the MCMC section
of the manual.

For discrete nodes, you have to compute the conditional (Stan
doesn't require models to be graphical, but if it is a directed graphical
model, that will indeed correspond to the Markov blanket).

> This should be done manually in the 'generated quantities' block, then, e.g., using a Bernoulli RNG with parameter equal to the probability of success given the blanket.

No reason to simulate. You can just straight up compute expectations
for anything you want. There's a chapter in the manual on latent
discrete parameters that provides some examples.

The naive Bayes implementation in the manual has an example, as do
the models in the latent discrete parameters chapter of the manual.
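
For your sprinkler example, the expectation version would look roughly
like this (a sketch using the point estimates you quoted; in the fitted
model you'd use the current draws of the coefficients instead of these
constants):

generated quantities {
  real p_rain_given_wet;  // Pr[Rain = T | Grass = wet], computed exactly
  {
    vector[4] lp;  // log Pr[S = s, R = r, G = wet] for each (s, r) combo
    int i;
    i <- 1;
    for (s in 0:1) {
      for (r in 0:1) {
        lp[i] <- bernoulli_logit_log(r, -1.38)
                 + bernoulli_logit_log(s, -0.4 - 4.1 * r)
                 + bernoulli_logit_log(1, -10 + 12.2 * s + 11.39 * r
                                          - 8.99 * s * r);
        i <- i + 1;
      }
    }
    // condition on the evidence by normalizing; R = 1 in entries 2 and 4
    p_rain_given_wet <- exp(log_sum_exp(lp[2], lp[4]) - log_sum_exp(lp));
  }
}

No sampling of s and r needed -- the four-term sum does the
marginalization, and you get the exact conditional probability for each
posterior draw.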

> Seth, I'm using a general Bayesian network for classification (an example comparing Naïve Bayes, TAN, BAN and general Bayesian Networks here). I want to evaluate the network performance for predicting the class of held-out data. Each class is a separate discrete node in the network.
>
> In summary:
>
> 1) I have a large dataset with many variables;
> 2) I'm doing DAG structure discovery using exact search based on topological ordering, as implemented in the 'abn' R package;
> 3) I implement the DAG in Stan to estimate its parameters using a training dataset (the discovered DAG structure is, in fact, just a set of generalized linear models);
> 4) I want to evaluate the network performance for a held-out test dataset, considering uncertainty in the parameter estimates.
>
> I want to use Stan for two different steps: a) estimate the model parameters for a given network structure and training dataset (easy); b) use the fitted DAG to compute the probability of each class for held-out data (my question). I have seen the code for Naïve Bayes classification in the Stan manual, but my network is not of the naïve kind and the class variable is split into several nodes with different edges. I will try to implement simulating variable values using Markov blanket information; I will post it here later.

If you can write out the probability function, it can probably
be coded in Stan, unless it's one of those connected discrete
examples that Andrew so dislikes (like the Asia model in BUGS, for
example, if I'm recalling the right one). Those Stan isn't going to
help with at all.

> Andrew, I understand how you feel about those silly examples of Bayesian reasoning. My statistics classes only taught 'Bayesian Statistics' up to those simple examples, and we all had the impression that Bayes seemed like a nice trick without practical research applications. But Bob pointed out the reason for bringing the example to this discussion -- to help us arrive at some mutual understanding! The data I'm interested in modeling is much more complex; the 'abn' package, in fact, has been used for modeling complex observational data in epidemiology (like this), allowing complex dependency structures between variables.

There's already a very deep Bayesian epidemiology literature.
As an example, Dawid and Skene's (1979!) diagnostic accuracy and population
prevalence model is presented in the latent discrete parameters chapter.

- Bob

Erikson Kaszubowski

Jun 24, 2015, 12:14:18 PM
to stan-...@googlegroups.com
Michael, I wish I were trolling! And I'm not talking about undergrad stats. I couldn't find a single course on Bayesian statistics, or even one covering some Bayesian topics, among all the graduate courses at my university. And it's not a small university! But I live in Brazil, so maybe I shouldn't expect much.

Bob, in the Naïve Bayes example in the Stan manual, the section on "Prediction without Model Updates" seems to be close to what I want to do (classify test data after training the model). In my case, instead of a 'gamma' parameter that is modeled by the Naïve Bayes assumption (conditionally independent features), I would compute the marginals for each categorical node of interest (the nodes that represent each class in the model) using the node's Markov blanket, right? Thanks for the manual references, I'll read them carefully.

Erikson

Bob Carpenter

Jun 24, 2015, 12:35:37 PM
to stan-...@googlegroups.com

> On Jun 24, 2015, at 12:14 PM, Erikson Kaszubowski <erik...@gmail.com> wrote:
> ...
> Bob, in the Naïve Bayes example in the Stan manual, the section on "Prediction without Model Updates" seems to be close to what I want to do (classify test data after training the model). In my case, instead of a 'gamma' parameter that is modeled by the Naïve Bayes assumption (conditionally independent features), I would compute the marginals for each categorical node of interest (the nodes that represent each class in the model) using the node's Markov blanket, right? Thanks for the manual references, I'll read them carefully.

Right. You can probably do whatever you need to do with
the expectations rather than sampling. It's much more
efficient.

- Bob
