Several mathy pebl-codebase questions


Jeff

Oct 24, 2010, 5:01:32 PM
to pebl-project
Hi,

I've been enjoying getting to know the Pebl codebase. It's small,
efficient, and well-written. Thanks much!

I have a couple questions if you have some time:

1) I'm trying to write my own multinomial-like CPD, and I don't
understand where the loglikelihood formula you use came from. You're
caching log factorials, but from what I've read it seems to me the
loglikelihood should be the sum of the product of each count with the
log of its parameter? So I'm confused about the factorials. I'm very
new to all the math behind Bayesian networks so I assume I'm
misunderstanding something important.

2) Second, is the Greedy Learner computing a Bayesian score of some
sort? (If so, which one?) It seems to highly favor very connected
networks, so I worry it's just maximizing the loglikelihood. Things
get better if I specify an energy matrix like UniformPrior does... is
this equivalent to BIC or some other Bayesian score? Also, I get quite
different results each time I run it. Is that normal? How does it
decide when to restart? The GES implementation in Tetrad seems to give
very consistent results so I wonder what the difference is (besides
that it's searching equivalence classes, of course).

3) Third, an easier question. I'd like to export the structure to
maybe SMILE or some other reasoner. Has anyone done this? If not, is
there a Python library I can 'put the network together in' that
supports export to some common file formats? If none of these, does
anyone have a spec for a good file format? What's preferred? XDSL
seems somewhat prevalent but it's so verbose. Random large (but fairly
sparse) networks (say 10k nodes and 10k edges) take >50 MB!

Thanks for your time,
Jeff Klann
PhD Candidate, Indiana University
NLM Medical Informatics Fellow, Regenstrief Institute

Jeff Klann

Oct 27, 2010, 12:38:35 AM
to pebl-project
Perhaps my email asked too many questions. It looks like the multinomial CPD is using marginal likelihoods for a BDe metric? I can't find a good tutorial on computing the marginal likelihood in the multinomial case, and I'd like to understand the loglikelihood formula. Does anyone have a reference (or an explanation)?

Thanks,
Jeff Klann

Peter Woolf

Oct 27, 2010, 1:41:41 PM
to pebl-p...@googlegroups.com
Hi Jeff,

The marginal probability used in PEBL is based on the BDe metric as you noted.  A full description and derivation of the metric is in a 1992 paper by Cooper and Herskovits:
Is this what you are looking for?
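
As a rough sketch of where the factorials come from (this is not pebl's own code, and the function and variable names are made up for illustration): the sum of count times log(parameter) is the loglikelihood for one fixed set of parameters. The marginal likelihood instead integrates the parameters out. With complete data and uniform Dirichlet priors, the Cooper and Herskovits closed form (often called the K2 metric) for a single node is the product over parent configurations j of (r - 1)! / (N_j + r - 1)! times the product over child states k of N_jk!, where r is the number of child states and the N's are counts. Taking logs gives sums of log factorials, which is why caching them pays off.

```python
from collections import Counter
from math import lgamma


def log_factorial(n):
    """log(n!) via the log-gamma function, since n! = Gamma(n + 1)."""
    return lgamma(n + 1)


def k2_log_score(child, parents, arity):
    """Log marginal likelihood of one node given its parents.

    child   -- list of observed child states (ints in range(arity))
    parents -- list of tuples, the parent configuration for each row
    arity   -- r, the number of states the child can take
    """
    n_jk = Counter(zip(parents, child))  # counts per (parent config, child state)
    n_j = Counter(parents)               # counts per parent config
    score = 0.0
    # Parent configurations that never occur contribute log(1) = 0,
    # so looping over the observed ones is enough.
    for config, total in n_j.items():
        score += log_factorial(arity - 1) - log_factorial(total + arity - 1)
        for state in range(arity):
            score += log_factorial(n_jk[(config, state)])
    return score


# Toy data (hypothetical): binary child, one binary parent, six rows.
child = [0, 0, 1, 1, 1, 0]
parent = [(0,), (0,), (1,), (1,), (1,), (0,)]
no_parent = [()] * len(child)

print(k2_log_score(child, parent, arity=2))     # parent set {parent}
print(k2_log_score(child, no_parent, arity=2))  # empty parent set
```

By hand, the first call works out to log(1/16) and the second to log(1/140), so on this toy data the node scores higher with the parent than without it.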

Thanks,

--Peter

Abhik Shah

Oct 27, 2010, 4:19:39 PM
to pebl-p...@googlegroups.com
1) Using the BDe metric that Peter referenced. Specifically, it's
equation 12 in the Cooper paper.

2) If a prior is not provided to the learner, it just calculates the
loglikelihood. You can, however, supply any pebl learner with a prior
to be used in calculating the score. The greedy learner is stochastic
and so, if you do not run it for long enough, it will give different
results because it probably will not converge to the maximum. Learner
operations (including restarts) are controlled by configuration
parameters (http://ano.malo.us/pebl/docs/paramref.html#greedy) and the
defaults may not be appropriate for all datasets.

3) NetworkX seems to be a popular Python library for graphs (structure
only) and they provide readers/writers for a variety of common
formats. I don't know of any common format for representing both the
structure and CPD/CPT.
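
For a sense of how small a structure-only export can be, here is a minimal sketch that writes GraphML (one of the formats NetworkX reads and writes) using only the Python standard library. The helper name and the example nodes are hypothetical. In practice networkx.write_graphml on a DiGraph does the equivalent job, and nothing here stores CPDs/CPTs.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Return a structure-only GraphML document as a string.

    nodes -- iterable of node names (strings)
    edges -- iterable of (source, target) pairs of node names
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for name in nodes:
        ET.SubElement(graph, "node", id=name)
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical three-node network, one line of XML per node and per edge.
nodes = ["smoking", "cancer", "xray"]
edges = [("smoking", "cancer"), ("cancer", "xray")]
doc = to_graphml(nodes, edges)
print(doc)
```

Each node and each edge costs one short element, so the file size grows linearly with the structure alone.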

Thanks,
Abhik.

