HW5 prob1


Dav Clark

Nov 22, 2007, 7:46:02 PM
to CS281A: Statistical Learning Theory (Fall 2007)
Problem 1 seems a little underconstrained... are we supposed to assume
some distributional form on p and/or q? It seems like as stated, the
problem is trivial - we just sample from the p(x_i | x_-i), which is
given, and combine those samples to make our q_i's...

Am I missing something?

DC

Joseph Austerweil

Nov 22, 2007, 9:53:40 PM
to cs281a...@googlegroups.com
hey

when is the hw due? i haven't started it yet. is it hard?

=joe

Joseph Austerweil

Nov 22, 2007, 9:54:22 PM
to cs281a...@googlegroups.com
sorry, i meant to send that to dav.
=joe

Alekh Agarwal

Nov 22, 2007, 10:37:41 PM
to cs281a...@googlegroups.com
It'll be useful to all though I guess if Dav replies to the list :)

Percy Liang

Nov 23, 2007, 2:11:11 PM
to cs281a...@googlegroups.com
> Problem 1 seems a little underconstrained... are we supposed to assume
> some distributional form on p and/or q? It seems like as stated, the

You don't need to, but you can think about p and q being exponential
families if you'd like since that's what was done in class.

> problem is trivial - we just sample from the p(x_i | x_-i), which is
> given, and combine those samples to make our q_i's...

But what distribution would you take the x_-i's from? The optimal
q_i should have a relatively simple expression which you should
arrive at with some algebraic manipulation of the global objective
function (min KL).
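
For reference, a sketch of the standard result this hint points toward, under two assumptions not stated in the thread: that the objective is KL(q || p) and that q factorizes fully as a product of the q_i. Holding q_-i fixed, the terms that depend on q_i collapse into a single KL divergence, so the minimizer can be read off directly:

```latex
% Assumptions: objective is KL(q || p), with q(x) = \prod_j q_j(x_j).
\begin{align*}
\mathrm{KL}(q \,\|\, p)
  &= \sum_{x_i} q_i(x_i) \log q_i(x_i)
     - \sum_{x_i} q_i(x_i)\, \mathbb{E}_{q_{-i}}\!\left[\log p(x_i \mid x_{-i})\right]
     + \text{const} \\
  &= \mathrm{KL}\!\left(q_i \,\|\, \tilde{p}_i\right) + \text{const},
  \qquad
  \tilde{p}_i(x_i) \propto \exp\!\left(\mathbb{E}_{q_{-i}}\!\left[\log p(x_i \mid x_{-i})\right]\right) \\
\Rightarrow\quad
q_i^{*}(x_i) &\propto \exp\!\left(\mathbb{E}_{q_{-i}}\!\left[\log p(x_i \mid x_{-i})\right]\right)
\end{align*}
```

Here "const" collects everything that does not depend on q_i, including the expectation of log p(x_-i) under q_-i.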

Dav Clark

Nov 23, 2007, 2:45:43 PM
to CS281A: Statistical Learning Theory (Fall 2007)
On Nov 23, 11:11 am, Percy Liang <percyli...@gmail.com> wrote:

> But what distribution would you take the x_-i's from? The optimal
> q_i should have a relatively simple expression which you should
> arrive at with some algebraic manipulation of the global objective
> function (min KL).

I guess that gets at the heart of my question... from the reading in
the MCMC paper, as well as the presentation in class, it seems that
*proper* Gibbs sampling simply maintains a state vector of the most
recently sampled values for each x_j. And these are the x_-i that
would get used in each sampling step.

But given what you said, I take this to mean that at each step I use
the total knowledge I have about the already sampled x_-i values to
generate my new x_i sample.

Thanks for the answer! Hope you're enjoying the long weekend.

DC

Percy Liang

Nov 23, 2007, 2:56:04 PM
to cs281a...@googlegroups.com
> I guess that gets at the heart of my question... from the reading in
> the MCMC paper, as well as the presentation in class, it seems that
> *proper* Gibbs sampling simply maintains a state vector of the most
> recently sampled values for each x_j. And these are the x_-i that
> would get used in each sampling step.
>
> But given what you said, I take this to mean that at each step I use
> the total knowledge I have about the already sampled x_-i values to
> generate my new x_i sample.

For sampling, you use your last sampled values of x_-i.

But for mean-field, you need to use q_-i(x_-i) somehow.

-Percy
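
The contrast between the two updates can be sketched on a toy model. Everything below is an illustrative assumption rather than part of the homework: a hypothetical two-spin model with p(x1, x2) proportional to exp(J*x1*x2 + h1*x1 + h2*x2), the names J, h, cond_prob_plus, gibbs_means and mean_field_means, and the choice of KL(q || p) with a factorized q. The Gibbs update plugs in the last sampled value of the other spin, while the mean-field update averages over q for the other spin, which for this model only requires its mean.

```python
import math
import random

# Hypothetical toy model (not from the homework): spins x1, x2 in {-1, +1}
# with p(x1, x2) proportional to exp(J*x1*x2 + h[0]*x1 + h[1]*x2).
J = 0.5
h = [0.3, -0.2]


def cond_prob_plus(i, x_other):
    """Return p(x_i = +1 | x_other) for the toy model."""
    return 1.0 / (1.0 + math.exp(-2.0 * (h[i] + J * x_other)))


def gibbs_means(n_steps=20000, seed=0):
    """Gibbs sampling: condition on the LAST SAMPLED value of the other spin."""
    rng = random.Random(seed)
    x = [1, 1]  # current state vector
    totals = [0.0, 0.0]
    for _ in range(n_steps):
        for i in (0, 1):
            p_plus = cond_prob_plus(i, x[1 - i])
            x[i] = 1 if rng.random() < p_plus else -1
        totals[0] += x[0]
        totals[1] += x[1]
    return [t / n_steps for t in totals]


def mean_field_means(n_iters=200):
    """Mean field: average over q for the other spin instead of sampling it.

    For this model, q_i(x_i) is proportional to exp(x_i * (h[i] + J * mu_other)),
    so the mean of q_i is tanh(h[i] + J * mu_other).
    """
    mu = [0.0, 0.0]  # mu[i] is the mean of x_i under q_i
    for _ in range(n_iters):
        for i in (0, 1):
            mu[i] = math.tanh(h[i] + J * mu[1 - i])
    return mu


if __name__ == "__main__":
    print("Gibbs estimate of E[x]:", gibbs_means())
    print("Mean-field means:      ", mean_field_means())
```

The two outputs need not agree: the Gibbs averages estimate the true marginal means of p, whereas the mean-field values are the means of the best factorized approximation found by the coordinate updates.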
