How do operators specifically propose a new state for continuous parameters?


Pavel Rinkman

unread,
May 16, 2022, 1:57:25 PM
to beast-users

Hello, everyone!

Recently I have gotten stuck on some points of the BEAST algorithm. I will try to describe my understanding below; please could you point out my mistakes and help me to fathom BEAST?

I am very thankful for your help in advance!

A BEAST MCMC run consists of a chain of states that begins from the initial state given in the XML file.

New states are proposed by operator classes, usually changing one parameter at a time. The operator is chosen at random, in proportion to the operator weights. Looking ahead: if its proposal is not accepted, the chain counter still moves a step forward, and at the next step an operator is again chosen at random.

Transition and clock rates, frequencies, the proportion of invariant sites, the gamma shape, and all other continuous parameters, for which we choose distributions, boundaries, and initial values as priors, are proposed by ScaleOperator.

Each parameter also has a scale factor (I do not consider its tuning during the analysis here). A new value is proposed in the following way:

  • the scale factor, β ∈ (0; 1], defines an interval [β; 1/β],
  • which is multiplied by the current state value x, giving [xβ; x/β];
  • a point is then drawn from that interval in proportion to the probability density of the parameter's distribution over it.

Thus for x = 80, β = 0.5, and a OneOnX distribution, BEAST takes a point from the green area (between 40 and 160) in proportion to the probability density, doesn't it?

[image: index.png]

This is the point that confuses me most. Does it happen as described above, or is the distribution defined over [β; 1/β], with the drawn value then multiplied by x?

[image: OneOnX.png]

I have read the source code of the ScaleOperator class (at least for BEAST 1) and noticed that it takes a random value on [β; 1/β] through MathUtils.nextDouble(), which is just random.nextDouble(), i.e. a uniform draw, isn't it? How does ScaleOperator interact with the distribution classes?

If there are differences between BEAST 1 and BEAST 2 here, I would be very interested to know about them.

Best regards,

Remco Bouckaert

unread,
May 23, 2022, 5:47:03 PM
to beast...@googlegroups.com
On 17/05/2022, at 5:57 AM, Pavel Rinkman <pavel....@gmail.com> wrote:

help to fathom BEAST?

[…]

Thus for x = 80, β = 0.5, and a OneOnX distribution, BEAST takes a point from the green area (between 40 and 160) in proportion to the probability density, doesn't it?


Hi Pavel,

This last sentence may be the crux of the problem: the scale operator makes proposals independently of the prior distribution. It takes a value uniformly in the interval [β; 1/β], say s, and then multiplies x by s to get the proposed new state.

Whether the state is accepted or not depends on the posterior of the new state (which may include a OneOnX distribution, as you mentioned) as well as on the Hastings ratio. The latter compensates for the asymmetric interval that the scale operator draws its value from: sampling uniformly from [β; 1/β] gives a bigger chance of scaling a parameter up than down. You may want to read up on the Metropolis–Hastings algorithm for details.
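For one continuous parameter, the proposal-plus-acceptance step can be sketched as below. This is not BEAST's actual code: the class and method names are made up, an Exponential(1) density stands in for the posterior, and the Hastings ratio of the scale move on a single parameter is taken to be 1/s (so its log is -log s):

```java
import java.util.Random;

// Minimal sketch (hypothetical names, not BEAST's real classes) of a
// scale-operator proposal inside a Metropolis-Hastings step.
public class ScaleOperatorDemo {

    // Log density of the target distribution, here Exponential(rate 1)
    // on x > 0, standing in for the posterior.
    static double logTarget(double x) {
        return -x;
    }

    // One scale-operator proposal: draw s uniformly on [beta, 1/beta]
    // and multiply x by s. For a single parameter the Hastings ratio of
    // this move is 1/s, so its log is -log(s).
    static double[] propose(double x, double beta, Random rng) {
        double s = beta + rng.nextDouble() * (1.0 / beta - beta);
        return new double[] { x * s, -Math.log(s) };
    }

    // Run a short chain and return the sample mean of x.
    static double runChain(int steps, double beta, Random rng) {
        double x = 5.0;   // initial state
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            double[] p = propose(x, beta, rng);
            double xNew = p[0], logHastings = p[1];
            double logAlpha = logTarget(xNew) - logTarget(x) + logHastings;
            if (Math.log(rng.nextDouble()) < logAlpha) {
                x = xNew;  // accept the proposal
            }              // otherwise reject: keep x, still count the step
            sum += x;
        }
        return sum / steps;
    }

    public static void main(String[] args) {
        // The mean of Exponential(1) is 1, so a correct chain should
        // produce a sample mean close to 1.
        System.out.println(runChain(100000, 0.5, new Random(42)));
    }
}
```

Note that the scale move can never propose a negative value, so the chain stays in the support of the target automatically.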

Hope this helps,

Remco

Pavel Rinkman

unread,
Jun 2, 2022, 10:09:53 AM
to beast-users
Dear Remco,

Thank you for your answer!

If I understand correctly, the proposal kernel (an operator, in BEAST terminology) draws a new state using a uniform distribution on the interval [β; 1/β], where β is the scale factor, doesn't it? And the only way for the prior distribution to influence the new state is as part of the overall prior, and hence of the eventual posterior (joint) density of the state, isn't it?

I was confused by the classic description of the Metropolis–Hastings algorithm in textbooks, where the example proposal kernel is usually a Gaussian distribution.

Best regards,
Pavel


On Tuesday, May 24, 2022 at 00:47:03 UTC+3, higg...@gmail.com wrote:

babarlelephant

unread,
Jun 5, 2022, 2:16:06 PM
to beast-users
Hi, interesting discussion. I wonder if there are some toy demos somewhere,
like 30 lines of Java instantiating a few BEAST classes (with a few comments to help with reading the source of the relevant methods),
setting most parameters to fixed values (population size, JC69, clock rate, tip dates...),
and running an MCMC to optimize a small tree, just its topology and the dates of 5 internal nodes? Finally, drawing the best tree.

Remco Bouckaert

unread,
Jun 6, 2022, 10:52:46 PM
to beast...@googlegroups.com
On 3/06/2022, at 2:09 AM, Pavel Rinkman <pavel....@gmail.com> wrote:

If I understand correctly, the proposal kernel (an operator, in BEAST terminology) draws a new state using a uniform distribution on the interval [β; 1/β], where β is the scale factor, doesn't it?

Yes!

And the only way for the prior distribution to influence the new state is as part of the overall prior, and hence of the eventual posterior (joint) density of the state, isn't it?

Indeed.

Remco

Remco Bouckaert

unread,
Jun 6, 2022, 11:01:28 PM
to beast...@googlegroups.com
On 5/06/2022, at 1:12 PM, babarlelephant <acx...@gmail.com> wrote:

Hi, interesting discussion. I wonder if there are some toy demos somewhere,
like 30 lines of Java instantiating a few BEAST classes (with a few comments to help with reading the source of the relevant methods),
setting most parameters to fixed values (population size, JC69, clock rate, tip dates...),
and running an MCMC to optimize a small tree, just its topology and the dates of 5 internal nodes? Finally, drawing the best tree.

Probably the most minimal way to run MCMC with a tree is to load a small alignment into BEAUti and keep the default settings (JC69 substitution model, strict clock with fixed rate = 1, and Yule tree prior with estimated birth rate). The MCMC algorithm implemented in BEAST is quite sophisticated in that it adapts during the MCMC run to the problem at hand, caches calculations from previous states, and performs sanity checks to make sure the calculations are still valid. Not quite something that fits into 30 lines of code.

If you are interested in creating your own model in BEAST, there is information on how to do this here: http://www.beast2.org/package-development-guide/

By the way, MCMC is not “optimising” but “sampling” the state space. So, unlike with maximum likelihood, there is no “best tree”, just a sample of trees representing the posterior tree space.

Cheers,

Remco

Pavel Rinkman

unread,
Jun 7, 2022, 6:14:54 AM
to beast-users
Thank you! I just noticed that bounds on the BEAST 1 priors do not seem to work: states can go outside them even if you specified how to truncate the distribution. Is this a bug? When I used BEAST 2, the upper and lower bounds were enforced. This raises a question: how do they work? Do we simply reject proposed states that fall outside the bounds? If so, do we adjust the Hastings ratio?

On Tuesday, June 7, 2022 at 05:52:46 UTC+3, higg...@gmail.com wrote:

Remco Bouckaert

unread,
Jun 8, 2022, 3:08:44 AM
to beast...@googlegroups.com
Most operators are agnostic about the posterior (Gibbs operators are an exception: they inform their proposals with the posterior). When a proposal falls outside the range of the parameter, the posterior becomes zero (the log posterior becomes negative infinity) and the proposal is rejected. So the Hastings ratio does not need any adjustment (again, Gibbs operators are an exception, but they should never propose an out-of-range value anyway).
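This rejection mechanism can be sketched in a few lines. This is a minimal illustration, not BEAST's real API: the names are hypothetical, and the prior is simply uniform on [0, 10], with a log density of negative infinity outside the bounds, so any out-of-range proposal fails the acceptance test automatically:

```java
import java.util.Random;

// Sketch (hypothetical names) of how a hard parameter bound interacts
// with the Metropolis-Hastings acceptance step: an out-of-range proposal
// gets a log posterior of negative infinity, so it can never be
// accepted, and no Hastings correction is needed.
public class BoundedPrior {

    static final double LOWER = 0.0, UPPER = 10.0;

    // Log prior: uniform on [LOWER, UPPER], negative infinity outside.
    static double logPrior(double x) {
        if (x < LOWER || x > UPPER) return Double.NEGATIVE_INFINITY;
        return -Math.log(UPPER - LOWER);
    }

    // Metropolis-Hastings acceptance test for a proposed value.
    static boolean accept(double xOld, double xNew, double logHastings,
                          Random rng) {
        double logAlpha = logPrior(xNew) - logPrior(xOld) + logHastings;
        // If xNew is out of range, logAlpha is negative infinity, and
        // log(u) < -infinity is false for every u, so the proposal is
        // always rejected.
        return Math.log(rng.nextDouble()) < logAlpha;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        System.out.println(accept(5.0, 12.0, 0.0, rng)); // prints false
    }
}
```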

To view this discussion on the web visit https://groups.google.com/d/msgid/beast-users/43bc8f34-30ee-4d41-9248-17955da58989n%40googlegroups.com.
