confused about theta

xjian

Aug 31, 2012, 12:20:24 AM

to dadi...@googlegroups.com

Hi Ryan,

Thank you for updating dadi last time. My experience with the grid search is much better now.

I am not quite clear about the theta used when defining the model (1 by default, or set to some fixed value) versus the optimized theta returned by Inference.optimal_sfs_scaling(model, data). Is the optimized theta the one for the ancestral population? The description in the manual is confusing.

I am also confused about 'fixed theta'. In my experience, for a given model I got an optimized theta (e.g., 700000). Then I fixed theta to 700000 in the same model, ran the same process, and got an optimized theta equal to 1. Does this mean that even when I don't 'fix' theta, it is still fixed to 1 during the simulation, and that the optimized value of theta is just relative to the value set in the model (e.g., 1 by default)? If so, what is the use of 'fixed theta'?

Thanks,

Xueqiu

Gutenkunst, Ryan N - (rgutenk)

Sep 4, 2012, 4:23:38 PM

to dadi...@googlegroups.com

Hello Xueqiu,

dadi handles theta differently than other parameters, because theta works in a particular way. Theta just scales the frequency spectrum. So (if all other parameters are held fixed) the frequency spectrum for theta=2 is just the frequency spectrum for theta=1 with every entry doubled. In practice this means that we don't need to use the full optimizer to find theta, since it's so easy to find once we've optimized the other parameters.
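
To make the scaling argument concrete, here is a minimal Python sketch (plain standard library, not dadi's own code). It assumes a Poisson likelihood for each entry of the frequency spectrum and uses the standard neutral expectation theta/i for a hypothetical one-population spectrum; the observed counts and the helper names `poisson_ll` and `optimal_scaling` are made up for illustration. Because theta multiplies every entry, setting the derivative of the log-likelihood to zero gives theta = sum(data) / sum(model at theta = 1) directly, with no iterative optimizer.

```python
import math


def poisson_ll(model, data):
    """Poisson log-likelihood of observed counts `data` given expected counts `model`."""
    return sum(-m + d * math.log(m) - math.lgamma(d + 1) for m, d in zip(model, data))


def optimal_scaling(model, data):
    """Closed-form factor c maximizing poisson_ll(c * model, data).

    Setting d/dc of sum(-c*m + d*log(c*m)) to zero gives c = sum(data) / sum(model).
    """
    return sum(data) / sum(model)


# Hypothetical example: neutral expectation theta/i for frequency classes i = 1..4, at theta = 1.
fs_theta1 = [1 / i for i in range(1, 5)]
# Hypothetical observed SNP counts in the same frequency classes.
observed = [1200, 640, 410, 290]

# The spectrum at theta = 2 is just the theta = 1 spectrum with every entry doubled.
fs_theta2 = [2 * m for m in fs_theta1]

theta_hat = optimal_scaling(fs_theta1, observed)
best_ll = poisson_ll([theta_hat * m for m in fs_theta1], observed)

# Sanity check: nudging theta away from theta_hat in either direction lowers the likelihood.
for factor in (0.9, 0.99, 1.01, 1.1):
    trial = [factor * theta_hat * m for m in fs_theta1]
    assert poisson_ll(trial, observed) < best_ll

print(theta_hat)
```

The brute-force loop is only there to show that the closed form really is the maximum; in practice the single division is all that is needed.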

The definition of theta used in dadi is theta=4*Nref*mu*L. Here mu is the mutation rate, L is the length of sequence, and Nref is the effective size of the reference population. That's the population all other population sizes are measured in relation to. In the most common case, that population is the ancestral population.
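
As a worked example of this definition, here is a small Python sketch that inverts theta = 4*Nref*mu*L to recover Nref. The fitted theta, mutation rate, sequence length, and relative size `nu1` are hypothetical numbers chosen only to keep the arithmetic simple; in a real analysis mu and L come from information outside the frequency spectrum, and `nref_from_theta` is an illustrative helper, not a dadi function.

```python
def nref_from_theta(theta, mu, L):
    """Solve theta = 4 * Nref * mu * L for Nref."""
    return theta / (4 * mu * L)


# Hypothetical inputs: fitted theta, per-site per-generation mutation rate, sequence length in bases.
theta = 2000.0
mu = 2.5e-8
L = 1_000_000

Nref = nref_from_theta(theta, mu, L)  # 2000 / (4 * 2.5e-8 * 1e6) = 2000 / 0.1
print(round(Nref))  # → 20000

# Other population sizes in the model are relative to Nref, so a hypothetical
# fitted relative size nu1 = 2.0 corresponds to an absolute size of 2 * Nref.
nu1 = 2.0
N1 = nu1 * Nref
print(round(N1))  # → 40000
```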

As for "fixed theta", note that the function you call is optimal_sfs_scaling. If theta=1 in the model, which is the default, then optimal_sfs_scaling gives the optimal value for theta. If theta is fixed to some other value in the model, then optimal_sfs_scaling gives the factor by which theta must be scaled to optimally fit the data. In your case, that factor is 1, since you fixed theta to be equal to the optimal value.
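
The following Python sketch reproduces the behaviour described in the question. It assumes, as explained above, that theta only rescales the spectrum, and that the optimal scaling under a Poisson likelihood is sum(data)/sum(model); the spectrum, counts, and helper names (`optimal_scaling`, `model_fs`) are hypothetical, and this is an illustration rather than dadi's own code. With the model at theta = 1 the returned factor is theta itself; with theta fixed at that optimum the factor drops to 1; with theta fixed at any other value theta0 the factor is theta_hat / theta0.

```python
def optimal_scaling(model, data):
    """Factor by which `model` must be multiplied to best fit `data` (Poisson likelihood)."""
    return sum(data) / sum(model)


# Hypothetical expected spectrum at theta = 1 and hypothetical observed counts.
unit_fs = [1 / i for i in range(1, 5)]
observed = [1200, 640, 410, 290]


def model_fs(theta0):
    """Model spectrum computed with theta fixed to theta0: theta0 times the theta = 1 spectrum."""
    return [theta0 * m for m in unit_fs]


# Default theta = 1: the optimal scaling factor is the optimal theta itself.
theta_hat = optimal_scaling(model_fs(1.0), observed)

# Theta fixed to the optimum found above: the remaining factor is 1 (up to rounding).
factor_at_opt = optimal_scaling(model_fs(theta_hat), observed)

# Theta fixed to some other hypothetical value: the factor is theta_hat / theta0.
theta0 = 500.0
factor_other = optimal_scaling(model_fs(theta0), observed)

print(theta_hat, factor_at_opt, factor_other)
```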

The "fixed theta" analysis is useful when you have some external information about theta, so that you don't want to get it by fitting the FS. For example, maybe you know what the ancestral size is, and you want to plug that information into your model, rather than fitting it.
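
A minimal Python sketch of the fixed-theta case, again assuming a Poisson likelihood and using hypothetical numbers throughout: an externally known ancestral size `N_anc`, mutation rate `mu`, and sequence length `L` fix theta via theta = 4*Nref*mu*L, and the model spectrum is evaluated at that theta without any rescaling. The leftover optimal scaling factor then shows how far the data alone would pull theta away from the externally supplied value. The helper names are illustrative only.

```python
import math


def poisson_ll(model, data):
    """Poisson log-likelihood of observed counts given expected counts."""
    return sum(-m + d * math.log(m) - math.lgamma(d + 1) for m, d in zip(model, data))


def optimal_scaling(model, data):
    """Factor by which `model` must be multiplied to best fit `data`."""
    return sum(data) / sum(model)


# Hypothetical external information: ancestral (reference) size, mutation rate, sequence length.
N_anc = 10_000
mu = 2.5e-8
L = 1_000_000
theta_fixed = 4 * N_anc * mu * L  # about 1000 for these inputs

# Hypothetical theta = 1 spectrum and observed counts.
unit_fs = [1 / i for i in range(1, 5)]
observed = [1200, 640, 410, 290]

# Likelihood with theta held at the externally supplied value (theta is not fitted).
model_fixed = [theta_fixed * m for m in unit_fs]
ll_fixed = poisson_ll(model_fixed, observed)

# For comparison: the scaling the data alone would prefer, and the likelihood it reaches.
factor = optimal_scaling(model_fixed, observed)  # about 1.22 for these made-up counts
ll_free = poisson_ll([factor * m for m in model_fixed], observed)

print(theta_fixed, factor, ll_fixed, ll_free)
```

A factor far from 1 would suggest that the external information and the observed spectrum disagree about theta.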

Does this answer your questions?

Best,

Ryan

--
Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520) 626-0569
http://gutengroup.mcb.arizona.edu

xjian

Sep 7, 2012, 5:48:59 PM

to dadi...@googlegroups.com

Yes Ryan, thank you so much!
