How to calculate the P-values in OmegaPlus?


Maj

5 Dec 2014, 10:53:45
to omeg...@googlegroups.com
Hello!
        I have just started learning OmegaPlus and I have a simple question: how do I calculate the p-values?

        I have seen publications using this method report a p-value for the estimated omegamax. I would like to get that for my data as well, but I couldn't find an explanation in the manual. Would anyone be kind enough to help me with that? Many thanks in advance!

        Best regards,

        Maj

Eyal Privman

23 Dec 2014, 01:22:16
to omeg...@googlegroups.com
Hello,

I have the same question. Is there a detailed description somewhere of how to use OmegaPlus to infer a classic adaptive sweep? I managed to run it on my population genomic data set. From reading papers, I understand that I should generate a null distribution by running simulations with a coalescent simulator, but I didn't see much about that in the manual. An example or a tutorial would be useful. Maybe there is a detailed description in some paper that I've missed?

Thanks,
Eyal

Pavlos Pavlidis

23 Dec 2014, 02:48:29
to omeg...@googlegroups.com
Hi Eyal and Maj,

The approach to calculating the p-value should be described in some papers (at least in Nielsen et al. 2005, and in some papers from Stephan's group in Munich). But you are right that it hasn't been documented clearly in a stand-alone paper or manual. I'll describe it briefly here and post it on the website (pop-gen.eu) in the next few days.

i) You need to infer (or already know) the demographic model of your population.
This means that either you study a population for which somebody has already published a demographic model, or you have to infer it yourself. There are several approaches to inferring demographic models from genomic data, and none of them is perfect. You can use:
a) Approximate Bayesian Computation (ABC), either to choose between alternative models or to infer the parameters of a specific model (e.g. bottleneck, expansion, etc.). Typically, in ABC you choose some loci (just random fragments of the genome), say about 200, at random locations or far enough apart to be considered independent. Then you calculate summary statistics for these loci, and you simulate 200 independent loci under various parameter values of the demographic model (you can use msABC for that, although I acknowledge that we should update the code a bit). Then you can use Csilléry's R package (abc) to infer the demography. This pipeline is described, more or less, here: http://www.bio.lmu.de/~pavlidis/home/?Software:msABC

b) You can use the approach of H. Li and R. Durbin (Nature 2011) and their program PSMC. There is a 2014 update of this approach (MSMC; http://www.nature.com/ng/journal/v46/n8/abs/ng.3015.html).

c) If your sample comes from two subdivided populations, you can use the IM or IMa model, but I have no experience with them (actually, I tried it 5-6 years ago, but the run would have taken 2-3 months to finish, so I killed it).

Anyway, I'll try to put a more detailed description on the website; the bottom line is that, one way or another, you need to know the demographic model of your population.
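The rejection-sampling idea behind option a) can be illustrated with a toy Python sketch. This is not msABC or the abc R package. To keep it self-contained, it replaces the coalescent simulator with a crude stand-in that draws the number of segregating sites per locus as Poisson with mean θ·a_n (where a_n = Σ 1/i for i = 1..n-1). This ignores genealogical variance, so it understates the spread seen in real data. The sketch estimates only θ of a constant-size model and uses made-up pseudo-observed data. The function names are hypothetical.

```python
import numpy as np


def watterson_a(n_seq):
    """a_n = sum of 1/i for i = 1..n_seq-1."""
    return sum(1.0 / i for i in range(1, n_seq))


def toy_simulate(thetas, n_seq, n_loci, rng):
    """Toy stand-in for a coalescent simulator such as msABC (hypothetical).

    Draws segregating sites per locus as Poisson(theta * a_n), ignoring the
    variance of the genealogy, and returns the mean over n_loci loci.
    """
    lam = (np.asarray(thetas, dtype=float) * watterson_a(n_seq))[:, None]
    sites = rng.poisson(lam, size=(lam.shape[0], n_loci))
    return sites.mean(axis=1)


def rejection_abc(observed, n_seq, n_loci, n_sims=20_000, tol=0.01,
                  prior=(1.0, 50.0), seed=1):
    """Rejection ABC: keep the prior draws whose simulated statistic is closest."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(prior[0], prior[1], size=n_sims)  # uniform prior
    stats = toy_simulate(thetas, n_seq, n_loci, rng)
    distance = np.abs(stats - observed)
    n_keep = max(1, round(tol * n_sims))
    return thetas[np.argsort(distance)[:n_keep]]  # approximate posterior sample


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pseudo-observed data: 200 loci, 20 sequences, "true" theta = 10.
    observed = toy_simulate([10.0], n_seq=20, n_loci=200, rng=rng)[0]
    posterior = rejection_abc(observed, n_seq=20, n_loci=200)
    print(f"posterior mean of theta: {posterior.mean():.2f}")
```

With real data, you would replace `toy_simulate` with msABC runs under your candidate demographic models and use several summary statistics instead of one.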

ii) Perform > 1000 neutral simulations with the inferred/known demographic model (1000 is of course arbitrary, but I'd use something on that order).

This can be a bit tricky. You should use the same grid size and the corresponding mutation and recombination rates. For example, if you simulate a whole chromosome, then theta = 4Nμ and rho = 4Nr (N being the effective population size) will probably be large values. I'd suggest using Hudson's ms if the recombination rate for the simulated region is not very large. In Hudson's ms, theta and rho refer to the whole simulated region (NOT to a single bp), so μ and r are the per-generation mutation rate and crossover probability for the entire region. If, however, the recombination rate is large (this can happen if you simulate a whole chromosome), ms will take years. In that case, use MaCS by G. Chen (http://www-hsc.usc.edu/~garykche/).
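As a rough illustration of this scaling, the Python sketch below converts per-bp rates into the whole-region theta and rho that ms expects. It then assembles an `ms nsam nreps -t theta -r rho nsites` command line. Several parts of the sketch are assumptions for illustration: the helper names, the example values (N0 = 10,000, μ = r = 1e-8 per bp, a 100 kb region, 20 sequences), and the `(nsites - 1)` convention for rho. Any demography flags (e.g. `-eN`) must come from the model you inferred in step i).

```python
def ms_params(n0, mu_per_bp, r_per_bp, n_sites):
    """Whole-region theta = 4*N0*mu and rho = 4*N0*r, as used by Hudson's ms.

    mu is summed over all n_sites; r is the crossover probability between
    the two ends of the region, approximated as r_per_bp * (n_sites - 1).
    """
    theta = 4 * n0 * mu_per_bp * n_sites
    rho = 4 * n0 * r_per_bp * (n_sites - 1)
    return theta, rho


def ms_command(n_samples, n_reps, theta, rho, n_sites, demography=()):
    """Build an ms argument list; `demography` holds extra model flags."""
    return ["ms", str(n_samples), str(n_reps),
            "-t", f"{theta:g}", "-r", f"{rho:g}", str(n_sites),
            *demography]


if __name__ == "__main__":
    theta, rho = ms_params(n0=10_000, mu_per_bp=1e-8, r_per_bp=1e-8,
                           n_sites=100_000)
    # Neutral, constant-size example; append your demographic flags here.
    print(" ".join(ms_command(20, 1000, theta, rho, 100_000)))
```

The argument list can be passed to `subprocess.run` once ms is installed. Remember to keep the same region length and grid size when you later run OmegaPlus on the output.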

iii) Run OmegaPlus on the simulations.
Just be careful to use the same grid size as for the real data.
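Step iv) needs one maximum ω per simulated replicate. The Python sketch below extracts these maxima from report text, but the layout it expects is an assumption, not documented OmegaPlus behavior. It treats a line starting with `//` as the start of a new replicate, and every other non-empty line as whitespace-separated `position omega` fields. Check this against your actual OmegaPlus report files and adapt the parsing if they differ. The function name is hypothetical.

```python
def max_omega_per_replicate(report_text):
    """Return the largest omega of each replicate in an (assumed) report layout."""
    maxima = []
    current = None  # running maximum of the replicate being read
    for raw in report_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("//"):  # assumed replicate separator
            if current is not None:
                maxima.append(current)
            current = float("-inf")
            continue
        if current is None:  # single-replicate file without a // header
            current = float("-inf")
        position, omega = line.split()[:2]
        current = max(current, float(omega))
    if current is not None:
        maxima.append(current)
    return maxima


if __name__ == "__main__":
    example = "//1\n100\t1.5\n200\t3.2\n//2\n100\t0.7\n200\t0.9\n"
    print(max_omega_per_replicate(example))
```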

iv) Count how many of the simulated OmegaPlus_MAX values are larger than the OmegaPlus_MAX of the real data. If only 5 out of 1000 are larger, the p-value is 5/1000 = 0.005. If none is larger, the p-value is < 1/1000.
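Step iv) amounts to an empirical p-value. Below is a minimal Python sketch, with a hypothetical function name, that follows the counting rule above. It returns the fraction of simulated maxima strictly larger than the observed one. When none is larger, it returns 1/n flagged as an upper bound (p < 1/n).

```python
def omega_p_value(observed_max, simulated_max):
    """Empirical p-value of the observed maximum omega against neutral simulations.

    Returns (p, is_upper_bound). If no simulated maximum exceeds the observed
    one, p = 1/n is returned with is_upper_bound=True, meaning p < 1/n.
    """
    n = len(simulated_max)
    if n == 0:
        raise ValueError("need at least one simulated replicate")
    larger = sum(1 for w in simulated_max if w > observed_max)
    if larger == 0:
        return 1 / n, True
    return larger / n, False


if __name__ == "__main__":
    sims = [1.0] * 995 + [10.0] * 5  # made-up simulated maxima
    print(omega_p_value(5.0, sims))
```

Some people prefer the slightly more conservative (larger + 1)/(n + 1), which never returns zero. With 1000 simulations, the two estimates differ very little.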

I hope these instructions are more or less clear. Don't hesitate to contact me if you have further questions.

all the best
pavlos 

--

Pavlos Pavlidis, PhD

Foundation for Research and Technology - Hellas
Institute of Molecular Biology and Biotechnology
Nikolaou Plastira 100, Vassilika Vouton
GR - 711 10, Heraklion, Crete, Greece

Eyal Privman

23 Dec 2014, 08:21:40
to omeg...@googlegroups.com
Hi Pavlos,

Very helpful, thanks! This is exactly what I was looking for. It would be good to include this in the manual. If you do, please post a note here pointing us to wherever you put the more detailed description. As a non-expert who wants to apply OmegaPlus to a sequence data set, this is the first time I am carrying out a pipeline like the one you described, and from the manual alone I had no way of figuring it out. The original papers describing the methods (Pavlidis et al. 2010 and Nielsen et al. 2005) do not give a practical description of the tools a user should use (ABC, ms, etc.), and in the papers that used OmegaPlus I didn't find any detailed description of the overall pipeline either.

Many thanks,
Eyal

Boris Shaskolskiy

19 Feb 2019, 07:06:31
to OmegaPlus

Hello, all!

I'd like to calculate p-values for OmegaPlus results on microbial genomes. Which program can I use to infer the demographic model of the population? Are the programs CoMuS or trajdemog from http://pop-gen.eu/wordpress/software suitable for this?

Best regards,

Boris


On Friday, 5 December 2014, 18:53:45 UTC+3, Maj wrote:

pavlos

12 Apr 2019, 08:10:26
to OmegaPlus
Dear Boris,
This is not an easy problem. I think you should use one of the following solutions:
1. ABC with Hudson's ms and gene conversion (please ask more if you don't know how to do this)
2. You can use the dadi software to infer the demographic model.

By the way, which microbial genomes are you using? Are they publicly available, or your own?

best
pavlos