How to use CLR value & p-value to figure out outlier ?

Yafei Mao

unread,

Jul 26, 2016, 1:39:46 AM7/26/16

to OmegaPlus

Dear all,

I am trying to run sweed and find some interesting results but the CLR value is pretty low. Please see in attachment.

I mark the outlier which p-value is lower than 0.05 (script provided by your official website). could I say that these red points is the sites which are under selection?

BTW, In the paper (Pavlidis, Pavlos, et al. "SweeD: likelihood-based detection of selective sweeps in thousands of genomes." Molecular biology and evolution (2013): mst112.), the authors combine the sweed and omegaplus, but my data is not good for using in omegaplus as your suggestion in paper. However, If I combine Sweed outlier and Bayenscan outlier. do you think the common outlier would be more reliable?

Thanks a lot^^

Best,

Yafei

figure3.eps

Rplots.pdf

Pavlos Pavlidis

unread,

Jul 26, 2016, 2:05:17 AM7/26/16

to omeg...@googlegroups.com

Hi Yafei,

the values are quite low as far as I see. Are these outliers calculated as the top 5% of all the values you have there? Or you have used simulations to find the threshold?

best

pavlos

--
You received this message because you are subscribed to the Google Groups "OmegaPlus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to omegaplus+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Pavlos Pavlidis, PhD

Research C

Foundation for Research and Technology - Hellas

Institute of Computer Science
GR - 711 10, Heraklion, Crete, Greece

Yafei Mao

unread,

Jul 26, 2016, 2:38:06 AM7/26/16

to OmegaPlus

Hi Pavlos,

Thanks for your quick reply.

I am using the blow function to find out outlier:

assign.pvalues <- function(array){
#array <- sample(sw, 1000)
pvalues <- array(0, length(array))
ordered.indexes <- order(array)

j <- length(array)
for( i in ordered.indexes ){
pvalues[i] <- j/length(array)
j <- j-1
}
return(pvalues)
}

thr <- 0.05
sw.pval <- assign.pvalues(sw)
outliers <- which(sw.pval < thr)

my commnad is:

/work/student/yafei-mao/SweeD_v3.2.1_Linux/SweeD -name sc0000015.recode.vcf_vcf -input sc0000015.recode.vcf_vcf -grid 250 -folded

the outlier is real data not simulation data.

Thanks again.

Best,

Yafei

Pavlos Pavlidis

unread,

Jul 26, 2016, 3:20:14 AM7/26/16

to omeg...@googlegroups.com

Hi Yafei,

the problem with this approach is that *always* you will find some outliers. IN principle the top 5% are your outliers independently of the values they have.

I"d suggest, in addition to the analysis you have done, to try to infer your demographic model. Have you tried fastsimcoal2 or abc to infer demography?

best

pavlos

Yafei Mao

unread,

Jul 28, 2016, 3:08:16 AM7/28/16

to omeg...@googlegroups.com

Hi Pavlos,

Sorry for late reply, I worked with my final last days. Sorry again.

I did PSMC for my samples, the demography is same as my expection.

Do you think it will have some helps for Sweed?

Thanks for your help.

Best,

Yafei

--
You received this message because you are subscribed to a topic in the Google Groups "OmegaPlus" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/omegaplus/vuipyEWY7bI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to omegaplus+...@googlegroups.com.

AAya14_out.eps

Pavlos Pavlidis

unread,

Jul 28, 2016, 3:53:08 AM7/28/16

to omeg...@googlegroups.com

Hi Yafei,

cool! what I suggest is to implement the demographic model in Hudson's ms, i.e. to simulate neutral data with your demographic model. Then run SweeD on the simulated data and get a distribution of maximum values. With this distribution you can calculate the 95% threshold.

best

pavlos

Yafei Mao

unread,

Jul 30, 2016, 8:43:02 PM7/30/16

to omeg...@googlegroups.com

Hi Pavlos,

Thanks a lot.

I will try my best.

Best,

Yafei

Reply all

Reply to author

Forward