How to use CLR value & p-value to figure out outlier ?

188 views
Skip to first unread message

Yafei Mao

unread,
Jul 26, 2016, 1:39:46 AM7/26/16
to OmegaPlus
Dear all,

I am trying to run sweed and find some interesting results but the CLR value is pretty low. Please see in attachment.

I mark the outlier which p-value is lower than 0.05 (script provided by your official website). could I say that these red points is the sites which are under selection?

BTW, In the paper (Pavlidis, Pavlos, et al. "SweeD: likelihood-based detection of selective sweeps in thousands of genomes." Molecular biology and evolution (2013): mst112.), the authors combine the sweed and omegaplus, but my data is not good for using in omegaplus as your suggestion in paper.  However, If I combine Sweed outlier and Bayenscan outlier. do you think the common outlier would be more reliable? 

Thanks a lot^^  

Best, 
Yafei 
figure3.eps
Rplots.pdf

Pavlos Pavlidis

unread,
Jul 26, 2016, 2:05:17 AM7/26/16
to omeg...@googlegroups.com
Hi Yafei,
the values are quite low as far as I see. Are these outliers calculated as the   top 5% of all the values you have there? Or you have used simulations to find the threshold?

best
pavlos

--
You received this message because you are subscribed to the Google Groups "OmegaPlus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to omegaplus+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Pavlos Pavlidis, PhD
Research C

Foundation for Research and Technology - Hellas
Institute of Computer Science
GR - 711 10, Heraklion, Crete, Greece

Yafei Mao

unread,
Jul 26, 2016, 2:38:06 AM7/26/16
to OmegaPlus
Hi Pavlos,

Thanks for your quick reply.

I am using the blow function to find out outlier:

assign.pvalues <- function(array){
  #array <- sample(sw, 1000)
  pvalues <- array(0, length(array))
  ordered.indexes <- order(array)
  
  j <- length(array)
  for( i in ordered.indexes ){
    pvalues[i] <- j/length(array)
    j <- j-1
  }
  return(pvalues)  
}

thr <- 0.05
sw.pval <- assign.pvalues(sw)
outliers <- which(sw.pval < thr)

my commnad is:


/work/student/yafei-mao/SweeD_v3.2.1_Linux/SweeD -name sc0000015.recode.vcf_vcf -input sc0000015.recode.vcf_vcf -grid 250 -folded

the outlier is real data not simulation data.
Thanks again.

Best, 
Yafei 

Pavlos Pavlidis

unread,
Jul 26, 2016, 3:20:14 AM7/26/16
to omeg...@googlegroups.com
Hi Yafei,
the problem with this approach is that *always* you will find some outliers. IN principle the top 5% are your outliers independently of the values they have.

I"d suggest, in addition to the analysis you have done, to try to infer your demographic model. Have you tried fastsimcoal2 or abc to infer demography?

best
pavlos

Yafei Mao

unread,
Jul 28, 2016, 3:08:16 AM7/28/16
to omeg...@googlegroups.com
Hi Pavlos,

Sorry for late reply, I worked with my final last days. Sorry again. 
I did PSMC for my samples, the demography is same as my expection. 
Do you think it will have some helps for Sweed?
Thanks for your help.

Best, 
Yafei 

--
You received this message because you are subscribed to a topic in the Google Groups "OmegaPlus" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/omegaplus/vuipyEWY7bI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to omegaplus+...@googlegroups.com.
AAya14_out.eps

Pavlos Pavlidis

unread,
Jul 28, 2016, 3:53:08 AM7/28/16
to omeg...@googlegroups.com
Hi Yafei,
cool! what I suggest is to implement the demographic model in Hudson's ms, i.e. to simulate neutral data with your demographic model. Then run SweeD on the simulated data and get a distribution of maximum values. With this distribution you can calculate the 95% threshold.

best
pavlos

Yafei Mao

unread,
Jul 30, 2016, 8:43:02 PM7/30/16
to omeg...@googlegroups.com
Hi Pavlos,

Thanks a lot.
I will try my best.

Best, 
Yafei 
Reply all
Reply to author
Forward
0 new messages