LD decay analysis of SNP data

3,021 views
Skip to first unread message

Vicky

unread,
Jul 18, 2012, 11:20:52 AM7/18/12
to tas...@googlegroups.com
Hi,
Can anyone suggest me how to plot decay of LD of SNP data. I've r2 and D' values from TASSEL 3.0 but want to plot them for decay analysis which require some non linear regression model. the model need C= recombination rate (rho) something from Weir and Hills statistics. Using LD hat but not know about interpretation of the data. Really confused, am not perfect in statistics, please assist me in getting out of this.
Thanks,

Vinod Kumar

Maxime Bastien

unread,
Sep 6, 2012, 5:18:15 PM9/6/12
to tas...@googlegroups.com
Hi Vinod,

I am doing AM in soybean for disease resistance with ~8500 SNPs and I am currently facing the same problem as you. I found a R script on the web which I thought would do the job. With this script I am able to get a good plot of r2 values against physical distance. However there is some problem with the the regression model. For all chromosomes the regression line shows that LD decreases below 0.2 after 10-20 base pairs, which doesn't make sense, especially in an autogamous species. 

Here's a copy of the script, hope it can be helpful to you. And don't hesitate to tell me if you find any errors.

Cheers!

Maxime

Gm03_MAF5<-read.table('Gm03_MAF5.txt',header=T)
colnames(Gm03_MAF5)<-c('Dist','R')

read.delim("Gm03_MAF5.txt")->a
a$Dist->x
a$R*a$R->y
data.frame(x,y) ->dataframe
remington <- function(x)
  {
A <- 10+x
B <- (2+ x) * (11+ x)
C <- (3+ x)*(12+12* x+ x*x)
D<- 45*(2+ x)*(11+ x)
y<- (A/B)*(1+(C/D))
return(y)
  }
modele <- nls(y~remington(x*r),
data=dataframe,
trace=TRUE ,
start=list(r=0.01),
)
plot(a$Dist,a$R*a$R, xlim=c(1,65000000), main= " LD decay Gm03", xlab="distance in base pairs", ylab="r2")
curve(remington(x), 0,65000000, add = TRUE, lwd=2, col = "red")

Vicky

unread,
Sep 7, 2012, 1:11:02 AM9/7/12
to tas...@googlegroups.com
HI Bastein,
Thanks for the cooperation. I'll try the script and let you know about the results. Do it need any library to invoke or it will work without any library?
Thanks,

Vindo Kumar

Vicky

unread,
Sep 7, 2012, 1:29:48 AM9/7/12
to tas...@googlegroups.com
HI Bastien,
I've used the script but its not giving right results, also it giving warning at the end, I am writing below:
Warning message:
In curve(remington(x), 0, 6.5e+07, add = TRUE, lwd = 2, col = "red") :
  'add' will be ignored as there is no existing plot
I am also attaching the plot generated by the script.. I am working with a self pollinated diploid plant species.

Please see the results and suggest me how to correct it.

Thanks,

Vinod 



On Wednesday, July 18, 2012 8:50:52 PM UTC+5:30, Vicky wrote:
result.pdf
Message has been deleted

Elmer Iquira

unread,
Mar 1, 2013, 12:22:40 PM3/1/13
to tas...@googlegroups.com
Hello Vinod,

I got a little bit similar script for doing the same LD decay, but It had some addition in order to used a calculated regression value to do the line but it is using the same formula. It seems to work better than the first script. Could you try to see if you think it gives a good result?


Elmer 


On Wednesday, July 18, 2012 11:20:52 AM UTC-4, Vicky wrote:
Hi,
Can anyone suggest me how to plot decay of LD of SNP data. I've r2 and D' values from TASSEL 3.0 but want to plot them for decay analysis which require some non linear regression model. the model need C= recombination rate (rho) something from Weir and Hills statistics. Using LD hat but not know about interpretation of the data. Really confused, am not perfect in statistics, please assist me in getting out of this. Instead of 101 in the formula you can modify this parameter because it is related to the population size. 
Thanks,

The script can be modified to get a zoom for a specific distance ( ex.  5 Mb, less or more, you just need to modify this parameter in the option instead of 60Mb)




LD_R<-read.table('LDchrom1_A.txt',header=T)
colnames(LD_R)<-c('Dist','R') 

read.delim("LDchrom1_A.txt")->a 

a$Dist->x 
a$R*a$R->y 
data.frame(x,y) ->dataframe 
remington <- function(x) 


A <- 10+x
B <- (2+ x) * (11+ x) 
C <- (3+ x)*(12+12* x+ x*x) 
D<- 101*(2+ x)*(11+ x) 

y<- (A/B)*(1+(C/D)) 
return(y) 


modele <- nls(y~remington(x*r),
data=dataframe, 
trace=TRUE , 
start=list(r=0.00000001), 

rchapeau<-coef(modele) 
#par(mfrow=c(2,2)) 
plot(a$Dist,a$R*a$R, xlim=c(1,60000000), cex=.5, pch=20, col='gray', main= " Gm01 all SNPs", xlab="distance in base pairs", ylab="r2") 
curve(remington(x*rchapeau ), 0,60000000, add = TRUE, lwd=2, col = "black") 




Message has been deleted

Caroline

unread,
Aug 19, 2014, 8:30:03 AM8/19/14
to tas...@googlegroups.com, jabb...@hotmail.com
Hello Bastien,
Your script interest me to make a plot of r² values. I try your script but the R code: "read.delim("Gm03_MAF5.txt")->a" doesn't work.
The warning message is :
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'Gm03_MAF5.txt': No such file or directory

My initial file is LD analysis from Tassel. Is it the good file?
Thanks
Caroline
Reply all
Reply to author
Forward
0 new messages