Hello,
I am attempting to run a QTL analysis of 70 individuals with ~3000 markers. It is an F2 cross, with the parent, F1, and F2 data included in the spreadsheet. I am using extended Haley-Knott (EHK) regression and would like to include sex as both an additive and an interactive covariate.

When I run EHK, the LOD score for the X chromosome is extremely high, and the warning "X'X matrix is singular" appears 50+ times. I have tried thinning closely spaced markers on the X chromosome, but that did not help. The only way I can get a clean plot in which peaks on the other chromosomes are visible is to remove the X chromosome from the analysis entirely, which I would prefer not to do.

Additionally, when running geno.table as seen below, I get the warning "In chisq.test(a, p = c(0.5, 0.5)) : Chi-squared approximation may be incorrect" 50+ times. When I remove the X chromosome, I get the same warning, but only once.

I also ran a similar QTL analysis that shares many individuals with the one that gave the very high X LOD score (the sets are not identical, and a different phenotype was used). There I still get the chisq.test warning 50+ times, but when I run scanone the X LOD score is very low. I find this strange, given how many individuals the two runs have in common.
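For context on the chisq.test warning: base R emits "Chi-squared approximation may be incorrect" whenever any expected count in the test is below 5. The sketch below reproduces that with made-up counts (they are not taken from the data described above), and `catch_warning` is a hypothetical helper written only for this illustration.

```r
# Hypothetical genotype counts for a two-class test; not from the post's data.
small_counts <- c(3, 1)    # expected counts are 2 and 2 (below 5)
large_counts <- c(30, 10)  # expected counts are 20 and 20

# Helper (hypothetical): return the warning message raised by an expression,
# or NA if it ran without a warning.
catch_warning <- function(expr) {
  tryCatch({
    force(expr)
    NA_character_
  }, warning = function(w) conditionMessage(w))
}

msg_small <- catch_warning(chisq.test(small_counts, p = c(0.5, 0.5)))
msg_large <- catch_warning(chisq.test(large_counts, p = c(0.5, 0.5)))

print(msg_small)  # warning message for the small sample
print(msg_large)  # NA: no warning for the larger sample
```

The `p = c(0.5, 0.5)` in the quoted call suggests a two-class test, which would fit X-chromosome markers tested within small subgroups of the 70 individuals, but that link is an assumption about geno.table's internals rather than something shown in the post.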
Here is the code that I used:
mydatanew <- read.cross("csvr", "C:/Users/Documents/", "QTL.csv", na.strings = "-", genotypes = c("AA", "AB", "BB"), alleles = c("A", "B"))
# Identify SNPs with segregation distortion
gt <- geno.table(mydatanew)
gt[gt$P.value < 1e-3, ]
# Those SNPs were removed; if left in, they skew the analysis and give a bad plot
# Separate markers that sit at the same position
mydataf <- jittermap(mydatanew)
summary(mydataf)
# Run extended Haley-Knott regression
mydataf <- calc.genoprob(mydataf, step = 1, error.prob = 0.001)
out.ehk <- scanone(mydataf, method = "ehk")
# Determine the significance threshold
operm <- scanone(mydataf, n.perm = 1000, perm.Xsp = TRUE, verbose = FALSE)
summary(operm, alpha = c(0.05, 0.5))
# Plot the LOD curves
plot(out.ehk, bandcol = "gray87", ylab = "LOD score")
# Identify peak locations and LOD scores
summary(out.ehk, perms = operm, alpha = 0.5, pvalues = TRUE)
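
The listing above does not pass sex to scanone, and the permutation call does not set `method`, so it would fall back to scanone's default (EM) rather than EHK. A minimal sketch of a scan and permutation test that share the same method and covariates is given below. It is run on `fake.f2`, an example F2 cross shipped with R/qtl, as a stand-in for the real data; the `sex` phenotype column, the tiny `n.perm`, and the `alpha` value are assumptions made only to keep the example small and quick.

```r
library(qtl)

# Stand-in data: fake.f2 is an example F2 intercross bundled with R/qtl.
# It is NOT the data from the post.
data(fake.f2)
cross <- calc.genoprob(fake.f2, step = 1, error.prob = 0.001)

# Sex as a numeric covariate (the "sex" column name is specific to fake.f2).
sex <- cross$pheno$sex

# EHK scan with sex as both an additive and an interactive covariate.
out.ehk <- scanone(cross, method = "ehk", addcovar = sex, intcovar = sex)

# Permutation test with the same method and covariates as the scan.
# n.perm = 10 is far too small for real thresholds; it only keeps this fast.
set.seed(1)
operm <- scanone(cross, method = "ehk", addcovar = sex, intcovar = sex,
                 n.perm = 10, perm.Xsp = TRUE, verbose = FALSE)

# Thresholds and peaks, reported against the matched permutations.
summary(operm, alpha = 0.05)
summary(out.ehk, perms = operm, alpha = 0.05, pvalues = TRUE)
```

Whether an interactive sex covariate is estimable on the X chromosome in a given data set is a separate question that this sketch does not settle; comparing the scan with and without `intcovar` may help narrow down where the singular-matrix warnings come from.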
Any insight into what could be going wrong would be greatly appreciated.
Thank you!