X'X Matrix is Singular, X Chr issues

Adrianna Jurek

Mar 17, 2025, 1:16:18 PM

to R/qtl discussion

Hello,

I am attempting to run a QTL analysis of 70 individuals with ~3000 markers. It is an F2 cross, and the parent, F1, and F2 data are all included in the spreadsheet. I am using extended Haley-Knott (EHK) regression for the analysis, with sex as both an additive and an interactive covariate. When I run EHK, the LOD score for the X chromosome is extremely high, and the warning "X'X matrix is singular" appears 50+ times. I have tried thinning out closely spaced markers on the X chromosome, but that did not help. The only way I can get a clean plot, where I can see the peaks on the other chromosomes, is to remove the X chromosome from the analysis entirely (which I would prefer not to do).

Additionally, when running geno.table as shown below, I get the warning "In chisq.test(a, p = c(0.5, 0.5)) : Chi-squared approximation may be incorrect" 50+ times. When I remove the X chromosome, I get the same warning, but only once.

When I run a similar QTL analysis on many of the same individuals (the set is not identical, and a different phenotype was used), I still get the chisq.test warning 50+ times, but scanone gives a very low LOD score on the X chromosome. I find this strange, since many of the individuals in the low-LOD run are also in the high-LOD run.
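For context on the warning itself: a regression can only be solved when X'X (crossprod(X) in R) is invertible, which requires the columns of the design matrix X to be linearly independent. The base-R sketch below uses made-up genotype probabilities to show two generic ways that fails: an intercept plus all three F2 genotype-probability columns (which sum to 1 for every individual), and a genotype class that no individual carries (an all-zero column). It only illustrates what the warning means; it is not a diagnosis of what triggers it in this analysis, and all values and names are hypothetical.

```r
# Hypothetical genotype probabilities (AA, AB, BB) for 4 individuals;
# each row sums to 1, as conditional genotype probabilities do.
probs <- matrix(c(0.75, 0.25, 0.00,
                  0.25, 0.50, 0.25,
                  0.00, 0.25, 0.75,
                  0.50, 0.50, 0.00),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("AA", "AB", "BB")))

# Intercept plus all three probability columns: AA + AB + BB equals the
# intercept column, so the columns are linearly dependent.
X_full <- cbind(intercept = 1, probs)

# Dropping one genotype column removes that dependency.
X_reduced <- cbind(intercept = 1, probs[, c("AA", "AB")])

# A genotype class that nobody carries contributes an all-zero column.
X_empty <- cbind(X_reduced, unseen = 0)

# qr()$rank counts linearly independent columns; a rank below ncol(X)
# means crossprod(X), i.e. X'X, is singular and cannot be inverted.
is_singular <- function(X) qr(X)$rank < ncol(X)

sapply(list(full = X_full, reduced = X_reduced, empty = X_empty), is_singular)
# full: TRUE, reduced: FALSE, empty: TRUE
```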

Here is the code that I used:

mydatanew<-read.cross("csvr","C:/Users/Documents/","QTL.csv",na.strings="-",genotypes=c("AA","AB","BB"),alleles=c("A","B"))

#To identify SNPs with segregation distortion
gt<-geno.table(mydatanew)
gt[ gt$P.value < 1e-3, ]
#removed those SNPs. If left in they skew the analysis and you get a bad graph

#To separate markers that sit at identical positions
mydataf<-jittermap(mydatanew)
summary(mydataf)

#To run Extended Haley-Knott regression
mydataf<-calc.genoprob(mydataf,step=1,error.prob=0.001)
out.ehk<-scanone(mydataf, method="ehk")

#To determine significance threshold
operm<-scanone(mydataf,n.perm=1000,perm.Xsp=TRUE,verbose=FALSE)
summary(operm,alpha=c(0.05,0.5))

#To graph QTL plot
plot(out.ehk, bandcol="gray87", ylab="LOD score")

#To identify peak locations and LOD score
summary(out.ehk,perms=operm,alpha=0.5,pvalues=TRUE)
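One detail in the code above: scanone() defaults to method="em", so the permutation call that produces operm does not use the same method as out.ehk; presumably it should pass the same method= as the scan being thresholded. The base-R sketch below shows the general permutation logic on simulated data: shuffle the phenotype, rerun the identical scan, record the maximum LOD across the genome, and take a quantile of those maxima. It swaps in simple single-marker regression for any R/qtl method, ignores the X-chromosome-specific handling that perm.Xsp=TRUE provides, and all data and function names are hypothetical.

```r
set.seed(1)

# Simulated (hypothetical) data: 70 F2 individuals, 20 unlinked markers,
# genotypes coded 1 = AA, 2 = AB, 3 = BB in a 1:2:1 ratio, and a
# phenotype with no QTL at all.
n <- 70
n_markers <- 20
geno <- matrix(sample(1:3, n * n_markers, replace = TRUE,
                      prob = c(0.25, 0.5, 0.25)),
               nrow = n)
pheno <- rnorm(n)

# Single-marker regression LOD: LOD = (n / 2) * log10(RSS0 / RSS1),
# where RSS1 uses the genotype-class means as fitted values.
marker_lod <- function(y, g) {
  rss0 <- sum((y - mean(y))^2)
  rss1 <- sum((y - ave(y, g))^2)
  (length(y) / 2) * log10(rss0 / rss1)
}

# "Genome scan": the LOD at every marker.
scan_lod <- function(y, geno) apply(geno, 2, function(g) marker_lod(y, g))

# Permutations: shuffle the phenotype, rerun the SAME scan, and keep the
# maximum LOD across all markers from each permuted scan.
n_perm <- 200
max_lod <- replicate(n_perm, max(scan_lod(sample(pheno), geno)))

# Genome-wide 5% threshold: the 95th percentile of the permutation maxima.
threshold <- quantile(max_lod, 0.95)
observed <- scan_lod(pheno, geno)
which(observed > threshold)  # markers exceeding the threshold, if any
```

Because the threshold is built from the same scan function as the observed LOD curve, any method-specific inflation (on the X chromosome or elsewhere) is reflected in both.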

Any insight to what could be going wrong is greatly appreciated.

Thank you!

Karl Broman

Mar 17, 2025, 1:23:29 PM

to R/qtl discussion

I would recommend against the use of the ehk method. In principle it has some potential advantages over regular Haley-Knott regression, but in practice it doesn’t seem to work so well.

I’m not sure if that is the actual cause of the X chromosome issues, but do you see the same problems with method="hk"?
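The Haley-Knott idea behind method="hk" can be sketched in a few lines of base R: at a given position, regress the phenotype on the genotype probabilities and compare that fit with an intercept-only model, LOD = (n/2) log10(RSS0/RSS1). The probabilities and phenotypes below are hypothetical, and this single-position sketch is not R/qtl's implementation, which also handles covariates and the X chromosome; in R/qtl itself the scan would be scanone(mydataf, method="hk").

```r
# Hypothetical genotype probabilities (AA, AB, BB) at one position
# for 6 individuals; each row sums to 1.
probs <- matrix(c(0.98, 0.02, 0.00,
                  0.10, 0.85, 0.05,
                  0.00, 0.05, 0.95,
                  0.90, 0.10, 0.00,
                  0.05, 0.90, 0.05,
                  0.00, 0.10, 0.90),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("AA", "AB", "BB")))
y <- c(10.2, 12.1, 14.3, 9.8, 11.7, 13.9)  # hypothetical phenotypes

hk_lod <- function(y, probs) {
  # Null model: intercept only.
  rss0 <- sum((y - mean(y))^2)
  # Haley-Knott: regress y directly on the genotype probabilities.
  # No separate intercept, because the three columns already sum to 1
  # (adding one would make X'X singular).
  fit <- lm.fit(probs, y)
  rss1 <- sum(fit$residuals^2)
  (length(y) / 2) * log10(rss0 / rss1)
}

hk_lod(y, probs)
```

Fitting the three probability columns without an intercept spans the same model as an intercept plus two of the columns, so either parameterization gives the same LOD.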

Regarding the “chi-square approximation” warnings in geno.table, I would just ignore them. The chisq.test() function is used to calculate the p-values, and it issues that warning whenever any cell has a small expected count (below 5).
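The warning can be reproduced directly with chisq.test(), which warns whenever any expected count n*p is below 5 but still returns a p-value. The base-R sketch below uses hypothetical genotype counts: a 1:2:1 segregation test for F2 autosomal markers, filtered at the 1e-3 cutoff used in the post, and a 1:1 two-class test with very few typed individuals. Treating these as what geno.table() computes is an approximation for illustration, not its actual code.

```r
# Hypothetical AA/AB/BB counts at two F2 autosomal markers (70 typed each).
counts <- rbind(m1 = c(AA = 17, AB = 36, BB = 17),
                m2 = c(AA = 40, AB = 25, BB = 5))

# Segregation-distortion test against the expected 1:2:1 F2 ratio.
pvals <- apply(counts, 1, function(x) {
  chisq.test(x, p = c(0.25, 0.5, 0.25))$p.value
})
distorted <- names(pvals)[pvals < 1e-3]  # same cutoff as in the post

# A two-class 1:1 test with only 4 typed individuals: the expected counts
# are 4 * 0.5 = 2 (< 5), so chisq.test() warns but still returns a p-value.
warnings_seen <- character(0)
small <- withCallingHandlers(
  chisq.test(c(3, 1), p = c(0.5, 0.5)),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

distorted      # "m2"
warnings_seen  # "Chi-squared approximation may be incorrect"
small$p.value  # about 0.317
```

In R/qtl, the flagged markers can then be dropped with drop.markers(), for example drop.markers(mydatanew, rownames(gt)[gt$P.value < 1e-3]).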

karl

Adrianna Jurek

Mar 17, 2025, 1:33:11 PM

to R/qtl discussion

I do not seem to have the same issue when running "hk" instead of "ehk". There are no warnings now, and the X chromosome LOD appears normal.