Hello Dr. Broman and the R/qtl community -
I was hoping to get some insight into a negative LOD score returned by the scanone function for a binary trait model fit with Haley-Knott regression (HK).
Specifically, I have one marker with a negative LOD score that is sandwiched between two markers with LOD scores > 30. There might be a recombination event or two between the consecutive markers (markers are 500 kb genomic bins genotyped with low-coverage sequencing), but the genotypes aren't that different from one marker to the next. Because of how I did the genotyping, if a recombination event happened within a bin, I scored that bin as missing data rather than assigning it a genotype. So that particular marker may have more missing data points, but I don't think the amount varies much from its neighbors (I realize that I could count the missing data, but I doubt it is substantially different). Would that likely cause the negative LOD score?
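To check this rather than guess, I could tabulate the missing calls per bin. Below is a minimal sketch in Python on made-up data (the genotype codes and the `missing_per_marker` helper are hypothetical and not part of R/qtl); within R/qtl itself, I believe `nmissing(cross, what = "mar")` reports the same per-marker counts.

```python
def missing_per_marker(genotypes):
    """Count missing calls (None) in each marker column.

    genotypes: list of rows, one row per individual, one entry per marker.
    """
    n_markers = len(genotypes[0])
    return [sum(row[j] is None for row in genotypes) for j in range(n_markers)]


# Hypothetical toy data: rows = individuals, columns = three consecutive bins.
# None marks a bin scored as missing because a recombination fell inside it.
genotypes = [
    ["AA", "AA", "AA"],
    ["AA", None, "AB"],
    ["AB", "AB", "AB"],
    ["BB", None, "AB"],
]

counts = missing_per_marker(genotypes)
print(counts)  # [0, 2, 0]
```

If the middle bin stood out like this in my real data, that would at least tell me the marker is being evaluated on different information than its neighbors.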
I have room to improve in my statistical knowledge. I tried reading the code for the scanone function (using getAnywhere) to understand what it's actually doing, but I got lost in all of the if/then branches. I don't think the problem occurs when the data are analyzed with a non-binary model.
In a manual somewhere I saw that the LOD score is calculated as n/2 * log10(RSS0/RSS1). As I read it, if the null hypothesis of no QTL (RSS0) is more likely than the alternative of a single QTL (RSS1), the ratio would be greater than 1 and its log would be positive, so a negative LOD would arise when a single QTL is more likely than no QTL. That seems backwards to me; it seems like it should be the other way around. What am I missing?
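One thing I may be misreading: if RSS stands for residual sum of squares (my assumption), then a smaller value means a better fit rather than a more likely hypothesis. Here is a quick arithmetic sketch in Python with made-up numbers (the function name and all values are hypothetical, not taken from my data):

```python
import math


def lod_from_rss(n, rss0, rss1):
    """LOD = (n/2) * log10(RSS0 / RSS1).

    rss0: residual sum of squares under the null (no QTL)
    rss1: residual sum of squares under the single-QTL model
    """
    return (n / 2.0) * math.log10(rss0 / rss1)


n = 200  # hypothetical number of individuals

better_qtl_fit = lod_from_rss(n, rss0=100.0, rss1=80.0)   # QTL model leaves less residual
same_fit = lod_from_rss(n, rss0=100.0, rss1=100.0)        # both models fit equally
worse_qtl_fit = lod_from_rss(n, rss0=100.0, rss1=110.0)   # QTL model leaves more residual

print(better_qtl_fit > 0, same_fit == 0, worse_qtl_fit < 0)  # True True True
```

If that reading is right, a negative value requires RSS1 > RSS0, which I would not expect from nested least-squares fits on the same individuals, so I'm still unsure where the negative score for the binary model comes from.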
Also, the trait that I'm mapping appears to be a single-locus Mendelian trait (based on phenotype frequencies), but when I map it, I get a second peak that appears to be fairly substantial. Looking more closely at the data, I realize that this is likely caused by some sort of DMI (Dobzhansky-Muller incompatibility): for example, there are no individuals homozygous for one parent at one locus and homozygous for the other parent at the other locus. To show that the second locus is most likely due to a genetic incompatibility, I used the genotype at the main locus as a covariate. This eliminated the association between genotype and phenotype at the second locus, but it also resulted in lots of negative LOD scores in other parts of the genome. Spoiler alert: I have already identified the gene that's causing the phenotype (yeah), but I am trying to make a nice figure of what happens when I control for the genotype at the main locus. For good measure, I also checked what happened when I used the genotype at the second locus as a covariate for the main locus, and everything looked fine there.
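The two-locus pattern I'm describing can be checked with a simple cross-tabulation. A minimal sketch in Python with invented genotypes (the data and the `two_locus_table` name are hypothetical, just to illustrate the empty cell):

```python
from collections import Counter


def two_locus_table(locus1, locus2):
    """Count individuals in each (locus1 genotype, locus2 genotype) class."""
    return Counter(zip(locus1, locus2))


# Hypothetical genotypes for eight individuals at the two loci
locus1 = ["AA", "AA", "AB", "AB", "AB", "BB", "BB", "BB"]
locus2 = ["AA", "AB", "AA", "AB", "BB", "AA", "AB", "BB"]

table = two_locus_table(locus1, locus2)

# Print the full 3 x 3 grid; a Counter returns 0 for classes never observed
for g1 in ("AA", "AB", "BB"):
    print(g1, [table[(g1, g2)] for g2 in ("AA", "AB", "BB")])
```

In this toy example the AA/BB class is empty, which is the kind of gap I see between the two loci in my real data.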
Any help with understanding what is happening would be greatly appreciated! I'm attaching an image as an example.
Thanks, Anji