Hey,
I'm performing some genome-wide association studies in an all-male study population and would like to include the X and Y chromosomes.
My data is from the UK Biobank and I've been processing it in PLINK1.9.
A few questions arose:
1) One of the tests I'm running is --glm in PLINK. Does it handle the X and Y chromosomes correctly, or do I need to take extra steps?
2) I also run some association tests outside PLINK. To get a better idea of how hemizygosity is encoded in my data, I ran --export A in PLINK1.9.
The output confused me a bit, though. In the PAR, things look as expected: 0/1/2 encoding, with 2 observed much less often than 1 across variants and individuals.
In the non-PAR, however, I see three kinds of variants: some with only 0/2, some with 0/1/2 where 2 is much more frequent than 1, and some with 0/1/2 where 1 is more frequent than 2. I expected to see only 0/2 in the non-PAR. Can these patterns be explained, or is --export not supported for hemizygous genotypes?
If it isn't supported, can I still assume that the PLINK binary format represents hemizygosity correctly?
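To put numbers on the three non-PAR patterns in question 2, one option is to tabulate, for each variant column in the exported .raw file, how often each dosage value occurs. Below is a minimal Python sketch. It assumes the standard .raw layout: a whitespace-delimited header whose first six fields are FID IID PAT MAT SEX PHENOTYPE, then one column per variant, with "NA" for missing. The function names `tabulate_raw` and `classify` and the category labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def tabulate_raw(lines):
    """Count dosage values per variant column in a PLINK .raw file.

    Assumption: six leading columns (FID IID PAT MAT SEX PHENOTYPE),
    then one column per variant; missing dosages are written as "NA".
    """
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    variants = header[6:]
    counts = {v: Counter() for v in variants}
    for fields in body:
        # Pair each variant column name with this sample's dosage string.
        for v, value in zip(variants, fields[6:]):
            counts[v][value] += 1
    return counts


def classify(counter):
    """Label a variant by which non-missing dosages occur (hypothetical labels)."""
    het = counter.get("1", 0)
    hom_alt = counter.get("2", 0)
    if het == 0:
        # Only 0 and/or 2 observed: the haploid-like pattern expected in males.
        return "haploid-like (0/2 only)"
    if hom_alt > het:
        return "mostly 2, some 1"
    return "1 at least as common as 2"


if __name__ == "__main__":
    # Toy input with made-up IDs; with a real file you could pass an
    # open file handle instead, e.g. tabulate_raw(open("plink.raw")).
    toy = [
        "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G rs3_T",
        "f1 i1 0 0 1 -9 0 2 1",
        "f2 i2 0 0 1 -9 2 2 1",
        "f3 i3 0 0 1 -9 0 1 2",
        "f4 i4 0 0 1 -9 NA 2 1",
    ]
    for variant, c in tabulate_raw(toy).items():
        print(variant, dict(c), classify(c))
```

Running this on the non-PAR columns only (for example after restricting the export to those variants) would show how many variants fall into each pattern. It would also show which individuals carry the 1 calls, which may help decide whether these calls are genotyping artefacts to set to missing or something else.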