Funny thing -- I had the web page open and had looked right at that command
line, but somehow I didn't see it:
--check-sex {female max F} {male min F} <ycount>
--impute-sex {female max F} {male min F} <ycount>
It worked! Now things get weird, in a way that is entirely Illumina's
fault. These data (the same samples I showed in my earlier message) are
from the Omni2.5 v1.1 array:
FID IID PEDSEX SNPSEX STATUS F YCOUNT
X0108 X0108 0 0 PROBLEM -0.07516 519
X0678 X0678 0 0 PROBLEM 0.05293 520
X2736 X2736 0 0 PROBLEM 0.07506 526
X0556 X0556 0 0 PROBLEM 0.08925 520
X0731 X0731 0 0 PROBLEM 0.09103 520
X0334 X0334 0 0 PROBLEM 0.09378 531
X0774 X0774 0 0 PROBLEM 0.09786 522
X2509 X2509 0 0 PROBLEM 0.1016 515
X0076 X0076 0 0 PROBLEM 0.1072 520
X0663 X0663 0 0 PROBLEM 0.1079 520
X2413 X2413 0 0 PROBLEM 0.1308 525
X2541 X2541 0 0 PROBLEM 0.1316 518
X2741 X2741 0 1 PROBLEM 0.938 2216
X0786 X0786 0 1 PROBLEM 0.9382 2216
X0232 X0232 0 1 PROBLEM 0.9398 2211
X0015 X0015 0 1 PROBLEM 0.9413 2212
X0261 X0261 0 1 PROBLEM 0.9414 2189
X0356 X0356 0 1 PROBLEM 0.9415 2204
X0749 X0749 0 1 PROBLEM 0.9417 2207
In case anyone is interested in tracking down the guilty party:
GSGT Version,1.9.4
Content,,HumanOmni25M-8v1-1_A.bpm
Even if I exclude all 614 markers that do not have rs numbers, I still
have about 194 non-missing Y markers per woman, versus about 1600 per man.
So I think it is best to include all of Illumina's Y markers, even though
about 530 of them are not really Y markers. When I do that, sort by F, and
select the cases at the transition from female to male, I see this:
FID IID PEDSEX SNPSEX STATUS F YCOUNT
X2281 X2281 0 0 PROBLEM 0.1327 520
X2445 X2445 0 0 PROBLEM 0.1836 520
X2576 X2576 0 0 PROBLEM 0.2138 522
X0059 X0059 0 0 PROBLEM 0.24 523
X0574 X0574 0 0 PROBLEM 0.2598 518
X2319 X2319 0 1 PROBLEM 0.9305 2205
X0720 X0720 0 1 PROBLEM 0.9332 2186
X2639 X2639 0 1 PROBLEM 0.9334 2212
X0321 X0321 0 1 PROBLEM 0.9355 2205
X0091 X0091 0 1 PROBLEM 0.9362 2210
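For anyone who wants to repeat the rs-number exclusion mentioned above, here is a minimal Python sketch. It assumes a standard six-column PLINK .bim file with chromosome Y coded either as 24 (PLINK's numeric code) or as "Y". The function name and the example .bim lines are made up for illustration; the printed IDs, one per line, could be saved to a file and passed to PLINK's --exclude.

```python
from typing import Iterable, List

# Chromosome Y appears as 24 in PLINK's numeric coding, or as "Y".
Y_CODES = {"24", "Y"}


def y_markers_without_rs(bim_lines: Iterable[str]) -> List[str]:
    """Return IDs of chromosome-Y variants whose ID does not start with 'rs'.

    Hypothetical helper; expects .bim columns: chrom, ID, cM, bp, A1, A2.
    """
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, var_id = fields[0], fields[1]
        if chrom in Y_CODES and not var_id.startswith("rs"):
            ids.append(var_id)
    return ids


if __name__ == "__main__":
    # Made-up example lines, not real Omni2.5 content.
    example_bim = [
        "24 rs0000001 0 1000 A G",
        "24 ilmnY_0001 0 2000 T C",
        "Y ilmnY_0002 0 3000 A G",
        "23 ilmnX_0001 0 4000 A G",
    ]
    for var_id in y_markers_without_rs(example_bim):
        print(var_id)  # one ID per line
```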
Here I sort by YCOUNT and look at the transition:
FID IID PEDSEX SNPSEX STATUS F YCOUNT
X2697 X2697 0 0 PROBLEM 0.08498 524
X2413 X2413 0 0 PROBLEM 0.1308 525
X2890 X2890 0 0 PROBLEM 0.1037 525
X2736 X2736 0 0 PROBLEM 0.07506 526
X0334 X0334 0 0 PROBLEM 0.09378 531
X0720 X0720 0 1 PROBLEM 0.9332 2186
X0261 X0261 0 1 PROBLEM 0.9414 2189
X0320 X0320 0 1 PROBLEM 0.9423 2197
X0445 X0445 0 1 PROBLEM 0.9452 2201
X0335 X0335 0 1 PROBLEM 0.937 2202
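Picking out the transition rows by eye works, but the same idea can be scripted. A minimal Python sketch, assuming the --check-sex output is whitespace-delimited with the header shown above. It treats the largest gap between neighbouring sorted values as the female/male boundary, which is a heuristic chosen for this sketch, not something PLINK does. The function names are hypothetical; the sample rows are copied from the tables above.

```python
from typing import Dict, List

Row = Dict[str, str]


def parse_sexcheck(text: str) -> List[Row]:
    """Parse whitespace-delimited --check-sex output (header line first)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    return [dict(zip(header, fields)) for fields in body]


def transition(rows: List[Row], column: str, k: int = 2) -> List[Row]:
    """Sort by a numeric column and return up to k rows on each side of
    the largest jump between neighbouring values."""
    ordered = sorted(rows, key=lambda r: float(r[column]))
    if len(ordered) < 2:
        return ordered
    values = [float(r[column]) for r in ordered]
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    i = max(range(len(gaps)), key=lambda j: gaps[j])  # lower side of the jump
    return ordered[max(0, i - k + 1): i + k + 1]


if __name__ == "__main__":
    sample = """FID IID PEDSEX SNPSEX STATUS F YCOUNT
X2281 X2281 0 0 PROBLEM 0.1327 520
X0574 X0574 0 0 PROBLEM 0.2598 518
X2413 X2413 0 0 PROBLEM 0.1308 525
X2319 X2319 0 1 PROBLEM 0.9305 2205
X0091 X0091 0 1 PROBLEM 0.9362 2210"""
    rows = parse_sexcheck(sample)
    for col in ("F", "YCOUNT"):
        print(col, [r["IID"] for r in transition(rows, col)])
    # F ['X2281', 'X0574', 'X2319', 'X0091']
    # YCOUNT ['X2281', 'X2413', 'X2319', 'X0091']
```

With these five rows, both columns put the boundary between the same two groups of samples, matching what the sorted tables show.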
So in this case F and YCOUNT lead to the same conclusion, but PLINK
can't really use YCOUNT because so many of the markers are not really on Y.
For this reason, I recommend allowing thresholds on YCOUNT just as you do
with F. For example, I might use these options:
--check-sex .3 .9 ycount 600 2000
--impute-sex .3 .9 ycount 600 2000
So female would be F<=.3 && ycount<=600, and male would be F>=.9 &&
ycount>=2000.
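The proposed rule can be written as a small function to make the && conditions explicit. This is only a sketch of the suggested behaviour, not PLINK's implementation: the function name is hypothetical, the defaults are the example values .3/.9 and 600/2000 from the options above, and the return codes follow PLINK's 1 = male, 2 = female, 0 = unknown convention.

```python
def proposed_snpsex(f: float, ycount: int,
                    f_female_max: float = 0.3, f_male_min: float = 0.9,
                    y_female_max: int = 600, y_male_min: int = 2000) -> int:
    """Return 2 (female), 1 (male) or 0 (undetermined) under the proposed
    rule: F and YCOUNT must both agree before a sex is assigned."""
    if f <= f_female_max and ycount <= y_female_max:
        return 2
    if f >= f_male_min and ycount >= y_male_min:
        return 1
    return 0


if __name__ == "__main__":
    # First two pairs come from the tables above; the third is a made-up
    # conflicting case (female-looking F, male-looking YCOUNT).
    for f, y in [(0.1308, 525), (0.9332, 2186), (0.2598, 2205)]:
        print(f, y, proposed_snpsex(f, y))
```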
Here's something you might not want to change, but I think changing it
would be reasonable. Currently, if PEDSEX=0, then STATUS=PROBLEM even if
both F and YCOUNT point unambiguously to SNPSEX of 1 or 2. I would say
that --check-sex has not identified any new problem in those cases, so
PROBLEM isn't the right status, but OK isn't quite correct, either. I
think of "PROBLEM" as indicating a conflict between PEDSEX and SNPSEX,
which should mean either that both are greater than zero and unequal, or
that SNPSEX is zero. If PEDSEX is zero, there isn't really a problem
unless SNPSEX is also zero. If PEDSEX=0 and SNPSEX>0, maybe we should
have a special STATUS like "IMPUTE".
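The suggested STATUS logic fits in a few lines. A sketch only: "IMPUTE" is the label proposed here, not an existing PLINK status, and the function name is hypothetical. Sex codes are 0 = unknown, 1 = male, 2 = female.

```python
def proposed_status(pedsex: int, snpsex: int) -> str:
    """STATUS under the suggested rules (not current PLINK behaviour)."""
    if snpsex == 0:
        return "PROBLEM"  # genotypes could not settle the sex
    if pedsex == 0:
        return "IMPUTE"   # no pedigree sex, but a confident SNP-based call
    if pedsex != snpsex:
        return "PROBLEM"  # both known and they disagree
    return "OK"


if __name__ == "__main__":
    for pedsex, snpsex in [(0, 0), (0, 1), (1, 2), (2, 2)]:
        print(pedsex, snpsex, proposed_status(pedsex, snpsex))
```

Checking SNPSEX first means a failed SNP-based call is always flagged, whatever PEDSEX says.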
Minor point: for F, I don't think we need to retain four significant
digits. Output like the following could be avoided by using something
like "%.3f" to format the F values:
FID IID PEDSEX SNPSEX STATUS F YCOUNT
X0703 X0703 0 0 PROBLEM 0.006869 519
X2257 X2257 0 0 PROBLEM 0.102 518
X0720 X0720 0 1 PROBLEM 0.9332 2186
For F, having 0.006869 is not any more informative than 0.007 (or 0.01,
realistically). In another output file, I've seen as many as 7 digits
after the decimal.
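To see what "%.3f" would do to the values above, here is a quick Python check. Python's %-formatting uses the same printf-style specifier as C, so the output should match a C printf("%.3f", ...) call. The function name is just for illustration.

```python
def format_f(value: float) -> str:
    """Format an F estimate with three digits after the decimal point."""
    return "%.3f" % value


if __name__ == "__main__":
    # F values taken from the tables above
    for f in (0.006869, 0.102, 0.9332, -0.07516):
        print(format_f(f))  # 0.007, 0.102, 0.933, -0.075
```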
Mike
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "plink2-users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.