checking sex

Mike Miller

Mar 30, 2014, 6:37:41 AM
to plink2 users
I argue below for the use of Y markers in determining sex of samples, I
present some of my own data, and I also point out a problem with the way
--missing handles markers on the Y chromosome.


Biologically speaking, the presence of at least one Y chromosome is the
major determinant of sex in humans. The number of Y chromosomes is
negatively correlated with the number of X chromosomes, and maybe that is
why PLINK has always used X-chromosome markers to guess the sex of
samples. Maybe it is because X markers are more often available than are
Y markers.

When we test thousands of samples, we find a few that have X chromosome
abnormalities: Klinefelter, Turner, both partial and mosaic forms of those
aneuploidies, and Triple X. We also see some cases where inbreeding
causes increased F.

I just looked at a sample of about 4,800 subjects, 43% female, where I
have 874 Y-chromosome markers. When I count for every sample the number
of Y markers that were called (were non-missing), I get about 43% of
samples having zero non-missing Y markers and 57% of samples having 90% or
more of the Y markers called. There is nothing in between the 0% call
rate and the 90% call rate. So counting non-missing markers provides a
very convincing test of the presence or absence of a Y chromosome.

What's a good way to count the number of non-missing Y markers? Let's try
the obvious way first:

$ plink2 --bfile data --out data_Ymissing --chr 24 --missing

That seems like a good idea but plink (1.90a) has a special way of
handling the Y chromosome. In this table I combine two columns from that
output with the sex from the pedigree file (PEDSEX):

FREQUENCY  PEDSEX  N_GENO  F_MISS
       85       0       0  nan
      767       1     874  0
       49       1     874  1
     1914       1     874  0<F<1
     2060       2       0  nan

As you can see, whenever sex is missing (PEDSEX = 0), plink reports N_GENO
of 0, just as it does for anyone designated as female. This makes the
output much less useful for our purposes. You can also see that 49
individuals designated as male are missing all Y markers. That is
probably because they are not male. You can bet that some individuals
designated as female have non-missing Y markers.

Here's a different tactic you can use to get the desired counts of
non-missing Y markers, but you need to be using some kind of UNIXy system:

$ plink2 --bfile data --out data_chrY --chr 24 --recodeA
$ perl -pe 's/ NA//g' data_chrY.raw | awk '{print $1,$2,$5,NF-6}' > sex_chrY.txt
$ head sex_chrY.txt
FID IID SEX 874
A A 0 0
B B 0 0
C C 2 0
D D 1 874
E E 1 866
F F 1 871
G G 2 0
H H 0 869
I I 1 872

In that output you get a header line where the header in the last column
is the total number of Y markers. Those are real data with the IDs
changed to protect the innocent, but even in those first 9 records you can
see that of three samples of unknown sex, the first two appear to be
female and the third appears to be male. Otherwise, SEX (a.k.a. "PEDSEX")
shows the expected relation with the number of non-missing Y markers
(women don't have any and men have a lot).
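
For readers without perl and awk at hand, the same count can be sketched in Python. This is a minimal sketch, assuming the .raw layout implied by the awk command above (six leading columns, then one column per marker, with missing genotypes written as NA). The function name `count_nonmissing` and the toy input are hypothetical and not part of PLINK or of the data in this thread.

```python
def count_nonmissing(raw_lines):
    """Return (n_markers, rows); rows are (FID, IID, SEX, called) tuples.

    Assumes the first six columns are FID IID PAT MAT SEX PHENOTYPE and
    that a missing genotype is the literal string "NA".
    """
    lines = iter(raw_lines)
    header = next(lines).split()
    n_markers = len(header) - 6  # marker columns follow the six ID columns
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        called = sum(1 for g in fields[6:] if g != "NA")
        rows.append((fields[0], fields[1], fields[4], called))
    return n_markers, rows


# Toy input, made up for illustration (not data from the thread).
toy = [
    "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G rs3_T",
    "A A 0 0 0 -9 NA NA NA",
    "D D 0 0 1 -9 0 2 NA",
]
n_markers, rows = count_nonmissing(toy)
print(n_markers, rows)
```

An open file handle can be passed in place of the list, since the function only iterates over lines.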

I can then guess the sex based on number of non-missing Y-chromosome
markers and produce a variable that I call Y-SEX. I present it in this
table along with PEDSEX and SNPSEX from the output of --check-sex.

FREQUENCY  PEDSEX  SNPSEX  Y-SEX
        1       0       0      2
       37       0       1      1
       47       0       2      2
        5       1       0      2
     2681       1       1      1
       12       1       1      2
       32       1       2      2
       60       2       0      2
       27       2       1      1
     1973       2       2      2

I think the Y-SEX is a very important variable here. I can't believe that
a sample has male Y-SEX (in my data with 874 Y markers) but was not from a
person who had a Y chromosome. Having a Y chromosome means that the
person was male, or maybe there was complete androgen insensitivity such
that the individual was female but with a Y chromosome.
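
The Y-SEX rule is not spelled out above, so the following is only a sketch of one way to derive it from the counts. It relies on the gap described earlier (samples had either zero called Y markers or at least 90% of them called). The function name `y_sex` and the 0.9 cutoff parameter are assumptions for illustration.

```python
def y_sex(called, total, male_min_frac=0.9):
    """Guess sex from the number of non-missing Y markers.

    Returns 1 (male) when at least male_min_frac of the markers were
    called, 2 (female) when none were called, and 0 (unknown) otherwise.
    The 0.9 default is an assumption based on the gap seen in the data.
    """
    if total <= 0:
        return 0
    if called == 0:
        return 2
    if called / total >= male_min_frac:
        return 1
    return 0


# Counts taken from the sex_chrY.txt excerpt above (874 markers in total).
for iid, called in [("A", 0), ("E", 866), ("H", 869)]:
    print(iid, y_sex(called, 874))
```

Counts that fall between the two clusters are left as unknown (0) rather than forced into either sex.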

Why not add an option to include info about Y markers in the --check-sex
output? The plink documentation mentions a case where a woman in 1000G
has an F > .8. That kind of thing seems to have happened a bunch of times
in my data:

Only one of the 12 samples in the table below has an F value larger than
the smallest of those from the 2681 men who were male by all three
criteria (Y-SEX, PEDSEX and SNPSEX). Since not one of these twelve samples
had even one non-missing Y marker, I do not believe they are male. I
suspect Klinefelter in the 12th case.


FID  IID  Y-SEX  PEDSEX  SNPSEX  STATUS  F
---  ---  -----  ------  ------  ------  ------
Z    Z    2      1       1       OK      0.8328
Y    Y    2      1       1       OK      0.8389
X    X    2      1       1       OK      0.8397
W    W    2      1       1       OK      0.8466
V    V    2      1       1       OK      0.8593
U    U    2      1       1       OK      0.8633
T    T    2      1       1       OK      0.8645
S    S    2      1       1       OK      0.8651
R    R    2      1       1       OK      0.8658
Q    Q    2      1       1       OK      0.8703
P    P    2      1       1       OK      0.8789
O    O    2      1       1       OK      0.9642

So even though the top 11 cases have high F values (.83 < F < .88), all 11
of them are lower than every F value of all 2,681 men. Thus it seems not
to be a coincidence that they have no Y chromosomes! I don't know yet how
to explain the high Fs. Do we know what caused them in the 1000G case?

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

Christopher Chang

Mar 30, 2014, 8:58:55 PM
to plink2...@googlegroups.com
The current --check-sex implementation is really just around for backwards compatibility.  You're right that, when Y chromosome data is present, it's the most informative; I'll go ahead and take the obvious step of making the "PROBLEM" column account for nonmissing Y chromosome female calls (imputed sex will never be female in this case), and add a modifier for dumping the Y chromosome nonmissing call count column you've requested.

I don't know what's responsible for the highest F coefficients in the 1000G data; I think that's a worthwhile question to ask their team.  (I'm planning to disable use of --check-sex/--impute-sex without parameters in a future version, since the old 0.2/0.8 thresholds clearly don't work.)

Christopher Chang

Mar 31, 2014, 9:50:03 AM
to plink2...@googlegroups.com
The 31 March development build has an extended --check-sex; let me know if it's satisfactory.


On Sunday, March 30, 2014 6:37:41 PM UTC+8, Mike Miller wrote:

Mike Miller

Mar 31, 2014, 10:08:10 AM
to Christopher Chang, plink2...@googlegroups.com
Thank you, Dr. Chang!

I use --check-sex and non-missing Y marker counts to find sample mix-ups,
but also to identify sex chromosome abnormalities. The best way to do the
sex testing is to use the intensity data instead of the called genotypes.
I haven't done it myself, but a friend can distinguish normal men, women,
Klinefelter, Turner, mosaic Turner and Triple X, at least. It's a little
better than using the genotype calls, but not so much better that I want
to learn how to do it!

The first GWAS chip I worked with was an Illumina 660W-Quad and it had 8
markers on Y. The Affymetrix TX v.1 I'm working with now (last message)
has 874. I'm also working with the Illumina Omni 2.5 which has 1604. So
it looks like older chips might not have offered enough Y markers to be
very reliable for sex determination, and maybe that's why PLINK developers
chose to focus on X.

When PEDSEX = 2 and F < .35, I don't worry about it. I think every case
I've seen in that range has been normal female, probably non-European
samples in a data set of mostly European Americans.

If I figure out what's causing the F > .8 in the 11 women in my latest
sample, I'll let you know. I have to look at inbreeding, for one. I have
to deal with the ancestral population structure of the sample, too.

Mike

Christopher Chang

Mar 31, 2014, 10:25:12 AM
to plink2...@googlegroups.com, Christopher Chang, mbmi...@umn.edu
I just tried following my own advice and LD-pruning the 1000G data before running --check-sex; that reduced the maximum female F coefficient to ~0.66.

Mike Miller

Apr 1, 2014, 10:20:48 AM
to plink2 users
I didn't see your message until now, but it looks like the 1 April build,
which is here...

https://www.cog-genomics.org/static/bin/plink140401/plink_linux_x86_64_dev.zip

...doesn't have the extended --check-sex output. I tried to grab this...

https://www.cog-genomics.org/static/bin/plink140331/plink_linux_x86_64_dev.zip

...but I got this: "500 Internal Server Error". When I tried to look at
directory listings I got "403 Forbidden"

Mike
> --
> You received this message because you are subscribed to the Google Groups "plink2-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

Christopher Chang

Apr 1, 2014, 11:08:48 AM
to plink2...@googlegroups.com
Hmm, strange.  I went ahead and re-posted the files just now; zip file should be 1129799 bytes, and uncompressed binary should be 2817813 bytes.  Extended --check-sex seems to be working for me on Linux.

Mike Miller

Apr 1, 2014, 12:00:02 PM
to Christopher Chang, plink2...@googlegroups.com
Thanks again, Chris. The file sizes are correct...

$ ls -l plink_linux_x86_64_dev.zip ../bin/plink2_dev
-rwxr-xr-x 1 mbmiller mbmiller 2817813 Apr 1 09:46 ../bin/plink2_dev*
-rw-rw-r-- 1 mbmiller mbmiller 1129799 Apr 1 09:46 plink_linux_x86_64_dev.zip

...but now it won't run:

$ plink2_dev --bfile chop1 --out chop1_check-sex_dev --check-sex
plink2_dev: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by plink2_dev)
plink2_dev: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by plink2_dev)

I hope my libraries aren't out of date...

$ ls -l /lib64/libc.so.6
lrwxrwxrwx 1 root root 11 Sep 30 2010 /lib64/libc.so.6 -> libc-2.5.so*

$ ls -l /lib64/libc-2.5.so
-rwxr-xr-x 1 root root 1717800 Mar 31 2010 /lib64/libc-2.5.so*

$ uname -a ; cat /etc/issue
Linux snps.psych.umn.edu 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.5 (Final)
Kernel \r on an \m

Off Topic: By the way, I have a son named Christopher who will start in
the PhD program in pure math at Berkeley in the fall. He is mostly into
algebra: he spent last summer on commutative monoids in Hawaii, then a
thesis in algebraic topology, and this summer he'll be doing algebraic
statistics in Helsinki. I can't keep up with him anymore.

Mike

Christopher Chang

Apr 1, 2014, 6:02:46 PM
to plink2...@googlegroups.com, Christopher Chang, mbmi...@umn.edu
Okay, this definitely has something to do with yesterday's Amazon Linux upgrade.  Will try to fix the issue now.

Christopher Chang

Apr 1, 2014, 11:59:26 PM
to plink2...@googlegroups.com, Christopher Chang, mbmi...@umn.edu
Just posted a build with statically linked glibc that should work on your system; new binary size should be 4352300.  (Though not before trashing my system so badly that even simple commands like "cp" and "ls" stopped working, and I had to reinstall everything... suffice it to say that trying to manually downgrade your system's libc is a no good, very bad idea.)


On Wednesday, April 2, 2014 12:00:02 AM UTC+8, Mike Miller wrote:

Mike Miller

Apr 2, 2014, 11:17:07 AM
to Christopher Chang, plink2...@googlegroups.com
I'm very sorry to hear of all the hassles you've been dealing with. I
tried the new development version, now 140402, and it didn't do well with
the --check-sex. I ran it on a set of 76 people who have no sex data. It
ran...

Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.996455.
2391739 variants and 76 people pass filters and QC.
--check-sex: 46441 Xchr and 2233 Ychr variant(s) scanned,
76 problems detected. Report written to SDRG_plate8_b_sex.sexcheck.

...but then it didn't seem to use the Y data: it gave the usual output
and left SNPSEX missing (zero) for all of the probable females. Here
are the first 20 subjects in the output, sorted by F:

FID IID PEDSEX SNPSEX STATUS F
X0108 X0108 0 0 PROBLEM -0.07516
X0678 X0678 0 0 PROBLEM 0.05293
X2736 X2736 0 0 PROBLEM 0.07506
X0556 X0556 0 0 PROBLEM 0.08925
X0731 X0731 0 0 PROBLEM 0.09103
X0334 X0334 0 0 PROBLEM 0.09378
X0774 X0774 0 0 PROBLEM 0.09786
X2509 X2509 0 0 PROBLEM 0.1016
X0076 X0076 0 0 PROBLEM 0.1072
X0663 X0663 0 0 PROBLEM 0.1079
X2413 X2413 0 0 PROBLEM 0.1308
X2541 X2541 0 0 PROBLEM 0.1316
X2741 X2741 0 1 PROBLEM 0.938
X0786 X0786 0 1 PROBLEM 0.9382
X0232 X0232 0 1 PROBLEM 0.9398
X0015 X0015 0 1 PROBLEM 0.9413
X0261 X0261 0 1 PROBLEM 0.9414
X0356 X0356 0 1 PROBLEM 0.9415
X0749 X0749 0 1 PROBLEM 0.9417

I ran it again using --check-sex .3 .9, but I got the same result.

Christopher Chang

Apr 2, 2014, 11:25:24 AM
to plink2...@googlegroups.com, Christopher Chang, mbmi...@umn.edu
* If the F coefficient-based call is female, but there are nonmissing Y chromosome genotypes, the call is converted to ambiguous.
* To include the numbers of nonmissing Y genotypes, add the 'ycount' modifier.  e.g.

plink --bfile nosex_data --check-sex .3 .9 ycount

* If you decide that the nonmissing Y genotypes are not a problem for some reason, you can always rerun --check-sex/--impute-sex with just the X chromosome, e.g.:

plink --bfile nosex_data --check-sex .3 .9 --chr 23
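
The first bullet above can be read as the following decision rule. This is a sketch of the behavior as described in this message only, not PLINK's source code. The function name `snpsex_call` is hypothetical and the defaults are just the .3/.9 values from the example commands.

```python
def snpsex_call(f, ycount, female_max_f=0.3, male_min_f=0.9):
    """Return 1 (male), 2 (female) or 0 (ambiguous).

    An F-based female call is downgraded to ambiguous when any Y
    genotype is nonmissing. The male call is left untouched here, as
    the message only describes a change to female calls.
    """
    if f >= male_min_f:
        return 1
    if f <= female_max_f:
        return 2 if ycount == 0 else 0
    return 0  # F falls between the two thresholds


print(snpsex_call(0.05, 0))         # low F, no Y calls
print(snpsex_call(-0.07516, 519))   # low F, but Y genotypes present
print(snpsex_call(0.938, 2216))     # high F
```

Under this reading, a low-F sample with hundreds of nonmissing Y genotypes gets SNPSEX 0, which matches the output reported earlier in the thread.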

Mike Miller

Apr 2, 2014, 1:09:59 PM
to Christopher Chang, plink2...@googlegroups.com
Funny thing -- I had the web page open, I had looked right at that command
line and somehow not seen it:

--check-sex {female max F} {male min F} <ycount>
--impute-sex {female max F} {male min F} <ycount>

It worked! Now things get weird in a way that is entirely Illumina's
fault. These data (the same samples as in my previous message) are from Omni2.5 v1.1:

FID IID PEDSEX SNPSEX STATUS F YCOUNT
X0108 X0108 0 0 PROBLEM -0.07516 519
X0678 X0678 0 0 PROBLEM 0.05293 520
X2736 X2736 0 0 PROBLEM 0.07506 526
X0556 X0556 0 0 PROBLEM 0.08925 520
X0731 X0731 0 0 PROBLEM 0.09103 520
X0334 X0334 0 0 PROBLEM 0.09378 531
X0774 X0774 0 0 PROBLEM 0.09786 522
X2509 X2509 0 0 PROBLEM 0.1016 515
X0076 X0076 0 0 PROBLEM 0.1072 520
X0663 X0663 0 0 PROBLEM 0.1079 520
X2413 X2413 0 0 PROBLEM 0.1308 525
X2541 X2541 0 0 PROBLEM 0.1316 518
X2741 X2741 0 1 PROBLEM 0.938 2216
X0786 X0786 0 1 PROBLEM 0.9382 2216
X0232 X0232 0 1 PROBLEM 0.9398 2211
X0015 X0015 0 1 PROBLEM 0.9413 2212
X0261 X0261 0 1 PROBLEM 0.9414 2189
X0356 X0356 0 1 PROBLEM 0.9415 2204
X0749 X0749 0 1 PROBLEM 0.9417 2207

In case anyone is interested in tracking down the guilty party:

GSGT Version,1.9.4
Content,,HumanOmni25M-8v1-1_A.bpm

Even if I exclude all 614 markers that do not have rs numbers, I have
about 194 non-missing Y markers per woman and about 1600 for men. So I
think it is best to include all of the Illumina-Y markers even though about
530 of them are not really Y markers. When I do that and sort by F,
selecting the cases at the transition from female to male, I see this:

FID IID PEDSEX SNPSEX STATUS F YCOUNT
X2281 X2281 0 0 PROBLEM 0.1327 520
X2445 X2445 0 0 PROBLEM 0.1836 520
X2576 X2576 0 0 PROBLEM 0.2138 522
X0059 X0059 0 0 PROBLEM 0.24 523
X0574 X0574 0 0 PROBLEM 0.2598 518
X2319 X2319 0 1 PROBLEM 0.9305 2205
X0720 X0720 0 1 PROBLEM 0.9332 2186
X2639 X2639 0 1 PROBLEM 0.9334 2212
X0321 X0321 0 1 PROBLEM 0.9355 2205
X0091 X0091 0 1 PROBLEM 0.9362 2210


Here I sort by YCOUNT and look at the transition:

FID IID PEDSEX SNPSEX STATUS F YCOUNT
X2697 X2697 0 0 PROBLEM 0.08498 524
X2413 X2413 0 0 PROBLEM 0.1308 525
X2890 X2890 0 0 PROBLEM 0.1037 525
X2736 X2736 0 0 PROBLEM 0.07506 526
X0334 X0334 0 0 PROBLEM 0.09378 531
X0720 X0720 0 1 PROBLEM 0.9332 2186
X0261 X0261 0 1 PROBLEM 0.9414 2189
X0320 X0320 0 1 PROBLEM 0.9423 2197
X0445 X0445 0 1 PROBLEM 0.9452 2201
X0335 X0335 0 1 PROBLEM 0.937 2202

So in this case the F and YCOUNT lead to the same conclusion, but PLINK
can't really use the YCOUNT because so many markers are not really on Y.
For this reason, I recommend allowing thresholds on YCOUNT just as you do
with F. For example, I might use these options:

--check-sex .3 .9 ycount 600 2000
--impute-sex .3 .9 ycount 600 2000

So female would be F<=.3 && ycount<=600 and male would be F>=.9 &&
ycount>=2000.
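
The proposed rule can be sketched as a small function. This illustrates the proposal only and says nothing about how PLINK implements it. The name `call_sex` is hypothetical and the defaults are the example thresholds above.

```python
def call_sex(f, ycount, female_max_f=0.3, male_min_f=0.9,
             female_max_y=600, male_min_y=2000):
    """Return 2 (female), 1 (male) or 0 (ambiguous).

    Both the X-based F estimate and the Y nonmissing count must agree
    before a sex is assigned; anything else is left ambiguous.
    """
    if f <= female_max_f and ycount <= female_max_y:
        return 2
    if f >= male_min_f and ycount >= male_min_y:
        return 1
    return 0


# Two rows from the YCOUNT-sorted table above.
print(call_sex(0.1308, 525))    # X2413
print(call_sex(0.9332, 2186))   # X0720
```

Requiring both conditions means a sample with male-range F but a female-range Y count (or the reverse) stays ambiguous, which is the case worth a closer look.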

Here's something you might not want to change, but I think it would be a
reasonable plan: If PEDSEX=0, then STATUS=PROBLEM even if both F and
YCOUNT point unambiguously to SNPSEX of 1 or 2. I would say that
--check-sex has not identified any new problem in those cases, so maybe
PROBLEM isn't the right status, but OK isn't quite correct, either. I
think of "PROBLEM" as indicating a conflict between PEDSEX and SNPSEX, but
that should either mean that both are greater than zero and unequal, or
that SNPSEX is zero. If PEDSEX is zero, there isn't really a problem
unless SNPSEX also is zero. If PEDSEX=0 and SNPSEX>0, maybe we should
have a special STATUS like "IMPUTE".
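
As a sketch, the suggested STATUS logic could look like this. It encodes the proposal in this message only and is not what PLINK does. The function name `proposed_status` is hypothetical, and "IMPUTE" is the label floated above.

```python
def proposed_status(pedsex, snpsex):
    """Return the STATUS string under the proposal above.

    PROBLEM: SNPSEX is 0, or PEDSEX and SNPSEX are both known and differ.
    IMPUTE:  PEDSEX is 0 but SNPSEX was called.
    OK:      PEDSEX and SNPSEX are both known and equal.
    """
    if snpsex == 0:
        return "PROBLEM"
    if pedsex == 0:
        return "IMPUTE"
    return "OK" if pedsex == snpsex else "PROBLEM"


for pedsex, snpsex in [(0, 0), (0, 1), (1, 1), (1, 2)]:
    print(pedsex, snpsex, proposed_status(pedsex, snpsex))
```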

Minor point: for F, I don't think we need to retain four significant
digits. Output that looks like this can be avoided if we use something
like "%.3f" to format the F values:

FID IID PEDSEX SNPSEX STATUS F YCOUNT
X0703 X0703 0 0 PROBLEM 0.006869 519
X2257 X2257 0 0 PROBLEM 0.102 518
X0720 X0720 0 1 PROBLEM 0.9332 2186

For F, having 0.006869 is not any more informative than 0.007 (or 0.01,
realistically). In another output file, I've seen as many as 7 digits
after the decimal.
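
For illustration, here is what fixed three-decimal formatting does to the three F values shown above. This uses Python's printf-style formatting and the helper name `format_f` is hypothetical. PLINK itself is written in C, so this only demonstrates the "%.3f" idea.

```python
def format_f(f, digits=3):
    """Format an F estimate with a fixed number of decimal places."""
    return "%.*f" % (digits, f)


for f in (0.006869, 0.102, 0.9332):
    print(format_f(f))
# prints 0.007, 0.102 and 0.933, one per line
```

Every value then has the same width after the decimal point, so the column lines up.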

Mike

Christopher Chang

Apr 3, 2014, 5:27:07 AM
to plink2...@googlegroups.com, Christopher Chang, mbmi...@umn.edu
--check-sex/--impute-sex now accept the four numeric parameters you asked for, as of the 3 April development build.  Don't see much of a need to change the PROBLEM column, though (and as for sig figs and the like, I'm deliberately keeping every PLINK 1.07 setting I can for now).

Mike Miller

Apr 3, 2014, 1:44:15 PM
to Christopher Chang, plink2...@googlegroups.com
Thanks. That's perfect. I love the new options for --check-sex. I just
ran them in two different data sets with different numbers of Y markers
(and different numbers of non-Y markers labeled incorrectly as Y), and it
worked perfectly both times. Thanks very much for this!

I'll do more research on my odd cases where PEDSEX=1 but there are no Y
markers and F>.8. It is very strange that there are so many -- maybe a
characteristic of the clinical sample sources.

Mike

Mike Miller

Apr 3, 2014, 1:52:43 PM
to Christopher Chang, plink2...@googlegroups.com
The .fam file in both input and output looks like this:

S0076 S0076 0 0 2 2
S0774 S0774 0 0 2 2
S0786 S0786 0 0 1 1
S0678 S0678 0 0 2 2
S0334 S0334 0 0 2 2
S0261 S0261 0 0 1 1

But after using --extract snps.txt --make-bed, I get a .nosex file:

S0076 S0076
S0774 S0774
S0786 S0786
S0678 S0678
S0334 S0334
S0261 S0261

I don't see any errors in the sex specification, so I think there should
be no .nosex file.

Mike

Mike Miller

Apr 3, 2014, 1:54:57 PM
to Christopher Chang, plink2...@googlegroups.com
Allow me to retract that. I was using this:

$ plink2 --version
PLINK v1.90a 64-bit (20 Mar 2014)


But then tried this...

$ plink2_dev --version
PLINK v1.90b1p 64-bit (3 Apr 2014)

...and everything was fine. Sorry.

Mike

Mike Miller

Apr 7, 2014, 6:37:28 PM
to plink2 users
Two issues:

(1) can we turn off the F test for sex and just use YCOUNT?
(2) do we not want to count missing Y markers, disregarding sex?


Issue 1:

I've been using this:

--check-sex .3 .9 ycount 200 600

But suppose I want to call sex based only on the YCOUNT, ignoring the
heterozygosity? A man can have two X chromosomes (Klinefelter) and a
woman can have one X chromosome (Turner), and with thousands of subjects
being studied, these are not all that rare. If I try a high female
ceiling and a low male floor, it fails with this error:

$ plink2_dev --bfile data --out data_sex --check-sex 2 -.5 ycount 200 600
PLINK v1.90b1p 64-bit (3 Apr 2014) https://www.cog-genomics.org/plink2
(C) 2005-2014 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to data_sex.log.
Error: --check-sex female F estimate ceiling cannot be larger than male floor.

Now that we have the ycount argument, should we add a way to turn off the
X-based F check?


Issue 2:

On Sun, 30 Mar 2014, Mike Miller wrote:

> What's a good way to count the number of non-missing Y markers? Let's
> try the obvious way first:
>
> $ plink2 --bfile data --out data_Ymissing --chr 24 --missing
>
> That seems like a good idea but plink (1.90a) has a special way of
> handling the Y chromosome. In this table I combine two columns from
> that output with the sex from the pedigree file (PEDSEX):
>
> FREQUENCY PEDSEX N_GENO F_MISS
> 85 0 0 nan
> 767 1 874 0
> 49 1 874 1
> 1914 1 874 0<F<1
> 2060 2 0 nan
>
> As you can see, whenever sex is missing (PEDSEX = 0), plink reports
> N_GENO of 0, just as it does for anyone designated as female.


I don't think we discussed this at all because we got into talking about
adding the ycount option to --check-sex. Now that --check-sex can do
ycount, the problem of how to count missing Y markers can be solved that
way, but I still wonder if you wouldn't want an option to simply count
what is missing, maybe as an "ignore-sex" argument to the --missing
option? I get that it wouldn't work when analyzing multiple chromosomes
simultaneously if female subjects are present, but sometimes we want to
count missing Y genotypes when sex is missing (PEDSEX=0) and we can't do
that.

Christopher Chang

Apr 7, 2014, 10:40:19 PM
to plink2...@googlegroups.com, mbmi...@umn.edu
1. I'll add a 'y-only' modifier for this today.
2. Another workaround, if chromosome codes in the .bim file are numeric, is to temporarily treat the Y chromosome like an autosome via e.g. "--chr-set 30".

Mike Miller

Apr 8, 2014, 2:07:57 AM
to Christopher Chang, plink2...@googlegroups.com
On Mon, 7 Apr 2014, Christopher Chang wrote:

> 1. I'll add a 'y-only' modifier for this today.

Thanks!

> 2. Another workaround, if chromosome codes in the .bim file are numeric,
> is to temporarily treat the Y chromosome like an autosome via e.g.
> "--chr-set 30".

Clever. I was trying to think of a trick other than changing the sexes,
but I couldn't come up with anything. That's a good idea.

Maryiam Shöâeè

Dec 11, 2015, 2:13:49 PM
to plink2-users
This discussion was one of the most useful posts I have found on bioinformatic resources. Thank you both. 

freeseek

Jan 5, 2016, 11:17:17 AM
to plink2-users
On Friday, December 11, 2015 at 2:13:49 PM UTC-5, Maryiam Shöâeè wrote:
> This discussion was one of the most useful posts I have found on bioinformatic resources. Thank you both.

I will add to the discussion that when dealing with blood from old individuals, counting non-missing Y markers might not be a wise way to check sex, as it is quite common for old individuals to lose their Y chromosome in the blood (see http://dx.doi.org/10.1126/science.1262092). I have actually seen this problem extensively with genotype array data. Even if the Y chromosome loss is not 100%, many SNPs on the Y chromosome get called as missing. However, how many SNPs you have in your array on the Y chromosome can make a big difference here.

Christopher Chang

Jan 5, 2016, 5:10:19 PM
to plink2-users
Thanks for the note; I've updated the sex imputation documentation to mention this issue.