plink's handling of sex chromosomes


Kaustubh Adhikari

Nov 21, 2014, 4:10:16 AM
to plink2...@googlegroups.com
Sorry if this has already been documented somewhere, but I couldn't find a clear description of how plink handles sex chromosomes, so I have a few questions:

1. How does plink expect the genotype data to look for X and Y chromosomes, for male/female/unspecified individuals?

My data comes from an Illumina SNP chip, exported for plink by their GenomeStudio software, which separates X, Y and PAR. The plink export is very poorly documented, so I don't know how they do it to begin with. But what does plink expect? E.g. the X chromosome genotype for a male should be G G, not G 0, right? And this probably relates to the 'heterozygous haploid' warning plink gives?

2. When calculating MAF or missingness, how does plink treat X chromosome genotypes? Does it take sex information into account? Does this depend on the genotype coding plink expects?

3. When calculating MAF or missingness, how does plink treat Y chromosome genotypes? Does it take sex information into account? Does this depend on the genotype coding plink expects?

After creating new ped files with --chr 23 or --chr 24 and --geno 0.05 --mind 0.05, I still sometimes see a few SNPs/individuals with a >5% missingness rate when --missing is applied. We expect some sex misspecification in our data, but I'm not sure whether that should lead to this. It confused me, hence these questions. I have a few simple implementation requests about QC and analysis procedures on sex chromosomes, which I can propose once this is clear.

Christopher Chang

Nov 21, 2014, 4:36:04 AM
to plink2...@googlegroups.com
1. Male should be G G for both the X and Y chromosomes.  Female should be 0 0 on the Y chromosome.  Unspecified gender is treated as female.

2/3. MAF and missingness computations do take gender into account.
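The coding rules above can be sketched as a small checker. This is only an illustration, not plink's code: the function names are hypothetical, sex uses the .fam convention (1 = male, 2 = female, 0 = unspecified), chromosomes are given as the numeric codes "23" (X) and "24" (Y), and the sketch assumes that only male calls enter a Y-chromosome missing rate.

```python
import math

# Human chromosome codes as used by plink: 23 = X, 24 = Y.
CHR_X, CHR_Y = "23", "24"
# .fam/.ped sex codes: 1 = male, 2 = female, 0 = unspecified.
MALE = 1


def check_call(chrom, sex, a1, a2):
    """Return a short description of a problem with one call, or None.

    Hypothetical helper; unspecified sex is handled like female,
    as described in the reply above.
    """
    if chrom not in (CHR_X, CHR_Y):
        return None
    if a1 == "0" and a2 == "0":
        return None  # fully missing call is always acceptable
    if a1 == "0" or a2 == "0":
        return "half-missing call"  # e.g. "G 0" instead of "G G"
    if sex == MALE:
        # Males carry one copy of X and Y, so the call must be homozygous.
        return "heterozygous haploid" if a1 != a2 else None
    if chrom == CHR_Y:
        # Female or unspecified: the Y call is expected to be 0 0.
        return "nonmissing Y call in non-male"
    return None


def y_missing_rate(sex, calls):
    """Per-individual missing rate over Y SNPs; calls is a list of (a1, a2).

    Assumption: non-males have no countable Y calls, so the rate is nan.
    """
    if sex != MALE or not calls:
        return math.nan
    n_missing = sum(1 for a1, a2 in calls if a1 == "0" and a2 == "0")
    return n_missing / len(calls)
```

For example, check_call("23", 1, "G", "A") flags a male heterozygous X call, and y_missing_rate(0, ...) returns nan for an individual with unspecified sex.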

Kaustubh Adhikari

Nov 21, 2014, 5:23:49 AM
to plink2...@googlegroups.com
Thank you!

Wouldn't it be better to treat unspecified gender as male for the Y chromosome? Currently --missing gives nan for individuals with unspecified gender, but having a rate can give an idea about sex. (Yes, I know that --check-sex has a ycount option now, but it didn't when I was trying this analysis.)

Do you have any guess as to why I was having the problem with missingness, that a few snps/individuals gave >5% missingness rate in sex chromosomes even after creating a file with --geno 0.05 --mind 0.05 filters? Unfortunately I did it a while back and can't locate the files now, but I definitely remember seeing this.

Christopher Chang

Nov 21, 2014, 11:40:38 AM
to plink2...@googlegroups.com
I may add special unspecified-gender handling in the future (e.g. allow both heterozygous X and nonmissing Y calls simultaneously).  For now, though, backward compatibility with 1.07 is the priority.

An individual could have >5% missing rate after your filter if, say, it started at a 4.99% missing rate, and then some of its nonmissing SNPs were removed by --geno.  This is due to the order of operations: --mind is resolved before --geno.  It's a mostly harmless phenomenon, but if you want to stamp it out, you should run additional --mind + --geno round(s) until no new individuals/SNPs are removed.
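To make the order-of-operations effect concrete, here is a toy simulation in Python. It does not call plink; the function names and the 0/1 missingness matrix are made up for illustration, and each pass simply applies a per-individual filter (--mind style) followed by a per-SNP filter (--geno style), repeating until nothing more is removed.

```python
def sample_rate(missing, sample, snps):
    """Fraction of the remaining SNPs that are missing for one individual."""
    return sum(missing[sample][j] for j in snps) / len(snps)


def snp_rate(missing, snp, samples):
    """Fraction of the remaining individuals that are missing at one SNP."""
    return sum(missing[i][snp] for i in samples) / len(samples)


def filter_pass(missing, samples, snps, mind, geno):
    """One pass: individual filter first, then SNP filter."""
    if not samples or not snps:
        return samples, snps
    samples = [i for i in samples if sample_rate(missing, i, snps) <= mind]
    if not samples:
        return samples, snps
    snps = [j for j in snps if snp_rate(missing, j, samples) <= geno]
    return samples, snps


def filter_until_stable(missing, mind, geno):
    """Repeat passes until no new individuals or SNPs are removed."""
    samples = list(range(len(missing)))
    snps = list(range(len(missing[0]))) if missing else []
    while True:
        new_samples, new_snps = filter_pass(missing, samples, snps, mind, geno)
        if (new_samples, new_snps) == (samples, snps):
            return samples, snps
        samples, snps = new_samples, new_snps


# Toy data: rows are individuals, columns are SNPs, 1 = missing call.
toy = [
    [1, 0, 0, 0, 0],  # individual 0: passes at first, fails after SNPs 1, 2 go
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
one_pass = filter_pass(toy, list(range(5)), list(range(5)), 0.3, 0.3)
stable = filter_until_stable(toy, 0.3, 0.3)
```

With thresholds of 0.3, the first pass keeps every individual (each is missing 1 of 5 calls) but drops SNPs 1 and 2. Individual 0 is then missing 1 of the 3 remaining calls, which is above the threshold, and is only removed by the second pass.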

Kaustubh Adhikari

Nov 23, 2014, 3:36:33 AM
to plink2...@googlegroups.com
Thanks. Below I summarize some suggestions for QC and GWAS on the X chromosome, taken from the papers listed here. All of them can be obtained by running multiple plink commands and summarizing the results with a script, but it would be useful to have them performed in one go.

Ref:
How to Include Chromosome X in Your Genome-Wide Association Study
Inke R. Konig, Christina Loley, Jeanette Erdmann, and Andreas Ziegler

XWAS: a toolset for genetic data analysis and association studies of the X chromosome
Diana Chang, Feng Gao and Alon Keinan

QC suggestion:
1. Include only those SNPs which have MAF > a% in both males and females. (1% usually).

2. Look at the difference in SNP missingness between males and females. One way is to exclude SNPs with |Miss_Male - Miss_Female| > a% (usually 2%). Another is to test for differential missingness in males vs. females using a chi-square test and exclude SNPs below a p-value threshold (usually 10^-7).
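The two QC rules above could be combined in a post-processing script along these lines. This is a rough sketch with hypothetical function names; it assumes per-sex allele or genotype counts and missing rates have already been extracted (e.g. from separate male-only and female-only plink runs), and that on X each male contributes one allele and each female two.

```python
def x_maf_male(n_a, n_b):
    """MAF on X in males: one allele per male, so counts are allele counts."""
    return min(n_a, n_b) / (n_a + n_b)


def x_maf_female(n_aa, n_ab, n_bb):
    """MAF on X in females from diploid genotype counts."""
    count_a = 2 * n_aa + n_ab
    count_b = 2 * n_bb + n_ab
    return min(count_a, count_b) / (count_a + count_b)


def passes_sex_specific_qc(maf_male, maf_female, miss_male, miss_female,
                           min_maf=0.01, max_miss_diff=0.02):
    """Apply QC suggestions 1 and 2 (difference form) to one SNP."""
    # Rule 1: MAF must exceed the threshold in both sexes.
    if maf_male <= min_maf or maf_female <= min_maf:
        return False
    # Rule 2: male and female missing rates must not differ too much.
    if abs(miss_male - miss_female) > max_miss_diff:
        return False
    return True
```

The default thresholds are the "usual" 1% and 2% values quoted in the list; both are parameters so they can be changed per study.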

GWAS suggestion:
Perform a stratified GWAS, i.e. in males and females separately, then combine the results using Fisher's method (or any other method that combines test statistics, possibly using weights proportional to sample sizes). Combining coefficient estimates is not recommended, because genetic effects might be opposite in the two sexes.
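For reference, Fisher's method for combining independent p-values can be written in a few lines of Python. The function name is hypothetical. The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the upper tail has a closed form, so no statistics library is needed.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square upper tail for 2k degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total
```

For the male/female case k = 2, so the result reduces to p_m * p_f * (1 - ln(p_m * p_f)). Note that this unweighted form ignores the sample-size weighting mentioned above.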

Maybe a new file can be produced where results from many different approaches are presented together, so that one doesn't have to run them separately? As suggested in the papers, models to consider are: only-males, only-females, combined data using --xchr-model 1, combined data using --xchr-model 2, and the proposed stratified model.

Kaustubh Adhikari

Nov 27, 2014, 3:19:52 AM
to plink2...@googlegroups.com
Any comments on these, Christopher? Do they seem reasonable?

Christopher Chang

Nov 27, 2014, 3:13:02 PM
to plink2...@googlegroups.com
I haven't tried XWAS yet, but it looks reasonable.

For QC #2, you may as well use Fisher's exact test.
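As an illustration of what that test looks like for differential missingness, here is a minimal two-sided Fisher's exact test on a 2x2 table in Python. This is a sketch, not plink's implementation: the function name is hypothetical, and the table layout (rows = males/females, columns = missing/nonmissing calls at one SNP) is an assumption for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout for differential missingness at one SNP:
      a = males missing,   b = males nonmissing,
      c = females missing, d = females nonmissing.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    lo = max(0, col1 - row2)
    hi = min(row1, col1)

    def weight(x):
        # Unnormalized hypergeometric probability, kept as an exact integer.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1))
               if w <= observed)
    return tail / comb(row1 + row2, col1)
```

Keeping the weights as integers avoids floating-point ties when deciding which tables count as "at least as extreme". A SNP would then be excluded if the returned p-value falls below the chosen threshold.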

Shicheng Guo

Mar 28, 2018, 5:02:10 PM
to plink2-users
Another interesting thing is that plink removes SNPs on chrX and chrY when doing PCA. Why not just provide an option to include the sex chromosomes or not?

Christopher Chang

Mar 28, 2018, 5:19:24 PM
to plink2-users
That would require a consensus on how chrX and chrY should be handled in genomic PCA, which doesn't really exist.

If you want to force the usual formulas to be applied to chrX and/or chrY, it's pretty easy to do this by manipulating chromosome codes/sets.

junling REN

Jan 2, 2025, 5:20:46 PM
to plink2-users

Thanks for the summary. Can I use 'plink --bfile --test-missing --pheno sex.txt --pheno-name Sex --out missing' to isolate differential missingness in males vs. females?