Problem in sex checking in Plink v1.9


Mia

Jun 16, 2020, 9:43:50 PM
to plink2-users
Hello,

I have recently been working with a bfile in PLINK v1.9. When I ran sex checks on the dataset, I received two warnings: (1) "het. haploid genotypes present" and (2) "Nonmissing nonmale Y chromosome genotype(s) present".

My questions are:
  • How am I supposed to deal with those warnings when doing GWAS QC?
  • Many samples are tagged with "PROBLEM". What is the likely cause, in your opinion, and how should I address it?
plink --bfile mydata --chr 23-24 --make-bed --out mydata_unsplit
Warning: 361926 het. haploid genotypes present (see mydata_unsplit.hh ); many commands treat these as missing.

# Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands treat these as missing.

NOTE: PLINK kept printing the warnings above until the last command below.


 plink --bfile mydata_unsplit --split-x b38 --make-bed --out mydata_split
Error: No X chromosome loci have bp positions <= 2781479 or >= 155701383.

Checking the head and tail of raw_hg38_unsplit.bim confirmed the above error: the first position is chr 23, bp 2781927, and the last position is chr 24, bp 56734789. So I combined '--split-x' with 'no-fail' so that the run would continue instead of stopping with the error.

plink --bfile mydata_unsplit --split-x 'no-fail' b38 --make-bed --out mydata_split
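
The .bim check described above could be sketched roughly as follows. This is only an illustration on a toy file: the file name toy_unsplit.bim, the variant IDs, the alleles, and the second chr 23 position are made up, and the two boundary values are simply copied from the --split-x error message.

```shell
# Toy stand-in for a .bim file (columns: chr, variant ID, cM, bp, allele 1, allele 2).
# IDs, alleles, and the second chr 23 position are invented for illustration.
cat > toy_unsplit.bim <<'EOF'
23 rsA 0 2781927 A G
23 rsB 0 155000000 C T
24 rsC 0 56734789 G A
EOF

# b38 boundaries as quoted in the --split-x error message.
PAR1_END=2781479
PAR2_START=155701383

# Count chr 23 variants that fall at or beyond either boundary.
n_par=$(awk -v p1="$PAR1_END" -v p2="$PAR2_START" \
  '$1 == 23 && ($4 <= p1 || $4 >= p2) { n++ } END { print n + 0 }' toy_unsplit.bim)
echo "chr 23 variants in the pseudo-autosomal range: $n_par"
```

A count of zero here corresponds to the situation in which --split-x has nothing to move and stops with the error unless 'no-fail' is given.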

plink --bfile mydata_split --check-sex --out sex_check

# 25046 Xchr and 0 Ychr variant(s) scanned, 75 problems detected. Report written to sex_check.sexcheck.

I also tried pruning the data and running the sex check on the pruned variants:
plink --bfile mydata_split --indep-pairphase 20000 2000 0.5 --chr 23-24

plink --bfile mydata_split --extract plink.prune.in --make-bed --out mydata_split_pruned_xy

plink --bfile mydata_split_pruned_xy --check-sex

# 11105 Xchr and 0 Ychr variant(s) scanned, 122 problems detected. Report written to plink.sexcheck

I also tried the following command, but it tagged 4000 more samples with PROBLEM.
plink --bfile raw_hg38_split_pruned_xy --check-sex ycount

The following is an example of the contents of the sex.probs file (an R plot of the F distribution is attached):

sample1    sample1    1    0    PROBLEM    0.7427
sample2    sample2    2    0    PROBLEM    0.5244
sample3    sample3    1    0    PROBLEM    0.6958
sample4    sample4    2    0    PROBLEM    0.2991

I would appreciate your thoughts!

Best,
Mia

sex_check_F.png

Christopher Chang

Jun 17, 2020, 9:47:57 AM
to plink2-users
See the last bullet point at https://www.cog-genomics.org/plink/1.9/basic_stats#check_sex .  The default 0.2 and 0.8 thresholds aren't quite right for most datasets; it looks like most of your "PROBLEM" cases will go away if you revise them appropriately.
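
As a rough sketch of what revising the thresholds might look like, the snippet below re-applies candidate cutoffs to a few rows of a --check-sex report before rerunning PLINK. The rows are the four examples posted above with a header line added; the file name toy.sexcheck and the cutoffs 0.55 and 0.65 are assumptions chosen only to make the toy example work, and real cutoffs should be read off the gap in the dataset's own F distribution.

```shell
# Toy report in the .sexcheck column order: FID IID PEDSEX SNPSEX STATUS F.
cat > toy.sexcheck <<'EOF'
FID IID PEDSEX SNPSEX STATUS F
sample1 sample1 1 0 PROBLEM 0.7427
sample2 sample2 2 0 PROBLEM 0.5244
sample3 sample3 1 0 PROBLEM 0.6958
sample4 sample4 2 0 PROBLEM 0.2991
EOF

FEMALE_MAX_F=0.55   # hypothetical cutoff, not a recommendation
MALE_MIN_F=0.65     # hypothetical cutoff, not a recommendation

# Re-call sex from F with the candidate cutoffs and count samples that would
# still be flagged (unknown PEDSEX, or PEDSEX differing from the new call).
still_problem=$(awk -v fmax="$FEMALE_MAX_F" -v mmin="$MALE_MIN_F" '
  NR == 1 { next }                       # skip the header line
  {
    snpsex = 0                           # 0 = ambiguous
    if ($6 < fmax) snpsex = 2            # below female max F: female call
    else if ($6 > mmin) snpsex = 1       # above male min F: male call
    if ($3 == 0 || snpsex != $3) n++
  }
  END { print n + 0 }' toy.sexcheck)
echo "samples still flagged: $still_problem"

# If the preview looks sensible, the same two values can be passed to PLINK:
# plink --bfile mydata_split --check-sex 0.55 0.65 --out sex_check
```

The preview only mimics the F-based call; it ignores Y chromosome counts and missing F estimates, so the PLINK rerun remains the authoritative result.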

Meanwhile, it looks like the actual cause of your "heterozygous haploid", etc. warnings is that your variant-calling workflow doesn't take sex into account.  Once you're done with --check-sex, you can use --set-hh-missing + --make-bed to erase all of the known-bad variant calls.
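
Before erasing the calls, it may help to see which samples account for the entries in the .hh file. The sketch below assumes the .hh layout of one line per call with family ID, sample ID, and variant ID; the file toy.hh, its contents, and the output prefix mydata_split_nohh are made up for illustration.

```shell
# Toy stand-in for a .hh file: FID, IID, variant ID, one line per flagged call.
cat > toy.hh <<'EOF'
fam1 s1 rsA
fam1 s1 rsB
fam2 s2 rsA
EOF

# Count flagged calls per sample, largest count first.
hh_counts=$(awk '{ c[$1 " " $2]++ } END { for (k in c) print c[k], k }' toy.hh \
  | sort -k1,1nr -k2,2)
echo "$hh_counts"

# Once --check-sex is finished, the flagged calls can be set to missing:
# plink --bfile mydata_split --set-hh-missing --make-bed --out mydata_split_nohh
```

If nearly all entries come from samples recorded as male, that would be consistent with a calling workflow that treated the X chromosome as diploid for everyone.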

Mia

Jun 18, 2020, 7:26:12 PM
to plink2-users


Thank you, Chang! I changed the parameters and the results improved a lot. Of the remaining samples, two have inconsistent sex, and a very few are still tagged with PROBLEM even though their sex information is consistent between our manifest and the fam files.

Also, thank you for the '--set-hh-missing' advice.

Best,