Hello,
I have recently been working on a binary fileset (--bfile) in PLINK v1.9. While running sex checks on the dataset, I received two warnings: (1) "het. haploid genotypes present..." and (2) "Nonmissing nonmale Y chromosome genotype(s) present...".
My questions are:
- How should I deal with these warnings during GWAS QC?
- A large number of samples are flagged as "PROBLEM". In your opinion, what is the likely cause, and how should I address it?
plink --bfile mydata --chr 23-24 --make-bed --out mydata_unsplit
# Warning: 361926 het. haploid genotypes present (see mydata_unsplit.hh ); many commands treat these as missing.
# Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands treat these as missing.
NOTE: PLINK keeps printing the above warnings for the commands below, up to the last one.
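One way to see which samples drive the het. haploid warning is to count entries per sample in the .hh file, which PLINK 1.9 writes as one line per heterozygous haploid call with FID, IID and variant ID columns. Samples with many such calls may be worth checking for sex-coding errors. This is a minimal sketch assuming that three-column layout; the function name hh_per_sample is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count heterozygous haploid calls per sample in a
# PLINK 1.9 .hh file (assumed columns: FID, IID, variant ID) and print the
# samples with the most calls as "count<TAB>FID<TAB>IID".
# Usage: hh_per_sample <file.hh> [number of samples to show, default 20]
hh_per_sample() {
  awk '{ n[$1 "\t" $2]++ } END { for (k in n) print n[k] "\t" k }' "$1" |
    sort -k1,1nr |          # largest counts first
    head -n "${2:-20}"
}
```

For example, `hh_per_sample mydata_unsplit.hh 20` would list the 20 samples contributing the most het. haploid calls.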
plink --bfile mydata_unsplit --split-x b38 --make-bed --out mydata_split
# Error: No X chromosome loci have bp positions <= 2781479 or >= 155701383.
Checking the head and tail of mydata_unsplit.bim confirmed the above error: the first bp position is 23:2781927 and the last is 24:56734789. So I added the 'no-fail' modifier to '--split-x' to force the split.
plink --bfile mydata_unsplit --split-x b38 no-fail --make-bed --out mydata_split
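Because the last line of the .bim is on chromosome 24, head/tail alone does not show where the X variants end. A minimal awk sketch can print the variant count and the smallest/largest bp position for a single chromosome, to compare against the b38 boundaries 2781479 and 155701383 quoted in the error. It assumes X is coded as 23 in column 1 and bp positions are in column 4, as in a standard .bim; the name bim_chr_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many variants a chromosome has in a PLINK
# .bim file and their smallest/largest bp position.
# Assumes the standard .bim columns: 1 = chromosome code, 4 = bp position.
# Usage: bim_chr_range <file.bim> <chromosome code>
bim_chr_range() {
  awk -v c="$2" '
    $1 == c {
      p = $4 + 0                      # force a numeric comparison
      if (n == 0 || p < min) min = p
      if (n == 0 || p > max) max = p
      n++
    }
    END {
      if (n == 0) { print "no variants on chromosome " c; exit 1 }
      print c, n, min, max            # chrom, count, min bp, max bp
    }' "$1"
}
```

For example, `bim_chr_range mydata_unsplit.bim 23` prints the X variant count followed by the first and last X positions.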
plink --bfile mydata_split --check-sex --out sex_check
# 25046 Xchr and 0 Ychr variant(s) scanned, 75 problems detected. Report written to sex_check.sexcheck.
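To see what kind of problems the flagged samples have, the PROBLEM rows of the .sexcheck report can be split by cause. With the default thresholds, PLINK 1.9 calls female for F below 0.2 and male for F above 0.8, so SNPSEX 0 usually means F fell in between, while a nonzero SNPSEX that differs from PEDSEX is an actual mismatch. This sketch assumes a header line and the usual column order (FID IID PEDSEX SNPSEX STATUS F); summarize_sexcheck is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: break down the PROBLEM rows of a PLINK .sexcheck
# report by likely cause. Assumes a header line followed by the columns
# FID IID PEDSEX SNPSEX STATUS F (sex codes: 1 = male, 2 = female, 0 = unknown).
# Precedence: SNPSEX 0 is counted first, then PEDSEX 0, then mismatches.
summarize_sexcheck() {
  awk '
    FNR == 1 { next }                     # skip the header line
    $5 == "PROBLEM" {
      total++
      if ($4 == 0)       undetermined++   # no SNP-based sex call
      else if ($3 == 0)  missing_ped++    # no sex recorded in the .fam
      else if ($3 != $4) mismatch++       # pedigree sex and SNP sex disagree
    }
    END {
      printf "PROBLEM=%d SNPSEX_0=%d PEDSEX_0=%d mismatch=%d\n",
             total, undetermined, missing_ped, mismatch
    }' "$1"
}
```

For example, `summarize_sexcheck sex_check.sexcheck` would summarize the 75 flagged samples from the run above.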
I also tried LD-pruning the X/Y variants and running the sex check on the pruned set:
plink --bfile mydata_split --indep-pairphase 20000 2000 0.5 --chr 23-24
plink --bfile mydata_split --extract plink.prune.in --make-bed --out mydata_split_pruned_xy
plink --bfile mydata_split_pruned_xy --check-sex
# 11105 Xchr and 0 Ychr variant(s) scanned, 122 problems detected. Report written to plink.sexcheck
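Since the pruned run flags more samples (122 vs. 75), it may help to list which samples become PROBLEM only after pruning. This sketch assumes both reports have a header line and the standard column order; new_problems is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FID and IID of samples flagged PROBLEM in the
# second .sexcheck report but not in the first. Assumes both files have a
# header line and the columns FID IID PEDSEX SNPSEX STATUS F.
# Usage: new_problems <first.sexcheck> <second.sexcheck>
new_problems() {
  awk '
    FNR == 1 { next }                                     # skip each header
    NR == FNR { if ($5 == "PROBLEM") seen[$1 " " $2] = 1; next }
    $5 == "PROBLEM" && !(($1 " " $2) in seen) { print $1, $2 }
  ' "$1" "$2"
}
```

For example, `new_problems sex_check.sexcheck plink.sexcheck` compares the unpruned and pruned runs.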
I tried the following command, but it flagged 4000 more samples as PROBLEM.
plink --bfile mydata_split_pruned_xy --check-sex ycount
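Both earlier runs report 0 Ychr variants scanned, so a quick check is how many variants per chromosome code are actually in the fileset passed to --check-sex. This sketch assumes numeric chromosome codes in column 1 of the .bim (23 = X, 24 = Y, 25 = XY pseudo-autosomal region after --split-x); bim_chr_counts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count variants per chromosome code (column 1) of a
# PLINK .bim file, sorted by numeric chromosome code.
# Usage: bim_chr_counts <file.bim>
bim_chr_counts() {
  awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort -k1,1n
}
```

For example, `bim_chr_counts mydata_split_pruned_xy.bim` shows whether any chromosome 24 variants survived the split and pruning steps.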
Below is an example of the contents of the sex.probs file (R plot of the F distributions attached):
FID IID PEDSEX SNPSEX STATUS F
sample1 sample1 1 0 PROBLEM 0.7427
sample2 sample2 2 0 PROBLEM 0.5244
sample3 sample3 1 0 PROBLEM 0.6958
sample4 sample4 2 0 PROBLEM 0.2991
I would appreciate your thoughts!
Best,
Mia