--check-sex always gives F of -1 when run on a single sample GATK VCF

202 views
Skip to first unread message

Aaron Statham

unread,
Jan 20, 2016, 11:29:23 PM1/20/16
to plink2-users
Hi,

I've been trying to integrate PLINK v1.90b3.29 64-bit (24 Dec 2015) into our workflow for checking the sex of VCFs from whole genome sequencing. Today I came across what looks like a bug (but could just be me not understand how plink works) where a single sample GATK VCF always gives an F of -1, whereas those same samples if joint-called give much better Fs which align with the sex we expect for these samples.

Here are the results for 3 samples (A, B, C) called separately by GATK GenotypeGVCFs (then VQSR) compared to being jointcalled together into a single JointCall VCF. Samples A and C are female, sample B is male.

for VCF in A B C JointCall; do
    ./plink --vcf $VCF.vcf.gz --out $VCF --double-id --make-bed --allow-extra-chr
    ./plink --bfile $VCF --check-sex ycount --out $VCF --allow-extra-chr
done

head *.sexcheck

==> A.sexcheck <==
           FID            IID       PEDSEX       SNPSEX       STATUS            F   YCOUNT
  SYD-A   SYD-A            0            0      PROBLEM           -1     4084

==> B.sexcheck <==
           FID            IID       PEDSEX       SNPSEX       STATUS            F   YCOUNT
  SYD-B   SYD-B            0            0      PROBLEM           -1     8770

==> C.sexcheck <==
           FID            IID       PEDSEX       SNPSEX       STATUS            F   YCOUNT
  SYD-C   SYD-C            0            0      PROBLEM           -1     4052

==> JointCall.sexcheck <==
           FID            IID       PEDSEX       SNPSEX       STATUS            F   YCOUNT
  SYD-A   SYD-A            0            0      PROBLEM      -0.5933     5041
  SYD-B   SYD-B            0            0      PROBLEM       0.7975    11157
  SYD-C   SYD-C            0            0      PROBLEM      -0.5208     4906


Any help most welcome!
Aaron

Christopher Chang

unread,
Jan 21, 2016, 1:58:35 AM1/21/16
to plink2-users
--check-sex requires decent allele frequency estimates.  If you have a single-sample dataset, it's essential to obtain these allele frequencies from another source and load them with --read-freq.

Aaron Statham

unread,
Jan 21, 2016, 6:41:18 AM1/21/16
to plink2-users
Thanks for the speedy reply!

So I created a frqx file from a 80 sample Joint Called WGS VCF, then reran plink --check-sex with the --freq parameter. At first it complained about duplicate IDs (.), so I removed all rows from the VCF with . as its ID, then reran.

./plink --bfile A --check-sex ycount --out A --allow-extra-chr --read-freq warehouse_1.0.frqx
PLINK v1.90b3.29 64-bit (24 Dec 2015)      https://www.cog-genomics.org/plink2
(C) 2005-2015 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to A.log.
Options in effect:
  --allow-extra-chr
  --bfile A
  --check-sex ycount
  --out A
  --read-freq warehouse_1.0.filtered.frqx

7450 MB RAM detected; reserving 3725 MB for main workspace.
4461279 variants loaded from .bim file.
1 person (0 males, 0 females, 1 ambiguous) loaded from .fam.
Ambiguous sex ID written to A.nosex .
Using 1 thread (no multithreaded calculations invoked.
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.993504.
Error: Allele(s) on line 31 of --read-freq file don't match loaded
values.

Do I need to subset the VCF to only SNP IDs found in the frqx and ensure they are in the same order?

Thanks!
Aaron

Christopher Chang

unread,
Jan 21, 2016, 10:13:54 AM1/21/16
to plink2-users
Allele order should be unimportant, but you do need to remove triallelic variants (where the joint called file has two alleles, but your sample contains a third allele absent from the joint called file) and the like for now.

Aaron Statham

unread,
Jan 21, 2016, 6:24:03 PM1/21/16
to plink2-users
Any tips for how I could do that? I'd be starting with the single sample VCF and frqx - is it something I could do at the .bed stage with plink or would I need to filter at the VCF stage

I've already pruned down to only SNPs that have rs IDs, so perhaps I can just remove any that dbSNP says are triallelic from both the frqx and VCF

thanks!

Christopher Chang

unread,
Jan 21, 2016, 6:25:06 PM1/21/16
to plink2-users
Attempting to --bmerge the two filesets should produce a list of triallelic SNPs.
Reply all
Reply to author
Forward
0 new messages