Sex check F value changes with/without FAM file

Tatiana Orme

Aug 23, 2017, 1:51:04 PM
to plink2-users

Hi, just wondering if anybody could help me please.

I am running PLINK --check-sex and am getting different F values depending on whether or not I use a fam file.

I am using PLINK 1.9 and have followed the usual steps for a sex check: removed pseudoautosomal SNPs, pruned the data, ran the sex check and looked at the F distribution (R plot attached, for the run without the fam file). From there I chose a threshold of 0.6 and reran the sex check, this time with my fam file, which has sex recorded for some samples. The problem is that when I run the same command with the fam file, the F values for some samples change dramatically, e.g.:

Sample 1: without fam F=0.9642, with fam F=-0.006384
Sample 2: without fam F=0.8793, with fam F=-0.06814
Sample 3: without fam F=0.9665, with fam F=0.6925
Sample 4: without fam F=0.002855, with fam F=0.9376

I have tried different things to see what causes this change (using different thresholds, or adding --allow-no-sex so that all phenotypes would be present in case that affected it), but the thing that makes the difference in F value is the presence or absence of the fam file, which in turn changes how many variants are scanned. I can't work out why this occurs, because I would have thought the F calculation would be independent of the sex and phenotype in the fam file. Sorry if I have missed something obvious! Which F values do you think are more accurate, and which should I proceed with?

Any insight would be much appreciated,
Many thanks
Tatiana

P.S. This group has been very helpful!

without fam:
Options in effect:
  --bfile <my bfile>
  --check-sex 0.6 0.6
  --out <out>

12512 variants loaded from .bim file.
2670 people (0 males, 0 females, 2670 ambiguous) loaded from .fam.

Before main variant filters, 2670 founders and 0 nonfounders present.
Warning: Nonmissing nonmale Y chromosome genotype(s) present
Total genotyping rate is 0.963379.
12512 variants and 2670 people pass filters and QC.
Note: No phenotypes present.
--check-sex: 11787 Xchr and 0 Ychr variant(s) scanned, 2670 problems detected.

with fam:
Options in effect:
  --allow-no-sex
  --bfile <my file>
  --check-sex 0.6 0.6
  --fam <fam_file>
  --out <out>

12512 variants loaded from .bim file.
2670 people (905 males, 602 females, 1163 ambiguous) loaded from .fam.

2670 phenotype values loaded from .fam.

Before main variant filters, 2670 founders and 0 nonfounders present.

Warning: 90532 het. haploid genotypes present
Warning: Nonmissing nonmale Y chromosome genotype(s) present
Total genotyping rate is 0.96338.
12512 variants and 2670 people pass filters and QC.
Among remaining phenotypes, 1181 are cases and 1489 are controls.
--check-sex: 10350 Xchr and 0 Ychr variant(s) scanned, 1887 problems detected.
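
To list every sample whose F value shifts between the two runs, rather than spotting them by eye, the two .sexcheck outputs can be joined on FID/IID. The following is only a minimal sketch: it assumes the standard .sexcheck header (FID, IID, PEDSEX, SNPSEX, STATUS, F), and the function names and the 0.2 cutoff are hypothetical choices for illustration, not anything PLINK provides.

```python
import math
from typing import Dict, Iterable, List, Tuple

SampleId = Tuple[str, str]  # (FID, IID)


def read_f_values(lines: Iterable[str]) -> Dict[SampleId, float]:
    """Map (FID, IID) -> F from the lines of a whitespace-delimited .sexcheck file."""
    rows = [ln.split() for ln in lines if ln.strip()]
    header = rows[0]
    fid, iid, f_col = header.index("FID"), header.index("IID"), header.index("F")
    values: Dict[SampleId, float] = {}
    for row in rows[1:]:
        f_value = float(row[f_col])
        if not math.isnan(f_value):  # skip samples with no usable F estimate
            values[(row[fid], row[iid])] = f_value
    return values


def large_shifts(
    run_a: Dict[SampleId, float],
    run_b: Dict[SampleId, float],
    min_delta: float = 0.2,  # arbitrary illustrative cutoff
) -> List[Tuple[SampleId, float, float]]:
    """Return samples present in both runs whose F differs by at least min_delta,
    largest difference first."""
    shared = run_a.keys() & run_b.keys()
    hits = [(k, run_a[k], run_b[k]) for k in shared if abs(run_a[k] - run_b[k]) >= min_delta]
    return sorted(hits, key=lambda t: -abs(t[1] - t[2]))
```

With the two outputs on disk (file names here are placeholders), usage would look like `large_shifts(read_f_values(open("no_fam.sexcheck")), read_f_values(open("with_fam.sexcheck")))`.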

sexcheck_PRUNED_hiqhqualvcf.png

Christopher Chang

Aug 23, 2017, 2:24:07 PM
to plink2-users
--check-sex needs decent population allele frequency estimates.  Ideally, you have an allele frequency file (generated by e.g. --freq applied to some reference dataset) and load it during your --check-sex runs.

Otherwise, the frequencies are estimated directly from your data. This is why the .fam file matters: it affects the chrX estimates, since males have half the weight and heterozygous male observations are thrown out. Below a minor allele frequency of ~2%, the estimates become untrustworthy if you have only 2670 samples. I'd expect your results to improve if you add "--maf 0.02".
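
To make the dependence on allele frequencies concrete, here is a rough Python sketch of the two pieces involved. It is an illustration only, not PLINK's code: the weighting follows the description above (a male call counts as one allele, heterozygous male calls are dropped, everyone else counts as diploid), and F is the plain method-of-moments estimate (observed hom. count - expected count) / (total observations - expected count). PLINK's own implementation may differ in detail, and both function names are made up for this example.

```python
from typing import Optional, Sequence

# Genotypes are coded as the A1 allele count: 0, 1, 2, or None for missing.
Genotype = Optional[int]


def chrx_allele_freq(genos: Sequence[Genotype], is_male: Sequence[bool]) -> float:
    """Estimate the A1 frequency at one chrX variant (assumed weighting, see above)."""
    a1 = 0.0
    total = 0.0
    for g, male in zip(genos, is_male):
        if g is None:
            continue
        if male:
            if g == 1:
                continue  # het. haploid call: discarded
            a1 += g / 2  # hom. call contributes a single allele
            total += 1
        else:
            a1 += g  # females and unknown-sex samples treated as diploid
            total += 2
    return a1 / total if total else float("nan")


def f_coefficient(sample_genos: Sequence[Genotype], freqs: Sequence[float]) -> float:
    """Method-of-moments F = (O - E) / (N - E) for one sample across variants."""
    observed_hom = 0
    expected_hom = 0.0
    n = 0
    for g, p in zip(sample_genos, freqs):
        if g is None:
            continue
        n += 1
        observed_hom += 1 if g != 1 else 0
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)  # Hardy-Weinberg expectation
    if n == expected_hom:
        return float("nan")  # every variant monomorphic: F undefined
    return (observed_hom - expected_hom) / (n - expected_hom)
```

The same genotypes give very different F values under different frequency estimates: a sample that is homozygous at three of four variants gets F = 0.5 if every frequency is 0.5, but a negative F if every frequency is 0.1, because almost all samples are expected to be homozygous at rare variants. That is why noisy estimates at low MAF can swing F so much.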