Trouble specifying clusters using the --within flag (for FST)


Lynne E

May 29, 2023, 2:17:48 AM
to plink2-users
I am trying to calculate Weir and Cockerham’s FST across the five 1000 Genomes supercontinent populations using plink 1.9, with the following command:
 
plink --bfile chr22_1KGP --fst --within ALLpops.txt --out chr22_FST

(The chr22_1KGP files were previously converted from a 1000 Genomes Project VCF file that had been filtered down to only contain biallelic SNPs and a subset of unrelated individuals).

However, I am having trouble with the --within flag. There should be five continental groups and 2460 individuals, but I keep getting the following output:

2460 people (0 males, 0 females, 2460 ambiguous) loaded from .fam.
--within: 4 clusters loaded, covering a total of 801 people.
Before main variant filters, 2460 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
131702 variants and 2460 people pass filters and QC.
Note: No phenotypes present.
Writing --fst report (4 populations) to chr22_FST.fst ... done.
129931 markers with valid Fst estimates (1771 excluded).

If it helps, the file “ALLpops.txt” used by the --within flag was created in R and contains 3 columns: FID, IID, and cluster (continental groups), as shown below:

[Attached image: Screen Shot 2023-05-29 at 4.10.44 pm.png]


I have checked that all 2460 individuals are there and separated into five groups. I also tried setting all FID values to 0 instead of matching the IID but got the same results. Additionally, I did not get any further using plink 2.0.

What would you advise me to do?


Thank you in advance,
Lynne

Chris Chang

May 29, 2023, 11:15:31 PM
to Lynne E, plink2-users
The IDs in ALLpops.txt must be partially mismatched with those in chr22_1KGP.fam (the log shows only 801 of the 2460 people assigned to a cluster). Both the FID and the IID must match; it isn't enough for the IIDs to be equal.
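
One way to find the rows that fail to match is to compare the (FID, IID) pairs of the two files directly. The Python sketch below is not part of plink, and its function names (`read_ids`, `compare_ids`, `compare_files`) are hypothetical. It assumes both files are whitespace-delimited with FID and IID in the first two columns, and it does no cleanup, so a header row or quoted IDs in ALLpops.txt would simply show up as unmatched.

```python
def read_ids(lines):
    """Collect (FID, IID) pairs from whitespace-delimited lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add((fields[0], fields[1]))
    return ids


def compare_ids(fam_lines, within_lines):
    """Compare the ID pairs of a .fam file and a --within file.

    Returns three sets of (FID, IID) pairs:
      matched   - pairs present in both files
      unmatched - pairs in the --within file that are absent from the .fam
      fid_only  - unmatched pairs whose IID does occur in the .fam,
                  i.e. only the FID differs
    """
    fam_ids = read_ids(fam_lines)
    within_ids = read_ids(within_lines)
    matched = fam_ids & within_ids
    unmatched = within_ids - fam_ids
    fam_iids = {iid for _, iid in fam_ids}
    fid_only = {pair for pair in unmatched if pair[1] in fam_iids}
    return matched, unmatched, fid_only


def compare_files(fam_path, within_path):
    """Convenience wrapper that reads both files from disk."""
    with open(fam_path) as fam, open(within_path) as within:
        return compare_ids(fam, within)


# Example call, using the file names from this thread:
# matched, unmatched, fid_only = compare_files("chr22_1KGP.fam", "ALLpops.txt")
# print(len(matched), "matched;", len(fid_only), "differ only in FID")
```

If `fid_only` is non-empty, the FID column of ALLpops.txt is the likely culprit, and copying the FIDs from the .fam file should resolve it.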

(Note that, with plink 2.0, this analysis can be performed directly on the files at 
https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg ; the .psam file contains a SuperPop phenotype column with the labels you want.)
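
Before running the analysis on those files, it can be worth confirming that the SuperPop column holds the five expected labels. The following is a minimal Python sketch, assuming the .psam file is whitespace-delimited and that its first line is the header (starting with `#IID` or `#FID`). The function name `count_labels` is hypothetical and not part of plink.

```python
from collections import Counter


def count_labels(psam_lines, column="SuperPop"):
    """Count samples per label in one column of a .psam-style table.

    Assumes the first line is the header and that fields are
    separated by whitespace with no empty values.
    """
    lines = iter(psam_lines)
    header = next(lines).lstrip("#").split()
    idx = header.index(column)  # raises ValueError if the column is absent
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > idx:  # skip blank or short lines
            counts[fields[idx]] += 1
    return counts


# Example call (the path is a placeholder for the downloaded .psam file):
# with open("path/to/file.psam") as f:
#     print(count_labels(f))
```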


Lynne E

Jun 4, 2023, 6:51:22 PM
to plink2-users
It worked, thank you so much!

Just a couple quick questions:
1. What is an FST of NA indicating?
2. I know it's possible using the Hudson method in plink 2.0, but is there a way to calculate WC FST for chromosome X in plink 1.9?


Thank you in advance for your time,
Lynne

Christopher Chang

Jun 7, 2023, 3:21:45 PM
to plink2-users
1. NA means there was insufficient data to perform the calculation.
2. plink 1.9 --fst also skips chrX.  If you're going to e.g. manipulate chromosome codes with "--output-chr 26" followed by e.g. "--chr-set 26" to pretend chrX is an autosome, just do it with plink 2.0 "method=wc".

Lynne E

Jun 8, 2023, 7:21:55 PM
to plink2-users
Thank you for your help!