Hi,
I am using PLINK v2.00a3LM AVX2 Intel and would like to calculate global pairwise Fst between all pairs of populations.
I have a set of .bed, .bim and .fam files containing individuals from different populations. I also have a .pheno file with one line per individual and the population ID in the second column, e.g.:
#IID pop
Ami_Coriell_NA13615 Ami_Coriell
Ami_Coriell_NA13616 Ami_Coriell
Armenian_armenia86 Armenian_Armenian
Armenian_armenia91 Armenian_Armenian
Armenian_armenia102 Armenian_Armenian
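As a sanity check on the file itself, the distinct labels in the second column can be counted directly. This is only a sketch: example.pheno is a hypothetical stand-in built from the five sample lines above, and it assumes a whitespace-delimited file with a single header line.

```shell
# Hypothetical stand-in for the real .pheno file, built from the sample lines above.
pheno=example.pheno
printf '%s\n' \
  '#IID pop' \
  'Ami_Coriell_NA13615 Ami_Coriell' \
  'Ami_Coriell_NA13616 Ami_Coriell' \
  'Armenian_armenia86 Armenian_Armenian' \
  'Armenian_armenia91 Armenian_Armenian' \
  'Armenian_armenia102 Armenian_Armenian' > "$pheno"

# Count distinct population labels in column 2, skipping the header line.
n_pops=$(awk 'NR > 1 && NF >= 2 { seen[$2] = 1 } END { n = 0; for (p in seen) n++; print n }' "$pheno")

# Count the columns in the header line (the ID column plus any phenotype columns).
n_cols=$(awk 'NR == 1 { print NF; exit }' "$pheno")

echo "distinct populations: $n_pops"
echo "header columns: $n_cols"
```

For the five sample lines this reports 2 distinct populations and 2 header columns. Pointing pheno at the real file instead should show whether all ~200 labels are actually present in column 2.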
I've run the command:
plink2 \
--bfile /cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned \
--fst pop \
--pheno /cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned.pheno
which returns:
> --bfile /cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned \
> --fst pop \
> --pheno /cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned.pheno
PLINK v2.00a3LM AVX2 Intel (23 Sep 2020)
www.cog-genomics.org/plink/2.0/ (C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink2.log.
Options in effect:
--bfile /cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned
--fst pop
--pheno /cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned.pheno
Start time: Thu Oct 15 10:59:39 2020
128648 MiB RAM detected; reserving 64324 MiB for main workspace.
Allocated 8586 MiB successfully, after larger attempt(s) failed.
Using up to 20 threads (change this with --threads).
6343 samples (0 females, 0 males, 6343 ambiguous; 6343 founders) loaded from
/cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned.fam.
250846 variants loaded from
/cluster/project8/hellenthal/SamMorris/analysis/data_projects/modern/Africa/genotypedData/allAncients.africanReference.withFan.AllChr.PHASED2.highAccuracySNPs.pruned.bim.
2 categorical phenotypes loaded.
End time: Thu Oct 15 10:59:39 2020
However, I have about 200 populations, far more than the 2 reported in the output. I suspect I am specifying the phenotype (population) column or the .pheno file incorrectly, but I am not sure what the correct approach is. Could you suggest the best way to do this? Thank you.