--score file entries were skipped due to mismatching allele codes


Hasuni

Nov 13, 2024, 8:29:17 PM
to plink2-users
Hello,

I am having a really hard time troubleshooting the error "--score file entries were skipped due to mismatching allele codes" when running PCA projection.

Context: I am running PCA on the unrelated 1000 Genomes (1000G) individuals and then projecting the related individuals onto the unrelated PC space. Both steps use the same dataset. Here are the commands:

plink2 --bfile merged_forpca_allchr \
--remove related_to_remove.txt \
--freq counts \
--memory 60000 \
--out tgp_ref_pcs \
--pca allele-wts \
--threads 10

plink2 --bfile merged_forpca_allchr \
--keep related_to_remove.txt \
--read-freq tgp_ref_pcs.acount \
--score tgp_ref_pcs.eigenvec.allele 2 5 header-read no-mean-imputation variance-standardize \
--score-col-nums 6-15 \
--out related_projection


Log file for step 2:

PLINK v2.0.0-a.5.15LM 64-bit Intel (7 Oct 2024)    cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to related_projection.log.
Options in effect:
  --bfile merged_forpca_allchr
  --keep related_to_remove.txt
  --out related_projection
  --read-freq tgp_ref_pcs.acount
  --score tgp_ref_pcs.eigenvec.allele 2 5 header-read no-mean-imputation variance-standardize
  --score-col-nums 6-15

Start time: Wed Nov 13 16:48:22 2024
385621 MiB RAM detected, ~323453 available; reserving 192810 MiB for main
workspace.
Using up to 72 threads (change this with --threads).
4091 samples (0 females, 0 males, 4091 ambiguous; 4091 founders) loaded from merged_forpca_allchr.fam.
282427 variants loaded from merged_forpca_allchr.bim.
Note: No phenotype data present.
--keep: 1300 samples remaining.
1300 samples (0 females, 0 males, 1300 ambiguous; 1300 founders) remaining
after main filters.
--read-freq: PLINK 2 --freq file detected.
--read-freq: Frequencies for 282427 variants loaded.
Warning: 564854 --score file entries were skipped due to mismatching allele
codes.
(Add the 'list-variants' modifier to see which variants were actually used for
scoring.)
Error: No valid variants in --score file.
End time: Wed Nov 13 16:48:22 2024
Wed Nov 13 16:48:22 PST 2024
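One detail in this log is worth checking: the number of skipped entries is exactly twice the number of loaded variants. Assuming the .eigenvec.allele file holds one row per allele, i.e. two rows per biallelic variant (an assumption about this dataset, not something the log states), every row of the --score file was rejected. That points at a column-selection problem rather than a handful of genuinely mismatched variants. The arithmetic as a small bash sketch:

```bash
#!/usr/bin/env bash
# Figures copied from the step-2 log above.
variants_loaded=282427   # "--read-freq: Frequencies for 282427 variants loaded."
entries_skipped=564854   # "Warning: 564854 --score file entries were skipped ..."

# Assumption: two .eigenvec.allele rows (one per allele) per biallelic variant.
rows_expected=$(( variants_loaded * 2 ))
echo "expected rows: ${rows_expected}, skipped: ${entries_skipped}"
if (( rows_expected == entries_skipped )); then
  echo "every --score row was skipped"
fi
```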


I am very confused about how I can have mismatching alleles when it's the same dataset in both steps, and the only filtering I apply is on samples (--remove in step 1, --keep in step 2).

Appreciate any help here.

Thank you,

Hasuni

Hasuni

Nov 13, 2024, 8:33:35 PM
to plink2-users
I just ran these commands with a slightly older version of plink2, and the projection ran fine. So there must be an issue with the plink2 build I downloaded in early October.

Chris Chang

Nov 13, 2024, 8:44:48 PM
to Hasuni, plink2-users
See the current PCA projection documentation (https://www.cog-genomics.org/plink/2.0/score#pca_project). Note the vcols= part of the --pca command.
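Applied to the two commands from the original post, the change presumably looks like the sketch below. vcols= selects which variant columns --pca allele-wts writes to the .eigenvec.allele file. The value vcols=chrom,ref,alt follows the linked documentation as I understand it (an assumption; confirm against the current docs). It is meant to leave out the PROVISIONAL_REF? column, so the header should read #CHROM, ID, REF, ALT, A1, PC1..PC10, and the existing "2 5" and "6-15" column numbers stay valid. The run wrapper, the project_related function and the DRY_RUN variable are hypothetical additions so the commands can be printed without executing plink2.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

project_related() {
  # Step 1: PCA on the unrelated samples. vcols=chrom,ref,alt (assumed, per the
  # linked docs) omits the PROVISIONAL_REF? column from .eigenvec.allele.
  run plink2 --bfile merged_forpca_allchr \
    --remove related_to_remove.txt \
    --freq counts \
    --memory 60000 \
    --out tgp_ref_pcs \
    --pca allele-wts vcols=chrom,ref,alt \
    --threads 10

  # Step 2: project the related samples. Assumes the header is now
  # #CHROM ID REF ALT A1 PC1..PC10, so ID = col 2, A1 = col 5, PCs = cols 6-15.
  run plink2 --bfile merged_forpca_allchr \
    --keep related_to_remove.txt \
    --read-freq tgp_ref_pcs.acount \
    --score tgp_ref_pcs.eigenvec.allele 2 5 header-read no-mean-imputation variance-standardize \
    --score-col-nums 6-15 \
    --out related_projection
}

# Usage: DRY_RUN=1 project_related   (print the commands only)
#        project_related             (actually run plink2)
```

Checking the first line of tgp_ref_pcs.eigenvec.allele (e.g. with head -n 1) before step 2 is a cheap way to confirm the column numbers.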

vcols= is necessary with your data because you're still using the PLINK 1 file format (--bfile), which does not keep reliable track of REF/ALT alleles. "--pca allele-wts" and many other commands now include a PROVISIONAL_REF? column in their output by default, flagging this ambiguity when it is relevant.
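Since --score and --score-col-nums take column positions, an extra PROVISIONAL_REF? column shifts A1 and the PC weights one position to the right. That would plausibly explain why "2 5" read a non-allele column and every entry was skipped. One way to avoid hardcoding positions is to derive them from the header. A minimal bash sketch, assuming a tab-delimited header with columns literally named ID, A1 and PC1, PC2, ... (the function score_cols and the shortened three-PC example headers are hypothetical; the exact default column order is an assumption, so check your own file):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<ID col> <A1 col> <first PC>-<last PC>" for a
# tab-delimited .eigenvec.allele header line.
score_cols() {
  printf '%s\n' "$1" | tr '\t' '\n' | awk '
    $0 == "ID" { id = NR }
    $0 == "A1" { a1 = NR }
    /^PC[0-9]+$/ { if (!first) first = NR; last = NR }
    END { print id, a1, first "-" last }'
}

# Usage on a real file:  score_cols "$(head -n 1 tgp_ref_pcs.eigenvec.allele)"
with_provref=$'#CHROM\tID\tREF\tALT\tPROVISIONAL_REF?\tA1\tPC1\tPC2\tPC3'
without_provref=$'#CHROM\tID\tREF\tALT\tA1\tPC1\tPC2\tPC3'
score_cols "$with_provref"     # prints: 2 6 7-9
score_cols "$without_provref"  # prints: 2 5 6-8
```

The three numbers map directly onto the "--score <file> ID A1" arguments and the --score-col-nums range.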
