No entries in phenotype_ordered.txt correspond to loaded sample IDs


Batsirai Mabvakure

Jul 11, 2024, 12:01:01 PM
to plink2-users
Hi everyone,

I am having trouble running plink2 --make-bed. I downloaded VCF files (chromosomes 1-22) from the 1000 Genomes Project, concatenated them, and created a BCF. I also created a phenotype file containing the sample ID and the population group (see the head and tail output for phenotype_ordered.txt below). I am running --make-bed as follows:

plink2 --bcf /home/bm1173/ancestry_analyses/references/reference_normalized.bcf \
  --const-fid \
  --allow-extra-chr \
  --chr 1-22 \
  --make-bed \
  --pheno /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt \
  --out /home/bm1173/ancestry_analyses/references/reference_dataset


I am getting the following error:


PLINK v2.00a5.11LM AVX2 Intel (26 May 2024)    www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/bm1173/ancestry_analyses/references/reference_dataset.log.
Options in effect:
  --allow-extra-chr
  --bcf /home/bm1173/ancestry_analyses/references/reference_normalized.bcf
  --chr 1-22
  --const-fid
  --make-bed
  --out /home/bm1173/ancestry_analyses/references/reference_dataset
  --pheno /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt

Start time: Thu Jul 11 10:53:41 2024
241779 MiB RAM detected, ~237945 available; reserving 120889 MiB for main
workspace.
Using 1 compute thread.
--bcf: 81708153 variants scanned.
--bcf: 81657k variants converted.
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.pgen +
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.pvar.zst
+ /home/bm1173/ancestry_analyses/references/reference_dataset-temporary.psam
written.
2504 samples (0 females, 0 males, 2504 ambiguous; 2504 founders) loaded from
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.psam.
81708153 variants loaded from
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.pvar.zst.
Error: No entries in /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt correspond to loaded sample IDs.
End time: Thu Jul 11 11:11:51 2024


I created a file called sample_ids.txt from reference_normalized.bcf to check whether the names correspond. Below are the first and last 10 lines of each file:


(base)$ tail -n 10 phenotype_ordered.txt
NA21128 NA21128 GIH
NA21129 NA21129 GIH
NA21130 NA21130 GIH
NA21133 NA21133 GIH
NA21135 NA21135 GIH
NA21137 NA21137 GIH
NA21141 NA21141 GIH
NA21142 NA21142 GIH
NA21143 NA21143 GIH
NA21144 NA21144 GIH

(base)$ tail -n 10 sample_ids.txt
NA21128
NA21129
NA21130
NA21133
NA21135
NA21137
NA21141
NA21142
NA21143
NA21144

(base)$ head -n 10 phenotype_ordered.txt
HG00096 HG00096 GBR
HG00097 HG00097 GBR
HG00099 HG00099 GBR
HG00100 HG00100 GBR
HG00101 HG00101 GBR
HG00102 HG00102 GBR
HG00103 HG00103 GBR
HG00105 HG00105 GBR
HG00106 HG00106 GBR
HG00107 HG00107 GBR

(base)$ head -n 10 sample_ids.txt
HG00096
HG00097
HG00099
HG00100
HG00101
HG00102
HG00103
HG00105
HG00106
HG00107


The entries correspond to each other, so I am not sure why I am getting this error. I have tried several suggestions, including making sure the order of the samples is consistent in both files and checking the separator between the names. I'd be grateful for your assistance in getting past this error.


Thank you in advance.


Batsi


Batsirai Mabvakure

Jul 11, 2024, 4:03:29 PM
to plink2-users
I managed to sort out this error by using "--double-id" in place of "--const-fid", as follows:

plink2 --bcf /home/bm1173/ancestry_analyses/references/reference_normalized.bcf \
  --double-id \
  --allow-extra-chr \
  --chr 1-22 \
  --make-bed \
  --pheno /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt \
  --out /home/bm1173/ancestry_analyses/references/reference_dataset
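
A likely reason the flag matters, sketched with toy data: a headerless phenotype file with three columns appears to be read as FID, IID, phenotype, while --const-fid (with no argument) is documented to set every FID to 0, so the FID/IID pairs never line up. With --double-id, FID and IID are both the sample ID, which matches the file. The temporary file names and the three rows below are made up for illustration, and the ID handling is an assumption about plink2's behavior rather than something confirmed in this thread.

```shell
# Toy illustration only: file names and the three sample rows are hypothetical.
tmp=$(mktemp -d)

# Headerless phenotype file, assumed to be read as FID IID PHENO.
printf 'HG00096 HG00096 GBR\nHG00097 HG00097 GBR\nNA21144 NA21144 GIH\n' > "$tmp/pheno.txt"

# Sample IDs as they appear in the BCF header.
printf 'HG00096\nHG00097\nNA21144\n' > "$tmp/sample_ids.txt"

# FID/IID pairs under each flag (assumption: --const-fid sets FID to "0").
awk '{print "0", $1}' "$tmp/sample_ids.txt" | sort > "$tmp/ids_const_fid.txt"
awk '{print $1, $1}' "$tmp/sample_ids.txt" | sort > "$tmp/ids_double_id.txt"

# FID/IID pairs present in the phenotype file.
awk '{print $1, $2}' "$tmp/pheno.txt" | sort > "$tmp/ids_pheno.txt"

# Count the pairs shared between the phenotype file and each ID scheme.
matches_const_fid=$(comm -12 "$tmp/ids_pheno.txt" "$tmp/ids_const_fid.txt" | wc -l | tr -d ' ')
matches_double_id=$(comm -12 "$tmp/ids_pheno.txt" "$tmp/ids_double_id.txt" | wc -l | tr -d ' ')

echo "matches with --const-fid: $matches_const_fid"
echo "matches with --double-id: $matches_double_id"
```

The same comparison can be run against the real phenotype_ordered.txt and sample_ids.txt by pointing the awk commands at those files instead of the toy ones.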


I am now having trouble merging my sample and reference files. The merged .psam file is generated, but the .pvar file is not. I am running this command:


plink2 --pmerge-list /home/bm1173/ancestry_analyses/references/merge_list.txt --make-bed --out /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset


I am getting this error:


PLINK v2.00a5.11LM AVX2 Intel (26 May 2024)    www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset.log.
Options in effect:
  --make-bed
  --out /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset
  --pmerge-list /home/bm1173/ancestry_analyses/references/merge_list.txt

Start time: Thu Jul 11 15:25:40 2024
241779 MiB RAM detected, ~237613 available; reserving 120889 MiB for main
workspace.
Using 1 compute thread.
--pmerge-list: 2 filesets specified.
--pmerge-list: 2505 samples and 1 phenotype present.
--pmerge-list: Merged .psam written to
/home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset.psam
.
--pmerge-list: 2 .pvar files scanned.
Error: Non-concatenating --pmerge[-list] is under development.
End time: Thu Jul 11 15:25:45 2024


The merge_list.txt looks like this:


/home/bm1173/ancestry_analyses/references/reference_dataset.bed /home/bm1173/ancestry_analyses/references/reference_dataset.bim /home/bm1173/ancestry_analyses/references/reference_dataset.fam 

/home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/sample1.plink_makebed.bed  /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/sample1.plink_makebed.bim  /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/sample1.plink_makebed.fam





--

Batsirai Mabvakure, PhD

Assistant Professor, Oncology

Affiliate Member, Cancer Prevention and Control Program,

Georgetown Lombardi Comprehensive Cancer Center

Georgetown University School of Medicine,

2115 Wisconsin Ave NW

Washington DC

Email: bm1...@georgetown.edu

Christopher Chang

Jul 12, 2024, 1:09:38 AM
to plink2-users
Implementation of --pmerge-list is not complete.  Continue using plink 1.x instead of plink 2.0 for this operation.
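
A rough sketch of what that could look like with plink 1.9, assuming its binary is installed as "plink": the first fileset is passed via --bfile and the remaining ones via --merge-list, which accepts "bed bim fam" triples, one fileset per line. The shortened paths and the output prefix below are placeholders, not the real locations from the thread, and the merge itself may still need extra handling (for example of mismatched variants) that is not covered here.

```shell
# Sketch only: this merge_list.txt mirrors the two-line list above with shortened, made-up paths.
tmp=$(mktemp -d)
printf 'ref/reference_dataset.bed ref/reference_dataset.bim ref/reference_dataset.fam\n' > "$tmp/merge_list.txt"
printf 'sample1/sample1.plink_makebed.bed  sample1/sample1.plink_makebed.bim  sample1/sample1.plink_makebed.fam\n' >> "$tmp/merge_list.txt"

# The first fileset becomes the --bfile prefix (strip the .bed extension).
base_prefix=$(awk 'NR == 1 { sub(/\.bed$/, "", $1); print $1 }' "$tmp/merge_list.txt")

# The remaining three-column lines go into the list passed to --merge-list.
awk 'NR > 1 && NF == 3' "$tmp/merge_list.txt" > "$tmp/merge_list_1x.txt"

merge_cmd="plink --bfile $base_prefix --merge-list $tmp/merge_list_1x.txt --make-bed --out merged_dataset"
echo "$merge_cmd"

# Run only if a binary named "plink" is on PATH and the input fileset exists.
if command -v plink >/dev/null 2>&1 && [ -f "$base_prefix.bed" ]; then
  $merge_cmd
fi
```

With only two filesets, `plink --bfile <reference prefix> --bmerge <sample prefix> --make-bed --out <output prefix>` is an equivalent and shorter form.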