plink2 --bcf /home/bm1173/ancestry_analyses/references/reference_normalized.bcf \
--const-fid \
--allow-extra-chr \
--chr 1-22 \
--make-bed \
--pheno /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt \
--out /home/bm1173/ancestry_analyses/references/reference_dataset
I am getting the following error:
PLINK v2.00a5.11LM AVX2 Intel (26 May 2024) www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /home/bm1173/ancestry_analyses/references/reference_dataset.log.
Options in effect:
--allow-extra-chr
--bcf /home/bm1173/ancestry_analyses/references/reference_normalized.bcf
--chr 1-22
--const-fid
--make-bed
--out /home/bm1173/ancestry_analyses/references/reference_dataset
--pheno /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt
Start time: Thu Jul 11 10:53:41 2024
241779 MiB RAM detected, ~237945 available; reserving 120889 MiB for main
workspace.
Using 1 compute thread.
--bcf: 81708153 variants scanned.
--bcf: 81657k variants converted.
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.pgen +
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.pvar.zst
+ /home/bm1173/ancestry_analyses/references/reference_dataset-temporary.psam
written.
2504 samples (0 females, 0 males, 2504 ambiguous; 2504 founders) loaded from
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.psam.
81708153 variants loaded from
/home/bm1173/ancestry_analyses/references/reference_dataset-temporary.pvar.zst.
Error: No entries in /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt correspond to loaded sample IDs.
End time: Thu Jul 11 11:11:51 2024
I created a file called sample_ids.txt from reference_normalized.bcf to check whether the names correspond. Below are the first and last 10 lines of each file:
(base)$ tail -n 10 phenotype_ordered.txt
NA21128 NA21128 GIH
NA21129 NA21129 GIH
NA21130 NA21130 GIH
NA21133 NA21133 GIH
NA21135 NA21135 GIH
NA21137 NA21137 GIH
NA21141 NA21141 GIH
NA21142 NA21142 GIH
NA21143 NA21143 GIH
NA21144 NA21144 GIH
(base)$ tail -n 10 sample_ids.txt
NA21128
NA21129
NA21130
NA21133
NA21135
NA21137
NA21141
NA21142
NA21143
NA21144
(base)$ head -n 10 phenotype_ordered.txt
HG00096 HG00096 GBR
HG00097 HG00097 GBR
HG00099 HG00099 GBR
HG00100 HG00100 GBR
HG00101 HG00101 GBR
HG00102 HG00102 GBR
HG00103 HG00103 GBR
HG00105 HG00105 GBR
HG00106 HG00106 GBR
HG00107 HG00107 GBR
(base)$ head -n 10 sample_ids.txt
HG00096
HG00097
HG00099
HG00100
HG00101
HG00102
HG00103
HG00105
HG00106
HG00107
The entries correspond to each other, so I am not sure why I am getting this error. I have tried several suggestions, including making sure the samples are in the same order in both files and checking the separator between the names. I'd be grateful for your assistance in getting past this error.
Thank you in advance.
Batsi
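The by-eye comparison of head and tail output above can also be done over the whole file. Below is a minimal sketch, assuming sample_ids.txt holds one ID per line (for example as produced by `bcftools query -l`) and that the second column of phenotype_ordered.txt is the within-family ID. The three-sample toy files and the temporary directory are made up for illustration; they are not the real data.

```shell
#!/bin/sh
# Toy stand-ins for the real files (three IDs taken from the post).
workdir=$(mktemp -d)
printf '%s\n' HG00096 HG00097 NA21144 > "$workdir/sample_ids.txt"
printf '%s\n' 'HG00096 HG00096 GBR' 'HG00097 HG00097 GBR' 'NA21144 NA21144 GIH' \
  > "$workdir/phenotype_ordered.txt"

# Extract column 2 of the phenotype file and sort both ID lists for comm.
awk '{ print $2 }' "$workdir/phenotype_ordered.txt" | sort > "$workdir/pheno_iids.sorted"
sort "$workdir/sample_ids.txt" > "$workdir/sample_ids.sorted"

# comm -3 prints only IDs that occur in exactly one of the two lists.
n_unmatched=$(comm -3 "$workdir/pheno_iids.sorted" "$workdir/sample_ids.sorted" | wc -l | tr -d ' ')

# Count phenotype lines containing a carriage return (Windows line endings),
# which are invisible in head/tail output but change the last field.
n_cr=$(awk 'index($0, "\r") { n++ } END { print n + 0 }' "$workdir/phenotype_ordered.txt")

echo "IDs present in only one file: $n_unmatched"
echo "phenotype lines with a carriage return: $n_cr"
rm -rf "$workdir"
```

If both counts are zero on the real files, the within-family IDs and line endings are probably not the cause, and the family ID column is the next thing to look at.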
plink2 --bcf /home/bm1173/ancestry_analyses/references/reference_normalized.bcf \
--double-id \
--allow-extra-chr \
--chr 1-22 \
--make-bed \
--pheno /home/bm1173/ancestry_analyses/references/phenotype_ordered.txt \
--out /home/bm1173/ancestry_analyses/references/reference_dataset
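One plausible reason the command above behaves differently from the first one: the phenotype file carries the sample ID in both of its first two columns, while --const-fid gives every loaded sample the same family ID. The sketch below does not run plink2. It only imitates the ID assignment with awk, under two assumptions that should be checked against the PLINK 2 documentation: that a headerless --pheno file is read as FID, IID, phenotype, and that the constant family ID is "0". The toy files and the `count_matches` helper are hypothetical.

```shell
#!/bin/sh
# Toy illustration only: mimics sample-ID assignment without calling plink2.
workdir=$(mktemp -d)

# Sample IDs as they would appear in the BCF header (three IDs from the post).
printf '%s\n' HG00096 HG00097 NA21144 > "$workdir/sample_ids.txt"

# Headerless phenotype file, assumed to be read as FID, IID, population.
printf '%s\n' 'HG00096 HG00096 GBR' 'HG00097 HG00097 GBR' 'NA21144 NA21144 GIH' \
  > "$workdir/phenotype_ordered.txt"

# count_matches MODE: build the (FID, IID) pair of every loaded sample, then
# count phenotype rows whose first two columns equal one of those pairs.
count_matches() {
  awk -v mode="$1" '
    NR == FNR {                        # first file: loaded sample IDs
      fid = (mode == "const") ? "0" : $1
      loaded[fid " " $1] = 1
      next
    }
    ($1 " " $2) in loaded { n++ }      # second file: phenotype rows
    END { print n + 0 }
  ' "$workdir/sample_ids.txt" "$workdir/phenotype_ordered.txt"
}

n_const=$(count_matches const)     # FID fixed to "0" (assumed --const-fid behaviour)
n_double=$(count_matches double)   # FID equal to IID (--double-id)

echo "phenotype rows matched with a constant FID: $n_const"
echo "phenotype rows matched with doubled IDs:    $n_double"
rm -rf "$workdir"
```

Under these assumptions no phenotype row matches when the family ID is constant, which would be consistent with the "No entries ... correspond to loaded sample IDs" error, while every row matches once the family ID equals the sample ID.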
I am now having trouble merging my sample and reference files. The merged .psam file is written, but the .pvar file is not generated. I am running this command:
plink2 --pmerge-list /home/bm1173/ancestry_analyses/references/merge_list.txt --make-bed --out /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset
I am getting this error:
PLINK v2.00a5.11LM AVX2 Intel (26 May 2024) www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset.log.
Options in effect:
--make-bed
--out /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset
--pmerge-list /home/bm1173/ancestry_analyses/references/merge_list.txt
Start time: Thu Jul 11 15:25:40 2024
241779 MiB RAM detected, ~237613 available; reserving 120889 MiB for main
workspace.
Using 1 compute thread.
--pmerge-list: 2 filesets specified.
--pmerge-list: 2505 samples and 1 phenotype present.
--pmerge-list: Merged .psam written to
/home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/merged_dataset.psam
.
--pmerge-list: 2 .pvar files scanned.
Error: Non-concatenating --pmerge[-list] is under development.
End time: Thu Jul 11 15:25:45 2024
The merge_list.txt looks like this:
/home/bm1173/ancestry_analyses/references/reference_dataset.bed /home/bm1173/ancestry_analyses/references/reference_dataset.bim /home/bm1173/ancestry_analyses/references/reference_dataset.fam
/home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/sample1.plink_makebed.bed /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/sample1.plink_makebed.bim /home/bm1173/ancestry_analyses/trial2/results/variant_calling/sample1/sample1.plink_makebed.fam
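The log reports 2505 samples across the two filesets, and the error says the merge is not a concatenation. It may help to describe how the two filesets overlap before choosing a merge strategy. The following is a rough sketch using toy .fam and .bim files; the file names, sample IDs, variant IDs, and positions are invented for illustration. It relies only on the standard column layout: FID and IID in columns 1 and 2 of .fam, chromosome and base-pair position in columns 1 and 4 of .bim. The exact criteria plink2 uses to decide what counts as a concatenation are not stated in the log, so this is a diagnostic only.

```shell
#!/bin/sh
# Toy filesets standing in for the reference and sample1 data.
workdir=$(mktemp -d)
printf '%s\n' 'HG00096 HG00096 0 0 0 -9' 'HG00097 HG00097 0 0 0 -9' > "$workdir/ref.fam"
printf '%s\n' 'sample1 sample1 0 0 0 -9' > "$workdir/sample1.fam"
printf '%s\n' '1 varA 0 1000 A G' '1 varB 0 2000 C T' '2 varC 0 1500 G A' > "$workdir/ref.bim"
printf '%s\n' '1 varB 0 2000 C T' '3 varD 0 500 T C' > "$workdir/sample1.bim"

# Samples present in both filesets, keyed on FID + IID.
shared_samples=$(awk '
  NR == FNR { seen[$1 " " $2] = 1; next }
  ($1 " " $2) in seen { n++ }
  END { print n + 0 }' "$workdir/ref.fam" "$workdir/sample1.fam")

# Variants present in both filesets, keyed on chromosome + position.
shared_variants=$(awk '
  NR == FNR { seen[$1 ":" $4] = 1; next }
  ($1 ":" $4) in seen { n++ }
  END { print n + 0 }' "$workdir/ref.bim" "$workdir/sample1.bim")

echo "samples in both filesets:  $shared_samples"
echo "variants in both filesets: $shared_variants"
rm -rf "$workdir"
```

In the toy data the sample sets are disjoint and one variant position is shared, which is the general shape of adding a new sample to a reference panel.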
Batsirai Mabvakure, PhD
Assistant Professor, Oncology
Affiliate Member, Cancer Prevention and Control Program,
Georgetown Lombardi Comprehensive Cancer Center
Georgetown University School of Medicine,
2115 Wisconsin Ave NW
Washington DC
Email: bm1...@georgetown.edu