plink 2 : error : howto : merge plink files to pfile, convert to bed


Vaibhav Sharma

Aug 14, 2021, 12:15:26 PM
to plink2-users
Hi Team,

I want to merge plink .bed filesets that were created from VCF files.

I ran the command below:

~/tools/plink2/plink2 --pmerge-list merge_list_hard_vc.txt 'bfile' \
  --set-missing-var-ids @:#:\$r:\$a \
  --make-bed \
  --out all_chrm_hardvc_2


merge_list_hard_vc.txt contains the prefixes of 23 filesets, one each for chromosomes 1-22 and X.


I get the following status output and error:


--pmerge-list: 23 filesets specified.
--pmerge-list: 113456 samples and 1 phenotype present.
--pmerge-list: Merged .psam written to all_chrm_hardvc_2.psam .
--pmerge-list: 23 .pvar files scanned.
Concatenation job detected.
Concatenating...
../plink_files/QC_c19_all_v1.vcf.gz.bgz.bim
could not be scanned twice. (Process-substitution/named-pipe input is not
permitted in this use case.)


As a result, when I try to convert the output to .bed format:

PLINK v2.00a3LM AVX2 Intel (4 Aug 2021)
Options in effect:
  --make-bed
  --out all_chrm_hardvc_2_bed
  --pfile all_chrm_hardvc_2


I get an error (as expected):

113456 samples (0 females, 0 males, 113456 ambiguous; 113456 founders) loaded
from all_chrm_hardvc_2.psam.
13146944 variants loaded from all_chrm_hardvc_2.pvar.
Error: PgfiInitPhase1() was called with raw_variant_ct == 13146944, but
all_chrm_hardvc_2.pgen contains 15297359 variants.


What can I do to make sure the first step works properly? There are no repeated entries in merge_list_hard_vc.txt, and all the files exist for the 23 .bed filesets named in the list.


Thanks,

vaibhav
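Editor's note: the PgfiInitPhase1() error above reports that the merged .pvar and .pgen disagree on the variant count, which is consistent with the concatenation stopping partway. A minimal sketch for cross-checking counts is below. It assumes one fileset prefix per line in the merge list, uncompressed .bim/.pvar files, and that .pvar header lines begin with '#'. The function names are hypothetical helpers, not plink2 commands.

```shell
# count_bim_variants LIST
# Sum the non-empty line counts of PREFIX.bim for every prefix listed in LIST.
count_bim_variants() {
  local list="$1" total=0 n prefix
  while IFS= read -r prefix || [ -n "$prefix" ]; do
    [ -z "$prefix" ] && continue          # skip blank lines in the list
    n=$(awk 'NF { c++ } END { print c + 0 }' "${prefix}.bim")
    total=$((total + n))
  done < "$list"
  echo "$total"
}

# count_pvar_variants FILE
# Count non-empty lines that are not '#'-prefixed header lines.
count_pvar_variants() {
  awk '!/^#/ && NF { c++ } END { print c + 0 }' "$1"
}
```

For a pure concatenation job, `count_bim_variants merge_list_hard_vc.txt` and `count_pvar_variants all_chrm_hardvc_2.pvar` would be expected to print the same number; a smaller .pvar count suggests the merge did not finish.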



Christopher Chang

Aug 14, 2021, 12:22:07 PM
to plink2-users
Can you post the exact contents of merge_list_hard_vc.txt ?

Vaibhav Sharma

Aug 14, 2021, 12:22:35 PM
to plink2-users
A comment from Christopher in another thread: "The most important difference right now is that --pmerge-list only handles concatenation jobs (e.g. merging a split-by-chromosome dataset) for now.  If there are any variants shared between multiple input files, --pmerge-list doesn't work yet."

So I guess the question is: how do I handle duplicates properly in plink1.9?

Vaibhav Sharma

Aug 14, 2021, 12:29:32 PM
to plink2-users

../plink_files/QC_c1_all_v1.vcf.gz.bgz
../plink_files/QC_c2_all_v1.vcf.gz.bgz
../plink_files/QC_c3_all_v1.vcf.gz.bgz
../plink_files/QC_c4_all_v1.vcf.gz.bgz
../plink_files/QC_c5_all_v1.vcf.gz.bgz
../plink_files/QC_c6_all_v1.vcf.gz.bgz
../plink_files/QC_c7_all_v1.vcf.gz.bgz
../plink_files/QC_c8_all_v1.vcf.gz.bgz
../plink_files/QC_c9_all_v1.vcf.gz.bgz
../plink_files/QC_c10_all_v1.vcf.gz.bgz
../plink_files/QC_c11_all_v1.vcf.gz.bgz
../plink_files/QC_c12_all_v1.vcf.gz.bgz
../plink_files/QC_c13_all_v1.vcf.gz.bgz
../plink_files/QC_c14_all_v1.vcf.gz.bgz
../plink_files/QC_c15_all_v1.vcf.gz.bgz
../plink_files/QC_c16_all_v1.vcf.gz.bgz
../plink_files/QC_c17_all_v1.vcf.gz.bgz
../plink_files/QC_c18_all_v1.vcf.gz.bgz
../plink_files/QC_c19_all_v1.vcf.gz.bgz
../plink_files/QC_c20_all_v1.vcf.gz.bgz
../plink_files/QC_c21_all_v1.vcf.gz.bgz
../plink_files/QC_c22_all_v1.vcf.gz.bgz
../plink_files/QC_cX_all_v1.vcf.gz.bgz
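Editor's note: a list like the one above can be checked before merging. The sketch below assumes one fileset prefix per line and that each prefix should have non-empty .bed, .bim and .fam files. `check_merge_list` is a hypothetical helper written for illustration, not a plink2 feature.

```shell
# check_merge_list LIST
# Report every PREFIX.{bed,bim,fam} that is missing or empty.
# Returns 0 only if all files for all prefixes are present.
check_merge_list() {
  local list="$1" missing=0 prefix ext
  while IFS= read -r prefix || [ -n "$prefix" ]; do
    [ -z "$prefix" ] && continue          # skip blank lines in the list
    for ext in bed bim fam; do
      if [ ! -s "${prefix}.${ext}" ]; then
        echo "missing or empty: ${prefix}.${ext}"
        missing=$((missing + 1))
      fi
    done
  done < "$list"
  echo "missing files: ${missing}"
  [ "$missing" -eq 0 ]
}
```

Running `check_merge_list merge_list_hard_vc.txt` from the same directory as the plink2 command matters here, because the prefixes are relative paths.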

Vaibhav Sharma

Aug 14, 2021, 12:57:06 PM
to plink2-users
Each file contains variants only from its own chromosome, but may contain duplicate IDs (when I used plink1.9 to merge, it gave me a warning about duplicate variants).

Please help, Christopher!
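Editor's note: duplicate variant IDs can be listed directly from a .bim file, since the ID is the second column. This is a minimal sketch, assuming a whitespace-delimited, uncompressed .bim; `list_dup_ids` is a hypothetical helper name.

```shell
# list_dup_ids FILE.bim
# Print each variant ID (column 2) that appears more than once, sorted.
list_dup_ids() {
  awk '{ print $2 }' "$1" | sort | uniq -d
}
```

If the VCF had no IDs, every variant may share the placeholder ID '.', in which case this prints just '.'; that situation is what --set-missing-var-ids is meant to address.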

Christopher Chang

Aug 14, 2021, 1:00:44 PM
to plink2-users
You need to convert those files one-at-a-time to .pgen format before --pmerge-list can be used.  In the meantime, I will fix the misleading error message you got.

Vaibhav Sharma

Aug 14, 2021, 1:17:28 PM
to plink2-users
Thanks. So the 'bfile' flag won't do the trick, and I need to convert the files to .pgen?

Will do, sir!

Christopher Chang

Aug 14, 2021, 1:31:30 PM
to plink2-users
--pmerge-list works fine with plink .bed filesets, but the command to generate .bed filesets is --make-bed, not --bfile (which reads them).  .bed filesets practically always take more disk space than the equivalent .pgen fileset, so if you're using plink2 anyway you may as well convert to .pgen.
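Editor's note: the per-fileset conversion to .pgen suggested in this thread could be scripted roughly as follows. This is a sketch, not a tested pipeline: it assumes one .bed fileset prefix per line in the input list and a `plink2` binary on the PATH. `convert_list_to_pgen`, the `DRY_RUN` variable and the `pmerge_list.txt` output name are all hypothetical. By default it only prints the commands.

```shell
# convert_list_to_pgen LIST OUTDIR
# For each .bed fileset prefix in LIST, print (DRY_RUN=1, the default) or run
# (DRY_RUN=0) a plink2 conversion to .pgen in OUTDIR, and write the converted
# prefixes to OUTDIR/pmerge_list.txt for a later --pmerge-list run.
convert_list_to_pgen() {
  local list="$1" outdir="$2" prefix out
  mkdir -p "$outdir"
  : > "$outdir/pmerge_list.txt"
  while IFS= read -r prefix || [ -n "$prefix" ]; do
    [ -z "$prefix" ] && continue          # skip blank lines in the list
    out="$outdir/$(basename "$prefix")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo plink2 --bfile "$prefix" --make-pgen --out "$out"
    else
      # stdin is redirected so the command cannot consume the list being read
      plink2 --bfile "$prefix" --make-pgen --out "$out" < /dev/null
    fi
    echo "$out" >> "$outdir/pmerge_list.txt"
  done < "$list"
}
```

After a real run, the merge would presumably be `plink2 --pmerge-list OUTDIR/pmerge_list.txt --out all_chrm_hardvc_2`, followed by the separate `--pfile all_chrm_hardvc_2 --make-bed` step already used earlier in the thread if a .bed fileset is still wanted.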

Vaibhav Sharma

Aug 14, 2021, 1:35:26 PM
to plink2-users

~/tools/plink2/plink2 --pmerge-list merge_list_hard_vc.txt 'bfile' \
  --set-missing-var-ids @:#:\$r:\$a \
  --make-bed \
  --out all_chrm_hardvc_2


Here the --make-bed flag didn't work: I was given a .pgen file instead, so I ran a second step to get the .bed file.


I was talking about the 'bfile' flag here:

--pmerge-list merge_list_hard_vc.txt 'bfile'


Regardless, let me try the path you recommend. Thanks a ton for all the help and support; it is much appreciated.

Christopher Chang

Aug 14, 2021, 1:37:53 PM
to plink2-users
Okay, I took a closer look at this and I now have no idea what you are doing.  I am unable to reproduce the error you are seeing.  Your *input* filenames make no sense, so that implies that you did something weird earlier.  Unless you are able to send me a set of files and a command-line that lets me replicate what you are seeing, I probably can't provide any further help.

Vaibhav Sharma

Aug 14, 2021, 1:49:21 PM
to plink2-users
Sorry for confusing you.

Let me try again; please bear with me. If this doesn't make sense, feel free to ignore this query.

I have .bed files converted from .bgz files. That is why the prefixes look like plink_files/QC_cX_all_v1.vcf.gz.bgz; each prefix has matching QC_cX_all_v1.vcf.gz.bgz.bed, QC_cX_all_v1.vcf.gz.bgz.bim and QC_cX_all_v1.vcf.gz.bgz.fam files, all converted via plink1.9.

I want to merge these separate .bed filesets into a single .bed fileset.

I created a list of all the prefixes, in sorted order.

I read the documentation, and it said I need to pass 'bfile':

~/tools/plink2/plink2 --pmerge-list merge_list_hard_vc.txt 'bfile' \
  --set-missing-var-ids @:#:\$r:\$a \
  --make-bed \
  --out all_chrm_hardvc_2


This step didn't create a .bed file, even though I used --make-bed, maybe because it failed during the concatenation stage:

./plink_files/QC_c19_all_v1.vcf.gz.bgz.bim
could not be scanned twice. (Process-substitution/named-pipe input is not
permitted in this use case.)


I will try to convert my .bed files (QC_cX_all_v1.vcf.gz.bgz.bed etc.) to .pgen and try again.

Is there something wrong with these steps as such?

Christopher Chang

Aug 14, 2021, 2:14:18 PM
to plink2-users
I repeat, there is something weird about the contents of the input files to your initial command.  I can provide no further help until you post those files to e.g. Google Drive/Dropbox/etc.

Christopher Chang

Aug 14, 2021, 2:20:29 PM
to plink2-users
It is okay to post only the two smallest filesets, as long as that is enough to reproduce the exact "could not be scanned twice" error you are seeing.

Vaibhav Sharma

Aug 15, 2021, 11:14:23 AM
to plink2-users
Hi Christopher, thank you, but I don't have the rights to share the files. I really wish I could.

I had managed to merge the .bed files with plink1.9, but because I was getting the warning about duplicate variants, I thought I should try plink2 and use $r and $a in the variant IDs; hence this mail thread.

Yesterday I tried converting the VCF file (bgzipped) to pfile format via plink2, and I got far fewer subjects (around 24k). When I convert the same file to .bed via plink1.9, I see all the samples (113k).
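Editor's note: one way to pin down a sample-count discrepancy like this is to count samples independently in the VCF header, the .fam and the .psam. The sketch below assumes a gzip-compatible (bgzipped) VCF with a FORMAT column, an uncompressed .fam and .psam, and '#'-prefixed .psam header lines. The function names are hypothetical helpers.

```shell
# count_vcf_samples FILE.vcf.gz
# Sample columns are those after the 9 fixed columns on the #CHROM header line.
count_vcf_samples() {
  gzip -dc "$1" | awk -F'\t' '/^#CHROM/ { print NF - 9; exit }'
}

# count_fam_samples FILE.fam
# One sample per non-empty line.
count_fam_samples() {
  awk 'NF { c++ } END { print c + 0 }' "$1"
}

# count_psam_samples FILE.psam
# One sample per non-empty line that is not a '#'-prefixed header line.
count_psam_samples() {
  awk '!/^#/ && NF { c++ } END { print c + 0 }' "$1"
}
```

If the VCF header and the .fam agree but the .psam is smaller, the difference arises in the plink2 import step, and the plink2 .log file from that run is the place to look for a reason.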