Error: --pmerge[-list] is under development.

Peiyuan Zhu

Apr 5, 2021, 2:46:10 AM
to plink2-users

Hi Christopher, I have tried using a text file to list the fileset names, which I believe is what the documentation means. However, I still get the error saying this function is under development, which I did not expect, since I'm already using the latest version.

[zhupy@blg4110 zhupy]$ cat ukb_c.txt

ukb_c1

ukb_c2

ukb_c3

ukb_c4

ukb_c5

ukb_c6

ukb_c7

ukb_c8

ukb_c9

ukb_c10

ukb_c11

ukb_c12

ukb_c13

ukb_c14

ukb_c15

ukb_c16

ukb_c17

ukb_c18

ukb_c19

ukb_c20

ukb_c21

ukb_c22

ukb_cX

ukb_cXY


[zhupy@blg4110 zhupy]$ ./plink2 --pmerge-list ukb_c.txt --pmerge-list-dir . --out ukb_imp_genom --multiallelics-already-joined

PLINK v2.00a3LM 64-bit Intel (28 Mar 2021)     www.cog-genomics.org/plink/2.0/

(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to ukb_imp_genom.log.

Options in effect:

  --multiallelics-already-joined

  --out ukb_imp_genom

  --pmerge-list ukb_c.txt

  --pmerge-list-dir .


Start time: Mon Apr  5 02:39:07 2021

192029 MiB RAM detected; reserving 96014 MiB for main workspace.

Using up to 40 threads (change this with --threads).

--pmerge-list: 24 filesets specified.

--pmerge-list: 487409 samples present.

--pmerge-list: Merged .psam written to ukb_imp_genom.psam .

--pmerge-list: 24 .pvar files scanned.

Error: --pmerge[-list] is under development.

End time: Mon Apr  5 02:39:40 2021

Christopher Chang

Apr 5, 2021, 1:40:24 PM
to plink2-users
0. Why on earth is "--multiallelics-already-joined" in this command line?
1. The very first line of the --pmerge[-list] online documentation is "(Only handles concatenation-like jobs for now.)" in red.  Take a closer look at the last two filesets in your list.

Peiyuan Zhu

Apr 5, 2021, 1:53:34 PM
to plink2-users
Does "concatenation-like job" mean that the same sample IDs should be present in all files? I added the --multiallelics-already-joined flag because of the error message shown below.

I tried removing the last two filesets (the X and XY chromosomes), leaving 24-2=22 filesets overall, but it still doesn't work.

[zhupy@blg4110 zhupy]$ ./plink2 --pmerge-list ukb_c.txt --pmerge-list-dir . --out ukb_c

PLINK v2.00a3LM 64-bit Intel (28 Mar 2021)     www.cog-genomics.org/plink/2.0/

(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to ukb_c.log.

Options in effect:

  --out ukb_c

  --pmerge-list ukb_c.txt

  --pmerge-list-dir .


Start time: Mon Apr  5 13:46:01 2021

192029 MiB RAM detected; reserving 96014 MiB for main workspace.

Using up to 40 threads (change this with --threads).

--pmerge-list: 22 filesets specified.

--pmerge-list: 487409 samples present.

--pmerge-list: Merged .psam written to ukb_c.psam .

Error: The biallelic variants with ID 'rs151120166' at position 1:715142 in

./ukb_c1.pvar appear to be the components of a 'split' multiallelic variant; if

so, it must be 'joined' (with e.g. "bcftools norm -m") before a correct merge

can occur. If you are SURE that your data does not contain any same-position

same-ID variant groups that should be joined, you can suppress this error with

--multiallelics-already-joined.

End time: Mon Apr  5 13:46:29 2021

[zhupy@blg4110 zhupy]$ ./plink2 --pmerge-list ukb_c.txt --pmerge-list-dir . --out ukb_c --multiallelics-already-joined

PLINK v2.00a3LM 64-bit Intel (28 Mar 2021)     www.cog-genomics.org/plink/2.0/

(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to ukb_c.log.

Options in effect:

  --multiallelics-already-joined

  --out ukb_c

  --pmerge-list ukb_c.txt

  --pmerge-list-dir .


Start time: Mon Apr  5 13:46:42 2021

192029 MiB RAM detected; reserving 96014 MiB for main workspace.

Using up to 40 threads (change this with --threads).

--pmerge-list: 22 filesets specified.

--pmerge-list: 487409 samples present.

--pmerge-list: Merged .psam written to ukb_c.psam .

--pmerge-list: 22 .pvar files scanned.

Concatenation job detected.

Concatenating... 0/92775306 variants complete.

Error: --pmerge[-list] multiallelic-variant dosage support is under development.


End time: Mon Apr  5 13:47:17 2021

Peiyuan Zhu

Apr 5, 2021, 2:13:37 PM
to plink2-users

The IDs are duplicated for multiallelic variants.


> ukb_c1 %>% filter(ID=="rs151120166")

# A tibble: 2 x 5

  `#CHROM`    POS ID          REF   ALT  

     <dbl>  <dbl> <chr>       <chr> <chr>

1        1 715142 rs151120166 G     A    

2        1 715142 rs151120166 G     T    
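
The same check can be run over a whole .pvar without loading it into R. The following is a minimal Python sketch, assuming a tab-delimited .pvar with a `#CHROM` header line; the function name `split_variant_groups` and the third example row (`rs_hypothetical`) are made up for illustration, while the first two rows are the ones shown above.

```python
from collections import defaultdict


def split_variant_groups(pvar_lines):
    """Return {(chrom, pos, id): [alt, ...]} for same-position, same-ID groups with more than one row."""
    groups = defaultdict(list)
    header = None
    for line in pvar_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None:
            # Assumption: the file has a header; headerless .pvar files are not handled here.
            raise ValueError("expected a #CHROM header line before variant rows")
        row = dict(zip(header, fields))
        groups[(row["CHROM"], int(row["POS"]), row["ID"])].append(row["ALT"])
    return {key: alts for key, alts in groups.items() if len(alts) > 1}


example = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t715142\trs151120166\tG\tA\n",
    "1\t715142\trs151120166\tG\tT\n",
    "1\t715265\trs_hypothetical\tC\tT\n",  # hypothetical row, not from the real data
]
print(split_variant_groups(example))
# {('1', 715142, 'rs151120166'): ['A', 'T']}
```

Any non-empty result means the file contains same-position, same-ID groups of the kind the error message warns about.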

Christopher Chang

Apr 5, 2021, 2:17:26 PM
to plink2-users
1. Sample IDs actually aren't required to match.  Instead, the current requirement is that the filesets don't cover overlapping position ranges.  If ukb_cX covers the middle of chrX, and ukb_cXY covers the beginning and end of chrX, that doesn't work for now.  If that is the case, you might be able to address it by splitting ukb_cXY into e.g. "ukb_PAR1" and "ukb_PAR2" parts, and changing the bottom of your list to

  ukb_PAR1
  ukb_cX
  ukb_PAR2

2. However, the real issue is that your data *does* contain split variants.  You are LYING to plink2 with --multiallelics-already-joined, and are LUCKY that your run with that flag happened to fail for another reason; otherwise, you would have destroyed every affected variant.
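
The non-overlap requirement in point 1 can be checked before running the merge. Below is a rough Python sketch, not plink2's actual logic: it takes (chromosome, position) pairs per fileset and reports pairs of filesets whose position ranges intersect on the same chromosome. The function names, the `chrom_alias` mapping (treating an "XY" label as part of "X"), and all positions in the example are assumptions made for illustration.

```python
def chrom_ranges(rows, chrom_alias=None):
    """Map each chromosome label to its (min_pos, max_pos) within one fileset."""
    chrom_alias = chrom_alias or {}
    ranges = {}
    for chrom, pos in rows:
        chrom = chrom_alias.get(chrom, chrom)  # e.g. fold "XY" into "X"
        lo, hi = ranges.get(chrom, (pos, pos))
        ranges[chrom] = (min(lo, pos), max(hi, pos))
    return ranges


def overlapping_filesets(filesets, chrom_alias=None):
    """Return (name_a, name_b, chrom) triples whose position ranges intersect."""
    per_file = {name: chrom_ranges(rows, chrom_alias) for name, rows in filesets.items()}
    names = sorted(per_file)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for chrom in sorted(set(per_file[a]) & set(per_file[b])):
                (a_lo, a_hi), (b_lo, b_hi) = per_file[a][chrom], per_file[b][chrom]
                if a_lo <= b_hi and b_lo <= a_hi:  # closed intervals intersect
                    hits.append((a, b, chrom))
    return hits


alias = {"XY": "X"}  # assumption: pseudo-autosomal variants are labelled "XY"

# Made-up positions: ukb_cXY has variants at both ends of chrX, ukb_cX in the middle.
before = {
    "ukb_cX": [("X", 2_700_000), ("X", 154_900_000)],
    "ukb_cXY": [("XY", 100_000), ("XY", 155_000_000)],
}
# After splitting ukb_cXY into two parts, as suggested above.
after = {
    "ukb_PAR1": [("XY", 100_000)],
    "ukb_cX": [("X", 2_700_000), ("X", 154_900_000)],
    "ukb_PAR2": [("XY", 155_000_000)],
}
print(overlapping_filesets(before, alias))
print(overlapping_filesets(after, alias))
```

With these made-up positions, the first call reports the ukb_cX/ukb_cXY pair and the second reports nothing, which is the situation the PAR1/X/PAR2 ordering is meant to produce.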

Eventually, the right thing to do will be (i) join your variants with e.g. "bcftools norm -m +" (there will be an equivalent plink2 function fairly soon), and then (ii) use either "bcftools concat" or --pmerge-list on the results.  However, this requires "multiallelic dosage" functionality that has not been built out yet.

For now, the simplest solution is to use --set-all-var-ids to assign unique IDs to each part of a split variant.
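
To see why this workaround separates the two halves of a split variant, here is a small Python imitation of the template expansion, assuming the documented placeholders ('@' for chromosome code, '#' for base-pair position, '$r' and '$a' for the reference and alternate alleles). The helper `apply_id_template` is hypothetical and ignores plink2's handling of long alleles and missing values.

```python
def apply_id_template(template, chrom, pos, ref, alt):
    """Expand a '@:#:$r:$a'-style template into a variant ID (illustration only)."""
    return (
        template.replace("$r", ref)
        .replace("$a", alt)
        .replace("@", chrom)
        .replace("#", str(pos))
    )


# The two rows that share the ID rs151120166 in ukb_c1.pvar.
rows = [("1", 715142, "G", "A"), ("1", 715142, "G", "T")]
new_ids = [apply_id_template("@:#:$r:$a", *row) for row in rows]
print(new_ids)
# ['1:715142:G:A', '1:715142:G:T']
```

Because the alternate allele is part of the new ID, the two rows no longer share an ID, so they are no longer a same-position, same-ID group.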

Peiyuan Zhu

Apr 5, 2021, 4:27:51 PM
to plink2-users
1. What is the best way to split chromosome XY?

2. I think this should be done outside of the --pmerge-list command, so I will do it when converting bgen to pgen. I always have trouble knowing which arguments can be combined into one run. For example, in this case it wouldn't help to run "./plink2 --set-all-var-ids @:#[b19]\$r,\$a --pmerge-list ukb_c.txt --pmerge-list-dir . --out ukb_c --multiallelics-already-joined". But sometimes arguments can be combined: I can use --max-maf together with --glm, or use --max-maf to prune first and then run --glm on the pruned file. I'm trying to understand how to tell these cases apart.
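
On question 1, one possible way to pick the cut points follows from the earlier description of ukb_cXY as covering the beginning and end of chrX: anything in ukb_cXY below the lowest ukb_cX position would go to the first part, and anything above the highest ukb_cX position to the second. The Python sketch below only computes that partition from position lists; the function name and the positions are invented for illustration, and actually subsetting the filesets is left to plink2.

```python
def split_around(x_positions, xy_positions):
    """Partition XY positions into those before, after, and inside the X position range."""
    x_lo, x_hi = min(x_positions), max(x_positions)
    par1 = [p for p in xy_positions if p < x_lo]
    par2 = [p for p in xy_positions if p > x_hi]
    inside = [p for p in xy_positions if x_lo <= p <= x_hi]
    return par1, par2, inside


# Made-up positions for illustration.
x_positions = [2_700_000, 154_900_000]
xy_positions = [100_000, 2_600_000, 155_000_000]

par1, par2, inside = split_around(x_positions, xy_positions)
print(par1, par2, inside)
```

If `inside` is not empty, a two-way split of this kind would still leave overlapping ranges, so the data would need a closer look first.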

Christopher Chang

Apr 5, 2021, 5:20:58 PM
to plink2-users
Why is --multiallelics-already-joined still in that command line?!

I'll wait until next week to respond to the other questions.  You need to spend more time understanding the answers I've already given.

Peiyuan Zhu

Apr 5, 2021, 7:07:23 PM
to plink2-users
I was assuming --set-all-var-ids @:#[b19]\$r,\$a would be applied before --multiallelics-already-joined because I put that argument first. This is part of my question: I'm not sure when arguments can be combined into one run.

Christopher Chang

Apr 6, 2021, 12:12:17 PM
to plink2-users
Please reread what I have written about --multiallelics-already-joined.