Using --rm-dup {mode} in PLINK 2


Monica I

Sep 11, 2018, 2:25:55 PM
to plink2-users
I have several .bgen files that I need to convert to .pgen, .psam, and .pvar files, but these .bgen files include several duplicate SNP variants (each with a different reference allele and the same alternative allele). I want to exclude all versions of the duplicates and save a list of the excluded duplicates in a separate file. I tried the following:

./plink2 --bgen [filename] --sample [filename] --maf [maf] --make-pgen --rm-dup exclude-all --out [filename]

There are two problems with this. 1) Even if all the duplicates are removed, how do I know which ones were removed? I need some sort of list in a new file. 2) The command did not work because plink2 did not recognize the --rm-dup flag. I tried to look it up with the help command but could not find it.

With PLINK 1.9, it would have been possible to first run --list-duplicate-vars (which creates plink.dupvar; its 'ids-only' modifier makes that file usable as a plain ID list) and then run a second command with --exclude plink.dupvar.
However, I can't use these .bgen files with PLINK 1.9.
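Without a listing option in plink2, one workaround in the spirit of --list-duplicate-vars is to scan a .pvar for repeated IDs yourself. The following is a minimal Python sketch, not a plink2 feature; it assumes a tab-delimited .pvar with a '#CHROM' header line naming an 'ID' column and '.' as the missing ID, and the function names are hypothetical. Unlike --rm-dup, it compares IDs only, not alleles.

```python
from collections import Counter


def duplicate_ids(pvar_lines):
    """Return IDs that occur more than once in .pvar text lines, in file order."""
    id_col = None
    counts = Counter()
    for line in pvar_lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            id_col = fields.index("ID")  # locate the ID column from the header
            continue
        if id_col is None:
            continue  # no header seen yet; headerless .pvar is not handled here
        vid = fields[id_col]
        if vid != ".":  # '.' is the missing ID, which is never treated as a duplicate
            counts[vid] += 1
    # Counter keeps first-seen order, so duplicated IDs come out in file order
    return [vid for vid, n in counts.items() if n > 1]


def write_duplicate_list(pvar_path, out_path):
    """Write one duplicated ID per line to out_path; return how many were written."""
    with open(pvar_path) as src:
        dups = duplicate_ids(src)
    with open(out_path, "w") as dst:
        dst.writelines(vid + "\n" for vid in dups)
    return len(dups)
```

The resulting file has one ID per line, the same shape as an 'ids-only' .dupvar list.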

Is there a way to accomplish this with PLINK2? Any suggestions?


The following is from the online manual for PLINK 2:

Deduplicate variants

--rm-dup {mode}

--rm-dup usually removes all but one instance of each duplicate-ID variant (ignoring the missing ID). The five modes are:

  • 'error' (default): Check each group of duplicate-ID variants for equality. (Alleles are considered unequal even if the codes are the same, just in a different order; FILTER/INFO are considered unequal if the strings don't match exactly, even if they're semantically identical.) If any mismatches are found, plink2 errors out and writes a list of mismatching variant IDs to plink2.rmdup.mismatch.
  • 'retain-mismatch': When unequal duplicate-ID variants are found, keep every member of the group. The .rmdup.mismatch file is still written.
  • 'exclude-mismatch': When unequal duplicate-ID variants are found, exclude every member of the group.
  • 'exclude-all': Exclude all instances of all duplicate-ID variants.
  • 'force-first': Always keep just the first instance of each duplicate-ID variant.
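To make the differences between the five modes concrete, here is an illustrative Python model of the behavior described in the manual excerpt. It is a sketch, not plink2's implementation: the Variant record (ID, chromosome, position, REF, ALT) and the rm_dup function are hypothetical, real plink2 also compares FILTER/INFO and other fields, and in 'error' mode plink2 writes a file rather than raising an exception.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    vid: str
    chrom: str
    pos: int
    ref: str
    alt: str  # comma-separated ALT codes; order matters, as in the manual


MODES = {"error", "retain-mismatch", "exclude-mismatch", "exclude-all", "force-first"}


def rm_dup(variants, mode="error"):
    """Return (kept_variants, mismatch_ids) following the mode descriptions."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    groups = defaultdict(list)
    for v in variants:
        if v.vid != ".":  # the missing ID is never deduplicated
            groups[v.vid].append(v)
    dup_ids = {vid for vid, g in groups.items() if len(g) > 1}
    # A group "mismatches" if its records are not all identical
    mismatch_ids = sorted(vid for vid in dup_ids if len(set(groups[vid])) > 1)
    mismatch_set = set(mismatch_ids)
    if mode == "error" and mismatch_ids:
        raise RuntimeError(f"mismatching duplicate IDs: {mismatch_ids}")
    kept, seen = [], set()
    for v in variants:
        if v.vid not in dup_ids:
            kept.append(v)  # unique or missing ID: always kept
        elif mode == "exclude-all":
            continue  # drop every instance of every duplicated ID
        elif v.vid in mismatch_set and mode == "retain-mismatch":
            kept.append(v)  # keep every member of a mismatching group
        elif v.vid in mismatch_set and mode == "exclude-mismatch":
            continue  # drop every member of a mismatching group
        elif v.vid not in seen:
            seen.add(v.vid)
            kept.append(v)  # otherwise keep only the first instance
    return kept, mismatch_ids
```

In the case from the question (same ALT, different REF), every duplicated group mismatches, so 'exclude-mismatch' and 'exclude-all' would both drop all copies; they differ only for duplicates that are fully identical.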


Christopher Chang

Sep 11, 2018, 2:30:30 PM
to plink2-users
--rm-dup was literally added a few days ago, so you'll need to download a newer plink2 binary.  Meanwhile, I'll go ahead and add another command today to just list duplicated IDs (so if you need that, you can wait till tonight to update).

Christopher Chang

Sep 11, 2018, 5:11:44 PM
to plink2-users
New build is posted, where --rm-dup has a 'list' modifier for writing the original duplicated IDs.
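With the new build, the command from the question only needs the 'list' modifier added to --rm-dup. Below is a small Python sketch that assembles that command line for subprocess.run. The function names and file paths are hypothetical placeholders, and it assumes the modifier follows the mode, as in `--rm-dup exclude-all list`; later plink2 builds may expect extra --bgen options, so check the current manual.

```python
import subprocess


def plink2_dedup_argv(bgen, sample, maf, out, plink2="./plink2"):
    """Build the plink2 argument list from the question, with 'list' added to --rm-dup."""
    return [
        plink2,
        "--bgen", bgen,
        "--sample", sample,
        "--maf", str(maf),
        "--rm-dup", "exclude-all", "list",  # exclude all copies and list the duplicated IDs
        "--make-pgen",
        "--out", out,
    ]


def run_plink2_dedup(*args, **kwargs):
    # check=True raises CalledProcessError on a non-zero exit,
    # e.g. when an older binary does not recognize --rm-dup.
    return subprocess.run(plink2_dedup_argv(*args, **kwargs), check=True)
```

Building the list separately from running it makes it easy to print or log the exact command before launching a long conversion.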

