I have several .bgen files that I need to convert to .pgen, .psam, and .pvar files, but these .bgen files include several duplicate SNP variants (each with different reference alleles and the same alternative allele). I want to exclude all versions of the duplicates and save a list of the excluded duplicates in a separate file. I tried the following:
./plink2 --bgen [filename] --sample [filename] --maf [maf] --make-pgen --rm-dup exclude-all --out [filename]
There are two problems with this. 1) Even if all the duplicates get removed, how do I know which ones were removed? I need a list of them, written to a new file. 2) The command did not work because PLINK did not recognize the --rm-dup flag. I tried to look it up using the help command, but was unable to find it.
With PLINK 1.9 it would have been possible to first use the option --list-duplicate-vars (which creates a file plink.dupvar) and then run another command with the option --exclude plink.dupvar.
However, PLINK 1.9 cannot read my .bgen files.
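For reference, the "list first, then exclude" idea can be approximated outside PLINK. The following is only a minimal sketch: the function name list_dup_ids is made up for illustration, and it assumes the variants are already in a whitespace-delimited file whose header lines start with '#' (as in a .pvar) and that the duplicates share the same variant ID. It matches on IDs only, not on position or alleles.

```shell
# list_dup_ids FILE [COL]: print every variant ID that occurs more than once.
# Hypothetical helper, not part of PLINK. COL is the 1-based ID column
# (assumed 3 for a .pvar file, 2 for a .bim file).
list_dup_ids() {
    awk -v col="${2:-3}" '
        /^#/        { next }             # skip header lines
        $col == "." { next }             # ignore the missing ID
                    { count[$col]++ }    # tally each variant ID
        END { for (id in count) if (count[id] > 1) print id }
    ' "$1" | sort
}
```

Redirecting the output to a file (for example, list_dup_ids data.pvar > dup_ids.txt, where both file names are placeholders) would give the separate list of duplicates, which could then presumably be passed to --exclude in a second run.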
Is there a way to accomplish this with PLINK2? Any suggestions?
The following is from the online manual for PLINK 2:
--rm-dup {mode}
--rm-dup usually removes all but one instance of each duplicate-ID variant (ignoring the missing ID). The five modes are:
- 'error' (default): Check each group of duplicate-ID variants for equality. (Alleles are considered unequal even if the codes are the same, just in a different order; FILTER/INFO are considered unequal if the strings don't match exactly, even if they're semantically identical.) If any mismatches are found, this errors out, and writes a list of mismatching variant IDs to plink2.rmdup.mismatch.
- 'retain-mismatch': When unequal duplicate-ID variants are found, keep every member of the group. The .rmdup.mismatch file is still written.
- 'exclude-mismatch': When unequal duplicate-ID variants are found, exclude every member of the group.
- 'exclude-all': Exclude all instances of all duplicate-ID variants.
- 'force-first': Always keep just the first instance of each duplicate-ID variant.
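To check my understanding of the last two modes, here is a toy sketch of what they appear to do, applied to a plain list of variant IDs (one per line). The function names are made up, and this only mimics the behaviour described above; it is not how PLINK 2 implements it and it ignores alleles and FILTER/INFO entirely.

```shell
# Toy illustration only: awk filters that mimic the described modes on a
# one-ID-per-line text file. Function names are hypothetical.

# 'exclude-all': keep only IDs that occur exactly once.
# The file is read twice: the first pass counts, the second pass filters.
exclude_all() {
    awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$1" "$1"
}

# 'force-first': keep the first instance of each ID, drop later repeats.
force_first() {
    awk '!seen[$1]++' "$1"
}
```

For an input of rs1, rs2, rs1, rs3, exclude_all would keep rs2 and rs3, while force_first would keep rs1, rs2, and rs3.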