Removing duplicate SNPs using --rm-dup


eysha

Apr 8, 2021, 1:55:39 PM
to plink2-users

Hi, I'm trying to remove duplicate SNPs from my dataset using plink2's --rm-dup:


plink2 --bfile all_phase3 --memory 10000 --rm-dup force-first --make-bed --out all_phase3_removed_dup

PLINK v2.00a3 64-bit (17 Feb 2020)             www.cog-genomics.org/plink/2.0/

(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to all_phase3_removed_dup.log.

Options in effect:

  --bfile all_phase3

  --make-bed

  --memory 10000

  --out all_phase3_removed_dup

  --rm-dup force-first


Start time: Fri Apr  9 00:21:41 2021

12194 MiB RAM detected; reserving 10000 MiB for main workspace.

Using up to 4 compute threads.

2504 samples (1271 females, 1233 males; 2497 founders) loaded from

all_phase3.fam.

84358431 variants loaded from all_phase3.bim.

Note: No phenotype data present.

--rm-dup: 4114 duplicated IDs, 5610 variants removed.

Writing all_phase3_removed_dup.fam ... done.

Writing all_phase3_removed_dup.bim ... done.

Writing all_phase3_removed_dup.bed ... done.

End time: Fri Apr  9 00:36:26 2021


That ran fine. I then tried the --flip command in plink 1.9, but it fails with "Error: Duplicate ID '.'." (the ID is just a dot). Do I have to remove that dot?

plink1.9 --noweb --bfile all_phase3_removed_dup --extract dataplinkQCed.update.bim --allow-extra-chr --memory 10000 --flip dataplink.1000G.datasetmerged-merge.missnp --make-bed --out 1000G.dataplink.snps.flipped

PLINK v1.90b6.16 64-bit (17 Feb 2020)          www.cog-genomics.org/plink/1.9/

(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to 1000G.dataplink.snps.flipped.log.

Options in effect:

  --allow-extra-chr

  --bfile all_phase3_removed_dup

  --extract dataplinkQCed.update.bim

  --flip dataplink.1000G.datasetmerged-merge.missnp

  --make-bed

  --memory 10000

  --noweb

  --out 1000G.dataplink.snps.flipped


Note: --noweb has no effect since no web check is implemented yet.

12194 MB RAM detected; reserving 10000 MB for main workspace.

84352821 variants loaded from .bim file.

2504 people (1233 males, 1271 females) loaded from .fam.

Error: Duplicate ID '.'.


Christopher Chang

Apr 8, 2021, 1:57:50 PM
to plink2-users
--rm-dup ignores missing IDs (".").  You probably want to assign unique IDs to those variants with e.g. --set-missing-var-ids before running --rm-dup.
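
To confirm that leftover "." IDs are what plink 1.9 is tripping over, it can help to count them in the .bim first. The sketch below does this on a toy file (toy.bim is a made-up name used only for illustration). The commented plink2 commands outline one possible two-step fix; the '@:#:$r:$a' ID template and the all_phase3_ids output prefix are assumptions, not taken from the original run.

```shell
# Toy .bim with two variants whose ID (column 2) is the missing-ID placeholder ".".
# Columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2.
printf '1\t.\t0\t1000\tA\tG\n1\trs123\t0\t2000\tC\tT\n1\t.\t0\t3000\tG\tC\n' > toy.bim

# Count the variants with a missing ID; more than one means plink 1.9
# will see '.' as a duplicated ID.
n_missing=$(awk '$2 == "." {n++} END {print n+0}' toy.bim)
echo "variants with missing ID: $n_missing"

# Possible fix on the real data (not executed here; assumed template and prefix):
#   Step 1: give every '.' variant an ID built from chrom:pos:ref:alt.
#     plink2 --bfile all_phase3 --set-missing-var-ids '@:#:$r:$a' \
#            --make-bed --out all_phase3_ids
#   Step 2: rerun the deduplication on the renamed fileset.
#     plink2 --bfile all_phase3_ids --rm-dup force-first \
#            --make-bed --out all_phase3_removed_dup
```

Running the same awk line on all_phase3_removed_dup.bim should report a nonzero count if "." IDs are the cause. If plink2 stops on very long indel alleles while building the new IDs, the --new-id-max-allele-len flag in the plink2 documentation is probably the place to look.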