Duplicate ID issue


bvers...@gmail.com

Apr 10, 2017, 5:05:52 PM
to plink2-users
Dear all

I'm currently trying a set-based analysis (using Plink 1.9). I first removed duplicate variants, but I still get a duplicate-ID error afterwards.
Can someone help? 

Kind regards

Bram 

./plink --bfile rel5_hg19_updatedid  --list-duplicate-vars --out ./duplicated_snps 
./plink --bfile rel5_hg19_updatedid --exclude ./duplicated_snps.dupvar --make-bed 
./plink --bfile plink --set-test --set Set_based.txt --mperm 1000 --logistic --ci 0.95  --out ./geneset

Then I get this in my terminal: 

Note: --mperm flag deprecated.  Use e.g. '--model mperm=[value]'.

Note: --set-test flag deprecated.  Use e.g. '--assoc perm set-test'.

8192 MB RAM detected; reserving 4096 MB for main workspace.

154858 variants loaded from .bim file.

3873 people (1820 males, 2053 females) loaded from .fam.

3873 phenotype values loaded from .fam.

Using up to 4 threads (change this with --threads).

Before main variant filters, 3873 founders and 0 nonfounders present.

Calculating allele frequencies... done.

Total genotyping rate is 0.986586.

Error: Duplicate ID 'rs165255'.


Christopher Chang

Apr 10, 2017, 5:39:25 PM
to plink2-users
--list-duplicate-vars checks for duplicate {position + allele codes}.  To identify and remove duplicate IDs, you can use
"./plink --bfile rel5_hg19_updatedid --write-snplist --out ./all_snps" (writes all SNP IDs to all_snps.snplist, duplicates occur multiple times; this operation can also be performed with e.g. cut or awk)
"cat all_snps.snplist | sort | uniq -d > duplicated_snps.snplist" (write the duplicate IDs to duplicated_snps.snplist)
"./plink --bfile rel5_hg19_updatedid --exclude duplicated_snps.snplist --make-bed"
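
To see what the middle step does without touching real data, here is a minimal sketch on a toy ID list. The file stands in for all_snps.snplist; every ID in it other than rs165255 is made up, and the temporary directory is only an assumption for the demo.

```shell
# Toy stand-in for all_snps.snplist: one variant ID per line.
# rs0000001 and rs0000002 are hypothetical IDs used only for illustration.
workdir=$(mktemp -d)
printf '%s\n' rs165255 rs0000001 rs165255 rs0000002 > "$workdir/all_snps.snplist"

# sort places identical IDs on adjacent lines;
# uniq -d then prints each ID that occurs more than once, exactly one time.
sort "$workdir/all_snps.snplist" | uniq -d > "$workdir/duplicated_snps.snplist"

cat "$workdir/duplicated_snps.snplist"
```

Only the repeated ID ends up in duplicated_snps.snplist, so passing that file to --exclude drops every variant carrying that ID, not just the extra copies.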

bvers...@gmail.com

Apr 10, 2017, 5:56:31 PM
to plink2-users
Many thanks, it works !!!

On Monday, April 10, 2017 at 23:39:25 UTC+2, Christopher Chang wrote:

bioinfo_gow

Apr 26, 2018, 12:15:32 AM
to plink2-users
Hello Mr. Chang,

This step excludes every variant that has a duplicated ID. What if I want to keep one copy and exclude only the other duplicates?
Looking for your suggestion.

G.

Christopher Chang

Apr 26, 2018, 1:54:14 AM
to plink2-users
I usually generate new position-based IDs with plink 2.0's --set-all-var-ids flag so I don't have to worry about duplicate IDs afterward.
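
To illustrate the idea of position-based IDs, here is a rough sketch that does not use plink 2.0 at all. It rewrites the ID column of a toy .bim file with awk, assuming the usual six .bim columns (chromosome, variant ID, cM, bp position, allele 1, allele 2). The chr:pos:allele:allele pattern, the file names, and the toy rows are assumptions for the demo; with --set-all-var-ids you choose the template yourself.

```shell
# Toy .bim with a duplicated ID at two different positions (rows are made up).
workdir_bim=$(mktemp -d)
printf '22\trs165255\t0\t100\tA\tG\n'   > "$workdir_bim/toy.bim"
printf '22\trs0000001\t0\t200\tC\tT\n' >> "$workdir_bim/toy.bim"
printf '22\trs165255\t0\t300\tA\tC\n'  >> "$workdir_bim/toy.bim"

# Replace column 2 with an ID built from chromosome, position, and both alleles.
# Row order is left unchanged, so the rows still line up with the .bed genotypes.
awk 'BEGIN { OFS = "\t" }
     { $2 = $1 ":" $4 ":" $5 ":" $6; print }' \
    "$workdir_bim/toy.bim" > "$workdir_bim/toy_posids.bim"

cut -f2 "$workdir_bim/toy_posids.bim"
```

The two variants that shared an rsID now have distinct IDs because they sit at different positions. Variants with the same position and the same alleles would still collide, which is the case --list-duplicate-vars reports.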

statgen

May 30, 2018, 11:40:26 PM
to plink2-users
Hi, 

I am using plink 1.9, and when I ran ./plink --bfile mydata --list-duplicate-vars --out newdata, it didn't work. It failed with an unrecognized-flag error (see below). Do --list-duplicate-vars and --suppress-first only work in plink2? 

Many thanks in advance.
Novia

./plink_1.9_linux_160914 --bfile cv1dummy --list-duplicate-vars --out test

PLINK v1.90b2i 64-bit (8 Sep 2014)         https://www.cog-genomics.org/plink2

(C) 2005-2014 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to test.log.

Error: Unrecognized flag ('--list-duplicate-vars').

For more information, try 'plink --help [flag name]' or 'plink --help | more'.

Christopher Chang

May 30, 2018, 11:41:59 PM
to plink2-users
Hi Novia,

It was added to plink 1.9 in Nov 2014, so you'll need to download a newer build.

statgen

May 30, 2018, 11:56:27 PM
to plink2-users
Thank you! It's working now. 