Removing duplicate SNP based on allele


R Stephanie L

Apr 29, 2021, 6:17:46 AM
to plink2-users
Hello,

I am trying to produce a .raw file with the following command. However, I get an error because there is a duplicate SNP ID with different alleles (C-G and C-T). The alleles that I need, based on the GWAS I want to use, are G-C.
How do I keep only the SNP with the G-C alleles?

plink --bfile ukbb_edn_snps --remove ${HOME}/Data/UKBIOBANK/Exclusion_participants/data_removed_dec2020.txt --reference-allele "${HOME}/Data/Discovery_samples/Edn/ukb_education_reference_list.txt" --recode A include-alt --make-bed --out ${HOME}/Data/UK_Biobank/GeneticData/ukb_edn


Error: Duplicate ID 'rs7040995'.

TIA

Christopher Chang

Apr 29, 2021, 11:10:52 AM
to plink2-users
You need to assign the SNPs different IDs, with e.g. plink 2.0's --set-all-var-ids flag.
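
As a rough sketch of that suggestion (not a tested recipe for this dataset): the input prefix below is the one from the question, while the output prefix `ukbb_edn_snps_uniq` is a made-up name. In the ID template, `@` stands for the chromosome code, `#` for the base-pair position, and `$r` / `$a` for the reference and alternate alleles, so the two rs7040995 records end up with different IDs. The template must be single-quoted so the shell does not try to expand `$r` and `$a`.

```shell
# Sketch only: the output prefix is hypothetical, the input prefix is from the question.
# Template: '@' = chromosome, '#' = position, '$r' = REF allele, '$a' = ALT allele.
id_template='@:#:$r:$a'

# Run only where plink2 and the input fileset are actually present.
if command -v plink2 >/dev/null 2>&1 && [ -e ukbb_edn_snps.bed ]; then
  plink2 --bfile ukbb_edn_snps \
         --set-all-var-ids "$id_template" \
         --make-bed \
         --out ukbb_edn_snps_uniq
fi
```

Once the IDs are unique, the unwanted C-T record could presumably be dropped by listing its new ID in a file passed to --exclude before rerunning the original plink command.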

dodomba

Apr 30, 2021, 2:42:57 AM
to plink2-users
When we use --set-all-var-ids with allele names, there is a problem: after merging, many SNPs need to be flipped, but their IDs have now changed. What do you suggest we do?

Christopher Chang

May 3, 2021, 11:14:02 AM
to plink2-users
This question doesn't make any sense.

Please include a full .log file(s) illustrating exactly what you're talking about in all future posts.

dodomba

May 3, 2021, 2:51:09 PM
to plink2-users
Yes, I fixed that issue, but I have one last question, which is more general.
I want to merge two datasets, but the first has SNP IDs that are not rsIDs. I also downloaded its conversion file to rsIDs.
I don't know whether to update the IDs to rsIDs or to rename all SNP IDs to chromosome and position. Which would give me more overlap between the datasets when merging?
There is also the question of whether to rename SNP IDs by chr and position only, or by chr/position plus the alleles, like this: 1:513251 vs 1:5235823,A,C

What is your opinion on this, and in particular, when should we rename SNP IDs with the alleles included rather than just chr/pos, with regard to overlap and later issues such as different variants at one position sharing the same ID?
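
To make the two naming schemes concrete, here is a small sketch on a toy .bim file. The variant names, chromosome and position are placeholders invented for illustration, not real data. It counts how many IDs collide under each scheme when two records share a position but differ in alleles.

```shell
# Toy .bim with columns: chrom, ID, cM, bp, A1, A2.
# All values are placeholders for illustration only.
bim=$(mktemp)
printf '1\trsX\t0\t1000\tG\tC\n1\trsX\t0\t1000\tT\tC\n' > "$bim"

# Scheme 1: chr:pos only. Both records map to the same ID.
dups_pos=$(awk '{print $1":"$4}' "$bim" | sort | uniq -d | wc -l | tr -d ' ')

# Scheme 2: chr:pos plus alleles. The alleles keep the records distinct.
dups_allele=$(awk '{print $1":"$4":"$6":"$5}' "$bim" | sort | uniq -d | wc -l | tr -d ' ')

echo "duplicated IDs with chr:pos:         $dups_pos"
echo "duplicated IDs with chr:pos:alleles: $dups_allele"
rm -f "$bim"
```

The trade-off this illustrates: chr:pos IDs match across datasets regardless of allele coding but collide at multiallelic positions, while allele-based IDs stay unique but only match if both datasets use the same strand and allele order.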

Christopher Chang

May 4, 2021, 12:24:02 PM
to plink2-users
plink is a tool.  You are responsible for knowing what you want to do with the tool.  You should not be using plink if you cannot reason out the answers to these questions yourself.