Bim file Variant ID column has a semicolon -- how does plink deal with this?


fasssster101

Aug 27, 2018, 10:42:54 AM
to plink2-users
Hello, 
I am working with 1000 Genomes Phase 3 data. Chr 22 in particular has many variants with two or more IDs separated by semicolons in the *.bim file (see Example 1). I am using PLINK/1.9b_5.2.

Example 1: 
22 rs587638893 0 16050568 A C
22 rs587720402 0 16050607 A G
22 rs587593704 0 16050627 T G
22 rs587670191 0 16050646 T G
22 esv3647175;esv3647176;esv3647177;esv3647178 0 16050654 <CN3> A

OR 

22 rs539868657;rs561027534 0 16349650 T G
22 rs562311818;rs377092600 0 16404838 G GA
22 rs374006257;rs200929253 0 16577044 T TG


1) When filtering SNPs with --exclude and --extract, does plink recognize all the rsIDs at this position? 
2) I have noticed that many variants have duplicate variant IDs, which causes plink to crash when I run --clump. So I first run plink --list-duplicate-vars to generate a list of duplicate IDs. However, this does not include variants that have the same variant ID at the same position (Example 2). Therefore I use bash (cut -f2 $bimfile | uniq -D > remove_these_snps.txt) to add these SNPs to the ones from --list-duplicate-vars, and then filter them all out with --exclude. Does this make sense? Should this even be an issue for plink, or am I doing something wrong? 

Thank you for your help! 


Example 2: 
22 rs563541510 0 18078898 AAAAT A
22 rs563541510 0 18078898 AAAATAAAT A
22 rs563541510 0 18078898 AAAATAAATAAAT A
22 rs563541510 0 18078898 AAAATAAATAAATAAAT A
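
One caveat about the shell pipeline in question 2: `uniq -D` only reports duplicates that sit on adjacent lines, so repeated IDs that are not next to each other in the file would be missed. A minimal Python sketch that collects every repeated ID regardless of where it appears, assuming a whitespace-delimited .bim with the variant ID in column 2 (the function name and the sample rows are illustrative only):

```python
from collections import Counter

def duplicated_ids(bim_lines):
    """Return variant IDs (column 2 of a .bim) that occur more than once."""
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    counts = Counter(ids)
    # Keep first-seen order so the output is deterministic.
    seen = set()
    result = []
    for vid in ids:
        if counts[vid] > 1 and vid not in seen:
            seen.add(vid)
            result.append(vid)
    return result

sample = [
    "22 rs563541510 0 18078898 AAAAT A",
    "22 rs539868657;rs561027534 0 16349650 T G",
    "22 rs563541510 0 18078898 AAAATAAAT A",
]
dups = duplicated_ids(sample)
print(dups)
```

Writing the returned IDs one per line would give a file in the same shape as remove_these_snps.txt.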

Christopher Chang

Aug 27, 2018, 12:00:05 PM
to plink2-users
1. PLINK does not treat semicolons in variant IDs in a special manner.
2. You can use plink 2.0's --set-all-var-ids flag to generate almost-always-unique variant IDs.  (1000 Genomes phase 3 will still have a few random pairs of variants with identical chrom, pos, and ref/alt alleles; I generally discard those.)
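
To illustrate why IDs built from position and alleles are almost always unique, here is a minimal Python sketch that derives a `chrom:pos:allele:allele` name from each .bim row. It is not PLINK's implementation: the function name and the choice to join the two allele columns in file order are assumptions made for illustration, and the actual template syntax of --set-all-var-ids is described in the PLINK 2.0 documentation.

```python
def positional_id(bim_line):
    """Build a 'chrom:pos:a1:a2' ID from one .bim row (hypothetical naming scheme)."""
    chrom, _old_id, _cm, pos, a1, a2 = bim_line.split()
    return f"{chrom}:{pos}:{a1}:{a2}"

rows = [
    "22 rs563541510 0 18078898 AAAAT A",
    "22 rs563541510 0 18078898 AAAATAAAT A",
    "22 rs539868657;rs561027534 0 16349650 T G",
]
new_ids = [positional_id(r) for r in rows]
# The two rows that shared rs563541510 now receive distinct IDs.
print(new_ids)
```

Rows with identical chromosome, position, and alleles would still collide under this scheme, which is the residual case mentioned in point 2.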

fasssster101

Aug 27, 2018, 12:53:39 PM
to plink2-users
Thank you very much! 

Christopher Chang

Aug 28, 2018, 3:59:55 PM
to plink2-users
(An option to allow semicolon-containing IDs to be looked up by any of their components may be added soon, though.)
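
As a rough picture of what "looked up by any of their components" could mean, the following Python sketch builds an index from each semicolon-separated component to the full IDs that contain it. This is only an illustration of the idea, not how PLINK does or will implement it; the function name is hypothetical.

```python
def build_component_index(variant_ids):
    """Map each semicolon-separated component to the full IDs containing it."""
    index = {}
    for full_id in variant_ids:
        for part in full_id.split(";"):
            index.setdefault(part, []).append(full_id)
    return index

ids = ["rs587638893", "rs539868657;rs561027534", "rs562311818;rs377092600"]
index = build_component_index(ids)
# Either component now resolves to the combined ID stored in the .bim file.
print(index["rs561027534"])
```

Without such an index, an --extract or --exclude list has to contain the full semicolon-joined string exactly as it appears in the .bim file.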