Bim file Variant ID column has a semicolon -- how does plink deal with this?

fasssster101

Aug 27, 2018, 10:42:54 AM

to plink2-users

Hello,

I am working with 1000 Genomes Phase 3 data. In the *.bim file, chr 22 in particular has many variants whose ID column holds two or more IDs separated by semicolons (see Example 1). I am using PLINK/1.9b_5.2.

Example 1:

22 rs587638893 0 16050568 A C

22 rs587720402 0 16050607 A G

22 rs587593704 0 16050627 T G

22 rs587670191 0 16050646 T G

22 esv3647175;esv3647176;esv3647177;esv3647178 0 16050654 <CN3> A

OR

22 rs539868657;rs561027534 0 16349650 T G

22 rs562311818;rs377092600 0 16404838 G GA

22 rs374006257;rs200929253 0 16577044 T TG

1) When filtering SNPs with --exclude and --extract, does plink recognize all the rsIDs at this position?

2) I have noticed that there are many variants with duplicate variant IDs. This causes plink to crash when I run the --clump option. So I first run plink --list-duplicate-vars to generate a list of duplicate IDs. However, this list does not include variants that share the same variant ID at the same position but differ in their alleles (Example 2). Therefore I use bash (cut -f2 $bimfile | uniq -D > remove_these_snps.txt) to collect those IDs, add them to the IDs from --list-duplicate-vars, and filter everything out with --exclude. Does this make sense? Should this even be an issue for plink, or am I doing something wrong?

Thank you for your help!

Example 2:

22 rs563541510 0 18078898 AAAAT A

22 rs563541510 0 18078898 AAAATAAAT A

22 rs563541510 0 18078898 AAAATAAATAAAT A

22 rs563541510 0 18078898 AAAATAAATAAATAAAT A
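The duplicate-ID check described in question 2 can also be sketched outside the shell. Note that `uniq -D` only reports repeats that sit on adjacent lines, so it relies on the duplicates being next to each other in the .bim file. The Python sketch below counts IDs regardless of order. The function name `duplicated_ids` is hypothetical, and the sample rows are taken from Examples 1 and 2 but deliberately reordered so that the repeated ID is not adjacent.

```python
from collections import Counter


def duplicated_ids(bim_lines):
    """Return variant IDs (column 2 of a .bim file) that occur more than once.

    IDs are returned in first-seen order. Hypothetical helper, not part of PLINK.
    """
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    counts = Counter(ids)
    # dict.fromkeys keeps the first occurrence of each ID, in order
    return [vid for vid in dict.fromkeys(ids) if counts[vid] > 1]


# Rows from Examples 1 and 2, reordered for illustration only
sample_bim = [
    "22 rs587638893 0 16050568 A C",
    "22 rs563541510 0 18078898 AAAAT A",
    "22 rs539868657;rs561027534 0 16349650 T G",
    "22 rs563541510 0 18078898 AAAATAAAT A",
    "22 rs563541510 0 18078898 AAAATAAATAAAT A",
]

dups = duplicated_ids(sample_bim)
print(dups)  # ['rs563541510']
```

Writing the returned IDs one per line gives a file in the shape --exclude expects. Excluding by ID drops every row carrying that ID, which matches what the bash approach does.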

Christopher Chang

Aug 27, 2018, 12:00:05 PM

to plink2-users

1. PLINK does not treat semicolons in variant IDs in a special manner.

2. You can use plink 2.0's --set-all-var-ids flag to generate almost-always-unique variant IDs. (1000 Genomes phase 3 will still have a few random pairs of variants with identical chrom, pos, and ref/alt alleles; I generally discard those.)
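To illustrate why position-and-allele IDs resolve the Example 2 case, here is a minimal Python sketch of the idea behind --set-all-var-ids. It is not plink2's implementation. In plink2 the template is given as a string such as `@:#:$r:$a` (chromosome, position, reference allele, alternate allele; quote it so the shell does not expand `$r` and `$a`). The sketch assumes column 5 of the .bim is the alternate allele and column 6 the reference allele, which may not hold for files where PLINK 1.9 has reordered alleles. The function names are hypothetical.

```python
from collections import Counter


def positional_ids(bim_lines):
    """Build chrom:pos:ref:alt IDs from .bim rows.

    Assumption: column 5 is ALT and column 6 is REF.
    """
    ids = []
    for line in bim_lines:
        if not line.strip():
            continue
        chrom, _old_id, _cm, pos, a1, a2 = line.split()
        ids.append(f"{chrom}:{pos}:{a2}:{a1}")
    return ids


def collisions(ids):
    """Return IDs that still occur more than once after renaming."""
    return [vid for vid, n in Counter(ids).items() if n > 1]


# The four rows of Example 2, which all share rs563541510
example_2 = [
    "22 rs563541510 0 18078898 AAAAT A",
    "22 rs563541510 0 18078898 AAAATAAAT A",
    "22 rs563541510 0 18078898 AAAATAAATAAAT A",
    "22 rs563541510 0 18078898 AAAATAAATAAATAAAT A",
]

new_ids = positional_ids(example_2)
print(new_ids[0])           # 22:18078898:A:AAAAT
print(collisions(new_ids))  # []
```

The four rows receive four distinct IDs because their alleles differ. Any ID that `collisions` still reports corresponds to the remaining case mentioned above: variants with identical chromosome, position, and alleles.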

fasssster101

Aug 27, 2018, 12:53:39 PM

to plink2-users

Thank you very much!

Christopher Chang

Aug 28, 2018, 3:59:55 PM

to plink2-users

(An option to allow semicolon-containing IDs to be looked up by any of their components may be added soon, though.)
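Since semicolons get no special treatment, an --extract or --exclude list containing only one component (for example rs561027534) would presumably not match the full ID rs539868657;rs561027534. Without a built-in option, one workaround is to expand a list of single rsIDs into the full ID strings found in the .bim before passing them to PLINK. The sketch below shows one way to do that in Python. `expand_to_full_ids` is a hypothetical helper, and the rows come from Example 1.

```python
def expand_to_full_ids(wanted, bim_lines, sep=";"):
    """Return full .bim variant IDs with at least one component in `wanted`.

    Results follow the order of the .bim rows. Hypothetical helper, not a PLINK feature.
    """
    wanted = set(wanted)
    matches = []
    for line in bim_lines:
        if not line.strip():
            continue
        full_id = line.split()[1]
        # Compare each semicolon-separated component against the wanted set
        if wanted.intersection(full_id.split(sep)):
            matches.append(full_id)
    return matches


bim = [
    "22 rs587638893 0 16050568 A C",
    "22 rs539868657;rs561027534 0 16349650 T G",
    "22 rs562311818;rs377092600 0 16404838 G GA",
]

full = expand_to_full_ids(["rs561027534", "rs587638893"], bim)
print(full)  # ['rs587638893', 'rs539868657;rs561027534']
```

The returned strings are the exact IDs as they appear in the .bim, so they can be written one per line and used with --extract or --exclude.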
