--recover-var-ids

Matthew Maher

Aug 13, 2023, 5:48:50 PM
to plink2-users
I believe there may be a minor "off-by-one" problem in PLINK2's --recover-var-ids (possibly stemming from non-unique IDs?). The first strange thing I note is what seems like a miscount (off by one) of the number of rows in the old BIM; after that, the operation fails to correct one of the rows.

I've attached a tar of my TEST1 fileset from the below example, which is just two PLINK2 calls (--set-all-var-ids, followed by --recover-var-ids) with the BIM file displayed before/after.

Thanks for any info and thanks for PLINK*

(base) -bash:uger-c006:~ 1100 $ cat TEST1.bim
4 rs180740005 0 10834 C A
4 rs180740005 0 10834 G A
4 rs1579842797 0 190204554 AGG A
4 rs1579842797 0 190204554 AGGG A
(base) -bash:uger-c006:~ 1100 $
$PLINK2BIN --bfile TEST1 --set-all-var-ids @:#:\$r:\$a --make-bed --out TEST2
PLINK v2.00a5LM 64-bit Intel (4 Aug 2023)      www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to TEST2.log.
Options in effect:
  --bfile TEST1
  --make-bed
  --out TEST2
  --set-all-var-ids @:#:$r:$a

Start time: Sun Aug 13 21:33:29 2023
773217 MiB RAM detected, ~736323 available; reserving 386608 MiB for main
workspace.
Using 1 compute thread.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from TEST1.fam.
4 variants loaded from TEST1.bim.
Note: No phenotype data present.
Writing TEST2.fam ... done.
Writing TEST2.bim ... done.
Writing TEST2.bed ... done.
End time: Sun Aug 13 21:33:29 2023
(base) -bash:uger-c006:~ 1101 $
cat TEST2.bim
4 4:10834:A:C 0 10834 C A
4 4:10834:A:G 0 10834 G A
4 4:190204554:A:AGG 0 190204554 AGG A
4 4:190204554:A:AGGG 0 190204554 AGGG A
(base) -bash:uger-c006:~ 1102 $
$PLINK2BIN --bfile TEST2 --recover-var-ids TEST1.bim strict-bim-order partial --make-bed --out TEST3
PLINK v2.00a5LM 64-bit Intel (4 Aug 2023)      www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to TEST3.log.
Options in effect:
  --bfile TEST2
  --make-bed
  --out TEST3
  --recover-var-ids TEST1.bim strict-bim-order partial

Start time: Sun Aug 13 21:34:04 2023
773217 MiB RAM detected, ~736323 available; reserving 386608 MiB for main
workspace.
Using 1 compute thread.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from TEST2.fam.
4 variants loaded from TEST2.bim.
Note: No phenotype data present.
--recover-var-ids:
5 lines scanned.
--recover-var-ids: 3/4 IDs updated.

Writing TEST3.fam ... done.
Writing TEST3.bim ... done.
Writing TEST3.bed ... done.
End time: Sun Aug 13 21:34:04 2023
(base) -bash:uger-c006:~ 1103 $
cat TEST3.bim
4 rs180740005 0 10834 C A
4 rs180740005 0 10834 G A
4 rs1579842797 0 190204554 AGG A
4 4:190204554:A:AGGG 0 190204554 AGGG A
TEST1.tar
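
A quick way to spot rows like this without reading the BIM files by eye is to compare the ID column of the original and recovered files directly. The following is a minimal sketch, assuming both files list the same variants in the same order and are whitespace-delimited with the variant ID in column 2; `bim_id_mismatches` is a hypothetical helper name, not part of PLINK.

```shell
# Print every row whose variant ID (column 2) differs between two .bim files.
# Assumes both files hold the same variants in the same order.
# Output format: <row number> <ID in first file> <ID in second file>
bim_id_mismatches() {
  awk 'NR == FNR { id[FNR] = $2; next }        # first file: remember each ID
       id[FNR] != $2 { print FNR, id[FNR], $2 } # second file: report differences
      ' "$1" "$2"
}
```

Calling `bim_id_mismatches TEST1.bim TEST3.bim` prints nothing when every ID was restored, and one line per row that was left unchanged otherwise.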

Christopher Chang

Aug 13, 2023, 10:52:45 PM
to plink2-users
Thanks for reporting this.  Bugfix is posted; let me know if you still see a problem.

Matthew Maher

Aug 14, 2023, 9:55:07 AM
to plink2-users
The fix looks good for the resulting fileset, thanks. I'll note, though, that the informational message "--recover-var-ids: ### lines scanned" still seems to display an incorrect count.
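
For comparison with the "lines scanned" message, the number of rows in the old BIM can be counted independently. This is a minimal sketch, assuming the reported count is meant to equal the number of non-empty rows in the file passed to --recover-var-ids (an assumption about intent, not something the log states); `count_bim_rows` is a hypothetical helper name.

```shell
# Count the non-empty rows in a .bim file.
# Blank lines are skipped so a stray trailing newline does not inflate the count.
count_bim_rows() {
  awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

For the four-row TEST1.bim in the example, `count_bim_rows TEST1.bim` prints 4, whereas the log reported 5 lines scanned.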

Christopher Chang

Aug 14, 2023, 12:54:03 PM
to plink2-users
Oh, thanks for the reminder; that fix is on GitHub and will be in the next build.