Silent Variant Re-ordering Contradicts Documentation

rlichte...@gmail.com

Apr 30, 2018, 11:04:20 AM
to plink2-users
Hi,

We had an issue that arose while we were using PLINK (plink 2.00 alpha build from 2018-04-16) to process the UK Biobank version 3 imputed data BGEN file sets.

Take position 32490470 on chromosome 13. Here is the metadata from the .bgen file that comes directly from UK Biobank. Note, in particular, the order.

$$$ bgenix -g ukb_imp_chr13_v3.bgen -list 2> /dev/null | awk '$4==32490470'
13:32490470_A_G rs569963582     13      32490470        2       A       G
13:32490470_A_ATGTGTGTG rs760828978     13  32490470    2       A       ATGTGTGTG

Now, let's generate a PLINK 2.0 file set as follows:

$$$ plink2 --bgen ukb_imp_chr13_v3.bgen --sample ukbAPPID_imp_chr13_v3_s487395.sample --make-pgen --out chr13_v3

And check the presence and order of variants.

$$$ < chr13_v3.pvar awk '$2==32490470'
13      32490470        rs760828978     ATGTGTGTG       A
13      32490470        rs569963582     G   A

So PLINK is silently reversing the order of some multi-allelic variants. We would expect the input and output order of variants to be identical as per https://www.cog-genomics.org/plink/2.0/data#make_pgen

--sort-vars {mode}

By default, --make-{b}pgen/--make-bed do not resort the variants, and they'll error out if the input file is not at least sorted by chromosome. (This is a change from PLINK 1.x.) However, if you add --sort-vars, the variants will be resorted by chromosome code, then position, then ID.


We were relying on the lack of variant reordering stated in the documentation, and this really bit us. It could bite somebody else in the future, so I think either the documentation should be changed to fit the behavior of the software or, better yet, the software should be changed to fit the description in the documentation.

Very best,

Ryan Lichtenwalter
Human Pain Genetics Lab
McGill University

Christopher Chang

Apr 30, 2018, 11:20:09 AM
to plink2-users
Are you absolutely certain that it's plink2, not the bgenix -list command, that is reordering the variants here?  I'm quite sure your plink2 command leaves the order unchanged.

Ryan

Apr 30, 2018, 11:27:10 AM
to Christopher Chang, plink2-users
Huh. Fair point. No, I'm not. I guess my options are to a) check the bgenix source code, b) get in touch with Gavin, or c) verify the actual .bgen file variant order in some other way.

Do you have any suggestions to continue tracking this down, especially regarding c)?

Ryan


Ryan

Apr 30, 2018, 11:30:15 AM
to Christopher Chang, plink2-users
Well, anyhow, if you're sure it isn't plink2, then it isn't plink2. Not your problem. Sorry for the false report! I'll keep tracking this elsewhere.

Ryan

Christopher Chang

Apr 30, 2018, 11:40:31 AM
to plink2-users
You can probably get a response from Gavin by posting on the OXSTATGEN mailing list (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=OXSTATGEN ).  Alternatively, if you're willing to get your hands a bit dirty, it doesn't take that much C++ code to dump just the sequence of variant IDs in a .bgen file (since you can simply skip over the compressed genotype data).
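
For anyone who wants to try option c), here is a minimal sketch of that idea, written in Python rather than C++ for brevity. It assumes the file follows the published BGEN specification (layout 1 or 2) and is an illustration under that assumption, not a tested tool; `iter_bgen_variants` is a hypothetical helper name, not part of plink2 or bgenix. It reads only the variant-identifying fields and seeks past each genotype block, so it reports variants in their on-disk order.

```python
import struct


def _read_exact(f, n):
    """Read exactly n bytes or raise if the file ends early."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of BGEN data")
    return data


def _read_str(f, len_fmt):
    """Read a length-prefixed string; len_fmt is '<H' or '<I'."""
    (n,) = struct.unpack(len_fmt, _read_exact(f, struct.calcsize(len_fmt)))
    return _read_exact(f, n).decode("utf-8", errors="replace")


def iter_bgen_variants(f):
    """Yield (variant_id, rsid, chrom, pos, alleles) in on-disk order.

    f must be a seekable binary file object. Field layout is assumed to
    follow the BGEN specification; genotype data is skipped, not decoded.
    """
    (offset,) = struct.unpack("<I", _read_exact(f, 4))
    header_len, n_variants, n_samples = struct.unpack(
        "<III", _read_exact(f, 12))
    _read_exact(f, 4)                 # magic number ("bgen" or zeros)
    _read_exact(f, header_len - 20)   # free data area
    (flags,) = struct.unpack("<I", _read_exact(f, 4))
    compression = flags & 0x3         # 0 = none, 1 = zlib, 2 = zstd
    layout = (flags >> 2) & 0xF
    if layout not in (1, 2):
        raise ValueError(f"unsupported BGEN layout {layout}")

    # The first variant block starts `offset` bytes after the first 4 bytes,
    # which also skips the optional sample identifier block.
    f.seek(offset + 4)
    for _ in range(n_variants):
        if layout == 1:
            (n_samples,) = struct.unpack("<I", _read_exact(f, 4))
        variant_id = _read_str(f, "<H")
        rsid = _read_str(f, "<H")
        chrom = _read_str(f, "<H")
        (pos,) = struct.unpack("<I", _read_exact(f, 4))
        if layout == 2:
            (n_alleles,) = struct.unpack("<H", _read_exact(f, 2))
        else:
            n_alleles = 2             # layout 1 is always biallelic
        alleles = [_read_str(f, "<I") for _ in range(n_alleles)]

        # Skip the genotype block without decompressing it.
        if layout == 1 and compression == 0:
            skip = 6 * n_samples      # fixed-size uncompressed block
        else:
            (skip,) = struct.unpack("<I", _read_exact(f, 4))
        f.seek(skip, 1)
        yield variant_id, rsid, chrom, pos, alleles
```

Opening the file with `open(path, "rb")`, looping over `iter_bgen_variants`, and printing the tuples whose position is 32490470 would show the order the records are actually stored in, independent of any index.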


Ryan

Apr 30, 2018, 11:56:54 AM
to Christopher Chang, plink2-users
In case somebody stumbles onto this in the future, I'll post the resolution:

From Gavin:

Hi, bgenix -list currently lists variants in index order, which means ordered by position, alleles and ID fields, and I think that explains what you've observed. (It could be altered to do that differently but doesn't at the moment.)

Best, g.

So it seems that the plink2 command did just what it said it would, and it was my assumption about the bgenix -list command that was in error.

As always, thanks for the great work on PLINK!