Contradictory LD sign for non-major reference alleles


Jonathan Margoliash

Jul 9, 2024, 6:42:33 PM
to plink2-users
Hi there,

I'm working with the UK Biobank imputed variant data stored in the bgen format. When I use plink2 to calculate LD between two variants from that dataset, it gives the same absolute value as when I convert the bgen file to VCF, load the data and manually calculate the LD from there, or load the data directly from the bgen file using bgen_reader in python and manually calculate the LD.

However, if one of the variants' reference alleles is not its major allele, then the sign of the LD returned by plink2 is opposite to the sign I get when I calculate the LD from data loaded with bgen_reader or bgenix. I think the bgen_reader/bgenix output has the correct sign, for two reasons. First, bgenix has the same authors as the bgen file format. Second, when I send LD matrices generated that way to the SuSiE fine-mapper, it converges quickly, but when I send LD matrices generated by plink2 to SuSiE, it fails to converge and warns that I may not be using the proper LD matrix.

Does anyone have an idea what might be going wrong? Is this an issue with how I'm loading the bgen file with plink, how I'm calculating the LD, or is this a bug? Below is the log file output when I used plink to generate the LD. In that example I am working with two example variants on chr15 (rs145408100 and rs2867932), one of which (rs2867932) has a non-major reference allele.

Thank you for the help,

Jonathan Margoliash

----
Log file:

PLINK v2.00a6LM AVX2 Intel (4 Jul 2024)
Options in effect:                                                                                                                               
  --bgen ukb_imp_chr15_v3.bgen ref-first                                
  --keep ../regression_ld/samples.txt                                      
  --r-unphased square                                                       
  --sample ukb46122_imp_chr1_v3_s487283.sample    
  --snps rs145408100,rs2867932
                  
Hostname: exp-15-40                                                                                              
Working directory: ...                                                                                                      
Start time: Mon Jul  8 15:43:17 2024                                                                                                                                                                                                                                              
Random number seed: 1720478597  
257485 MiB RAM detected, ~238491 available; reserving 128742 MiB for main
workspace.
Using 1 compute thread.
--bgen: 2767971 variants detected, format v1.2.
487409 samples imported from .sample file to plink2-temporary.psam .
--bgen: plink2-temporary.pgen + plink2-temporary.pvar written.
487409 samples (264296 females, 222987 males, 126 ambiguous; 487409 founders)
loaded from plink2-temporary.psam.
2767971 variants loaded from plink2-temporary.pvar.
Note: No phenotype data present.
--snps: 2 variants remaining.
--keep: 27608 samples remaining.
27608 samples (14248 females, 13360 males; 27608 founders) remaining after main
filters.
Calculating allele frequencies... done.
2 variants remaining after main filters.
--r-unphased: Variant IDs written to plink2.unphased.vcor1.vars .
--r-unphased: Matrix written to plink2.unphased.vcor1 .
End time: Mon Jul  8 17:19:42 2024                                                                                     

Chris Chang

Jul 9, 2024, 6:45:34 PM
to Jonathan Margoliash, plink2-users
You need to add the 'ref-based' modifier to --r-unphased.
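
For anyone who finds this thread later, here is a minimal sketch of why the sign can flip while the absolute value stays the same. It assumes, based only on the behaviour described in this thread, that without 'ref-based' the sign of r is taken relative to major/minor alleles rather than REF/ALT. The dosages are made-up toy values, not UK Biobank data, and the helper names (`pearson_r`, `minor_allele_dosage`) are hypothetical.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length dosage lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def minor_allele_dosage(alt_dosage):
    """Recode ALT dosages (0..2) so that the counted allele is the minor one."""
    mean = sum(alt_dosage) / len(alt_dosage)
    if mean <= 1.0:  # ALT frequency <= 0.5, so ALT is already the minor allele
        return list(alt_dosage)
    return [2 - d for d in alt_dosage]  # ALT is major, so count REF instead


# Toy ALT-allele dosages for ten samples (assumed values for illustration).
alt_a = [0, 0, 1, 0, 2, 1, 0, 0, 1, 0]  # ALT is minor, REF is major
alt_b = [2, 2, 1, 2, 0, 1, 2, 1, 1, 2]  # ALT is major, REF is NOT major

# REF/ALT-based sign: count the same kind of allele (ALT) for both variants.
r_ref_based = pearson_r(alt_a, alt_b)

# Major/minor-based sign: count the minor allele for both variants,
# which swaps the counted allele for variant b only.
r_major_based = pearson_r(minor_allele_dosage(alt_a), minor_allele_dosage(alt_b))

print(r_ref_based, r_major_based)
```

Swapping the counted allele for exactly one of the two variants replaces its dosage d with 2 - d, which negates the correlation and leaves its magnitude unchanged. That matches the symptom above: identical absolute values, opposite signs, and only for pairs where one reference allele is not the major allele.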

Jonathan Margoliash

Jul 9, 2024, 6:55:26 PM
to plink2-users
D'oy, thank you. That's well documented and still I missed that.

Best,

Jonathan

Matthew Maher

Jul 9, 2024, 7:46:21 PM
to Jonathan Margoliash, plink2-users
Perhaps you want to use the "ref-based" option?