Question on QQ Plot


Matthew Maher

Dec 6, 2022, 9:23:37 AM
to locuszoom
Hi-   I recently uploaded a data set (gwas/575588) and I have a question about the generated QQ plot (see below), which I don't quite understand. 

Four different groups are plotted (by MAF ranges according to the legend).

Q1:  what is the # in the parens on each row of the legend?  I might expect it to be the # of variants in that set, but the values are all the same. 

Q2:  Maybe I'm misunderstanding, but the legend has one entry for basically "ALL" (0.0-0.5) and then three ranges which are subsets of that.   But the plot seems to indicate the presence of signal/inflation only on the ALL, but not any of the subsets, which doesn't seem mathematically possible, making me think the labels<-->colors are misapplied.   thoughts? 

Thanks for LocusZoom - nice tool!

QQ.png

Andy Boughton

Dec 7, 2022, 12:12:37 AM
to locu...@googlegroups.com
Thanks for your question.

Broadly speaking, the QQ plot divides the dataset into four quantile bins of equal size (hence the similar or identical number in each bin). (If allele frequency data is not specified, only a single bin is used.) A quick sanity check would be to spot-check the number of lines in your GWAS against the number of variants in each quantile (`zcat < my_gwas.gz | wc -l`).

Your second question about the "all" interval is a good one. At the moment I'm rather pinned down on another project with a tight deadline; would you be willing to spot check this with your own QQ plot and report the results back? If you do find a bug, we can try to feed that back into the system and improve for everyone.

Thanks,

-Andy Boughton
abo...@umich.edu

Applications Programmer/Analyst, Lead
Center for Statistical Genetics
University of Michigan




Matthew Maher

Dec 7, 2022, 11:26:54 AM
to locuszoom
Okay, I get it now that it's trying to show even quartiles (thus the counts match). I generated my own per-quartile ranges/plots for my GWAS and I can report that in the LocusZoom QQ plot, all the dots look correct and are colored correctly for the four quartiles, corresponding to the colors in the legend from top to bottom. The problem is simply that the legend shows an incorrect value in two places (the same value both times):

By my calculation the precise quartile boundaries should be:

 0.000233  -  0.003883
 0.003883  -  0.027344
 0.027344  -  0.178523
 0.178523  -  0.500000

But the LocusZoom legend (which does some rounding) shows: 

   0 ≤ MAF < 0.50 (3611114)    <- upper bound should be .004
   0 ≤ MAF < 0.03 (3611114)    <- lower bound should be .004
0.03 ≤ MAF < 0.18 (3611114)
0.18 ≤ MAF < 0.50 (3611114)
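
For anyone who wants to repeat the per-quartile check above, here is a minimal Python sketch. It assumes the MAF values have already been read into a list; the function name `maf_quartile_bins` and the sample values are made up for illustration, and this is not necessarily how LocusZoom computes its bins.

```python
def maf_quartile_bins(mafs):
    """Split MAF values into four equal-count bins.

    Returns a list of (lowest MAF, highest MAF, variant count) per bin,
    ordered from the rarest quartile to the most common one.
    """
    ordered = sorted(mafs)
    n = len(ordered)
    if n < 4:
        raise ValueError("need at least four MAF values to form quartiles")
    bins = []
    for i in range(4):
        # Integer arithmetic keeps bin sizes within one variant of each other.
        start = i * n // 4
        stop = (i + 1) * n // 4
        chunk = ordered[start:stop]
        bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins


# Illustrative values only, not taken from the uploaded GWAS.
sample_mafs = [0.3, 0.001, 0.5, 0.002, 0.1, 0.01, 0.15, 0.02]
for low, high, count in maf_quartile_bins(sample_mafs):
    print(f"{low:.6f} - {high:.6f} ({count})")
```

The counts should come out equal (or differ by at most one), which is the behavior the legend counts suggest.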

Matthew Maher

Dec 7, 2022, 1:45:02 PM
to locuszoom
And FWIW, I clicked around via the browser debugger and I can see that the QQ data fetch from the LZ server arrives with incorrect values already in place for the MAF quartile ranges (image below).   So I believe the problem is occurring on the server side.

And FFWIW, if I take my GWAS's specific MAFs and round them to two decimals, I get the exact range values seen here (0 - 0, 0 - 0.03, 0.03 - 0.18...). So I'd guess there is some rogue rounding getting applied, which causes this problem when the MAFs skew very low.

LZdebug.png
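
A quick way to see the effect described above is to round the quartile boundaries I calculated to two decimal places. This is only a sketch of the suspected behavior; the helper name `rounded_ranges` is made up, and the actual server code may do something different.

```python
# Quartile boundaries as calculated from the GWAS (see earlier message).
boundaries = [0.000233, 0.003883, 0.027344, 0.178523, 0.5]


def rounded_ranges(bounds, ndigits=2):
    """Round each boundary to `ndigits` decimals and pair them into ranges."""
    rounded = [round(b, ndigits) for b in bounds]
    return list(zip(rounded[:-1], rounded[1:]))


print(rounded_ranges(boundaries, ndigits=2))
# [(0.0, 0.0), (0.0, 0.03), (0.03, 0.18), (0.18, 0.5)]

print(rounded_ranges(boundaries, ndigits=4))
# [(0.0002, 0.0039), (0.0039, 0.0273), (0.0273, 0.1785), (0.1785, 0.5)]
```

With two decimals the bottom quartile collapses to 0 - 0, matching the values in the server response; with four decimals all four ranges stay distinct.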

Matthew Maher

Dec 7, 2022, 2:19:34 PM
to locuszoom
I'm guessing these settings from pheweb/parse_utils.py could be the problem: a study with a large # of variants will have a large # of rare variants, so using only 2 digits, the bottom quartile could easily be 0.00 - 0.00. Most summary stats input files would supply something like 4-6 digits.


'maf': {
    'type': float,
    'range': [0, 0.5],
    'sigfigs': 2,
    'tooltip_lztemplate': {'transform': '|percent'},
    'display': 'MAF',
},
'af': {
    'aliases': ['A1FREQ', 'FRQ'],
    'type': float,
    'range': [0, 1],
    'proportion_sigfigs': 2,
    'tooltip_lztemplate': {'transform': '|percent'},
    'display': 'AF',
},
'case_af': {
    'aliases': ['af.cases'],
    'type': float,
    'range': [0, 1],
    'proportion_sigfigs': 2,
    'tooltip_lztemplate': {'transform': '|percent'},
    'display': 'AF among cases',
},
'control_af': {
    'aliases': ['af.controls'],
    'type': float,
    'range': [0, 1],
    'proportion_sigfigs': 2,
    'tooltip_lztemplate': {'transform': '|percent'},
    'display': 'AF among controls',
},
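
For comparison, here is a sketch of what rounding to 2 significant figures (rather than 2 decimal places) would do to the same quartile boundaries. I have not confirmed how parse_utils.py actually applies `sigfigs` or `proportion_sigfigs`, so `round_sig` below is a hypothetical helper written for illustration, not a function from pheweb.

```python
import math


def round_sig(x, sigfigs):
    """Round x to `sigfigs` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Position of the leading digit, e.g. -3 for 0.003883.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sigfigs - 1 - exponent)


boundaries = [0.000233, 0.003883, 0.027344, 0.178523, 0.5]
print([round_sig(b, 2) for b in boundaries])
# [0.00023, 0.0039, 0.027, 0.18, 0.5]
```

Under that interpretation the rare-variant boundaries stay distinct, so if the legend values are being produced from these settings, the digits may be getting applied as decimal places somewhere along the way.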