Re: S-Predixcan inquiries- gene lists and output interpretation

Hae Kyung Im

May 3, 2018, 3:51:23 PM

to Juan Carlos Ramírez Tapia, PrediXcan/MetaXcan

Hi Juan

please see response below.

I hope this helps. I've cc'd the mailing list for others to chime in.

Haky

On Tue, May 1, 2018 at 11:33 AM, Juan Carlos Ramírez Tapia <jcrt...@gmail.com> wrote:

Hi Haky,

I was wondering if I could follow up regarding how to interpret the output I get from S-PrediXcan after running it on my GWAS summary statistics.

There are different components I would like to understand for interpreting results. I'm hoping that you or someone in your group can guide me through this.

1. On the number of genes available in the predictDB.

I was expecting to see many more gene hits in some tissues after I ran S-PrediXcan. When I didn’t see some genes, I explored the list of genes available in the models stored in predictDB, and I noticed that, for brain tissues for example, amygdala has 2369 genes and hippocampus has 2824 genes. So if I was expecting to see some genes that are not in those lists, they won’t show up in my results, correct? Is this because the only genes included in PrediXcan are those that have better predictive performance? Are these the only ones that can be found from GTEx?

The small number of genes with prediction models is mainly due to sample size and, of course, effect size. If the genetic effect is very small, we need larger sample sizes to be able to do any prediction.
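To check up front whether a gene of interest has a model in a given tissue, one option is to query the model file directly. The following is a minimal sketch, assuming the predictDB model is a SQLite file with an `extra` table that has `gene` and `genename` columns (please check the schema of your own file). The function name, the in-memory stand-in database, the placeholder gene IDs, and `GENE_X` are made up for illustration.

```python
import sqlite3

def genes_without_model(conn, gene_names):
    """Return the requested gene names that have no prediction model in this DB."""
    # Assumption: the model DB has an `extra` table with a `genename` column.
    rows = conn.execute("SELECT genename FROM extra").fetchall()
    available = {name for (name,) in rows}
    return sorted(set(gene_names) - available)

# Stand-in for a real model file (hypothetical contents).
# With real data you would open the tissue's .db file with sqlite3.connect(...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extra (gene TEXT, genename TEXT)")
conn.executemany(
    "INSERT INTO extra VALUES (?, ?)",
    [("ENSG_PLACEHOLDER_1", "HLA-C"), ("ENSG_PLACEHOLDER_2", "MMP15")],
)

# Genes listed here cannot appear in S-PrediXcan results for this tissue.
missing = genes_without_model(conn, ["HLA-C", "MMP15", "GENE_X"])
print(missing)
```

Any gene returned by this check has no model in that tissue, so it will not appear in the results regardless of the GWAS signal.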

2. On the extract I’m attaching: this is part of the results from MDD GWAS summary statistics in amygdala tissue.

Of these three genes, let’s say HLA-C seems to have a significant risk association with MDD (P-val 0017). Although the prediction performance p-value is also significant (4.41E-08), only 2 SNPs out of the 24 in the model were found. Would this be a reliable or fair way to interpret this gene, for example? In contrast, MMP15, which had no significant risk association with MDD (P-val 0.4089), had a lot more of its model SNPs found (52 out of 53).

I’m curious how to look at these opposite trends, and whether this is a reasonable way to interpret these particular genes.

Ideally, you should impute the SNPs so that you have good coverage of the SNPs in the model. We use elastic net rather than lasso to try to be more robust to missing SNPs, but there is only so much that can be done. You can use ImpG or DIST (https://academic.oup.com/bioinformatics/article/29/22/2925/315390) to impute summary statistics. Or, if you have individual-level data, you can use the public imputation servers at U Michigan or Sanger.
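One way to spot genes like HLA-C above is to compute, for each gene, the fraction of model SNPs that were actually found in the GWAS. This is a rough sketch, assuming the S-PrediXcan output has `gene_name`, `n_snps_used` and `n_snps_in_model` columns (check the header of your own file). The 0.8 cutoff and the function name are arbitrary choices for illustration, not a recommended threshold; the two rows use the SNP counts quoted in this thread.

```python
import csv
import io

def snp_coverage(rows, min_fraction=0.8):
    """Return (gene_name, fraction of model SNPs used, low-coverage flag) per row."""
    results = []
    for row in rows:
        used = int(row["n_snps_used"])
        total = int(row["n_snps_in_model"])
        fraction = used / total if total else 0.0
        results.append((row["gene_name"], fraction, fraction < min_fraction))
    return results

# SNP counts taken from the two genes discussed in this thread.
example_csv = """gene_name,n_snps_used,n_snps_in_model
HLA-C,2,24
MMP15,52,53
"""
rows = list(csv.DictReader(io.StringIO(example_csv)))

coverage = snp_coverage(rows)
for gene, fraction, low in coverage:
    flag = "  <- low coverage, consider imputing" if low else ""
    print(f"{gene}: {fraction:.2f}{flag}")
```

A gene flagged this way is not necessarily wrong, but its association is based on only a small part of the prediction model, so it deserves a closer look after imputation.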

3. What are “var_g” and “effect_size” in the S-PrediXcan output columns, and how are they interpreted?

effect_size is the effect on phenotype of one standard deviation change in predicted expression, in whichever scale the original GWAS was performed.

var_g is the variance of the predicted expression. We check it when results look suspicious; a very small variance may be a red flag.
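As a rough illustration of where var_g comes from: in the published S-PrediXcan formulation, the variance of predicted expression is w' Γ w, where w holds the model weights and Γ is the SNP covariance from a reference panel, and the gene-level z-score divides by its square root. The sketch below follows that formula with invented toy numbers and hypothetical function names; it is not the MetaXcan code itself.

```python
import math

def predicted_expression_variance(weights, cov):
    """var_g = w' * Gamma * w for model weights w and SNP covariance Gamma."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))

def gene_zscore(weights, cov, gwas_z):
    """z_g ~ sum_l w_l * (sigma_l / sigma_g) * z_l, with sigma_l^2 = Gamma[l][l]."""
    var_g = predicted_expression_variance(weights, cov)
    if var_g <= 0:
        raise ValueError("var_g must be positive to compute a z-score")
    sigma_g = math.sqrt(var_g)
    return sum(w * math.sqrt(cov[l][l]) / sigma_g * z
               for l, (w, z) in enumerate(zip(weights, gwas_z)))

# Toy values (invented): two SNPs in the model.
weights = [0.5, -0.2]
cov = [[0.4, 0.1],
       [0.1, 0.3]]
gwas_z = [2.0, -1.0]

var_g = predicted_expression_variance(weights, cov)
z_g = gene_zscore(weights, cov, gwas_z)
print(var_g, z_g)
```

Because the z-score divides by the square root of var_g, a value close to zero can make the statistic numerically unstable, which is one reason a very small var_g is worth a second look.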

Thank you, Haky. Any suggestions or comments will be appreciated.

Best,

Juan

-----------

Hae Kyung Im, PhD

Assistant Professor, Section of Genetic Medicine

The University of Chicago Medicine & Biological Sciences

Member of Committee on Genetics, Genomics & Systems Biology

Center for Data Intensive Science

5841 S. Maryland Ave. Chicago, IL 60637, USA| N412, MC6091

Phone: 773.702.3898 | FAX: 773.702.2567

Email: ha...@uchicago.edu | Twitter: @hakyim

http://hakyimlab.org | https://github.com/hakyimlab

http://scholar.google.com/citations?user=1QD4sIcAAAAJ
