Re: Guidance for PrediXcan

Hae Im

Mar 2, 2021, 7:57:53 PM
to Sagnik Palmal, FAUX Pierre, PrediXcan/MetaXcan
If you want to run PrediXcan using individual level data, you can use the new implementation that Alvaro developed.

Tutorial and documentation can be found here

I hope this helps
Haky



On Tue, Mar 2, 2021 at 4:27 PM Sagnik Palmal <spsa...@gmail.com> wrote:
Hi Dr Hae,

Hope you are doing well in these troubled times.

I am Sagnik Palmal, a doctoral student from Aix-Marseille Université, France. I am writing to you with respect to your method PrediXcan. We are currently working on a TWAS of skin color pigmentation using the GTEx data and would like to take advantage of your method for our analysis. 

Using a GWAS on pigmentation previously published by our team, we have already explored the S-PrediXcan implementation, i.e. using the summary statistics. We have also explored the S-MultiXcan approach. To gain a better understanding, I would like to use the PrediXcan model directly on our genotypes. Although it was quite straightforward to implement the summary-based methods thanks to the tutorials posted on your GitHub page, I could not find any guidance/documentation on how to apply PrediXcan to our data. I am therefore wondering if you could provide me with documentation to run that program (manual, tutorial, etc.) or with direct help on that basic question (either from you, or from one of your collaborators).

Please let me know if you have any questions or need anything else from our side.

Thanks and regards,
Sagnik Palmal
Doctoral Student, UMR 7268 ADES
Aix-Marseille Université, Marseille
France


--
Hae Kyung Im, PhD - 任慧耕 - 임혜경
Assistant Professor, Section of Genetic Medicine
The University of Chicago Medicine & Biological Sciences
Member of Committee on Genetics, Genomics & Systems Biology
Center for Translational Data Science
5841 S. Maryland Ave. Chicago, IL 60637, USA| N412
Phone:  773.702.3898 | FAX: 773.702.2567
Email: ha...@uchicago.edu | Twitter: @hakyim
http://hakyimlab.org | https://github.com/hakyimlab | https://orcid.org/0000-0003-0333-5685
http://scholar.google.com/citations?user=1QD4sIcAAAAJ
Directions to my office https://tinyurl.com/hki-office

Hae Kyung Im

Mar 22, 2021, 6:05:54 PM
to Sagnik Palmal, FAUX Pierre, PrediXcan/MetaXcan
Dear Sagnik 

Please see answers below. 

Haky


On Thu, Mar 11, 2021 at 4:52 AM Sagnik Palmal <spsa...@gmail.com> wrote:
Dear Dr Hae,

Thank you again for your help.

As you had indicated, we made use of the PrediXcan package and adapted it to our needs. We do have a few questions about it, listed below. It would be of great help if you could answer them at your convenience.
  1. For the moment, we run our analyses using only one GTEx tissue (Skin not sun exposed). The first step of PrediXcan does not yield a predicted gene expression for some genes of interest. We understand that this could be the consequence of either the gene not being predicted by your model (in which case that gene would not be listed in the file models/eqtl/mashr/ of that tissue), or the model not being able to predict the gene because one or more prediction SNPs are missing from our genotype data. The first case applies to a gene relevant to our study (IRF4): it is not listed in your model for that tissue although it has a TPM record in GTEx (https://gtexportal.org/home/gene/IRF4). What is the likely reason for this, and is there anything that could be done?

Expression alone does not ensure predictability. There are various reasons why a gene may not make it into the list. For elastic net models, the prediction performance may not be good enough. For mashr models, the gene needs to have a variant with high posterior inclusion probability.

  2. In the second case, I guess it is recommended to impute SNP genotypes; is this correct?
Imputing is a good idea. 
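
For readers who want to tell the two cases apart (gene absent from the model versus model SNPs missing from the genotype data), here is a minimal sketch. It assumes the prediction model is a SQLite file with an `extra` table (columns `gene`, `genename`) and a `weights` table (columns `gene`, `varID`, `weight`); check these names against your own model file, since they are an assumption here. The function name `gene_snp_coverage`, the toy gene `GENE_A` and the toy variant IDs are hypothetical, and the example builds a small in-memory database instead of opening a real model.

```python
import sqlite3


def gene_snp_coverage(conn, gene_name, genotyped_ids):
    """Return None if the gene has no model, else (n_model_snps, n_present).

    Assumed schema (verify against your model file):
      extra(gene, genename)        one row per gene with a model
      weights(gene, varID, weight) one row per SNP used by a gene's model
    """
    row = conn.execute(
        "SELECT gene FROM extra WHERE genename = ?", (gene_name,)
    ).fetchone()
    if row is None:
        # Case 1: the gene is simply not in the model for this tissue.
        return None
    snps = [
        r[0]
        for r in conn.execute(
            "SELECT varID FROM weights WHERE gene = ?", (row[0],)
        )
    ]
    # Case 2 shows up as n_present < n_model_snps.
    n_present = sum(1 for s in snps if s in genotyped_ids)
    return len(snps), n_present


# Toy in-memory database standing in for a real model file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extra (gene TEXT, genename TEXT)")
conn.execute("CREATE TABLE weights (gene TEXT, varID TEXT, weight REAL)")
conn.execute("INSERT INTO extra VALUES ('ENSG_TOY1', 'GENE_A')")
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?)",
    [
        ("ENSG_TOY1", "chr1_100_A_G_b38", 0.2),
        ("ENSG_TOY1", "chr1_200_C_T_b38", -0.1),
    ],
)

# Variant IDs present in the (hypothetical) genotype data.
genotyped = {"chr1_100_A_G_b38"}

coverage_a = gene_snp_coverage(conn, "GENE_A", genotyped)
coverage_missing = gene_snp_coverage(conn, "GENE_NOT_IN_MODEL", genotyped)
print(coverage_a)        # gene has a model; one of its two SNPs is genotyped
print(coverage_missing)  # gene has no model in this toy database
```

With a real model you would replace the in-memory connection with `sqlite3.connect(path_to_model_db)`. A gene that returns `None` cannot be rescued by imputation, while a gene with low coverage can.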



  3. Various phenotypes depend strongly on certain covariates (e.g. sex as a covariate for height). We tried to use the results from the Predict.py step as inputs in a classical GWAS setup, where the phenotype is regressed on the covariates (age, sex, genetic principal components) and the predicted gene expression. We derived the p-value of association one gene at a time. But the resulting p-values differ from those in the PrediXcanAssociation.py output. Should we regress on the phenotype already adjusted for the covariates? Or do you advise preferring S-PrediXcan to PrediXcan, since summary statistics already integrate the covariates? We would like to know your suggestion about how one should take care of the covariates.

If you want to adjust for covariates, you are better off running the regression directly with your own code.
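
As an illustration of running the regression with your own code, here is a minimal sketch of a per-gene association test that fits the phenotype on the predicted expression and the covariates jointly. It is not the PrediXcanAssociation.py implementation. The function name `gene_association`, the simulated data and the normal approximation for the p-value (reasonable only for large samples) are all assumptions made for this example.

```python
import math

import numpy as np


def gene_association(phenotype, predicted_expression, covariates):
    """OLS of phenotype on [intercept, predicted expression, covariates].

    Returns (effect, standard_error, p_value) for the expression term.
    The two-sided p-value uses a normal approximation to the t statistic,
    which is an assumption that only holds well for large sample sizes.
    """
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(predicted_expression, dtype=float)
    c = np.asarray(covariates, dtype=float)
    n = y.shape[0]
    design = np.column_stack([np.ones(n), g, c])
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = float(residuals @ residuals) / dof
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    effect = float(beta[1])
    se = math.sqrt(float(cov_beta[1, 1]))
    z = effect / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return effect, se, p_value


# Simulated toy data: the phenotype depends on expression, sex and age.
rng = np.random.default_rng(0)
n_samples = 500
sex = rng.integers(0, 2, size=n_samples)
age = rng.uniform(20, 70, size=n_samples)
expression = rng.normal(size=n_samples)
phenotype = 0.5 * expression + 2.0 * sex + 0.03 * age + rng.normal(size=n_samples)

effect, se, p = gene_association(
    phenotype, expression, np.column_stack([sex, age])
)
print(effect, se, p)
```

In practice you would loop over the gene columns of the predicted expression file and call the function once per gene, passing the same covariate matrix (age, sex, genetic principal components) each time.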


  4. As mentioned in your 2019 S-MultiXcan paper, PrediXcan performs better when only a single tissue is causal and MultiXcan performs better when multiple tissues are involved. We would like to know how we can determine a priori whether to go for one approach or the other. Also, in the case of MultiXcan, how can we determine which tissues are causal for the phenotype of interest?
There is no automatic way to know what the causal tissue is. Given the eQTL sharing across tissues, we have generally found MultiXcan to be the preferred approach.


Thank you in advance.

Regards,
Sagnik Palmal
Doctoral Student, UMR 7268 ADES
Aix-Marseille Université, Marseille
France

On Wed, Mar 3, 2021 at 12:03 PM Sagnik Palmal <spsa...@gmail.com> wrote:
Dear Dr Hae,

Thank you very much for your prompt reply. We will explore this implementation and will let you know if we need anything.

Thanks and regards,
Sagnik Palmal

Hae Kyung Im

Apr 8, 2021, 10:47:07 AM
to Andres Ruiz, PrediXcan/MetaXcan
Dear Andres,

What we found is that eQTLs are largely shared across tissues, which is why we recommend using MultiXcan. You can take a look at this paper where we make that case: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007889

If using a subset of tissues gives you better answers, it makes sense to report that, as long as you didn't do some kind of p-hacking where you tried multiple combinations until you found the one that gives you the most significant result.

In summary, if you have a biological argument why the three tissues you selected are the most relevant ones, then you should go with that.

Haky

On Tue, Apr 6, 2021 at 10:44 AM Andres Ruiz <a.ru...@gmail.com> wrote:

Dear Haky

Sagnik passed on your email exchanges to me. I'm his supervisor. Many thanks for your advice; it is very useful. One question that keeps nagging us is whether to use all the tissues in GTEx or to narrow it down to the 3 skin-related ones, or even just to sun-non-exposed skin. This is because the phenotype we are looking at is constitutive skin pigmentation (i.e. not resulting from tanning). Therefore, gene expression in skin seems the most relevant for our phenotypes (i.e. on biological grounds). In fact, we do seem to get the most sensible results with the 3 skin-related tissues from GTEx, as opposed to all the tissues, or just one (sun-non-exposed skin). I assume that looking at 3 tissues has more power than just one. And on the other hand, perhaps including all 49 tissues adds more noise? Would there be an objective way to evaluate this?

Regards

Andres

From: Sagnik Palmal <spsa...@gmail.com>
Date: Tuesday 23 March 2021 at 14:32
To: "a.ru...@gmail.com" <a.ru...@gmail.com>, Kaustubh Adhikari <kadhik...@gmail.com>
Subject: Fwd: Guidance for PrediXcan

---------- Forwarded message ---------
From: Hae Kyung Im <ha...@uchicago.edu>
Date: Mon, Mar 22, 2021 at 11:05 PM
Subject: Re: Guidance for PrediXcan
To: Sagnik Palmal <spsa...@gmail.com>
Cc: FAUX Pierre <pierr...@univ-amu.fr>, PrediXcan/MetaXcan <predixca...@googlegroups.com>

 


Alvaro Barbeira

Apr 8, 2021, 12:58:38 PM
to Hae Kyung Im, Andres Ruiz, PrediXcan/MetaXcan
Dear Andres,

At the risk of being redundant, I'd like to add yet another technicality: MASHR models incorporate cross-tissue information via the MASH algorithm.
So if you happen to use (S)MultiXcan with only MASHR skin tissues, the models still leverage cross-tissue data.
Elastic Net models are specifically trained with in-tissue data.

Best,

Alvaro

