Hi Dr Hae,

Hope you are doing well in these troubled times.

I am Sagnik Palmal, a doctoral student from Aix-Marseille Université, France. I am writing to you regarding your method PrediXcan. We are currently working on a TWAS of skin color pigmentation using the GTEx data and would like to take advantage of your method for our analysis.

Using a GWAS on pigmentation previously published by our team, we have already explored the S-PrediXcan implementation, which uses summary statistics. We have also explored the S-MultiXcan approach. To gain a better understanding, I would like to apply the PrediXcan model directly to our genotypes. Although the summary-based methods were quite straightforward to run thanks to the tutorials posted on your GitHub page, I could not find any guidance or documentation on how to apply PrediXcan to our data. I am therefore wondering whether you could provide documentation for running that program (manual, tutorial, etc.) or direct help with that basic question (either from you or from one of your collaborators).

Please let me know if you have any questions or need anything else from our side.

Thanks and regards,
Sagnik Palmal
Doctoral Student, UMR 7268 ADES
Aix-Marseille Université, Marseille
France
Dear Dr Hae,

Thank you again for your help.

As you had indicated, we made use of the PrediXcan package and adapted it to our needs. We do have a few questions about it, listed below. It would be of great help if you could answer them at your convenience.
- For the moment, we are running our analyses on a single GTEx tissue (Skin not sun exposed). The first step of PrediXcan does not yield predicted expression for some genes of interest. We understand that this could happen either because the gene is not predicted by your model (in which case it would not be listed in the file models/eqtl/mashr/ of that tissue), or because the model cannot predict the gene when one or more prediction SNPs are missing from our genotype data. The first case applies to a gene relevant to our study (IRF4): it is not listed in your model for that tissue although it has a TPM record in GTEx (https://gtexportal.org/home/gene/IRF4). What is the likely reason for this, and is there anything that could be done?
- In the second case, I guess it is recommended to impute the missing SNP genotypes. Is this correct?
- Many phenotypes depend strongly on certain covariates (e.g. sex as a covariate for height). We tried using the output of the Predict.py step as input to a classical GWAS-style setup, where the phenotype is regressed on the covariates (age, sex, genetic principal components) and the predicted gene expression, deriving the association p-value one gene at a time. However, these p-values differ from those in the PrediXcanAssociation.py output. Should we instead regress the phenotype already adjusted for the covariates? Or would you advise preferring S-PrediXcan over PrediXcan, since the summary statistics already account for the covariates? We would like to know how you suggest handling the covariates.
- As mentioned in your 2019 S-MultiXcan paper, PrediXcan performs better when only a single tissue is causal, and MultiXcan performs better when multiple tissues are involved. How can we determine a priori which approach to use? Also, in the case of MultiXcan, how can we determine which tissues are causal for the phenotype of interest?

Thank you in advance.

Regards,
Sagnik Palmal
Doctoral Student, UMR 7268 ADES
Aix-Marseille Université, Marseille
France
On Wed, Mar 3, 2021 at 12:03 PM Sagnik Palmal <spsa...@gmail.com> wrote:
Dear Dr Hae,
Thank you very much for your prompt reply. We will explore this implementation and let you know if we need anything.
Thanks and regards,
Sagnik Palmal
Dear Haky
Sagnik passed on your email exchanges to me. I’m his supervisor. Many thanks for your advice; it was very useful. One question that keeps nagging us is whether to use all the tissues in GTEx or to narrow it down to the 3 skin-related ones, or even just to sun-non-exposed skin. This is because the phenotype we are looking at is constitutive skin pigmentation (i.e. not resulting from tanning). Therefore, gene expression in skin seems the most relevant for our phenotype on biological grounds. In fact, we do seem to get the most sensible results with the 3 skin-related tissues from GTEx, as opposed to all the tissues or just one (sun-non-exposed skin). I assume that looking at 3 tissues has more power than looking at just one. On the other hand, perhaps including all 49 tissues adds more noise? Would there be an objective way to evaluate this?
Regards
Andres
From: Sagnik Palmal <spsa...@gmail.com>
Date: Tuesday 23 March 2021 at 14:32
To: "a.ru...@gmail.com" <a.ru...@gmail.com>, Kaustubh Adhikari <kadhik...@gmail.com>
Subject: Fwd: Guidance for PrediXcan
---------- Forwarded message ---------
From: Hae Kyung Im <ha...@uchicago.edu>
Date: Mon, Mar 22, 2021 at 11:05 PM
Subject: Re: Guidance for PrediXcan
To: Sagnik Palmal <spsa...@gmail.com>
Cc: FAUX Pierre <pierr...@univ-amu.fr>, PrediXcan/MetaXcan <predixca...@googlegroups.com>
Dear Sagnik
Please see answers below.
Haky
On Thu, Mar 11, 2021 at 4:52 AM Sagnik Palmal <spsa...@gmail.com> wrote:
Dear Dr Hae,
Thank you again for your help.
As you had indicated, we made use of the PrediXcan package and adapted it to our needs. We do have a few questions about it, listed below. It would be of great help if you could answer them at your convenience.
1. For the moment, we are running our analyses on a single GTEx tissue (Skin not sun exposed). The first step of PrediXcan does not yield predicted expression for some genes of interest. We understand that this could happen either because the gene is not predicted by your model (in which case it would not be listed in the file models/eqtl/mashr/ of that tissue), or because the model cannot predict the gene when one or more prediction SNPs are missing from our genotype data. The first case applies to a gene relevant to our study (IRF4): it is not listed in your model for that tissue although it has a TPM record in GTEx (https://gtexportal.org/home/gene/IRF4). What is the likely reason for this, and is there anything that could be done?
Expression alone does not ensure predictability. There are various reasons why a gene may not make it onto the list. For the elastic net models, a gene is dropped if its prediction performance is not good enough. For the mashr models, a gene needs to have a variant with a high posterior inclusion probability.
2. In the second case, I guess it is recommended to impute the missing SNP genotypes. Is this correct?
Imputing is a good idea.
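
The two cases discussed above (gene absent from the model versus model SNPs absent from the genotype data) can be told apart by inspecting the model file. Below is a minimal sketch, assuming the tissue model is a SQLite database with an `extra` table (one row per gene, with `gene` and `genename` columns) and a `weights` table (one row per gene and SNP, with `gene` and `rsid` columns). The table layout, the toy gene and SNP names, and the helper `diagnose_gene` are assumptions made for illustration; check the schema of your own model file before relying on them.

```python
import sqlite3

def diagnose_gene(conn, genename, genotyped_rsids):
    """Report whether a gene has a model and which of its SNPs are absent."""
    row = conn.execute(
        "SELECT gene FROM extra WHERE genename = ?", (genename,)
    ).fetchone()
    if row is None:
        # Case 1: no prediction model for this gene in this tissue.
        return {"in_model": False, "n_model_snps": 0, "missing": []}
    model_snps = [
        r[0]
        for r in conn.execute("SELECT rsid FROM weights WHERE gene = ?", (row[0],))
    ]
    # Case 2: the model exists but some of its SNPs are not genotyped.
    missing = sorted(s for s in model_snps if s not in genotyped_rsids)
    return {"in_model": True, "n_model_snps": len(model_snps), "missing": missing}

# Toy in-memory database standing in for a real tissue model file (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extra (gene TEXT, genename TEXT)")
conn.execute("CREATE TABLE weights (gene TEXT, rsid TEXT, weight REAL)")
conn.execute("INSERT INTO extra VALUES ('TOY_ID_1', 'GENE_A')")
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?)",
    [("TOY_ID_1", "rs_toy1", 0.12), ("TOY_ID_1", "rs_toy2", -0.05)],
)

genotyped = {"rs_toy1"}  # hypothetical set of SNP IDs present in the genotype data
report_a = diagnose_gene(conn, "GENE_A", genotyped)  # modelled, one SNP missing
report_b = diagnose_gene(conn, "GENE_B", genotyped)  # not modelled at all
print(report_a)
print(report_b)
```

With a real model file, `sqlite3.connect` would be pointed at the tissue's database path instead of `":memory:"`, and a non-empty `missing` list would indicate SNPs worth recovering by imputation.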
3. Many phenotypes depend strongly on certain covariates (e.g. sex as a covariate for height). We tried using the output of the Predict.py step as input to a classical GWAS-style setup, where the phenotype is regressed on the covariates (age, sex, genetic principal components) and the predicted gene expression, deriving the association p-value one gene at a time. However, these p-values differ from those in the PrediXcanAssociation.py output. Should we instead regress the phenotype already adjusted for the covariates? Or would you advise preferring S-PrediXcan over PrediXcan, since the summary statistics already account for the covariates? We would like to know how you suggest handling the covariates.
If you want to adjust for covariates, you are better off running the regression directly with your own code.
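
A minimal sketch of such a regression follows, assuming the predicted expression for one gene and the covariates are already loaded as NumPy arrays. All data here are simulated, the function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. The sketch also illustrates a standard property of least squares (the Frisch-Waugh-Lovell theorem): the joint model gives the same expression coefficient as regressing the covariate-adjusted phenotype on the covariate-adjusted expression. Adjusting only the phenotype and then regressing on raw expression is not equivalent in general.

```python
import math
import numpy as np

def gene_association(y, expr, covars):
    """OLS of y on [intercept, covariates, expression]; stats for expression."""
    n = len(y)
    X = np.column_stack([np.ones(n), covars, expr])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = math.sqrt(cov[-1, -1])
    t = beta[-1] / se
    # Two-sided p-value from a normal approximation (assumes large n).
    p = math.erfc(abs(t) / math.sqrt(2))
    return beta[-1], se, t, p

def residualize(v, covars):
    """Remove the part of v explained by an intercept and the covariates."""
    Z = np.column_stack([np.ones(len(v)), covars])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# Simulated stand-ins for real data: three covariates and one gene.
rng = np.random.default_rng(0)
n = 500
covars = rng.normal(size=(n, 3))
expr = 0.5 * covars[:, 0] + rng.normal(size=n)  # expression correlated with a covariate
y = 0.3 * expr + covars @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

beta_joint, se, t, p = gene_association(y, expr, covars)

# Same coefficient from residualizing BOTH phenotype and expression.
y_res = residualize(y, covars)
e_res = residualize(expr, covars)
beta_fwl = (e_res @ y_res) / (e_res @ e_res)

print(beta_joint, beta_fwl, p)
```

In practice the function would be called once per gene column of the predicted expression matrix, with the same covariate matrix each time.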
4. As mentioned in your 2019 S-MultiXcan paper, PrediXcan performs better when only a single tissue is causal, and MultiXcan performs better when multiple tissues are involved. How can we determine a priori which approach to use? Also, in the case of MultiXcan, how can we determine which tissues are causal for the phenotype of interest?
There is no automatic way to know what the causal tissue is. Given the QTL sharing across tissues, we have generally found MultiXcan to be the preferred approach.
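
To illustrate the general idea of testing one gene jointly across tissues, here is a rough sketch of a principal-component regression on predicted expression from several tissues. It is not the MultiXcan implementation: the cumulative-variance cutoff `var_threshold`, the function name, and the simulated data are all assumptions, and the sketch returns only an F statistic with its degrees of freedom rather than a p-value.

```python
import numpy as np

def joint_tissue_f_stat(y, expr_by_tissue, var_threshold=0.95):
    """F statistic for y regressed on the leading PCs of multi-tissue expression."""
    # Center each tissue's predicted expression before the decomposition.
    E = expr_by_tissue - expr_by_tissue.mean(axis=0)
    U, s, _ = np.linalg.svd(E, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Keep the smallest number of PCs reaching the variance cutoff (assumed rule).
    k = int(np.searchsorted(explained, var_threshold) + 1)
    pcs = U[:, :k] * s[:k]
    X = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)  # intercept-only model
    dof = len(y) - k - 1
    f = ((rss_null - rss_full) / k) / (rss_full / dof)
    return f, k, dof

# Simulated example: three tissues whose predicted expression is highly correlated.
rng = np.random.default_rng(1)
n = 400
shared = rng.normal(size=n)
expr_by_tissue = shared[:, None] + 0.3 * rng.normal(size=(n, 3))
y = 0.5 * shared + rng.normal(size=n)

f_stat, n_pcs, dof = joint_tissue_f_stat(y, expr_by_tissue)
print(f_stat, n_pcs, dof)
```

Working on principal components rather than raw per-tissue predictions avoids the collinearity that arises when the same eQTLs drive expression in several tissues.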
Thank you in advance.
Regards,