Outlier splicing analysis

198 views
Skip to first unread message

­홍성은 / 학생 / 의과학과

unread,
Jun 4, 2020, 5:02:51 AM6/4/20
to leafcutter-users
Hi, I have questions about Outlier splicing analysis.
I've done Differential Splicing analysis with my dataset, 
but since I'm studying mendelian diseases, I wanted to try Outlier splicing analysis.
I ran them without any error messages and got 3 tab-separated files (clusterPvals, effSize, pVals)

However, I couldn't understand the how to interpret and visualize output files.
My questions are as following

1. How can you adjust P values? (According to https://davidaknowles.github.io/leafcutter/articles/UsageLeafcutterMD.html , p values are unadjusted)
2. How can you visualize the effSize and pVals just like leafViz does? (how can you get the percentage of splicing events just like LeafViz)
3. Do you have any specified protocols or examples for further analysis of 3 output files?

Thank you!


garg...@gmail.com

unread,
Jun 4, 2020, 6:15:13 PM6/4/20
to leafcutter-users
Hello and thank you for your interest in LeafCutterMD,

First of all, let me note some additional materials that might help you with the LeafCutterMD package. First, we will be presenting a tutorial at this year's ISMB in July and they are taking registrations now:


which will include a hands on demo of how to best use LeafCutterMD in a rare disease cohort. We plan to make these materials available on github after that tutorial as well if you cannot attend. Also the LeafCutterMD paper is freely available on the bioinformatics website: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa259/5823301

Now to directly answer some of your questions:

1) We do not adjust p.values because adjustment is straightforward and use-case dependent. For example, if you have a single gene or a list of genes of interest in your patient prior to running LeafCutterMD, then you would only want to adjust for those comparisons and not genome-wide. I assume, however, you are asking about the genome-wide context where you have no prior list of genes to narrow your focus ahead of time. In this case, in R you could use commands like 

df_clusterP <- read.table(file="leafcutter_outlier_clusterPvals.txt")
df_cluPadj <- numcolwise(p.adjust)(df_clusterP, method = "BH")

which would give you adjusted p-values for all patients in your cohort. If you wanted this as another table, you could write the result out:

write.table(df_cluPadj,"leafcutter_outlier_clusterAdjustedPvals.txt")

If instead you were in the situation I mentioned earlier with specific genes in specific patients, then you would want to do the adjustment after subsetting your pvalues down to just those of interest and calling p.adjust with your chosen method. Above code uses the Benjamini-Hotchberg (method="BH") but all common types are available in p.adjust: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html

2) We currently do not have automated visualization support with leafViz. We have written scripts to modify the leafcutterMD outputs to work with leafViz for testing purposes, but that code is not robust or ready for public consumption, and in practice at this time we do most of our work examining these tables from leafcutterMD alongside IGV browsing of the sashimi plots https://software.broadinstitute.org/software/igv/Sashimi As our code is open source we always welcome community contributions to the project if you would like to assist in the LeafViz integration.

3) This is forthcoming with our ISMB tutorial in July that I mentioned above. At this time, the bioinformatics paper, its supplement, and the github code/documentation are all the publicly available information we have. In general, the rare disease case review generally would not be looking at RNA-seq splicing results in a vacuum: you would probably have WES/WGS DNA-seq results, as well as other forms of RNA analysis: outlier expression, ASE, fusions. And so you would probably want to integrate information across these data types to narrow your candidate lists. You probably would also use tools like PCAN https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1401-2 for phenotypic prioritization. But in the absence of any of this or prior data, you would probably just do the global adjustments as above, and for a given patient select only clusters that were significant after multiple comparisons. Then you could work your way through this cluster list ranked by p-value. By looking at these clusters in the effSize and pVals tables along with IGV browsing of the proband along with ~3-5 representative samples from your cohort, you can piece together what outlier splicing events are occurring in this patient and which ones are in genes of interest.

Let me know if you have any additional questions,
Garrett
Reply all
Reply to author
Forward
0 new messages