GO-term analysis on rMATS output

Esben Thuesen

Nov 17, 2023, 7:58:48 AM

to rMATS User Group

Hello,

I am using rMATS for the first time.

I get quite a lot of significant hits, so I was planning to do a GO-term/GSEA/similar kind of analysis to be able to draw some conclusions.

Does anyone have experience doing such on the rMATS output? Which programs were used, and would you recommend a specific pipeline?

I'm also open to any other good ideas for making sense of the rMATS output. :-)

Thanks in advance,

Esben

Thomas Danhorn

Nov 21, 2023, 1:13:39 PM

to Esben Thuesen, rMATS User Group

I would recommend looking at the IncLevelDifference (also as a percentage
of the IncLevel) to see whether a statistically significant event has any
chance of being biologically significant. Example: an exon gets
skipped 40% of the time in your control and 45% in your treatment, and
your replicates are so consistent that your adjusted p-value is low. But
does that change the biology in any meaningful way? I doubt it. On the
other hand, you might have a change going from 0% to 5%, or 20% to 80%, and
those can be huge, especially if the alternate splice form is an inhibitor
of the original one. So be sure to distinguish statistical from
biological significance.
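
As a rough illustration of the point above, here is a minimal Python
sketch that looks at both the absolute and the relative change in
inclusion level. The column names (geneSymbol, FDR, IncLevel1, IncLevel2,
IncLevelDifference) follow the rMATS event tables, but the rows are made
up, and the cutoffs (FDR 0.05, absolute change 0.10, relative change 0.50)
as well as the way the relative change is defined (change divided by the
larger of the two mean inclusion levels) are assumptions chosen for the
example, not recommendations from rMATS.

```python
import csv
import io

# Made-up rows; only the column names follow the rMATS event tables.
SAMPLE_TSV = (
    "geneSymbol\tFDR\tIncLevel1\tIncLevel2\tIncLevelDifference\n"
    "GENE_A\t0.001\t0.60,0.61,0.59\t0.55,0.56,0.54\t0.05\n"
    "GENE_B\t0.0001\t0.20,0.22,0.21\t0.80,0.78,0.79\t-0.58\n"
    "GENE_C\t0.002\t0.0,0.0,0.0\t0.05,0.06,0.04\t-0.05\n"
    "GENE_D\t0.40\t0.30,0.70,NA\t0.90,0.10,0.50\t0.0\n"
)


def mean_level(field):
    """Mean of comma-separated per-replicate inclusion levels, ignoring NA."""
    values = [float(v) for v in field.split(",") if v.strip() not in ("", "NA")]
    return sum(values) / len(values) if values else None


def summarize(row):
    """Absolute and relative inclusion change for one event (None if no data)."""
    m1, m2 = mean_level(row["IncLevel1"]), mean_level(row["IncLevel2"])
    if m1 is None or m2 is None:
        return None
    delta = m1 - m2
    larger = max(m1, m2)
    # Assumed definition: change relative to the larger mean inclusion level.
    relative = abs(delta) / larger if larger > 0 else 0.0
    return {"gene": row["geneSymbol"], "fdr": float(row["FDR"]),
            "delta": delta, "relative": relative}


def candidate_events(rows, max_fdr=0.05, min_abs=0.10, min_rel=0.50):
    """Keep events that pass the FDR cutoff AND show a sizeable change."""
    kept = []
    for row in rows:
        s = summarize(row)
        if s is None or s["fdr"] > max_fdr:
            continue
        if abs(s["delta"]) >= min_abs or s["relative"] >= min_rel:
            kept.append(s)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
events = candidate_events(rows)
for e in events:
    print(e["gene"], round(e["delta"], 3), round(e["relative"], 3))
```

With these made-up rows, GENE_A (60% vs 55%) is dropped despite its low
FDR, while GENE_B (large shift) and GENE_C (0% to 5%) are kept. Whether a
kept event matters biologically still needs a look at the gene itself.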

Pathway analysis is certainly a good way to get an overview and see if
there are certain cellular processes that are specifically affected by
splicing events (e.g. caused by your treatment, or mutant, or whatever).
You have to do that at the gene level, so you take the list of genes with
significant events and analyze that.

There are two basic types of enrichment analyses, Over-Representation
Analysis (ORA) and Gene-Set Enrichment Analysis (GSEA). Of these I would
stick with ORA, since GSEA needs a ranked list of genes, and (unlike with
gene expression, where you can use fold-change), this is not trivial or
even meaningful with splicing. Probably the simplest way is a web-based
tool like DAVID (https://david.ncifcrf.gov/), or Enrichr
(https://maayanlab.cloud/Enrichr/). DAVID does a bunch of things, so it
may take a bit to figure out, but it can take a variety of gene IDs,
including Ensembl. You can export the results as a text table (which you
can open as a spreadsheet). The disadvantage of a web-based tool is that
it can be hard to reproduce results, because they might update the
underlying databases, and you have no choice, or would not even know about
it.

There are several R packages you can use. The most basic might be to
count the overlaps of your significant genes with the genes in a pathway
and the "background" (genes in the database, but not this pathway) and run
a Fisher's Exact test (the one-sided version is the same as the
hypergeometric test), but it might be easier to use one of the packages
designed for the purpose. clusterProfiler has a lot of methods and gives a
variety of graphs, so might be worth exploring.
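
To make the "count the overlaps and run a test" idea concrete, here is a
minimal Python sketch of an over-representation test using the upper tail
of the hypergeometric distribution (equivalent to a one-sided Fisher's
Exact test). It uses only the standard library; the gene names, the
pathway, and the background are made up, and the function names are my
own, not taken from any package.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K are in the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def ora_pvalue(significant, pathway, background):
    """Over-representation p-value for one pathway.

    Genes outside the background are ignored in both input sets.
    """
    sig = set(significant) & set(background)
    path = set(pathway) & set(background)
    k = len(sig & path)          # significant genes in the pathway
    return hypergeom_upper_tail(k, len(path), len(sig), len(set(background)))


# Made-up example: 20 background genes, 5 in the pathway, 4 significant,
# 3 of which fall in the pathway.
background = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
significant = {"g0", "g1", "g2", "g10"}

p = ora_pvalue(significant, pathway, background)
print(round(p, 4))
```

The choice of background matters a lot here. Using only genes that could
have been detected in your experiment is usually more conservative than
using every gene in the database.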

Whatever tool you use, there are a variety of databases to run it against,
including KEGG, various GO databases, Hallmark, and many more. Which ones
you pick is a question of which have the most useful categories for your
purpose -- if you care whether a gene is e.g. a phosphatase, you might
like GO Molecular Function, but if not, skip that one; it really depends
on your questions. I would use all that you really care about, but as few
as possible (and ideally as little overlap as possible), because you need
to apply multiple-testing correction to the p-values, and the more things
you test against, the harsher that is.

If you are using R for the analysis, have a look at msigdbr; it has a good
collection of these databases with a variety of gene IDs for several model
organisms.
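
To illustrate why testing against more gene sets makes the correction
harsher, here is a small Python sketch of the Benjamini-Hochberg
adjustment. The post does not name a specific correction method, so BH is
an assumption on my part (it is a common default in enrichment tools), and
the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The same raw p-value of 0.01, tested among 4 gene sets ...
few = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
# ... and among 40 gene sets, most of which show nothing.
many = benjamini_hochberg([0.01] + [0.5] * 39)

print(round(few[0], 4), round(many[0], 4))
```

In the first case the pathway with p = 0.01 stays below 0.05 after
adjustment; in the second it does not, even though the raw p-value is
identical.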

Good luck!

Thomas
