I would recommend looking at the IncLevelDifference (also as a percentage
of the IncLevel) to see if a statistically significant event has any
chance of being biologically significant. Example: an exon gets
skipped 40% of the time in your control and 45% in your treatment, and
your replicates are so consistent that your adjusted p-value is low. But
does that change the biology in any meaningful way? I doubt it. On the
other hand, you might have a change going from 0% to 5%, or 20% to 80%, and
those can be huge, especially if the alternate splice form is an inhibitor
of the original one. So be sure to distinguish statistical from
biological significance.
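As a rough sketch of that kind of filter (in plain Python, nothing
rMATS-specific beyond the column names GeneID, FDR, IncLevel1, IncLevel2
and IncLevelDifference): the thresholds, the function names, and the
choice to express the relative change against the larger of the two mean
IncLevels are my assumptions for illustration, and the table rows are
made up.

```python
import csv
import io

# Made-up rows in the style of an rMATS table such as SE.MATS.JC.txt
# (only the columns used here; real files have many more).
EXAMPLE_TABLE = (
    "GeneID\tFDR\tIncLevel1\tIncLevel2\tIncLevelDifference\n"
    "geneA\t0.001\t0.60,0.61,0.59\t0.55,0.54,0.56\t0.05\n"
    "geneB\t0.001\t0.00,0.00,0.01\t0.05,0.06,0.04\t-0.047\n"
    "geneC\t0.0001\t0.20,0.22,0.18\t0.80,0.79,0.81\t-0.6\n"
    "geneD\t0.4\t0.10,0.10,NA\t0.90,0.90,0.90\t-0.8\n"
)


def mean_inc_level(field):
    """Average the comma-separated per-replicate values, skipping NA."""
    values = [float(v) for v in field.split(",") if v != "NA"]
    return sum(values) / len(values) if values else None


def relative_change(row):
    """|IncLevelDifference| as a fraction of the larger mean IncLevel.

    Using the larger mean as the denominator is an assumption; it avoids
    dividing by zero when one condition has no inclusion at all.
    """
    means = [mean_inc_level(row["IncLevel1"]), mean_inc_level(row["IncLevel2"])]
    means = [m for m in means if m is not None]
    largest = max(means, default=0.0)
    return abs(float(row["IncLevelDifference"])) / largest if largest > 0 else 0.0


def candidate_genes(table_text, max_fdr=0.05, min_abs_diff=0.1, min_rel_change=0.5):
    """Return sorted gene IDs with at least one event that passes the FDR
    cutoff AND shows either a large absolute or a large relative change."""
    genes = set()
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        if row["IncLevelDifference"] == "NA" or float(row["FDR"]) > max_fdr:
            continue
        big_shift = abs(float(row["IncLevelDifference"])) >= min_abs_diff
        big_relative = relative_change(row) >= min_rel_change
        if big_shift or big_relative:
            genes.add(row["GeneID"])
    return sorted(genes)


print(candidate_genes(EXAMPLE_TABLE))  # ['geneB', 'geneC']
```

Here geneA is the "40% vs 45%" kind of case (significant, but a small
shift), geneB is the "0% to 5%" kind, and geneD has a large shift but
does not pass the FDR cutoff. Where to put the cutoffs is a judgment
call for your system, not something the numbers above settle.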
Pathway analysis is certainly a good way to get an overview and see if
there are certain cellular processes that are specifically affected by
splicing events (e.g. caused by your treatment, or mutant, or whatever).
That has to be done at the gene level, so you take the list of genes with
significant splicing events and analyze that.
There are two basic types of enrichment analyses, Over-Representation
Analysis (ORA) and Gene-Set Enrichment Analysis (GSEA). Of these I would
stick with ORA, since GSEA needs a ranked list of genes, and (unlike with
gene expression, where you can use fold-change), this is not trivial or
even meaningful with splicing. Probably the simplest way is a web-based
tool like DAVID (https://david.ncifcrf.gov/), or Enrichr
(https://maayanlab.cloud/Enrichr/). DAVID does a bunch of things, so it
may take you a bit to figure it out, but it can take a variety of gene IDs,
including Ensembl. You can export the results as a text table (which you
can open as a spreadsheet). The disadvantage of a web-based tool is that
it can be hard to reproduce results, because they might update the
underlying databases, and you have no choice in that, or would not even
know about it.
There are several R packages you can use. The most basic might be to
count the overlaps of your significant genes with the genes in a pathway
and the "background" (genes in the database, but not this pathway) and run
a Fisher's Exact test (the one-sided version is the same as the
hypergeometric test), but it might be easier to use one of the packages
designed for the purpose. clusterProfiler has a lot of methods and gives
a variety of graphs, so might be worth exploring.
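To make the "count the overlaps" idea concrete, here is a minimal sketch
of the one-sided hypergeometric calculation for a single pathway. It is
written in plain Python rather than R only to keep it free of
dependencies; the function name and the toy gene sets are hypothetical,
and I am assuming the background is simply every gene in the database
you test against.

```python
from math import comb


def ora_p_value(significant_genes, pathway_genes, background_genes):
    """One-sided over-representation test for one pathway.

    Returns (overlap, p) where p is the probability of seeing at least
    `overlap` pathway genes when drawing len(significant) genes at random
    from the background (hypergeometric upper tail).
    """
    background = set(background_genes)
    # Genes outside the background cannot be counted on either side.
    significant = set(significant_genes) & background
    pathway = set(pathway_genes) & background

    n_background = len(background)
    n_pathway = len(pathway)
    n_significant = len(significant)
    overlap = len(significant & pathway)

    total = comb(n_background, n_significant)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_significant - i)
        for i in range(overlap, min(n_pathway, n_significant) + 1)
    )
    return overlap, tail / total


# Toy example: 20 background genes, a 5-gene pathway, 4 significant genes,
# 3 of which are in the pathway.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
significant = ["g0", "g1", "g2", "g10"]

overlap, p = ora_p_value(significant, pathway, background)
print(overlap, round(p, 4))  # 3 0.032
```

In a real analysis you would repeat this for every pathway in the
database and then correct the resulting p-values for multiple testing,
which is what the dedicated packages do for you.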
Whatever tool you use, there are a variety of databases to run it against,
including KEGG, various GO databases, Hallmark, and many more. Which ones
you pick is a question of which have the most useful categories for your
purpose -- if you care whether a gene is e.g. a phosphatase, you might
like GO Molecular Function, but if not, skip that one; it really depends
on your questions. I would use all that you really care about, but as few
as possible (and ideally with as little overlap as possible), because you
need to apply multiple-testing correction to the p-values, and the more
things you test against, the harsher that is.
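To illustrate why testing against more gene sets gets harsher, here is a
small sketch using the Benjamini-Hochberg procedure. That is just one
common choice of correction that I picked for the example, not
necessarily what your tool applies; the function name and the p-values
are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is
    # never larger than the ones ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The same raw p-value of 0.01 for one pathway, tested alongside
# 4 versus 49 other gene sets that show nothing (p = 0.9).
few_tests = benjamini_hochberg([0.01] + [0.9] * 4)
many_tests = benjamini_hochberg([0.01] + [0.9] * 49)
print(round(few_tests[0], 3), round(many_tests[0], 3))  # 0.05 0.5
```

So the identical raw result ends up looking ten times weaker once it
sits among 50 tests instead of 5, which is the reason to restrict
yourself to the databases you actually care about.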
If you are using R for the analysis, have a look at msigdbr; it has a good
collection of these databases with a variety of gene IDs for several model
organisms.
Good luck!
Thomas