sashimi plot considering the batch effect correction

eunbi

Sep 22, 2022, 2:15:23 AM
to rMATS User Group
Hello all, 
I'm trying to generate sashimi plots from rMATS output with batch effect correction applied.

I followed the procedure described in the link below.

First, I ran rMATS with --statoff to get the inclusion and skipping counts.
Then I normalized the counts using ComBat-seq in R.
After that, I obtained the statistical results from the modified count files using --task stat.

However, when I generate the sashimi plot, the batch effect correction does not seem to have been applied.
I think this is because rmats2sashimiplot is based on the MISO package and takes the BAM files as input, not the modified count files.

Is there any way to get a sashimi plot that reflects the batch effect correction?

Can anybody help? 

Thanks,
Eunbi

Thomas Danhorn

Sep 26, 2022, 5:54:36 PM
to eunbi, rMATS User Group
Hi Eunbi,

You are right, the sashimi plots are based on the information in the BAM
files, so the numbers and line thickness are in units of read counts, and
since these are actual junction-spanning events, there is no good way of
"normalizing" them to account for batch effects. (You could write your
own program to distort the BAM information used for plotting based on the
counts you get from batch correction, but people may find this hard to
interpret.)

You also have to keep in mind that a batch effect affecting gene
expression (i.e. the number of reads mapping to a gene for a particular
sample) is not the same as a batch effect affecting splicing (i.e. the
ratio between two different transcripts for a gene) -- even if there is a
batch effect with respect to gene expression, the splicing might be
unaffected, or there might be a different batch effect that influences it.
The way you are correcting the counts that go into the calculations in
rMATS may or may not be a good way to approach this. If the batch effect
only affects expression, you are just messing with the magnitude of the
counts, not their ratio, which would affect the p-value (without any good
justification), but not the PSI (IncLevel) values. I honestly don't know
if there is a tool that can detect -- let alone correct -- batch effects
affecting splicing events and disentangle them from effects on expression.
I imagine that might be difficult, given that splicing is a lot more
complex than expression.
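
To make the point about magnitude versus ratio concrete, here is a minimal
Python sketch. It assumes the usual length-normalized PSI formula, with
inc_len and skip_len standing in for the IncFormLen and SkipFormLen values,
and it uses a plain pooled two-proportion z statistic as a stand-in for a
significance test. That z statistic is NOT the rMATS statistical model; it
only illustrates how a test on counts reacts when every count is scaled by
the same factor. All function names and numbers are made up for illustration.

```python
import math


def psi(inc, skip, inc_len, skip_len):
    """Length-normalized inclusion level (PSI) for one sample."""
    inc_norm = inc / inc_len
    skip_norm = skip / skip_len
    return inc_norm / (inc_norm + skip_norm)


def two_proportion_z(inc1, skip1, inc2, skip2):
    """Pooled two-proportion z statistic on raw counts.

    Illustration only: this is not the test rMATS performs.
    """
    n1, n2 = inc1 + skip1, inc2 + skip2
    p1, p2 = inc1 / n1, inc2 / n2
    pooled = (inc1 + inc2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical counts for one event in two samples.
inc_a, skip_a = 30, 10
inc_b, skip_b = 20, 20
inc_len, skip_len = 200, 100

# An "expression-only" effect: every count multiplied by the same factor.
k = 4

psi_before = psi(inc_a, skip_a, inc_len, skip_len)
psi_after = psi(k * inc_a, k * skip_a, inc_len, skip_len)

z_before = two_proportion_z(inc_a, skip_a, inc_b, skip_b)
z_after = two_proportion_z(k * inc_a, k * skip_a, k * inc_b, k * skip_b)
```

In this sketch psi_before and psi_after are identical, while z_after is
sqrt(k) times z_before, so the apparent significance changes even though the
splicing ratio does not.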

If you have batch effects in RNA preparation, and one batch has much
higher levels of unprocessed or incompletely processed transcripts (from
ruptured nuclei; this might look like intron retention, but across the
whole transcriptome), then you have a problem that would directly affect
your ability to detect differential splicing events. In that case I can't
think of anything you can do besides trying to catch affected samples in
QC and discard them.

If you don't know that your batch effects affect the splicing itself, my
advice would be to use the sashimi plot rMATS gives you; there is not a
lot else you can do with these plots. You could obviously create other
kinds of plots, such as bar graphs showing PSI/IncLevel values and
differences, and maybe separate plots showing the exon structures for the
event in question (without any read count information attached to it).
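
As a rough sketch of the bar graph idea, the Python below parses the
comma-separated per-replicate values found in the IncLevel1 and IncLevel2
columns of an rMATS result row and draws one bar per replicate. The function
names, the group labels, and the choice of matplotlib are my own assumptions
for illustration, not anything rMATS provides, and the example values are
made up.

```python
def parse_inc_levels(field):
    """Parse a comma-separated IncLevel field, skipping 'NA' entries."""
    return [float(x) for x in field.split(",") if x not in ("", "NA")]


def mean_or_none(values):
    """Mean of a list, or None if every replicate was NA."""
    return sum(values) / len(values) if values else None


def plot_psi_bars(inc_level1, inc_level2, title, out_png):
    """Draw one bar per replicate, colored by sample group."""
    # Imported here so the parsing helpers work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    group1 = parse_inc_levels(inc_level1)
    group2 = parse_inc_levels(inc_level2)
    values = group1 + group2
    labels = [f"group1_{i + 1}" for i in range(len(group1))]
    labels += [f"group2_{i + 1}" for i in range(len(group2))]
    colors = ["tab:blue"] * len(group1) + ["tab:orange"] * len(group2)

    fig, ax = plt.subplots()
    positions = list(range(len(values)))
    ax.bar(positions, values, color=colors)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_ylabel("PSI (IncLevel)")
    ax.set_ylim(0, 1)
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(out_png)
    plt.close(fig)


# Hypothetical values as they might appear in one result row.
example_group1 = parse_inc_levels("0.8,NA,0.6")
example_group2 = parse_inc_levels("0.3,0.5")
example_diff = mean_or_none(example_group1) - mean_or_none(example_group2)
```

Calling plot_psi_bars("0.8,NA,0.6", "0.3,0.5", "my event", "psi.png") would
write the figure to psi.png; the difference of the group means computed above
is the quantity I would expect to correspond to IncLevelDifference.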

Good luck!

Thomas