Weird behaviour of rmats 4.0.2 when running on one sample

fgy...@gmail.com

Sep 8, 2020, 10:34:50 AM
to rMATS User Group
Hi

I want to get PSI for one sample, so I ran a comparison of the sample with itself. In principle nothing should be reported as significant, but this is what I get in the log file:

gtf: 4.001680612564087
There are 46903 distinct gene ID in the gtf file
There are 61441 distinct transcript ID in the gtf file
There are 40211 one-transcript genes in the gtf file
There are 273461 exons in the gtf file
There are 26343 one-exon transcripts in the gtf file
There are 25328 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 1.309959
Average number of exons per transcript is 4.450790
Average number of exons per transcript excluding one-exon tx is 7.040800
Average number of gene per geneGroup is 5.651824
statistic: 0.019280433654785156
novel: 80.17390561103821
The splicing graph and candidate read have been saved into sample_18/rmats/2020-09-08-16:26:29_250633.rmats
save: 0.6370856761932373
WARNING: there are redundant temporary files.
loadsg: 0.016768455505371094

==========
Done processing each gene from dictionary to compile AS events
Found 1356 exon skipping events
Found 104 exon MX events
Found 1535 alt SS events
There are 917 alt 3 SS events and 618 alt 5 SS events.
Found 532 RI events
==========

ase: 0.43750786781311035
count: 1.0025913715362549
Processing count files.
Done processing count files.

In the event files (e.g. SE.MATS.JCEC.txt) none of the events are reported as significant, but it's still not clear to me why the log reports splicing events.

Thank you in advance for the clarification

Best
Foivos

Eric Kutschera

Sep 8, 2020, 11:36:37 AM
to rMATS User Group
If you can use rmats 4.1.0, you can run it with `--statoff` and only a single sample to calculate PSI for that sample.
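With `--statoff` no p-values are computed, but each event's PSI should still be written to the IncLevel1 column of the MATS output. As a rough illustration of what that number is, here is a minimal Python sketch, assuming the length-normalized definition rMATS uses: the inclusion and skipping read counts (IJC, SJC) are each divided by the effective length of their isoform form (IncFormLen, SkipFormLen) before taking the inclusion fraction. The function `inclusion_level` is hypothetical, not part of rMATS.

```python
import math


def inclusion_level(ijc: int, sjc: int, inc_form_len: int, skip_form_len: int) -> float:
    """Length-normalized PSI for one event in one sample (hypothetical helper).

    Assumes the rMATS-style definition:
        PSI = (IJC / IncFormLen) / (IJC / IncFormLen + SJC / SkipFormLen)
    """
    inc_density = ijc / inc_form_len    # inclusion reads per unit of effective length
    skip_density = sjc / skip_form_len  # skipping reads per unit of effective length
    total = inc_density + skip_density
    if total == 0:
        # No supporting reads at all: PSI is undefined for this event.
        return math.nan
    return inc_density / total
```

For example, `inclusion_level(30, 10, 2, 1)` returns 0.6: the inclusion form has a read density of 15 against 10 for the skipping form. The normalization matters because, for an exon skipping event, the inclusion form is supported by two junctions while the skipping form has only one.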

In 4.0.2 there is a bug that prevents `--statoff` from working with a single sample. I think you can compare a sample to itself, but you may need to use a copy of the input file. Your output has:
WARNING: there are redundant temporary files.

rmats uses the path of the input BAM file to read and write some temporary data, so using the same input file more than once can corrupt that data. If you copy your input BAM so that there are two distinct files with the same contents, rmats should not have an issue with the temporary data.
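A minimal Python sketch of that workaround, assuming a single BAM and the usual rMATS convention that `--b1` and `--b2` each take a text file listing comma-separated BAM paths. The helper `prepare_self_comparison` and the names `b1.txt`, `b2.txt` and `_copy` are hypothetical. The only point is that the two sample lists end up pointing at two different paths with identical contents.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def prepare_self_comparison(bam: str, workdir: str) -> tuple[Path, Path]:
    """Copy `bam` so --b1 and --b2 refer to two distinct files with the same contents.

    Returns the paths of the two sample-list files (hypothetical names b1.txt and b2.txt).
    """
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    original = Path(bam).resolve()
    # Distinct path, identical bytes: rmats keys some temporary data on the input path.
    duplicate = work / f"{original.stem}_copy{original.suffix}"
    shutil.copyfile(original, duplicate)
    b1 = work / "b1.txt"
    b2 = work / "b2.txt"
    # Each list file holds comma-separated BAM paths; here there is one path per file.
    b1.write_text(str(original))
    b2.write_text(str(duplicate.resolve()))
    return b1, b2
```

The returned files can then be passed as `--b1` and `--b2`, together with the other options you already use. Note that the copy doubles the disk space taken by that BAM.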

You asked: "why it reports in the log splicing events"

Do you mean these lines?

Found 1356 exon skipping events
Found 104 exon MX events
Found 1535 alt SS events
There are 917 alt 3 SS events and 618 alt 5 SS events.
Found 532 RI events

Those are all the events that rmats detected as possible based on the gtf and the read data. Only the events with supporting reads will end up in the MATS output files.
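To put those numbers in perspective, here is a small Python sketch that pulls the counts out of a log like the one quoted in this thread and totals them. It assumes the exact wording shown here, which may change between rMATS versions. The alt 3/5 SS line is a breakdown of the alt SS line, so it must not be added to the total a second time. `summarize_events` and `total_events` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Summary lines as printed in the log quoted in this thread; wording may differ by version.
FOUND_PATTERNS = {
    "SE": r"Found (\d+) exon skipping events",
    "MXE": r"Found (\d+) exon MX events",
    "ALT_SS": r"Found (\d+) alt SS events",
    "RI": r"Found (\d+) RI events",
}
SPLIT_PATTERN = r"There are (\d+) alt 3 SS events and (\d+) alt 5 SS events"


def summarize_events(log_text: str) -> dict[str, int]:
    """Extract the per-type candidate event counts from an rmats log."""
    counts: dict[str, int] = {}
    for name, pattern in FOUND_PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            counts[name] = int(match.group(1))
    split = re.search(SPLIT_PATTERN, log_text)
    if split:
        counts["A3SS"] = int(split.group(1))
        counts["A5SS"] = int(split.group(2))
    return counts


def total_events(counts: dict[str, int]) -> int:
    """Sum the top-level categories only; A3SS and A5SS are already inside ALT_SS."""
    return sum(counts.get(name, 0) for name in FOUND_PATTERNS)
```

On the log above this gives a total of 3527 (1356 + 104 + 1535 + 532), and the 917 alt 3 SS plus 618 alt 5 SS events add up to the 1535 alt SS events.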

Eric

fgy...@gmail.com

Sep 8, 2020, 1:00:45 PM
to rMATS User Group
Hi Eric

Thank you for your response. Sorry, I was actually using version 4.1, so the log above comes from that version. I just tried again with a copy of the original file and with --statoff, and the warning goes away. Thanks a lot.

Yes, I was referring to these lines. So this means that, with the gtf file I provide, there are only a few events (1356+104+1535+532). Is this correct?

Best
Foivos

Eric Kutschera

Sep 8, 2020, 1:16:49 PM
to rMATS User Group
Yes, that is correct. Those are all the events that rmats detected. You can look at the fromGTF.* files to determine which events were detected using only the gtf and which were based on the bam files. Any event that appears only in the fromGTF.[AS_Event].txt file (and not in the fromGTF.novel* files) was detected from the gtf alone, without looking at the bam files.
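A minimal Python sketch of that check, assuming the fromGTF files are tab-separated with a header whose event identifier column is named `ID`, and that the novel files reuse the IDs of the main fromGTF.[AS_Event].txt file. Because the exact names of the novel files may differ between releases, the sketch globs for `fromGTF.novel*.[AS_Event].txt` instead of hardcoding them. `gtf_only_events` is a hypothetical helper, not an rMATS function.

```python
from __future__ import annotations

import csv
from pathlib import Path


def read_event_ids(path: Path) -> set[str]:
    """Return the values of the (assumed) ID column from one tab-separated fromGTF file."""
    with open(path, newline="") as handle:
        return {row["ID"] for row in csv.DictReader(handle, delimiter="\t")}


def gtf_only_events(od: str, event_type: str = "SE") -> set[str]:
    """IDs in fromGTF.<event>.txt that appear in none of the fromGTF.novel*.<event>.txt files."""
    out = Path(od)
    all_ids = read_event_ids(out / f"fromGTF.{event_type}.txt")
    novel_ids: set[str] = set()
    # Novel-file names may vary between releases, so match them with a glob.
    for novel_file in sorted(out.glob(f"fromGTF.novel*.{event_type}.txt")):
        novel_ids |= read_event_ids(novel_file)
    return all_ids - novel_ids
```

Calling `gtf_only_events(od, "SE")` with the directory passed to `--od` returns the IDs of exon skipping events found from the gtf alone; the same call with "MXE", "A3SS", "A5SS" or "RI" covers the other event types.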

If you use a different gtf file, rmats will likely detect a different set of events.

Eric