rMATS input file features


Esther

May 26, 2026, 6:19:02 PM
to rMATS User Group
Hi, 
Does rMATS-turbo expect the input FASTQ files to be trimmed prior to alignment?
If running STAR separately (as recommended in the paper), is it necessary to run cutadapt or other trimming software, or some other QC step, beforehand?

Another related question: I ran rMATS from raw FASTQ files, combining the prep and post steps with --task both and enabling soft clipping (--allow-clipping) and variable read lengths (--variable-read-length).
This is what I ran: 
rmats.py \
--s1 ./header/ctl.txt \
--s2 ./header/treatment.txt \
--gtf ./genome/gencode.v49.annotation.gtf \
--bi ./genome/star_index \
-t paired \
--readLength 101 \
--nthread 60 \
--od ./rmats_output/ \
--tmp ./rmats_tmp/ \
--variable-read-length \
--allow-clipping \
--task both
The JCEC output file shows some strange features, and I'm trying to figure out why:
a) Several splicing event types are reported, but they refer to the same genomic coordinates (false positives).
b) When I ran it without --variable-read-length, it gave me empty output files. I had set the read length to 101. The average read length is 101 for my control samples and 98 for the treatment samples. Does this have to do with the repetitive events?
c) Events are created on unannotated isoforms (I used the latest GENCODE GTF file for annotation).

Or maybe this has to do with the rMATS default STAR alignment using --twopassMode Basic? That is, does it make non-annotated isoforms more likely? Should it be omitted in a manual STAR alignment prior to rMATS?

Let me know if you need me to clarify any part of these questions. Thanks!

kutsc...@gmail.com

May 27, 2026, 9:55:18 AM
to rMATS User Group
If rMATS-turbo is run on fastq files it will align them using this STAR command: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L66

By default, the STAR command is set to align the entire sequence of each read without clipping, and rMATS filters out any reads with clipping in the alignment. With --allow-clipping, the STAR command is able to clip, and rMATS will not filter out those reads. Ideally the input fastq files do not include adapters or other extra sequences, but --allow-clipping should give reasonable results even with adapter sequences.
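
To make the clipping filter concrete, here is a minimal Python sketch that flags an alignment as clipped based on its CIGAR string. It assumes "clipping" means soft-clip (S) or hard-clip (H) operations as defined in the SAM format; it is an illustration only, not the check that rMATS itself performs, and the function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_bases(cigar):
    """Return the number of soft- or hard-clipped bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH")

def has_clipping(cigar):
    """True if the alignment contains any S or H operation."""
    return clipped_bases(cigar) > 0

print(has_clipping("101M"))          # False: the whole read is aligned
print(has_clipping("5S50M200N46M"))  # True: 5 bases are soft-clipped
```

Under this assumption, a read like the second one would be dropped by default and kept when clipping is allowed.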

a) It's possible that there are multiple splicing events at the same exon or splice site. Do you have an example?
b) Without --variable-read-length, any reads that are not exactly --readLength will be filtered out for NOT_EXPECTED_READ_LENGTH. The stdout will include a section "read outcome totals across all BAMs" showing the counts of reads filtered for different reasons. The counts are also written to a file in the --tmp directory named like [datetime]_read_outcomes_by_bam.txt
c) rMATS will use the transcripts from the --gtf as a starting point. It can then detect novel events. It can always look for novel events by combining annotated splice sites. Those events will be reported in the fromGTF.novelJunction.[AS_Event].txt files. It looks like you didn't use --novelSS, but that would allow additional novel events that would be reported in the fromGTF.novelSpliceSite.[AS_Event].txt files.
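
For point b), one quick way to anticipate how many reads might be filtered for NOT_EXPECTED_READ_LENGTH is to tabulate read lengths before running rMATS. The Python sketch below counts sequence lengths in a standard 4-line-per-record FASTQ stream. The function names and toy records are hypothetical, and FASTQ lengths are only a proxy for what rMATS sees in the BAM files.

```python
from collections import Counter
from itertools import islice

def read_length_counts(fastq_lines):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    # The sequence is the 2nd line of every 4-line record.
    seq_lines = islice(fastq_lines, 1, None, 4)
    return Counter(len(line.strip()) for line in seq_lines)

def fraction_with_length(counts, expected_length):
    """Fraction of reads whose length equals expected_length."""
    total = sum(counts.values())
    return counts[expected_length] / total if total else 0.0

# Hypothetical toy records: two 10-base reads and one 8-base read.
toy_fastq = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACGTACGT", "+", "IIIIIIII",
    "@read3", "ACGTACGTAC", "+", "IIIIIIIIII",
]
counts = read_length_counts(toy_fastq)
print(counts)
print(fraction_with_length(counts, 10))
```

For a real file, the same function can be given an open handle, for example one from `gzip.open(path, "rt")` for a gzipped FASTQ. A low fraction at the chosen --readLength suggests that most reads would be filtered unless --variable-read-length is used.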

Eric

Esther

Jun 11, 2026, 10:18:37 AM
to rMATS User Group
Sorry about the late reply. 

For question a), I have attached sashimi plots of the outputs from the JCEC files for a particular gene.
rMATS reports two SE events, two A5SS events, and two RI events. I have also attached an image of the BAM viewed in IGV as a ground truth.

b) Is there a downside to using the --variable-read-length parameter? My understanding is that after adapter trimming, the read lengths will never be exactly uniform.

Thanks for your help. 
RI.png
IGV_BAM.png
A5SS.png
SE.png

kutsc...@gmail.com

Jun 12, 2026, 12:39:04 PM
to rMATS User Group
Overall I think the rMATS event detection worked as intended. All of the exons seem to be in gencode v49 (although I can't tell the exact coordinates from the plots). rMATS can detect events using the annotated exons and junctions as well as junctions from the bam files. Since the read coverage is high in that region (thousands of reads for some junctions), even relatively minor transcripts can have many supporting reads.

From the bam plots, it looks like there are 6 exons with substantial read coverage. In that plot, the blue sashimi plot has much higher coverage for the 5th exon than the other sashimi plot.

It looks like the second SE event corresponds to exons 4,5,6, with the treatment group showing higher inclusion, and the plot seems good. The other SE event seems to be exons 3,5,6, where the only junction with good support is 5->6 in the treatment group.

The two RI events look to be the introns in the exon sequence 4,5,6. There doesn't seem to be much coverage in the intron region in either group.

The first A5SS event looks like it's actually exons 4,5,6, but where the longer isoform is exons 4 and 5 with the intron retained. The other A5SS event looks like it's multiple 5' splice sites for exon 1. Neither junction has many reads compared to the junctions for other events.
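
As the walkthrough above shows, one gene can legitimately appear under several event types. A small Python sketch like the following can tally events per gene across the [AS_Event].MATS.JCEC.txt tables. It assumes each table is tab-delimited with a geneSymbol column, as in typical rMATS output; the function name and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def events_per_gene(tables):
    """Count events per gene and event type.

    tables maps an event type (e.g. "SE") to the tab-delimited text
    of the corresponding JCEC table.
    """
    summary = defaultdict(dict)
    for event_type, text in tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            gene = row["geneSymbol"].strip('"')
            summary[gene][event_type] = summary[gene].get(event_type, 0) + 1
    return dict(summary)

# Hypothetical toy tables with only the first few columns.
header = "ID\tGeneID\tgeneSymbol\tchr\tstrand\n"
tables = {
    "SE": header + "1\tG1\tGENE_A\tchr1\t+\n2\tG1\tGENE_A\tchr1\t+\n",
    "A5SS": header + "7\tG1\tGENE_A\tchr1\t+\n",
}
summary = events_per_gene(tables)
print(summary)  # {'GENE_A': {'SE': 2, 'A5SS': 1}}
```

Genes with several event types in the summary are good candidates for inspection with sashimi plots, since overlapping events in a high-coverage region are expected rather than necessarily false positives.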

Using --variable-read-length with trimmed reads should be fine. These posts have some discussion:
https://groups.google.com/g/rmats-user-group/c/ZCxjlQfP9ak/m/PaO_skpQAgAJ
https://groups.google.com/g/rmats-user-group/c/eKgaDfiyrAY/m/Kiry0d8gBQAJ

Eric