rMATS input file features

Esther

May 26, 2026, 6:19:02 PM
to rMATS User Group
Hi,
Does rMATS-turbo expect the input FASTQ files to be trimmed prior to alignment?
If running STAR separately (as recommended in the paper), is it necessary to run cutadapt, other trimming software, or some other QC step beforehand?

Another related question: I ran rMATS from raw FASTQ files, combining the prep and post steps with --task both and enabling soft clipping (--allow-clipping) and variable read lengths (--variable-read-length).
This is what I ran: 
rmats.py \
--s1 ./header/ctl.txt \
--s2 ./header/treatment.txt \
--gtf ./genome/gencode.v49.annotation.gtf \
--bi ./genome/star_index \
-t paired \
--readLength 101 \
--nthread 60 \
--od ./rmats_output/ \
--tmp ./rmats_tmp/ \
--variable-read-length \
--allow-clipping \
--task both
The JCEC output files show some strange features, and I'm trying to figure out why:
a) Several splicing event types refer to the same genomic coordinates (false positives).
b) When I ran it without --variable-read-length, it gave me empty output files. I had set --readLength to 101. The average read length is 101 for my control samples and 98 for the treatment samples. Does this have to do with the repeated events in (a)?
c) Events are created on unannotated isoforms (I used the latest GENCODE GTF file for annotation).

Or could this be related to the default rMATS STAR command using --twopassMode Basic? That is, does this option make non-annotated isoforms more likely? Should it be omitted in a manual STAR alignment prior to rMATS?

Let me know if you need me to clarify any part of these questions. Thanks!

kutsc...@gmail.com

May 27, 2026, 9:55:18 AM
to rMATS User Group
If rMATS-turbo is run on FASTQ files, it will align them using this STAR command: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L66

By default, the STAR command is set to align the entire sequence of each read without clipping, and rMATS filters out any reads with clipping in the alignment. With --allow-clipping, the STAR command is allowed to clip, and rMATS will not filter out those reads. Ideally the input FASTQ files do not include adapters or other extra sequences, but --allow-clipping should give reasonable results even with adapter sequences.
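
To make "clipping in the alignment" concrete: in a BAM/SAM record, clipped bases appear as S (soft clip) or H (hard clip) operations in the CIGAR string. The following is only a rough Python sketch of how one might spot such reads from a CIGAR string. The helper names has_clipping and clipped_bases are made up for illustration, and this is not the code rMATS itself uses for its filter.

```python
import re

# One CIGAR operation is a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def has_clipping(cigar):
    """Return True if the CIGAR string contains soft (S) or hard (H) clipping."""
    return any(op in "SH" for _, op in CIGAR_OP.findall(cigar))

def clipped_bases(cigar):
    """Total number of soft-clipped bases in the CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")
```

For example, a 101 bp read aligned end to end has CIGAR 101M (no clipping), a spliced read such as 50M1000N51M is also unclipped, and 5S93M3S has 8 soft-clipped bases, typically from adapter or low-quality sequence at the read ends.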

a) It's possible that there are multiple splicing events at the same exon or splice site. Do you have an example?
b) Without --variable-read-length, any reads that are not exactly --readLength will be filtered out for NOT_EXPECTED_READ_LENGTH. The stdout will include a section "read outcome totals across all BAMs" showing the counts of reads filtered for different reasons. The counts are also written to a file in the --tmp directory named like [datetime]_read_outcomes_by_bam.txt.
c) rMATS will use the transcripts from the --gtf as a starting point. It can then detect novel events. It can always look for novel events by combining annotated splice sites. Those events will be reported in the fromGTF.novelJunction.[AS_Event].txt files. It looks like you didn't use --novelSS, but that would allow additional novel events that would be reported in the fromGTF.novelSpliceSite.[AS_Event].txt files.
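
Regarding (b), one way to see in advance how many reads would be dropped is to tabulate the read lengths in a FASTQ file before running rMATS. Below is a minimal Python sketch, assuming standard 4-line FASTQ records (sequence on the second line of each record, not wrapped). The function names are hypothetical and this is not part of rMATS.

```python
import gzip
from collections import Counter

def read_length_counts(fastq_path):
    """Count how many reads have each length in a FASTQ file (plain or .gz)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Assumes 4-line records; the sequence is the 2nd line of each record.
            if line_number % 4 == 1:
                counts[len(line.rstrip("\r\n"))] += 1
    return counts

def fraction_matching(counts, read_length):
    """Fraction of reads whose length equals read_length exactly."""
    total = sum(counts.values())
    return counts.get(read_length, 0) / total if total else 0.0
```

Calling fraction_matching(read_length_counts(path), 101) on one of the treatment FASTQ files would give the share of reads that are exactly 101 bases. If the average length is 98, many reads are likely shorter than 101, so they would be filtered without --variable-read-length.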

Eric