Splicing events detected by rMATS are not in sync with JunctionSeq


Lalith Punepalle

Feb 22, 2025, 12:18:35 AM
to rMATS User Group
Hi,

I have a dataset of BAM files representing three conditions: CONTROL, PAN, and ADR, with each condition containing eight samples. I grouped the samples accordingly and ran rMATS to analyze splicing differences between the conditions. However, the splicing events identified by rMATS do not align with those obtained using JunctionSeq.

What could explain such low gene-level overlap between the two tools? Am I missing something in how I ran rMATS?

Overlap percentages
Gene overlap percentage for CONTROL_ADR: 42.86% (144 / 336 genes)
Gene overlap percentage for CONTROL_PAN: 6.09% (24 / 394 genes)
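
For reference, overlap figures like these can be reproduced with a small set computation. This is only a sketch: the function name and the toy gene IDs are hypothetical, and it assumes the percentage is the number of shared genes divided by the size of one chosen reference list (the thread does not say which list the 336 and 394 totals come from).

```python
def overlap_percentage(reference_genes, other_genes):
    """Return (shared, total, percent), with percent = shared / total * 100.

    `total` is the number of unique genes in `reference_genes`; which tool's
    list serves as the reference is an assumption the caller has to make.
    """
    reference = set(reference_genes)
    shared = reference & set(other_genes)
    if not reference:
        return 0, 0, 0.0
    percent = round(100.0 * len(shared) / len(reference), 2)
    return len(shared), len(reference), percent


# Toy example with made-up gene IDs (not taken from the real dataset).
rmats_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
junctionseq_genes = ["GENE_B", "GENE_D", "GENE_E"]
shared, total, pct = overlap_percentage(rmats_genes, junctionseq_genes)
print(f"Gene overlap percentage: {pct:.2f}% ({shared} / {total} genes)")
```

Swapping the two arguments changes the denominator, so it is worth stating which list is the reference whenever the percentage is reported.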


Process I followed:
  1. I used rMATS turbo as described in the paper.
  2. I already have aligned BAM files and a reference genome file.
  3. Prep step: python "$RMATS_SCRIPT" --b1 prep1.txt --gtf "$GTF_PATH" -t paired --readLength 150 --nthread 4 --od "$OUTPUT_DIR" --tmp "$TMP_OUTPUT_DIR" --task prep
  4. Post step: python "$RMATS_SCRIPT" --b1 "$POST1_TXT" --b2 "$POST2_TXT" --gtf "$GTF_PATH" -t paired --readLength 150 --variable-read-length --nthread 4 --od "$OUTPUT_DIR" --tmp "$TMP_OUTPUT_POST" --task post
  5. I filtered the events further based on the criteria specified in the paper.
  6. Statistically significant events were selected based on read coverage (≥20), PSI value filtering (0.05 < PSI < 0.95), FDR threshold (≤0.01), and between-group PSI difference (|ΔPSI| ≥ 0.05).
  7. The resulting gene list differs substantially from the JunctionSeq output.
Thank you,
Lalith

kutsc...@gmail.com

Feb 24, 2025, 12:01:22 PM
to rMATS User Group
One source of the difference could be that rMATS looks for specific event types (SE, A5SS, A3SS, MXE, RI), whereas JunctionSeq appears to test each exon and splice junction. There may be cases where a gene has a differential splice junction that doesn't fit into any of the event types rMATS uses.

You have --variable-read-length in the post step command, but not in the prep step command. The filtering based on read length is done in the prep step, so you may need --variable-read-length there as well.
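
As a rough illustration, here is a small Python helper that assembles the prep command with the flag included. It only uses options that already appear in the commands above; the function name and the file paths are placeholders, not part of rMATS.

```python
import shlex


def build_prep_command(rmats_script, bam_list, gtf, out_dir, tmp_dir,
                       read_length=150, threads=4, variable_read_length=True):
    """Build the argument list for an rMATS prep run.

    All options mirror the original prep command; the only change is that
    --variable-read-length is appended when requested.
    """
    cmd = [
        "python", rmats_script,
        "--b1", bam_list,
        "--gtf", gtf,
        "-t", "paired",
        "--readLength", str(read_length),
        "--nthread", str(threads),
        "--od", out_dir,
        "--tmp", tmp_dir,
        "--task", "prep",
    ]
    if variable_read_length:
        cmd.append("--variable-read-length")
    return cmd


# Placeholder paths; substitute the real script, BAM list, GTF and directories.
cmd = build_prep_command("rmats.py", "prep1.txt", "annotation.gtf", "out", "tmp_prep")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the placeholder paths are replaced.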

You could also check the [datetime]_read_outcomes_by_bam.txt file in the --tmp directory of the prep step to see whether a lot of reads are being filtered out for any reason.
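
A quick way to summarize that file might look like the sketch below. The layout is an assumption on my part: the parser expects a header line naming each BAM followed by "NAME: count" lines, so adjust it if your file looks different. The function names, the sample file names and the counts in the toy input are made up for illustration.

```python
import re

# Assumed line shape for a count: an outcome name, a colon, then an integer.
COUNT_LINE = re.compile(r"^([A-Za-z0-9_]+):\s*(\d+)$")


def parse_read_outcomes(lines):
    """Group 'NAME: count' lines under the most recent non-count (header) line."""
    outcomes = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = COUNT_LINE.match(line)
        if match and current is not None:
            outcomes[current][match.group(1)] = int(match.group(2))
        elif not match:
            current = line  # assumed to identify the BAM the next counts belong to
            outcomes[current] = {}
    return outcomes


def used_fraction(counts, used_key="USED"):
    """Fraction of counted reads under `used_key` (any 'total' entry is ignored)."""
    total = sum(v for k, v in counts.items() if k.lower() != "total")
    return counts.get(used_key, 0) / total if total else 0.0


# Toy input in the assumed layout.
toy_lines = [
    "sample1.bam",
    "USED: 900",
    "NOT_EXPECTED_READ_LENGTH: 100",
    "sample2.bam",
    "USED: 400",
    "NOT_EXPECTED_READ_LENGTH: 600",
]
parsed = parse_read_outcomes(toy_lines)
for bam, counts in parsed.items():
    print(bam, f"{used_fraction(counts):.2f}")
```

A BAM with a low used fraction dominated by a read-length category would point back at the missing --variable-read-length flag in the prep step.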

Eric