rMATS works, but all events have p-value = 1

164 views
Skip to first unread message

Laura Jareño Ajona

unread,
Mar 15, 2023, 5:13:12 AM3/15/23
to rMATS User Group
Hi, 

I am trying to use rMATS to identify splicing events between glioblastoma and astrocytoma patients. Unfortunately, when rMATS finds the splicing events contained in the .fastq files all the p-values are equal to 1 and in the summary.txt file none of the events are considered as significant. That's wrong because it is known that there are significant events, but it is unknown why I get these results and where is the problem.

My code to run rMATS is as follows:

python rmats.py --s1 astrocytoma_2samples.txt --s2 glioblastoma_2samples.txt --gtf gencode.v24.primary_assembly.annotation.gtf --bi GenomeFiles --readLength 100 --nthread 8 --od results_3samples --tmp tmp_output_3samples


summary.txt:

EventType TotalEventsJC TotalEventsJCEC SignificantEventsJC SigEventsJCSample1HigherInclusion SigEventsJCSample2HigherInclusion SignificantEventsJCEC SigEventsJCECSample1HigherInclusion SigEventsJCECSample2HigherInclusion
SE 43296 44227 0 0 0 0 0 0
A5SS 3555 3568 0 0 0 0 0 0
A3SS 5551 5562 0 0 0 0 0 0
MXE 4497 4595 0 0 0 0 0 0
RI 4074 4108 0 0 0 0 0 0



output file: 

gtf: 22.59103512763977
There are 60725 distinct gene ID in the gtf file
There are 199348 distinct transcript ID in the gtf file
There are 38945 one-transcript genes in the gtf file
There are 1177678 exons in the gtf file
There are 27391 one-exon transcripts in the gtf file
There are 25233 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.282800
Average number of exons per transcript is 5.907649
Average number of exons per transcript excluding one-exon tx is 6.689387
Average number of gene per geneGroup is 7.633601
statistic: 0.051705121994018555

read outcome totals across all BAMs
USED: 118710051
NOT_PAIRED: 0
NOT_NH_1: 36067506
NOT_EXPECTED_CIGAR: 3036093
NOT_EXPECTED_READ_LENGTH: 116355810
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 2226908
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 218952
CLIPPED: 0
total: 276615320
outcomes by BAM written to: tmp_output_3samples/2023-03-14-16:53:37_778095_read_outcomes_by_bam.txt

novel: 200.99066352844238
The splicing graph and candidate read have been saved intotmp_output_3samples/2023-03-14-16:53:37_778095_*.rmats
save: 3.800950765609741
loadsg: 0.1452782154083252

==========
Done processing each gene from dictionary to compile AS events
Found 64598 exon skipping events
Found 5519 exon MX events
Found 14493 alt SS events
There are 8866 alt 3 SS events and 5627 alt 5 SS events.
Found 5967 RI events
==========

ase: 1.6339361667633057
count: 5.2813334465026855
Processing count files.
Done processing count files.


kutsc...@gmail.com

unread,
Mar 15, 2023, 9:41:46 AM3/15/23
to rMATS User Group
About 40% of the reads are being filtered out for not matching --readLength 100:
NOT_EXPECTED_READ_LENGTH: 116355810

You could try running with --variable-read-length to allow using those reads. With more reads you may get some significant events

Since the run detected some events, but no significant events maybe the issue is only with 1 of the sample groups (--b1 or --b2). If you look at the counts in the files like SE.MATS.JC.txt it may show that one of the sample groups has low read counts

Eric

Laura Jareño Ajona

unread,
Mar 15, 2023, 10:53:36 AM3/15/23
to rMATS User Group
I did a test using few samples and by adding --variable-read-length I obtained significant events and results.

Thank you very much for your help.

Laura
Reply all
Reply to author
Forward
0 new messages