Hi,
Thanks for the great tool. I have three related questions:
1. Does it make sense to use rMATS to compare hundreds of samples? For example, I am comparing 374 tumor samples (test) against 50 normal samples (control) from TCGA LIHC. Can I expect any significant events in the output?
2. For the above comparison, the alignment finishes successfully because it runs sequentially. However, the calculation step takes forever and uses a very large amount of memory (> 125 GB). Either I have to kill the script, or the process goes into state "D" (uninterruptible sleep) and stays stuck there, so I never get results. Any suggestions on how to overcome this?
3. The summary output is below. What do the negative numbers mean? Do the summary statistics look OK overall?
Thanks a lot.
gtf: 7.330986976623535
There are 28207 distinct gene ID in the gtf file
There are 74263 distinct transcript ID in the gtf file
There are 14973 one-transcript genes in the gtf file
There are 752460 exons in the gtf file
There are 5281 one-exon transcripts in the gtf file
There are 4134 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 2.632786
Average number of exons per transcript is 10.132367
Average number of exons per transcript excluding one-exon tx is 10.831507
Average number of gene per geneGroup is 41.116693
statistic: 0.018664121627807617
read outcome totals across all BAMs
USED: -1111027985
NOT_PAIRED: 0
NOT_NH_1: -1751379762
NOT_EXPECTED_CIGAR: 143890078
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 26468822
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 18109667
CLIPPED: 0
total: 1621028116
outcomes by BAM written to: rMATS/temp/2021-06-03-13:28:59_099530_read_outcomes_by_bam.txt
novel: 37883.14090299606
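
On the negative numbers in question 3: the totals above are at least consistent with 32-bit signed integer overflow. This is only a guess, since I have not checked how rMATS stores these counters. A minimal Python sketch of the check, with the counts copied from the summary above and the 32-bit width as an assumption (the helper names `unwrap` and `to_int32` are made up for illustration):

```python
# Counts copied from the "read outcome totals across all BAMs" summary above.
reported = {
    "USED": -1111027985,
    "NOT_PAIRED": 0,
    "NOT_NH_1": -1751379762,
    "NOT_EXPECTED_CIGAR": 143890078,
    "NOT_EXPECTED_READ_LENGTH": 0,
    "NOT_EXPECTED_STRAND": 0,
    "EXON_NOT_MATCHED_TO_ANNOTATION": 26468822,
    "JUNCTION_NOT_MATCHED_TO_ANNOTATION": 18109667,
    "CLIPPED": 0,
}
reported_total = 1621028116

# Assumption: the counters are 32-bit signed integers (not confirmed for rMATS).
WRAP = 2 ** 32


def unwrap(value):
    """Undo one 32-bit wraparound for a negative count (assumes it wrapped exactly once)."""
    return value + WRAP if value < 0 else value


def to_int32(value):
    """Reduce an arbitrary integer to the value a 32-bit signed counter would hold."""
    value %= WRAP
    return value - WRAP if value >= 2 ** 31 else value


unwrapped = {name: unwrap(count) for name, count in reported.items()}
unwrapped_sum = sum(unwrapped.values())

# Gap between the reported total and the sum of the reported categories.
gap = reported_total - sum(reported.values())

print("gap equals 2**32:", gap == WRAP)
print("USED after unwrapping:", unwrapped["USED"])
print("NOT_NH_1 after unwrapping:", unwrapped["NOT_NH_1"])
print("unwrapped sum wraps to reported total:", to_int32(unwrapped_sum) == reported_total)
```

If the overflow guess is right, the two negative entries and the total would each be off by a multiple of 2**32. The sketch only shows that the numbers fit that pattern; it does not say whether the downstream results are affected.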