I conducted paired-end total RNA sequencing with triplicates for both wild-type and drug-treated conditions. Each wild-type BAM file contains ~200 million mapped reads (aligned with STAR in two-pass mode, read length 65 nt), while the treatment BAM files have ~400 million mapped reads. I analyzed splicing events, focusing on exon skipping, using rMATS-turbo (comparison-1).
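A comparison-1 run of this kind might be scripted as follows. This is a sketch assuming rMATS-turbo 4.x, whose `--b1`/`--b2` options take text files listing one group's BAM paths comma-separated on a single line; the BAM names other than WT_1.bam, the GTF name and the output paths are hypothetical placeholders.

```python
import shlex


def build_rmats_command(wt_bams, treated_bams, gtf, out_dir, tmp_dir,
                        read_length=65, threads=8):
    """Return (b1_text, b2_text, argv) for a paired-end rMATS-turbo run.

    b1_text/b2_text are the contents to write into b1.txt/b2.txt:
    one line of comma-separated BAM paths per group.
    All file names are placeholders, not paths from the original post.
    """
    b1_text = ",".join(wt_bams) + "\n"
    b2_text = ",".join(treated_bams) + "\n"
    argv = [
        "python", "rmats.py",
        "--b1", "b1.txt",
        "--b2", "b2.txt",
        "--gtf", gtf,
        "-t", "paired",                       # paired-end library
        "--readLength", str(read_length),     # all reads are 65 nt
        "--nthread", str(threads),
        "--od", out_dir,
        "--tmp", tmp_dir,
    ]
    return b1_text, b2_text, argv


if __name__ == "__main__":
    b1, b2, argv = build_rmats_command(
        ["WT_1.bam", "WT_2.bam", "WT_3.bam"],
        ["DRUG_1.bam", "DRUG_2.bam", "DRUG_3.bam"],
        gtf="annotation.gtf", out_dir="comparison1", tmp_dir="tmp1")
    print(b1, end="")
    print(b2, end="")
    print(shlex.join(argv))
```

Because every read is 65 nt (NOT_EXPECTED_READ_LENGTH is 0 in the example output), a fixed `--readLength 65` without `--variable-read-length` is consistent with the data.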
To assess reproducibility, I randomly downsampled the reads in each BAM file to 90%, 80%, and 50% and repeated the splicing analysis (comparison-2). When I compared the skipped exons identified in comparison-1 and comparison-2, I still observed many skipped exons that were not shared between the two analyses.
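One way to quantify the overlap between comparison-1 and comparison-2 is sketched below. It assumes the column names of rMATS-turbo's SE.MATS.JC.txt (chr, strand, exonStart_0base, exonEnd, upstreamES, upstreamEE, downstreamES, downstreamEE, FDR, IncLevelDifference), matches events by coordinates rather than by ID (IDs are assigned per run), and uses illustrative cutoffs of FDR < 0.05 and |IncLevelDifference| ≥ 0.1 that are assumptions, not rMATS defaults.

```python
import csv

# Assumption: column names follow the header of rMATS-turbo's SE.MATS.JC.txt.
# Event IDs are renumbered on every run, so events are matched by coordinates.
COORD_COLS = ("chr", "strand", "exonStart_0base", "exonEnd",
              "upstreamES", "upstreamEE", "downstreamES", "downstreamEE")


def significant_events(lines, fdr_cutoff=0.05, min_delta=0.1):
    """Return coordinate keys of skipped-exon events passing both cutoffs.

    `lines` is any iterable of tab-separated lines (an open file works).
    The cutoffs are illustrative choices, not rMATS defaults.
    """
    events = set()
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            fdr = float(row["FDR"])
            delta = float(row["IncLevelDifference"])
        except ValueError:  # e.g. an "NA" value: skip the event
            continue
        if fdr < fdr_cutoff and abs(delta) >= min_delta:
            events.add(tuple(row[c] for c in COORD_COLS))
    return events


def jaccard(a, b):
    """Shared fraction of two event sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Calling `jaccard(significant_events(f1), significant_events(f2))` on the two SE.MATS.JC.txt files gives one overlap score per downsampling level; repeating the downsampling with different random seeds would indicate how much of the discrepancy is sampling noise near the cutoffs.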
Lastly, I built a reference annotation from the wild-type data with StringTie, StringTie merge, and GffCompare (-R -Q -M) and reran rMATS with it. However, I noticed that many reads were discarded.
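To build intuition for why switching annotations changes how many reads are kept, the toy model below splits a read's CIGAR string into aligned blocks and N-gaps, then checks each gap against annotated introns and each block against annotated exons. It borrows the read-outcome names from rMATS but is a simplification for illustration, not rMATS's actual read-classification code; the exon and intron coordinates in any example are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Split an alignment into aligned blocks and N-gaps (1-based, inclusive).

    Only M and N operations are accepted in this toy model; anything else
    (soft clips, indels, ...) raises ValueError.
    """
    ops = CIGAR_RE.findall(cigar)
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR {cigar!r}")
    blocks, gaps = [], []
    cur = block_start = pos
    for n, op in ops:
        length = int(n)
        if op == "M":
            cur += length
        elif op == "N":
            blocks.append((block_start, cur - 1))
            gaps.append((cur, cur + length - 1))
            cur += length
            block_start = cur
        else:
            raise ValueError(f"unsupported CIGAR operation {op!r}")
    blocks.append((block_start, cur - 1))
    return blocks, gaps


def classify(pos, cigar, exons, introns):
    """Label a read with a name borrowed from rMATS's read outcomes.

    exons: list of (start, end); introns: set of (start, end), all 1-based
    inclusive. The checks are a simplification, not rMATS's actual logic.
    """
    try:
        blocks, gaps = aligned_blocks(pos, cigar)
    except ValueError:
        return "NOT_EXPECTED_CIGAR"
    if any(gap not in introns for gap in gaps):
        return "JUNCTION_NOT_MATCHED_TO_ANNOTATION"
    for start, end in blocks:
        if not any(s <= start and end <= e for s, e in exons):
            return "EXON_NOT_MATCHED_TO_ANNOTATION"
    return "USED"
```

In this model, an exon boundary that shifts by a few bases in the assembled GTF, or an intron that was not assembled at all, is enough to move a read out of USED into one of the NOT_MATCHED categories.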
What could be causing these discrepancies or the read loss?
Example (rMATS read outcomes for WT_1.bam):
USED: 213349495
NOT_PAIRED: 0
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 2501633
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 188082831
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 24397005
CLIPPED: 0
TOTAL_FOR_BAM: 428330964
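To put the read outcomes above in proportion, the short sketch below parses the `KEY: count` lines and expresses each category as a fraction of TOTAL_FOR_BAM. It assumes the categories partition the reads in the BAM, which the example numbers satisfy exactly, and raises an error otherwise.

```python
EXAMPLE = """\
USED: 213349495
NOT_PAIRED: 0
NOT_NH_1: 0
NOT_EXPECTED_CIGAR: 2501633
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 188082831
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 24397005
CLIPPED: 0
TOTAL_FOR_BAM: 428330964
"""


def parse_read_outcomes(text):
    """Parse 'KEY: count' lines into a dict; other lines are ignored."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[key.strip()] = int(value)
    return counts


def outcome_fractions(counts):
    """Return each category as a fraction of TOTAL_FOR_BAM.

    Assumes the categories partition the reads, which holds for the
    WT_1.bam numbers above; raises ValueError otherwise.
    """
    total = counts["TOTAL_FOR_BAM"]
    parts = {k: v for k, v in counts.items() if k != "TOTAL_FOR_BAM"}
    if sum(parts.values()) != total:
        raise ValueError("categories do not sum to TOTAL_FOR_BAM")
    return {k: v / total for k, v in parts.items()}


if __name__ == "__main__":
    fractions = outcome_fractions(parse_read_outcomes(EXAMPLE))
    for key, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{key:<36} {frac:6.1%}")
```

For these counts, about 49.8% of reads are USED, 43.9% fall under EXON_NOT_MATCHED_TO_ANNOTATION, 5.7% under JUNCTION_NOT_MATCHED_TO_ANNOTATION and 0.6% under NOT_EXPECTED_CIGAR. Running the same parser on the outcomes from the original annotation and from the StringTie-derived one would put the read loss of both runs on a common scale.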
My questions:
1. Does rMATS take library depth and transcript expression level into account?
2. How well does rMATS handle an annotation file built with GffCompare?
3. Are there any quality checks I can run, or is there room for improvement?
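Regarding question 1, it may help to look at how a per-replicate inclusion level is formed from the counts rMATS-turbo reports (IJC and SJC, with the isoform effective lengths IncFormLen and SkipFormLen). The sketch below assumes the formula ψ = (IJC/IncFormLen) / (IJC/IncFormLen + SJC/SkipFormLen); the counts and lengths in the demo are hypothetical. Scaling both counts by the same factor, as a 2× deeper library would, leaves ψ unchanged, so depth changes how many reads the test has to work with rather than the point estimate itself.

```python
def inclusion_level(ijc, sjc, inc_form_len, skip_form_len):
    """Length-normalized exon inclusion level for one replicate.

    Assumes psi = (IJC/IncFormLen) / (IJC/IncFormLen + SJC/SkipFormLen).
    Returns None when there are no inclusion or skipping reads.
    """
    inc = ijc / inc_form_len
    skip = sjc / skip_form_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


if __name__ == "__main__":
    # Hypothetical counts and form lengths for one skipped-exon event.
    shallow = inclusion_level(30, 10, 128, 64)
    deep = inclusion_level(60, 20, 128, 64)  # same event, twice the reads
    print(shallow, deep)
```

Both calls return 0.6 here; what differs between the two libraries is the number of reads behind that estimate.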
--
You received this message because you are subscribed to the Google Groups "rMATS User Group" group.
To view this discussion visit https://groups.google.com/d/msgid/rmats-user-group/099b1c66-f0a6-4d1b-be24-ac47ef1ee7fen%40googlegroups.com.