Novel exon inclusion

Bio Comp

Aug 6, 2020, 12:50:47 AM
to rMATS User Group
Hello, I'm KyoungJun, and I'm new to rMATS.
As far as I know, I can find novel exon skipping events (novel relative to my GTF) in 'SE.MATS.JC.txt' by filtering on the IDs from the 'fromGTF*.txt' files.
I wonder whether I can also find 'novel exon inclusion' events in the rMATS results.

Thank you for your kind advice and help.

KyoungJun

Eric Kutschera

Aug 6, 2020, 9:17:05 AM
to rMATS User Group
The events in fromGTF.novelJunction.SE.txt and fromGTF.novelSpliceSite.SE.txt are exon skipping events which could not be detected using the GTF alone. For rMATS to detect the event, it needs to find three junctions: left->middle, middle->right, and left->right. If any of those junctions was not in the GTF, then the event is novel. The rMATS output does not indicate which of those junctions were novel, just that at least one is novel. By filtering to events from the fromGTF.novel* files, you can potentially get both novel exon skipping events and novel exon inclusion events.
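
The filtering step described above can be sketched in Python with only the standard library. This is a minimal sketch, not part of rMATS: the function names read_ids and filter_to_novel are hypothetical, and it assumes the rMATS files are tab-separated with a header row and the event ID in the first column. The in-memory files and their contents are made up for illustration.

```python
import csv
import io


def read_ids(handle):
    """Collect the first-column values (assumed to be the event ID) of a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def filter_to_novel(se_handle, novel_handles):
    """Return the header and the rows of the SE file whose ID appears in any novel file."""
    novel_ids = set()
    for handle in novel_handles:
        novel_ids |= read_ids(handle)
    reader = csv.reader(se_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [row for row in reader if row and row[0] in novel_ids]
    return header, rows


# Tiny stand-ins for SE.MATS.JC.txt and the two fromGTF.novel*.SE.txt files.
se = io.StringIO("ID\tGeneID\n1\tgeneA\n2\tgeneB\n3\tgeneC\n")
novel_junction = io.StringIO("ID\tGeneID\n2\tgeneB\n")
novel_splice_site = io.StringIO("ID\tGeneID\n3\tgeneC\n")

header, rows = filter_to_novel(se, [novel_junction, novel_splice_site])
print([row[0] for row in rows])  # ['2', '3']
```

With real output, the StringIO objects would be replaced by files opened with open(path, newline="").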

Here's the code that sets whether the event comes from the GTF alone or needed data from a BAM file. If any of the left, right, or skipping junctions is novel, then the event is novel: https://github.com/Xinglab/rmats-turbo/blob/ea1c9392123cd7f65ed9f00173d641c77048f5fe/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L1341
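
The rule itself can be illustrated with a short Python sketch. This is not the rMATS code linked above, only a simplified restatement of the "any of the three junctions" logic; the function name, the representation of a junction as a coordinate pair, and the coordinates are all hypothetical.

```python
def is_novel_se_event(left_to_middle, middle_to_right, left_to_right, gtf_junctions):
    """True if at least one of the three junctions is absent from the annotated set."""
    event_junctions = (left_to_middle, middle_to_right, left_to_right)
    return any(junction not in gtf_junctions for junction in event_junctions)


# Junctions as hypothetical (upstream_exon_end, downstream_exon_start) pairs.
annotated = {(100, 200), (300, 400)}

# Only the skipping junction (100, 400) is unannotated, yet the whole event is novel.
print(is_novel_se_event((100, 200), (300, 400), (100, 400), annotated))  # True

# With all three junctions annotated, the event is detectable from the GTF alone.
print(is_novel_se_event((100, 200), (300, 400), (100, 400), annotated | {(100, 400)}))  # False
```

The function returns a single flag, which mirrors why the output cannot tell you which junction made the event novel.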

The logic for novelSpliceSite events is similar. If the event was not detected until novel splice sites were considered, then it is considered novel regardless of which junction required the novel splice site.

Eric

Bio Comp

Aug 6, 2020, 10:50:31 AM
to rMATS User Group
Dear Eric,

Thank you for your help and advice. Now I understand exactly what "novel" means.
But I still have an unresolved question.
If I want to see novel exon skipping events that occur only in the case group, then I should look for events like the one below:
SJC_Control = 0,0,0 but SJC_Case = 17,14,18,19,15,15 in my 'SE.MATS.JC.txt', filtered by my fromGTF.novel*.txt.

ID      GeneID  geneSymbol      chr     strand  exonStart_0base exonEnd upstreamES      upstreamEE      downstreamES    downstreamEE    ID      IJC_SAMPLE_1    SJC_SAMPLE_1    IJC_SAMPLE_2    SJC_SAMPLE_2    IncFormLen      SkipFormLen  
113072  "ENSG00000137817.17"    "PARP6" chr15   -       72250844        72250954        72242619        72242699        72251206        72251255        113072  68,49,56        0,0,0   79,45,47,39,34,55       17,14,18,19,15,15     200     100     


And if I want to see novel exon inclusion events that occur only in the case group, then I should look for events like the one below:
IJC_Control = 0,0,0 but IJC_Case = 17,14,18,19,15,15 in my 'SE.MATS.JC.txt', filtered by my fromGTF.novel*.txt.

ID      GeneID  geneSymbol      chr     strand  exonStart_0base exonEnd upstreamES      upstreamEE      downstreamES    downstreamEE    ID      IJC_SAMPLE_1    SJC_SAMPLE_1    IJC_SAMPLE_2    SJC_SAMPLE_2    IncFormLen      SkipFormLen  
113072  "ENSG00000137817.17"    "PARP6" chr15   -       72250844        72250954        72242619        72242699        72251206        72251255        113072      0,0,0         68,49,56         17,14,18,19,15,15     79,45,47,39,34,55      200     100     


Am I correct?
Thank you, as always, for your help and advice.

KyoungJun

Eric Kutschera

Aug 6, 2020, 11:51:51 AM
to rMATS User Group
Your criteria for identifying those two types of events seem reasonable to me, but the rMATS output does not explicitly say which junctions were in the GTF file or how the two inclusion junctions (left->middle, middle->right) contribute to the IJC columns. The junction counts (IJC, SJC) are defined as described here: https://github.com/Xinglab/rmats-turbo/tree/8520f7df122b1690efbf836ec3ce63512a0cbd27#output
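
To see why a single IJC value hides the contribution of each inclusion junction, consider the toy Python sketch below. It makes a simplifying assumption that is not confirmed in this thread, namely that IJC is just the sum of the reads on the two inclusion junctions; the function name and the read counts are hypothetical.

```python
def ijc_from_junctions(left_to_middle_reads, middle_to_right_reads):
    """Simplified assumption: IJC is the sum of the reads on the two inclusion junctions."""
    return left_to_middle_reads + middle_to_right_reads


# Three hypothetical replicates with very different per-junction support...
splits = [(17, 0), (0, 17), (9, 8)]

# ...that become indistinguishable once collapsed into one IJC value each.
print([ijc_from_junctions(a, b) for a, b in splits])  # [17, 17, 17]
```

Under that assumption, a nonzero IJC does not guarantee that both inclusion junctions have supporting reads.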

By combining the definition of novel that I explained above with the junction counts, you are finding events that required information from the BAM files to be detected and for which certain junctions have supporting reads only in your case data, with no supporting reads in your control data. That seems like a reasonable definition to me, but be sure that you don't actually require a stricter definition. In your first example (with ID 113072), it could be that the skipping junction (left->right) was in the GTF file and the left->middle junction is the one that was missing from the GTF. That left->middle junction may be in both the case and control BAMs, or maybe in just one of them, since the IJC counts can come from some combination of left->middle and middle->right. That ID=113072 example is a novel event (not in the GTF) and the skipping junction is not supported by reads in the control data, but I wanted to show that there is some ambiguity about which of the three junctions are supported in the GTF, control, and case.
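
The count-based criterion discussed here can be written as a small Python check. This is a minimal sketch under stated assumptions: SAMPLE_1 is taken to be the control group and SAMPLE_2 the case group (as in the example row from this thread), the helper names and the min_case_reads threshold are hypothetical, and the check says nothing about which junction was missing from the GTF.

```python
def parse_counts(field):
    """Turn a comma-separated rMATS count field such as '17,14,18' into a list of ints."""
    return [int(value) for value in field.split(",")]


def case_only(control_field, case_field, min_case_reads=1):
    """True if every control replicate has 0 reads and every case replicate has enough reads."""
    control = parse_counts(control_field)
    case = parse_counts(case_field)
    return all(c == 0 for c in control) and all(c >= min_case_reads for c in case)


# Counts for ID 113072 as posted in the thread (SAMPLE_1 assumed control, SAMPLE_2 case).
row = {
    "IJC_SAMPLE_1": "68,49,56",
    "SJC_SAMPLE_1": "0,0,0",
    "IJC_SAMPLE_2": "79,45,47,39,34,55",
    "SJC_SAMPLE_2": "17,14,18,19,15,15",
}

# Skipping reads appear only in the case group.
print(case_only(row["SJC_SAMPLE_1"], row["SJC_SAMPLE_2"]))  # True
# Inclusion reads appear in both groups, so this is not a case-only inclusion event.
print(case_only(row["IJC_SAMPLE_1"], row["IJC_SAMPLE_2"]))  # False
```

Raising min_case_reads is one way to approximate a stricter definition, at the cost of discarding weakly supported events.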

Eric

Bio Comp

Aug 7, 2020, 1:46:20 AM
to rMATS User Group
Dear Eric,

Thank you very much for your considerate reply.
Now I understand, and I realize that my approach was too naive.

Thank you.

KyoungJun

