fromGTF.se.txt file columns for reverse strand


Harshangda Karan Puri

Feb 22, 2023, 6:21:05 AM
to rMATS User Group
Hi,

I am a bit confused about how to read upstreamES, upstreamEE, downstreamES, and downstreamEE for the "-" strand in the fromGTF.se.txt file. Could you please look at the attached picture and clarify whether 1, 2, 3, 4, 5, 6 correspond to upstreamES, upstreamEE, exonStart_0base, exonEnd, downstreamES, and downstreamEE, respectively, for both the forward and reverse strands? Or is it different for the reverse strand?

Thanks,
Harsha
Splicing Nomenclature.html

Harshangda Karan Puri

Feb 22, 2023, 6:24:31 AM
to rMATS User Group
Sorry, this is the picture I am referring to. Please ignore the previously attached file.

Thanks,
Harsha

splicing SE.png

kutsc...@gmail.com

Feb 22, 2023, 8:49:55 AM
to rMATS User Group
rMATS calls the exon with the lowest coordinates the upstream exon (upstreamES, upstreamEE) for both + and - strands. Similarly, the exon with the highest coordinates is the downstream exon (downstreamES, downstreamEE).

In the picture you attached, the forward strand has the coordinates labeled 1 through 6 in ascending order. Based on the sort order alone, the names rMATS would give those coordinates are:
(1: 24962114): upstreamES
(2: 24962209): upstreamEE
(3: 24967029): exonStart_0base
(4: 24967152): exonEnd
(5: 24967932): downstreamES
(6: 24968082): downstreamEE

In the picture, the reverse strand coordinates are not in ascending order. Here are the names rMATS would give those coordinates:
(1: 110502049): downstreamES
(2: 110502117): downstreamEE
(3: 110499535): exonStart_0base
(4: 110499546): exonEnd
(5: 110496012): upstreamES
(6: 110496203): upstreamEE

Basically, rMATS considers only the numeric value of the coordinates when assigning the names. It does not consider the strand.

Eric
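
The coordinate-only rule described above can be sketched in a few lines of Python. This is only an illustration of the naming logic, not rMATS's own code: the function name_se_coordinates is hypothetical, and it assumes the three exons do not overlap, so that sorting the six values keeps each exon's start and end adjacent.

```python
# Column names in the order rMATS assigns them for SE events (lowest to highest coordinate).
SE_COLUMNS = [
    "upstreamES", "upstreamEE",
    "exonStart_0base", "exonEnd",
    "downstreamES", "downstreamEE",
]

def name_se_coordinates(coords):
    """Hypothetical helper: label six SE coordinates by ascending value, ignoring strand."""
    if len(coords) != 6:
        raise ValueError("an SE event has exactly six coordinates")
    return dict(zip(SE_COLUMNS, sorted(coords)))

# Reverse-strand coordinates from the picture, in the order they were labeled 1 through 6.
reverse = [110502049, 110502117, 110499535, 110499546, 110496012, 110496203]
named = name_se_coordinates(reverse)
for column, value in named.items():
    print(column, value)
```

Run on the reverse-strand example, the labels 5 and 6 from the picture end up as upstreamES and upstreamEE, and labels 1 and 2 as downstreamES and downstreamEE, matching the list above.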

Harshangda Karan Puri

Feb 22, 2023, 10:27:19 AM
to rMATS User Group
Hi,

Thank you very much for the prompt response. This was very helpful. I am preparing this splicing nomenclature for MXE, RI, A5SS, and A3SS as well. It would be very kind if you could tell me the column names that rMATS would assign to the coordinates on the forward and reverse strands for the MXE, RI, A5SS, and A3SS files. I have attached a separate picture for each of these.

I tried to do it myself, but I was not sure:
 1. Which coordinates represent the start and end of the first and second exons for the forward and reverse strands in MXE
 2. Which coordinates represent the start and end of the long exon, short exon, and flanking exon for the forward and reverse strands in A5SS and A3SS
 3. Whether the start of the exon is always the smaller coordinate and the end is always the higher coordinate


Thanks,
Harsha
A3SS.png
A5SS.png
MXE.png
RI.png

kutsc...@gmail.com

Feb 22, 2023, 2:11:49 PM
to rMATS User Group
Yes, the smaller coordinate is always used as the start of the exon and the higher coordinate as the end, regardless of the strand. For A3SS and A5SS, the column names are assigned as expected based on the names (long, short, flank), and the strand is accounted for as shown in your diagrams. For MXE and RI, the column names are assigned based only on the numeric value of the coordinates, without considering the strand (similar to SE events). While the strand does not change the column names for MXE events, it does change which exon is in the "inclusion isoform". This is described in the README under "Event specific columns": https://github.com/Xinglab/rmats-turbo/tree/v4.1.2#output

For the diagrams:
A3SS
forward
(1: 57789037): flankingES
(2: 57789155): flankingEE
(3: 57791385): longExonStart_0base
(4: 57791492): shortES
(5: 57791673): shortEE, longExonEnd

reverse
(1: 135360497): flankingES
(2: 135360604): flankingEE
(3: 135353045): shortEE
(4: 135359942): longExonEnd
(5: 135352946): shortES, longExonStart_0base

A5SS
forward
(1: 34240737): shortES, longExonStart_0base
(2: 34240882): shortEE
(3: 34240915): longExonEnd
(4: 34242712): flankingES
(5: 34242795): flankingEE

reverse
(1: 34426071): shortEE, longExonEnd
(2: 34425796): longExonStart_0base
(3: 34426032): shortES
(4: 34425072): flankingES
(5: 34425221): flankingEE

MXE
forward
(1: 271866): upstreamES
(2: 271939): upstreamEE
(3: 272037): 1stExonStart_0base
(4: 272150): 1stExonEnd
(5: 272192): 2ndExonStart_0base
(6: 272305): 2ndExonEnd
(7: 275140): downstreamES
(8: 275201): downstreamEE

reverse
(1: 35685269): downstreamES
(2: 35685339): downstreamEE
(3: 35685064): 2ndExonStart_0base
(4: 35685139): 2ndExonEnd
(5: 35684732): 1stExonStart_0base
(6: 35684807): 1stExonEnd
(7: 35684488): upstreamES
(8: 35684550): upstreamEE

RI
forward
(1: 43059394): upstreamES, riExonStart_0base
(2: 43059714): upstreamEE
(3: 43062190): downstreamES
(4: 43062295): downstreamEE, riExonEnd

reverse
(1: 100376251): downstreamEE, riExonEnd
(2: 100375511): downstreamES
(3: 100375350): upstreamEE
(4: 100375283): upstreamES, riExonStart_0base
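
For A3SS and A5SS, the lists above imply a simple pattern: the long and short exons share one boundary, and the flanking exon lies on the opposite side of the shared boundary. Below is a minimal Python sketch of that pattern as a sanity check. The function names shared_boundary and is_consistent are hypothetical, and the rule is inferred only from the four examples in this thread, not taken from the rMATS source.

```python
def shared_boundary(event_type, strand):
    """Return which boundary the long and short exons share: 'start' or 'end'."""
    if event_type not in ("A3SS", "A5SS") or strand not in ("+", "-"):
        raise ValueError("expected A3SS/A5SS and strand + or -")
    # A3SS on + and A5SS on - share the end; A3SS on - and A5SS on + share the start.
    shares_end = (event_type == "A3SS") == (strand == "+")
    return "end" if shares_end else "start"

def is_consistent(event_type, row):
    """Check that a row (dict of ints plus 'strand') follows the pattern in this thread."""
    if shared_boundary(event_type, row["strand"]) == "end":
        same = row["longExonEnd"] == row["shortEE"]
        flank_ok = row["flankingEE"] <= row["longExonStart_0base"]  # flank has lower coordinates
    else:
        same = row["longExonStart_0base"] == row["shortES"]
        flank_ok = row["flankingES"] >= row["longExonEnd"]  # flank has higher coordinates
    return same and flank_ok

# Reverse-strand examples copied from the lists above.
a3ss_reverse = {"strand": "-", "longExonStart_0base": 135352946, "longExonEnd": 135359942,
                "shortES": 135352946, "shortEE": 135353045,
                "flankingES": 135360497, "flankingEE": 135360604}
a5ss_reverse = {"strand": "-", "longExonStart_0base": 34425796, "longExonEnd": 34426071,
                "shortES": 34426032, "shortEE": 34426071,
                "flankingES": 34425072, "flankingEE": 34425221}
print(is_consistent("A3SS", a3ss_reverse), is_consistent("A5SS", a5ss_reverse))
```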

Harshangda Karan Puri

Feb 23, 2023, 5:55:15 AM
to rMATS User Group
Thank you so much for sharing this. It was very helpful!

Cheers,
Harshangda

Xiao Lei

Oct 8, 2023, 3:21:28 PM
to rMATS User Group
Hi, Eric,

Thank you for answering this!

I just found this out too. rMATS does not consider the forward or reverse strand when calling exons "upstream" or "downstream"; it considers only the coordinates. This is causing trouble in my downstream analysis. For example, I would like to intersect rMATS hits with a branchpoint annotation database, so I only want to consider an exon together with its upstream exon (the intron between these two contains the branch points of interest). For forward strand exons in the rMATS output this is fine, but for reverse strand exons I need to convert all the rMATS-annotated "downstream" exons to "upstream" exons. I wonder if there is a way to get around this problem. It would be great if the rMATS developers could give users an option to assign "upstream" and "downstream" exons based on strand.

Best,
Xiao
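
One possible workaround for the conversion described above is to swap the flanking exons after reading the file, based on the strand column. The Python sketch below is only an illustration under stated assumptions: the helper names are hypothetical, rows are plain dicts keyed by the SE column names used in this thread plus a strand column, and the intron is reported between the two boundary values as they appear in the file, so any 0-based versus 1-based adjustment needed by the branchpoint database is left to the reader.

```python
def transcript_order_flanks(row):
    """Return (five_prime_exon, three_prime_exon) as (start, end) tuples in transcript order."""
    up = (int(row["upstreamES"]), int(row["upstreamEE"]))
    down = (int(row["downstreamES"]), int(row["downstreamEE"]))
    if row["strand"] == "+":
        return up, down
    if row["strand"] == "-":
        return down, up  # on the reverse strand the higher-coordinate exon is transcribed first
    raise ValueError("strand must be + or -")

def preceding_intron(row):
    """Return (low, high) genomic bounds of the intron transcribed just before the skipped exon."""
    if row["strand"] == "+":
        return int(row["upstreamEE"]), int(row["exonStart_0base"])
    if row["strand"] == "-":
        return int(row["exonEnd"]), int(row["downstreamES"])
    raise ValueError("strand must be + or -")

# Rows built from the SE coordinates given earlier in this thread.
forward = {"strand": "+", "upstreamES": 24962114, "upstreamEE": 24962209,
           "exonStart_0base": 24967029, "exonEnd": 24967152,
           "downstreamES": 24967932, "downstreamEE": 24968082}
reverse_row = {"strand": "-", "upstreamES": 110496012, "upstreamEE": 110496203,
               "exonStart_0base": 110499535, "exonEnd": 110499546,
               "downstreamES": 110502049, "downstreamEE": 110502117}
print(transcript_order_flanks(reverse_row))
print(preceding_intron(forward), preceding_intron(reverse_row))
```

The original columns are left untouched, so the relabeled values can be added as new columns and the rMATS output stays comparable with other tools that expect its coordinate-based naming.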