Single-end firststrand data and rMATS 4.1.1

302 views
Skip to first unread message

christin...@gmail.com

unread,
May 27, 2021, 10:01:18 PM5/27/21
to rMATS User Group
Hi rMATS folks, thanks for doing what you do!

I am getting some confusing output from rMATS 4.1.1. When I use a dataset that is single-end, stranded data (fr-firststrand), I am only seeing results from the + strand in the *.MATS.JCEC.txt files. I'm using bam files as input and there is definitely coverage on both strands. My command and the output summary information are pasted below - there are nearly as many "NOT_EXPECTED_STRAND" reads as "USED" reads. 

On the other hand, for my paired-end stranded datasets, everything looks right. So I'm not sure what's going on.

Thank you!

-Tina


rmats.py --gtf /lustre/project/erik/genomes/gtf/hg38_plus_Akata_inverted.gtf --b1 test_bam.txt --b2 control_bam.txt -t single --readLength 75 --variable-read-length --libType fr-firststrand --nthread 20 --od rMATS_output --tmp rMATS_temp




gtf: 27.9605393409729

There are 58378 distinct gene ID in the gtf file

There are 200378 distinct transcript ID in the gtf file

There are 36221 one-transcript genes in the gtf file

There are 1199888 exons in the gtf file

There are 25109 one-exon transcripts in the gtf file

There are 22601 one-transcript genes with only one exon in the transcript

Average number of transcripts per gene is 3.432423

Average number of exons per transcript is 5.988122

Average number of exons per transcript excluding one-exon tx is 6.702720

Average number of gene per geneGroup is 7.713697

statistic: 0.02234959602355957


read outcome totals across all BAMs

USED: 80019424

NOT_PAIRED: 0

NOT_NH_1: 57126113

NOT_EXPECTED_CIGAR: 599319

NOT_EXPECTED_READ_LENGTH: 0

NOT_EXPECTED_STRAND: 75150301

EXON_NOT_MATCHED_TO_ANNOTATION: 3117659

JUNCTION_NOT_MATCHED_TO_ANNOTATION: 266420

CLIPPED: 0

total: 216279236

outcomes by BAM written to: rMATS_temp/2021-05-27-16:39:07_101887_read_outcomes_by_bam.txt


novel: 200.16409182548523

The splicing graph and candidate read have been saved into rMATS_temp/2021-05-27-16:39:07_101887_*.rmats

save: 3.0275163650512695

loadsg: 0.18720602989196777


==========

Done processing each gene from dictionary to compile AS events

Found 54075 exon skipping events

Found 4634 exon MX events

Found 14215 alt SS events

There are 8669 alt 3 SS events and 5546 alt 5 SS events.

Found 5930 RI events

==========


ase: 2.071687698364258

count: 5.747217655181885

Processing count files.

Done processing count files.

Thomas Danhorn

unread,
May 28, 2021, 10:57:58 AM5/28/21
to christin...@gmail.com, rMATS User Group
Have you looked at your BAM files in IGV (or some other viewer)? If about
half of your reads are on the "wrong" strand, it looks like the library
prep protocol is not strand-specific, like you expect. Look at one exon
of one gene (that is reasonably well expressed) and see if the reads all
go in the same direction. (If you can't use IGV for some reason, you can
count the reads on each strand overlapping the exon's coordinates with
samtools view.)
If it turns out your prep is not stranded, you can run rMATS unstranded.

Good luck!

Thomas
> --
> You received this message because you are subscribed to the Google Groups "rMATS User Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rmats-user-gro...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/rmats-user-group/f5666195-c8b0-4abb-8a20-47ef7906a1bfn%40googlegroups.com.
>

kutsc...@gmail.com

unread,
May 28, 2021, 11:32:19 AM5/28/21
to rMATS User Group
I do think that there might be a bug in rMATS related to single-end stranded data. Here's part of the code that decides which strand a read should count toward: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L599

That line of code is reached only if rMATS is run with "-t single", but it's checking to see which strand the other read in the pair was aligned to. That seems wrong since there is no "other read in the pair" for single-end data

I'll look into implementing a fix for this

Eric

christin...@gmail.com

unread,
May 28, 2021, 12:01:00 PM5/28/21
to rMATS User Group
Thank you! Yes, this data is definitely strand-specific. And actually when I ran it using fr-firststrand with a previous version of rMATS (3.1.0) I got about twice as many results, including events on the - strand. Attached is an image of the coverage at the BCLAF1 gene, including a track showing a couple of SE events that are detected by rMATS 3.1.0 but not 4.1.1.

Likewise I do get about twice as many results when I run rMATS 4.1.1 on this data using fr-unstranded, but I assume that I am throwing away information by treating strand-specific data as unstranded.

Thank you for looking into this!

Tina

kutsc...@gmail.com

unread,
Jun 2, 2021, 1:36:02 PM6/2/21
to rMATS User Group
Here's a code change for rmats that I think fixes the issue: https://github.com/Xinglab/rmats-turbo/pull/126

If you are able to build and run from the source code, then you could try out that change to see if the new code produces expected results

Eric

Colin Lolos

unread,
Aug 2, 2021, 5:29:31 PM8/2/21
to rMATS User Group
Hi,

I use rMATs in a pipeline and I have spotted the same issue reported in this post.

I have single-end reads and half of bam sequences fall into the NOT_EXPECTED_STRAND category. I have modified the files you commit to github in reponse, but I have the same results. May I do something else to improve that?

Here is my command :

$ python rmats.py --b1 ../b1.txt --b2 ../b2.txt --gtf ../../files/file.gtf -t single --variable-read-length --allow-clipping --readLength 74 --nthread 16 --libType fr-firststrand --od ../rMATsResults/FE_CTRL --tmp ../TMP/FE_CTRL

Thank you in advance for your time,
Sincerely,

kutsc...@gmail.com

unread,
Aug 3, 2021, 8:36:28 AM8/3/21
to rMATS User Group
After modifying rMATS_pipeline/rmatspipeline/rmatspipeline.pyx with these changes: https://github.com/Xinglab/rmats-turbo/pull/126/files#diff-c18340f7c14ac230a459d63c355658e92455261119e4e407770432860418bfe2 rmats needs to be recompiled. Here are the build instructions: https://github.com/Xinglab/rmats-turbo/tree/kutscherae-single-end-strand#build. Did you run ./build_rmats after making the change and were there any errors?

Eric

Colin Lolos

unread,
Aug 3, 2021, 12:20:49 PM8/3/21
to rMATS User Group
Indeed, I just needed to rebuild! I thank you so much, I am still a newbie in this world,
Sincerely,

Reply all
Reply to author
Forward
0 new messages