geneBody_coverage.py - calculation details and interpreting the output

Darius Khan

Jun 14, 2015, 3:43:46 PM
to rseqc-...@googlegroups.com
Hello! I would like to know some of the details of how the script calculates the coverage. In the RSeQC documentation it is shown that the "coverage signals" from the input BAM file are normalized. How exactly is this normalization carried out? How should the Y-axis of the output image be interpreted? Is it a relative scale, and if so, relative to what? Also, is the coverage calculated using only unique reads or all mapped reads? With geneBody_coverage.py I'm getting a slightly 3'-skewed distribution for my data, but with the software qualimap the distribution is heavily 5'-skewed. I'm wondering what could be the reason for this discrepancy.

Thanks in advance.

Darius Khan

Jun 14, 2015, 4:05:50 PM
to rseqc-...@googlegroups.com
Almost forgot: does the script take into account the strandedness of the reads, or is every read in a transcript region used in calculating the coverage?

Liguo Wang

Jun 15, 2015, 10:36:47 AM
to rseqc-...@googlegroups.com
How exactly is this normalization carried out?
See attachment.


How should the Y-axis of the output image be interpreted?
This is relative coverage: the lowest coverage is 0 and the highest is 1.


Is it a relative scale, and if so, relative to what?
Relative to the positions that have the lowest coverage.
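
For readers following along: a scale where the lowest coverage is 0 and the highest is 1 is what min-max scaling produces. The snippet below is only a minimal sketch under that assumption, not code taken from geneBody_coverage.py; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(coverage):
    """Rescale per-position coverage values to the range [0, 1].

    Assumption: plain min-max scaling, which matches the description
    "lowest coverage is 0 and highest is 1".
    """
    lo, hi = min(coverage), max(coverage)
    if hi == lo:
        # A flat profile has no spread to rescale; report all zeros.
        return [0.0 for _ in coverage]
    return [(c - lo) / (hi - lo) for c in coverage]


# Hypothetical raw coverage at five positions along the gene body (5' to 3').
raw = [20, 35, 50, 80, 65]
normalized = min_max_normalize(raw)
print(normalized)
```

Under this scaling the Y-axis carries no absolute depth information: a value of 0.5 only means the position sits halfway between the least and most covered positions of that sample.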


Also, is the coverage calculated using only unique reads or all mapped reads?
All reads are used. However, we filter out QC-failed reads, secondary alignments, and duplicate reads.
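
The three excluded categories correspond to standard SAM FLAG bits (0x200 for QC fail, 0x100 for secondary alignment, 0x400 for duplicate). The snippet below is a rough sketch of such a filter on raw flag values; the helper name `is_counted` is hypothetical and this is not the script's own code.

```python
# Standard SAM FLAG bits for the three categories named above.
QCFAIL = 0x200      # read failed platform/vendor quality checks
SECONDARY = 0x100   # secondary alignment
DUPLICATE = 0x400   # PCR or optical duplicate

EXCLUDED = QCFAIL | SECONDARY | DUPLICATE


def is_counted(flag):
    """Return True if a read with this SAM FLAG passes the three filters."""
    return (flag & EXCLUDED) == 0


# Hypothetical flag values: 0 and 16 are primary alignments on the forward
# and reverse strand; 256, 512 and 1024 each set one excluded bit.
for flag in (0, 16, 256, 512, 1024):
    print(flag, is_counted(flag))
```

Note that a filter like this keeps reads on either strand (flag 16 passes), and it would also keep a multi-mapped read whose alignment is not marked secondary.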

I'm getting a slightly 3'-skewed distribution for my data, but with the software qualimap the distribution is heavily 5'-skewed. I'm wondering what could be the reason for this discrepancy.
You had better visualize your raw RNA-seq data to figure out which one is correct. Also make sure the two programs used the same reference gene model.

This module does NOT consider the strandedness of reads.

Liguo

Attachment: Determine reads coverage profile.docx