weird gene body coverage

Dejian

Jun 30, 2014, 3:59:39 PM
to rseqc-...@googlegroups.com, wangl...@gmail.com

Hi,

I got weird gene body coverage plots from geneBody_coverage.py and geneBody_coverage2.py when I tested these two scripts.

To save time, I ran the analysis on a single gene model with high coverage (see igv.png, with coverage ~13.5 million). The gene model was downloaded from http://dldcc-web.brc.bcm.edu/lilab/liguow/RSeQC/dat/hg19_Ensembl.bed.gz, which is listed on the RSeQC website. I used only one gene (ENST00000383925) from the BED file. Its entry is as follows:

chr1    16840616        16840780        ENST00000383925 0       -       16840780        16840780        0       1       164,    0,

I found a valley in the middle of the gene body coverage (see attached figure ENST00000383925.png). But when I checked this gene in IGV, I saw a peak in the middle of the gene (see attached figure igv.png), sharply different from the RSeQC result. I tried versions 2.3.7 and 2.3.9 and got similar results. Thanks!

Best,
Dejian
ENST00000383925.png
igv.png

Liguo Wang

Jun 30, 2014, 5:07:31 PM
to Dejian, rseqc-...@googlegroups.com
First, the gene ENST00000383925 is only about 165 nt long, so you should zoom in on the gene body to check the coverage (the current display window is too large, so a small valley may not be visible).

Second, in order to overlay signals from thousands of genes of different lengths, RSeQC takes 100 points (5'-->3') from each gene. The resulting coverage profile may not be directly comparable to the raw coverage (.wig, .bedgraph, .tdf, etc.).

Third, the middle part of this gene has poor mappability (according to the Duke Uniq 35 track). Is it possible that some reads were filtered out if you used geneBody_coverage2.py?
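
To illustrate the second point, here is a minimal sketch of picking 100 points along a single-exon gene in 5'-->3' order. It assumes evenly spaced positions; the function name sample_positions is hypothetical and this is not RSeQC's actual code, only an illustration of why such a profile is not a per-base coverage track.

```python
def sample_positions(start, end, strand="+", n_points=100):
    """Return n_points evenly spaced 0-based positions in [start, end).

    Assumption: simple even spacing over a single exon; the real tool
    may choose its points differently.
    """
    length = end - start
    if length < n_points:
        raise ValueError("gene is shorter than the number of sample points")
    positions = [start + (i * length) // n_points for i in range(n_points)]
    # For minus-strand genes the 5' end is at the higher coordinate,
    # so reverse the list to keep the 5'-->3' orientation.
    if strand == "-":
        positions.reverse()
    return positions


# Coordinates taken from the BED line quoted in the original post.
points = sample_positions(16840616, 16840780, strand="-")
print(len(points), points[0], points[-1])
```

With a 164-nt gene, only 100 of the 164 bases are sampled, so the x-axis of the plot is a percentile of gene length and not a genomic coordinate.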

-Liguo

Dejian Zhao

Jun 30, 2014, 9:01:11 PM
to rseqc-...@googlegroups.com, Liguo Wang
Hi Liguo,

Thanks for your quick response. Your first explanation is right. I checked the BAM file instead of the TDF file and found that there is indeed a valley. When I generated the TDF file, I used the default window size (25 bp). Since the valley is less than 20 bp wide, it was bridged by averaging the depth within the 25-bp windows.
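
A rough sketch of the averaging effect, using made-up depth values (not the real data) and a hypothetical helper window_means that averages depth in fixed, non-overlapping windows:

```python
def window_means(depth, window=25):
    """Average per-base depth over consecutive non-overlapping windows."""
    means = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical per-base depth: 164 bases at depth 1000,
# with a 15-bp valley of depth 0 in the middle.
depth = [1000] * 164
for pos in range(80, 95):
    depth[pos] = 0

means = window_means(depth, window=25)
print(min(depth))  # lowest per-base depth
print(means)       # windowed averages
```

In this toy example the per-base minimum is 0, but the window containing the valley still averages 400, so the dip looks much shallower once the track is viewed at window resolution.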

-Dejian