total tags vs assigned tags read_distribution.py


Ido Tamir

May 13, 2014, 8:59:55 AM
to rseqc-...@googlegroups.com
Hi,
I see a discrepancy between the tag counts in the table and the totals in the header.

When I sum the Tag_count column over all groups, I get more tags than the header reports as the total:

46584838 (sum over all groups) vs 46335157 (Total Tags).
That is also ~2 million more than the Total Assigned Tags in the header (44736858).
I see this with every run.


Total Reads                   37025088
Total Tags                    46335157
Total Assigned Tags           44736858
=====================================================================
Group               Total_bases         Tag_count           Tags/Kb
CDS_Exons           38578674            32214471            835.03
5'UTR_Exons         4463879             918229              205.70
3'UTR_Exons         20269750            5350331             263.96
Introns             895819440           4895899             5.47
TSS_up_1kb          24974943            112231              4.49
TSS_up_5kb          112852101           203210              1.80
TSS_up_10kb         203512119           258228              1.27
TES_down_1kb        25182568            500949              19.89
TES_down_5kb        108420768           1031590             9.51
TES_down_10kb       191897066           1099700             5.73
=====================================================================

thank you very much,
ido

Liguo Wang

May 13, 2014, 10:53:52 AM
to rseqc-...@googlegroups.com
Hi Ido,
First, these "groups" are not mutually exclusive: TSS_up_1kb is part of TSS_up_5kb, TSS_up_5kb is part of TSS_up_10kb, etc.
Second, many mapped reads are not counted in any of these groups, for example reads mapped to TSS_up_100Kb.

Therefore, the discrepancy is expected.

Liguo


Ido Tamir

May 13, 2014, 11:06:42 AM
to rseqc-...@googlegroups.com
Yes, you even state this in the documentation:
Therefore, “Total Assigned Tags” = CDS_Exons + 5’UTR_Exons + 3’UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb.


thank you very much,
ido
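
The arithmetic behind this exchange can be checked directly. The sketch below is only an illustration: the counts are copied from the table in the first post, and the variable names are made up for this example. It sums every row, then sums only the groups named in the documentation formula.

```python
# Tag counts copied from the read_distribution.py table in the first post.
tag_counts = {
    "CDS_Exons": 32214471,
    "5'UTR_Exons": 918229,
    "3'UTR_Exons": 5350331,
    "Introns": 4895899,
    "TSS_up_1kb": 112231,
    "TSS_up_5kb": 203210,
    "TSS_up_10kb": 258228,
    "TES_down_1kb": 500949,
    "TES_down_5kb": 1031590,
    "TES_down_10kb": 1099700,
}

# Groups named in the documentation formula. The 1kb and 5kb windows are
# left out because they are nested inside the 10kb windows.
FORMULA_GROUPS = (
    "CDS_Exons", "5'UTR_Exons", "3'UTR_Exons", "Introns",
    "TSS_up_10kb", "TES_down_10kb",
)

naive_sum = sum(tag_counts.values())
assigned = sum(tag_counts[group] for group in FORMULA_GROUPS)

print("Sum of every row:     ", naive_sum)             # 46584838
print("Sum of formula groups:", assigned)              # 44736858
print("Counted more than once:", naive_sum - assigned) # 1847980
```

The second figure equals the "Total Assigned Tags" value in the header, and the difference is the ~2 million extra tags from the nested 1kb and 5kb windows.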

Amit Singh

Jan 22, 2016, 6:20:48 AM
to rseqc-discuss
Hi Liguo,
I have a question about read distribution. I ran the command "read_distribution.py -i mouse1.bam -r GRCm38_mm10_RefSeq.bed" and the output, shown below, looks strange. The BAM file is paired-end and comes from a TopHat alignment. Could you tell me where the error is?


Total Reads                   16742608
Total Tags                    20133050
Total Assigned Tags           0
=====================================================================
Group               Total_bases         Tag_count           Tags/Kb
CDS_Exons           33981266            0                   0.00
5'UTR_Exons         5891301             0                   0.00
3'UTR_Exons         24156272            0                   0.00
Introns             912590034           0                   0.00
TSS_up_1kb          19192074            0                   0.00
TSS_up_5kb          87321041            0                   0.00
TSS_up_10kb         158733632           0                   0.00
TES_down_1kb        19309894            0                   0.00
TES_down_5kb        82904126            0                   0.00
TES_down_10kb       147076274           0                   0.00
=====================================================================

Thank you very much.

Regards,
Amit

Liguo Wang

Jan 22, 2016, 10:07:53 AM
to rseqc-...@googlegroups.com
Hi,
Please make sure your BAM file and BED file use the same chromosome IDs (chr1, chr2, ...). Let me know if this fixes the problem. Thanks.

Liguo
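
One way to run the check suggested above is to compare the reference names in the BAM header with the first column of the BED file. The sketch below is a hedged illustration, not part of RSeQC: the function names are hypothetical, the header text is assumed to come from something like `samtools view -H mouse1.bam`, and the example inputs are made up.

```python
def sam_header_chroms(header_text):
    """Collect reference names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def bed_chroms(bed_text):
    """Collect the chromosome column (first field) of every BED data line."""
    names = set()
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def compare_chroms(header_text, bed_text):
    """Report which names are shared and which appear on one side only."""
    bam, bed = sam_header_chroms(header_text), bed_chroms(bed_text)
    return {"shared": bam & bed, "bam_only": bam - bed, "bed_only": bed - bam}


# Made-up inputs: the header uses "1", "2" while the BED uses "chr1", "chr2".
example_header = "@HD\tVN:1.0\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:2\tLN:2000\n"
example_bed = "chr1\t10\t500\tgeneA\t0\t+\nchr2\t20\t900\tgeneB\t0\t-\n"

result = compare_chroms(example_header, example_bed)
print("shared:  ", sorted(result["shared"]))
print("BAM only:", sorted(result["bam_only"]))
print("BED only:", sorted(result["bed_only"]))
```

An empty "shared" list is consistent with a "Total Assigned Tags" of 0, since no read can overlap a feature on a chromosome name that the BED file never uses.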

john.al...@sheffield.ac.uk

May 4, 2017, 10:46:40 AM
to rseqc-discuss
Hiya,

  I've run into the same problem.

Total Reads                   53536146
Total Tags                    69334788
Total Assigned Tags           0
=====================================================================
Group               Total_bases         Tag_count           Tags/Kb
CDS_Exons           36747178            0                   0.00
5'UTR_Exons         15904155            0                   0.00
3'UTR_Exons         38399562            0                   0.00
Introns             1266130785          0                   0.00
TSS_up_1kb          20020387            0                   0.00
TSS_up_5kb          89078077            0                   0.00
TSS_up_10kb         162561801           0                   0.00
TES_down_1kb        20872855            0                   0.00
TES_down_5kb        88857692            0                   0.00
TES_down_10kb       157692119           0                   0.00
=====================================================================

The 'chr' prefix was missing in my BAM file, so I tried running it after grepping it out; the output was the same.

echo 'executing: read_distribution.py' ; read_distribution.py --input-file $BAM --refgene ~/PROJECTS/RNASEQ/references/RSeQC/hg19_UCSC_knownGene_chr.bed > $PREFIX.read_distribution.txt


Please let me know if I'm going about this the right way.

cheers,
J
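
If the BAM really uses names such as 1, 2, X while the BED uses chr1, chr2, chrX, one option is to rewrite the BED so its first column matches. The sketch below assumes exactly that situation; the function names and file paths are hypothetical. Contigs whose names differ in other ways (for example chrM versus MT) would still need a separate mapping.

```python
def strip_chr_prefix(bed_line):
    """Drop a leading 'chr' from the chromosome column of one BED line."""
    if bed_line.startswith("chr"):
        return bed_line[3:]
    return bed_line  # header, track, or already prefix-free lines pass through


def rewrite_bed(src_path, dst_path):
    """Copy a BED file, removing the 'chr' prefix from each data line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_chr_prefix(line))


# Made-up line showing the effect on a single record.
print(strip_chr_prefix("chr1\t10\t500\tgeneA\t0\t+"))
```

After rewriting, the new BED would be passed to read_distribution.py in place of the original, and it is worth confirming against the BAM header that the names now match.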