Duplicated reads (estimated) vs Duplication rate in bamqc report

Jason

Aug 29, 2016, 7:47:12 PM
to QualiMap
Hi all,

I'm new to using Qualimap for RNA-seq data QC. I ran it and I'm confused by the difference between "Duplicated reads (estimated)" and "Duplication rate". How is each term defined and calculated?

In particular, how is the number of duplicated reads estimated? And why is the duplication rate different from the number of "Duplicated reads (estimated)" divided by the number of reads?

Thank you.

Jason

Konstantin Okonechnikov

Aug 31, 2016, 3:53:49 PM
to qual...@googlegroups.com
Hi Jason,

The duplicated reads are detected by counting how many positions in the genome have exactly 1, 2, 3, etc. reads starting from them. An alignment is considered a duplicate if another alignment starting from the same position has already been detected. The "Duplication Rate Histogram" plot gives a general overview.
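
As a rough illustration of the counting described above, here is a minimal Python sketch. It is not Qualimap's actual code: the function name `duplication_histogram` and the toy start positions are made up for this example, and a real implementation would presumably key positions by chromosome (and perhaps strand) as well.

```python
from collections import Counter

def duplication_histogram(start_positions):
    """Return (histogram, duplicated_reads) for a list of alignment starts.

    histogram[k] is the number of positions with exactly k reads
    starting there. Hypothetical helper, for illustration only.
    """
    reads_per_position = Counter(start_positions)
    histogram = Counter(reads_per_position.values())
    # Every read after the first one at a given position counts as a duplicate.
    duplicated_reads = sum(n - 1 for n in reads_per_position.values())
    return dict(histogram), duplicated_reads

# Toy data (assumed): three reads start at 100, two at 250, one each at 400 and 512.
starts = [100, 100, 100, 250, 250, 400, 512]
histogram, duplicated_reads = duplication_histogram(starts)
print(histogram, duplicated_reads)
```

With this toy input, two positions have exactly one read, one position has two, and one has three, so three of the seven reads are flagged as duplicates.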

The duplication rate value is estimated as 

Dup Rate = 1 - USP, 

where USP is the proportion of genomic positions having exactly one read starting from them, relative to all genomic positions that have at least one read starting from them.
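
A small worked example may help show why the two numbers differ. This is only a sketch based on the definitions above: the histogram values are invented, and `duplication_rate` is a hypothetical helper, not a Qualimap function. The rate is computed over start positions, while the duplicated reads count is taken over reads, so dividing the latter by the total number of reads does not in general reproduce the former.

```python
def duplication_rate(histogram):
    """Dup Rate = 1 - USP, given histogram[k] = number of positions
    with exactly k reads starting there (illustrative helper only)."""
    positions_with_reads = sum(histogram.values())
    if positions_with_reads == 0:
        return 0.0  # assumption: no reads means no duplication
    usp = histogram.get(1, 0) / positions_with_reads
    return 1 - usp

# Invented histogram: 2 positions with 1 read, 1 with 2 reads, 1 with 3 reads.
histogram = {1: 2, 2: 1, 3: 1}

total_reads = sum(k * n for k, n in histogram.items())
duplicated_reads = sum((k - 1) * n for k, n in histogram.items())

dup_rate = duplication_rate(histogram)          # fraction of positions
read_fraction = duplicated_reads / total_reads  # fraction of reads
print(dup_rate, read_fraction)
```

Here two of the four occupied positions have a single read, so the duplication rate is 0.5, whereas three of the seven reads are duplicates, a read-based fraction of about 0.43.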

--
  Konstantin





