estimated duplicated reads are more than total reads

吳德倫(Soappp)

Oct 20, 2022, 10:02:23 PM
to QualiMap
Hello!

I used an artificial DNA sequence (282 bp) as the sequencing sample to test a library preparation protocol, and used qualimap bamqc (2.2.2d) to assess the sequencing result. Some of the statistics look wrong: the estimated number of duplicated reads is larger than the total number of reads, and the duplication rate does not match the proportion of duplicated reads. So I would like to know:
1. How does qualimap estimate the number of duplicated reads and the duplication rate? Why do these incorrect values appear?
2. What is the difference between "number of mapped bases" and "number of aligned bases" in genome_results.txt? Do they mean different things?
[Attached screenshot: 096B797B-FEA2-4dcf-B223-96103ACBEABF.png]

Konstantin Okonechnikov

Oct 31, 2022, 12:35:55 PM
to qual...@googlegroups.com
Hi,

1) Duplicated reads are estimated by counting repeated read alignment start sites. However, supplementary alignments do not increase the total number of reads, so if these supplementary alignments share the same start position, the estimate most likely ends up above 100% of the total. What "supplementary" means here typically depends on the alignment/assembly tool. I could take a look at the BAM file (or a subsample of it) to check this. The duplication rate is computed from the histogram; more details here: https://groups.google.com/g/qualimap/c/7PaVMFQ_48M/m/mHoANQiIAgAJ

2) "Number of aligned bases" is outdated and can be ignored. Thanks for the note, it should actually be removed from the report.
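
To illustrate point 1, here is a minimal toy sketch in Python of how a start-site based duplicate count can exceed the read total when supplementary alignments add to the start-site counts but not to the number of reads. This is a simplified model under stated assumptions, not Qualimap's actual code: the record layout, the function name `estimate_duplicates`, and the rule "every alignment beyond the first at a start site counts as a duplicate" are all assumptions made for illustration.

```python
from collections import Counter

# Toy alignment records: (read_name, start_position, is_supplementary).
# All values are invented for illustration only.
alignments = [
    ("r1", 1, False),
    ("r1", 1, True),
    ("r1", 1, True),
    ("r2", 1, False),
    ("r2", 1, True),
    ("r2", 1, True),
]


def estimate_duplicates(records):
    """Return (estimated_duplicates, total_reads) for the toy model.

    Assumption: each alignment beyond the first at a given start site is
    counted as a duplicate, while only non-supplementary alignments are
    counted towards the total number of reads.
    """
    starts = Counter(start for _, start, _ in records)
    duplicates = sum(n - 1 for n in starts.values())
    total_reads = sum(1 for _, _, supp in records if not supp)
    return duplicates, total_reads


duplicates, total_reads = estimate_duplicates(alignments)
print(f"estimated duplicates: {duplicates}, total reads: {total_reads}")
```

With a very short reference, nearly all alignments pile up on a handful of start sites, so the duplicate count in this model grows with every supplementary alignment while the read total stays fixed.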

Best regards,
   Konstantin 


吳德倫(Soappp)

Nov 1, 2022, 6:17:06 AM
to QualiMap
Hi Konstantin,
As you said, I found many supplementary alignments in the BAM file, and because I used an extremely short sequence as the reference, a huge number of reads start at the same position.
This is a very unusual situation, so I think I can just ignore this value.
Thank you!
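
For anyone who wants to run the same check, a rough sketch in Python follows. It counts supplementary alignments (SAM flag bit 0x800) and tallies alignment start positions from SAM-formatted text, such as the output of `samtools view -h`. The function name `summarize_sam`, the reference name `artifact`, and the example records are hypothetical and only serve to make the sketch runnable.

```python
from collections import Counter

SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
UNMAPPED = 0x4         # SAM flag bit: segment unmapped


def summarize_sam(lines):
    """Count supplementary alignments and alignments per start position."""
    n_supplementary = 0
    starts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED:
            continue  # unmapped records have no meaningful start site
        if flag & SUPPLEMENTARY:
            n_supplementary += 1
        # Key on (reference name, 1-based leftmost position).
        starts[(fields[2], int(fields[3]))] += 1
    return n_supplementary, starts


# Hypothetical SAM records against a 282 bp reference named "artifact".
example = [
    "@SQ\tSN:artifact\tLN:282",
    "r1\t0\tartifact\t1\t60\t100M\t*\t0\t0\t*\t*",
    "r1\t2048\tartifact\t1\t60\t50M50H\t*\t0\t0\t*\t*",
    "r2\t0\tartifact\t1\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

n_supplementary, starts = summarize_sam(example)
print("supplementary alignments:", n_supplementary)
print("most common start sites:", starts.most_common(3))
```

A large supplementary count together with a few start sites that hold most of the alignments would match the situation described above.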