Why do I lose 36.8% of reads while merging reads?


Abdullahi Muhammad

Mar 17, 2022, 10:23:11 AM
to Microbiome Helper
Hello everyone,
Before running PEAR to merge my reads, I used FastQC to visualize read quality. After stitching, I observed a decrease in the total number of sequences in the stitched reads compared to the unstitched reads, as shown in the pictures. The PEAR report shows that 36.8% of reads were discarded. My questions are: (1) Why were those reads discarded? Is it because they failed to meet the quality score threshold, or because they contain uncalled bases (N)? (2) Could the discarded reads account for the loss in reads, and why is this happening? I used the PEAR default parameters to merge the forward and reverse reads. I look forward to your marvellous answers.
2022-03-17.png
2022-03-17 (2).png
2022-03-17 (1).png

Andre Comeau

Mar 21, 2022, 4:03:53 PM
to Microbiome Helper
If you look at the pear_full_log.txt, you should be able to see more extensive info that will look like the below:

Forward reads file.................: raw_data/06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001_R1_001.fastq.gz
Reverse reads file.................: raw_data/06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001_R2_001.fastq.gz
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 50
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 40

Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....: DONE
  A: 0.236613
  C: 0.254660
  G: 0.246537
  T: 0.262190
  112618 uncalled bases
Assemblying reads: 0%
Assemblying reads: 100%

Assembled reads ...................: 276,910 / 296,140 (93.506%)
Discarded reads ...................: 0 / 296,140 (0.000%)
Not assembled reads ...............: 19,230 / 296,140 (6.494%)
Assembled reads file...............: stitched_reads//06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001.assembled.fastq
Discarded reads file...............: stitched_reads//06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001.discarded.fastq
Unassembled forward reads file.....: stitched_reads//06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001.unassembled.forward.fastq
Unassembled reverse reads file.....: stitched_reads//06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001.unassembled.reverse.fastq

...so there is a section there (the "uncalled bases" line) that will indicate if you have a lot of Ns in your sequences. However, if you are using the PEAR defaults, the "Maximal ratio of uncalled bases = 1.0" option shown above means that any number of Ns is allowed in the reads and they would still be accepted. Could you check that your settings match all the default options above? The one Q-score profile you show, for a forward read file, looks very good.
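For comparing these numbers across several samples, the summary lines can be pulled out of the log programmatically. The following is only a minimal Python sketch written against the log layout shown above; it is not part of PEAR or Microbiome Helper, the name parse_pear_summary and the regular expression are illustrative assumptions, and other PEAR versions may format these lines differently.

```python
import re

# Matches summary lines such as
#   "Discarded reads ...................: 0 / 296,140 (0.000%)"
# but not the "... reads file....: path" lines (assumed layout, see lead-in).
SUMMARY_RE = re.compile(
    r"^(Assembled|Discarded|Not assembled) reads\s*\.+:\s*([\d,]+)\s*/\s*([\d,]+)"
)

# Summary lines copied from the example log above.
SAMPLE_LOG = """\
Assembled reads ...................: 276,910 / 296,140 (93.506%)
Discarded reads ...................: 0 / 296,140 (0.000%)
Not assembled reads ...............: 19,230 / 296,140 (6.494%)
Assembled reads file...............: stitched_reads//06012020-EXP-MK-NM-1-TM-T0-BT-00_S293_L001.assembled.fastq
"""


def parse_pear_summary(log_text):
    """Return {category: (count, total)} for the three PEAR summary lines."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            label, count, total = match.groups()
            counts[label] = (int(count.replace(",", "")),
                             int(total.replace(",", "")))
    return counts


if __name__ == "__main__":
    for label, (count, total) in parse_pear_summary(SAMPLE_LOG).items():
        print(f"{label}: {count} of {total} ({100 * count / total:.1f}%)")
```

Note that the log reports "Discarded" and "Not assembled" reads as separate categories, so it is worth checking which of the two the 36.8% actually falls under.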

On another note, I'm a bit worried that your target fragment is too long and therefore difficult to reassemble, since your assembled FastQC report shows lengths approaching ~600 bp. Which 16S fragment is this? We usually target PCR products of around 450 bp so that the 2x300 bp MiSeq reads overlap by around 150 bp. That is enough to correct the lower-quality read ends and to make stitching/assembling straightforward, even with lower-quality runs.
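The overlap arithmetic behind that rule of thumb can be sketched in a few lines of Python. This is a simplified illustration, not how PEAR computes overlaps: the function name expected_overlap is hypothetical, and it assumes untrimmed, full-length reads and an amplicon length that includes the primers.

```python
def expected_overlap(read_length, amplicon_length):
    """Approximate overlap (bp) between the forward and reverse read of a pair.

    Assumes both reads are full length (no trimming) and that amplicon_length
    includes the primers. Returns 0 when the reads cannot reach each other.
    """
    overlap = 2 * read_length - amplicon_length
    # Reads cannot overlap by more than the fragment itself.
    return max(0, min(overlap, amplicon_length))


if __name__ == "__main__":
    for amplicon in (450, 600):
        print(amplicon, "bp amplicon:", expected_overlap(300, amplicon), "bp overlap")
```

With 2x300 bp reads, a 450 bp product gives the ~150 bp overlap mentioned above, while a fragment approaching 600 bp leaves essentially no overlap, which is below the "Minimum overlap: 10" setting shown in the log.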



ANDRÉ M. COMEAU, PhD
Manager  Integrated Microbiome Resource (IMR)
T: 902.494.2684 | E: andre....@dal.ca 

Address for deliveries:
Dept. of Pharmacology
Tupper Med. Bldg., room 5D
Dalhousie University
5850 College St.
Halifax NS B3H 4R2 

Research Associate (Lab Manager)

Morgan Langille Lab  Dept. of Pharmacology
ResearchGate Profile GoogleScholar Publications


"Without fantasy, there is no science. Without fact, there is no art." - Nabokov
"The good thing about science is that it's true whether or not you believe in it." - Neil deGrasse Tyson 


Message has been deleted
Message has been deleted

Abdullahi Muhammad

Mar 22, 2022, 6:41:37 AM
to Microbiome Helper
Hello,
I don't know if you received my last message; it seems it was deleted. Could you confirm this, please?
Thank you

Andre Comeau

Mar 22, 2022, 4:51:12 PM
to Microbiome Helper
Could you show the bottom part of that full log where it shows the total numbers of reads being processed (ie: # of assembled/discarded/not assembled reads) for this first sample (reads1 + reads2 files)?

You have an enormous number of Ns in those reads = 809 million uncalled bases!
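To see how those Ns are distributed, the raw FASTQ files can be checked directly. Below is a minimal Python sketch under stated assumptions: it expects standard 4-line FASTQ records, the helper names open_fastq and count_uncalled_bases are hypothetical, and the file name in the usage comment is only a placeholder.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz compression by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_uncalled_bases(lines):
    """Return (n_bases, total_bases, reads_with_n) for 4-line FASTQ records."""
    n_bases = total_bases = reads_with_n = 0
    # The sequence is the 2nd line of every 4-line record (indices 1, 5, 9, ...).
    for seq in islice(lines, 1, None, 4):
        seq = seq.strip().upper()
        n = seq.count("N")
        n_bases += n
        total_bases += len(seq)
        if n > 0:
            reads_with_n += 1
    return n_bases, total_bases, reads_with_n


# Usage (placeholder file name, replace with your own R1/R2 file):
# with open_fastq("sample_R1.fastq.gz") as handle:
#     n_bases, total_bases, reads_with_n = count_uncalled_bases(handle)
#     print(n_bases, total_bases, reads_with_n)
```

Running this on the R1 and R2 files separately would show whether the uncalled bases are concentrated in one read direction or spread across both.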


Sent: Monday, March 21, 2022 5:31 PM
I have confirmed that my parameters match the defaults, as seen in the attached file. Thank you very much.
Muhammad


Andre Comeau

Mar 22, 2022, 4:52:31 PM
to Microbiome Helper
OK that size should be fine for overlapping then...some of your assembled sequences were way too long in the FastQC graph.


Sent: Monday, March 21, 2022 5:23 PM
Thank you for your marvellous response. The V1-V2 region was targeted using the primers 27F and 338R, so I expect a band size of approximately 300-350 bp. The 300 bp paired-end MiSeq v3 chemistry was used for sequencing.
