assembling reads (with PEAR) leading to samples with zero reads matching to OTUs


Ricardo Ramiro

Jun 14, 2017, 1:14:14 PM
to Qiime 1 Forum


Dear all,

From sequencing, I receive already-demultiplexed paired-end reads (MiSeq, 250 bp). I have been trying the following workflow:

1. PEAR to merge paired-end reads 
2. split_libraries_fastq.py with -q19 to filter by quality
3. pick_open_reference_otus.py to generate my OTU table

While this appeared to go smoothly overall, three samples (out of 70) ended up with zero reads matching OTUs. This is odd for three reasons: I get a large number of reads if I use only the forward read, FastQC shows that the reads from these three samples look OK, and BLASTing some of these reads returns 16S sequences as top hits.

So I am not really sure what might be going on, but if anyone has some idea, I would greatly appreciate hearing about it.
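One way to narrow down where the three samples lose their reads is to count reads per sample after each step (the PEAR output, the split_libraries_fastq.py output, and the input to OTU picking). Below is a minimal Python sketch of that idea. It assumes unwrapped four-line FASTQ records or FASTA headers, and it assumes the sample ID is everything before the last underscore of the first header token (as in labels like `jbatista_2.3_d18_475`). The function names are illustrative and not part of QIIME.

```python
from collections import Counter
import io


def sample_id_from_header(header):
    """Extract the sample ID from a QIIME-style label.

    Assumption: the label is '<sample_id>_<read_number>', so the sample ID
    is everything before the LAST underscore of the first token.
    """
    label = header.lstrip("@>").split()[0]
    return label.rsplit("_", 1)[0]


def count_reads_per_sample(handle, fmt="fastq"):
    """Count records per sample in an open FASTQ or FASTA handle."""
    counts = Counter()
    if fmt == "fastq":
        # Assumes unwrapped records: header, sequence, '+', quality.
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():
                counts[sample_id_from_header(line)] += 1
    else:
        for line in handle:
            if line.startswith(">"):
                counts[sample_id_from_header(line)] += 1
    return counts


# Tiny in-memory demo; with real data, pass open("your_file.fastq") instead.
demo = io.StringIO(
    "@jbatista_2.3_d18_475 M01876:86 1:N:0:0\nTACG\n+\nIIII\n"
    "@jbatista_2.6_d12_3270 M01876:86 1:N:0:0\nTACG\n+\nIIII\n"
)
print(count_reads_per_sample(demo))
```

Running this on the file from each stage and comparing the counts for the three affected samples would show at which step their reads disappear.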

Here is how the assembled .fastq looks for a sample that ends up with zero reads:

@jbatista_2.3_d18_475 M01876:86:000000000-AL54B:1:1101:14875:1549 1:N:0:0 orig_bc=TCTAAGGCACGT new_bc=TCTAAGGCACGT bc_diffs=0
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGTCTGTTAAGTCAGCGGTCAAAGCCCGGGGCTCAACCCCGGCCCGCCGTTGAAACTGGCAGTCTCGAGTTGGAGAGAAGTATGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCATGCCGGCTCCACACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACAGG
+
CCBIIIICIFIFGIIGGIGGIIHIIIIGHHHHIIHGGGIHIIIIHIGGGIIIIIIIGGIIHIIHHIHHIIGGIIIGGIIHIIIGIGIHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIBBA
@jbatista_2.3_d18_652 M01876:86:000000000-AL54B:1:1101:13784:1603 1:N:0:0 orig_bc=TCTAAGGCACGT new_bc=TCTAAGGCACGT bc_diffs=0
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGTTCACTGGACTGCAACTGACACTGAGGCTCGAAAGTGTGGGTATCAAACAGG
+
BBBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBBB
@jbatista_2.3_d18_684 M01876:86:000000000-AL54B:1:1101:13951:1613 1:N:0:0 orig_bc=TCTAAGGCACGT new_bc=TCTAAGGCACGT bc_diffs=0
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACAGG
+
BCCIIIIIIIIFIIIFIIIIIIIIIIIIIIIIIIIIGIIIIIIIHIIIIIIIII<EIIIIIIGIIIIIIIIIIIIIIHIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBB
@jbatista_2.3_d18_685 M01876:86:000000000-AL54B:1:1101:13342:1613 1:N:0:0 orig_bc=TCTAAGGCACGT new_bc=TCTAAGGCACGT bc_diffs=0
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGTCTGTTAAGTCAGCGGTCAAAGCCCGGGGCTCAACCCCGGCCCGCCGTTGAAACTGGCAGTCTCGAGTTG


And here is how the assembled .fastq looks for a sample whose reads do match OTUs:

@jbatista_2.6_d12_3270 M01876:86:000000000-AL54B:1:1101:14128:2130 1:N:0:0 orig_bc=TGTCCAGGTTTA new_bc=TGTCCAGGTTTA bc_diffs=0
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG
+
BBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIICC
@jbatista_2.6_d12_3351 M01876:86:000000000-AL54B:1:1101:17070:2142 1:N:0:0 orig_bc=TGTCCAGGTTTA new_bc=TGTCCAGGTTTA bc_diffs=0
TACGGAGGATCCGAGCGTTATCCGGATTTATAGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGACTGCAACTGACACTGATGCTCGAAAGTGTGGGTATCAAACAGG
+
AA1IIIAIIIIIIII=IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBA33
@jbatista_2.6_d12_3491 M01876:86:000000000-AL54B:1:1101:11382:2166 1:N:0:0 orig_bc=TGTCCAGGTTTA new_bc=TGTCCAGGTTTA bc_diffs=0
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGACTGCAACTGACACTGATGCTCGAAAGTGTGGGTATCAAACAGG
+
ABAIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIAAA
@jbatista_2.6_d12_3525 M01876:86:000000000-AL54B:1:1101:16768:2172 1:N:0:0 orig_bc=TGTCCAGGTTTA new_bc=TGTCCAGGTTTA bc_diffs=0
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGACTGCAACTGACACTGATGCTCGAAAGTGTGGGTATCAAACAGG
+
BBBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII5IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBBB
@jbatista_2.6_d12_3605 M01876:86:000000000-AL54B:1:1101:17543:2185 1:N:0:0 orig_bc=TGTCCAGGTTTA new_bc=TGTCCAGGTTTA bc_diffs=0
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGGGGGGAGCAAAAAGG
+
BBAIIIIIIBIIGIIIIIIGIIIIIIIIIIIIIIIIGGIIIIIIIIIIIIIIIGIIE?IIIIIIIIIIIIIHDIII@IBIIIIIEIIIIIIIIII=1IIIIIFIIIIIIIIII=IIIDIIIIIIII
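To compare two assembled files more systematically than by eye, one option is to tally the merged read lengths and count how many bases per read fall at or below the `-q 19` cutoff. The sketch below assumes unwrapped four-line FASTQ records and Phred+33 quality encoding; both are assumptions about the data, and the function names are hypothetical.

```python
from collections import Counter
import io


def length_histogram(handle):
    """Tally sequence lengths, assuming unwrapped four-line FASTQ records."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


def low_quality_bases(quality_line, threshold=19, offset=33):
    """Count bases whose Phred score is at or below `threshold`.

    Assumption: qualities are Phred+33 encoded (offset=33).
    """
    return sum(1 for c in quality_line.strip() if ord(c) - offset <= threshold)


# Tiny in-memory demo; with real data, pass open("your_file.fastq") instead.
demo = io.StringIO("@read_1\nTACGGAGG\n+\nBBIIII1I\n")
print(length_histogram(demo))
print(low_quality_bases("BBIIII1I"))
```

If the length distribution or the number of low-quality bases differs sharply between a failing and a working sample, that points at the merging or quality-filtering step rather than at OTU picking.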


justink

Jun 14, 2017, 2:06:19 PM
to Qiime 1 Forum
I'm also unsure. I wonder whether the sample IDs might be overlapping; try validate_mapping_file.py.
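A quick check of the sample IDs themselves can complement validate_mapping_file.py. The sketch below flags IDs that contain anything other than alphanumeric characters and periods, which is the SampleID convention QIIME 1 mapping files are expected to follow. The function name and the example IDs are illustrative only.

```python
import re

# Assumption: valid QIIME 1 SampleIDs use only letters, digits, and periods.
VALID_SAMPLE_ID = re.compile(r"^[A-Za-z0-9.]+$")


def invalid_sample_ids(sample_ids):
    """Return the sorted, de-duplicated IDs that break the convention."""
    return sorted({s for s in sample_ids if not VALID_SAMPLE_ID.match(s)})


# Example: an ID with underscores versus the same ID written with periods.
print(invalid_sample_ids(["jbatista_2.3_d18", "jbatista.2.3.d18"]))
```

IDs that get flagged are worth a closer look, because downstream steps that split labels on underscores may assign reads to an unexpected sample.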

Maybe to debug further:
Take a sample that has zero matching OTUs, grab 100 or so sequences after merging the paired ends, then run the remaining steps on only that one sample with those 100 sequences.

Ricardo Ramiro

Jun 15, 2017, 7:04:22 AM
to Qiime 1 Forum
Hi,

Thanks for your reply. I do not really use the mapping file in either the split or pick_otus steps, so the mapping file should not be the problem.

Could you please explain your second suggestion in more detail? It's not totally clear to me what you mean.

justink

Jun 16, 2017, 12:30:46 AM
to Qiime 1 Forum


On Thursday, June 15, 2017 at 6:04:22 AM UTC-5, Ricardo Ramiro wrote:
> Hi,
>
> Thanks for your reply. I do not really use the mapping file in either the split or pick_otus steps, so the mapping file should not be the problem.

Right, sorry. I saw some underscores in the sample names and got worried.

> Could you please explain your second suggestion in more detail? It's not totally clear to me what you mean.

jbatista_2.6 has many sequences that don't match OTUs, right? What happens if you make a file with 100 of those sequences (or even 10) and run pick_open_reference_otus.py on only that file?
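The small test file suggested here could be produced with a sketch like the following. It copies the first N records from an open file, assuming unwrapped records: four lines per record for FASTQ, or two lines per record for single-line FASTA. The function names and file names are hypothetical.

```python
import io
from itertools import islice


def first_n_records(handle, n, lines_per_record=4):
    """Yield up to n records, each as a list of lines.

    Assumption: records are unwrapped, so a FASTQ record is exactly
    4 lines and a FASTA record is exactly 2 lines.
    """
    while n > 0:
        record = list(islice(handle, lines_per_record))
        if len(record) < lines_per_record:
            break  # end of file or incomplete trailing record
        yield record
        n -= 1


def write_subset(in_handle, out_handle, n=100, lines_per_record=4):
    """Copy the first n records to out_handle; return how many were written."""
    written = 0
    for record in first_n_records(in_handle, n, lines_per_record):
        out_handle.writelines(record)
        written += 1
    return written


# In-memory demo; with real data use something like:
#   with open("one_sample.fastq") as src, open("subset.fastq", "w") as dst:
#       write_subset(src, dst, n=100)
src = io.StringIO("".join("@r_%d\nACGT\n+\nIIII\n" % i for i in range(5)))
dst = io.StringIO()
print(write_subset(src, dst, n=2))
```

The resulting subset can then be passed through the remaining steps on its own, for example as the `-i` input of pick_open_reference_otus.py once it is in FASTA form.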

Sorry I can't be of more help.



