I am trying to extract 16S sequences to build taxonomic profiles using phyloFlash with the EMIRGE option.
Everything worked well until I processed my last sample, where EMIRGE crashed with the output below.
If you use EMIRGE in your work, please cite these manuscripts, as appropriate.
Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.
Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.
imported _emirge C functions from: /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/_emirge.so
Command:
/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py LIB_meta2.emirge -1 LIB_meta2.ready_m2_1.fq.SSU.1.fq -2 LIB_meta2.ready_m2_1.fq.SSU.2.fq -i 220 -s 83 -f ./132/SILVA_SSU.noLSU.masked.trimmed.NR96.fixed.fasta -b ./132/SILVA_SSU.noLSU.masked.trimmed.NR96.fixed.bt -l 100 -a 8 --phred33
EMIRGE started at Sat May 12 16:27:40 2018
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:03
# reads processed: 26905
# reads with at least one reported alignment: 14554 (54.09%)
# reads that failed to align: 12351 (45.91%)
Reported 14554 paired-end alignments
Time searching: 00:00:03
Overall time: 00:00:03
Performing initial mapping with command:
cat /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.ready_m2_1.fq.SSU.1.fq | bowtie --phred33-quals -t -p 8 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 100 --maxins 469 /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/132/SILVA_SSU.noLSU.masked.trimmed.NR96.fixed.bt -1 - -2 /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.ready_m2_1.fq.SSU.2.fq | samtools view -b -S -u -F 0x0004 - > /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.emirge/initial_mapping/initial_bowtie_mapping.PE.bam
Beginning initialization at Sat May 12 16:27:43 2018...
Reading bam file /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.emirge/initial_mapping/initial_bowtie_mapping.PE.bam at Sat May 12 16:27:43 2018...
DONE Reading bam file /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.emirge/initial_mapping/initial_bowtie_mapping.PE.bam at Sat May 12 16:27:45 2018 [0:00:01.626764]...
DONE with initialization at Sat May 12 16:27:45 2018...
Starting iteration 0 at Sat May 12 16:27:45 2018...
Reading bam file /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.emirge/initial_mapping/initial_bowtie_mapping.PE.bam at Sat May 12 16:27:45 2018...
DONE Reading bam file /mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/phyloFlash-pf3.0b1/LIB_meta2.emirge/initial_mapping/initial_bowtie_mapping.PE.bam at Sat May 12 16:27:46 2018 [0:00:01.185531]...
Calculating likelihood (4070, 29108) for iteration 0 at Sat May 12 16:27:46 2018...
Calculating Pr(N=n) for iteration 0 at Sat May 12 16:27:46 2018...
Traceback (most recent call last):
  File "/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py", line 1697, in <module>
    main()
  File "/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py", line 1688, in main
    do_iterations(em, max_iter = options.iterations, save_every = options.save_every)
  File "/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py", line 1439, in do_iterations
    em.do_iteration(em.current_bam_filename, em.current_reference_fasta_filename)
  File "/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py", line 491, in do_iteration
    self.calc_likelihoods()
  File "/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py", line 1099, in calc_likelihoods
    self.calc_probN() # (handles initial iteration differently within this method)
  File "/mnt/7c8fd4e1-c269-4a73-9454-b988d49f9139/EMIRGE/emirge.py", line 1270, in calc_probN
    bases = numpy.array(self.fastafile.fetch(fastaname).upper(), dtype='c')[zero_indices[0]]
  File "pysam/libcfaidx.pyx", line 302, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'JU993969.1.1226' not present"
I didn't have this problem with my other samples, so I don't really know why it happened now.
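
The KeyError says that pysam could not find 'JU993969.1.1226' when fetching from the reference FASTA, so a first check is whether that ID is really present in the file passed with -f. Below is a minimal sketch of such a check in plain Python. The helper names `fasta_ids` and `missing_ids` are hypothetical (they are not part of EMIRGE, phyloFlash, or pysam), and the sketch assumes the sequence ID is the first whitespace-delimited word of each header line.

```python
def fasta_ids(path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def missing_ids(path, wanted):
    """Return the IDs from `wanted` that do not occur in the FASTA file."""
    present = fasta_ids(path)
    return [name for name in wanted if name not in present]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_ids.py reference.fasta ID [ID ...]
    if len(sys.argv) >= 3:
        for name in missing_ids(sys.argv[1], sys.argv[2:]):
            print("not found in FASTA:", name)
```

Running this against ./132/SILVA_SSU.noLSU.masked.trimmed.NR96.fixed.fasta with JU993969.1.1226 as the ID would show whether the sequence is absent from the FASTA itself. If the ID is present there, a possible explanation is that the FASTA index (.fai) used by pysam, or the bowtie index passed with -b, was built from a different version of the file; that is an assumption to verify, not something the log confirms.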