[None for x in range(max(self.sequence_name2sequence_i[-1].values())+1)]

gina...@gmail.com

Jan 26, 2016, 3:41:55 PM
to EMIRGE users

Hello EMIRGE users,


I'm new to the program and I can't quite figure out how to troubleshoot the error that I'm running into. It looks similar to the error discussed in a previous post, where no reads were aligning, but that does not seem to be the problem in my case. I'm on a Mac running OS X El Capitan, if that matters. Any thoughts?


Thanks!



>emirge.py ./EMIRGE_32 -1 32_S32_L001_R1_001.fastq -2 32_S32_L001_R2_001.fastq -l 302 -f ~/refs/SSURef/SSURef_111_candidate_db.fasta -b ~/refs/SSURef/SSURef_111_candidate_db_formated -i 500 -s 500 --phred33


If you use EMIRGE in your work, please cite these manuscripts, as appropriate.


Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)

EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.

Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.


Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)

Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.

PloS one 8: e56018. doi:10.1371/journal.pone.0056018.


imported _emirge C functions from: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/_emirge.so

Command:

/Users/Professional/bin/emirge ./EMIRGE_32 -1 32_S32_L001_R1_001.fastq -2 32_S32_L001_R2_001.fastq -l 302 -f /Users/Professional/refs/SSURef/SSURef_111_candidate_db.fasta -b /Users/Professional/refs/SSURef/SSURef_111_candidate_db_formated -i 500 -s 500 --phred33


EMIRGE started at Tue Jan 26 12:47:19 2016

Performing initial mapping with command:

cat  /Users/Professional/Dropbox/Water_Heater_Microbes/Sequencing/whole_genomes/2014_May_MiSeq/fastq_files/32_S32_L001_R1_001.fastq |  bowtie --phred33-quals -t -p 1 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 302 --maxins 2000 /Users/Professional/refs/SSURef/SSURef_111_candidate_db_formated -1 - -2 /Users/Professional/Dropbox/Water_Heater_Microbes/Sequencing/whole_genomes/2014_May_MiSeq/fastq_files/32_S32_L001_R2_001.fastq | samtools view -b -S -u -F 0x0004 - > /Users/Professional/Dropbox/Water_Heater_Microbes/Sequencing/whole_genomes/2014_May_MiSeq/fastq_files/EMIRGE_32/initial_mapping/initial_bowtie_mapping.PE.bam 

Time loading reference: 00:00:00

Time loading forward index: 00:00:00

Time loading mirror index: 00:00:00

[samopen] SAM header is present: 150807 sequences.

Seeded quality full-index search: 00:11:44

# reads processed: 849199

# reads with at least one reported alignment: 682 (0.08%)

# reads that failed to align: 848517 (99.92%)

Reported 682 paired-end alignments to 1 output stream(s)

Time searching: 00:11:44

Overall time: 00:11:44

Beginning initialization at Tue Jan 26 12:59:04 2016...

Reading bam file /Users/Professional/Dropbox/Water_Heater_Microbes/Sequencing/whole_genomes/2014_May_MiSeq/fastq_files/EMIRGE_32/initial_mapping/initial_bowtie_mapping.PE.bam at Tue Jan 26 12:59:04 2016...

Traceback (most recent call last):
  File "/Users/Professional/bin/emirge", line 1697, in <module>
    main()
  File "/Users/Professional/bin/emirge", line 1681, in main
    em.initialize_EM(options.mapping, options.fasta_db)
  File "/Users/Professional/bin/emirge", line 337, in initialize_EM
    self.read_bam(bam_filename, reference_fasta_filename)
  File "/Users/Professional/bin/emirge", line 297, in read_bam
    self.probN = [None for x in range(max(self.sequence_name2sequence_i[-1].values())+1)]
ValueError: max() arg is an empty sequence
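
For readers hitting the same traceback: the final frame calls max() on the values of a mapping, and Python raises this ValueError whenever that collection is empty. The following is a minimal sketch of a guard that fails with a more descriptive message. The function name allocate_prob_slots and the sample dictionary are hypothetical stand-ins, not EMIRGE code.

```python
def allocate_prob_slots(name_to_index):
    """Build one None placeholder per sequence index.

    name_to_index is a hypothetical stand-in for a mapping like
    sequence_name2sequence_i[-1] (sequence name -> integer index).
    """
    if not name_to_index:
        # max() of an empty collection raises the bare ValueError seen in
        # the traceback; failing early gives a more useful message.
        raise ValueError("no mapped reads were read from the BAM file, "
                         "so there are no sequences to index")
    return [None for _ in range(max(name_to_index.values()) + 1)]


if __name__ == "__main__":
    # A populated mapping works as in the original list comprehension.
    print(allocate_prob_slots({"seqA": 0, "seqB": 2}))
    # An empty mapping now reports what is actually missing.
    try:
        allocate_prob_slots({})
    except ValueError as err:
        print("empty mapping:", err)
```

In other words, the traceback is a symptom: nothing was loaded from the initial mapping, so the place to look is the BAM file rather than line 297 itself.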


Xabier Vázquez Campos

Feb 4, 2016, 1:08:03 AM
to EMIRGE users, gina...@gmail.com
I'm getting the same message, although in my case the mapping doesn't find any matches at all.

By the way, I realised that the -l option in EMIRGE is passed to Bowtie as --minins, but it is defined as MAX_READ_LENGTH in the EMIRGE help. Isn't that weird?
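
The two Bowtie commands logged in this thread are consistent with the insert-size window being derived as --maxins = mean + 3*sd and --minins = max(mean - 3*sd, value of -l). That would explain why -l shows up as --minins whenever the standard deviation is large relative to the mean. The sketch below only checks this inferred rule against the logged numbers; the rule and the function name are assumptions drawn from the logs, not taken from the EMIRGE source.

```python
def bowtie_insert_window(insert_mean, insert_sd, max_read_length):
    """Return (minins, maxins) under the rule inferred from the logs."""
    # Assumed rule: the lower bound never drops below the max read length.
    minins = max(insert_mean - 3 * insert_sd, max_read_length)
    maxins = insert_mean + 3 * insert_sd
    return minins, maxins


if __name__ == "__main__":
    # First post: -l 302 -i 500 -s 500; log shows --minins 302 --maxins 2000
    print(bowtie_insert_window(500, 500, 302))
    # Second post: -l 880 -i 480 -s 300; log shows --minins 880 --maxins 1380
    print(bowtie_insert_window(480, 300, 880))
```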

If you use EMIRGE in your work, please cite these manuscripts, as appropriate.

Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.

Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.

imported _emirge C functions from: /home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/_emirge.so
Command:
/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/emirge.py /srv/scratch/z3382651/metagenomes -1 /srv/scratch/z3382651/metagenomes/high_quality_genomes/028-LFA_S1_L001_R1_001_val_1.fq -2 /srv/scratch/z3382651/metagenomes/high_quality_genomes/028-LFA_S1_L001_R2_001_val_2.fq -f /srv/scratch/z3382651/emirge_db/SSURef_111_candidate_db.fasta -b /srv/scratch/z3382651/emirge_db/SSURef_111_candidate_db -l 880 -i 480 -s 300 -a 12 --phred33

EMIRGE started at Thu Feb  4 16:41:28 2016
Time loading reference: 00:00:01

Time loading forward index: 00:00:00
Time loading mirror index: 00:00:01

[samopen] SAM header is present: 150807 sequences.
Seeded quality full-index search: 00:06:31
# reads processed: 4903537
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 4903537 (100.00%)
No alignments
Time searching: 00:06:33
Overall time: 00:06:33

Performing initial mapping with command:
cat  /srv/scratch/z3382651/metagenomes/high_quality_genomes/028-LFA_S1_L001_R1_001_val_1.fq |  bowtie --phred33-quals -t -p 12 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 880 --maxins 1380 /srv/scratch/z3382651/emirge_db/SSURef_111_candidate_db -1 - -2 /srv/scratch/z3382651/metagenomes/high_quality_genomes/028-LFA_S1_L001_R2_001_val_2.fq | samtools view -b -S -u -F 0x0004 - > /srv/scratch/z3382651/metagenomes/initial_mapping/initial_bowtie_mapping.PE.bam
Beginning initialization at Thu Feb  4 16:48:01 2016...
Reading bam file /srv/scratch/z3382651/metagenomes/initial_mapping/initial_bowtie_mapping.PE.bam at Thu Feb  4 16:48:01 2016...

Traceback (most recent call last):
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/emirge.py", line 1697, in <module>
    main()
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/emirge.py", line 1681, in main
    em.initialize_EM(options.mapping, options.fasta_db)
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/emirge.py", line 337, in initialize_EM
    self.read_bam(bam_filename, reference_fasta_filename)
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/emirge.py", line 297, in read_bam
    self.probN = [None for x in range(max(self.sequence_name2sequence_i[-1].values())+1)]
ValueError: max() arg is an empty sequence

mpac...@gmail.com

Feb 6, 2016, 12:11:55 AM
to EMIRGE users, gina...@gmail.com
Hello,

I just started using it myself and I got the same error. I noticed that I also get a very high number of reads that failed to align, so I wonder if this caused the error. Any insights from the developers? Any ideas on how to improve the number of aligned reads? Did we fail to assign the -i and -s flags correctly?

Thank you,
Maria
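
One way to quantify the alignment rate is to pull the counts out of the Bowtie summary lines that appear in the EMIRGE log. This is a rough sketch, assuming the summary format shown in the logs above; the function name and the regular expressions are illustrative only.

```python
import re

PROCESSED = re.compile(r"^# reads processed: (\d+)", re.MULTILINE)
ALIGNED = re.compile(
    r"^# reads with at least one reported alignment: (\d+)", re.MULTILINE)


def alignment_rate(log_text):
    """Return (aligned, processed, percent) parsed from a Bowtie summary."""
    processed_match = PROCESSED.search(log_text)
    aligned_match = ALIGNED.search(log_text)
    if processed_match is None or aligned_match is None:
        raise ValueError("Bowtie summary lines not found in log text")
    processed = int(processed_match.group(1))
    aligned = int(aligned_match.group(1))
    # Avoid dividing by zero if Bowtie processed no reads at all.
    percent = 100.0 * aligned / processed if processed else 0.0
    return aligned, processed, percent


if __name__ == "__main__":
    # Summary lines copied from the first post in this thread.
    sample = (
        "# reads processed: 849199\n"
        "# reads with at least one reported alignment: 682 (0.08%)\n"
        "# reads that failed to align: 848517 (99.92%)\n"
    )
    aligned, processed, percent = alignment_rate(sample)
    print("%d of %d reads aligned (%.2f%%)" % (aligned, processed, percent))
```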

mpac...@gmail.com

Feb 6, 2016, 12:12:59 AM
to EMIRGE users
Good catch, Xabier. That doesn't seem right.

Chris Miller

Feb 6, 2016, 12:45:19 AM
to EMIRGE users, mpac...@gmail.com
I think Xabier's error is different (0 alignments). It happens because the max read length passed (880 bp) doesn't make sense, either for Illumina reads or for his insert size (480 bp). See the other thread for details.

Chris

Chris Miller

Feb 6, 2016, 12:57:27 AM
to EMIRGE users, gina...@gmail.com
I can't figure this out either.  Would you mind posting privately or sending me the bamfile from the initial mapping in an email?

/Users/Professional/Dropbox/Water_Heater_Microbes/Sequencing/whole_genomes/2014_May_MiSeq/fastq_files/EMIRGE_32/initial_mapping/initial_bowtie_mapping.PE.bam

Also, do your reads overlap?  If so, did you do anything to trim adapter sequences?

Chris

Gina Wilp

Feb 9, 2016, 2:40:33 PM
to EMIRGE users, gina...@gmail.com
I can't find a way to send a private reply, but I looked at the BAM file, and I'm guessing the problem does indeed stem from something going wrong while generating the initial mapping.

samtools flagstat initial_bowtie_mapping.PE.bam 

0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
0 + 0 mapped (nan%:nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (nan%:nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (nan%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


samtools view -h initial_bowtie_mapping.PE.bam | head

@HD VN:1.0 SO:unsorted
@SQ SN:FJ788112.1.2000 LN:2000
@SQ SN:JF742194.1.2062 LN:2000
@SQ SN:GU290080.1.2000 LN:2000
@SQ SN:FJ572900.1.2000 LN:2000
@SQ SN:GU556149.1.2000 LN:2000
@SQ SN:AY288699.1.2000 LN:2000
@SQ SN:AF012514.1.2000 LN:2000
@SQ SN:AF093247.1.2007 LN:2000
@SQ SN:AF249194.1.1999 LN:1999


I'm not sure how much the reads overlap, but I had not been trimming adapters or trimming for quality prior to feeding sequences into EMIRGE. I can certainly try that, if there's a chance it will help.
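
A quick consistency check is to compare the flagstat total with the number of alignments Bowtie reported in the EMIRGE log (682 in the first post). The sketch below parses the first line of flagstat output in the format shown above. The function name is hypothetical, and the text would come from running samtools flagstat yourself.

```python
import re

TOTAL_LINE = re.compile(r"^(\d+) \+ (\d+) in total", re.MULTILINE)


def flagstat_total(flagstat_text):
    """Return QC-passed plus QC-failed record count from flagstat output."""
    match = TOTAL_LINE.search(flagstat_text)
    if match is None:
        raise ValueError("no 'in total' line found in flagstat output")
    return int(match.group(1)) + int(match.group(2))


if __name__ == "__main__":
    flagstat_output = "0 + 0 in total (QC-passed reads + QC-failed reads)\n"
    bowtie_reported = 682  # "Reported 682 paired-end alignments" in the log
    total = flagstat_total(flagstat_output)
    if total == 0 and bowtie_reported > 0:
        # The records were lost somewhere between Bowtie and the BAM file.
        print("BAM is empty although Bowtie reported alignments; "
              "the step that writes the BAM file is worth checking")
```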

Chris Miller

Feb 10, 2016, 12:56:42 AM
to EMIRGE users, gina...@gmail.com
Well that's confusing!  So what are the 682 mappings reported in the log?

You can send the bam file to my email and I can take a look.

Also, 