emirge error

xvaz...@gmail.com

Feb 1, 2016, 12:54:13 AM
to EMIRGE users
Hi,

I've been trying to run emirge for quite a while with no luck.

The input reads have been previously filtered and trimmed.

Any idea what could be wrong?

The usearch, samtools and bowtie versions are the ones indicated in the README.

This is the log:

If you use EMIRGE in your work, please cite these manuscripts, as appropriate.

Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.

Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.

imported _emirge C functions from: /share/apps/emirge/0.5.0/lib/python2.7/site-packages/_emirge.so
Command:
/share/apps/emirge/0.5.0/bin/emirge.py /srv/scratch/z3382651/metagenomes -1 028-LFA_S1_L001_R1_001_val_1.fq -2 028-LFA_S1_L001_R2_001_val_2.fq -f /srv/scratch/z3382651/emirge_db/SILVA_123_SSURef_Nr99_tax_silva.fasta -b /srv/scratch/z3382651/emirge_db/SILVA_123_SSURef_Nr99_tax_silva -l 880 -i 480 -s 117 -a 12 --phred33

EMIRGE started at Mon Feb  1 11:41:43 2016
Time loading reference: 00:00:03
Time loading forward index: 00:00:23
Time loading mirror index: 00:00:25
[samopen] SAM header is present: 597607 sequences.
cat: write error: Broken pipe
[sam_read1] reference '77' is recognized as '*'.
[main_samview] truncated file.
Performing initial mapping with command:
cat  028-LFA_S1_L001_R1_001_val_1.fq |  bowtie --phred33-quals -t -p 12 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 880 --maxins 831 /srv/scratch/z3382651/emirge_db/SILVA_123_SSURef_Nr99_tax_silva -1 - -2 028-LFA_S1_L001_R2_001_val_2.fq | samtools view -b -S -u -F 0x0004 - > /srv/scratch/z3382651/metagenomes/initial_mapping/initial_bowtie_mapping.PE.bam
Traceback (most recent call last):
  File "/share/apps/emirge/0.5.0/bin/emirge.py", line 1697, in <module>
    main()
  File "/share/apps/emirge/0.5.0/bin/emirge.py", line 1659, in main
    options.mapping = do_initial_mapping(working_dir, options)
  File "/share/apps/emirge/0.5.0/bin/emirge.py", line 1486, in do_initial_mapping
    check_call(cmd, shell=True, stdout = sys.stdout, stderr = sys.stderr)
  File "/share/apps/python/2.7.6/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat  028-LFA_S1_L001_R1_001_val_1.fq |  bowtie --phred33-quals -t -p 12 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 880 --maxins 831 /srv/scratch/z3382651/emirge_db/SILVA_123_SSURef_Nr99_tax_silva -1 - -2 028-LFA_S1_L001_R2_001_val_2.fq | samtools view -b -S -u -F 0x0004 - > /srv/scratch/z3382651/metagenomes/initial_mapping/initial_bowtie_mapping.PE.bam ' returned non-zero exit status 1

Chris Miller

Feb 6, 2016, 12:24:01 AM
to EMIRGE users
Sorry you're having trouble.

Looks to me like a bogus bowtie command line is getting built based on the input parameters you specified.  EMIRGE should handle the suspect input parameters more gracefully, but in the meantime, try fixing your max read length.

Your command line includes
-l 880 -i 480
This indicates you have a maximum read length of 880 bp on an insert of 480 bp, which is not possible.
    -l MAX_READ_LENGTH, --max_read_length=MAX_READ_LENGTH
                        length of longest read in input data.
                    EMIRGE expects ASCII-offset of 64 for quality scores.
    -i INSERT_MEAN, --insert_mean=INSERT_MEAN
                        insert size distribution mean.

Your max read length is likely less than 300 bp, if this is Illumina.
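
As a rough illustration of why these values fail: the bowtie commands logged in this thread are consistent with --minins being set to the -l value and --maxins to the insert mean plus three standard deviations. That relationship is inferred from the logged command lines only, not taken from the EMIRGE source, and the function names below are hypothetical. Under that assumption, a small pre-flight check could look like this:

```python
def bowtie_insert_bounds(max_read_length, insert_mean, insert_stddev):
    """Return (minins, maxins) under the relationship inferred from the logs.

    Assumption (not confirmed from EMIRGE source): minins is the -l value and
    maxins is the -i value plus three times the -s value.
    """
    minins = max_read_length
    maxins = insert_mean + 3 * insert_stddev
    return minins, maxins


def check_params(max_read_length, insert_mean, insert_stddev):
    """Return a list of human-readable problems; an empty list means no issue found."""
    minins, maxins = bowtie_insert_bounds(max_read_length, insert_mean, insert_stddev)
    problems = []
    if max_read_length > insert_mean:
        problems.append(
            "-l (%d) is larger than -i (%d)" % (max_read_length, insert_mean)
        )
    if minins > maxins:
        problems.append(
            "--minins (%d) would exceed --maxins (%d)" % (minins, maxins)
        )
    return problems


if __name__ == "__main__":
    # Values from the first post: -l 880 -i 480 -s 117
    print(bowtie_insert_bounds(880, 480, 117))  # (880, 831)
    for problem in check_params(880, 480, 117):
        print(problem)
```

With -l 880 -i 480 -s 117 this gives --minins 880 and --maxins 831, the same pair that appears in the failing command, so bowtie is asked for inserts that are at least 880 bp and at most 831 bp.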

Chris

Xabier Vázquez Campos

Feb 6, 2016, 1:34:46 AM
to EMIRGE users
The max read length is based on the labchip results of the libraries minus the adapter size. So it has a median of 480 in a range of 180-880.

In any case, I set new parameters (-l 70 -i 480 -s 400). This time it picked up more than 1000 reads and shows over 800 in the priors file, but it ends up reporting the same sequence in all 40 iterations.

I attach the log just in case:
water_emirge300.o3819817

Chris Miller

Feb 7, 2016, 8:50:42 AM
to EMIRGE users
You are describing inferring the insert size distribution from Bioanalyzer / labchip results.  The -l parameter is for the maximum read length.  That is, how many cycles were performed on the Illumina run.  You can get this by looking at the first record (first 4 lines) of your untrimmed fastq file.
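
A minimal sketch of that check, assuming an uncompressed FASTQ file with exactly four lines per record (the function name is hypothetical, and a gzipped file would need gzip.open instead of open):

```python
import itertools


def read_lengths(fastq_path, n_records=1):
    """Yield the sequence length of each of the first n_records FASTQ records.

    Assumes plain-text FASTQ with four lines per record:
    header, sequence, '+' separator, qualities.
    """
    with open(fastq_path) as handle:
        for i, line in enumerate(itertools.islice(handle, 4 * n_records)):
            if i % 4 == 1:  # second line of each record holds the sequence
                yield len(line.rstrip("\r\n"))
```

On an untrimmed file, next(read_lengths(path)) gives the length of the first read. On a trimmed file, where lengths vary, max(read_lengths(path, n_records=100000)) over a larger sample is one way to estimate the longest read.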

Was any read trimming performed?  EMIRGE tends to perform better with quality-trimmed reads, even if the trimming is simple.

Do your paired-end reads overlap substantially, given your read length and insert size distribution?  If most reads overlap, I recommend merging with a program like SeqPrep or bbmap/bbmerge, then passing these in as single reads to EMIRGE.
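
The overlap question is simple arithmetic, sketched below under stated assumptions: "insert size" is taken to mean the full fragment length covered by both mates (adapters excluded), both mates have the same length, and insert sizes are treated as normally distributed. The function names, the 10 bp minimum overlap and the 250 bp read length in the note are illustrative, not values from this thread.

```python
import math


def expected_overlap(read_length, insert_mean):
    """Mean overlap in bp between the two mates; a negative value means a gap."""
    return 2 * read_length - insert_mean


def fraction_overlapping(read_length, insert_mean, insert_stddev, min_overlap=10):
    """Estimated fraction of pairs whose mates overlap by at least min_overlap bp.

    Assumes insert sizes follow a normal distribution (an approximation) and
    that insert_stddev is greater than zero.
    """
    # A pair overlaps by >= min_overlap when its insert is at most this long.
    threshold = 2 * read_length - min_overlap
    z = (threshold - insert_mean) / (insert_stddev * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))
```

For example, hypothetical 250 bp reads on a 480 bp mean insert give expected_overlap(250, 480) == 20, so only a modest share of pairs would overlap enough to merge.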

Chris

bryme...@gmail.com

Jan 28, 2017, 11:11:52 PM
to EMIRGE users
Hi,

I'm getting what appears to be the same error as the initial poster. I'm using the latest version of EMIRGE as installed from Bioconda. I can't figure out what could be going wrong. Thanks so much.

Bryan

emirge.py sample8_emirge_out -1 /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R1.fastq -2 /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R2.fastq -f /scratch/users/bmerrill/software/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b /scratch/users/bmerrill/software/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed -l 151 -i 355 -s 95 -a 8 --phred33
If you use EMIRGE in your work, please cite these manuscripts, as appropriate.

Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.

Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.

imported _emirge C functions from: /home/bmerrill/anaconda2/envs/emirge2/lib/python2.7/site-packages/_emirge.so
Command:
/home/bmerrill/anaconda2/envs/emirge2/bin/emirge.py sample8_emirge_out -1 /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R1.fastq -2 /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R2.fastq -f /scratch/users/bmerrill/software/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b /scratch/users/bmerrill/software/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed -l 151 -i 355 -s 95 -a 8 --phred33

EMIRGE started at Wed Jan 25 22:47:54 2017
Performing initial mapping with command:
cat  /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R1.fastq |  bowtie --phred33-quals -t -p 8 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 151 --maxins 640 /scratch/users/bmerrill/software/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed -1 - -2 /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R2.fastq | samtools view -b -S -u -F 0x0004 - > /scratch/users/bmerrill/data/assembly_tests/16s_from_hadza/sample8_emirge_out/initial_mapping/initial_bowtie_mapping.PE.bam
Time loading reference: 00:00:00
Time loading forward index: 00:00:03
cat: write error: Broken pipe
Time loading mirror index: 00:00:05
Error: reads file does not look like a FASTQ file
Seeded quality full-index search: 00:00:00
Time searching: 00:00:08
Overall time: 00:00:08
Command: bowtie-align --wrapper basic-0 --phred33-quals -t -p 8 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 151 --maxins 640 -1 - -2 /scratch/PI/justins2/2016-Hadza_Project/Hadza_Metagenome/raw_fastq/sample8_R2.fastq /scratch/users/bmerrill/software/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed
Beginning initialization at Wed Jan 25 22:48:04 2017...
Reading bam file /scratch/users/bmerrill/data/assembly_tests/16s_from_hadza/sample8_emirge_out/initial_mapping/initial_bowtie_mapping.PE.bam at Wed Jan 25 22:48:04 2017...
[fai_load] build FASTA index.
Traceback (most recent call last):
  File "/home/bmerrill/anaconda2/envs/emirge2/bin/emirge.py", line 1697, in <module>
    main()
  File "/home/bmerrill/anaconda2/envs/emirge2/bin/emirge.py", line 1681, in main
    em.initialize_EM(options.mapping, options.fasta_db)
  File "/home/bmerrill/anaconda2/envs/emirge2/bin/emirge.py", line 337, in initialize_EM
    self.read_bam(bam_filename, reference_fasta_filename)
  File "/home/bmerrill/anaconda2/envs/emirge2/bin/emirge.py", line 297, in read_bam
    self.probN = [None for x in range(max(self.sequence_name2sequence_i[-1].values())+1)]
ValueError: max() arg is an empty sequence
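
In the log above, bowtie reports "Error: reads file does not look like a FASTQ file" before the traceback. The thread does not establish why, so the following is only a generic sanity check on the first record of a reads file, not what bowtie does internally. The function name is hypothetical, and the gzip test is a guess at one common cause rather than a diagnosis of this case.

```python
def first_record_problems(fastq_path):
    """Return a list of problems found in the first FASTQ record (empty if none)."""
    with open(fastq_path, "rb") as handle:
        record = [handle.readline() for _ in range(4)]

    # Gzip files start with the two magic bytes 0x1f 0x8b.
    if record[0][:2] == b"\x1f\x8b":
        return ["file starts with gzip magic bytes; it appears to be compressed"]

    header, seq, plus, qual = (line.rstrip(b"\r\n") for line in record)
    problems = []
    if not header.startswith(b"@"):
        problems.append("first line does not start with '@'")
    if not seq:
        problems.append("sequence line is empty")
    if not plus.startswith(b"+"):
        problems.append("third line does not start with '+'")
    if len(seq) != len(qual):
        problems.append("sequence and quality lines differ in length")
    return problems
```

Running this on both the -1 and -2 files before starting EMIRGE would flag the most obvious format problems early, ahead of the initial mapping step.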

hem...@uoguelph.ca

Mar 5, 2020, 6:20:29 PM
to EMIRGE users
Hi Chris,
I'm getting the exact same error as Bryan, 3 years later.
Any news on that error?
Cheers, Chris