UCHIME Error

Lawrence Davies

May 21, 2012, 11:15:02 AM

to Qiime Forum

Hi QIIMERS,

I have downloaded USEARCH, made it executable and put it in my path
(when I type echo $PATH I can see the folder that contains USEARCH)
but when I type 'usearch' I get the reply 'command not found'.

There are several posts on here relating to this problem but they
haven't helped answer my problem. The folder is clearly in my path and
executable. I have tried it with a few versions of USEARCH (v5.2.32
and 5.1.221) as one post said the newest version wasn't compatible
with QIIME 10.04.

I am not trying to run the pick_otus.py, so it has nothing to do with
my input files. I'm just typing 'usearch' or 'man usearch' to check if
it's picked up.

Any ideas would be much appreciated. This has driven me a little bit
crazy today because I'm sure it is something really simple that I am
missing......

Cheers,

Lawrence

Error message:

No command 'usearch' found, did you mean:
Command 'search' from package 'sphinxsearch' (universe)
Command 'ausearch' from package 'auditd' (universe)

Output of echo $PATH ('home/qiime/usearch' is at the end):

/software/cytoscape-2.7.0-release/.:/software/qiime-1.4.0-release/bin:/
software/pprospector-1.0.1-release/bin:/software/pynast-1.1-release/
bin:/software/uclust-1.2.22-release/.:/software/cdhit-3.1-release/.:/
software/rdpclassifier-2.2-release/.:/software/r-2.12.0-release/bin:/
software/blast-2.2.22-release/bin:/software/fasttree-2.1.3-release/.:/
software/python-2.7.1-release/bin:/software/mothur-1.6-release/.:/
software/vienna-1.8.4-release/.:/software/infernal-1.0.2-release/bin:/
software/chimeraslayer-4.29.2010-release/ChimeraSlayer:/software/
chimeraslayer-4.29.2010-release/NAST-iEr:/software/cdbtools-10.11.2010-
release/.:/software/ampliconnoise-1.25-release/Scripts:/software/
ampliconnoise-1.25-release/bin:/software/raxml-7.0.3-release/.:/
software/clearcut-1.0.9-release/.:/software/muscle-3.8.31-release/.:/
software/cytoscape-2.7.0-release/.:/software/qiime-1.4.0-release/bin:/
software/pprospector-1.0.1-release/bin:/software/pynast-1.1-release/
bin:/software/uclust-1.2.22-release/.:/software/cdhit-3.1-release/.:/
software/rdpclassifier-2.2-release/.:/software/r-2.12.0-release/bin:/
software/blast-2.2.22-release/bin:/software/fasttree-2.1.3-release/.:/
software/python-2.7.1-release/bin:/software/mothur-1.6-release/.:/
software/vienna-1.8.4-release/.:/software/infernal-1.0.2-release/bin:/
software/chimeraslayer-4.29.2010-release/ChimeraSlayer:/software/
chimeraslayer-4.29.2010-release/NAST-iEr:/software/cdbtools-10.11.2010-
release/.:/software/ampliconnoise-1.25-release/Scripts:/software/
ampliconnoise-1.25-release/bin:/software/raxml-7.0.3-release/.:/
software/clearcut-1.0.9-release/.:/software/muscle-3.8.31-release/.:/
software/cytoscape-2.7.0-release/.:/software/qiime-1.4.0-release/bin:/
software/pprospector-1.0.1-release/bin:/software/pynast-1.1-release/
bin:/software/uclust-1.2.22-release/.:/software/cdhit-3.1-release/.:/
software/rdpclassifier-2.2-release/.:/software/r-2.12.0-release/bin:/
software/blast-2.2.22-release/bin:/software/fasttree-2.1.3-release/.:/
software/python-2.7.1-release/bin:/software/mothur-1.6-release/.:/
software/vienna-1.8.4-release/.:/software/infernal-1.0.2-release/bin:/
software/chimeraslayer-4.29.2010-release/ChimeraSlayer:/software/
chimeraslayer-4.29.2010-release/NAST-iEr:/software/cdbtools-10.11.2010-
release/.:/software/ampliconnoise-1.25-release/Scripts:/software/
ampliconnoise-1.25-release/bin:/software/raxml-7.0.3-release/.:/
software/clearcut-1.0.9-release/.:/software/muscle-3.8.31-release/.:/
bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/opt/
vappio-py/vappio/cli:$PATH:home/qiime/_folder/:home/qiime/usearch/
usearch:home/qiime/usearch:home/qiime/usearch
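
One thing that may be worth checking in a PATH value like the one above is whether every entry is an absolute directory. The last few entries shown appear to lack a leading slash, and a literal `$PATH` appears as an entry. The following is only a minimal sketch, assuming Python 3 and a colon-separated Linux PATH; the function name `suspicious_path_entries` is made up for this illustration.

```python
import posixpath

def suspicious_path_entries(path_value):
    """Return (entry, reason) pairs for PATH entries that look wrong."""
    flagged = []
    for entry in path_value.split(":"):
        if not entry:
            continue  # empty entries are skipped in this sketch
        if "$" in entry:
            # A variable name that was never expanded, e.g. a literal "$PATH".
            flagged.append((entry, "unexpanded variable"))
        elif not posixpath.isabs(entry):
            # Relative entries are resolved against the current directory.
            flagged.append((entry, "relative path"))
    return flagged

# Shortened tail of the PATH value shown above, used as sample input.
sample = "/usr/bin:/sbin:$PATH:home/qiime/usearch/usearch:home/qiime/usearch"
for entry, reason in suspicious_path_entries(sample):
    print(entry, "->", reason)
```

To check the live environment instead of the sample string, pass `os.environ["PATH"]` (after `import os`).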

print_qiime_config.py

System information
==================
Platform: linux2
Python version: 2.7.1 (r271:86832, Dec 14 2011, 00:47:21) [GCC 4.4.3]
Python executable: /software/python-2.7.1-release/bin/python

Dependency versions
===================
PyCogent version: 1.5.1
NumPy version: 1.5.1
matplotlib version: 1.1.0
QIIME library version: 1.4.0
QIIME script version: 1.4.0
PyNAST version (if installed): 1.1
RDP Classifier version (if installed): rdp_classifier-2.2.jar

QIIME config values
===================
blastmat_dir: /software/blast-2.2.22-release/data
topiaryexplorer_project_dir: None
pynast_template_alignment_fp: /software/core_set_aligned.fasta.imputed
cluster_jobs_fp: /software/qiime-1.4.0-release/bin/start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /software/gg_otus-4feb2011-release/rep_set/gg_97_otus_4feb2011.fasta
torque_queue: friendlyq
template_alignment_lanemask_fp: /software/lanemask_in_1s_and_0s
jobs_to_start: 1
cloud_environment: False
qiime_scripts_dir: /software/qiime-1.4.0-release/bin
denoiser_min_per_core: 50
working_dir: /tmp/
python_exe_fp: /software/python-2.7.1-release/bin/python
temp_dir: /tmp/
blastall_fp: /software/blast-2.2.22-release/bin/blastall
seconds_to_sleep: 60
assign_taxonomy_id_to_taxonomy_fp: /software/gg_otus-4feb2011-release/taxonomies/greengenes_tax_rdp_train.txt

Tony Walters

May 21, 2012, 11:21:36 AM

to qiime...@googlegroups.com

Hello Lawrence,

Did you rename the usearch executable that you downloaded to "usearch"? Its default name has other characters.

-Tony

Lawrence Davies

May 21, 2012, 3:31:54 PM

to qiime...@googlegroups.com

Hi Tony,

I renamed it to 'usearch' but it still doesn't pick it up.

Cheers,

Lawrence

Tony Walters

May 21, 2012, 3:33:52 PM

to qiime...@googlegroups.com

Hello again Lawrence,

If you open a terminal, and change to the directory that you have the usearch file in, can you type usearch there and see what it says?

-Tony

Lawrence Davies

May 21, 2012, 3:40:11 PM

to qiime...@googlegroups.com

Hi,

I'm not sure of the IP address of my work computer, so I can't log in to it right now. But I did try that today and it gave the same error as in my first post. It made me think there was something inherently wrong with the file, so I downloaded an earlier version. I don't think I typed 'usearch' in the directory it was in with that new version, though.

What should it say?

I can try tomorrow. I'm in the UK so bit of a time difference....

Lawrence

Tony Walters

May 21, 2012, 3:48:09 PM

to qiime...@googlegroups.com

Hello again Lawrence,

If you were able to type:

usearch

in the directory that it was in, and got the help text (rather than an error), it would indicate that it's a PATH issue. However, if it gave an error, then we would know it's another issue, likely permissions to execute the file, which could be fixed by running the following in the directory that usearch was in:

chmod 775 usearch
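
The distinction drawn here (file missing, file not executable, or directory not on PATH) can be checked mechanically. This is a rough sketch rather than anything QIIME provides, assuming Python 3.3 or later for `shutil.which`; the function name `diagnose` and its return labels are invented for the example.

```python
import os
import shutil

def diagnose(name, directory):
    """Classify why typing `name` might fail, given the directory holding the file."""
    candidate = os.path.join(directory, name)
    if not os.path.isfile(candidate):
        return "missing"       # no file with that exact name in the directory
    if not os.access(candidate, os.X_OK):
        return "permissions"   # file exists but lacks execute permission
    if shutil.which(name) is None:
        return "path"          # executable, but no PATH entry leads to it
    return "ok"                # a command with this name is found via PATH

# Example: inspect the current working directory.
print(diagnose("usearch", os.getcwd()))
```

Note that "ok" only means some executable with that name is reachable through PATH; it does not prove it is the copy in `directory`.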

-Tony

Lawrence Davies

May 21, 2012, 3:51:42 PM

to qiime...@googlegroups.com

Hi,

OK great. I did try 'chmod 775' too. Thanks very much for your help Tony. I'll give it a whirl tomorrow.

Cheers,

Lawrence

Lawrence Davies

May 22, 2012, 2:01:47 PM

to qiime...@googlegroups.com

Hi,

I tried your advice today. Even if I go into the directory that usearch is in and type 'usearch', I get the command not found error. Even after performing the chmod command, I still get the same error.

When I type 'ls -l' I can see 'usearch' written in white. After I use the chmod command, the colour of the writing turns to green, so something does happen when chmod is used, but the 'usearch' command is still not recognised.

Any ideas?

Cheers,

Lawrence

Jose Navas

May 22, 2012, 2:04:55 PM

to qiime...@googlegroups.com

Hi Lawrence,

Can you type './usearch' in the directory that usearch is in?

Cheers,

Jose


Lawrence Davies

May 23, 2012, 11:04:20 AM

to qiime...@googlegroups.com

Hi Jose,

USEARCH 5.1.221
(C) Copyright 2010-11 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch

Licensed to:

Common commands
===============
Clustering de novo (default is global alignment):
usearch -cluster seqs.sorted.fasta -uc results.uc -id 0.97 [-usersort]
    Specify -usersort if input is not sorted by length.
    Not recommended for OTU clustering. See manual.

Database search (default is local alignment):
usearch -query q.fasta -evalue 0.01 -blast6out results.b6
    -db db.fasta | -udb db.udb [-threads n] | -wdb db.wdb

Search + clustering of seqs that don't match (default is global alignment):
usearch -cluster seqs.sorted.fasta -db db.fasta -id 0.97 [-uc results.uc]
    [-seedsout seeds.fasta] [-consout cons.fasta]

Create udb or wdb database index:
usearch -makeudb db.fasta -output db.udb
usearch -makewdb db.fasta -output db.wdb

Dereplication, removing identical full-length sequences (does not search reverse strand):
usearch -derep_fullseq -cluster input.fasta -seedsout nr.fasta [-bithash] [-sizeout]

Dereplication, removing identical sub-sequences:
usearch -derep_subseq -cluster input.fasta -seedsout nr.fasta
    -w 32 -slots 40000003 [-sizeout]

Chimera detection (UCHIME ref. db. mode):
usearch -uchime q.fasta [-db db.fasta] [-chimeras ch.fasta]
    [-nonchimeras good.fasta] [-uchimeout results.uch] [-uchimealns results.alns]

Chimera detection (UCHIME de novo mode):
usearch -uchime amplicons.fasta [-chimeras ch.fasta] [-nonchimeras good.fasta]
     [-uchimeout results.uch] [-uchimealns results.alns]
Input is estimated amplicons with integer abundances specified using ";size=N".

Sort sequences by length:
usearch -sort seqs.fasta -output seqs.sorted.fasta
usearch -mergesort seqs.fasta -output seqs.sorted.fasta [-split S]
    Use -mergesort if too big for -sort. S is partition size in Mb, default 1000.0.

Sort sequences by cluster size/abundance specified by ";size=N" in label:
usearch -sortsize seqs.fasta -output seqs.sorted.fasta [-minsize n]

Output files
============
All formats are supported for clustering and searching.
-uc file           UCLUST format, tab-separated.
-blastout file     Human-readable verbose format similar to BLAST.
-blast6out file    Tab-separated, same as -outfmt 6 or -m8 option of NCBI BLAST.
-userout file      Tab-separated, fields specified by -userfields (see manual).
-seedsout file     FASTA file with cluster seeds, i.e. non-redundant version of input.
-consout file      FASTA file with consensus sequence for each cluster.
-fastapairs file   FASTA file with pair-wise alignments.

Search termination
==================
-maxaccepts N       Max accepted targets, 0=ignore, default 1.
-maxrejects N       Max rejected targets, 0=ignore, default 32.
-[no]usort          [Do not] test database sequences in U-sorted order. If -nousort is
                       specified, the entire database is searched and termination options
                       are ignored. Default is -usort.

Accept/reject criteria
======================
Criteria are combined with AND.

-id F
    Minimum identity, as a value 0.0 to 1.0, meaning 0% to 100% identity. No default value.
    The -iddef option specifies definiton of identity (see manual).

-evalue E
    Maximum E-value. Local alignments only. No default value.

-query[aln]fract F
    Minimum fraction of the query sequence covered by alignment. Default 0.0.

-target[aln]fract F
    Minimum fraction of the target sequence covered by alignment. Default 0.0.

-idprefix n / -idsuffix n
    First (last) n letters of the query must be identical to the target. Default 0.

-leftjust / -rightjust
    Left (right) terminal gaps cause reject. Recommended to use -idprefix if you
    use -leftjust or -idsuffix if you use -rightjust.

Alignment style
===============
-global             Default if -cluster is specified.
-local              Default if -query is specified.

Compressed index
================
-slots n           Size of compressed index table. Should be prime, e.g. 40000003.
                    Should also specify -w, typical is -w 16 to 32.

Misc.
=====
-quiet             Do not write progress messages to standard error.
-log filename      Write log file with information about parameters and performance.
-version           Show program version number and exit.
-help              This help.

See manual for more options.
qiime@qiime-VirtualBox:~/Edgar$

Cheers,

Lawrence

Jose Navas

May 23, 2012, 11:45:08 AM

to qiime...@googlegroups.com

Hi Lawrence,

It seems you have the wrong version of USEARCH (5.1.221). You should download version 5.2.32 from http://www.drive5.com/usearch/nonprofit_form.html. Then you should execute:

chmod 775 usearch5.2.32_i86linux32

And rename to usearch:

mv usearch5.2.32_i86linux32 usearch

Finally, you should add the directory which contains usearch to the PATH.
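
For anyone who prefers to script these three steps, a minimal sketch follows. It assumes Python 3; the function names `install_as_usearch` and `prepend_to_path` are hypothetical helpers, and the sketch copies the download instead of moving it so the original file stays in place.

```python
import os
import shutil

def install_as_usearch(downloaded_file, target_dir):
    """Copy the downloaded binary to target_dir/usearch and make it executable."""
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "usearch")
    shutil.copy2(downloaded_file, target)  # copy under the name QIIME looks for
    os.chmod(target, 0o775)                # same mode as `chmod 775`
    return target

def prepend_to_path(directory):
    """Put the absolute form of `directory` at the front of PATH for this process."""
    absolute = os.path.abspath(directory)
    os.environ["PATH"] = absolute + os.pathsep + os.environ.get("PATH", "")
    return absolute

# Example usage (paths are placeholders, not taken from the thread):
#   target = install_as_usearch("usearch5.2.32_i86linux32", "/home/qiime/usearch")
#   prepend_to_path("/home/qiime/usearch")
```

Changing `os.environ` only affects the running Python process and programs it launches. To make the change visible in every new terminal, the PATH would normally be set in a shell startup file instead, using an absolute directory.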

Let me know if it works!

Cheers,

Jose


Lawrence Davies

May 24, 2012, 1:00:48 PM

to qiime...@googlegroups.com

Hi Jose/Tony,

Yes, it worked! Thank you so much for your help. I performed de novo chimera removal today and it worked fine. I tried reference-based chimera removal on the same sequences using the command below:

pick_otus.py -i fungi_method_11_denoised_inflated.fna -m usearch --db_filepath=/home/qiime/fungi_database.fasta -o fungi/reference/m11_denoised/97/ --word_length 64 --cluster_size_filtering

It stops with an error message saying the script has stopped responding. When I look in the output directory, most of the files are there, but it looks like something goes wrong at the chimera removal step when the reference database is used, because those two output files are empty. An example of the sequences in my reference database is shown below. Is there something wrong with the format? Any ideas what's going wrong?

>HM631728
TCTCCGTAGGTGAACCTGCGGAGGGATCATTACTGAGTGAGGGCCTTCGGGCTCGACCTCCAACCCTTTGTGAACACAACTTGTTGCTTCGGGGGCGACCCTGCCGTTTCGACGGCGAGCGCCCCCGGAGGCCTTCAAACACTGCATCTTTGCGTCGGAGTTTAAGTAAATTAAACAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCGAGGGGCATGCCTGTTCGAGCGTCATTTCACCACTCAAGCCTCGCTTGGTATTGGGCGCCGCGGTGTTCCGCGCGCCTCAAAGTCTCCGGCTGAGCTGTCCGTCTCTAAGCGTTGTGATTTCATTAATCGCTTCGGAGCGCGGGCGGTCGCGGCCGTTAAATCTTTCATAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGAACTTAA
>HM631727
TCTCCGTAGGTGAACCTGCGGAGGGATCATTACTGAGTGAGGGCCTTCGGGCTCGACCTCCAACCCTTTGTGAACACAACTTGTTGCTTCGGGGGCGACCCTGCCGTTTCGACGGCGAGCGCCCCCGGAGGCCTTCAAACACTGCATCTTTGCGTCGGAGTTTAAGTAAATTAAACAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCGAGGGGCATGCCTGTTCGAGCGTCATTTCACCACTCAAGCCTCGCTTGGTATTGGGCGCCGCGGTGTTCCGCGCGCCTCAAAGTCTCCGGCTGAGCTGTCCGTCTCTAAGCGTTGTGATTTCATTAATCGCTTCGGAGCGCGGGCGGTCGCGGCCGTTAAATCTTTTATAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGAACTTAA
>HM631726
TCTCCGTAGGTGAACCTGCGGAGGGATCATTACTGAGTGAGGGCCTTCGGGCTCGACCTCCAACCCTTTGTGAACACAACTTGTTGCTTCGGGGGCGACCCTGCCGTTTCGACGGCGAGCGCCCCCGGAGGCCTTCAAACACTGCATCTTTGCGTCGGAGTTTAAGTAAATTAAACAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCGAGGGGCATGCCTGTTCGAGCGTCATTTCACCACTCAAGCCTCGCTTGGTATTGGGCGCCGCGGTGTTCCGCGCGCCTCAAAGTCTCCGGCTGAGCTGTCCGTCTCTAAGCGTTGTGATTTCATTAATCGCTTCGGAGCGCGGGCGGTCGCGGCCGTTAAATCTTTCACAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGAACTTAA
>HM631725
TCTCCGTAGGTGAACCTGCGGAGGGATCATTACTGAGTGAGGGCCTTCGGGCTCGACCTCCAACCCTTTGTGAACACAACTTGTTGCTTCGGGGGCGACCCTGCCGTTTCGACGGCGAGCGCCCCCGGAGGCCTTCAAACACTGCATCTTTGCGTCGGAGTTTAAGTAAATTAAACAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCGAGGGGCATGCCTGTTCGAGCGTCATTTCACCACTCAAGCCTCGCTTGGTATTGGGCGCCGCGGTGTTCCGCGCGCCTCAAAGTCTCCGGCTGAGCTGTCCGTCTCTAAGCGTTGTGATTTCATTAATCGCTTCGGAGCGCGGGCGGTCGCGGCCGTTAAATCTTTTACAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGAACTTAA
>HM627540
TCGCAGGGGGGGGCTGCGGAAGGATCATTACAGTATTCTTTTGCCAGCGCTTAACTGCGCGGCGAAAAACCTTACACACAGTGTCTTTTTGATACAGAACTCTTGCTTTGGTTTGGCCTAGAGATAGGTTGGGCCAGAGGTTTAACAAAACACAATTTAATTATTTTTACAGTTAGTCAAATTTTGAATTAATCTTCAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATATGAATTGCAGATTTTCGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTCTGGTATTCCAGAGGGCATGCCTGTTTGAGCGTCATTTCTCTCTCAAACCCCCGGGTTTGGTATTGAGTGATACTCTTAGTCGGACTAGGCGTTTGCTTGAAAAGTATTGGCATGGGTAGTACTAGATAGTGCTGTCGACCTCTCAATGTATTAGGTTTATCCAACTCGTTGAATGGTGTGGCGGGATATTTCTGGTATTGTTGGCCCGGCCTTACAACAACCAAACAAGTTTGACCTCAAATCAGGTAGGAATACCCGCTGAACTTAAGCATATCAAAAGC
>HM043803
TCCGTAGGGGAACCTGCGGAAGGATCATTACCGAGTGAGGGCCCTCTGGGTCCAACCTCCCACCCGTGTTTATCGTACCTTGTTGCTTCGGCGAGCCCGCCTCACGGCCGCCGGGGGGCATCCGCCCCCGGGCCCGCGCTCGCCGAAGACACCATTGAACTCTGTCTGAAGATTGCAGTCTGAGTGATTAACTAAATCAGTTAAAACTTTCAACAACGGATCTCTTGGTTCCGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAGTCTTTGAACGCACATTGCGCCCCCTGGTATTCCGGGGGGCATGCCTGTCCGAGCGTCATTGCTGCCCTCAAGCACGGCTTGTGTGTTGGGCCCCGCCCCCCGGTTCCGGGGGGCGGACCCGAAAGGCAGCGGCGGCACCGCGTCCGGTCCTCGAGCGTATGGGGCTTCGTCACCCGCTCTGTAGGCCCGGCCGGCGCCCGCCGGCGACCCCAATCAATCTATCCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA
>HM049911
TCCGTAGGTGAACCTGCGGAAGGATCATTACCGAGTGCGGGCCCTCGTGGCCCAACCTCCCACCCTTGTCTCTATACACCCGTTGCTTTGGCGGGCCCACCGGGGCCACCTGGTCGCCGGGGGACGTCCGTCCCCGGGCCCGCGCCCGCCGAAGCGCTCTGTGAACCCTGATGAAGATGGGCTGTCTGAGTACCATGAAAATTGTCAAAACTTTCAACAATGGATCTCTTGGTTCCGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCCTGGCATTCCGGGGGGCATGCCTGTCCGAGCGTCATTTCTGCCCTCAAGCACGGCTTGTGTGTTGGGCGCGGTCCCCCCGGGGACCTGCCCGAAAGGCAGCGGCGACGTCCGTCTGGTCCTCGAGCGTATGGGGCTCTGTCACTCGCTCGGGAAGGACCTGCGGGGGTTGGTCACCACCATATTTTACCACGGTTGACCTCGGATCAGGTAGGAGTTACCCGCTGAACATAAGCATATCAATAAGGCGGAGGA
>HM439351
ACTGGGCTTCGGTCCATTTATCTACCCATCTACACCTGTGAACTGTTTATGTGCTTCGGCACGTTTTACACAAACTTCTAAATGTAATGAATGTAATCTTATTATAACAATAATAAAACTTTCAACAACGGATCTCTTGGCTTCCACATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAGTCTTTGAACGCAACTTGCGCCCTTTGGTATTCCGAAGGGCATGCCTGTTTGAGAGTCATGAAAATCTCAATCCCTCGGGTTTTATTACCTGTTGGACTTGGATTTGGGTGTTTGCCGCGACCTGCAAAGGACGTCGGCTCGCCTTAAATGTGTTAGTGGGAAGGTGATTACCTGTCAGCCCGGCGTAATAAGTTTCGCTGGGCCTATGGGGTAGTCTTCGGCTTGCTGATAACAACCATCTCTTTTTTGTTTGACCTCAAATCAGGTAGGGCTACCCGCTGAACTTAAGCATATCAATAAGCGGG
>GU111565
AGGACATTACCGAGTTTACAACTCCCAAACCCAATGTGAACGTTACCAATCTGTTGCCTCGGCGGGATTCTCTGCCCCGGGCGCGTCGCAGCCCCGGATCCCATGGCGCCCGCCGGAGGACCAACTCAAACTCTTTTTTCTCTCCGTCGCGGCCTACGTCGCGGCTCTGTTTTATTTTTGCTCTGAGCCTTTCTCGGCGACCCTAGCGGGCGTCTCGAAAATGAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTCCGAGCGTCATTTCAACCCTCGAACCCCTCCGGGGGGTCGGCGTTGGGGATCGGCCCCTCACCGGGCCGCCCCCGAAATACAGTGGCGGTCTCGCCGCAGCCTCTCCTGCGCAGTAGTTTGCACACTCGCACCGGGAGCGCGGCGCGGCCACAGCCGTAAAACACCCCAAACTCTGAAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAA

Best wishes,

Lawrence

Tony Walters

May 24, 2012, 1:06:58 PM

to qiime...@googlegroups.com

Hello Lawrence,

You might run into problems if there are duplicate sequences in your reference sequences. Could you try putting a small subset (say 5 of the sequences) into a test file to use as the reference database for the --db_filepath parameter?

Also you can disable the reference chimera checking with -x to see if this is indeed causing the hang up.
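
Both suggestions (looking for duplicate reference sequences and building a five-sequence test database) can be done with a few lines of code. The sketch below is one possible way, assuming Python 3 and treating two sequences as duplicates only when they are identical ignoring case. The helper names are made up, and this is not how QIIME or USEARCH detect duplicates internally.

```python
from collections import defaultdict

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)

def duplicate_sequences(records):
    """Map each sequence occurring more than once to the headers that share it."""
    by_sequence = defaultdict(list)
    for header, sequence in records:
        by_sequence[sequence.upper()].append(header)
    return {seq: ids for seq, ids in by_sequence.items() if len(ids) > 1}

def first_records(records, count=5):
    """Return the first `count` records as FASTA text for a small test database."""
    subset = list(records)[:count]
    return "".join(">%s\n%s\n" % (h, s) for h, s in subset)

# Example usage (file names are placeholders):
#   with open("fungi_database.fasta") as handle:
#       records = list(read_fasta(handle))
#   print(duplicate_sequences(records))
#   with open("test_db.fasta", "w") as out:
#       out.write(first_records(records, 5))
```

The resulting small file could then be passed to --db_filepath as suggested above.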

-Tony

Lawrence Davies

May 25, 2012, 11:00:41 AM

to qiime...@googlegroups.com

Hi Tony,

Yes, it definitely looks like it is slipping up when referencing the database. The database looks fine though, so I'm not sure what's happening. It's not too important anyway: I am using ITS fungal seqs, so I'm not sure about aligning the seqs prior to removing chimeras. It might be best just to stick with de novo chimera removal.

I used otu_category_significance.py today and got some really good results. But the OTU table I've produced lumps everything together down to family level. I was hoping to run otu_category_significance.py on OTU tables with increasingly greater taxonomic resolution, e.g. are there significant differences between treatments at the phylum, class, order etc. level, rather than just looking at the family level.

I tried split_otu_table_by_taxonomy.py but it didn't produce what I was after, e.g. it produces a Proteobacteria folder, but within the folder it still goes down to the family level.

I also tried otu_category_significance.py by inputting the entire directory produced by split_otu_table_by_taxonomy.py (i.e. multiple ANOVAs), but the output only contained the header line (e.g. prob, corrected probs, treatments etc.) with no results. Any ideas how to get around this?

Thanks very much,

Lawrence

Tony Walters

May 25, 2012, 11:04:38 AM

to qiime...@googlegroups.com

Hello Lawrence,

summarize_taxa.py probably is what you're looking for ( http://qiime.org/scripts/summarize_taxa.html ). It creates .txt files though, so you'd have to do some conversions back to .biom format with convert_biom.py (note that now the first column will be the taxonomy, rather than an OTU ID).
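
To make the idea of summarizing at a chosen taxonomic level concrete, here is a small illustration of collapsing OTU counts by the first few ranks of their taxonomy strings. It is only a sketch in Python 3 with made-up OTU IDs, sample names, and counts; it is not the implementation used by summarize_taxa.py, and `collapse_to_level` is a hypothetical name.

```python
from collections import defaultdict

def collapse_to_level(otu_counts, otu_taxonomy, level):
    """Sum per-sample counts over OTUs sharing the first `level` taxonomic ranks."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for otu_id, sample_counts in otu_counts.items():
        ranks = [rank.strip() for rank in otu_taxonomy[otu_id].split(";")]
        key = ";".join(ranks[:level])  # shorter lineages are kept as they are
        for sample, count in sample_counts.items():
            collapsed[key][sample] += count
    return {taxon: dict(samples) for taxon, samples in collapsed.items()}

# Hypothetical data: three OTUs counted in two samples.
counts = {
    "otu1": {"s1": 3, "s2": 0},
    "otu2": {"s1": 1, "s2": 4},
    "otu3": {"s1": 2, "s2": 2},
}
taxonomy = {
    "otu1": "Root;Bacteria;Cyanobacteria",
    "otu2": "Root;Bacteria;Proteobacteria",
    "otu3": "Root;Bacteria;Cyanobacteria",
}
print(collapse_to_level(counts, taxonomy, 3))
```

Each row of the collapsed table is keyed by a taxonomy string instead of an OTU ID, which matches the note above about the first column.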

-Tony

Lawrence Davies

May 28, 2012, 11:44:34 AM

to qiime...@googlegroups.com

Hi Tony,

Thanks very much - that was exactly what I was looking for: a way to convert the output from summarize_taxa.py into a usable format. Just in case you don't know, convert_biom.py is no longer on the QIIME scripts page.

I have been trying to re-train the classifier for assign_taxonomy.py with a 23S database I have downloaded from SILVA. I have only tested on 5 or so sequences that I know are present in my OTU file to try and get it working but I'm having some trouble.

I am using BLAST rather than RDP, as sometimes the depth is greater or less than six taxonomic levels. I have used this on bacteria (greengenes) and fungal ITS before without any issues.

The assign_taxonomy.py step does work, and the output log file and .txt file are created. The .txt output links a number of OTUs to the 5 sequences in my test database, but instead of linking the OTU to the taxonomy of the sequence it links it to the name of the FASTA reference file, e.g.

198    No blast hit    None    None
344    None    8e-18    1_galaxy_17.fasta
0    None    0.0    1_galaxy_17.fasta
346    No blast hit    None    None
347    No blast hit    None    None
340    None    2e-34    1_galaxy_17.fasta
341    No blast hit    None    None
342    None    8e-18    1_galaxy_17.fasta
343    No blast hit    None    None
348    None    6e-37    1_galaxy_17.fasta
349    No blast hit    None    None
298    None    1e-32    1_galaxy_17.fasta
299    No blast hit    None    None
296    No blast hit    None    None
297    No blast hit    None    None
294    None    2e-21    1_galaxy_17.fasta
295    No blast hit    None    None
292    None    4e-38    1_galaxy_17.fasta
293    None    4e-29    1_galaxy_17.fasta

In place of 1_galaxy_17.fasta it should say 'Root;Bacteria;Cyanobacteria;SubsectionIV;SubgroupI;Nostoc;NostocpunctiformePCC73102'. galaxy_17.fasta is the name of the small reference database with 5 sequences. Any ideas what is going on?

Fasta reference file:

CP000951_2906028_2908838    GGUCAAGCUACAAAGGGCUAACGGUGGAUACCUAGGCACACAGAGGCGAUGAAGGACGUGGUUACCGACGAUAUACUCCGGGGAGCUGGAAGCAAGCAUUGAGCCGGAGGUUUCCGAAUGGGGCAACCCUAAAUACAGCCAUCUGAAUAUAUAGGAUGGUAUGAGCCAACUCAGCGAAUUGAAACAUCUUAGUAGCUGAAGGAAGAGAAAGAAAAAUCGAUUCCCUUAGUAGCGGCGAGCGAAGCGGGAAGAGCCUAAACCAACUGCUUAGGCGGUUGGGGUUGUGGGACAGUGAUGUGGACUAUAGAGGUUAGACGAAGUAGUUGAAAGCUACACCAAAGAAGGUGAAAGUCCUGUAGUCGAAAAUCGAAGUAGCCUAACUGUAUCCCGAGUAGGCCGGAGCACGUGAAAUUCCGGUUGAAUCAGCGAGGACCACCUCGUAAGGCUAAAUACUACUGUGUGACCGAUAGUGUAAAAGUACCGCGAGGGAAAGGUGAAAAGAACCCCGGGAGGGGAGUGAAAUAGAACAUGAAACCGUUAGCCUACAAGCAAUGGGAGGACGAUUUAACGUCUGACCGUGUGCCUGUUGAAGAAUGAGCCGGCGACUUACAGGCUGUGGCAGGUUAAGGUGAAAAGCCGAAGCCAAAGUGAAAACGAGUCUGAAAAGGGCGUUAGUCACAGUUUGUAGACCCGAACCCGGGUGAUCUAACCAUGGCCAGGAUGAAGCUUGGGUAAUACCAAGUGGAGGUCCGAACCGACUUCUGUUGAAAAAGGAGCGGAUGAGCUGUGGUUAGGGGUGAAAUGCCAAUCGAACCCGGAGCUAGCUGGUUCUCCCCGAAAUGUGUUUAGGCGCAGCGGUUGGUUUCCUUGCAUGGGGGUAAAGCACUGUUUCGCUGCGGGCUGCGAGAGCGGUACCAAAGUGAGACAAACUAAGAAUACCAUGUAAAAUUCAGCCAGUAAGACGGUGGGGGAUAAGCUUCAUCGUCGAGAGGGAAACAGCCCAGACCGCCAGCUAAGGUCCCAAAAUACUUGCUAAGUGAUAAAGGAGGUGGGAGUGCAUAGACAACCAGGAGGUUUGCCUAGAAGCAGCAAUCCUUGAAAGAGUGCGUAAUAGCUCACUGGUCAAGCGCUCCUGCGCCGAAAAUGAACGGGGCUAAGCAAGUUACCGAAGCUGCGGACUCGAAAGAGUGGUAGGGGAGCGUUCUAUAUGGUGUGAAGCAUUAGCGGUGAGCAGAUGUGGACUGUAUAGAAGUGAGAAUGUCGGCUUAAGUAGCGAAAAUAUGUGUGAGAAUCACAUACCCCGAAACCCUAAGGGUUCCUCCGGAAGGCUCGUCCGCGGAGGGUUAGUCGGGACCUAAGGCGAGGCCGAAAGGCGUAGUCGAUGGACAUGAGGUUAAUAUUCCUCAACUCCUGUGUGGGAGCAUAACUAUGACGCAUGAAAGAUAGCUACACCCUGAAUGGAUUGGGAGGAGACUACGGUCUCCGCGUAGUAAAGGAUAGUGCCUAGAAAAGCUAGUUAUGUGUUGAAAGCACAGGACCCGUACCCGAAACCGACACAGGUAGGGUGGUAGAGUAUACCGAGGGGCGCGAGGUAACUCUCUCUAAGGAACUCGGCAAAAUUGCUCCGUAACUUUGGGAGAAGGAGUGCCAGCGAGAGCUGGUCGCAGUGAAGAGGCCCAGGCGACUGUUUACCAAAAACACAGGUCUCUGCAAACUCGUAAGAGGACGUAUAGGGGCUGACGCCUGCCCAGUGCCGGAAGGUUAAGGAAGUUGGUUAGCUUAGGCGAAGCUGACGACCGAAGCCCCGGUGAACGGCGGCCGUAACUAUAACGGUCCUAAGGUAGCGAAAUUCCUUGUCGGGUAAGUUCCGACCCGCACGAAAGGCGUAACGAUCUGGGCACUGUCUCGGAGAGAGGCUCGGCGAAAUAGGAUUGUCUGUGAAGAUACGGACUACCUGCACCUGGACAGAAAG
ACCCUAUGAAGCUUUACUGUAGCUUGGUAUUGGGUUCGGGCUUUGUUUGCGCAGGAUAGGUGGGAGACGUUGAGAUUACUCUUGUGGGAGUAAUGGAGUCACUGGUGAGAUACCACUCUAACAAGGCUAGAAUUCUAACUUUAAACCGUGAAGCCGGUGAAAGGACAGUAUCAGGUGGGCAGUUUGACUGGGGCGGUCGCCUCCUAAAUGAUAACGGAGGCGCACAAAGGUUCCCUCAGGCUGGUUGGAAAUCAGCCAUCGAGUGCAAAAGCAGAAGGGAGCUUGACUGCGAGACCUACAAGUCAAGCAGGGACGAAAGUCGGUUUUAGUGAUCCGACGGCGCUGCGUGGAAGGGCCGUCGCUCAACGGAUAAAAGUUACUCUAGGGAUAACAGGCUGAUCUCCCCCAAGAGUUCACAUCGACGGGGAGGUUUGGCACCUCGAUGUCGGCUCAUCGCAACCUGGGGCUGAAGUAGGUCCCAAGGGUUGGGCUGUUCGCCCAUUAAAGCGGUACGUGAGCUGGGUUCAGAACGUCGUGAGACAGUUCGGUCCAUAUCCGGUGCAGGCGUAAGAGCAUUGAGAGGAGUCUUCCUUAGUACGAGAGGACCGGGAAGGACGCACCGCUGGUGUACCUGUUAUCGUGCCAACGGUAAACGCAGGGUAGCCAAGUGCGGAGCGGAUAACCGCUGAAAGCAUCUAAGUGGGAAGCCCACCUCAAGAUGAGUGCUCUCAUGGAGUUAAUCCAGUAAGGUCACGGGAAGAACACCCGUUAAUAGGCAUUAGGUGGAAGUGUGGCAACAUAUGGAGCCGAGAUGUCCUAACAGACCGAGGGCUUGUCCU
CP001037_5512119_5515008    GGUCAAGCUAAUAAGGGCUAACGGUGGAUACCUAGGCACACAGAGGCGAUGAAGGACGUGGUUACCGACGAUAUGCUCCGGGGAGUUGGAAGCAAACAUUGAGCCGGAGAUUUCCGAAUGGGGCAACCCUUAAUACUACCUGCUGAAUAUAUAGGCAGGAGAGAGCCAACCCAGCGAAUUGAAACAUCUUAGUAGCUGGAGGAAGAGAAAUCAAAACAGAGAUUCCCUAAGUAGUGGUGAGCGAAAGGGGAAAAGCCUAAACCAAUUGGUUUACCGAUUGGGGUAGUGGGACAGCAAUAUCGAAUCUGGCGGUUAAACGAAGCAGCUAAAUACUGCACCAAAGAAGGUGAAAGUCCUGUAGUUGAAAACUCAAGGAUAGUAGCUGAAUCCCGAGUAGCAUGGGGCACGAGGAAUCCCAUGUGAAUCAGCGAGGACCACCUCGUAAGGCUAAAUACUACUGUGUGACCGAUAGUGAACCAGUACCGCGAGGGAAAGGUGAAAAGAACCCCGGAAGGGGAGUGAAAUAGAACAUGAAACCGUUAGCUUACAAGCAGUGGGAGGACUAUUUAAAGUCUGACCGCGUGCCUGUUGAAGAAUGAGCCGGCGACUUAUAGGCACUGGUAGGUUAAAGCGAGAAUGCUGGAGCCAAAGGGAAACCGAGUCUGAAAAGGGCGAUAAUCAGUGUUUAUAGACCCGAACCCUGGUGAUCUAACCAUGGCCAGGAUGAAGCUUGGGUAACACCAAGUGGAGGUCCGCACCGACUGAUGUUGAAAAAUCAGCGGAUGAGUUGUGGUUAGGGGUGAAAUGCCAAUCGAACCAGGAGCUAGCUGGUUCUCCCCGAAAUGUGUUUAGGCGCAGCGGUAAUGAUUAUAUCUGGGGGGUAAAGCACUGUUUCGGUGCGGGCUGGGAGACCGGUACCAAAUCGAGACAAACUCUGAAUACCCAGAGCACACAUUGCCAGUGAGACAGUGGGGGAUAAGCUUCAUUGUCAAGAGGGAAACAGCCCAGACCACCAGCUAAGGUCCCCAAAUCAUCGCUAAGUGAUAAAGGAGGUGAGAGUGCACAGACAACUAGGAGGUUUGCCUAGAAGCAGCCACCCUUGAAAGAGUGCGUAAUAGCUCACUAGUCAAGCGCUCUCGCGCCGAAAAUGAACGGGGAUAAGCGAUGUACCGAAGCUGUGGGAUUAACUUAUGUUAAUCGGUAGGGGAGCGUUCCGUCGUAGGUAGAAGCAGUAGCGGCAAGCAGCUGUGGACGAAACGGAAGUGAGAAUGUCGGCUUGAGUAGCGCAAACAUUGGUGAGAAUCCAAUGCCCCGAAACCCUAAGGUUUCCUCCGCCAGGUUCGUCCCCGGAGGGUUAGUCAGGACCUAAGGCGAGGCCGAACGGCGUAGUCGAUGGACAACGGGUUAAAAUUCCCGUACUGAUUGUAGGUUGUGCAGAGGGACGGAGAAGAUGAAUGUCAGCCGGAUGUUGGUUACCGGUUCAAGCGUCAAGAUGUUGAGAGACGGCGAAAACGUUUCGAGUUGAGGCGUGAGUACGACCCGCUACGGCGGGGAAGUGGCAUAGUCUAGCUUCCAAGAAAAGCUCUAAACACGUUAACUUACAGUUACCUGUACCCGAAACCGACACAGGUAGGGAGGUUGAGAAUACCAAGGGGCGCGAGAUAACUCUCUCUAAGGAACUCGGCAAAAUGGCCCCGUAACUUCGGAAGAAGGGGUGCCCACCUCAGACGUGGGUCGCAGUGAAGAGAUCCAGGCGACUGUUUACCAAAAACACAGGUCUCCGCAAAGUCAUUAAGACGCAGUAUGGGGGCUGACGCCUGCCCAGUGCCGGAAGGUUAAGGAAGUUGGUCAGGGGGAAACCUUGAAGCUAGCGACCGAAGCCCCGGUGAACGGCGGCCGUAACUAUAACGGUCCUAAGGUAGCGAAAUUCCUUGUCGGGUAAGUUCCGACCCGCACGAAAGGCGUAACGAUC
UGGAUGGUGUCUCAGAGAGAGACUCGGCGAAAUAGGAAUGUCUGUGAAGAUACGGACUGCCUGCACCUGGACAGAAAGACCCUAUGAAGCUUUACUGUAGCCUGGAAUUGUGUUCGGGCUUGGCUUGCGCAGGAUAGGUGGGAGGCGAUGAAGUAUUCCUUGUGGGGAGUAUGGAGCCAACGGUGAGAUACCACUCUGGCGAAGCUAGAAUUCUAACCCAUGACCGUUAUCCGGUCAGGGAACAGUUUCAGGUGGGCAGUUUGACUGGGGCGGUCGCCUCCUAAAAGGUAACGGAGGCGCGCAAAGGUUCCCUCAGCACGCUUGGAAACCGUGCGGCGAGUGUAAAGGCAUAAAGGGAGCUUGACUGCAAGACUGACAAGUCGAGCAGGUACGAAAGUAGGCCUUAGUGAUCCGACGGCGCAGAGUGGAAUGGCCGUCGCUCAACGGAUAAAAGUUACUCUAGGGAUAACAGGCUGAUCUCCCCCAAGAGUCCACAUCGACGGGGAAGGUUUGGCACCUCGAUGUCGGCUCAUCGCAACCUGGGGCGGAAGUACGUCCCAAGGGUUGGGCUGUUCGCCCAUUAAAGCGGUACGUGAGCUGGGUUCAGAACGUCGUGAGACAGUUCGGUCCAUAUCCGGUGCAGGCGUAAGAGCAUUGAGAGGAGUCCUCCUUAGUACGAGAGGACCGGGAGGAACGCACCGCUGGUGUACCAGUUAUCGUGCCAACGGUAAACGCUGGGUAGCCAAGUGCGGAGCGGAUAACCGCUGAAAGCAUCUAAGUGGGAAGCCCACCUCAAGAUGAGUGCUCUCACCACGUAAGUGGGUAAGGUCACGGGAAGAACACCCGUUCUUAGGCGGUAGGUGGAAGUGCAGUAAUGUAUGUAGCCGAGCCGUGCUAACAGACCGAGGGCUUGACCUC
CP001037_6499464_6502352    GGUCAAGCUAAUAAGGGCUAACGGUGGAUACCUAGGCACACAGAGGCGAUGAAGGACGUGGUUACCGACGAUAUGCUCCGGGGAGUUGGAAGCAAACAUUGAGCCGGAGAUUUCCGAAUGGGGCAACCCUUAAUACUACCUGCUGAAUAUAUAGGCAGGAGAGAGCCAACCCAGCGAAUUGAAACAUCUUAGUAGCUGGAGGAAGAGAAAUCAAAACAGAGAUUCCCUAAGUAGUGGUGAGCGAAAGGGGAAAAGCCUAAACCAAUUGGUUUACCGAUUGGGGUAGUGGGACAGCAAUAUCGAAUCUGGCGGUUAAACGAAGCAGCUAAAUACUGCACCAAAGAAGGUGAAAGUCCUGUAGUUGAAAACUCAAGGAUAGUAGCUGAAUCCCGAGUAGCAUGGGGCACGAGGAAUCCCAUGUGAAUCAGCGAGGACCACCUCGUAAGGCUAAAUACUACUGUGUGACCGAUAGUGAACCAGUACCGCGAGGGAAAGGUGAAAAGAACCCCGGAAGGGGAGUGAAAUAGAACAUGAAACCGUUAGCUUACAAGCAGUGGGAGGACUAUUUAAAGUCUGACCGCGUGCCUGUUGAAGAAUGAGCCGGCGACUUAUAGGCACUGGUAGGUUAAAGCGAGAAUGCUGGAGCCAAAGGGAAACCGAGUCUGAAAAGGGCGAUAAUCAGUGUUUAUAGACCCGAACCCUGGUGAUCUAACCAUGGCCAGGAUGAAGCUUGGGUAACACCAAGUGGAGGUCCGCACCGACUGAUGUUGAAAAAUCAGCGGAUGAGUUGUGGUUAGGGGUGAAAUGCCAAUCGAACCAGGAGCUAGCUGGUUCUCCCCGAAAUGUGUUUAGGCGCAGCGGUAAUGAUUAUAUCUGGGGGGUAAAGCACUGUUUCGGUGCGGGCUGGGAGACCGGUACCAAAUCGAGACAAACUCUGAAUACCCAGAGCACACAUUGCCAGUGAGACAGUGGGGGAUAAGCUUCAUUGUCAAGAGGGAAACAGCCCAGACCACCAGCUAAGGUCCCCAAAUCAUCGCUAAGUGAUAAAGGAGGUGAGAGUGCACAGACAACUAGGAGGUUUGCCUAGAAGCAGCCACCCUUGAAAGAGUGCGUAAUAGCUCACUAGUCAAGCGCUCUCGCGCCGAAAAUGAACGGGGCUAAGCGAUGUACCGAAGCUGUGGGAUUAACUUAUGUUAAUCGGUAGGGGAGCGUUCCGUCGUAGGUAGAAGCAGUAGCGGCAAGCAGCUGUGGACGAAACGGAAGUGAGAAUGUCGGCUUGAGUAGCGCAAACAUUGGUGAGAAUCCAAUGCCCCGAAACCCUAAGGUUUCCUCCGCCAGGUUCGUCCCCGGAGGGUUAGUCAGGACCUAAGGCGAGGCCGAACGGCGUAGUCGAUGGACAACGGGUUAAAAUUCCCGUACUGAUUGUAGGUUGUGCAGAGGGACGGAGAAGAUGAAUGUCAGCCGGAUGUUGGUUACCGGUUCAAGCGUCAAGAUGUUGAGAGACGGCGAAAACGUUUCGAGUUGAGGCGUGAGUACGACCCGCUACGGCGGGGAAGUGGCAUAGUCUAGCUUCCAAGAAAAGCUCUAAACACGUUAACUUACAGUUACCUGUACCCGAAACCGACACAGGUAGGGAGGUUGAGAAUACCAAGGGGCGCGAGAUAACUCUCUCUAAGGAACUCGGCAAAAUGGCCCCGUAACUUCGGAAGAAGGGGUGCCCACCUCAGACGUGGGUCGCAGUGAAGAGAUCCAGGCGACUGUUUACCAAAAACACAGGUCUCCGCAAAGUCAUUAAGACGCAGUAUGGGGGCUGACGCCUGCCCAGUGCCGGAAGGUUAAGGAAGUUGGUCAGGGGGAAACCUUGAAGCUAGCGACCGAAGCCCCGGUGAACGGCGGCCGUAACUAUAACGGUCCUAAGGUAGCGAAAUUCCUUGUCGGGUAAGUUCCGACCCGCACGAAAGGCGUAACGAUC
UGGAUGGUGUCUCAGAGAGAGACUCGGCGAAAUAGGAAUGUCUGUGAAGAUACGGACUGCCUGCACCUGGACAGAAAGACCCUAUGAAGCUUUACUGUAGCCUGGAAUUGUGUUCGGGCUUGGCUUGCGCAGGAUAGGUGGGAGGCGAUGAAGUAUUCCUUGUGGGGAGUAUGGAGCCAACGGUGAGAUACCACUCUGGCGAAGCUAGAAUUCUAACCCAUGACCGUUAUCCGGUCAGGGAACAGUUUCAGGUGGGCAGUUUGACUGGGGCGGUCGCCUCCUAAAAGGUAACGGAGGCGCGCAAAGGUUCCCUCAGCACGCUUGGAAACCGUGCGGCGAGUGUAAAGGCAUAAAGGGAGCUUGACUGCAAGACUGACAAGUCGAGCAGGUACGAAAGUAGGCCUUAGUGAUCCGACGGCGCAGAGUGGAAUGGCCGUCGCUCAACGGAUAAAAGUUACUCUAGGGAUAACAGGCUGAUCUCCCCCAAGAGUCCACAUCGACGGGGAGGUUUGGCACCUCGAUGUCGGCUCAUCGCAACCUGGGGCGGAAGUACGUCCCAAGGGUUGGGCUGUUCGCCCAUUAAAGCGGUACGUGAGCUGGGUUCAGAACGUCGUGAGACAGUUCGGUCCAUAUCCGGUGCAGGCGUAAGAGCAUUGAGAGGAGUCCUCCUUAGUACGAGAGGACCGGGAGGAACGCACCGCUGGUGUACCAGUUAUCGUGCCAACGGUAAACGCUGGGUAGCCAAGUGCGGAGCGGAUAACCGCUGAAAGCAUCUAAGUGGGAAGCCCACCUCAAGAUGAGUGCUCUCACCACGUAAGUGGGUAAGGUCACGGGAAGAACACCCGUUCUUAGGCGGUAGGUGGAAGUGCAGUAAUGUAUGUAGCCGAGCCGUGCUAACAGACCGAGGGCUUGACCUC

When I put '>' in front of the sequence ID, no sequences are matched to the ID-to-taxonomy file. Do you know why that is?

ID-to-taxonomy file:

CP000951_2906028_2908838    Root;Bacteria;Cyanobacteria;SubsectionI;Synechococcus;Synechococcussp_PCC7002
CP001037_5512119_5515008    Root;Bacteria;Cyanobacteria;SubsectionIV;SubgroupI;Nostoc;NostocpunctiformePCC73102
CP001037_6499464_6502352    Root;Bacteria;Cyanobacteria;SubsectionIV;SubgroupI;Nostoc;NostocpunctiformePCC73102

I created the two files by using the Galaxy Fasta to table function, importing the output into Excel, and saving as tab delimited. It opens up in gedit fine. I've attached everything; if you have time to run it through your system and offer any advice I would really appreciate it.

print_qiime_config.py

System information
==================
Platform: linux2

Python version: 2.7.3 (default, Apr 20 2012, 23:04:22) [GCC 4.6.3]
Python executable: /home/qiime/qiime_software/python-2.7.1-release/bin/python

Dependency versions
===================
                     PyCogent version:    1.5.1
                        NumPy version:    1.5.1
                   matplotlib version:    1.1.0

                  biom-format version:    0.9.3
                QIIME library version:    1.5.0
                 QIIME script version:    1.5.0

PyNAST version (if installed): 1.1
RDP Classifier version (if installed): rdp_classifier-2.2.jar

QIIME config values
===================

                     blastmat_dir:    /home/qiime/qiime_software/blast-2.2.22-release/data
                         sc_queue:    all.q
      topiaryexplorer_project_dir:    None
     pynast_template_alignment_fp:    /home/qiime/qiime_software/core_set_aligned.fasta.imputed
                  cluster_jobs_fp:    /home/qiime/qiime_software/qiime-1.5.0-release/bin/start_parallel_jobs.py
pynast_template_alignment_blastdb:    None
assign_taxonomy_reference_seqs_fp:    /home/qiime/qiime_software/gg_otus-4feb2011-release/rep_set/gg_97_otus_4feb2011.fasta
                     torque_queue:    friendlyq
              qiime_test_data_dir:    None
   template_alignment_lanemask_fp:    /home/qiime/qiime_software/lanemask_in_1s_and_0s
                    jobs_to_start:    1
                cloud_environment:    False
                qiime_scripts_dir:    /home/qiime/qiime_software/qiime-1.5.0-release/bin
            denoiser_min_per_core:    50
                      working_dir:    /tmp/
                    python_exe_fp:    /home/qiime/qiime_software/python-2.7.1-release/bin/python
                         temp_dir:    /tmp/
                      blastall_fp:    /home/qiime/qiime_software/blast-2.2.22-release/bin/blastall
                 seconds_to_sleep:    60

Cheers,

Lawrence

galaxy17_tab.txt

galaxy_17.fasta

Tony Walters

May 28, 2012, 12:34:33 PM

to qiime...@googlegroups.com

Hello Lawrence,

There are two things that need to be fixed in the fasta file. The first is that the format needs to be

>fasta label (new line instead of tab separated)

sequence

>fasta label

sequence

and so on.

The second is that the U characters need to be replaced with T characters. Doing a global replace will be problematic, as there are probably "U" characters in the fasta labels, which you don't want to replace. There is a script in another software package, PrimerProspector, which can be used to convert "U" to "T" and remove gaps/spaces in fasta files without altering fasta labels. The PrimerProspector code is available here: http://pprospector.sourceforge.net/ and the script you would want is clean_fasta.py

As PrimerProspector uses mostly the same dependencies as QIIME (the only one that differs deals with RNA/DNA secondary structure prediction and is not needed in your case), it shouldn't be much of an issue to install.
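
If installing PrimerProspector turns out to be inconvenient, both fixes can be approximated with a few lines of plain Python. This is only a rough sketch, not the clean_fasta.py implementation: the function name tabular_to_fasta is made up for illustration, and it assumes each input row is "label<TAB>sequence" and that "-" and "." are the gap characters to strip.

```python
def tabular_to_fasta(lines):
    """Yield FASTA lines from 'label<TAB>sequence' rows (hypothetical helper).

    The label is left untouched; only the sequence has spaces and gap
    characters removed and U/u replaced with T/t.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank rows
        # Split on the first tab only; raises ValueError if a row has no tab.
        label, seq = line.split("\t", 1)
        # Assumption: spaces, tabs, '-' and '.' are the characters to remove.
        for unwanted in (" ", "\t", "-", "."):
            seq = seq.replace(unwanted, "")
        seq = seq.replace("U", "T").replace("u", "t")
        yield ">" + label.strip()
        yield seq


if __name__ == "__main__":
    example = ["CP_U1\tGGUC AAG-CU\n"]
    for out_line in tabular_to_fasta(example):
        print(out_line)
```

Note that the "U" in the label CP_U1 is preserved, which is the reason a global search-and-replace is not safe here.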

I was able to alter the fasta file you sent to match the above requirements. I then took a subsequence, assigned it with BLAST, and got this:

test.seqs_1 Root;Bacteria;Cyanobacteria;SubsectionI;Synechococcus;Synechococcussp_PCC7002 0.0 CP000951_2906028_2908838

Which is the desired output format.
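
Since the original symptom was that no sequences matched the ID-to-taxonomy file, it may help to check that every fasta label has a taxonomy entry before rerunning. The following is a minimal sketch, assuming a tab-separated ID-to-taxonomy file and that the sequence ID is the first word after ">"; the function names are hypothetical and not part of QIIME.

```python
def fasta_ids(fasta_lines):
    """Return the sequence IDs (first word after '>') found in FASTA lines."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def taxonomy_ids(mapping_lines):
    """Return the set of IDs in a tab-separated ID-to-taxonomy file."""
    ids = set()
    for line in mapping_lines:
        line = line.strip()
        if line:
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def missing_taxonomy(fasta_lines, mapping_lines):
    """List FASTA IDs that have no entry in the ID-to-taxonomy mapping."""
    known = taxonomy_ids(mapping_lines)
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in known]
```

An empty list from missing_taxonomy means every label was found. If every ID is reported as missing, the fasta file is probably still in the tab-separated layout, so each whole row is being read as a label.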

With regard to the convert_biom.py scripts: they are part of a package that is independent of QIIME (http://biom-format.org/index.html), and were only integrated into QIIME for part of the 1.4.0 development version before it was decided that the biom format software should be separated.

-Tony

Lawrence Davies

May 29, 2012, 9:29:46 AM

to qiime...@googlegroups.com

Hi Tony,

Thanks very much for checking that. It worked fine today and I've run my representative OTUs through the whole database now with no problems.

I'm still having trouble with convert_biom.py: the output .biom table is in a strange format. I have attached the input (.txt) and output (.biom) files. The input file is the output from summarize_taxa.py. Any ideas what I am doing wrong? The command I'm using is:

convert_biom.py -i otu_table_L4.txt -o table.from_L4.biom --biom_table_type="otu table"

The output is the same for both dense and sparse biom commands

Thanks again!

Lawrence

otu_table_L4.txt

table.from_L4.biom

Tony Walters

May 29, 2012, 10:45:49 AM

to qiime...@googlegroups.com

Hello Lawrence,

Remember that when you summarize taxa, you no longer have OTUs: the first column of each row becomes the taxonomy under which all matching OTUs are summarized. Also note that the abundances are relative by default; if you want absolute counts, you need to use the -a parameter with summarize_taxa.py.

-Tony

Lawrence Davies

May 29, 2012, 12:11:26 PM

to qiime...@googlegroups.com

Hi Tony,

Sorry, I'm not sure I'm following. There are no issues with summarize_taxa.py, but when I feed the .txt files it produces into convert_biom.py, the output doesn't make any sense to me. Is there a way to use summarize_taxa.py and keep the OTU column?

Cheers,

Lawrence

Tony Walters

May 29, 2012, 1:19:27 PM

to qiime...@googlegroups.com

Hello Lawrence,

There isn't a way to keep the OTUs, as each taxonomy can contain many OTUs. For instance, at the phylum level, there are likely to be many OTUs in, say, the Firmicutes, and all OTUs that fall into this category will be summed up as counts of "Firmicutes."
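
To illustrate why the OTU column cannot survive, here is a small Python sketch of the kind of collapsing involved. It is not the summarize_taxa.py code, just an illustration under simple assumptions: per-sample counts are stored as lists keyed by OTU ID, taxonomy strings are ";"-separated, and the function names are made up.

```python
def summarize_by_taxon(otu_counts, otu_taxonomy, level):
    """Sum per-sample counts of all OTUs sharing the first `level` ranks."""
    summary = {}
    for otu_id, counts in otu_counts.items():
        ranks = otu_taxonomy[otu_id].split(";")
        taxon = ";".join(ranks[:level])
        totals = summary.setdefault(taxon, [0] * len(counts))
        for i, count in enumerate(counts):
            totals[i] += count
    return summary


def to_relative(summary):
    """Convert absolute counts to per-sample relative abundances."""
    if not summary:
        return {}
    n_samples = len(next(iter(summary.values())))
    col_totals = [sum(row[i] for row in summary.values()) for i in range(n_samples)]
    return {
        taxon: [
            row[i] / float(col_totals[i]) if col_totals[i] else 0.0
            for i in range(n_samples)
        ]
        for taxon, row in summary.items()
    }
```

Two Firmicutes OTUs with counts [2, 0] and [3, 1] end up as a single "Firmicutes" row with [5, 1], so the individual OTU IDs are gone. The second function shows the difference between absolute counts and relative abundance.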

Here is the command I used on the tutorial data after running summarize_taxa.py (with -a for absolute abundance).

convert_biom.py --biom_table_type "otu table" -i otu_table_L2.txt -o otu_table_L2.biom

The output otu_table_L2.biom file could then be used for the otu_category_significance.py script to find significant categories of taxa (in this case I tested with ANOVA).

So by "strange", do you mean it does not work with particular scripts?

-Tony

Lawrence Davies

May 30, 2012, 7:22:43 AM

to qiime...@googlegroups.com

Hi Tony,

The output I get from convert_biom.py doesn't work in otu_category_significance.py. This post describes the problem I am having

http://groups.google.com/group/qiime-forum/browse_thread/thread/942073eb5ba3e1d1/ba752e0831d10472?lnk=gst&q=convert_biom.py#ba752e0831d10472

but it differs in that their output contains 'metadata: taxonomy' and is fixed by adding 'process_obs_metadata=taxonomy' to their command. However, my output is e.g.

"rows": [{"id": "Root;Root;Other;Other", "metadata": null}, {"id": "Root;Root;k__Bacteria;Other", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__ABY1_OD1", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Acidobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Actinobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Armatimonadetes", "metadata": null},

Do you know why I am getting this? Thank you for your continued help.

cheers,

Lawrence

Tony Walters

May 30, 2012, 9:36:29 AM

to qiime...@googlegroups.com

Hello Lawrence,

I did not need to use the process_obs_metadata=taxonomy in my convert_biom.py command. It was simply this:

convert_biom.py --biom_table_type "otu table" -i otu_table_L2.txt -o otu_table_L2.biom

Can you try running a similar command (change the -i to point to whichever taxa summary level you're interested in, and -o to be whatever output .biom file name you want), and then run otu_category_significance.py?

If you're getting an error at that point, please post the entire error here, as well as the exact command used for otu_category_significance.py.

-Tony

Lawrence Davies

May 30, 2012, 9:52:30 AM

to qiime...@googlegroups.com

Hi Tony,

command and error shown below

ldavies@sci102[convert_biom] qiime > otu_category_significance.py -i even1817_L4.biom -m bacteria_mapping.txt -s ANOVA -c Treatment -f single_anova.txt

Traceback (most recent call last):
File "/usr/lib/qiime/bin/otu_category_significance.py", line 238, in <module>
    main()
File "/usr/lib/qiime/bin/otu_category_significance.py", line 222, in main
    category, threshold, filter, otu_include)
File "/usr/lib/python2.6/dist-packages/qiime/otu_category_significance.py", line 646, in test_wrapper
    parse_otu_table(otu_table, float)
File "/usr/lib/python2.6/dist-packages/qiime/parse.py", line 450, in parse_otu_table
    " Sample ID line:\n %s" % line
ValueError: Error parsing sample IDs in OTU table. Sample ID line:
{"rows": [{"id": "Root;Root;Other;Other", "metadata": null}, {"id": "Root;Root;k__Bacteria;Other", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__ABY1_OD1", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Acidobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Actinobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Armatimonadetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Bacteroidetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__CCM11b", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Chlorobi", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Chloroflexi", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Cyanobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Elusimicrobia", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Firmicutes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__GAL15", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__GN02", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Gemmatimonadetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Lentisphaerae", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__NC10", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Nitrospirae", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__OP3", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Planctomycetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Proteobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SBR1093", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SC3", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SC4", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SM2F11", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SPAM", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__TM6", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__TM7", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Tenericutes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Verrucomicrobia", "metadata": null}, {"id": 
"Root;Root;k__Bacteria;p__WPS-2", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__WS2", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__WS3", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__ZB2", "metadata": null}], "format": "Biological Observation Matrix 0.9.3", "data": [[0, 0, 2.0], [0, 1, 1.0], [0, 2, 3.0], [0, 4, 2.0], [1, 0, 78.0], [1, 1, 78.0], [1, 2, 90.0], [1, 3, 37.0], [1, 4, 48.0], [1, 5, 39.0], [2, 0, 2.0], [2, 2, 1.0], [3, 0, 327.0], [3, 1, 348.0], [3, 2, 326.0], [3, 3, 155.0], [3, 4, 196.0], [3, 5, 142.0], [4, 0, 180.0], [4, 1, 167.0], [4, 2, 185.0], [4, 3, 151.0], [4, 4, 188.0], [4, 5, 178.0], [5, 0, 5.0], [5, 1, 10.0], [5, 2, 9.0], [5, 3, 2.0], [5, 4, 5.0], [5, 5, 2.0], [6, 0, 44.0], [6, 1, 25.0], [6, 2, 24.0], [6, 3, 12.0], [6, 4, 17.0], [6, 5, 10.0], [7, 0, 4.0], [7, 4, 1.0], [8, 0, 6.0], [8, 1, 7.0], [8, 2, 4.0], [8, 3, 2.0], [8, 4, 3.0], [8, 5, 6.0], [9, 0, 169.0], [9, 1, 181.0], [9, 2, 175.0], [9, 3, 78.0], [9, 4, 124.0], [9, 5, 106.0], [10, 0, 4.0], [10, 1, 9.0], [10, 2, 3.0], [10, 3, 286.0], [10, 4, 185.0], [10, 5, 278.0], [11, 0, 1.0], [11, 1, 2.0], [11, 2, 2.0], [12, 0, 65.0], [12, 1, 84.0], [12, 2, 63.0], [12, 3, 206.0], [12, 4, 93.0], [12, 5, 272.0], [13, 0, 1.0], [13, 1, 1.0], [14, 0, 3.0], [14, 1, 4.0], [14, 2, 4.0], [14, 3, 6.0], [14, 4, 1.0], [14, 5, 9.0], [15, 0, 48.0], [15, 1, 57.0], [15, 2, 58.0], [15, 3, 14.0], [15, 4, 29.0], [15, 5, 16.0], [16, 1, 1.0], [17, 0, 1.0], [17, 2, 2.0], [17, 3, 1.0], [18, 0, 47.0], [18, 1, 46.0], [18, 2, 30.0], [18, 3, 6.0], [18, 4, 7.0], [18, 5, 15.0], [19, 4, 1.0], [20, 0, 58.0], [20, 1, 38.0], [20, 2, 48.0], [20, 3, 34.0], [20, 4, 32.0], [20, 5, 35.0], [21, 0, 710.0], [21, 1, 694.0], [21, 2, 718.0], [21, 3, 782.0], [21, 4, 849.0], [21, 5, 678.0], [22, 2, 3.0], [23, 4, 1.0], [24, 2, 1.0], [25, 4, 2.0], [25, 5, 1.0], [26, 0, 26.0], [26, 1, 24.0], [26, 2, 25.0], [26, 3, 10.0], [26, 4, 16.0], [26, 5, 13.0], [27, 0, 10.0], [27, 1, 7.0], [27, 3, 8.0], [27, 4, 3.0], [27, 5, 1.0], [28, 0, 1.0], 
[28, 1, 5.0], [28, 3, 1.0], [28, 5, 2.0], [29, 2, 1.0], [29, 3, 1.0], [30, 0, 12.0], [30, 1, 14.0], [30, 2, 24.0], [30, 3, 16.0], [30, 4, 9.0], [30, 5, 7.0], [31, 2, 2.0], [32, 2, 1.0], [33, 0, 13.0], [33, 1, 13.0], [33, 2, 15.0], [33, 3, 8.0], [33, 4, 5.0], [33, 5, 7.0], [34, 1, 1.0], [34, 3, 1.0]], "columns": [{"id": "D.1.16S", "metadata": null}, {"id": "D.2.16S", "metadata": null}, {"id": "D.3.16S", "metadata": null}, {"id": "L.1.16S", "metadata": null}, {"id": "L.2.16S", "metadata": null}, {"id": "L.3.16S", "metadata": null}], "generated_by": "BIOM-Format 0.9.3", "matrix_type": "sparse", "shape": [35, 6], "format_url": "http://biom-format.org", "date": "2012-05-29T10:03:26.985219", "type": "OTU table", "id": null, "matrix_element_type": "float"}


Cheers,

Lawrence

Tony Walters

May 30, 2012, 9:58:46 AM

to qiime...@googlegroups.com

Hello Lawrence,

Can you change the -f parameter to something like -f 1 (it should be a numerical value)? Also, how many different categories are under "Treatment" in your mapping file?

-Tony

Lawrence Davies

May 30, 2012, 10:06:33 AM

to qiime...@googlegroups.com

Hi Tony,

Ahh yes, I did have -f 3 set when I tried a couple of days ago; I just rushed it a moment ago! The output looks the same (see below). There are only two treatments, and the mapping file has worked fine for other scripts, e.g. alpha_rarefaction. There is something wrong with the .biom file that I am putting in: when I use convert_biom.py the output is not the same as normal .biom outputs. I attached it a few messages ago. Any ideas?

Cheers,

Lawrence

ldavies@sci102[convert_biom] qiime > otu_category_significance.py -i even1817_L4.biom -m bacteria_mapping.txt -s ANOVA -c Treatment -f 1 -o single_anova.txt

Traceback (most recent call last):
File "/usr/lib/qiime/bin/otu_category_significance.py", line 238, in <module>
    main()
File "/usr/lib/qiime/bin/otu_category_significance.py", line 222, in main
    category, threshold, filter, otu_include)
File "/usr/lib/python2.6/dist-packages/qiime/otu_category_significance.py", line 646, in test_wrapper
    parse_otu_table(otu_table, float)
File "/usr/lib/python2.6/dist-packages/qiime/parse.py", line 450, in parse_otu_table
    " Sample ID line:\n %s" % line
ValueError: Error parsing sample IDs in OTU table. Sample ID line:
{"rows": [{"id": "Root;Root;Other;Other", "metadata": null}, {"id": "Root;Root;k__Bacteria;Other", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__ABY1_OD1", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Acidobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Actinobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Armatimonadetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Bacteroidetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__CCM11b", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Chlorobi", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Chloroflexi", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Cyanobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Elusimicrobia", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Firmicutes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__GAL15", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__GN02", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Gemmatimonadetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Lentisphaerae", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__NC10", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Nitrospirae", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__OP3", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Planctomycetes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Proteobacteria", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SBR1093", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SC3", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SC4", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SM2F11", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__SPAM", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__TM6", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__TM7", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Tenericutes", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__Verrucomicrobia", "metadata": null}, {"id": 
"Root;Root;k__Bacteria;p__WPS-2", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__WS2", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__WS3", "metadata": null}, {"id": "Root;Root;k__Bacteria;p__ZB2", "metadata": null}], "format": "Biological Observation Matrix 0.9.3", "data": [[0, 0, 2.0], [0, 1, 1.0], [0, 2, 3.0], [0, 4, 2.0], [1, 0, 78.0], [1, 1, 78.0], [1, 2, 90.0], [1, 3, 37.0], [1, 4, 48.0], [1, 5, 39.0], [2, 0, 2.0], [2, 2, 1.0], [3, 0, 327.0], [3, 1, 348.0], [3, 2, 326.0], [3, 3, 155.0], [3, 4, 196.0], [3, 5, 142.0], [4, 0, 180.0], [4, 1, 167.0], [4, 2, 185.0], [4, 3, 151.0], [4, 4, 188.0], [4, 5, 178.0], [5, 0, 5.0], [5, 1, 10.0], [5, 2, 9.0], [5, 3, 2.0], [5, 4, 5.0], [5, 5, 2.0], [6, 0, 44.0], [6, 1, 25.0], [6, 2, 24.0], [6, 3, 12.0], [6, 4, 17.0], [6, 5, 10.0], [7, 0, 4.0], [7, 4, 1.0], [8, 0, 6.0], [8, 1, 7.0], [8, 2, 4.0], [8, 3, 2.0], [8, 4, 3.0], [8, 5, 6.0], [9, 0, 169.0], [9, 1, 181.0], [9, 2, 175.0], [9, 3, 78.0], [9, 4, 124.0], [9, 5, 106.0], [10, 0, 4.0], [10, 1, 9.0], [10, 2, 3.0], [10, 3, 286.0], [10, 4, 185.0], [10, 5, 278.0], [11, 0, 1.0], [11, 1, 2.0], [11, 2, 2.0], [12, 0, 65.0], [12, 1, 84.0], [12, 2, 63.0], [12, 3, 206.0], [12, 4, 93.0], [12, 5, 272.0], [13, 0, 1.0], [13, 1, 1.0], [14, 0, 3.0], [14, 1, 4.0], [14, 2, 4.0], [14, 3, 6.0], [14, 4, 1.0], [14, 5, 9.0], [15, 0, 48.0], [15, 1, 57.0], [15, 2, 58.0], [15, 3, 14.0], [15, 4, 29.0], [15, 5, 16.0], [16, 1, 1.0], [17, 0, 1.0], [17, 2, 2.0], [17, 3, 1.0], [18, 0, 47.0], [18, 1, 46.0], [18, 2, 30.0], [18, 3, 6.0], [18, 4, 7.0], [18, 5, 15.0], [19, 4, 1.0], [20, 0, 58.0], [20, 1, 38.0], [20, 2, 48.0], [20, 3, 34.0], [20, 4, 32.0], [20, 5, 35.0], [21, 0, 710.0], [21, 1, 694.0], [21, 2, 718.0], [21, 3, 782.0], [21, 4, 849.0], [21, 5, 678.0], [22, 2, 3.0], [23, 4, 1.0], [24, 2, 1.0], [25, 4, 2.0], [25, 5, 1.0], [26, 0, 26.0], [26, 1, 24.0], [26, 2, 25.0], [26, 3, 10.0], [26, 4, 16.0], [26, 5, 13.0], [27, 0, 10.0], [27, 1, 7.0], [27, 3, 8.0], [27, 4, 3.0], [27, 5, 1.0], [28, 0, 1.0], 
[28, 1, 5.0], [28, 3, 1.0], [28, 5, 2.0], [29, 2, 1.0], [29, 3, 1.0], [30, 0, 12.0], [30, 1, 14.0], [30, 2, 24.0], [30, 3, 16.0], [30, 4, 9.0], [30, 5, 7.0], [31, 2, 2.0], [32, 2, 1.0], [33, 0, 13.0], [33, 1, 13.0], [33, 2, 15.0], [33, 3, 8.0], [33, 4, 5.0], [33, 5, 7.0], [34, 1, 1.0], [34, 3, 1.0]], "columns": [{"id": "D.1.16S", "metadata": null}, {"id": "D.2.16S", "metadata": null}, {"id": "D.3.16S", "metadata": null}, {"id": "L.1.16S", "metadata": null}, {"id": "L.2.16S", "metadata": null}, {"id": "L.3.16S", "metadata": null}], "generated_by": "BIOM-Format 0.9.3", "matrix_type": "sparse", "shape": [35, 6], "format_url": "http://biom-format.org", "date": "2012-05-29T10:03:26.985219", "type": "OTU table", "id": null, "matrix_element_type": "float"}

ldavies@sci102[convert_biom] qiime > [ 3:02PM]

Tony Walters

May 30, 2012, 10:13:34 AM

to qiime...@googlegroups.com

Oops, I just realized you might still be running QIIME 1.4.0 rather than 1.5.0. Is this the case? If so, you don't want to use a .biom format file for otu_category_significance, but rather the .txt file (the text file may require manual modification if this is the case).
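
For what it's worth, the .biom file is a JSON document rather than a tab-delimited table, which would explain why an older parser that expects tab-separated text fails on its first line. Below is a minimal Python sketch for telling the two apart and for expanding the sparse layout into ordinary rows. It is not QIIME or biom-format code: the function names are made up, and it assumes the layout visible in the error message, where each "data" entry is [row index, column index, value].

```python
import json


def looks_like_biom(text):
    """Return True if text parses as a JSON object with BIOM-style keys."""
    try:
        table = json.loads(text)
    except ValueError:
        return False  # not JSON, e.g. a classic tab-delimited table
    return isinstance(table, dict) and all(
        key in table for key in ("rows", "columns", "data")
    )


def sparse_biom_to_rows(table):
    """Expand a sparse BIOM dict into (row_id, [value per column]) pairs."""
    n_rows, n_cols = table["shape"]
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for row, col, value in table["data"]:
        dense[row][col] = value  # entries not listed stay at zero
    row_ids = [entry["id"] for entry in table["rows"]]
    return list(zip(row_ids, dense))
```

Viewed this way, the "strange" output is the same taxa-by-sample table, just stored as a list of non-zero entries.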

Lawrence Davies

May 30, 2012, 10:25:33 AM

to qiime...@googlegroups.com

It's a bit confusing but I am using both! I do almost everything on 1.4.0 (including otu_category_significance.py) by connecting to a cluster, as it is far more powerful. However, if I need to install something (e.g. USEARCH/convert_biom) then I use a 1.5.0 virtual machine, since I do not have permission to install programs on the cluster.

I just ran it on the .txt output from summarize_taxa.py using QIIME 1.4.0:

davies@sci102[summarise_taxa] qiime > otu_category_significance.py -i otu_table_even1817_L4.txt -m bacteria_mapping.txt -s ANOVA -c Treatment -f 3 -o single_anova.txt

This was the error.

Traceback (most recent call last):
File "/usr/lib/qiime/bin/otu_category_significance.py", line 238, in <module>
    main()
File "/usr/lib/qiime/bin/otu_category_significance.py", line 222, in main
    category, threshold, filter, otu_include)
File "/usr/lib/python2.6/dist-packages/qiime/otu_category_significance.py", line 638, in test_wrapper
    otu_table = convert_OTU_table_relative_abundance(otu_table)
File "/usr/lib/python2.6/dist-packages/qiime/util.py", line 772, in convert_OTU_table_relative_abundance
    vals = [float(i) for i in line[1:]]
ValueError: invalid literal for float(): D.1.16S

I have attached the input file and mapping file too.

Thanks for your quick replies.

Lawrence

bacteria_mapping.txt

otu_table_even1817_L4.txt

Tony Walters

May 30, 2012, 12:21:07 PM

to qiime...@googlegroups.com

Hello again Lawrence,

I've attached a slightly modified version of the OTU table that you sent that should work with otu_category_significance.py in QIIME 1.4.0.

There are two changes.

1. The line "# QIIME v1.4.0 OTU table" is added at the top.

2. "Taxon" is replaced with "#OTU ID" in the second line.
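
If you need to do this for several taxa summary files, the two changes can be scripted. This is a minimal sketch rather than anything shipped with QIIME: the function name is hypothetical, and it assumes the summarize_taxa.py output is tab-delimited with "Taxon" as the first header field.

```python
def add_qiime_1_4_header(lines):
    """Return table lines with the two QIIME 1.4.0 header changes applied."""
    lines = [line.rstrip("\n") for line in lines]
    if not lines:
        return []
    header = lines[0].split("\t")
    if header[0] != "Taxon":
        raise ValueError(
            "expected the first header field to be 'Taxon', got %r" % header[0]
        )
    # Change 2: rename the first header field.
    header[0] = "#OTU ID"
    # Change 1: add the comment line at the top of the file.
    return ["# QIIME v1.4.0 OTU table", "\t".join(header)] + lines[1:]
```

Joining the returned lines with newlines and writing them to a new file, rather than overwriting the original, keeps the unmodified summarize_taxa.py output available for use with convert_biom.py later.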

I was then able to run this command on the table:

otu_category_significance.py -i otu_table_even1817_L4.txt -m bacteria_mapping.txt -f 1 -c Treatment -s ANOVA -o test_ANOVA_output.txt

Which ran fine.

There are slightly fewer conversions in QIIME 1.5.0 (just running the convert_biom.py script), but hopefully the required modifications for QIIME 1.4.0 won't be too much of a pain for you.

-Tony

modified_otu_table_even1817_L4.txt

Lawrence Davies

Jun 1, 2012, 11:43:30 AM

to qiime...@googlegroups.com

Hi Tony,

I tried it out today and it works! Thank you very much for your help and patience.

Best wishes,

Lawrence
