Error in rdp classifier based on GreenGene with assign

feng...@qq.com

Mar 22, 2016, 8:45:50 AM

to Qiime 1 Forum

Hello,

I am writing because I have run into a problem with assign_taxonomy.py.

I want to use assign_taxonomy.py to assign taxonomy with the RDP classifier, using the Greengenes database.
I ran this command:

/usr/lib/qiime/bin/assign_taxonomy.py -i /home/galaxy/user_data/datasets/000/dataset_459.dat -m rdp -t /usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.11/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt -r /usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.11/gg_13_8_otus/rep_set/97_otus.fasta -o rdp_assigned -c 0.5

But it failed with the following error:

Traceback (most recent call last):

File "/usr/lib/qiime/bin/assign_taxonomy.py", line 429, in <module>

main()

File "/usr/lib/qiime/bin/assign_taxonomy.py", line 406, in main

log_path=log_path)

File "/usr/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 864, in __call__

max_memory=max_memory, tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 503, in train_rdp_classifier_and_assign_taxonomy

tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 473, in train_rdp_classifier

return app(training_seqs_file)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 315, in __call__

remove_tmp=remove_tmp)

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

burrito.util.ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/galaxy/database/job_working_directory/000/287/working/"; java -Xmx15000M -cp "/usr/local/galaxy/galaxy/tools/IEG/external_tools/rdp_classifier_2.5/rdp_classifier-2.5.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_F7Lo5N.txt" -s /tmp/tmpaogprxD6nMsCedla5Gvd.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_fD8oex > "/tmp/tmpkB2JUbezVblCdlCwEoaL.txt" 2> "/tmp/tmpieq4YXoj9rCkHIm99FKy.txt"

StdOut:

StdErr:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

Command line arguments:

--tax_file contains the hierarchical taxonomy information in the following format:

taxid*taxon name*parent taxid*depth*rank

Fields taxid, the parent taxid and depth should be in integer format

depth indicates the depth from the root taxon.

Note: the depth for the root is 0

EX: 44*ROOT*1*0*domain

--rawseq.fa contains the raw training sequences in fasta format

The header of this fasta file starts with ">",

followed by the sequence name, white space(s)

and a list taxon names seperated by ';' with highest rank taxon first.

Ex: >seq1 ROOT;Ph1;Fam1;G1

Note: a sequence can only be assigned to the lowest rank taxon.

--trainsetNo is a integer. It's used to marked the training information.

--version indicates the version of the hierarchical taxonomy

Ex: Bacteria Nomenclature

--version_modification holds the modifcation information of the taxonomy if any

Ex: Acidobacterium Added

--output_directory specifies the output directory.

I am using QIIME 1.9.0 with RDP classifier 2.5. After hitting this problem I also tried other versions of the RDP classifier (rdp_classifier-2.2, rdp_classifier-2.5, rdp_classifier-2.10.1 and 2.11), but all of them gave a similar error. I also tried setting --rdp_max_memory, as suggested in other answers, but the problem remained.

Could anyone help me with this? Thanks!

Jenya Kopylov

Mar 22, 2016, 10:04:22 AM

to Qiime 1 Forum

Hello,

Is /home/galaxy/user_data/datasets/000/dataset_459.dat a FASTA formatted file?

If so, does taxonomy assignment work with another method, e.g. uclust (just remove "-m rdp -c 0.5" from the assign_taxonomy.py command and replace "-o rdp_assigned" with "-o uclust_assigned")?

Could you also try adding "--rdp_max_memory 4000" (edit 4000 to the number of MB you may need)? Looks like RDP is asking for 15000M which may be too high for your system.

Also, please use rdp_classifier-2.2 for debugging since this is the version supported in QIIME.

Jenya

feng...@qq.com

Mar 22, 2016, 10:21:30 AM

to Qiime 1 Forum

Hi Jenya,

Thank you for the quick reply!

Yes, I am sure that /home/galaxy/user_data/datasets/000/dataset_459.dat is a FASTA file.

Following your suggestion, I ran it with uclust:

/usr/lib/qiime/bin/assign_taxonomy.py -i /home/galaxy/user_data/datasets/000/dataset_459.dat -t /usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.11/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt -r /usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.11/gg_13_8_otus/rep_set/97_otus.fasta -o uclust_assigned

No error was reported. But when I checked the results in the uclust_assigned directory, every OTU had the same result, as shown below:

OTU_4337 Unassigned 1.00 1

OTU_4336 Unassigned 1.00 1

OTU_4335 Unassigned 1.00 1

OTU_4334 Unassigned 1.00 1

OTU_4333 Unassigned 1.00 1

OTU_4332 Unassigned 1.00 1

OTU_4331 Unassigned 1.00 1

OTU_4330 Unassigned 1.00 1

OTU_4339 Unassigned 1.00 1

OTU_4338 Unassigned 1.00 1

OTU_19695 Unassigned 1.00 1

OTU_19694 Unassigned 1.00 1
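
A quick way to confirm whether every OTU really came back as Unassigned is to tally the taxonomy column of the assignments file. This is only a minimal sketch: it assumes the file is the tab-separated output written by assign_taxonomy.py (it also tolerates the space-separated form pasted above), and the function name `summarize_assignments` is made up for illustration.

```python
from collections import Counter


def summarize_assignments(lines):
    """Tally OTUs labelled 'Unassigned' versus OTUs that received a taxonomy."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Prefer tab-separated columns; otherwise split the OTU id off
        # at the first run of whitespace, as in the pasted listing.
        parts = line.split("\t") if "\t" in line else line.split(None, 1)
        taxonomy = parts[1].strip() if len(parts) > 1 else ""
        key = "unassigned" if taxonomy.startswith("Unassigned") else "assigned"
        counts[key] += 1
    return counts


# Hypothetical usage (the file name depends on your input file):
#   with open("uclust_assigned/dataset_459_tax_assignments.txt") as handle:
#       print(dict(summarize_assignments(handle)))
```

If the "assigned" count is zero for a large input, the reference sequences or taxonomy file are worth checking before anything else.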

After this, I ran the RDP command as you suggested, with the memory set to 4000 MB and using rdp_classifier-2.2, and got a similar error, shown below:

Traceback (most recent call last):

File "/usr/lib/qiime/bin/assign_taxonomy.py", line 429, in <module>

main()

File "/usr/lib/qiime/bin/assign_taxonomy.py", line 406, in main

log_path=log_path)

File "/usr/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 864, in __call__

max_memory=max_memory, tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 503, in train_rdp_classifier_and_assign_taxonomy

tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 473, in train_rdp_classifier

return app(training_seqs_file)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 315, in __call__

remove_tmp=remove_tmp)

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

burrito.util.ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/galaxy/database/job_working_directory/000/288/working/"; java -Xmx4000M -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_OOsczW.txt" -s /tmp/tmpJrTz34vKgibIPrNIBpuR.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_E6hA5S > "/tmp/tmpJUhaPLqLBcm0GLInqs86.txt" 2> "/tmp/tmpF1SgwkgDFgw6eQF1a3De.txt"

StdOut:

StdErr:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

Command line arguments:

--tax_file contains the hierarchical taxonomy information in the following format:

taxid*taxon name*parent taxid*depth*rank

Fields taxid, the parent taxid and depth should be in integer format

depth indicates the depth from the root taxon.

Note: the depth for the root is 0

EX: 44*ROOT*1*0*domain

--rawseq.fa contains the raw training sequences in fasta format

The header of this fasta file starts with ">",

followed by the sequence name, white space(s)

and a list taxon names seperated by ';' with highest rank taxon first.

Ex: >seq1 ROOT;Ph1;Fam1;G1

Note: a sequence can only be assigned to the lowest rank taxon.

--trainsetNo is a integer. It's used to marked the training information.

--version indicates the version of the hierarchical taxonomy

Ex: Bacteria Nomenclature

--version_modification holds the modifcation information of the taxonomy if any

Ex: Acidobacterium Added

--output_directory specifies the output directory.

Thanks for your reply!

Jenya Kopylov

Mar 22, 2016, 11:18:02 AM

to Qiime 1 Forum

Could you paste the first 10 sequences of your FASTA file (dataset_459.dat) here? Thanks!
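
For anyone following along, one way to pull the first few records out of a FASTA file is sketched below. It is a minimal example that assumes a plain-text FASTA file with ">" header lines; the helper names `iter_fasta` and `first_records` are hypothetical and not part of QIIME.

```python
from itertools import islice


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def first_records(path, n=10):
    """Return the first n records of the FASTA file at path."""
    with open(path) as handle:
        return list(islice(iter_fasta(handle), n))


# Hypothetical usage:
#   for name, seq in first_records("dataset_459.dat", 10):
#       print(">" + name)
#       print(seq)
```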

feng...@qq.com

Mar 22, 2016, 8:54:24 PM

to Qiime 1 Forum

Hi Jenya,

The first 10 sequences of this file are shown below:

>OTU_0

TACGGAGGGGGCAAGCGTTACTCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCGGATGTGAAAACCCAATGGCTTAACCTTGGGCCTGCATTCGATACGGGCAAACTAGAGTGTGGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGACAGCGAAGGCAGCACTTTGGGCCGGTACTGACACTGACGCTCAGGCACGAAAGCGTGGGGAGCAAACAGG

>OTU_1

TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTCATGCAAGACAGATGTGAAATCCCCGAGCTCAACTTGGGAACTGCATTTGTGACTGCATGGCTTGAGTGCGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGGCCTGCACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGAT

>OTU_10

TACGGAGGGTGCAGCTCTAGTGGTGGCCAGTTTTATTGGGCCTAAAGCGTTCGTAGCCGGTTTATTAAGTCTCTGGTGAAATCCCGTAGCTTAACTATGGGAATTGCTGGAGATACTAGTAGACTTGAGGTCGGGAGAGGTTAGAGGTACTCCCAGGGTAGGGGTGAAATCCTGTAATCCTGGGAGGACCACCTGTGGCGAAGGCGTCTAACTGGAACGAACCTGACGGTGAGGGACGAAAGCTAGGGGCGCGAACCGG

>OTU_100

TCCCGGCAGTCCAAGTCGCAGCCAACATTATTGGGTCTAAAACATCCGTAGCTTGCTTATTAAGTTCCTTGTGAAATTCTACGTCTCAAGCGTGGAGCTTGCGAGGAATACTGGTAAGCTAGAGACCGGAAGATGCAAAGAGTACACTCAAGGTAGTAGTAAAATACGTTAATCTTGGGTGGACTAACAATTGGCGAAGGCACTTTGCAAGTACGGATCTGACAGTGAGGGATGAAGGCTAGAGGCGCAAAATGG

>OTU_1000

GACGAACCGTCCAAACGTTATTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGCCGAGCGGGTCGCGGGTGAAATCCTCCAGCTTAACTGGAGAACTGCCCTCGATACCACTGCGGCTCGAGGAAGGAAGGGGTCATGGGAACTGTCGGTGGAGCGGTGAAATGCGTTGATATTCACAGGAACTCCGGTGGCGAAGGCGGCTCGCTGGGCCGCTTCTGACGCTGAGGCGCGGAAGCCAGGGGAGCAAACGGG

>OTU_10000

TACAGAGGTCTCGAGCGTTAATCGGAATTACTGGGCTTAAAGGGTGCGTAGGCGGCCCCGAAAGCGTCGTGTGAAAGCCCCCGGCTCAACCGGGGAACTGCACGGCGAACTACGGGGCTTGAGAGAACTAGGGGCTGACAGAACAGTAGGTGGAGCGGTGAAATGCGTAGATATCTATTGGAATGCCAAAGGTGAAGACAGCCGGCTGGGGATTTCCTGACGCTGAGGCACGAAAGATGGGGGAGCAAACAGG

>OTU_10001

TACGGGGGGTGCAAGCGTTGTTCGGAATTATTGGGCGTAAAGCGCGTGTAGGCGGTTTGATTAGTCTGATGTGAAAGCCCCGGGCTCAACCCAGGAAGCGCATTGGAAACTGTCGGACTTGAATACGGAAGAGGGTAGTGGAATTCCTAGTGTAGGAGTGAAATCCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTACCTGGACCGATATTGACGCTGAGACGCGAAAGCGTGGGTAGCAAACAGG

>OTU_10002

AACGTAGGAGGCGAGCGTTATCCGGAATTACTGGGCGTAAAGGGCATGCAGGCGGCTTGGTAAGTTGGACGTGAAAGCTCCTGGCTTAACTGGGAGAGTGCGTTCGAAACTGCTGAGCTTGAGGCTGGGAGAGGGGTGTGGAATTCCCGGTGTAGTGGTGGAATGCGTAGATATCGGGAGGAACACCGGTGGCGAAGGCGACTTCCTGGCCCTATACTGACGCTGAGACGCGAGAGCGTGGGTAGCAAACAGG

>OTU_10003

TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGCGTAAAGCGCGTGTAGGCGGTCTTTTAAGTCTGATGTGAAAGCCCCGGGCTTAACCTGGGAAGTGCATTGGAAACTGGAAGACTTGAGTACGGGAGAGGAGAGTGGAATTCCTAGTGTAGGAGTGAAATCCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTCTCTGGACCGATACTGACGCTGAGACGCGAAAGCGTGGGTAGCAAACAGG

>OTU_10004

TACGAAGGTGGCAAGCGTTGTTCGGATTCACTGGGCGTACAGGGTGTGTAGGCGGTTTGGTAAGCCTTCTGTTAAAGCTTCGGGCCCAACCCGAAGCCTGCATTTGAAACTATTGACCTTGAGTCTTGGAGAGACAGGCAGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAATACCGATGGCGAAGGCAGCCTGCTGGACAATGACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG

>OTU_10005

TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGTGGTTGTTTAAGACAGATGTGAAAGCCCCGGGCTCAACCTGGGAACTGCGTTTGAAACTGCAAGGCTAGAGTGTAGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGCCACAACTGACGCTGAGAAGCGAAAGCATGGGGAGCGAACAGG

Hope you can fix this problem!

Kai

Jenya Kopylov

Mar 23, 2016, 8:02:31 AM

to Qiime 1 Forum

Hello,

I was able to run both uclust and RDP on these 10 sequences, and both produced results (QIIME 1.9.1). You mentioned that your uclust assignments were 'Unassigned'. Are you sure that is the case for *all* sequences, including the ones below? If so, there may be an issue with your reference database or taxonomy path (97_otus.fasta, 97_otu_taxonomy.txt). Can you make sure those are correct?

Otherwise, make sure that you have write access to the "temp_dir:" path given in your Qiime config file (see output of print_qiime_config.py). RDP uses that temporary directory and has been known to fail if it cannot write there.

$ assign_taxonomy.py -i test_reads.fasta -o uclust_assign
$ cat ./uclust_assign/test_reads_tax_assignments.txt
OTU_100 Unassigned 1.00 1
OTU_10 k__Archaea; p__Euryarchaeota; c__Methanobacteria; o__Methanobacteriales; f__Methanobacteriaceae; g__Methanobrevibacter; s__ 0.67 3
OTU_10005 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Methylophilales; f__; g__; s__ 0.67 3
OTU_1000 k__Bacteria; p__Planctomycetes; c__Planctomycetia; o__Gemmatales; f__Gemmataceae; g__; s__ 1.00 3
OTU_10004 Unassigned 1.00 1
OTU_10003 k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfuromonadales; f__Geobacteraceae; g__Geobacter; s__ 1.00 3
OTU_10002 k__Bacteria; p__Chloroflexi; c__Anaerolineae; o__SJA-15; f__; g__; s__ 1.00 3
OTU_10001 k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria 0.67 3
OTU_10000 Unassigned 1.00 1
OTU_0 Unassigned 1.00 1
OTU_1 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__; s__ 1.00 3

feng...@qq.com

Mar 23, 2016, 8:17:31 AM

to Qiime 1 Forum

Hi Jenya,

I checked the paths for both 97_otus.fasta and 97_otu_taxonomy.txt again, and replaced them with freshly downloaded copies. After that, the uclust results look normal, just like yours, as shown below.

But RDP still fails with the same error.

OTU_4337 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Rhodocyclales; f__Rhodocyclaceae; g__; s__ 1.00 3

OTU_4336 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Xanthomonadales; f__Xanthomonadaceae; g__Dokdonella; s__ 0.67 3

OTU_4335 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Thiotrichales; f__Piscirickettsiaceae; g__; s__ 1.00 3

OTU_4334 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__; s__ 0.67 3

OTU_4333 k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Myxococcales; f__0319-6G20; g__; s__ 1.00 2

OTU_4332 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__; f__; g__; s__ 1.00 1

OTU_4331 k__Bacteria; p__Chloroflexi; c__Anaerolineae; o__Caldilineales; f__Caldilineaceae; g__; s__ 1.00 3

OTU_4330 k__Bacteria; p__Chlorobi; c__Ignavibacteria; o__Ignavibacteriales; f__Ignavibacteriaceae; g__; s__ 1.00 3

OTU_4339 Unassigned 1.00 1

OTU_4338 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Nocardioidaceae; g__Nocardioides; s__ 1.00 3

Here is the output of print_qiime_config.py -t:

System information

==================

Platform: linux2

Python version: 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]

Python executable: /usr/bin/python

QIIME default reference information

===================================

For details on what files are used as QIIME's default references, see here:

https://github.com/biocore/qiime-default-reference/releases/tag/0.1.1

Dependency versions

===================

QIIME library version: 1.9.0

QIIME script version: 1.9.0+dfsg-0biolinux5

qiime-default-reference version: 0.1.1

NumPy version: 1.8.2

SciPy version: 0.13.3

pandas version: 0.13.1

matplotlib version: 1.3.1

biom-format version: 2.1.4

h5py version: 2.2.1 (HDF5 version: 1.8.11)

qcli version: 0.1.0

pyqi version: 0.3.2

scikit-bio version: 0.2.3

PyNAST version: 1.2.2

Emperor version: 0.9.51

burrito version: 0.9.0

burrito-fillings version: Installed.

sortmerna version: SortMeRNA version 2.0, 29/11/2014

sumaclust version: SUMACLUST Version 1.0.01

swarm version: Swarm 1.2.20 [Feb 1 2015 09:42:15]

gdata: Not installed.

QIIME config values

===================

For definitions of these settings and to learn how to configure QIIME, see here:

http://qiime.org/install/qiime_config.html

http://qiime.org/tutorials/parallel_qiime.html

blastmat_dir: /usr/share/ncbi/data

cluster_jobs_fp: None

pick_otus_reference_seqs_fp: /usr/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta

jobs_to_start: 1

pynast_template_alignment_blastdb: None

qiime_scripts_dir: /usr/lib/qiime/bin/

working_dir: .

pynast_template_alignment_fp: /usr/share/qiime/data/core_set_aligned.fasta.imputed

python_exe_fp: python

temp_dir: /tmp

assign_taxonomy_reference_seqs_fp: /usr/share/qiime/data/gg_13_8_otus/rep_set/97_otus.fasta

blastall_fp: blastall

seconds_to_sleep: 60

assign_taxonomy_id_to_taxonomy_fp: /usr/share/qiime/data/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

QIIME base install test results

===============================

F........

======================================================================

FAIL: test_FastTree_supported_version (__main__.QIIMEDependencyBase)

FastTree is in path and version is supported

----------------------------------------------------------------------

Traceback (most recent call last):

File "/usr/bin/print_qiime_config.py", line 342, in test_FastTree_supported_version

"which components of QIIME you plan to use.")

AssertionError: FastTree not found. This may or may not be a problem depending on which components of QIIME you plan to use.

----------------------------------------------------------------------

Ran 9 tests in 0.017s

FAILED (failures=1)

As for temp_dir, I have set the permissions of the /tmp directory to 777, the most permissive setting.

Thanks!

Kai

Jenya Kopylov

Mar 23, 2016, 9:31:16 AM

to Qiime 1 Forum

Could you check whether RDP works for you standalone? Try one of the example commands in their README file (they provide input files in the /samplefiles folder).

I haven't seen your error before and it may be related somehow to the temporary directory, e.g. a space issue in temp_dir on the galaxy system. Since you have uclust running, would you be fine using that for your taxonomy assignment?

Jenya
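
To rule out the temporary directory, a check along these lines could help. It is a rough sketch, assuming Python 3 (`shutil.disk_usage` is not available in the Python 2.7 that ships with QIIME 1.9); the function name `check_temp_dir` is hypothetical, and it should be run as the same user that runs the Galaxy job.

```python
import os
import shutil
import tempfile


def check_temp_dir(path):
    """Report whether path exists, is writable, and how much space is free."""
    report = {"exists": os.path.isdir(path), "writable": False, "free_mb": None}
    if not report["exists"]:
        return report
    try:
        # Actually create and write a file, which is stricter than
        # inspecting permission bits alone.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"test")
        report["writable"] = True
    except OSError:
        pass
    report["free_mb"] = shutil.disk_usage(path).free // (1024 * 1024)
    return report


# Hypothetical usage, with the temp_dir value from print_qiime_config.py:
#   print(check_temp_dir("/tmp"))
```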

feng...@qq.com

Mar 23, 2016, 9:55:47 AM

to Qiime 1 Forum

Hi Jenya,

As you suggested, I ran RDP following the README file, like this:

java -Xmx1g -jar /usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.11/dist/classifier-2.11.jar classify -c 0.5 -o usga_classified.txt -h soil_hier.txt /usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.11/samplefiles/USGA_2_4_B_trimmed.fasta

There was no error, and I got the correct results.

In my case, the error always shows the usage message of ClassifierTraineeMaker. Do you know how it works in RDP, or whether I should change something in the ClassifierTraineeMaker command? I ask because when I run this:

java -Xmx4000M -cp "/usr/local/galaxy/galaxy/tools/IEG/external_tools/rdp_classifier_2.5/rdp_classifier-2.5.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_F7Lo5N.txt" -s /tmp/tmpaogprxD6nMsCedla5Gvd.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_fD8oex > "/tmp/tmpkB2JUbezVblCdlCwEoaL.txt" 2> "/tmp/tmpieq4YXoj9rCkHIm99FKy.txt"

the stderr file "/tmp/tmpieq4YXoj9rCkHIm99FKy.txt" also contains the error:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

.......

Even though uclust seems to run correctly, I would still like to use RDP for my taxonomy assignment, so thank you for any advice!

Thanks again!

Kai
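
One thing visible in the output above: the usage text lists six positional arguments, while the failing command passes flag-style options (-t, -s, -n, -v, -m, -o). Whether that mismatch is the actual cause is not established in this thread. As a diagnostic only, the sketch below assembles a command in the positional order that the usage text prints, so it can be tried by hand. The class name and the values 1, version1 and cogent are copied from the failing command; the function names and the file paths in the usage comment are hypothetical. It assumes Python 3 for `shlex.quote`.

```python
import shlex

# Class name as it appears in the failing command.
TRAINER_CLASS = "edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker"


def build_trainer_command(jar, tax_file, seqs_file, output_dir,
                          trainset_no=1, version="version1",
                          modification="cogent", max_memory_mb=4000):
    """Build the argument list in the order given by the printed usage:
    <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification>
    <output_directory>. This is an assumption based on that text only."""
    return [
        "java", "-Xmx%dM" % max_memory_mb, "-cp", jar, TRAINER_CLASS,
        tax_file, seqs_file, str(trainset_no), version, modification,
        output_dir,
    ]


def to_shell(args):
    """Join the arguments into a single shell-safe command line."""
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical usage (prints the command, does not run it):
#   print(to_shell(build_trainer_command(
#       "rdp_classifier-2.2.jar", "taxonomy.txt", "training_seqs.fasta",
#       "rdp_training_out")))
```

If the hand-built positional command trains successfully while the generated one does not, that would point at how the command is being assembled and not at the input files.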

feng...@qq.com

Mar 23, 2016, 10:27:37 AM

to Qiime 1 Forum

Hello,

I also followed Tony's suggestion and changed temp_dir to /tmp/tmp, but the error was still the same, so changing temp_dir does not seem to help.

The file "/tmp/tmpkB2JUbezVblCdlCwEoaL.txt" contains many sequences like these:

>4479946 Root;k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__MND1; f___qiime_unique_taxon_tag_2609; g___qiime_unique_taxon_tag_2610; s___qiime_unique_taxon_tag_2611

AGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACATATGCAAGTCGAGCGGCAGCGCGGGGGCAACCCTGGCGGCGAGCGGCGAACGGGTGAGTAATACATCGGAACGTGTCCAGTCGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATGAGATCGAGAGATGAAAGCAGGGGACCCAAGGGAAACCGAGGGCCTTGCGCGATTGGAGCGGCCGATGTCCGATTAGCTAGTTGGTGAGGTAAAAGCTCACCAAGGCGACGATCGGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGTGTGATGAAGGCCTTCGGGTTGTAAAGCACTTTCGGACGGAACGAAATCGCGCGGGCGAATATCCCGCGTGGATGACGGTACCGTAAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGGGTGCGCAGGCGGCTTCGCAAGTCAGGCGTGAAATCCCCGAGCTTAACTTGGGAATTGCGTTTGAAACTACGAGGCTGGAGTGTGGCAGAGGGAGGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCGATGGCGAAGGCAGCCTCCTGGGCCAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACTATGTCTACTAGTTGTTGGGGGAGTTAAATCCCTTAGTAACGCAGCTAACGCGAGAAGTAGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGATGATGTGGTTTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCCGGAATCCTGCAGAGATGCAGGAGTGCCCGAAAGGGAGCCGGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCACTAGTTGCTACATTCAGTTGAGCACTCTAATGGGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTACACACGTCATACAATGGCGCGTACAGAGGGTTGCCAACCCGCGAGGGGGAGCTAATCCCAGAAAGCGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCGGATCAGCATGTCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTTCACCAGAAGCAGGTCGCCTAACCGCAAGGGGGGCGCCTACCACGGTGAGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACC

>2970300 Root;k__Bacteria; p__OP11; c__WCHB1-64; o__d153; f___qiime_unique_taxon_tag_4042; g___qiime_unique_taxon_tag_4043; s___qiime_unique_taxon_tag_4044

GCTCAGGATGAACGCTGGCGGTGTGCCTTAGGCATGCAAGTCGAACGAAAAATGCTCTTCGGAGTATTTTAGTGGCAGACGGGTGAGTAACACGTAGGAATCTACCCAGTAGTGGGGAATAGCCTTCCGAAAGGAGGGATAATACCGCATGTGTCCGCAAGGACAAAGACTTCGGTCGCTATTGGAGGAGCCTGCGCGCTATCAGCTTGTTGGTGAGGTAAAGGCTCACCAAGGCAATGACGGCTAGCCGATGTGAGAGCATGATCGGCCACAAGGTTACTGAGACACGGAGCCTACACCTACGGGTGGCAGCAACCGGGAATATTGCGCAATGGACGAAAGTCTGACGCAGCGACACCGCGTGTGGGATGACGCTCTTCGGAGTGTAAACCACTGTGGCAGGGGATGAAATTTTGACAGTACCCTGCTAGAAAGCACCTGCTAACCACGTGCCAGAAGCCTCGGTAATACGTGGGGTGCAAGCGTTATCCGGATTTATTGGGCGTAAAGCGTACCGTCGGTTGTTTTTTAAGTTTCTTGTTTAAGCCTGAGGCCCAACTTCAGAGCAGCAAGAAAAACTGAAAAACTTGAGTTTGTTTGGGGCAGCTGGAATTCTCAGTGGAGGGGTGAAATCCGTAGATATTGAGAGGAACGCCAATCGCGTAGGCAGGCTGCTAAGACATAACTGACACTCATGGACGAAAGCTAGGGTAGCAAAAGGGATTAGAGACCCCTGTAGTCCTAGCCGTAAACACTCCTTGCTAGCTGTTTCCCCTTAGGGGGGAGTGGCGTAAGCTAACGCGTTAAGCAAGGCGCCTGGGGAGTACGACCGCAAGGTTAAAACTCAAAGGAATTGACGGGGGAGCGCACAACCGGTGGAGCATGTGGTTTAATTCGATACAAAGCGAAAAACCTTACCCAGGTTTGACATCCGCCTATGTCCTTTTCGAAAGAAAAGCACTTAGTGGCGTGACAGGTGATGCATGGCTGTCGTCAGCTCGTGTCGTGAGACGTCCTCTTAAGTGAGGTAACGAGCGCAACCCTTGTTCTGTGTTATACGTGTCACAGAAGACTGCCTCGATCATATCGGGGAGGAAGGAGAGGACGACGTCAAGTCAGCATGTCCCTTACACCTGGGGCTACACACATCCTACAATGGGACCGACAACAGGTAGCAAAGTCGTAAGGCTGAGCCAATCCCTAAACGGTCTCTCAGTCCGGATTGAGGGCTGCAATTCGCCCTCATGAAGTCGGAATCGGTAGTAATCGCAAATCAGCAGGTTGCGGTGAATACGTTCTCGCTCCTTGTACACACTGCCCGTCAAGCCAGCAAAGTCGGCAACGCCCGAAGCGGGTGACCGGAACCATTTATTTGGACCGGCCCTTCTACGGTGAGGTCGGCG

>684238 Root;k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g___qiime_unique_taxon_tag_201; s___qiime_unique_taxon_tag_202

AGTCGAACGGACTGATTCGCTTCGGTGATGAAAGTCAGTGGCGGACGGGTGAGTAACGCGTGAGTAACCTGCCTTTCAGAGGGGGATAACGTTTGGAAACGAACGCTAATACCGCATAACGTACAGTTGTCGCATGGCAGCAGTACCAAAGGAGCAATCCGCTGGAAGATGGACTCGCGTCTGATTAGATAGTTGGCGGGGTAACGGCCCACCAAGTCGACGATCAGTAGCCGGACTGAGAGGTTGAACGGCCACATTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGAGGGATATTGCACAATGGGCGCAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGTTTTCGGATTGTAAACCTCTGTTCTTAGTGATGATAATGACATTAGCTAAGGAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGAGCGCAGGCGGGAAAGCAAGTCAGTGGTGAAATGCATGGGCTTAACCCATGAACTGCCGTTGAAACTGTTTTTCTTGAGTGGAGTAGAGGCAAGCGGAATTCCGAGTGTAGCGGTGAAATGCGTAGATATTCGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGGCTCTAACTGATGCTGAGGCTCGAAAGTGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACACTGTAAACGATGATTACTAGGTGTGGGGGGTCTGACCCCTTTCGTGCCGCAGTTAACGCAATAAGTAATCCACCTGGGGGGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCAGTGGATTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCGAGTGACCGGCTAAGAGATTAGCCTTTCCTTCGGGACACGAAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCATTAGTTGCTACGCAAGGGCACTCTAATGGGACCGCTACCGACAAGGTGGAGGAAGGTGGGGACGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTAATACAATGGCCGTTAACAATGGGAAGCAATGCCGCGAGGCGGAGCAAACCCCCAAAAACGGTCTCAGTTCGGATCGCAGGCTGCAACTCGCCTGCGTGAAGTTGGAATTGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTTGTAACACCCGAAGT

I hope this also helps you understand my RDP setup.

I would also like to know how to run the RDP classifier outside of QIIME, as you mentioned, on my FASTA data with the Greengenes database.

Thanks!

Kai

TonyWalters

Mar 23, 2016, 11:58:34 AM

to Qiime 1 Forum

Please try again with a different directory, one that does not start with /tmp/

feng...@qq.com

Mar 23, 2016, 9:05:57 PM

to Qiime 1 Forum

Yes, Tony. I have tried changing temp_dir to other directories, such as /home/galaxy/temp/ or /temp/, but the problem remained.

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

burrito.util.ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/galaxy/database/job_working_directory/000/310/working/"; java -Xmx4000M -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/home/galaxy/temp/RdpTaxonomy_jjOjup.txt" -s /home/galaxy/temp/tmpayUZyH5Dwaj5eZZSJ0pQ.txt -n1 -vversion1 -mcogent -o /home/galaxy/temp/RdpTrainer_aCAKSY > "/home/galaxy/temp/tmpFhlTFDk9wubfNwWLmp3O.txt" 2> "/home/galaxy/temp/tmpgrIc0cqfVqq9Qxfubza2.txt"

Jenya Kopylov

Mar 24, 2016, 7:51:45 AM

to Qiime 1 Forum

Hi Kai,

ClassifierTraineeMaker creates training files (using 97_otus) to be used by RDP.

Could you try running the unit tests for RDP?

$ python /path/to/qiime/tests/test_assign_taxonomy.py RdpTaxonAssignerTests

Thanks!
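
Since ClassifierTraineeMaker reads a taxonomy file in the `taxid*taxon name*parent taxid*depth*rank` format described in its usage text, it may also be worth sanity-checking the generated RdpTaxonomy_*.txt file if it is still on disk. The following is a minimal sketch built only from that usage text; it assumes taxon names contain no `*` character, and the function names are hypothetical.

```python
def parse_tax_line(line):
    """Parse one 'taxid*taxon name*parent taxid*depth*rank' line."""
    fields = line.rstrip("\n").split("*")
    if len(fields) != 5:
        raise ValueError("expected 5 '*'-separated fields, got %d" % len(fields))
    taxid, name, parent, depth, rank = fields
    # Per the usage text, taxid, parent taxid and depth must be integers.
    return {
        "taxid": int(taxid),
        "name": name,
        "parent": int(parent),
        "depth": int(depth),
        "rank": rank,
    }


def find_bad_lines(lines):
    """Return (line_number, message) for every line that fails to parse."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            parse_tax_line(line)
        except ValueError as err:
            problems.append((number, str(err)))
    return problems


# Hypothetical usage:
#   with open("/tmp/RdpTaxonomy_XXXXXX.txt") as handle:
#       print(find_bad_lines(handle))
```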

feng...@qq.com

Mar 24, 2016, 8:14:59 AM

to Qiime 1 Forum

I looked for test_assign_taxonomy.py but could not find it, perhaps because this QIIME was installed automatically with the Bio-Linux system and does not include the test files. Could the Bio-Linux build of QIIME be the reason the test files are missing?

Jenya Kopylov

Mar 24, 2016, 8:32:14 AM

to Qiime 1 Forum

Can you try downloading https://github.com/biocore/qiime/blob/master/tests/test_assign_taxonomy.py directly and running:

python /path/to/qiime/tests/test_assign_taxonomy.py RdpTaxonAssignerTests

?

Jenya Kopylov

Mar 24, 2016, 8:32:58 AM

to Qiime 1 Forum

Sorry, just

$ python /path/to/test_assign_taxonomy.py RdpTaxonAssignerTests

feng...@qq.com

Mar 24, 2016, 8:52:35 AM

to Qiime 1 Forum

Yes, I downloaded it and ran test_assign_taxonomy.py RdpTaxonAssignerTests.

I got this:

root@dell-PowerEdge-R920:~# python test_assign_taxonomy.py RdpTaxonAssignerTests

..E...EEE

======================================================================

ERROR: test_call_with_properties_file (__main__.RdpTaxonAssignerTests)

RdpTaxonAssigner should return correct taxonomic assignment

----------------------------------------------------------------------

Traceback (most recent call last):

File "test_assign_taxonomy.py", line 1546, in test_call_with_properties_file

training_seqs_file, taxonomy_file, training_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 473, in train_rdp_classifier

return app(training_seqs_file)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 315, in __call__

remove_tmp=remove_tmp)

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/fengkai/"; java -Xmx1000m -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_PBySD0.txt" -s /tmp/tmpqSThGOwS2CMApKayw8ZA.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_Veokwe > "/tmp/tmp3aBkuhpWWrZFVP2IMdr7.txt" 2> "/tmp/tmpiN19vA4BJITHPmCUYEAz.txt"

StdOut:

StdErr:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

Command line arguments:

--tax_file contains the hierarchical taxonomy information in the following format:

taxid*taxon name*parent taxid*depth*rank

Fields taxid, the parent taxid and depth should be in integer format

depth indicates the depth from the root taxon.

Note: the depth for the root is 0

EX: 44*ROOT*1*0*domain

--rawseq.fa contains the raw training sequences in fasta format

The header of this fasta file starts with ">",

followed by the sequence name, white space(s)

and a list taxon names seperated by ';' with highest rank taxon first.

Ex: >seq1 ROOT;Ph1;Fam1;G1

Note: a sequence can only be assigned to the lowest rank taxon.

--trainsetNo is a integer. It's used to marked the training information.

--version indicates the version of the hierarchical taxonomy

Ex: Bacteria Nomenclature

--version_modification holds the modifcation information of the taxonomy if any

Ex: Acidobacterium Added

--output_directory specifies the output directory.

======================================================================

ERROR: test_taxa_with_special_characters (__main__.RdpTaxonAssignerTests)

Special characters in taxa do not cause RDP errors

----------------------------------------------------------------------

Traceback (most recent call last):

File "test_assign_taxonomy.py", line 1453, in test_taxa_with_special_characters

res = app(self.tmp_seq_filepath)

File "/usr/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 864, in __call__

max_memory=max_memory, tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 503, in train_rdp_classifier_and_assign_taxonomy

tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 473, in train_rdp_classifier

return app(training_seqs_file)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 315, in __call__

remove_tmp=remove_tmp)

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/fengkai/"; java -Xmx1000m -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_q6f6jq.txt" -s /tmp/tmpeQdyIwbSFDYccyG2xI95.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_0oF0SD > "/tmp/tmpfFSaR9TmyihmDVv1ckUg.txt" 2> "/tmp/tmpFJswIbTsuJYnywNsM17g.txt"

StdOut:

StdErr:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

Command line arguments:

--tax_file contains the hierarchical taxonomy information in the following format:

taxid*taxon name*parent taxid*depth*rank

Fields taxid, the parent taxid and depth should be in integer format

depth indicates the depth from the root taxon.

Note: the depth for the root is 0

EX: 44*ROOT*1*0*domain

--rawseq.fa contains the raw training sequences in fasta format

The header of this fasta file starts with ">",

followed by the sequence name, white space(s)

and a list taxon names seperated by ';' with highest rank taxon first.

Ex: >seq1 ROOT;Ph1;Fam1;G1

Note: a sequence can only be assigned to the lowest rank taxon.

--trainsetNo is a integer. It's used to marked the training information.

--version indicates the version of the hierarchical taxonomy

Ex: Bacteria Nomenclature

--version_modification holds the modifcation information of the taxonomy if any

Ex: Acidobacterium Added

--output_directory specifies the output directory.

======================================================================

ERROR: test_train_on_the_fly (__main__.RdpTaxonAssignerTests)

Training on-the-fly classifies reference sequence correctly with 100% certainty

----------------------------------------------------------------------

Traceback (most recent call last):

File "test_assign_taxonomy.py", line 1434, in test_train_on_the_fly

obs_assignments = app(self.tmp_seq_filepath)

File "/usr/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 864, in __call__

max_memory=max_memory, tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 503, in train_rdp_classifier_and_assign_taxonomy

tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 473, in train_rdp_classifier

return app(training_seqs_file)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 315, in __call__

remove_tmp=remove_tmp)

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/fengkai/"; java -Xmx1000m -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_xV5O30.txt" -s /tmp/tmpMab5Z2tl1B4fb80YTdQb.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_iwpXza > "/tmp/tmpjEzGSijbGUvRJo5EQsQ7.txt" 2> "/tmp/tmpxxi0I5TVtHEM7GjRcXkg.txt"

StdOut:

StdErr:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

Command line arguments:

--tax_file contains the hierarchical taxonomy information in the following format:

taxid*taxon name*parent taxid*depth*rank

Fields taxid, the parent taxid and depth should be in integer format

depth indicates the depth from the root taxon.

Note: the depth for the root is 0

EX: 44*ROOT*1*0*domain

--rawseq.fa contains the raw training sequences in fasta format

The header of this fasta file starts with ">",

followed by the sequence name, white space(s)

and a list taxon names seperated by ';' with highest rank taxon first.

Ex: >seq1 ROOT;Ph1;Fam1;G1

Note: a sequence can only be assigned to the lowest rank taxon.

--trainsetNo is a integer. It's used to marked the training information.

--version indicates the version of the hierarchical taxonomy

Ex: Bacteria Nomenclature

--version_modification holds the modifcation information of the taxonomy if any

Ex: Acidobacterium Added

--output_directory specifies the output directory.

======================================================================

ERROR: test_train_on_the_fly_low_memory (__main__.RdpTaxonAssignerTests)

Training on-the-fly with lower heap size classifies reference sequence correctly with 100% certainty

----------------------------------------------------------------------

Traceback (most recent call last):

File "test_assign_taxonomy.py", line 1476, in test_train_on_the_fly_low_memory

obs_assignments = app(self.tmp_seq_filepath)

File "/usr/lib/python2.7/dist-packages/qiime/assign_taxonomy.py", line 864, in __call__

max_memory=max_memory, tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 503, in train_rdp_classifier_and_assign_taxonomy

tmp_dir=tmp_dir)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 473, in train_rdp_classifier

return app(training_seqs_file)

File "/usr/lib/python2.7/dist-packages/bfillings/rdp_classifier.py", line 315, in __call__

remove_tmp=remove_tmp)

File "/usr/lib/python2.7/dist-packages/burrito/util.py", line 284, in __call__

'StdErr:\n%s\n' % open(errfile).read())

ApplicationError: Unacceptable application exit status: 255

Command:

cd "/home/fengkai/"; java -Xmx75M -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_aF1c_9.txt" -s /tmp/tmp0DUXwqv4ajnP87ZTree2.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_E7DoMj > "/tmp/tmp1zicGWHkR5TaEspVMFYu.txt" 2> "/tmp/tmp56dlhOlaarzX4waT1Ntv.txt"

StdOut:

StdErr:

Usage: java ClassifierTraineeMaker <tax_file> <rawseq.fa> <trainsetNo> <version> <version_modification> <output_directory>

This program will create 4 output training files to be used by the classifier:

bergeyTrainingTree.xml, genus_wordConditionalProbList.txt, logWordPrior.txt and wordConditionalProbIndexArr.txt

Command line arguments:

--tax_file contains the hierarchical taxonomy information in the following format:

taxid*taxon name*parent taxid*depth*rank

Fields taxid, the parent taxid and depth should be in integer format

depth indicates the depth from the root taxon.

Note: the depth for the root is 0

EX: 44*ROOT*1*0*domain

--rawseq.fa contains the raw training sequences in fasta format

The header of this fasta file starts with ">",

followed by the sequence name, white space(s)

and a list taxon names seperated by ';' with highest rank taxon first.

Ex: >seq1 ROOT;Ph1;Fam1;G1

Note: a sequence can only be assigned to the lowest rank taxon.

--trainsetNo is a integer. It's used to marked the training information.

--version indicates the version of the hierarchical taxonomy

Ex: Bacteria Nomenclature

--version_modification holds the modifcation information of the taxonomy if any

Ex: Acidobacterium Added

--output_directory specifies the output directory.

----------------------------------------------------------------------

Ran 9 tests in 15.979s

FAILED (errors=4)

Jenya Kopylov

Mar 24, 2016, 10:07:49 AM

to Qiime 1 Forum

While we try to get RDP working via QIIME, you could also run examples f) and g) from the RDP GitHub README to assign taxonomy manually.

If you choose to do that, you can then add the new taxonomy to your BIOM table using the following command:

$ biom add-metadata -i otutable.biom -o otutable_wtax.biom --observation-metadata-fp /path/to/rdp_assigned_taxonomy/seqs_rep_set_tax_assignments.txt --observation-header OTUID,taxonomy,confidence --sc-separated taxonomy
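As a rough illustration of what the --observation-header and --sc-separated options above describe, here is a minimal Python sketch that reads one line of an assignments file. It assumes the file is tab-delimited with three columns (OTU ID, taxonomy, confidence) and that taxonomy levels are separated by semicolons. The function name and the example line are made up for illustration and are not taken from a real run.

```python
def parse_assignment_line(line):
    """Split one assignments line into (otu_id, taxonomy_levels, confidence).

    Assumes three tab-separated columns: OTU ID, taxonomy, confidence.
    """
    otu_id, taxonomy, confidence = line.rstrip("\n").split("\t")[:3]
    # Split the semicolon-separated taxonomy string into a list of levels.
    levels = [level.strip() for level in taxonomy.split(";")]
    return otu_id, levels, float(confidence)


# Hypothetical example line, not real output.
example = "OTU_1\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.98\n"
print(parse_assignment_line(example))
```

If a line does not split into three columns this way, the header given to biom probably does not match the file.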

feng...@qq.com

Mar 24, 2016, 10:26:16 AM

to Qiime 1 Forum

Yes, I will try this manually. Please also let me know if you have more suggestions or a solution for my problem with the RDP Classifier via QIIME. Thanks!

I will let you know the result of running examples f) and g) from the RDP GitHub README.

Kai

Kyle Bittinger

Mar 24, 2016, 2:16:54 PM

to Qiime 1 Forum

Kai,

Hi, this is Kyle. I wrote a lot of this RDP code, and I'm going to try to help out here.

The command issued to the RDP Classifier for training shouldn't have some of those flags in it. For example, the last command you pasted should read:

cd "/home/fengkai/"; java -Xmx75M -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker "/tmp/RdpTaxonomy_aF1c_9.txt" /tmp/tmp0DUXwqv4ajnP87ZTree2.txt 1 version1 cogent /tmp/RdpTrainer_E7DoMj > "/tmp/tmp1zicGWHkR5TaEspVMFYu.txt" 2> "/tmp/tmp56dlhOlaarzX4waT1Ntv.txt"

Instead of:

cd "/home/fengkai/"; java -Xmx75M -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker -t"/tmp/RdpTaxonomy_aF1c_9.txt" -s /tmp/tmp0DUXwqv4ajnP87ZTree2.txt -n1 -vversion1 -mcogent -o /tmp/RdpTrainer_E7DoMj > "/tmp/tmp1zicGWHkR5TaEspVMFYu.txt" 2> "/tmp/tmp56dlhOlaarzX4waT1Ntv.txt"

Notice that the "-t", "-s", "-n", "-v", "-m", and "-o" parts are gone. This is why your RDP Classifier is exiting with a message about usage -- the command is wrong!
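To make the difference between the two commands concrete, here is a small Python sketch that turns the flagged trainer arguments into the positional form shown above. The helper name strip_short_flags is made up for this illustration and is not part of QIIME, burrito, or burrito-fillings. It should only be applied to the arguments that follow the ClassifierTraineeMaker class name, not to the JVM options such as -cp.

```python
import re


def strip_short_flags(args):
    """Return args with single-letter flags such as -t or -s removed.

    Handles the attached form (-n1) and the separated form (-s /path).
    """
    positional = []
    for arg in args:
        match = re.match(r"^-([a-z])(.*)$", arg)
        if match is None:
            positional.append(arg)  # already a plain positional value
        elif match.group(2):
            positional.append(match.group(2))  # attached value: -n1 -> 1
        # A bare flag such as "-s" is dropped; its value is the next item.
    return positional


flagged = [
    "-t/tmp/RdpTaxonomy_aF1c_9.txt",
    "-s", "/tmp/tmp0DUXwqv4ajnP87ZTree2.txt",
    "-n1", "-vversion1", "-mcogent",
    "-o", "/tmp/RdpTrainer_E7DoMj",
]
print(" ".join(strip_short_flags(flagged)))
```

The printed arguments line up with the six values in the usage message: tax_file, rawseq.fa, trainsetNo, version, version_modification, and output_directory.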

In QIIME, the actual call to the RDP Classifier program is made from another Python package, called burrito-fillings. Each external application called in QIIME has a corresponding class in the burrito-fillings package. The basic logic for calling applications in burrito-fillings is provided by yet another Python package, burrito.

I can't find any version of burrito or burrito-fillings that adds those flags. I just downloaded burrito-fillings from github, set up an isolated environment, ran it manually, and verified that it did not add the flags. Are you working with a modified version of the QIIME package, the burrito package, or the burrito-fillings package?
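One quick way to see which copies of these packages Python is actually loading is to print where each module was imported from. This is only a sketch: the module names qiime, burrito, and bfillings are taken from the paths in the traceback, and a __version__ attribute may not be defined by every package, so the helper falls back to None.

```python
import importlib


def module_origin(name):
    """Return (file path, version or None) for an importable module."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)
    version = getattr(module, "__version__", None)  # may be absent
    return path, version


for name in ("qiime", "burrito", "bfillings"):
    try:
        path, version = module_origin(name)
        print(name, path, version)
    except ImportError:
        print(name, "is not importable in this environment")
```

A path under /usr/lib/python2.7/dist-packages would indicate the copy installed by the system packages rather than one installed from GitHub.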

Best,

Kyle

feng...@qq.com

Mar 24, 2016, 9:17:37 PM

to Qiime 1 Forum

Hi Kyle,

I don't know which versions of the QIIME, burrito, and burrito-fillings packages I have, because they were installed as part of the Bio-Linux system, which was set up just two weeks ago. To help you identify the versions, I have uploaded the bfillings and burrito directories from /usr/lib/python2.7/dist-packages/bfillings and /usr/lib/python2.7/dist-packages/burrito.

BTW, can I just run the command directly, as you revised it?

java -Xmx75M -cp "/usr/local/galaxy/galaxy/tools/IEG/RDP/rdp_classifier_2.2/rdp_classifier-2.2.jar" edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker "/tmp/RdpTaxonomy_aF1c_9.txt" /tmp/tmp0DUXwqv4ajnP87ZTree2.txt 1 version1 cogent /tmp/RdpTrainer_E7DoMj > "/tmp/tmp1zicGWHkR5TaEspVMFYu.txt" 2> "/tmp/tmp56dlhOlaarzX4waT1Ntv.txt"

After I ran this directly, here is the content of the stderr file, "/tmp/tmp56dlhOlaarzX4waT1Ntv.txt":

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Authors's mailng address:

Center for Microbial Ecology

2225A Biomedical Physical Science

Michigan State University

East Lansing, Michigan USA 48824-4320

E-mail: James R. Cole at co...@msu.edu

Qiong Wang at wang...@msu.edu

James M. Tiedje at tie...@msu.edu

Exception in thread "main" java.io.FileNotFoundException: /tmp/RdpTaxonomy_aF1c_9.txt (No such file or directory)

at java.io.FileInputStream.open(Native Method)

at java.io.FileInputStream.<init>(FileInputStream.java:146)

at java.io.FileInputStream.<init>(FileInputStream.java:101)

at java.io.FileReader.<init>(FileReader.java:58)

at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:38)

at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:133)

Does this look normal or not?

Thanks for your help!

Kai

bfillings.rar

burrito.rar

Kyle Bittinger

Mar 25, 2016, 2:07:16 PM

to Qiime 1 Forum

Kai, thanks for the files -- I'm looking them over.

The output of your command is not what is expected from a successful run. QIIME re-formats your input files according to the requirements of the RDP Classifier software and stores them in a temporary location. Those temporary files had already been cleaned up by the time you ran the command manually, so it failed with the FileNotFoundException.
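Before re-running a pasted command by hand, it can help to confirm that its input files still exist. The sketch below is a generic check, not part of QIIME; the helper name missing_inputs is made up, and the two paths are the input files named in the manually run command.

```python
import os


def missing_inputs(paths):
    """Return the subset of paths that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]


# The taxonomy and sequence files passed to ClassifierTraineeMaker.
inputs = [
    "/tmp/RdpTaxonomy_aF1c_9.txt",
    "/tmp/tmp0DUXwqv4ajnP87ZTree2.txt",
]
for path in missing_inputs(inputs):
    print("missing input:", path)
```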

I'll get back to you soon.

Best,

Kyle

Kyle Bittinger

Mar 25, 2016, 2:25:25 PM

to Qiime 1 Forum

It looks like the bfillings package was modified by the BioLinux people. I've attached the differences between your rdp_classifier.py file and the one in our github repository.

Your assignments might work if you unset the RDP_JAR_PATH variable. It seems like BioLinux added the patch a while back to support version 2.5 of the classifier. https://bugs.launchpad.net/bio-linux/+bug/1076024
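If you want to try a run with RDP_JAR_PATH unset without changing your shell configuration, one option is to launch the command from Python with that variable removed from the child environment. This is a sketch under that assumption only. The helper names are made up, and whether the patched code then behaves correctly is untested here.

```python
import os
import subprocess
import sys


def env_without(var_name, base_env=None):
    """Return a copy of the environment with var_name removed if present."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop(var_name, None)
    return env


def run_without_var(cmd, var_name="RDP_JAR_PATH"):
    """Run cmd (a list of arguments) with var_name unset; return exit status."""
    return subprocess.call(cmd, env=env_without(var_name))


# Harmless demonstration: the child exits with 0 only if the variable is absent.
check = [sys.executable, "-c",
         "import os, sys; sys.exit(int('RDP_JAR_PATH' in os.environ))"]
print(run_without_var(check))
```

To apply this to the real job, pass the assign_taxonomy.py command from the first post to run_without_var as a list of arguments.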

Otherwise, you will need to install burrito-fillings from github to get our un-modified code.

Best,

Kyle

bfillings-biolinux.diff

feng...@qq.com

Mar 26, 2016, 3:37:42 AM

to Qiime 1 Forum

Kyle, thanks a lot.

I replaced Bio-Linux's bfillings with your unmodified code, and the problem was solved. I now get correct taxonomy results from assign_taxonomy.py.

You are so nice and brilliant.

Also thanks to Jenya and Tony.

Thanks for all of your help.
