Taxonomic assignment of OTU file with PR2

Ulrike Obertegger

Mar 9, 2017, 6:22:38 AM
to micca users
Dear MICCA users,

I would like to classify my OTU table with PR2.
The commands run, but the output I get is nonsense: every OTU comes back as "unclassified".

details:
download of PR2 from

command lines
pr2_ref=PR2_database/pr2_gb203_version_4.5.fasta
pr2_tax=PR2_database/pr2_gb203_version_4.5.taxo

micca classify -m cons --cons-threads 10 --cons-maxhits 5 -i denovo_greedy_otus/otus.fasta -o SILVA123_classify/taxa_pr2.txt --ref $pr2_ref --ref-tax $pr2_tax

extract of my OTU table attached

new command with the example file
micca classify -m cons --cons-threads 10 --cons-maxhits 5 -i denovo_greedy_otus/OTU_table_extract.fasta -o SILVA123_classify/taxa_pr2.txt --ref $pr2_ref --ref-tax $pr2_tax

Perhaps somebody out there can help?!

best regards,
Ulrike

OTU_table_extract.fasta

Davide Albanese

Mar 9, 2017, 7:24:06 AM
to micca users
Dear Ulrike,
the database seems to be compatible with micca (QIIME-formatted). Could you please report the output of

head PR2_database/pr2_gb203_version_4.5.taxo

?

Ulrike Obertegger

Mar 10, 2017, 1:58:20 AM
to micca users
Dear Davide,
with the stated commands, MICCA gave an output (see attached file) but the result is unsatisfactory.
best regards,
Ulrike
taxa_pr2.txt

Davide Albanese

Mar 10, 2017, 4:43:44 AM
to micca users
Dear Ulrike

could you please copy and paste the output of

head PR2_database/pr2_gb203_version_4.5.taxo

?

Ulrike Obertegger

Mar 10, 2017, 4:59:00 AM
to micca users

head PR2_database/pr2_gb203_version_4.5.taxo

gives:

GU824834.1.1056_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
FJ911926.1.909_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU823937.1.1056_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU824317.1.1056_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU823858.1.1056_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU824316.1.1370_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU824094.1.1180_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU824585.1.1181_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
GU824593.1.1180_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;
DQ103802.1.1629_U Eukaryota;Alveolata;Alveolata_X;Alveolata_XX;Alveolata_XXX;Alveolata_XXXX;Alveolata_XXXXX;Alveolata_XXXXX_sp.;

Davide Albanese

Mar 13, 2017, 4:17:42 AM
to micca users
The taxa file seems compatible with micca. Please try to lower the sequence identity threshold of the consensus classifier, e.g. adding the parameter "--cons-id 0.7".
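
For anyone who wants to check more than the first ten lines, here is a minimal sketch of a sanity check on the reference pair. It assumes the QIIME-style layout (one sequence ID, a tab, then a semicolon-separated lineage per line) and that FASTA IDs are the header text up to the first blank. The function names `bad_tax_lines` and `missing_tax_ids` are made up for this example; they are not part of micca.

```shell
# Count non-empty taxonomy lines that do not have exactly two
# tab-separated fields (sequence ID, lineage).
bad_tax_lines() {
    awk -F '\t' '$0 != "" && NF != 2 { n++ } END { print n + 0 }' "$1"
}

# Print the IDs of FASTA records that have no entry in the taxonomy file.
# Usage: missing_tax_ids REF.fasta REF.taxo
missing_tax_ids() {
    awk -F '\t' '
        FNR == NR { seen[$1] = 1; next }      # first file: remember taxonomy IDs
        /^>/ {                                # second file: FASTA header lines
            split(substr($0, 2), h, /[ \t]/)  # ID = header text up to first blank
            if (!(h[1] in seen)) print h[1]
        }
    ' "$2" "$1"
}
```

Possible usage with the variables from this thread: `bad_tax_lines "$pr2_tax"` should print 0, and `missing_tax_ids "$pr2_ref" "$pr2_tax" | head` should print nothing if every reference sequence has a lineage.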

Ulrike Obertegger

Mar 13, 2017, 4:30:20 AM
to micca users
Dear Micca users,

Lowering the sequence identity threshold of the consensus classifier to 0.7 does not help: again, all OTUs are classified as "Unclassified".
Command used:
micca classify -m cons --cons-threads 10 --cons-maxhits 5 --cons-id 0.7 -i denovo_greedy_otus/otus.fasta -o SILVA123_classify/taxa_pr2.txt --ref $pr2_ref --ref-tax $pr2_tax

best regards,
Ulrike

Ulrike Obertegger

Mar 23, 2017, 4:48:48 AM
to micca users
Hi Micca users,
it is me again with my problem that MICCA does not want to use the PR2 database.
I lowered the identity threshold to 50% and got the following error message:

root@757719c3d2e0:/micca/NGS.Illumina.16S# pr2_ref=PR2_database/pr2_gb203_version_4.5.fasta
root@757719c3d2e0:/micca/NGS.Illumina.16S# pr2_tax=PR2_database/pr2_gb203_version_4.5.taxo
root@757719c3d2e0:/micca/NGS.Illumina.16S# micca classify -m cons --cons-threads 10 --cons-maxhits 5 --cons-id 0.5 -i denovo_greedy_otus/otus.fasta -o SILVA123_classify/taxa_pr2.txt --ref $pr2_ref --ref-tax $pr2_tax
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?


Then I raised the threshold back to 70% and MICCA did not complain, BUT the output is still unsatisfactory: all OTUs are unclassified.


root@757719c3d2e0:/micca/NGS.Illumina.16S# # 70% identity matching
root@757719c3d2e0:/micca/NGS.Illumina.16S# micca classify -m cons --cons-threads 10 --cons-maxhits 5 --cons-id 0.7 -i denovo_greedy_otus/otus.fasta -o SILVA123_classify/taxa_pr2.txt --ref $pr2_ref --ref-tax $pr2_tax
root@757719c3d2e0:/micca/NGS.Illumina.16S#


Can anybody help?

Ulrike 

Massimo Pindo

Mar 23, 2017, 5:31:26 AM
to micca users
Dear Uli,
regarding the error "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?": I know this has been fixed in the new MICCA version 1.6.1. Otherwise, try converting your files with the command "dos2unix filename.txt".
best
m.
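
Before converting, it can help to see which line endings a file really uses. The sketch below is only a rough check, assuming the error comes from carriage returns in one of the input files (that is a guess, not something confirmed in this thread). The function name `line_ending_style` is made up for the example.

```shell
# Classify a file by the line-ending bytes it contains.
# Prints "LF" (no carriage returns), "CR" (carriage returns but no
# line feeds, old Mac style) or "CRLF or mixed" (both present).
line_ending_style() {
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')   # number of CR bytes
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')   # number of LF bytes
    if [ "$cr" -eq 0 ]; then
        echo "LF"
    elif [ "$lf" -eq 0 ]; then
        echo "CR"
    else
        echo "CRLF or mixed"
    fi
}
```

If a file reports "CR", note that dos2unix in its default mode only rewrites CRLF; bare CR line endings need "mac2unix filename.txt" (shipped with dos2unix) instead.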

Ulrike Obertegger

Mar 24, 2017, 4:55:05 AM
to micca users
Hi Massimo,
dos2unix does not help either ;-(
thanks for the idea,
Ulrike

Davide Albanese

Mar 31, 2017, 5:05:56 AM
to micca...@googlegroups.com
The "new-line error" is definitely fixed in version 1.6.2.

Davide Albanese

Mar 31, 2017, 5:14:29 AM
to micca users
SOLVED: the PR2 database contains protists only!

Ciao,
Davide
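
A quick way to see what a reference covers before classifying against it is to tally the first rank of every lineage in the taxonomy file. This is a minimal sketch, assuming a tab between the sequence ID and the semicolon-separated lineage; `top_level_taxa` is a made-up helper name, not a micca command. If the OTUs come from a 16S run (the working directory above is named NGS.Illumina.16S, so that is presumably the case here), a reference that lists only Eukaryota would explain why nothing gets classified.

```shell
# Tally the top-level rank (first ";"-separated field of the lineage)
# across a taxonomy file and print "count<TAB>name", sorted by name.
top_level_taxa() {
    awk -F '\t' '
        $0 != "" { split($2, rank, ";"); count[rank[1]]++ }
        END { for (name in count) print count[name] "\t" name }
    ' "$1" | sort -k2
}
```

Possible usage: `top_level_taxa "$pr2_tax"`.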