We are currently working on getting the UNITE ITS database into a format
compatible with QIIME. We've been getting a lot of unclassified
sequences when using the RDP classifier, but far fewer when using BLAST,
so we are looking into this issue before we post the files publicly. The
files should be available shortly.
Thanks,
Jai
Jai's files will be going up on the 'Data Files' section of the QIIME
website (http://qiime.org/home_static/dataFiles.html) when they're
ready.
Greg
Sarah
On Nov 15, 9:02 am, Greg Caporaso <gregcapor...@gmail.com> wrote:
> Thanks Jai!
>
> Jai's files will be going up on the 'Data Files' section of the QIIME
> website (http://qiime.org/home_static/dataFiles.html) when they're
> ready.
>
> Greg
>
> On Tue, Nov 15, 2011 at 8:55 AM, Jai Ram Rideout <jai.ride...@gmail.com> wrote:
>
> > Hi Xueju and Chris,
>
> > We are currently working on getting the UNITE ITS database into a format
> > compatible with QIIME. We've been getting a lot of unclassified sequences
> > when using the RDP classifier, but much less when using BLAST, so we are
> > looking into this issue before we post the files publicly. The files should
> > be available shortly.
>
> > Thanks,
> > Jai
>
> > On 11/15/2011 06:11 AM, Chris B wrote:
>
> >> Hi Xueju,
>
> >> I have a python script that goes through the ncbi's nt database,
> >> generates the taxonomy mapping file that qiime requires to use blast,
> >> and generates a list of accession numbers that you can use as an input
> >> to the ncbi's alias tool to make a custom database. I use this to
> >> restrict the database to sequences labelled as fungi. It does not
> >> specifically pull out ITS-labelled sequences, but I find it works OK
> >> to blast ITS against the resulting set anyhow. If you did want to
> >> restrict the database to ITS, I think you could probably do that as a
> >> preliminary step.
>
> >> An alternative would be to use the Unite database, and maybe this is
> >> what you are referring to? Here it's a matter of going through the
> >> fasta file that can be downloaded from their site and generating the
> >> qiime taxonomy mapping file from the sequence identifier lines in that
> >> file, which can also be done with python. Then it's just a matter of
> >> turning the fasta file into a blast-formatted database using the
> >> ncbi's tool.
>
> >> If you drop me a line at cba...@oeb.harvard.edu I'd be more than happy
> >> to share code.
>
> >> Cheers,
> >> Chris
>
> >> On Nov 14, 1:00 pm, Sam<xueju...@gmail.com> wrote:
>
> >>> Hello,
> >>> I wonder if there is a qiime-compatible database for fungal ITS
> >>> sequences or anybody would like to share it, so that it can be used
> >>> for taxa assignment by blast?
>
> >>> There is a Blast database available for fungal ITS, but do not know
If you have a reference collection that achieves better results we'd
definitely be interested in trying it out. Is this something you're
able to share with us?
Greg
I'm keeping an eye on this thread because I'm just starting to move
into fungal sequences (I still have to work out primers that work with
the sequencing), and I noticed the talk of trying to find a decent
database. Through one of the papers I have read I found this database:
http://www.emerencia.org/fungalitspipeline.html (click "download"
for the actual fasta file). Its taxonomy is not laid out in the form
the RDP classifier needs, but it has a lot of ITS1 sequences in it
(including the strains I was looking at). I have no idea how to turn
this into a usable database for QIIME (I'm watching in case I can
learn), but it might be of use to those who can.
Nigel
For example, we have one fungal ITS sequence which is assigned as
Root;Fungi;Unknown;Unknown;Unknown;Unknown7637 by BLAST in QIIME.
If we use blastn on the UNITE website, we get better phylogenetic
resolution:
Sequences producing significant alignments:
Accession  Description                 Score (bits)  E value
EF031111   fungal sp W303              1043          0.0
HQ445982   uncultured fungus           1035          0.0
EU725701   fungal sp Q1                1021          0.0
EU725700   fungal sp M23               1021          0.0
EU725698   fungal sp M21               1021          0.0
EU725696   fungal sp M18               1021          0.0
EU725675   fungal sp F13               1021          0.0
EU725674   fungal sp F12               1021          0.0
EF126341   fungal sp WD34A             1021          0.0
EU240135   fungal sp WD32A             1011          0.0
EF434139   uncultured fungus           1011          0.0
HM589305   Mucoromycotina sp BEA_2010  1001          0.0
AY969842   uncultured fungus           1001          0.0
FJ553914   uncultured Mortierella      999           0.0
EU240133   Mortierella sp WD2G         997           0.0
EU240132   fungal sp WD2F              997           0.0
EU240130   Mortierella sp WD25F        997           0.0
If we remove the "fungal sp." and "uncultured fungus" entries near the
top, BLAST in QIIME will assign my sequence to Mucoromycotina or
Mortierella sp., so that we get resolution to phylum level or finer.
Xueju
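In case it helps anyone trying the same thing, the filtering step described above could be sketched roughly as follows. This is only an illustration, not a tested QIIME workflow: the function name `filter_reference` and the list of "uninformative" terms are assumptions, the accessions and labels are copied from the hit list above, and the sequences are placeholders rather than real data.

```python
def filter_reference(tax_map, sequences,
                     uninformative=("fungal sp", "uncultured fungus")):
    """Drop reference entries whose label carries no useful taxonomy.

    tax_map:   dict of sequence ID -> taxonomy label
    sequences: dict of sequence ID -> sequence string
    Returns filtered copies of both dicts, restricted to the same IDs.
    """
    keep = {
        seq_id
        for seq_id, label in tax_map.items()
        if not any(term in label.lower() for term in uninformative)
    }
    kept_tax = {sid: lab for sid, lab in tax_map.items() if sid in keep}
    kept_seqs = {sid: seq for sid, seq in sequences.items() if sid in keep}
    return kept_tax, kept_seqs


# Labels taken from the BLAST hits above; sequences are placeholders.
example_tax = {
    "EF031111": "fungal sp W303",
    "HQ445982": "uncultured fungus",
    "HM589305": "Mucoromycotina sp BEA_2010",
    "FJ553914": "uncultured Mortierella",
}
example_seqs = {seq_id: "ACGT" for seq_id in example_tax}

kept_tax, kept_seqs = filter_reference(example_tax, example_seqs)
for seq_id in sorted(kept_tax):
    print(seq_id, kept_tax[seq_id])
```

Note that "uncultured Mortierella" is kept because only the exact phrases in `uninformative` are matched; the list would need tuning for a real reference set.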
On Dec 5, 9:51 am, Sam <xueju...@gmail.com> wrote:
> Hi Sarah and Greg,
> I had the same problem as described by Sarah.
> To reduce the % of unknown fungal in the assignment, one strategy is
> to remove all unknown fungal sequences in the unite_ref_seqs file.
>
> For example, we have one fungal ITS sequence which is assigned as
> > There are no great ITS reference collections that we're aware of. We
> > put this one together for one specific study, but from what I
> > understand the UNITE database is very ectomycorrhizal biased as that
> > was the original purpose of the database. We mention the issue with
> > many unclassified sequences in the README.txt associated with the
> > unite_taxonomy_21nov2011.zip file.
>
> > If you have a reference collection that achieves better results we'd
> > definitely be interested in trying it out. Is this something you're
> > able to share with us?
>
> > Greg
>
> > On Sat, Dec 3, 2011 at 6:36 PM, garlicscape <garlicsc...@gmail.com> wrote:
> > > I just tried using this ITS database, and it is resulting in a very
> > > large number of unknowns, even at phylum level. Any idea why?
> > > Before I was using a database I made on my own. I downloaded AFTOL
I am happy to share my database, but I feel it is not very
professionally made. Basically I went to NCBI and searched for:
fungi[orgn] NOT (unknown OR uncultured) AND internal transcribed
spacer [title] AND AFTOL [word]. I downloaded these as an INSDSeq XML
file. A year ago this resulted in ~600 sequences (which I hope are
representative of the fungal kingdom). Then I managed to open it in
Excel, but had to do some lengthy editing because I couldn't figure
out how to convert .xml to .xls. It wasn't TOO bad because there was
some consistency between entries, but the most time-consuming part was
going through each line one by one to make sure only 6 taxa levels
were listed, and that they were the correct levels (i.e. no sub-levels).
I could not figure out what file formats were required for RDP
analysis, and it seemed even more complicated, so I made the two files
required for the BLAST method in assign taxonomy. It seems like that's
what should be used anyway, since ITS is hyper-variable and cannot be
aligned and therefore arguably should not be used to develop
phylogenetic trees.
I'm happy to share the files, if anyone still wants them after reading
about these clumsy methods, but I don't know how to attach on this
forum :)
Sarah
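The line-by-line check for six taxa levels described above is the kind of thing a short script can do. Here is a minimal sketch, assuming the taxonomy file is tab-separated (sequence ID, then a semicolon-separated lineage); the function name `check_lineages` and the example rows are made up for illustration, with placeholder level names instead of real taxa.

```python
EXPECTED_LEVELS = 6  # the number of taxa levels mentioned above


def check_lineages(lines, expected=EXPECTED_LEVELS):
    """Return (line_number, seq_id, n_levels) for every row whose
    lineage does not have exactly `expected` levels."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        seq_id, _, lineage = line.partition("\t")
        # Count only non-empty levels, so a trailing ";" is not counted.
        levels = [lvl for lvl in lineage.split(";") if lvl.strip()]
        if len(levels) != expected:
            problems.append((number, seq_id, len(levels)))
    return problems


# Placeholder rows: seq1 is fine, seq2 has an extra level, seq3 is short.
example_rows = [
    "seq1\tL1;L2;L3;L4;L5;L6",
    "seq2\tL1;L2;L3;L4;L5;L6;L7",
    "seq3\tL1;L2;L3",
]

problems = check_lineages(example_rows)
for number, seq_id, n_levels in problems:
    print(f"line {number}: {seq_id} has {n_levels} levels")
```

This only counts levels; it cannot tell whether a level is the correct rank (for example, a sub-level slipped in where a class should be), so that part would still need a manual look.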
Hi Blanca,
Of course I can share it. I formatted it myself to be used with QIIME, so I hope I did it OK. It worked for me at least, but to be honest it is not deeply tested. I recommend validating the results from this database against those obtained with UNITE, or by BLAST against NCBI, for at least some sequences.
I couldn't attach the DB (only 6 MB compressed) to this message, but if you give me your email I can send it to you.
Good Luck,
Nicolas.
2012/3/29 Blanca B. Landa <ag2l...@uco.es>
Maybe just a really stupid question but to be sure I understand it correctly:
by the sequence ID you mean what is written after ">" in the fasta file which I will use as a ref seqs database file, right?
Hi Xueju,
I have a python script that goes through the ncbi's nt database,
generates the taxonomy mapping file that qiime requires to use blast,
and generates a list of accession numbers that you can use as an input
to the ncbi's alias tool to make a custom database. I use this to
restrict the database to sequences labelled as fungi. It does not
specifically pull out ITS-labelled sequences, but I find it works OK
to blast ITS against the resulting set anyhow. If you did want to
restrict the database to ITS, I think you could probably do that as a
preliminary step.
An alternative would be to use the Unite database, and maybe this is
what you are referring to? Here it's a matter of going through the
fasta file that can be downloaded from their site and generating the
qiime taxonomy mapping file from the sequence identifier lines in that
file, which can also be done with python. Then it's just a matter of
turning the fasta file into a blast-formatted database using the
ncbi's tool.
If you drop me a line at cba...@oeb.harvard.edu I'd be more than happy
to share code.
Cheers,
Chris
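For readers who want a starting point, the second approach above (building the QIIME taxonomy mapping from the identifier lines of a reference fasta file) might look something like the sketch below. This is not Chris's script. It assumes a hypothetical header layout of `>ID|lineage` with semicolon-separated levels, which may not match the real UNITE headers, so the splitting would need adjusting. The function names and the example records are invented for illustration. The output is one tab-separated "ID, lineage" row per sequence.

```python
import io


def build_taxonomy_map(fasta_lines):
    """Collect (seq_id, lineage) pairs from FASTA header lines.

    Assumes a hypothetical header layout: >SEQ_ID|level1;level2;...
    """
    pairs = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no taxonomy
        header = line[1:].strip()
        seq_id, _, lineage = header.partition("|")
        if not lineage.strip():
            lineage = "Unknown"  # placeholder when the header has no lineage
        pairs.append((seq_id.strip(), lineage.strip()))
    return pairs


def write_taxonomy_map(pairs, out_handle):
    """Write one tab-separated 'ID<TAB>lineage' row per sequence."""
    for seq_id, lineage in pairs:
        out_handle.write(f"{seq_id}\t{lineage}\n")


# Invented records with placeholder taxon names and sequences.
example_fasta = [
    ">seq1|Fungi;PhylumA;ClassA\n",
    "ACGTACGT\n",
    ">seq2\n",
    "TTGACCAA\n",
]

pairs = build_taxonomy_map(example_fasta)
buffer = io.StringIO()
write_taxonomy_map(pairs, buffer)
print(buffer.getvalue(), end="")
```

Turning the fasta file itself into a BLAST-formatted database is a separate step done with NCBI's own tools, as described above, and is not covered by this sketch.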
--
---
You received this message because you are subscribed to a topic in the Google Groups "Qiime Forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/qiime-forum/v57AXiNsm9c/unsubscribe?hl=en-US.
To unsubscribe from this group and all its topics, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.