Convert RDP database (FASTA file, about 3,000,000 items) to QIIME database format


Joo Wook Kim

Jun 20, 2017, 8:52:43 PM
to qiime...@googlegroups.com
Hello,

I am doing metagenomics data analysis with QIIME and would like to use the RDP database (https://rdp.cme.msu.edu/misc/resources.jsp) as the 16S reference database.

I found this post, so I used cd-hit-est to cluster the sequences at 97% similarity and extracted the taxonomy information (7 levels).
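For readers following along, the clustering step might be driven from Python roughly as below. This is only a sketch: the helper names and the input file name `rdp_unaligned.fa` are hypothetical, and the exact options used in the original run are not stated in this thread. The flags shown (`-i`, `-o`, `-c`, `-n`, `-M`, `-T`) are standard cd-hit-est options; a word size of 10 is assumed here because it is within the range recommended for identity thresholds of 0.95 and above.

```python
import subprocess


def cd_hit_est_command(in_fasta, out_fasta, identity=0.97, word_size=10, threads=1):
    """Build the cd-hit-est argument list for clustering at the given identity.

    Assumption: word_size=10 suits thresholds of 0.95-1.0; lower thresholds
    need a smaller word size (see the cd-hit documentation).
    """
    return [
        "cd-hit-est",
        "-i", in_fasta,        # input FASTA
        "-o", out_fasta,       # representative sequences (a .clstr file is written alongside)
        "-c", str(identity),   # sequence identity threshold
        "-n", str(word_size),  # word length
        "-M", "0",             # no memory limit
        "-T", str(threads),    # number of threads
    ]


def run_clustering(in_fasta, out_fasta, **kwargs):
    """Run cd-hit-est and raise if it exits with a non-zero status."""
    subprocess.run(cd_hit_est_command(in_fasta, out_fasta, **kwargs), check=True)


# Hypothetical usage (requires cd-hit-est on PATH):
# run_clustering("rdp_unaligned.fa", "otu_97.fasta", threads=8)
```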

That left about 400,000 items in the FASTA file (grep ">" -c reports roughly 400,000) and in the id_to_taxonomy file.
I was wondering whether this is the right way to convert the RDP FASTA file to the QIIME database format.
Am I missing something or doing it the wrong way?
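A minimal sketch of the taxonomy-extraction step, for illustration only. It assumes each RDP header carries a `Lineage=` field whose entries alternate name and rank (for example `Root;rootrank;Bacteria;domain;...`), which should be checked against the actual download. The function names and the Greengenes-style rank prefixes (`k__`, `p__`, ...) are choices made for this example, not something the thread specifies; ranks missing from a lineage (RDP lineages typically stop at genus) are left as a bare prefix so every record has 7 levels.

```python
# Ranks kept for the 7-level taxonomy, paired with Greengenes-style prefixes.
RANKS = [
    ("domain", "k__"), ("phylum", "p__"), ("class", "c__"), ("order", "o__"),
    ("family", "f__"), ("genus", "g__"), ("species", "s__"),
]


def parse_header(header):
    """Return (seq_id, taxonomy_string) for one RDP-style FASTA header line."""
    seq_id = header[1:].split()[0]
    lineage = header.split("Lineage=", 1)[1] if "Lineage=" in header else ""
    fields = [f.strip().strip('"') for f in lineage.split(";") if f.strip()]
    # Fields alternate name, rank: Root, rootrank, Bacteria, domain, ...
    by_rank = dict(zip(fields[1::2], fields[0::2]))
    levels = [prefix + by_rank.get(rank, "") for rank, prefix in RANKS]
    return seq_id, "; ".join(levels)


def write_id_to_taxonomy(fasta_path, out_path):
    """Write a tab-separated id_to_taxonomy file from the FASTA headers."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in fasta:
            if line.startswith(">"):
                seq_id, taxonomy = parse_header(line.rstrip("\n"))
                out.write(f"{seq_id}\t{taxonomy}\n")
```

The output follows the layout QIIME 1 expects for an id_to_taxonomy file: one line per sequence, with the sequence ID, a tab, and the semicolon-separated lineage.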

Any piece of advice would be really appreciated.

Thank you

-jk Kim-

sorry for my English.

justink

Jun 21, 2017, 5:18:57 PM
to Qiime 1 Forum
Sounds correct to me. Note that the default Greengenes reference has only about 100,000 items [1], so runs against your larger database will take a little longer.

[1] grep -c '>' $(print_qiime_config.py | grep assign_taxonomy_reference | cut -f 2)

jk Kim

Jun 21, 2017, 8:18:28 PM
to Qiime 1 Forum
Thank you! By the way, that's a nice one-liner!

-jk-

DV

Jun 27, 2017, 1:02:05 PM
to Qiime 1 Forum
Hello, jk Kim

Were you able to convert the RDP database for use in QIIME?
How did you do it?

Thanks,
Daniela

jk Kim

Jul 25, 2017, 3:25:39 AM
to Qiime 1 Forum
Hi DV,

I did, but I then realized that I had not made the PyNAST template files.
For the conversion, I opened the otu_97.fasta and taxonomy files and parsed them; the script was quick and dirty, but it was not difficult.
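The parsing script itself was not posted, so the following is only a guess at what such a step could look like: it keeps the taxonomy lines whose IDs survive in the clustered FASTA, so that the two files stay in sync. The function names and the taxonomy file names are hypothetical; only `otu_97.fasta` comes from this thread.

```python
def fasta_ids(fasta_path):
    """Collect the sequence IDs (first token after '>') from a FASTA file."""
    with open(fasta_path) as handle:
        return {
            line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()
        }


def filter_taxonomy(taxonomy_path, keep_ids, out_path):
    """Copy only the taxonomy lines whose ID is in keep_ids; return the count kept."""
    kept = 0
    with open(taxonomy_path) as src, open(out_path, "w") as dst:
        for line in src:
            seq_id = line.split("\t", 1)[0]
            if seq_id in keep_ids:
                dst.write(line)
                kept += 1
    return kept


# Hypothetical usage:
# ids = fasta_ids("otu_97.fasta")
# filter_taxonomy("rdp_id_to_taxonomy.txt", ids, "otu_97_taxonomy.txt")
```

If the returned count differs from the number of IDs in the FASTA file, some representative sequences have no taxonomy line, which is worth checking before handing the pair of files to QIIME.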

I am still figuring out how to generate the PyNAST template files, which are used for MSA.

Hope this helps.

Take care,
Jk Kim
