Default reference files in the PICRUSt2 workflow


Maria Camila

Oct 2, 2018, 11:15:02 AM
to picrust-users
Hi Gavin 

I'm about to run the PICRUSt2 workflow, but first I have a question about the reference database it uses. I used the DADA2 workflow in R to denoise my data into ASVs, and I assigned taxonomy with the Silva taxonomic training data formatted for DADA2 (v.132). Does this affect the first two steps of the PICRUSt2 workflow? Which database is used to build the reference multiple-sequence alignment in the sequence placement step? How were the reference.fna and reference.tre files built? And do the default pre-calculated count tables used in hidden state prediction work for any database, or were they built with one specifically?


PS: I'm sorry if I'm a little lost; I'm new to the microbiome and bioinformatics fields.

Thanks in advance!

Camila

Gavin Douglas

Oct 2, 2018, 3:15:51 PM
to picrus...@googlegroups.com
Hi Maria,

The databases in PICRUSt2 are based on genomes in the Integrated Microbial Genomes (IMG) database. Taxonomic information is not used by PICRUSt2, so that step won’t influence the results. The reference files are based on an MSA of full-length 16S sequences from these IMG genomes, so it doesn’t matter what pipeline was used to produce the representative 16S sequences (e.g. it doesn’t matter if a representative sequence is de novo or a known reference sequence).


Best,

Gavin


Jan Söderman

Oct 29, 2018, 4:42:25 AM
to picrust-users
Hi Gavin,

Is it always the latest version of the Integrated Microbial Genomes database that is used? If not, what is the update policy regarding new IMG releases, and where can I find information on the IMG version currently used by PICRUSt2?

Sincerely,
Jan

Gavin Douglas

Oct 29, 2018, 11:30:24 AM
to picrus...@googlegroups.com
Hey Jan,

No, the PICRUSt2 predictions will not always be based on the latest IMG database. The main reason for this is that it’s difficult to get access to the entire database. The IMG developers directly provided us with files containing all gene family abundances and 16S sequences as of Nov 8th, 2017, but we don’t have a system for keeping the reference databases updated with IMG. IMG no longer uses version numbers, but hopefully that date will be sufficient for your purposes. Please let me know if you have other questions!


Gavin