OTU picking for multiple samples

André Soares

Feb 29, 2016, 8:39:54 AM
to Qiime 1 Forum
Hello there,

I'm following QIIME's http://qiime.org/tutorials/chaining_otu_pickers.html tutorial for a meta-analysis, but I'm not sure I'm pursuing the right strategy.
I can either pick OTUs for each sample individually, which eventually leaves me with a lot of OTU maps to merge, or merge the .fna's together by study and run closed-reference OTU picking on each one, which leaves only two files to merge.

Currently, I'm trying the first strategy, but getting this error: Some keys do not map ('1397') -- is the order of your OTU maps equivalent to the order in which the OTU pickers were run? If expanding a failures file, did you remember to leave out the otu map from the run which generated the failures file?

Important:
Using SRA-downloaded data. I'm assuming it is already demultiplexed, so I only run convert_fastaqual_fastq.py and then split_libraries, with an individual mapping file per sample name.
Using SILVA 123 as reference.
Using bash and Python to run QIIME across several files (as in split_libraries).
First OTU-picking step = prefix_suffix, second = SILVA 123 + uclust_ref
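
For readers following along, here is a minimal Python sketch of what merging chained OTU maps involves and why a key can fail to map. It is an illustration under stated assumptions, not QIIME's actual code: the function name, the toy IDs, and the shortened error text are all hypothetical. The idea is that the second-round map lists first-round OTU IDs, so each of those IDs must exist as a key in the first-round map it is paired with.

```python
def merge_otu_maps(first_map, second_map):
    """Expand a second-round OTU map so it points at the original read IDs.

    first_map:  round-1 OTU ID -> list of read IDs
    second_map: round-2 OTU ID -> list of round-1 OTU IDs
    (Hypothetical helper for illustration; not the QIIME implementation.)
    """
    merged = {}
    for otu_id, round1_ids in second_map.items():
        reads = []
        for round1_id in round1_ids:
            if round1_id not in first_map:
                # The situation the error message describes: the second map
                # refers to a round-1 OTU that this first map does not contain.
                raise KeyError(f"Some keys do not map ({round1_id!r})")
            reads.extend(first_map[round1_id])
        merged[otu_id] = reads
    return merged


# Toy data: round 1 collapses reads, round 2 clusters the round-1 OTUs.
round1 = {"0": ["S1_1", "S1_2"], "1": ["S1_3"]}
round2 = {"ref_A": ["0", "1"]}
merged = merge_otu_maps(round1, round2)

# Pairing round 2 with a round-1 map from a different sample breaks the chain.
other_sample_round1 = {"0": ["S2_1"]}
try:
    merge_otu_maps(other_sample_round1, round2)
    mismatch = None
except KeyError as err:
    mismatch = str(err)
```

With many per-sample maps, one way to hit this is pairing a second-round map with the first-round map of a different sample, or passing the maps in a different order than the pickers were run.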

Anyone with tips on this, in the context of a meta-analysis (dealing with several samples from multiple studies)?

Cheers,
André

Jenya Kopylov

Mar 3, 2016, 2:45:18 PM
to Qiime 1 Forum
Hi André,

It would help to know the exact commands you ran and the error printed to the screen. Could you copy/paste that information here?

Thanks,
Jenya

Colin Brislawn

Mar 3, 2016, 3:50:20 PM
to Qiime 1 Forum
Hello Andre,

> merge the .fna's together by study
This is the most common way of doing it and it works very well. The output of split_libraries.py, split_libraries_fastq.py, and add_qiime_labels.py is a single fasta file that is ready for OTU picking.

> Anyone with tips on this, in the context of a meta-analysis (dealing with several samples from multiple studies)?
Yes! Process all your files with multiple_split_libraries_fastq.py (for lots of fastq files) or add_qiime_labels.py (for lots of fasta files). You will then have a single seqs.fna file that you can use with any OTU picking method.
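
To make the "single seqs.fna" idea concrete, here is a rough Python sketch of the labelling step. It only illustrates the convention of prefixing each read ID with its sample ID and a running number; it is not the code of add_qiime_labels.py, and the helper name, sample IDs, and read IDs are made up for the example.

```python
import io


def label_fasta(handle, sample_id, start=0):
    """Yield FASTA lines with headers rewritten to '>SampleID_N original_header'.

    Hypothetical helper for illustration only.
    """
    count = start
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f">{sample_id}_{count} {line[1:]}"
            count += 1
        elif line:
            yield line


# Made-up per-sample inputs; in practice these would be open fasta files.
samples = {
    "StudyA.S1": io.StringIO(">SRR0001.1\nACGT\n>SRR0001.2\nGGCC\n"),
    "StudyB.S7": io.StringIO(">SRR0099.1\nTTAA\n"),
}

combined = []
for sample_id, handle in samples.items():
    combined.extend(label_fasta(handle, sample_id))

# One combined file for all samples and studies, ready for OTU picking.
seqs_fna = "\n".join(combined) + "\n"
```

Because every read carries its sample ID in the header, one OTU picking run on the combined file can still split the counts per sample afterwards.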

Good luck!
Colin


André Soares

Mar 4, 2016, 5:48:55 AM
to Qiime 1 Forum
Hey Colin,

If I pick OTUs for the samples study by study, how would the merge of the resulting .biom files then be handled?
That is, wouldn't OTU duplication be a problem? E.g., OTUs from different studies with the same taxonomic affiliation could be treated as entirely different OTUs, right?
I'm not sure how the commands work internally, but this is what worries me most in the analysis I am pursuing.

Thanks again!
André

Colin Brislawn

Mar 4, 2016, 1:02:45 PM
to Qiime 1 Forum
Hello Andre,

In the default method I'm suggesting, you merge all the labeled reads together into a single seqs.fna file, which will make a single .biom file after OTU picking. (One seqs.fna file --> one .biom file, no merging necessary.)

Merging .biom files is tricky because OTUs do conflict, just like you said. The solution is to merge the samples you want before OTU picking, so that you get one .biom file and avoid this problem. 
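
Here is a small Python sketch of one way such a conflict can arise when tables are picked separately. It assumes each de novo run numbers its own clusters independently; the OTU IDs, counts, and taxonomy strings are invented, and the tables are plain dictionaries rather than real .biom objects.

```python
from collections import Counter

# Made-up de novo results from two studies picked separately.
# Each run numbers its own clusters from 0, so "denovo0" means something
# different in each table.
study_a = {
    "table": {"denovo0": Counter({"A.S1": 40}), "denovo1": Counter({"A.S1": 5})},
    "taxonomy": {"denovo0": "Firmicutes", "denovo1": "Proteobacteria"},
}
study_b = {
    "table": {"denovo0": Counter({"B.S1": 12})},
    "taxonomy": {"denovo0": "Bacteroidetes"},
}


def naive_merge(table_a, table_b):
    """Merge two {otu_id: Counter({sample: count})} tables purely by OTU ID."""
    merged = {otu: Counter(counts) for otu, counts in table_a.items()}
    for otu, counts in table_b.items():
        merged.setdefault(otu, Counter()).update(counts)
    return merged


merged = naive_merge(study_a["table"], study_b["table"])

# OTU IDs present in both studies but describing different organisms.
conflicts = sorted(
    otu
    for otu in study_a["taxonomy"].keys() & study_b["taxonomy"].keys()
    if study_a["taxonomy"][otu] != study_b["taxonomy"][otu]
)
```

In this toy case the merged row "denovo0" mixes counts from two unrelated clusters, which is the kind of problem that pooling the reads before picking avoids.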


Colin 

André Soares

Mar 7, 2016, 3:14:33 AM
to Qiime 1 Forum
Hello again Colin,

This would be an important feature in a meta-analysis concerning multiple 16S regions across studies, right?

Any other recommendations for this kind of analysis?

Cheers,
André

André Soares

Mar 7, 2016, 6:14:28 AM
to Qiime 1 Forum
That is, for datasets with multiple 16S regions, OTU picking should be done by region (one concatenated .fna per region), right?
Is this the best way to avoid redundancy at the OTU level?

Colin Brislawn

Mar 8, 2016, 12:27:18 PM
to Qiime 1 Forum
Hello Andre,

Oh, I'm glad you mentioned the multiple regions. You are correct, each region should undergo OTU picking separately. 

> Any other recommendations for this kind of analysis?
It's challenging, so have a good reason to do it. :-)

Because the reads are from different regions, it's really hard to combine the resulting OTU tables. If you are trying to combine tables, a closed-ref OTU picking method will work well because all the OTU IDs are the same, even if the sequenced regions are different. Of course, this method forces you to trust the accuracy and completeness of your reference.

I would suggest that you keep the resulting .biom tables separate and run your analysis on both tables. Different regions catch different taxa better, so one region may do a better job showing you which Firmicutes are different between treatments, for example. If your narrative focuses on Firmicutes, you could just use that region and .biom table in all downstream analysis, completely omitting the other .biom table. 

I hope that helps!
Colin Brislawn

André Soares

Mar 9, 2016, 4:49:13 AM
to Qiime 1 Forum
Hello again,

Thanks for that!

What if I wanted to see whether trends are maintained across common environmental variables?
Should I still look at each table individually, or are there other ways to do this?

Best,
André

Colin Brislawn

Mar 9, 2016, 1:51:37 PM
to Qiime 1 Forum
No matter how you slice it, combining different regions is imperfect. 

I guess I'm suggesting that you run all your analyses in parallel on both tables. You can see if trends are maintained, and build your narrative around the table that gives you the best resolution. (I guess you could also mention sequencing another region and the results for that region. Reviewers may ask about conflicting results, and then you can explain how some taxa are better resolved in one region than in another.)

It's possible to perform closed-ref OTU picking with all your reads from different regions. This is possible because the different regions are still matching to the same full-length reference OTUs. I would NOT recommend doing this, but you can and some reasonable scientists do. If you really want to get these reads into the same table, this is the way to go.

Keep in touch!
Colin Brislawn

André Soares

Mar 11, 2016, 1:08:10 PM
to Qiime 1 Forum
Hey again Colin,

So, as I see it, this can be done in multiple ways. In your opinion, for a study covering about 500 samples (from 20 studies), with some 16S regions not available for all samples (e.g. 20 V68, 200 V46, 100 V13 and so on...), which option would you go for?
In this study, I mostly want to see how these prokaryomes are defined by environmental factors that vary across all studies.

  1. Merge all samples before CR OTU picking, then compare them using one OTU table (by PCA, for example)   --->   you do 'NOT' recommend this and I also don't like it, but can you point to previous work using this method? What are the main dangers here?
  2. Merge samples by 16S region before CR OTU picking, then compare differences between OTU tables (various PCAs and Procrustes analysis?)   --->   this actually seems to me the best way to avoid redundancy, but clearly the one leading to the most difficult statistical analysis
  3. Merge OTU maps/tables after 16S region-specific CR OTU picking, getting one OTU table   --->   danger of redundant taxonomic assignments, and OTU table merging may not go as planned

I'm starting to lean towards comparing 1 and 2... It could be a nice way to build a good chapter of my PhD... :)

@Colin, can you give your opinion on all this?


Thanks!
André

Colin Brislawn

Mar 11, 2016, 2:20:03 PM
to Qiime 1 Forum
Ah ah! This is a large meta-analysis, using many studies and regions. I understand the scope of your challenge better now.

Because you are trying to combine so many disparate data types, I think closed-ref OTU picking is a good method, combining all samples and regions. (I thought you had two regions from the same samples. In your case, the limitations of closed-ref are more than outweighed by the benefit of having a static set of common OTUs between all these studies and regions.)

Because closed-ref OTU picking always outputs the same OTU IDs and taxonomic assignments, 1) and 3) are equivalent (you can combine before or after picking and get the same biom table). I guess 2) is still possible, but now that I know the scope, this seems way harder so go try the other ones! 
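
A toy Python sketch of why combining before or after picking can give the same table in the closed-ref case. It assumes each read is matched against the fixed reference independently of every other read. The lookup dictionary stands in for a real reference search, and the reference IDs, sample IDs, and sequences are all hypothetical.

```python
from collections import Counter

# Stand-in for a reference search: each read sequence maps to a fixed
# reference ID, or has no entry when nothing matches (a closed-ref failure).
REFERENCE_HITS = {"ACGT": "ref_101", "ACGA": "ref_101", "TTGC": "ref_202"}


def pick_closed_ref(reads):
    """Build {ref_id: Counter({sample_id: count})} from (sample_id, seq) pairs."""
    table = {}
    for sample_id, seq in reads:
        ref_id = REFERENCE_HITS.get(seq)
        if ref_id is None:
            continue  # reads without a reference hit are dropped
        table.setdefault(ref_id, Counter())[sample_id] += 1
    return table


def merge_tables(*tables):
    """Sum counts across tables that share the same reference OTU IDs."""
    merged = {}
    for table in tables:
        for ref_id, counts in table.items():
            merged.setdefault(ref_id, Counter()).update(counts)
    return merged


v46_reads = [("A.S1", "ACGT"), ("A.S1", "TTGC"), ("A.S2", "NNNN")]
v13_reads = [("B.S1", "ACGA"), ("B.S1", "ACGA")]

# Option 1: pool all reads, then pick once.
pooled_first = pick_closed_ref(v46_reads + v13_reads)
# Option 3: pick per region, then merge the tables by reference ID.
merged_after = merge_tables(pick_closed_ref(v46_reads), pick_closed_ref(v13_reads))
```

In this sketch the two tables come out identical because the OTU IDs come from the reference, not from the reads that happened to be clustered together.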

(I should mention that this gets much harder and stranger if you are doing open-ref or de novo OTU picking. This is the kind of situation where closed-ref really shines!)

Colin
