pick_open_reference_otus.py default similarity threshold and taxonomy assignment

MarAfa

Mar 5, 2017, 12:54:52 PM

to Qiime 1 Forum

Hi QIIME team,
I have a question about pick_open_reference_otus.py. I ran it with default parameters and got some results I don't fully understand.

1) I was comparing results from pick_open_reference_otus.py and pick_closed_reference_otus.py, and the results surprised me: with the closed-reference approach I get some taxa that I don't get with the open-reference approach. Shouldn't it be the other way around? I thought open reference first works like closed reference (so it should find the same taxa) and then works like de novo OTU picking (pick_de_novo_otus.py), so it should add some more OTUs. But with closed reference I get taxa that I didn't find with open reference.

2) What is the default similarity threshold? Is it 0.97 (97%), both for comparing reads against the database and for comparing reads against one another?

3) How can pick_open_reference_otus.py give me NewOTUs that are assigned to a taxon, when NewOTUs are created only from reads that don't match the reference database? Is the similarity threshold somehow lowered?

Thanks for any ideas or solutions.

jonsan

Mar 6, 2017, 10:24:32 PM

to Qiime 1 Forum

Hi,

1. That is odd! How are you evaluating the taxa returned?

2. 97% is the default similarity in both cases.

3. I'm not quite sure what you mean by 'NewOTU', but open reference OTU picking works by clustering the sequences that don't initially hit the reference database. In other words, first you take out all the sequences you've seen before, and then the remaining sequences are compared against one another at the same similarity threshold. Does that make sense?
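
To make the two steps concrete, here is a minimal Python sketch of the idea. It is a toy illustration, not how QIIME implements it: the position-by-position `identity` function, the greedy clustering loop, and the `Sketch.NewOTU` naming are all assumptions made up for this example.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions in equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_sketch(reads, references, threshold=0.97):
    """Toy open-reference OTU picking (hypothetical helper, not QIIME code).

    reads:      dict of read id -> sequence
    references: dict of reference id -> sequence
    Returns a dict of OTU id -> list of read ids.
    """
    otus = {}
    failures = []

    # Step 1: closed-reference style. Each read goes to its best reference
    # hit if that hit reaches the threshold; otherwise the read "fails".
    for read_id, seq in reads.items():
        best_ref, best_score = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_ref, best_score = ref_id, score
        if best_ref is not None and best_score >= threshold:
            otus.setdefault(best_ref, []).append(read_id)
        else:
            failures.append(read_id)

    # Step 2: de novo style. Failed reads are compared against one another
    # at the same threshold; a read that matches no centroid seeds a new OTU.
    centroids = {}
    for read_id in failures:
        seq = reads[read_id]
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            otu_id = "Sketch.NewOTU%d" % len(centroids)
            centroids[otu_id] = seq
            otus[otu_id] = [read_id]

    return otus
```

Note that the same `threshold` is used in both steps, which is the point of answer 2.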

Cheers,

-jon

MarAfa

Mar 18, 2017, 3:37:03 PM

to Qiime 1 Forum

Thank you for your response!

1. I have a list of all taxa returned by pick_open_reference_otus.py and pick_closed_reference_otus.py. I ran both scripts with default parameters. I plotted all the families that were found and saw that pick_open_reference_otus.py found two families that were not found by the closed-reference approach (namely Kineosporiaceae and Patulibacteraceae). On the other hand, pick_closed_reference_otus.py found another 14 families that were not detected by pick_open_reference_otus.py (for example Staphylococcaceae, Aurantimonadaceae, Propionibacteriaceae, ...). I would love to know how this happened and whether any differences in the default parameters of the two approaches could cause it.

2. I suspected this parameter might be causing the strange results, but it seems something else is to blame.

3. By NewOTU I meant "New.CleanUp.ReferenceOTU": according to the QIIME tutorial, those are OTUs made from reads that don't hit the reference database in the first step. So reads that fail to hit the database are then clustered together and form new OTUs labeled New.CleanUp.ReferenceOTU. But how is it then possible that in my biom file they are assigned a taxonomy, when they didn't hit the reference database in the first step? If they can be matched to the database, why weren't they assigned a taxonomy immediately in the first step?
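
For reference, the family comparison in point 1 amounts to roughly the following Python sketch. It assumes the taxonomy strings are semicolon-separated lineages with an `f__` prefix on the family rank (Greengenes-style); the function names `family_of` and `compare_families` are made up for this illustration.

```python
def family_of(lineage):
    """Return the family name from a lineage string, or None if unassigned.

    Assumes ranks are separated by ';' and the family rank starts with 'f__'.
    """
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("f__") and len(rank) > 3:
            return rank[3:]
    return None


def compare_families(open_ref_lineages, closed_ref_lineages):
    """Split the families seen by each approach into unique and shared sets."""
    open_fams = {family_of(t) for t in open_ref_lineages} - {None}
    closed_fams = {family_of(t) for t in closed_ref_lineages} - {None}
    return {
        "only_open": sorted(open_fams - closed_fams),
        "only_closed": sorted(closed_fams - open_fams),
        "shared": sorted(open_fams & closed_fams),
    }
```

Called with the lineage strings from the two OTU tables, `only_open` would hold families such as Kineosporiaceae and `only_closed` families such as Staphylococcaceae. Lineages with an empty `f__` are ignored, so OTUs without a family-level assignment do not count for either approach.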

I will be glad for any advice or explanation.
Thanks!
