Result alpha-rarefaction.py

Mondher KHEDIRI

May 22, 2014, 3:11:04 PM
to qiime...@googlegroups.com
Hello,
I ran the alpha rarefaction workflow with the following command:

          alpha_rarefaction.py -i otu_table.biom -m mapping-file-modified -p alpha-parameter -t rep_set.tre -o Alpha-diversity -e 234456

and in the parameter file I specified the metrics to use:
           alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species

I'm interested in the output of the Shannon metric, so I checked the results in the alpha_div_collated file. I found that the script rarefies the sequences at increasing depths, raising the number of sequences drawn at each step, and performs 10 iterations at each depth.
So my questions are:
1) I don't understand why the analysis is performed at increasing depths.
2) I don't see why we need to perform 10 iterations for each subset of sequences.
3) If I want the Shannon index for each sample, should I calculate the mean over all the iteration steps, or use only the final ones, where the total number of sequences is considered?

Thank you.
Mondher

shannon.txt

Will Van Treuren

May 22, 2014, 4:50:53 PM
to qiime...@googlegroups.com
Hi Mondher,

> 1) I don't understand why the analysis is performed at increasing depths.

If I understand your question correctly, you are asking why QIIME does rarefactions at 10 seqs/sample, then 23454 seqs/sample, then 46898, etc. The purpose of a rarefaction curve (also known as an accumulation curve) is to see whether you have adequately sampled the diversity of your samples. So, at increasing depth (larger numbers of seqs/sample) we should hopefully see the rarefaction curve level out; in other words, we should see that the amount of additional diversity we get by sequencing 100,000 vs 50,000 seqs/sample is small. A curve that levels out suggests that we have adequately sequenced our community.
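To make the depth series concrete, here is a minimal sketch of how evenly spaced depths could be derived. It assumes a minimum depth of 10 and 10 steps up to the -e value from the original command, which is consistent with the depths seen in this thread (10, 23454, 46898, ...). The function name `rarefaction_depths` is hypothetical and is not part of QIIME.

```python
def rarefaction_depths(min_depth, max_depth, num_steps):
    """Evenly spaced depths starting at min_depth, never exceeding max_depth."""
    # Integer step size (assumption: the step is rounded down)
    step = (max_depth - min_depth) // num_steps
    return list(range(min_depth, max_depth + 1, step))


# 234456 is the -e value from the command in the first post
depths = rarefaction_depths(min_depth=10, max_depth=234456, num_steps=10)
print(depths)
```

Under these assumptions the deepest rarefaction lands at 234450 rather than 234456, because the step size is rounded down.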

> 2) I don't see why we need to perform 10 iterations for each subset of sequences.

The rarefaction process is random sampling without replacement. So, depending on which sequences the computer picks, the resulting diversity value can differ. Repeating the sampling 10 times gives us a more reliable estimate of the actual alpha diversity at that level of rarefaction.
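A small sketch of that idea, using only the Python standard library rather than QIIME itself. The sample composition is made up for illustration, `rarefy` is a hypothetical helper, and the metric shown is the observed OTU count (the observed_species metric from the parameter file), simply because it is the easiest to compute.

```python
import random


def rarefy(otu_counts, depth, rng):
    """Draw `depth` sequences without replacement; return the set of OTUs seen."""
    # Expand the counts into one entry per sequence, then subsample
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return set(rng.sample(pool, depth))


# Hypothetical sample (made-up values): one dominant OTU plus 20 rare ones
sample = {"OTU_dominant": 900}
sample.update({f"OTU_rare_{i}": 5 for i in range(20)})

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# 10 independent rarefactions at 100 seqs/sample
observed = [len(rarefy(sample, 100, rng)) for _ in range(10)]
mean_observed = sum(observed) / len(observed)
print(observed, mean_observed)
```

Each iteration can catch a different subset of the rare OTUs, so the individual values scatter around the mean. Averaging over the iterations smooths out that sampling noise.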

> 3) If I want the Shannon index for each sample, should I calculate the mean over all the iteration steps, or use only the final ones, where the total number of sequences is considered?

I am not sure I understand your question. If you wanted to get the Shannon index for each sample you would use the script alpha_diversity.py. This can give you alpha diversity numbers for each sample without computing the rarefaction curves as alpha_rarefaction.py does. 
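For reference, the Shannon index itself is simple to compute from a sample's OTU counts. The sketch below is not QIIME's code: the counts are made up, and log base 2 is an assumption, so check the metric documentation for the base your version uses.

```python
import math


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


# Hypothetical per-sample OTU counts (made-up values)
samples = {
    "SampleA": [25, 25, 25, 25],  # four equally abundant OTUs
    "SampleB": [97, 1, 1, 1],     # dominated by a single OTU
}
for name, counts in samples.items():
    print(name, round(shannon(counts), 3))
```

The evenly distributed sample gets the higher index, because Shannon diversity reflects evenness as well as the number of OTUs.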

Hope this helps,
Will 



Mondher KHEDIRI

May 23, 2014, 11:00:50 AM
to qiime...@googlegroups.com
Good morning,
Thank you for the explanation, it was really helpful.
I used alpha_diversity.py to get the different diversity indices that I needed.
The alpha_diversity documentation mentions that documentation is available for the different metrics implemented in QIIME:

http://qiime.org/scripts/alpha_diversity_metrics.html

but the link is no longer available. If you have any idea where I should look for a description of the different metrics used, it would be helpful.

Thank you
Mondher

Will Van Treuren

May 23, 2014, 1:18:24 PM
to qiime...@googlegroups.com
Hi Mondher,

This page should have what you are looking for. 

Best,
Will 


Mondher KHEDIRI

May 26, 2014, 11:03:32 AM
to qiime...@googlegroups.com
Hello,
Regarding the 10 iterations done at each rarefaction depth: if QIIME picks a random subset of sequences each time, then at the maximum number of sequences picked (in my case 234450) we would normally get the same rarefaction values in every iteration, since QIIME is using all the sequences available and there is no random picking.
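The reasoning can be checked with a small standard-library sketch. It is not QIIME code; the sample is made up and `rarefy` is a hypothetical helper. Note that the result holds only for a sample whose total number of sequences equals the rarefaction depth; a sample with more sequences than the depth is still subsampled at random.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, rng):
    """Draw `depth` sequences without replacement; return per-OTU counts."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


# Hypothetical sample (made-up values), 10 sequences in total
sample = {"OTU1": 6, "OTU2": 3, "OTU3": 1}
total = sum(sample.values())
rng = random.Random()  # unseeded on purpose: the result should not depend on it

# Rarefying to the full depth must return every sequence, every time
full_depth = [rarefy(sample, total, rng) for _ in range(10)]
all_identical = all(draw == Counter(sample) for draw in full_depth)
print(all_identical)
```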

Thank you
Mondher
 
On Thursday, May 22, 2014 at 15:11:04 UTC-4, Mondher KHEDIRI wrote: