Rarefaction Questions


pmb0010

May 2, 2013, 5:36:58 PM
to qiime...@googlegroups.com
Hi All-

I am trying to make some rarefaction curves for a data set and have a couple of questions.  I started by using alpha_rarefaction.py to generate these plots (see attached Doc1.pdf) and saw an abrupt change in the slope of my curves.  To see if I could resolve this area a little better, I ran each script in the alpha_rarefaction.py pipeline individually so that I could control the min and max number of sequences as well as the step size.  I kept the min number of sequences the same, cut the max number of sequences in half, and changed the step size.  I am still seeing this sharp change in the slope of my curves (see Doc2.pdf), and the point where the slope changes drastically appears to have shifted left, to a lower number of sequences per sample.

My questions are:

1. What might be causing the sharp slope change in my rarefaction curves?  All of the rarefaction curves I have seen previously bend more gradually and smoothly where the slope changes than the ones I am producing.

2. What would cause a dramatic shift in where the slope of the curve changes between using alpha_rarefaction.py alone versus specifying my own max sequence number and step size?

3. How does QIIME subsample during the rarefaction step (multiple_rarefactions.py and single_rarefaction.py)?  Are the sequences chosen randomly from the sample?  Are OTUs with a higher abundance in a sample weighted more heavily than OTUs with lower abundances?

Thanks in advance.  I am really trying to obtain a better handle on what is going on in these scripts.  

Pamela
Doc1.pdf
Doc2.pdf

zhenjiang xu

May 3, 2013, 3:51:34 PM
to qiime-forum
Hi Pamela,

Both multiple_rarefactions.py and single_rarefaction.py just randomly sample the sequences without replacement. What are the step sizes in both of the plots? It's probable that your step size is too large. Try lowering the step size to see the slope changes at a higher resolution.

Best,
Zech




pmb0010

May 4, 2013, 6:21:36 PM
to qiime...@googlegroups.com
Zech,

Thanks for the response.  Do you know whether an OTU that is in higher abundance within a sample has a higher likelihood of getting sampled? (If one OTU has an abundance of 50% and another only 1%, do they have the same chance of being picked?)

As for the step sizes in the two plots: in the first plot (where I used the alpha_rarefaction.py script) the min was 10 seq/sample, the max was 120,000 seq/sample, and the step was 12,248.  These numbers were generated by the script.  In the second plot I kept the min at 10, set the max at 60,000 seq/sample, and changed the step to 3,000.  I will try changing the step again to see if I get any more resolution, which is what I was attempting to do in the second plot.

Any idea, though, why the point of slope change is shifted to the left in the second plot?  I would think that the slope change should be in the same location, or maybe even at the first step of 12,258, but it seems to be higher than that on the first plot, while on the second plot the change in slope occurs at fewer than 10,000 sequences.  The number of reads I have per sample differs greatly, and I am trying to determine the lowest number of reads I can have in a sample and still get a deep enough sampling of the taxa present.

Thanks again!

Pamela

Will Van Treuren

May 6, 2013, 7:26:24 AM
to qiime...@googlegroups.com
Hi Pamela,

The answer to your question about whether an OTU in higher abundance within a sample has a higher likelihood of getting sampled is yes. The probability that an OTU gets chosen during rarefaction is directly related to its abundance. The sampling algorithm is equivalent to a multinomial draw with probabilities updated after each draw, which is the same thing as drawing sequences at random without replacement.
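To make that concrete, here is a minimal sketch of this kind of subsampling in plain Python. It is not taken from QIIME's source; the function name `rarefy`, the OTU ids, and the dict-of-counts input are made up for illustration. Each sequence is equally likely to be drawn, so an OTU's chance of showing up scales with how many sequences it contributes.

```python
import random
from collections import Counter

def rarefy(otu_counts, depth, seed=None):
    """Draw `depth` sequences without replacement from one sample.

    `otu_counts` maps an OTU id to its sequence count. This is a
    hypothetical input format for illustration, not QIIME's OTU table.
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of sequences in the sample")
    # One pool entry per sequence, labelled with the OTU it belongs to.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    # random.sample picks distinct pool positions, i.e. without replacement.
    return Counter(rng.sample(pool, depth))

# Toy sample: two abundant OTUs and one rare one (1% of the sequences).
sample = {"OTU_A": 500, "OTU_B": 490, "OTU_C": 10}
sub = rarefy(sample, 100, seed=0)
print(sub)
```

With a sample like this, the rare OTU usually contributes only a sequence or two at a depth of 100 and is sometimes missed entirely, which is why richness keeps climbing as the depth increases.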

I think the slope changes at different points in your graphs simply because the step size in graph 1 is larger than the number of sequences needed to capture the majority of the diversity. Rapid accumulation slows after a depth of 3,000 in the 2nd plot. Any step size larger than 3,000 will therefore capture the majority of the diversity by the first step and will show the same pattern of abrupt slope change after the first point.
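A rough way to see the step-size effect is to compute the expected number of observed OTUs at each depth for a toy community. Everything in this sketch is an assumption for illustration: the community composition is invented, the depths are laid out as min, min + step, and so on up to max, and the expected richness uses the standard without-replacement formula rather than anything from QIIME itself.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def expected_otus(counts, depth):
    """Expected number of OTUs seen when `depth` sequences are drawn
    without replacement from a sample with the given per-OTU counts."""
    total = sum(counts)
    expected = 0.0
    for c in counts:
        if total - c < depth:
            # Not enough other sequences to fill the draw: this OTU must appear.
            expected += 1.0
        else:
            p_absent = exp(log_comb(total - c, depth) - log_comb(total, depth))
            expected += 1.0 - p_absent
    return expected

def curve(counts, min_depth, max_depth, step):
    """(depth, expected richness) pairs at min, min + step, ... up to max."""
    return [(d, expected_otus(counts, d))
            for d in range(min_depth, max_depth + 1, step)]

# Hypothetical community: 10 dominant OTUs plus 300 rarer ones (120,000 sequences).
counts = [6000] * 10 + [200] * 300

coarse = curve(counts, 10, 120000, 12248)  # step size reported for the first plot
fine = curve(counts, 10, 6000, 500)        # much finer steps around the bend

for depth, richness in coarse[:3]:
    print("coarse", depth, round(richness, 1))
for depth, richness in fine:
    print("fine", depth, round(richness, 1))
```

In this toy community the coarse curve jumps from a handful of OTUs at depth 10 straight to essentially all 310 at the next point, so it looks like a sharp corner, while the finer steps trace the gradual bend that the coarse schedule skips over.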

For your data, it seems like 20,000 seqs/sample would be appropriate given that the diversity of some of the samples looks diminished at anything less.

Hope this helps,
Will 