How To Plot A Rarefaction Curve In Excel

Leigha Keplinger

May 4, 2024, 10:58:14 AM

to previstule

I did it the old-school way: I opened the rarefaction file in Excel and plotted the number of new OTUs against the number of sequences.
I can open almost every output from mothur in Excel, a text editor, or Internet Explorer (for SVG files).

In ecology, rarefaction is a technique to assess species richness from the results of sampling. Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. This curve is a plot of the number of species as a function of the number of samples. Rarefaction curves generally grow rapidly at first, as the most common species are found, but the curves plateau as only the rarest species remain to be sampled.[1]

The issue that arises when sampling species in a community is that the more individuals are sampled, the more species will be found. Rarefaction curves are created by randomly re-sampling the pool of N samples multiple times and then plotting the average number of species found in each sample (1, 2, ... N). "Thus rarefaction generates the expected number of species in a small collection of n individuals (or n samples) drawn at random from the large pool of N samples."[2]
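The expected value that repeated random re-sampling converges to can also be computed directly. The following is a minimal Python sketch, assuming individual-based data supplied as a list of per-species counts; the function names, column headers, and example counts are made up for illustration. It uses the standard hypergeometric expectation for sampling without replacement and returns the curve as CSV text, which can be saved to a file, opened in Excel, and charted as a scatter plot.

```python
import csv
import io
from math import comb


def expected_richness(counts, n):
    """Expected number of species in a random subsample of n individuals.

    counts: number of individuals observed per species (hypothetical input).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    denom = comb(total, n)
    # For each species: 1 - P(species is absent from the subsample).
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step=1):
    """Return (n, expected richness) pairs for n = step, 2*step, ..., total."""
    total = sum(counts)
    depths = list(range(step, total + 1, step))
    if not depths or depths[-1] != total:
        depths.append(total)  # always include the full sample
    return [(n, expected_richness(counts, n)) for n in depths]


def curve_to_csv(curve):
    """Format the curve as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["individuals_sampled", "expected_species"])
    writer.writerows(curve)
    return buf.getvalue()


if __name__ == "__main__":
    example_counts = [5, 3, 2]  # illustrative only, not real data
    print(curve_to_csv(rarefaction_curve(example_counts)))
```

In Excel, selecting the two columns and inserting a scatter chart with smooth lines gives the rarefaction curve. For sample-based data, or for confidence intervals, a dedicated tool such as EstimateS or mothur is the safer route.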

The technique of rarefaction was developed in 1968 by Howard Sanders in a biodiversity assay of marine benthic ecosystems, as he sought a model for diversity that would allow him to compare species richness data among sets with different sample sizes; he developed rarefaction curves as a method to compare the shape of a curve rather than absolute numbers of species.[4]

One can plot the number of species as a function of either the number of individuals sampled or the number of samples taken. The sample-based approach accounts for patchiness in the data that results from natural levels of sample heterogeneity. However, when sample-based rarefaction curves are used to compare taxon richness at comparable levels of sampling effort, the number of taxa should be plotted as a function of the accumulated number of individuals, not accumulated number of samples, because datasets may differ systematically in the mean number of individuals per sample.

Richness estimators can also be used to compare between/among sites based on diversity indices or dissimilarity (beta diversity) or simply on rarefaction curves. Because richness is inherently sample-size dependent, however, any such comparison must be done at equivalent sample sizes, which is why we rarefy (and extrapolate) (EstimateS manual).

ALSO: this curve (and the estimators) can be obtained directly from EstimateS by preparing and treating each vector as individual-based abundance data, and then plotting the results in Excel.

In this talk, two types of standardization methods are reviewed: (1) Sample-size-based rarefaction and extrapolation methods aim to compare diversity estimates for equally-large samples determined by samplers. (2) Coverage-based rarefaction and extrapolation methods aim to compare diversity estimates for equally-complete samples; the sample completeness in this method is measured by sample coverage (the proportion of the total number of individuals that belong to the species detected in the sample), a concept originally developed by Alan Turing and I. J. Good in their cryptographic analysis during World War II. Contrary to intuition, sample coverage for the observed sample, rarefied samples, and extrapolated samples can be accurately estimated by the observed data themselves. These two types of standardization methods allow researchers to efficiently use all available data to make robust and detailed inferences about the sampled assemblages, and also to make objective comparisons among multiple assemblages. Hypothetical and real examples are presented for illustrating the use of the online software iNEXT (iNterpolation/EXTrapolation) to compute and plot seamless rarefaction/extrapolation sampling curves based on several diversity measures.
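As a rough illustration of how sample coverage can be estimated from the observed data alone, here is a minimal Python sketch of the classic Good-Turing estimate, which is one minus the proportion of individuals that belong to species seen exactly once. The function name and example counts are hypothetical, and refined estimators exist that this sketch does not implement.

```python
def good_turing_coverage(counts):
    """Estimate sample coverage as 1 - f1 / n.

    counts: individuals observed per species (hypothetical input).
    f1 is the number of singleton species, n the total number of individuals.
    """
    observed = [c for c in counts if c > 0]
    n = sum(observed)
    if n == 0:
        raise ValueError("need at least one observed individual")
    f1 = sum(1 for c in observed if c == 1)
    return 1.0 - f1 / n


if __name__ == "__main__":
    # Illustrative counts only: two singletons among ten individuals.
    print(good_turing_coverage([5, 3, 1, 1]))
```

A sample with many singletons gets a low coverage estimate, which signals that a substantial share of the assemblage still belongs to undetected species.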

The other main use is to estimate the true (but unknown) richness. This is sometimes loosely called rarefaction, although strictly it is an extrapolation of the curve to its asymptote. If your species richness curve is asymptotic, you can treat the value it converges to as an estimate of true richness. Even with structured sampling, that approach only works well if the vast majority of species have been sampled and the curve is nearly asymptotic. If there are many species that have not yet been detected, you likely just need more sampling.

MicrobiomeAnalyst allows users to perform different types of analyses on a marker gene count table, including: visual exploration through interactive stacked bar plots and pie charts, rarefaction curves and phylogenetic trees; community profiling through diversity analysis; clustering and correlation through interactive heatmaps, dendrograms and correlation networks; comparison and classification through multi-factor comparison analysis, LEfSe and Random Forest; and functional prediction through PICRUSt, Tax4Fun and Tax4Fun2.

QIIME creates plots of alpha diversity vs. simulated sequencing effort, known as rarefaction plots, using the script make_rarefaction_plots.py. This script takes a mapping file and any number of rarefaction files generated by collate_alpha.py and creates rarefaction curves. Each curve represents a sample and can be colored by the sample metadata supplied in the mapping file.

This step generates wf_arare/alpha_rarefaction_plots/rarefaction_plots.html, which can be opened with a web browser, in addition to other files. The wf_arare/alpha_rarefaction_plots/average_tables/ folder contains the rarefaction averages for each diversity metric, so the user can optionally plot the rarefaction curves in another application, such as MS Excel. The wf_arare/alpha_rarefaction_plots/average_plots/ folder contains the average plots for each metric and category, and the wf_arare/alpha_rarefaction_plots/html_plots/ folder contains all the images used in the generated HTML page.

To view the rarefaction plots, open the file wf_arare/alpha_rarefaction_plots/rarefaction_plots.html in a web browser, typically by double-clicking on it. Once the browser window is open, select the metric PD_whole_tree and the category Treatment to reveal a plot like the figure below. You can also turn lines in the plot on or off by (un)checking the box next to each label in the legend, or click on the triangle next to each label in the legend to see all the samples that contribute to that category. Below each plot is a table displaying average values for each measure of alpha diversity for each group of samples in the specified category.

The rarefaction curve of annotated species richness is a plot (see Figure 5.11) of the total number of distinct species annotations as a function of the number of sequences sampled. The slope of the right-hand part of the curve is related to the fraction of sampled species that are rare. On the left, a steep slope indicates that a large fraction of the species diversity remains to be discovered. If the curve becomes flatter to the right, a reasonable number of individuals has been sampled: more intensive sampling is likely to yield only a few additional species. Sampling curves generally rise quickly at first and then level off toward an asymptote as fewer new species are found per unit of individuals collected.

The rarefaction curve is derived from the protein taxonomic annotations and is subject to problems stemming from technical artifacts. These artifacts can be similar to the ones affecting amplicon sequencing (Reeder and Knight 2009), but the process of inferring species from protein similarities may introduce additional uncertainty.

The rarefaction view is available only for taxonomic data. The rarefaction curve of annotated species richness is a plot (see Figure 5.18) of the total number of distinct species annotations as a function of the number of sequences sampled. As shown in Figure 5.18, multiple data sets can be included.

The slope of the right-hand part of the curve is related to the fraction of sampled species that are rare. When the rarefaction curve is flat, more intensive sampling is likely to yield only a few additional species. The rarefaction curve is derived from the protein taxonomic annotations and is subject to problems stemming from technical artifacts. These artifacts can be similar to the ones affecting amplicon sequencing (Reeder and Knight 2009), but the process of inferring species from protein similarities may introduce additional uncertainty.

Sampling curves generally rise very quickly at first and then level off toward an asymptote as fewer new species are found per unit of individuals collected. These rarefaction curves are calculated from the table of species abundance. The curves represent the average number of different species annotations for subsamples of the complete dataset.

If the rarefaction curve is still rising at its end, the coverage is not sufficient to adequately represent the real microbial diversity of the sample. By contrast, if the curve approaches a horizontal asymptote, a good estimate of diversity was obtained.

Note that the results of the rarefaction technique give only an indication of the coverage and are not conclusive: rare OTUs that are present in the sample may not yet have been observed even if the curve shows an asymptotic trend.
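One simple way to put a number on "still rising at its end" is the slope of the last part of the curve, that is, new OTUs gained per additional sequence. The sketch below is only a heuristic under stated assumptions: the curve is a list of (sequences sampled, OTUs observed) points with strictly increasing sequence counts, and the function name, the tail_fraction parameter, and the example points are all hypothetical. It does not settle whether a sample is adequately covered; it just reports the slope for the reader to judge.

```python
def terminal_slope(curve, tail_fraction=0.1):
    """Slope of the curve over roughly its last tail_fraction of the x range.

    curve: list of (sequences_sampled, otus_observed), x strictly increasing.
    Returns OTUs gained per additional sequence near the end of the curve.
    """
    if len(curve) < 2:
        raise ValueError("need at least two points")
    x_first = curve[0][0]
    x_end, y_end = curve[-1]
    x_cut = x_end - tail_fraction * (x_end - x_first)
    # First point at or after the cut, excluding the final point itself;
    # fall back to the second-to-last point if the tail contains no other point.
    x0, y0 = next((p for p in curve[:-1] if p[0] >= x_cut), curve[-2])
    return (y_end - y0) / (x_end - x0)


if __name__ == "__main__":
    # Illustrative points only, not real data.
    example = [(0, 0), (100, 50), (200, 80), (300, 95), (400, 100), (500, 100)]
    print(terminal_slope(example, tail_fraction=0.5))
```

A slope close to zero is consistent with a plateau, but as noted above it does not rule out undetected rare OTUs.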

An accumulation or diversity curve (Figure 9) plots the cumulative number of distinct OTUs discovered as a function of the number of samples examined. That is, it shows the minimum, average, and maximum number of OTUs observed when looking at 1, 2, ... N samples of the current dataset.
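The minimum, average, and maximum values of such a curve can be approximated by shuffling the sample order many times and counting the distinct OTUs seen after each added sample. This is a minimal Python sketch, assuming each sample is given as a set of OTU identifiers; the function name, the number of permutations, the fixed seed, and the example samples are illustrative choices, not taken from any particular tool.

```python
import random
from statistics import mean


def accumulation_curve(samples, permutations=100, seed=0):
    """Return (k, min, mean, max) of cumulative OTU counts for k = 1..N samples.

    samples: list of sets of OTU identifiers, one set per sample (hypothetical).
    """
    if permutations < 1:
        raise ValueError("permutations must be at least 1")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(samples)
    richness = [[] for _ in range(n)]  # richness[k-1]: values after k samples
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen |= set(samples[idx])
            richness[k].append(len(seen))
    return [(k + 1, min(v), mean(v), max(v)) for k, v in enumerate(richness)]


if __name__ == "__main__":
    # Illustrative samples only, not real data.
    example = [{"otu1", "otu2"}, {"otu2", "otu3"}, {"otu3", "otu4"}]
    for row in accumulation_curve(example):
        print(row)
```

The rows can be pasted into Excel as four columns and plotted as three lines against the number of samples.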

Neither alpha nor beta diversity analysis detected any differences between the microbial communities of the two groups. Alpha and beta diversity values are reported in Additional File 1. The rarefaction curves reached a plateau, indicating good representation of the microbial community. The rarefaction curve plots are reported in Additional File 2.