16S rRNA for Bacterial Identification

Jeannine Lander

Aug 5, 2024, 12:33:59 PM

to bagtinecse

16S and Internal Transcribed Spacer (ITS) ribosomal RNA (rRNA) sequencing are common amplicon sequencing methods used to identify and compare the bacteria or fungi present within a given sample. Next-generation sequencing (NGS)-based ITS and 16S rRNA gene sequencing are well-established methods for comparing sample phylogeny and taxonomy from complex microbiomes or environments that are difficult or impossible to study.

The prokaryotic 16S rRNA gene is approximately 1500 bp long, with nine variable regions interspersed between conserved regions. Variable regions of the 16S rRNA gene are frequently used for phylogenetic classification of genus or species in diverse microbial populations [1]. The ITS1 region of the rRNA cistron is a commonly used DNA marker for identifying fungal species in metagenomic samples [2].

A key benefit of 16S and ITS ribosomal RNA NGS methods is that they provide a cost-effective technique to identify strains that may not be found using traditional methods. Unlike capillary sequencing or PCR-based approaches, next-generation sequencing is a culture-free method that enables analysis of the entire microbial community within a sample.

16S rRNA NGS allows microbiologists to achieve genus-level sensitivity for metagenomic surveys of bacterial populations. ITS analysis with NGS enables rapid fungal identification to help advance our understanding of the mycobiome. Furthermore, NGS offers the ability to combine multiple samples in a sequencing run.

Illumina offers products to support NGS-based 16S and ITS rRNA analysis studies, from library preparation to data analysis and interpretation. Our user-friendly workflow can help take the guesswork out of your experiments.

Shotgun metagenomic sequencing involves comprehensively sampling all genes in all organisms present in a given complex sample. It allows microbiologists to evaluate bacterial diversity and detect the abundance of microbes in various environments.

Environmental DNA (eDNA) sequencing is an emerging method for studying biodiversity and monitoring ecosystem changes. For some sample types, combining 16S or ITS sequencing with other approaches can help uncover the full breadth of diversity in an ecological sample.

16S rRNA sequencing is a culture-free method to identify and compare bacterial diversity from complex microbiomes or environments that are difficult to study. It is commonly used to identify bacteria present within a given sample down to the genus and/or species level. Specifically, it is an amplicon-based sequencing method that targets the 16S rRNA gene, a genetic marker for bacteria, using a single amplicon from that one gene.

Because the 16S rRNA sequence is ubiquitous in bacteria and archaea, it can be used to identify a wide diversity of microbes within a single sample and a single workflow. Through 16S rRNA sequencing, one can identify the taxa present in a sample. This leads to a greater understanding of our microbial communities and their interactions with us.

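Both the ribosome and its subunits are characterized by their sedimentation coefficients, expressed in Svedberg units (symbol: S). The Svedberg measures how fast a particle sediments under centrifugation (1 S = 10^-13 seconds); it is not the time a particle takes to settle. In this case, 16S means the rRNA molecule has a sedimentation coefficient of 16 Svedberg units. It is the RNA component of the small (30S) subunit of the prokaryotic ribosome.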

16S DNA (often written 16S rDNA) refers to the gene in the bacterial genome that codes for the 16S rRNA; 16S rRNA is the RNA transcribed from that gene. The Illumina 16S Metagenomic Sequencing Library Preparation protocol uses DNA as input, and its PCR primers target the V3 and V4 variable regions of the 16S gene for the amplicon PCR.
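The idea of primers bracketing a variable region can be illustrated with a small in-silico PCR sketch. This is only a toy model: the primer and template sequences below are made-up placeholders (not the protocol's actual primers), matching is exact with no mismatches allowed, and the function names are hypothetical.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Translate a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping degenerate codes valid."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_amplicon(template, forward, reverse):
    """Return the sequence between the two primer sites, or None.

    The reverse primer anneals to the opposite strand, so its binding
    site on the template strand is its reverse complement.
    """
    template = template.upper()
    fwd = re.search(primer_to_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Toy example with placeholder sequences (not real 16S primers).
toy_template = "GGACGTAGGCATCATCATTGGTAACC"
print(extract_amplicon(toy_template, "ACGTNGG", "TTWCCA"))
```

Real primer-trimming tools also tolerate mismatches and handle reads in either orientation, which this sketch deliberately leaves out.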

The 16S Demonstrated Protocol provides an option for creating Illumina-compatible libraries from the target of your choice. Fungi and other organisms do not have 16S rRNA genes; however, they have other conserved regions, such as the 18S and ITS regions. Any amplicon can be used for similar diversity analysis studies.

At Illumina, our goal is to apply innovative technologies to the analysis of genetic variation and function, making studies possible that were not even imaginable just a few years ago. It is mission critical for us to deliver innovative, flexible, and scalable solutions to meet the needs of our customers. As a global company that places high value on collaborative interactions, rapid delivery of solutions, and providing the highest level of quality, we strive to meet this challenge. Illumina innovative sequencing and array technologies are fueling groundbreaking advancements in life science research, translational and consumer genomics, and molecular diagnostics.

16S sequences have also been exploited using low-throughput methods to distinguish strains (sometimes called subspecies) based on polymorphisms within the gene. Single-nucleotide polymorphisms (SNPs) have been used to track strains of clinical relevance or, when they are stably linked to other parts of the bacterial haplotype, to predict phenotypic characteristics [2]. Thus, accurate and complete 16S sequences are of high utility in many applications. Until recently, however, accurate, full-length 16S sequences have been beyond the scope of high-throughput sequencing platforms.

The availability of third-generation technologies means that high-throughput sequencing of the full 16S gene is becoming commonplace. Circular consensus sequencing (CCS) [3,4], combined with sophisticated denoising algorithms [5,6,7,8] that remove PCR and sequencing error, means it is now possible to discriminate between millions of sequence reads that differ by as little as one nucleotide across the entire gene. Together, these technological and methodological advances mean that, for the first time, it is becoming possible to exploit the full discriminatory potential of 16S in a high-throughput manner.

Here, we demonstrate that, in the face of such changes, historical assumptions need to be revisited. Using an in-silico dataset of sequences taken from public databases we show that commonly targeted 16S sub-regions, such as V4, are unable to match the taxonomic accuracy achieved when sequencing the full 16S gene. Using long-read sequencing of mock and in-vivo communities, we demonstrate that it is possible to accurately resolve the divergent copies of the 16S gene that exist within the same genome. Finally, we demonstrate that such intragenomic 16S gene copy variants are highly prevalent in taxa isolated from the human gut microbiome, suggesting they may be used to improve discrimination between species and even strains in 16S gene-based microbiome studies.

Fig. 1: In-silico comparison of 16S rRNA variable regions. (a) Shannon entropy across the 16S gene, based on the alignment of a single representative sequence for each known species present in the Greengenes database. Sequences were aligned against a single reference 16S gene for Escherichia coli K-12 MG1655 (NCBI Gene ID 947777). Gray panels depict variable regions defined by commonly used primer-binding sites (Supplementary Table 1). Variable regions considered in this study are shown as red lines (bottom). (b) Proportion of sequences for each variable region that could not be identified to species level when classifying each sequence against the reference database from which it was derived, at a confidence threshold of 80% (RDP classifier). (c) Trees based on the taxonomy of sequences present in the in-silico database. The same tree is provided for each variable region. The color of each branch reflects the proportion of sequences within each clade that could not be identified to species level. (d) The number of OTUs created when clustering sequences for each variable region at 99% sequence similarity. The dashed line indicates the number of unique sequences (>1% different) in the original database. Source data are provided as a Source Data file.
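The per-position Shannon entropy described for panel a can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the sequences are already aligned to equal length, and ignoring gap characters ("-") in each column is an assumption made here for simplicity.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    counts = Counter(base for base in column if base != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column made only of gaps
    # H = sum over bases of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(aligned_sequences):
    """Return one entropy value per column of an aligned set of sequences."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*aligned_sequences)]


# Toy alignment: columns 0-1 are conserved, columns 2-3 are variable.
toy_alignment = ["AAAC", "AATG", "AATT", "AAAA"]
print(alignment_entropy(toy_alignment))
```

Conserved positions score 0 bits, and a position where all four bases are equally frequent scores the maximum of 2 bits, which is why variable regions stand out as peaks in such a plot.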

We found that sub-regions differed substantially in the extent to which they could confidently discriminate between the full-length 16S sequences used to represent species (Fig. 1b). The V4 region performed worst, with 56% of in-silico amplicons failing to confidently match their sequence of origin at this taxonomic level. By contrast, when a full-length sequence with all variable regions was used, it was possible to classify nearly all sequences as the correct species (Supplementary Fig. 1a). Altering databases and classification confidence thresholds affected the proportion of in-silico amplicons that could be accurately matched, but did not influence prevailing trends (Supplementary Fig. 1a, b).

Finally, the choice of sub-region dramatically affected the number of OTUs formed when clustering in-silico amplicons. When clustering at 99% sequence identity, all sub-regions failed to recreate the number of distinct sequences present in the original database; however, the V4 region again performed worst (Fig. 1d). Notably, the relative number of OTUs produced by each sub-region was not consistent at different identity thresholds (97%, 98%, 99%; Supplementary Fig. 3), indicating that the behavior of clustering algorithms may be difficult to predict when the amount of information contained within a sequenced region is highly variable.
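To see how the identity threshold changes the OTU count, here is a rough sketch of greedy centroid clustering. It is a simplification and not the method used in the study: it assumes equal-length, pre-aligned sequences and uses position-by-position identity, whereas real clustering tools compute alignment-based identity. The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(sequences, threshold):
    """Assign each sequence to the first centroid it matches at or above
    the threshold; otherwise it becomes the centroid of a new OTU."""
    centroids = []
    otus = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[index].append(seq)
                break
        else:
            centroids.append(seq)
            otus.append([seq])
    return otus


# Toy data: 100-bp sequences with 0, 1, and 3 differences from the first.
base = "A" * 100
one_diff = "A" * 99 + "C"
three_diff = "A" * 97 + "CCC"
for threshold in (0.97, 0.99, 1.0):
    n_otus = len(greedy_otus([base, one_diff, three_diff], threshold))
    print(threshold, n_otus)
```

The same three sequences collapse into one, two, or three OTUs depending on the threshold, and the result also depends on input order because the first sequence seen becomes the centroid.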

Clustering of 16S sequences into OTUs has historically served two purposes. First, it has removed minor artifactual sequence variants due to PCR amplification and sequencing errors when collapsing sequences into groups. Second, it has collapsed legitimate sequence variants that exist between closely related bacterial taxa. Although the latter may not always be desirable, it stands to reason that you cannot distinguish between bacterial taxa whose 16S sequences vary at a rate that is lower than the error encountered on a particular sequencing platform.
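A quick back-of-the-envelope calculation shows why the platform error rate limits resolution. The sketch below assumes independent per-base errors, and the two error rates are illustrative values chosen for the example, not figures from the study; the 1500 bp length is the approximate 16S gene length.

```python
def expected_errors(length, per_base_error):
    """Expected number of erroneous bases in a read of the given length."""
    return length * per_base_error


def error_free_fraction(length, per_base_error):
    """Probability that a read contains no errors at all, assuming
    errors at each base are independent."""
    return (1.0 - per_base_error) ** length


GENE_LENGTH = 1500  # approximate length of the 16S rRNA gene in bp

# Illustrative (assumed) per-base error rates: 1% and 0.01%.
for rate in (0.01, 0.0001):
    print(
        f"error rate {rate}: "
        f"{expected_errors(GENE_LENGTH, rate):.2f} expected errors per read, "
        f"{error_free_fraction(GENE_LENGTH, rate):.3g} of reads error-free"
    )
```

When the expected number of errors per read exceeds the number of true differences between two taxa, the real variants are swamped by noise, which is the situation that clustering into OTUs was meant to paper over.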

Recently, advances in CCS have dramatically improved error rates of long-read sequencing platforms. At the same time, computational methods have made it possible to distinguish between legitimate vs. artifactual sequence variation. These technological and methodological advances mean researchers now have the potential to perform high-throughput sequencing that can accurately detect single-nucleotide variants across the entire 16S gene.