Technological improvements have resulted in increased discovery of new microRNAs (miRNAs) and refinement and enrichment of existing miRNA families. miRNA families are important because they suggest a common sequence or structure configuration in sets of genes that hints at a shared function. Exploratory tools to enhance investigation of the characteristics of miRNA families and the functions of family-specific miRNA genes are lacking. We have developed miRNAVISA, a user-friendly web-based tool that allows customized interrogation and comparison of miRNA families for hypothesis generation, and comparison of the per-species chromosomal distribution of miRNA genes in different families. This study illustrates hypothesis generation using miRNAVISA in seven species. Our results unveil a subclass of miRNAs that may be regulated by genomic imprinting and also suggest that some miRNA families may be species-specific, as well as chromosome- and/or strand-specific.
microRNA (miRNA) are a class of evolutionarily conserved endogenous non-coding RNAs that may act as gene regulators in both plants and animals1,2,3,4,5. Mature miRNA are excised from longer single-stranded precursor (pre-miRNA) transcripts that fold into hairpin-like RNA secondary structures6,7. Each pre-miRNA originates from a cistronic or polycistronic transcript (pri-miRNA) containing several hairpins that are thought to collaborate in providing a common function8. Without loss of generality, a pre-miRNA is considered as an independent miRNA gene9,10,11,12. The latest miRBase13,14 release 19 (R19) contains 21,264 experimentally validated miRNA genes (green bars in Fig. 1a) expressing 25,141 mature miRNA (red bars in Fig. 1a) in 193 species (Fig. 1c). About 50% of the species are from the animal kingdom whereas plant, viral and protist (chromalveolata and mycetozoa) kingdoms approximately represent 35%, 26% and 3% of the database entries, respectively. One miRNA gene can yield more than one mature miRNA.
Figure 1. (a) The green, red and blue bars show the number of miRNA genes (hairpins), classified miRNA genes and mature miRNA in each miRBase release, respectively. One miRNA gene can yield more than one mature miRNA and therefore the total number of the latter can exceed that of the former. The increase in number of validated mature miRNA and their genes is mainly a consequence of improved high throughput sequencing. (b) Increasing number of miRNA families with time. (c) Number of species where gene regulation by miRNA has been reported in different miRBase releases; also see Supplementary Fig. S1 online. The increase in number of miRNA families and coverage of species has benefited from increased miRNA gene numbers, community annotation and improved computational algorithms12,15,19.
As the number of reported mature miRNA and their genes (Fig. 1a), miRNA families (Fig. 1b) and species covered (Fig. 1c) continues to grow almost exponentially, attention is now shifting to elucidating the function of these miRNAs and their influence on biochemical pathways and diseases. However, tools for exploratory analysis of the high-dimensional categorical data that characterize miRNA family categories are still lacking, which limits hypothesis-driven investigation of their constituent genes, their functional roles and general properties. This study attempts to address this issue by providing a tool that can be used to evaluate the following questions. What attributes and characteristics are encoded in miRNA families? Which characteristics can be used to examine potential inter-miRNA family relationships and/or define clan(s) of miRNA families? How can the diversity of miRNA families across species, lineages and/or kingdoms be interrogated? How can the genomic distribution of family-annotated miRNA genes be summarized and compared in different genomes? What information can be inferred when jointly interrogating the genomic distribution of miRNA genes, their organization (spatial co-location on specific chromosomes) and the characteristics of miRNA families? Do family-specific miRNA genes exist as clusters? If so, what are the general functions of clustered miRNA genes belonging to different miRNA families, and what are the links between family-specific mature miRNAs and biochemical pathways and/or diseases?
Insights into intra- and inter-miRNA-family relationships are still scarce. At the gene level, several machine learning algorithms exist to find new members of miRNA families based on sequence and/or structure conservation, and to find clustered genes based on intergenic distance6,8,16,17,18,19. However, there is a need to jointly interrogate and integrate information about the genomic distribution of miRNA genes and the sequence and/or structure organization implied by miRNA family categories. Such interrogation is useful to better understand both the properties and function of family-annotated miRNA genes and to establish the characteristics that may define inter-miRNA family relationships.
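To make the idea of clustering genes by intergenic distance concrete, the following is a minimal sketch, not the method of any of the cited algorithms. It groups genes on the same chromosome whenever the gap between consecutive genes is at most a threshold. The `Gene` record, the function name `cluster_by_distance`, the 10,000-nucleotide default and the toy coordinates are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, NamedTuple


class Gene(NamedTuple):
    # Hypothetical record layout; real annotations carry more fields.
    name: str
    chrom: str
    start: int
    end: int


def cluster_by_distance(genes: List[Gene], max_gap: int = 10_000) -> List[List[Gene]]:
    """Group genes on the same chromosome into clusters.

    Two consecutive genes (sorted by start) share a cluster when the gap
    between the furthest end seen so far and the next start is <= max_gap.
    The default max_gap is an illustrative assumption.
    """
    clusters: List[List[Gene]] = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for _, group in groupby(ordered, key=lambda g: g.chrom):
        current: List[Gene] = []
        prev_end = 0
        for gene in group:
            if current and gene.start - prev_end > max_gap:
                clusters.append(current)  # gap too large: close the cluster
                current = []
            current.append(gene)
            # Track the furthest end within the open cluster.
            prev_end = gene.end if len(current) == 1 else max(prev_end, gene.end)
        clusters.append(current)
    return clusters


# Toy coordinates invented for illustration (not real annotations).
genes = [
    Gene("mir-a", "chr1", 1_000, 1_080),
    Gene("mir-b", "chr1", 5_000, 5_075),
    Gene("mir-c", "chr1", 50_000, 50_090),
    Gene("mir-d", "chr2", 2_000, 2_070),
]
clusters = cluster_by_distance(genes)
cluster_names = [[g.name for g in c] for c in clusters]
```

With these toy inputs, `mir-a` and `mir-b` fall into one cluster, while `mir-c` and `mir-d` each form their own. Attaching a family label to each gene would then show whether a cluster mixes families.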
Results from sequence-based or intergenic-based clustering methods imply common ancestry that is defined by weak miRNA sequence similarity and/or localization on a single spatially unique genomic region6. Lu et al. (2008)20 found evidence that miRNA derived from clustered miRNA genes tend to have similar functional roles and disease associations. Clustered genomic arrangements can, however, involve miRNA genes from different miRNA families (see Supplementary Table S1 online). Nonetheless, given that structural evolution is thought to be slower than sequence evolution6, the curation of miRNA families based on structural clustering can suggest more general functional commonalities in sets of genes than those hinted at by sequence- or intergenic-based methods. That is, it can suggest the existence of miRNA family groupings (clans of families, i.e. sub-classes) that share general characteristics. Currently, no tools exist that can be used to explore the existence of such miRNA family sub-classes.
Only a handful of software tools exploit the information availed by miRNA family categories for predictive purposes or otherwise. For example, a recent study (Kamanu, T. K. K., PhD thesis, King Abdullah University of Science and Technology, 2012) has developed an miRNA gene discovery system based on miRNA family categories that enables species-independent prediction of unknown miRNA genes from arbitrary nucleotide sequences. Gerlach et al. (2009)12 and Ding et al. (2011)15 have proposed supervised models for predicting miRNA family membership given unlabeled or unclassified miRNA sequences. Our study aims to determine the genomic distribution of family-annotated miRNA genes in a given species. Knowledge about such species-specific miRNA gene distribution is important when modeling miRNA-regulation, especially for clustered miRNA, co-expressed miRNA genes and mirtrons21,22,23,24,25. miRNA gene distribution may influence the annotation of miRNA promoter regions which are yet to be fully understood26.
The aim of the miRNAVISA system is to provide a user-friendly interactive web interface to enable comparison and exploration of different miRNA families for the purpose of generating hypotheses about their properties. miRNAVISA allows inquiry into and comparison of the genomic distribution of family-annotated genes in a given species, as well as comparison between species. The miRNA gene distributions for a given family are comparable in closely related species and can be used to provide clues about miRNA function and roles in biochemical pathways. For instance, are miRNA and/or miRNA families chromosome-specific? The spatial location of miRNA genes has been implied to preferentially influence miRNA function in different chromosome-linked diseases such as Down's Syndrome27. Recently, human chromosome 19 miRNAs have been suggested to safeguard the integrity of fetal-maternal communication, and therefore their mimics offer themselves as candidates for treatment of autoimmune diseases28,29.
miRNAVISA can also be used to query the existence of sub-classes of miRNA genes by generating data-dependent relationships that may define such miRNA sub-classes. In contrast to other tools such as miRNAMap30, which is specific to metazoan genomes and does not interrogate miRNA families, miRNAVISA can be used to interrogate the diversity of miRNA families across species and kingdoms.
Here, we demonstrate the use of miRNAVISA for hypothesis generation about miRNA families, their genes and function in different species. Moreover, miRNAVISA can be used to infer new roles of miRNA genes in a given miRNA family.
miRNAVISA can enable a concise database summary given a set of sample families and species of interest. Figure 2 is a visualization of the cross-tabular distribution of family-annotated miRNA genes from miRBase R19 across 193 species and 1,543 miRNA families. The distribution of miRNA genes among 24 miRNA families in the seven sample species is shown against the remaining 186 species. The families were selected based on the family sizes and specific attributes of interest. For example, the let-7 miRNA family contains the founding members of the miRNA class and is among the largest families in the miRBase database; and the mir-515 and mir-548 miRNA families are known to be primate-specific28,29,31,32 (see also Figure 2).
The miRNA families in Figure 2 are ordered by decreasing family size. Column-wise normalization was done based on the family sizes marked in red. The total number of miRNA genes in the 24 selected miRNA families in each species is shown by the non-bracketed numerals on the right-most y-axis, while the total number of species-specific family-annotated miRNA genes is indicated in brackets. The difference between these numeric values represents genes in the remaining 1,519 miRNA families that are not included in our current analysis.
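The cross-tabulation and column-wise normalization described for Figure 2 can be sketched as follows. This is an illustrative reading of the description, not the miRNAVISA implementation. It assumes each record is a (species, family) pair for one miRNA gene and that a family's size is its gene count across all species in the table. The function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def cross_tabulate(records):
    """Count miRNA genes per species (row) and family (column)."""
    table = defaultdict(Counter)
    for species, family in records:
        table[species][family] += 1
    return table


def normalize_columns(table):
    """Divide each cell by its family's size (the column total)."""
    family_sizes = Counter()
    for row in table.values():
        family_sizes.update(row)
    normalized = {
        species: {fam: n / family_sizes[fam] for fam, n in row.items()}
        for species, row in table.items()
    }
    return normalized, family_sizes


def ordered_families(family_sizes):
    """Families by decreasing size; ties broken alphabetically."""
    return sorted(family_sizes, key=lambda fam: (-family_sizes[fam], fam))


# Toy records invented for illustration (not miRBase counts).
records = [
    ("hsa", "let-7"), ("hsa", "let-7"), ("mmu", "let-7"), ("dme", "let-7"),
    ("hsa", "mir-515"), ("hsa", "mir-515"), ("ptr", "mir-515"),
]
table = cross_tabulate(records)
normalized, sizes = normalize_columns(table)
columns = ordered_families(sizes)
# Row totals over the selected families (the non-bracketed numerals).
row_totals = {species: sum(row.values()) for species, row in table.items()}
```

After normalization each column sums to one, so a cell reads as the fraction of a family's genes found in that species. This makes families of very different sizes comparable in a single heat map.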