Vista Exercise Book

Helen Drewski

Aug 3, 2024, 6:02:08 PM

Find the maximum percent conservation identity for which all of the exons on the LDL Receptor gene are conserved between Human (March 2006 assembly), Mouse and Dog. Retrieve the coordinates of the conserved regions.

Hint: the RefSeq name of the LDL Receptor gene is LDLR. Right-click on the curves to change parameters. You can select each curve and use the (Details) button to open the Text Browser and get detailed information regarding the alignment.

Identify the human coordinates of the non-coding regions in the HOXA3 gene that are conserved between Human (March 2006 assembly) and Chicken. Find the coordinates of the chicken genomic interval that aligns to human HOXA3.

Please remember that rVISTA has a 20Kb limit on the length of aligned sequences. If the sequence you want to analyze is larger than 20Kb, zoom in on the sequence until you have an interval smaller than 20Kb.
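A quick way to check an interval against this limit before submitting is sketched below. It assumes 1-based, inclusive coordinates and takes "20Kb" to mean 20,000 bp; the function names and the example coordinates are made up for illustration.

```python
RVISTA_LIMIT_BP = 20_000  # assumption: "20Kb" read as 20,000 base pairs


def interval_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def fits_rvista(start: int, end: int, limit: int = RVISTA_LIMIT_BP) -> bool:
    """True if the interval is strictly smaller than the limit."""
    return interval_length(start, end) < limit


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the exercise.
    print(fits_rvista(1_000_001, 1_015_000))  # 15,000 bp interval
    print(fits_rvista(1_000_001, 1_030_000))  # 30,000 bp interval
```

If the check fails, zoom in further in the browser and test the new coordinates again.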

Submit the human-chicken HOXA3 alignment containing the HOXA3 5' UTR to rVISTA, and search for HOXA4 transcription factor binding sites. Find how many clusters of at least 3 conserved HOXA4 binding sites within a 100 bp window are present in this human-chicken alignment, and note their approximate location.

Whole Genome rVISTA is designed to aid the analysis of gene expression studies by scanning the regulatory regions of genes exhibiting similar expression patterns. In the current implementation, a gene's regulatory region is defined as the sequence upstream of the transcription start site, up to 5 kb.
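To make the "up to 5 kb upstream" definition concrete, here is a minimal sketch of how such a region could be derived from a transcription start site and a strand. It is only an illustration under stated assumptions (1-based inclusive coordinates, a hypothetical `upstream_region` helper), not a description of how Whole Genome rVISTA computes it internally.

```python
from typing import Tuple


def upstream_region(tss: int, strand: str, length: int = 5000) -> Tuple[int, int]:
    """Return (start, end), 1-based inclusive, of the region upstream of the TSS.

    On the "+" strand, upstream means smaller coordinates; on the "-" strand,
    upstream means larger coordinates. The chromosome end is not checked here.
    """
    if strand == "+":
        return max(1, tss - length), tss - 1
    if strand == "-":
        return tss + 1, tss + length
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Hypothetical TSS positions, chosen only to show the strand handling.
    print(upstream_region(10_000, "+"))
    print(upstream_region(10_000, "-"))
```

The strand matters: the same TSS coordinate gives a different upstream window depending on the direction of transcription.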

The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. This is a continually growing resource that has tested 1091 noncoding sequences in transgenic mice as of August 2009.

How many of them are conserved in human/mouse/rat at 100% identity over 200 basepairs or more ("ULTRA" conservation criterion)? View the experimental data for a conserved region with a positive enhancer result.
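The "ULTRA" criterion can be pictured as a scan over a multiple alignment for stretches of identical, ungapped columns. The sketch below assumes the human, mouse and rat sequences are already aligned to equal length with "-" as the gap character; `ultra_runs` is a hypothetical helper, not part of the Enhancer Browser.

```python
from typing import List, Tuple


def ultra_runs(aligned: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) column ranges in which every
    sequence carries the same non-gap character for at least min_len columns."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    width = len(aligned[0])
    runs = []
    run_start = None
    # Iterate one column past the end so a run touching the end is closed.
    for col in range(width + 1):
        identical = False
        if col < width:
            column = {s[col].upper() for s in aligned}
            identical = len(column) == 1 and "-" not in column
        if identical:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    return runs


if __name__ == "__main__":
    # Toy alignment with a short threshold so the output is easy to follow.
    toy = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
    print(ultra_runs(toy, min_len=5))
```

With real data the default of 200 columns would be used; the toy threshold is only there to keep the example readable.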

Phylogenetic Shadowing is a strategy for the comparative analysis of multiple closely related species such as primates. In this exercise, we will use mVISTA to generate a multiple alignment of the sequence of 6 primate species and RankVISTA to quantitatively predict conserved regions across all species. We will also use the UCSC Browser and GenBank to retrieve the sequences required for this analysis, and gVISTA to generate the annotation file for the human sequence.

Perform Phylogenetic Shadowing analysis of the alpha-globin cluster regulatory region using the following sequences (all these sequences are available for download at VISTA_WashU.shtml):

Go to . Click on the "Browser" link located in the light blue line at the top of the page. Make sure "Human March 2006" is selected in the base genome box, and enter "LDLR" in the position box (note that you can only enter a RefSeq gene name or a chromosome coordinate in the position box). A new window will open with several matches to this gene name. Inspect the list to find the LDL Receptor Gene: in this case it is the first match. Click on it to load the human/mouse comparison in Vista Browser.
Note: downloading the applet may take a while, so be patient. If you experience any difficulties, ask one of the lab assistants to help you.

VISTA browser loads the human/mouse comparison by default. Identify the strand on which LDLR is transcribed, the coding exons and UTRs (they are marked on the annotation track above the curve, and colored according to the color legend in the lower left-hand corner). Are all the exons and UTRs conserved? No, 2 coding exons and 1 UTR are not conserved.

Try adding a second species evolutionarily closer to human, such as dog, to the alignment to improve the exon prediction. Select "Dog" from the second drop-down menu on the left ("select/add") to add the Human-Dog alignment. Accept the default values in the pop-up menu for the display parameters for now. These values can be changed at any time by accessing the (Curve Parameters) button.
Are all the exons (coding/UTR) predicted by the human/dog comparison? Yes.

Try adjusting the parameters of the human/mouse comparison to emulate the human/dog comparison by requiring a lower amount of conservation for a region to be considered conserved. To do this, click on the curve you want to modify and select the (Curve Parameters) button from the top menu. Alternatively, you can access "Curve Parameters" by right-clicking on the curve you want to adjust and selecting "Parameters" in the pop-up menu. A description of the parameters is available from the "Help" pages at : navigate to 5.3, "Changing Curve Parameters". In this case, you will want to try lowering the "Cons Identity". Experiment with parameter values until all the exons are marked as conserved. Lowering the conservation to 57% identifies all coding exons/UTR as conserved in the human/mouse comparison.
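The effect of lowering "Cons Identity" can be illustrated with a small percent-identity calculation on an aligned segment. This is a rough sketch only: the browser's own calculation is not reproduced here, the function names are invented, and the threshold of 70 is used purely as an illustrative starting value.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Both strings must be aligned (equal length); '-' marks a gap and never
    counts as a match.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def is_conserved(a: str, b: str, cons_identity: float) -> bool:
    """True if the segment reaches the chosen identity threshold."""
    return percent_identity(a, b) >= cons_identity


if __name__ == "__main__":
    # Toy aligned segment with 6 matching columns out of 10.
    human = "ACGTACGTAC"
    mouse = "ACGTTTTAAC"
    print(percent_identity(human, mouse))
    print(is_conserved(human, mouse, 70))  # illustrative stricter threshold
    print(is_conserved(human, mouse, 57))  # the value reached in this exercise
```

A segment that fails a stricter threshold can pass once the required identity is lowered, which is what happens to the remaining LDLR exons in the human/mouse curve.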

When looking at a highly conserved gene such as this one, it is useful to gain some evolutionary distance in order to identify the most strongly conserved regions. Add the chicken alignment to the display (use the second drop-down menu on the left, or the button from the top menu and then click on the "Track" drop-down menu). Identify regions that are highly conserved in all three species (human, mouse and chicken).
You will notice that some of the highly conserved sequences are non-coding (pink-colored). Those areas might seem like good candidates for further analysis.

Click on the second (human-chicken) curve to select it. Now click on the button ("alignment details") in the toolbar at the top of the screen. A new browser window, called "Text Browser", will open with detailed information regarding the segment of the human-chicken alignment you were looking at.

In this window, you can see detailed information about the aligned regions, including their genomic coordinates. The coordinates of the Chicken region that aligned to human can be found in the second column. A detailed description of all the options available from the Text Browser can be found in the Help pages (see link at 1.4).

To retrieve the coordinates of regions conserved between human and chicken, click on the "Get CNS: human-chicken" link or on the "CNS: human-chicken" link found in the Alignment column. The legend for this table is in the top line. The coordinates of conserved non-coding sequences are those marked as "non-coding". Note that clicking on the links on this page will give you the sequences of the conserved regions, with retrieval options that facilitate the design of PCR primers for further studying these sequences.

Click on the rVISTA link in the "Alignment" column. Enter your email as prompted. You have now started the rVISTA submission process. The default values filled in on the next screen are sufficient for our purposes; however, if you wish to learn about these options, a description is available at

Click on "Submit" to go to a list of possible Transcription Factor Binding sites (matrices) for which to check. There are a large number of matrices here; find the box labeled "HOXA4" and check it. Note that the program becomes slower the more Transcription Factor Binding sites you select. Click on "Submit". Within a few moments you should get an email with a link to a web page that contains your results.

The various visualization options are described at . Check the "conserved," "aligned," and "all" boxes in the "Binding sites to visualize" column. To identify conserved HOXA4 binding sites occurring in clusters of 3 or more, in the "Clustering" area select "Individual Clustering": sites=3, base pairs=100 (note that "Group Clustering" is not applicable in this case as we have only searched for 1 transcription factor). Click on "Submit" to look at the predicted transcription factor binding sites (shown as tick marks above a regular Vista curve). The conserved predicted sites are shown in green. Only conserved predicted sites occurring in clusters of 3 or more in a 100 bp window are shown.
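The "sites=3, base pairs=100" setting can be thought of as a sliding-window filter over site positions. The sketch below is one plausible reading of that rule, not rVISTA's actual implementation: a site is kept if it belongs to at least one group of `min_sites` sites whose positions span fewer than `window` bp. The function name and the positions are hypothetical.

```python
from typing import List


def clustered_sites(positions: List[int], min_sites: int = 3, window: int = 100) -> List[int]:
    """Return the sites that fall in some window of `window` bp holding at
    least `min_sites` sites (two-pointer scan over sorted positions)."""
    pos = sorted(positions)
    keep = set()
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until pos[left]..pos[right] fits in the window.
        while pos[right] - pos[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            keep.update(range(left, right + 1))
    return [pos[i] for i in sorted(keep)]


if __name__ == "__main__":
    # Hypothetical site positions along an alignment.
    sites = [10, 50, 90, 400, 450, 700, 720, 760, 799]
    print(clustered_sites(sites, min_sites=3, window=100))
    print(clustered_sites(sites, min_sites=1, window=100))  # no clustering requirement
```

Setting `min_sites=1` keeps every site, which mirrors removing the clustering requirement in the visualization options.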

To appreciate the differences between the various visualization options ("conserved," "aligned," and "all"), remove the clustering requirement by selecting "Individual Clustering": sites=1, base pairs=100 in the Visualization Options area at the bottom of the page and resubmit. Inspect the new plot and note how moving from "all" to "aligned" to "conserved" reduces the number of site predictions.

The top table in the "Results" page shows all transcription factors that are overrepresented in the 5000 bp upstream of all the 16 genes submitted for this exercise at a p-value cutoff of 0.005. Note that not all the 16 genes need to have binding sites for a given transcription factor. The bottom table shows the transcription factors overrepresented upstream of each gene individually.

To obtain a multiple sequence alignment for your sequences, submit your 6 sequences (the human and 5 non-human primate sequences) to the mVISTA server. Sequences can be uploaded in FASTA format from a local computer using the "Browse" button or, if available in GenBank, they can be retrieved by entering the corresponding GenBank accession number in the "GENBANK identifier" field. Enter the human sequence as sequence #1.
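Before uploading, it can help to confirm that each file really is valid FASTA and contains the sequence you expect. The following is a minimal sketch of a FASTA reader for that purpose; the `parse_fasta` helper and the toy records are illustrative assumptions, and the mVISTA server performs its own validation.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Toy records; real inputs would be the six primate sequences.
    example = ">human\nACGT\nACGT\n>chimp\nACGA\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```

Checking the record names and lengths first makes it easier to be sure the human sequence goes in as sequence #1.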

Three genomic alignment programs are available in mVISTA. "LAGAN" is the only program that produces multiple alignments of finished sequences, and is the most appropriate choice for phylogenetic shadowing. Note that if some of the sequences are not ordered and oriented in a single sequence, your query will be redirected to AVID to obtain multiple pairwise alignments. "AVID" and "Shuffle-LAGAN" are not appropriate genomic aligners for phylogenetic shadowing as they produce only all-against-all pairwise alignments.

Clicking on the link found in the body of the email takes you to the results page. It lists every organism you submitted, and provides you with three viewing options using each organism as base. These three options are: