How to obtain protein sequences for selected species from the UCSC Vertebrate Multiz Alignment & Conservation (100 Species) track?

Rini Pauly

Apr 17, 2015, 3:17:38 PM
to gen...@soe.ucsc.edu

How do I obtain protein sequences from the UCSC Vertebrate Multiz Alignment & Conservation (100 Species) track for only the 5 species I am interested in, for example at locus chr17:7578328-7578477?

I have tried every approach I could think of, without success. I have also attached an image of the region I need.

Any help will be greatly appreciated.

~Thanks,

Rini

hgt_genome_563c_15b8a0.png

Jonathan Casper

Apr 17, 2015, 5:30:43 PM
to Rini Pauly, gen...@soe.ucsc.edu

Hello Rini,

Thank you for your question about obtaining data from protein-coding regions of the 100-way multiple alignment for your species of interest. You can do this using the UCSC Table Browser. Here is an example of how to do this for genes from the UCSC Genes track, but you can also select other gene tracks during step 2 below.

1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables.
2. Select the following options:

Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Predictions
Track: UCSC Genes
Table: knownGene
Region: Position, and enter chr17:7578328-7578477 into the position box
Output format: CDS FASTA alignment from multiple alignment

3. Click the "Filters: create" button to go to the filters page.
4. In the "Free-form query" box for hg19.knownGene, add the text "cdsStart != cdsEnd". This will limit your results to only protein-coding genes.
5. Click "submit" to return to the main Table Browser page.
6. Click "Get output".
7. On the next page, select the MAF table "multiz100way", adjust the formatting options as desired for your output (e.g., nucleotide output instead of amino acids), and check the selection boxes for the species whose aligned sequence you want to obtain.
8. Click "Get output".

The result will be a list of protein sequences from your chosen species as aligned in the 100-way multiple alignment.
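The species checkboxes in step 7 are the supported way to restrict the output to your species of interest. If you already have a download that contains more species than you need, a small post-processing script can trim it. The Python below is a minimal sketch that assumes each FASTA header begins with a token shaped like `{transcript}_{assembly}_{exon}_{exonCount}` (for example `tx1.1_hg19_1_2`), with species identified by assembly names such as hg19. That header layout is an assumption to check against your actual output, and the function names, species set and toy sequences are hypothetical.

```python
"""Minimal sketch: keep only chosen species from CDS FASTA alignment output.

ASSUMPTION (verify against a real download): each header starts with
    >{transcript}_{assembly}_{exon}_{exonCount} ...
All names and data here are illustrative, not UCSC code.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Return (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def assembly_of(header: str) -> str:
    """Extract the assembly name (e.g. 'hg19') from the first header token."""
    tokens = header.split()
    if not tokens:
        return ""
    # Split from the right so underscores inside the transcript name
    # (e.g. 'NM_000546') are left intact.
    parts = tokens[0].rsplit("_", 3)
    return parts[1] if len(parts) == 4 else ""


def keep_species(records: List[Record], wanted: Set[str]) -> Dict[str, List[Record]]:
    """Group records by assembly, keeping only assemblies listed in `wanted`."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for header, seq in records:
        db = assembly_of(header)
        if db in wanted:
            grouped[db].append((header, seq))
    return dict(grouped)


if __name__ == "__main__":
    # Toy input: headers follow the assumed layout; sequences are made up.
    toy = [">tx1.1_hg19_1_2 extra", "MKV", "LA",
           ">tx1.1_mm10_1_2", "MKI",
           ">tx1.1_panTro4_1_2", "MKV"]
    wanted = {"hg19", "panTro4"}  # example subset; list your own assemblies
    for db, recs in keep_species(parse_fasta(toy), wanted).items():
        for header, seq in recs:
            print(f">{header}\n{seq}")
```

Splitting the first header token from the right keeps transcript names that themselves contain underscores (such as RefSeq-style `NM_...` identifiers) from being mistaken for the assembly field.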

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

Jonathan Casper

Apr 17, 2015, 5:41:44 PM
to Rini Pauly, gen...@soe.ucsc.edu

Hello Rini,

One follow-up: one of our engineers reminded me that the CDS FASTA output option already excludes non-coding transcripts, so you can skip steps 3, 4, and 5 of the directions above.
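For readers curious about what the skippable filter selects: knownGene rows use the genePred format, in which a non-coding transcript has an empty coding region recorded as cdsStart == cdsEnd. The Table Browser position box takes 1-based, fully closed coordinates, while the tables store 0-based, half-open ones. The Python below is a minimal sketch of that selection (transcripts overlapping the position, then `cdsStart != cdsEnd`). The `GenePredRow` class, the function names, and the overlap rule are illustrative assumptions, not the Table Browser's own code.

```python
"""Minimal sketch of the knownGene selection in steps 2 and 4.

Assumptions (hypothetical names, not UCSC code):
- GenePredRow holds only the genePred columns this sketch needs.
- A row counts as "in the region" if its transcript span overlaps it.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GenePredRow:
    name: str
    chrom: str
    txStart: int   # 0-based, inclusive
    txEnd: int     # 0-based, exclusive
    cdsStart: int
    cdsEnd: int    # equals cdsStart when the transcript is non-coding


def parse_position(pos: str) -> Tuple[str, int, int]:
    """Turn a 1-based, fully closed 'chrN:start-end' string into
    0-based, half-open (chrom, start, end) coordinates."""
    chrom, span = pos.replace(",", "").split(":")
    start, end = span.split("-")
    return chrom, int(start) - 1, int(end)


def overlaps(row: GenePredRow, chrom: str, start: int, end: int) -> bool:
    """True if the transcript shares at least one base with [start, end)."""
    return row.chrom == chrom and row.txStart < end and row.txEnd > start


def coding_in_region(rows: List[GenePredRow], pos: str) -> List[GenePredRow]:
    """Keep rows that overlap `pos` and satisfy cdsStart != cdsEnd."""
    chrom, start, end = parse_position(pos)
    return [r for r in rows
            if overlaps(r, chrom, start, end) and r.cdsStart != r.cdsEnd]


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    rows = [
        GenePredRow("txA", "chr17", 7570000, 7590000, 7572000, 7580000),
        GenePredRow("txB", "chr17", 7575000, 7579000, 7579000, 7579000),
        GenePredRow("txC", "chr17", 7600000, 7610000, 7601000, 7605000),
    ]
    print([r.name for r in coding_in_region(rows, "chr17:7578328-7578477")])
    # prints ['txA']: txB is non-coding and txC lies outside the region
```

In this sketch the region chr17:7578328-7578477 becomes the half-open interval [7578327, 7578477), which spans 150 bases.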


--
Jonathan Casper
UCSC Genome Bioinformatics Group