Hi Varun,
Thank you for your question about obtaining sequence and coordinates
for a gene in multiple species from the 100-way vertebrate Multiz
alignment track.
You can obtain this information by using the "CDS FASTA alignment from
multiple alignment" output format in the Table Browser:
1. Navigate to the Table Browser,
http://genome.ucsc.edu/cgi-bin/hgTables.
2. Make the following selections:
    clade: Mammal
    genome: Human
    assembly: Dec. 2013 (GRCh38/hg38)
    group: Genes and Gene Predictions
    track: GENCODE v22
    region: genome
    output format: CDS FASTA alignment from multiple alignment
3. Next to "identifiers (names/accessions)", click "paste list" or
"upload list".
4. Paste in or upload your list of identifiers.
5. Click "submit".
6. Click "get output".
7. In the drop-down next to "MAF table", select "multiz100way".
8. Under "Formatting options", check the box next to "Show
nucleotides".
9. Check the boxes next to those species you are interested in.
10. Click "get output".
Your output should look something like this:
>uc062ekq.1_hg38 543 chr22:39316844-39319597-
ATGTCTCACAGAAAGTTCTCCGCTCCCAGACATGGGTCCCTCGGCTTCCTGCCTCGGAAGCGCAGCAGCAGGCATCGTGGGAAGGTGAAGAGCTTCCCTAAGGATGACCCGTCCAAGCCGGTCCACCTCACAGCCTTCCTGGGATACAAGGCTGGCATGACTCACATCGTGCGGGAAGTCGACAGGCCGGGATCCAAGGTGAACAAGAAGGAGGTGGTGGAGGCTGTGACCATTGTAGAGACACCACCCATGGTGGTTGTGGGCATTGTGGGCTACGTGGAAACCCCTCGAGGCCTCCGGACCTTCAAGACTGTCTTTGCTGAGCACATCAGTGATGAATGCAAGAGGCGTTTCTATAAGAATTGctccgctcggctctgcccgatgagctccatccaggctccgctTGCTGGTGGAAAAGGCTCCTTAGAAGCCGGCAATGAGCTCCATCCCCACGCGGTGCCAGTGTGCCTTCCGCTCACCCCTCGGAGGGGTGATGAAGGCCTGCACCTGGTCCCCTCCCCAACTCTGCTCTGCTCCTGA
>uc062ekq.1_panTro4 543 chr22:37994613-38003827-
ATGTCTCACAGAAAGTTCT-CGCTCCCAGACAT-GGTCCCTCGGCTTCCTGCCTCGGAAGCGCAGCAGCAGGCATCGTGGGAAGGTGAAGAGCTTCCCTAAGGATGACCCGTCCAAGCCGGTCCACCTCACAGCCTTCCTGGGATACAAGGCTGGCATGACTCACATCGTGCGGGAAGTCGACAGGCCAGGATCCAAGGTGAACAAGAAGGAGGTGGTGGAGGCTGTGACCATTGTAGAGACACCACCCATGGTGGTTGTGGGCATTGTGGGCTACGTGGAAACCCCTCGAGGCCTCCGGACCTTCAAGACTGTCTTTGCTGAGCACATCAGTGATGAATGCAAGAGGCGTTTCTATAAGAATTGCTCCGCTCGGCTCTGCCCGATGAGCTCCATCCAGGCTCCGCTTGCCGGTGGAAAAGGCTCCTTAGAAGCCGGCAATGAGCTCCATCCCCACACGGTGCCAGTGTGCCTTCCGCTCACCCCTCGGAGGGGTGATGAAGGCCTGCACCTGGTCCCCTCCCCAACTCTGCTCTGCTCCTGA
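If you would like to work with this output in a script, here is a minimal
Python sketch that reads the FASTA records and counts differences between
two aligned sequences. It is only an illustration, not an official UCSC
tool: the names CdsRecord, parse_cds_fasta and count_differences are
hypothetical, and the header layout it expects (name_assembly, a number
assumed to be the sequence length, then chrom:start-end followed by the
strand) is inferred from the two example headers above.

```python
import re
from typing import List, NamedTuple, Tuple


class CdsRecord(NamedTuple):
    transcript: str  # e.g. "uc062ekq.1"
    assembly: str    # e.g. "hg38" or "panTro4"
    length: int      # second header field; assumed to be the sequence length
    chrom: str
    start: int
    end: int
    strand: str      # "+" or "-"
    sequence: str


# Header layout inferred from the example output (an assumption):
#   name_assembly  number  chrom:start-end[+-]
HEADER_RE = re.compile(
    r"^(\S+)_([^_\s]+)\s+(\d+)\s+([^:\s]+):(\d+)-(\d+)([+-])\s*$"
)


def parse_cds_fasta(text: str) -> List[CdsRecord]:
    """Split FASTA text into records and parse each header line."""
    records = []
    for block in text.split(">")[1:]:
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = HEADER_RE.match(lines[0].strip())
        if match is None:
            raise ValueError("unexpected header: " + lines[0])
        name, assembly, length, chrom, start, end, strand = match.groups()
        # Join wrapped sequence lines into a single string.
        sequence = "".join(part.strip() for part in lines[1:])
        records.append(
            CdsRecord(name, assembly, int(length), chrom,
                      int(start), int(end), strand, sequence)
        )
    return records


def count_differences(ref: str, other: str) -> Tuple[int, int]:
    """Return (mismatches, gap_columns) for two aligned sequences.

    Comparison ignores case; a column counts as a gap column when
    exactly one of the two sequences has "-" there.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    mismatches = 0
    gaps = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == "-" or b == "-":
            if a != b:
                gaps += 1
        elif a != b:
            mismatches += 1
    return mismatches, gaps


if __name__ == "__main__":
    # Toy data shaped like the Table Browser output, not real sequence.
    demo = (
        ">tx1.1_hg38 9 chr1:100-108-\n"
        "ATGaaaTGA\n"
        ">tx1.1_panTro4 9 chr2:200-208-\n"
        "ATG-acTGA\n"
    )
    human, chimp = parse_cds_fasta(demo)
    print(human.transcript, human.assembly, human.chrom, human.strand)
    print(count_differences(human.sequence, chimp.sequence))
```

To use it on your own results, save the Table Browser output to a file,
read the file into a string, and pass that string to parse_cds_fasta.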
I hope this is helpful. If you have any further questions, please
reply to gen...@soe.ucsc.edu. All messages sent to that address are
archived on a publicly accessible Google Groups forum. If your
question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group