extracting information from multiz alignment of 100 vertebrates track

VG

Mar 16, 2016, 2:47:35 PM

to gen...@soe.ucsc.edu

Hi Everyone,

I am trying to compare a gene sequence present in human (hg38) with its counterparts in other primates, namely chimp, gorilla, etc. I can see the conservation tracks.

I need to find the coordinates in chimp for my gene of interest.

So let's say I have RPL3 as my gene of interest (in human). I want to get all the sequences in the chimp genome that are similar to human RPL3, along with their coordinates in the chimp genome and the sequences themselves (which might include gaps or SNPs).

How can I do this?

Regards

Varun

Matthew Speir

Mar 17, 2016, 12:33:45 PM

to VG, gen...@soe.ucsc.edu

Hi Varun,

Thank you for your question about obtaining sequence and coordinates for a gene in multiple species from the 100-way vertebrate Multiz alignment track.

You can obtain this information by using the "CDS FASTA alignment from multiple alignment" output format in the Table Browser:

1. Navigate to the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables.

2. Make the following selections:
    clade: Mammal
    genome: Human
    assembly: Dec. 2013 (GRCh38/hg38)
    group: Genes and Gene Predictions
    track: GENCODE v22
    region: genome
    Output format: CDS FASTA alignment from multiple alignment

3. Next to "identifiers (names/accessions)", click "paste list" or "upload list".

4. Paste in or upload your list of identifiers.

5. Click "submit".

6. Click "get output".

7. In the drop-down next to "MAF table", select "multiz100way".

8. Under "Formatting options", check the box next to "Show nucleotides".

9. Check the boxes next to those species you are interested in.

10. Click "get output".

Your output should look something like this:

>uc062ekq.1_hg38 543 chr22:39316844-39319597-
ATGTCTCACAGAAAGTTCTCCGCTCCCAGACATGGGTCCCTCGGCTTCCTGCCTCGGAAGCGCAGCAGCAGGCATCGTGGGAAGGTGAAGAGCTTCCCTAAGGATGACCCGTCCAAGCCGGTCCACCTCACAGCCTTCCTGGGATACAAGGCTGGCATGACTCACATCGTGCGGGAAGTCGACAGGCCGGGATCCAAGGTGAACAAGAAGGAGGTGGTGGAGGCTGTGACCATTGTAGAGACACCACCCATGGTGGTTGTGGGCATTGTGGGCTACGTGGAAACCCCTCGAGGCCTCCGGACCTTCAAGACTGTCTTTGCTGAGCACATCAGTGATGAATGCAAGAGGCGTTTCTATAAGAATTGctccgctcggctctgcccgatgagctccatccaggctccgctTGCTGGTGGAAAAGGCTCCTTAGAAGCCGGCAATGAGCTCCATCCCCACGCGGTGCCAGTGTGCCTTCCGCTCACCCCTCGGAGGGGTGATGAAGGCCTGCACCTGGTCCCCTCCCCAACTCTGCTCTGCTCCTGA
>uc062ekq.1_panTro4 543 chr22:37994613-38003827-
ATGTCTCACAGAAAGTTCT-CGCTCCCAGACAT-GGTCCCTCGGCTTCCTGCCTCGGAAGCGCAGCAGCAGGCATCGTGGGAAGGTGAAGAGCTTCCCTAAGGATGACCCGTCCAAGCCGGTCCACCTCACAGCCTTCCTGGGATACAAGGCTGGCATGACTCACATCGTGCGGGAAGTCGACAGGCCAGGATCCAAGGTGAACAAGAAGGAGGTGGTGGAGGCTGTGACCATTGTAGAGACACCACCCATGGTGGTTGTGGGCATTGTGGGCTACGTGGAAACCCCTCGAGGCCTCCGGACCTTCAAGACTGTCTTTGCTGAGCACATCAGTGATGAATGCAAGAGGCGTTTCTATAAGAATTGCTCCGCTCGGCTCTGCCCGATGAGCTCCATCCAGGCTCCGCTTGCCGGTGGAAAAGGCTCCTTAGAAGCCGGCAATGAGCTCCATCCCCACACGGTGCCAGTGTGCCTTCCGCTCACCCCTCGGAGGGGTGATGAAGGCCTGCACCTGGTCCCCTCCCCAACTCTGCTCTGCTCCTGA
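
If you save this output to a file, the per-species coordinates and the gap/mismatch positions can be pulled out with a short script. The Python below is only a rough sketch, not a UCSC tool: it assumes every header has the same layout as the sample above (">name_assembly number chrom:start-end" followed by the strand), and the names CdsRecord, parse_cds_fasta and diff_aligned are made up for illustration. Headers in any other layout are rejected rather than guessed at.

```python
import re
from typing import List, NamedTuple, Tuple


class CdsRecord(NamedTuple):
    """One FASTA record from the CDS FASTA output (hypothetical container)."""
    transcript: str   # e.g. "uc062ekq.1"
    assembly: str     # e.g. "hg38" or "panTro4"
    length: int       # the number printed after the name, copied as-is
    chrom: str
    start: int
    end: int
    strand: str       # "+" or "-"
    sequence: str     # aligned sequence, may contain "-" gap characters


# Assumed header layout, taken from the sample output only:
#   >uc062ekq.1_panTro4 543 chr22:37994613-38003827-
HEADER_RE = re.compile(
    r"^>(\S+)_([^_\s]+)\s+(\d+)\s+([^:\s]+):(\d+)-(\d+)([+-])$"
)


def parse_cds_fasta(text: str) -> List[CdsRecord]:
    """Parse FASTA text whose headers follow the assumed layout."""
    records: List[CdsRecord] = []
    fields = None          # parsed header of the record being read
    seq_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if fields is not None:
                records.append(CdsRecord(*fields, "".join(seq_parts)))
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError("unexpected header layout: " + line)
            name, asm, length, chrom, start, end, strand = match.groups()
            fields = (name, asm, int(length), chrom,
                      int(start), int(end), strand)
            seq_parts = []
        else:
            seq_parts.append(line)
    if fields is not None:
        records.append(CdsRecord(*fields, "".join(seq_parts)))
    return records


def diff_aligned(ref: str, other: str) -> List[Tuple[int, str, str, str]]:
    """List alignment columns (0-based) where two aligned sequences differ.

    Case is ignored, since the output can mix upper- and lower-case bases.
    A column with "-" on either side is reported as "gap", any other
    difference as "mismatch".
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    for i, (a, b) in enumerate(zip(ref.upper(), other.upper())):
        if a != b:
            kind = "gap" if "-" in (a, b) else "mismatch"
            diffs.append((i, a, b, kind))
    return diffs


if __name__ == "__main__":
    # Toy input invented for this demo; not real genome data.
    toy = (">tx1_hg38 12 chr1:100-111+\nATGTCTCACAGA\n"
           ">tx1_panTro4 12 chr1:200-211+\nATGTCT-ACAGG\n")
    human, chimp = parse_cds_fasta(toy)
    print(chimp.assembly, chimp.chrom, chimp.start, chimp.end, chimp.strand)
    for column in diff_aligned(human.sequence, chimp.sequence):
        print(column)
```

The positions reported by diff_aligned are columns within the alignment, not genomic coordinates; mapping them back to chimp coordinates would need the strand and any gaps taken into account.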

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group