getting sequence from multiz align track


VG

Mar 20, 2017, 4:31:16 PM
to UCSC Genome Browser Discussion List
Hi Everyone,
I am trying to look at a particular coordinate in hg19, and I also want to see how it aligns with other species such as zebrafish and chimp. Can you tell me how to extract the sequences of those species for the region I am interested in, keeping the hg19 coordinates as the reference?

Thanks for the help.

Regards
Varun

Matthew Speir

Mar 23, 2017, 3:58:24 PM
to VG, UCSC Genome Browser Discussion List
Hi Varun,

Thank you for your question about getting sequences from a Multiz
alignment in the UCSC Genome Browser.

You can get this information using the Table Browser. Here are the steps
you would use to get it from the 100-way Multiz alignment on hg38:

1. Navigate to the Table Browser, https://genome.ucsc.edu/cgi-bin/hgTables.
2. Make the following selections:
    clade: Mammal
    genome: Human
    assembly: Dec. 2013 (GRCh38/hg38)
    group: Comparative Genomics
    track: Conservation
    table: Multiz Align (multiz100way)
    region: position or define a set of regions by clicking "define regions"
    output format: MAF - multiple alignment format
    output file: enter a file name or leave blank to see results in the browser

3. Click "get output".

Note that the results will include the alignment of all species in the
100-way alignment. If you output your results to a file, you can then
filter the species in the results using UNIX command line utilities such
as "grep".
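
Before filtering, it can help to see which assembly names actually occur in the downloaded file, since those names are what a filter has to match. The following is a minimal Python sketch, not part of the Table Browser itself. The function name `species_in_maf` is made up for illustration, and the sample lines are copied from the hg19 output posted later in this thread. It assumes each "s" line has the standard seven whitespace-separated MAF fields.

```python
from collections import Counter

def species_in_maf(lines):
    """Count 's' (sequence) lines per assembly, i.e. the text before the first '.'."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # An 's' line is: s src start size strand srcSize text
        if len(fields) == 7 and fields[0] == "s":
            assembly = fields[1].split(".", 1)[0]
            counts[assembly] += 1
    return counts

# Two sequence lines from the thread; 'a' and 'i' lines are ignored by the counter.
sample = """\
a score=377844.000000
s hg19.chr10                   79797184 3 + 135534747 TCT---
s panTro4.chr10                76656843 3 + 133524379 TCT---
i panTro4.chr10                C 0 C 0
""".splitlines()

print(sorted(species_in_maf(sample)))  # ['hg19', 'panTro4']
```

To run it on a real download, pass an open file instead of `sample`, for example `species_in_maf(open("your_output_file"))`.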

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

VG

Apr 28, 2017, 3:55:21 PM
to Matthew Speir, UCSC Genome Browser Discussion List
Hi Matthew,
I did this for the hg19 assembly for a particular region. When I look at the output, I see my sequence of around 30 bp split into 3 groups: the first shows 3 bases, the next 18 bases, and the last the rest of the aligned bases.

Something like this:

a score=377844.000000
s hg19.chr10                   79797184 3 + 135534747 TCT---
s panTro4.chr10                76656843 3 + 133524379 TCT---
i panTro4.chr10                C 0 C 0
s gorGor3.chr10                90764347 3 + 147764049 TCT---
i gorGor3.chr10                C 0 C 0
s ponAbe2.chr10                76323633 3 - 133410057 TCT---
i ponAbe2.chr10                C 0 C 0
s nomLeu3.chr18                35267871 3 + 104879549 TCT---

a score=4374105.000000
s hg19.chr10                   79797187 18 + 135534747 GGATTA-----CAGAAGGTAACA
s panTro4.chr10                76656846 18 + 133524379 GGATTA-----CAGAAGGTAACA
i panTro4.chr10                C 0 C 0
s gorGor3.chr10                90764350 18 + 147764049 GGATTA-----CAGAAGGTAACA
i gorGor3.chr10                C 0 C 0
s ponAbe2.chr10                76323636 18 - 133410057 GGATTA-----CAGAAGGTAACA
i ponAbe2.chr10                C 0 C 0
s nomLeu3.chr18                35267874 18 + 104879549 GGATTA-----CAGAAGGTAACA
i nomLeu3.chr18                C 0 C 0


a score=50743.000000
s hg19.chr10                   79797205 5 + 135534747 TG-T-TT
s panTro4.chr10                76656864 5 + 133524379 TG-T-TT
i panTro4.chr10                C 0 C 0
s gorGor3.chr10                90764368 5 + 147764049 TG-T-TT
i gorGor3.chr10                C 0 C 0
s ponAbe2.chr10                76323654 5 - 133410057 TG-T-TT
i ponAbe2.chr10                C 0 C 0
s nomLeu3.chr18                35267892 5 + 104879549 TG-T-TT
i nomLeu3.chr18                C 0 C 0

My question is: why is the alignment not on one line? Why is it broken into blocks of 3, 18 and 5 bases?

Regards
Varun

Jairo Navarro Gonzalez

May 1, 2017, 3:24:34 PM
to VG, Matthew Speir, UCSC Genome Browser Discussion List

Dear Varun,

Thank you for using the UCSC Genome Browser and for your question about the MAF file formatting.

The output is split into blocks of 3, 18 and 5 bases because of how alignments are stored in the MAF format. A MAF block ends, and a new one begins, whenever any one of the aligned sequences changes orientation, chromosome, or position. If there are a lot of sequences in the alignment, the blocks can get very small. To learn more about MAF data files, please read our MAF format documentation.
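
If a single line per species is more convenient, consecutive blocks can be joined afterwards. The following is a rough Python sketch of one way to do that. It is not a UCSC tool, and the names `read_blocks` and `stitch` are made up for illustration. It joins blocks only when they are contiguous on the reference, matches rows by source name (for example "ponAbe2.chr10"), and pads a source that is missing from a block with "-". It assumes the reference is on the + strand, and it does not check that the other species are themselves contiguous, so treat the joined rows as a convenience view rather than a validated alignment. The sample lines are copied from the output posted above.

```python
def read_blocks(lines):
    """Yield one dict per MAF block: {src: (start, size, strand, text)}."""
    block = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
            block = {}
        elif fields[0] == "s" and len(fields) == 7:
            _, src, start, size, strand, _src_size, text = fields
            block[src] = (int(start), int(size), strand, text)
    if block:
        yield block

def stitch(blocks, ref_src):
    """Join blocks that are contiguous on ref_src; return [(ref_start, {src: text})]."""
    segments = []  # each entry: [ref_start, ref_end, rows]
    for block in blocks:
        if ref_src not in block:
            continue
        start, size, _strand, ref_text = block[ref_src]
        if segments and segments[-1][1] == start:
            seg = segments[-1]          # continues the previous segment
        else:
            seg = [start, start, {}]    # gap on the reference: start a new segment
            segments.append(seg)
        rows = seg[2]
        seg_width = len(rows.get(ref_src, ""))
        for src in set(rows) | set(block):
            prev = rows.get(src, "-" * seg_width)   # pad sources that appear late
            text = block[src][3] if src in block else "-" * len(ref_text)
            rows[src] = prev + text
        seg[1] = start + size
    return [(seg_start, rows) for seg_start, _end, rows in segments]

sample = """\
a score=377844.000000
s hg19.chr10                   79797184 3 + 135534747 TCT---
s ponAbe2.chr10                76323633 3 - 133410057 TCT---

a score=4374105.000000
s hg19.chr10                   79797187 18 + 135534747 GGATTA-----CAGAAGGTAACA
s ponAbe2.chr10                76323636 18 - 133410057 GGATTA-----CAGAAGGTAACA
i ponAbe2.chr10                C 0 C 0

a score=50743.000000
s hg19.chr10                   79797205 5 + 135534747 TG-T-TT
s ponAbe2.chr10                76323654 5 - 133410057 TG-T-TT
""".splitlines()

segments = stitch(read_blocks(sample), "hg19.chr10")
for ref_start, rows in segments:
    for src in sorted(rows):
        print(ref_start, src, rows[src])
```

For the three blocks above this produces one segment starting at 79797184, because each hg19 block starts where the previous one ends (79797184 + 3 = 79797187, and 79797187 + 18 = 79797205). The "-" columns are kept, so the rows stay aligned with each other; remove them with `rows[src].replace("-", "")` to get the plain sequence.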

I hope this is helpful.

Jairo Navarro
UCSC Genomics Institute



VG

May 1, 2017, 4:11:39 PM
to UCSC Genome Browser Discussion List

---------- Forwarded message ----------
From: VG <gupta5...@gmail.com>
Date: Mon, May 1, 2017 at 3:40 PM
Subject: Re: [genome] getting sequence from multiz align track
To: Jairo Navarro Gonzalez <jnav...@ucsc.edu>


Hi,
Is there a way to report the alignment for only a few species along with the hg19 coordinates? Let's say mouse and zebrafish only.

Thanks

Regards
Varun

Jairo Navarro Gonzalez

May 3, 2017, 11:52:33 AM
to VG, UCSC Genome Browser Discussion List

Hello Varun,

If you output your results to a file, you can then filter the species in the results using UNIX command line utilities such as grep. For example, the following command should filter for only mouse and zebrafish:

grep "mm10\|danRer7" your_output_file
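
Note that the grep pattern above keeps only lines that mention mm10 or danRer7, so the hg19 rows and the "a score=..." block headers are not in its output. If you want to keep those as well, one option is a small script. The following Python sketch is only an illustration. The function name `filter_maf` is made up, and it assumes the assembly name is the part of the source field before the first ".". The sample lines are copied from the hg19 output earlier in this thread, which contains primates only, so the demonstration keeps hg19 and ponAbe2.

```python
def filter_maf(lines, keep_assemblies):
    """Yield MAF lines, dropping per-species lines for assemblies not in keep_assemblies.

    Block headers ('a' lines), blank lines and comments pass through unchanged.
    """
    keep = set(keep_assemblies)
    for line in lines:
        fields = line.split()
        # 's', 'i', 'e' and 'q' lines all carry the source name in the second field.
        if len(fields) > 1 and fields[0] in {"s", "i", "e", "q"}:
            if fields[1].split(".", 1)[0] not in keep:
                continue
        yield line

sample = """\
a score=377844.000000
s hg19.chr10                   79797184 3 + 135534747 TCT---
s panTro4.chr10                76656843 3 + 133524379 TCT---
i panTro4.chr10                C 0 C 0
s ponAbe2.chr10                76323633 3 - 133410057 TCT---
i ponAbe2.chr10                C 0 C 0
""".splitlines()

kept = list(filter_maf(sample, ["hg19", "ponAbe2"]))
print("\n".join(kept))
```

For mouse and zebrafish on a real file, the call would be `filter_maf(open("your_output_file"), ["hg19", "mm10", "danRer7"])`. Blocks in which neither species aligns still appear in the result, with only the hg19 row left.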

I hope this is helpful.

Jairo Navarro
UCSC Genomics Institute
