Retrieve the multiple sequence alignment for defined regions


wei wu
Jul 29, 2014, 4:14:52 PM
to gen...@soe.ucsc.edu
Dear UCSC staff,

I have 50,000 genomic regions from RNA-seq, most of which are noncoding.
I want to retrieve these regions from the multiz46way alignment in MAF
format for structure prediction with RNAz. However, the Table Browser
limits downloads to 1,000 defined regions. How can I do this from the
command line?

Thank you very much for your help,

Sincerely,

Wei Wu
Postdoc
University of Calgary

Jonathan Casper
Aug 1, 2014, 2:45:35 PM
to wei wu, gen...@soe.ucsc.edu

Hello Wei,

Thank you for your question about retrieving regions from the 46-way multiple alignment. We recommend that you download the MAF files yourself from the 46-way alignment directory at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz46way/maf/. You can then use the kent tool "mafsInRegion" to select only the alignments in your regions of interest. The usage message for the mafsInRegion tool is as follows:

mafsInRegion - Extract MAFS in a genomic region
usage:
    mafsInRegion regions.bed out.maf|outDir in.maf(s)
options:
    -outDir - output separate files named by bed name field to outDir
    -keepInitialGaps - keep alignment columns at the beginning and end of a block that are gapped in all species

For example, if your 50000 regions are stored in BED format in a file named "rnaseq.bed", and the compressed MAF files downloaded from UCSC (with names like chr17.maf.gz) are all stored in the "fromUCSC/" directory, then you can run the following command to place your aligned regions into the file "myresults.maf".
mafsInRegion rnaseq.bed myresults.maf fromUCSC/*.maf.gz
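mafsInRegion reads its regions from BED, which uses 0-based, half-open coordinates. If your regions are written in the Genome Browser's 1-based "chr:start-end" position style (an assumption; the original question does not say what format the 50,000 coordinates are in), the start must be shifted down by one when building rnaseq.bed. Because -outDir names each output file after the BED name field, the minimal sketch below also gives every region a unique name; the "region_N" naming scheme is purely hypothetical.

```python
import re
from typing import Iterable, Iterator

# Matches browser-style positions such as "chr17:7,571,720-7,590,868".
# Assumption: input positions are 1-based and fully closed (start and end included).
POSITION_RE = re.compile(r"^(chr[^:\s]+):([\d,]+)-([\d,]+)$")


def position_to_bed(position: str, name: str) -> str:
    """Convert one 1-based 'chr:start-end' string to a 0-based, half-open BED4 line."""
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {position!r}")
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based inclusive end.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def positions_to_bed(positions: Iterable[str]) -> Iterator[str]:
    """Yield BED lines with unique, filename-safe names (hypothetical 'region_N' scheme)."""
    for i, position in enumerate(positions, start=1):
        yield position_to_bed(position, f"region_{i}")


if __name__ == "__main__":
    example = ["chr17:7,571,720-7,590,868", "chr1:1000-2000"]
    for line in positions_to_bed(example):
        print(line)
```

Redirecting the printed lines into rnaseq.bed gives a file that can be passed to mafsInRegion as in the command above. If your coordinates are already 0-based (for example, taken from another BED or BAM-derived file), skip the start - 1 shift.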

Please note that this is a lot of data to download from UCSC - over 30GB for the compressed files. We suggest you begin by reading the README file at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz46way/README. It provides suggestions for which files you may find interesting and how best to download them. You can find the mafsInRegion tool as part of the kent tools, available at http://hgdownload.soe.ucsc.edu/admin/exe/. We provide compiled executables for several computer architectures, and source code if you need to compile it yourself.
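Since the MAF files are split by chromosome (names like chr17.maf.gz), you only need the files for chromosomes that actually appear in your BED file. The minimal sketch below lists those file names so the rest can be skipped; it assumes the one-file-per-chromosome "&lt;chrom&gt;.maf.gz" naming shown above. With 50,000 regions the savings may be small if they span most chromosomes, so check the output before relying on it.

```python
from typing import Iterable, List


def maf_files_for_bed(bed_lines: Iterable[str]) -> List[str]:
    """Return the per-chromosome MAF file names needed to cover the regions in a BED file.

    Assumption: the download directory holds one '<chrom>.maf.gz' file per chromosome.
    """
    chroms = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines plus BED comment, track, and browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # The first whitespace-separated BED field is the chromosome name.
        chroms.add(line.split()[0])
    return sorted(f"{chrom}.maf.gz" for chrom in chroms)


if __name__ == "__main__":
    example_bed = [
        "track name=rnaseq",
        "chr17\t7571719\t7590868\tregion_1",
        "chr1\t999\t2000\tregion_2",
        "chr17\t100\t200\tregion_3",
    ]
    for name in maf_files_for_bed(example_bed):
        print(name)
```

In practice you would pass an open handle to rnaseq.bed instead of the example list, then download only the listed files following the README's advice.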

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group
