Dear UCSC Genome Browser team,
(This is probably a duplicate email, but I cannot find my previous message on the mailing list.)
I've been trying for days to work out how to extract alignments, and I hope you can help. Here is my project: I have a list of about 19,000 human genes, each with its chromosome ID and genomic range.
#gene_symbol #chr_id #start #end
MRPL10 chr17 45898638 45910907
OR10V1 chr11 59478389 59483318
PTPN12 chr7 77165352 77271388
…
…
Now, I want to extract the alignment for each gene (including introns, plus 2000 bp upstream and 2000 bp downstream) from the hg19 multiz100way MAF files (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/maf/), and eventually get the alignment in FASTA format, in which each block shows the sequences of the 100 species. The blocks of the output alignment should keep the same coordinates as their source MAF blocks, even for the blocks at the ends of each region.
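To make the region definition concrete, here is a minimal Python sketch of how I would turn the gene list into a sorted BED file with 2000 bp flanks. The function name `genes_to_bed` and the `flank` parameter are my own, and I am assuming the start/end values can be used directly as BED coordinates (if they are 1-based, the start would need to be reduced by one). It clamps the start at 0 but does not check the chromosome length at the other end.

```python
def genes_to_bed(rows, flank=2000):
    """Turn 'symbol chr start end' lines into sorted BED tuples with flanks.

    Hypothetical helper for illustration; not part of any UCSC tool.
    """
    records = []
    for row in rows:
        row = row.strip()
        # Skip blank lines and the '#' header line.
        if not row or row.startswith("#"):
            continue
        fields = row.split()
        # Skip anything that is not a full four-column record.
        if len(fields) != 4:
            continue
        symbol, chrom, start, end = fields
        # Assumption: start/end are usable as BED coordinates as given.
        bed_start = max(0, int(start) - flank)  # never go below position 0
        bed_end = int(end) + flank              # not clamped to chromosome size
        records.append((chrom, bed_start, bed_end, symbol))
    # Sort by chromosome name (lexicographic), then by start position.
    records.sort(key=lambda r: (r[0], r[1]))
    return records


genes = """#gene_symbol #chr_id #start #end
MRPL10 chr17 45898638 45910907
OR10V1 chr11 59478389 59483318
PTPN12 chr7 77165352 77271388
"""

for rec in genes_to_bed(genes.splitlines()):
    print("\t".join(str(field) for field in rec))
```

The fourth BED column carries the gene symbol, so each extracted region can be traced back to its gene.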
After some research on the mailing list, I found a couple of very useful instructions:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/F_YjGiYMcDY/1Sk_3yxRVxcJ suggests using a tool named "mafsInRegion", and another discussion at https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/GJ7iKzJ2e0k/oBdKkalta5cJ mentions a faster way to do this, which needs a sorted .bed input file.
However, I still have some questions.
Thank you for your help in advance.
Ju