Hi Swaraj,
Thank you for your question about the danRer7 MAF files. The BLAT
results are describing the position using
positive strand coordinates, while the MAF files are using negative
strand coordinates. You can see this for
yourself by doing some simple calculations.
First you can see that the chr20 size minus BLAT qStart coordinate
is equal to the MAF coordinate plus the
region size of 100:
63025520 - 4883106 = 58142314 + 100
Working from the opposite direction, you can see that subtracting
the negative strand based coordinates from
the MAF file will get you the positive strand based BLAT
coordinates:
63025520 - 58142314 = 4883206 = tEnd = chr20 pos-strand
coordinate of start
63025520 - 58142314 - 100 = 4883106 = tStart = chr20 pos-strand
coordinate of end
Note that for BLAT, the PSL results of non-translated DNA queries,
the strand will either be + or -. If it is -, it
just means that the query was reverse-complemented before searching.
The results will always be in positive
coordinates for the tStart and tEnd, but the meaning of start and
end are reversed.
If you are just interested in the alignments between hg19 and
danRer7, you can just use the pairwise alignment
between the two directly. One of our engineers shared the following
strategy for extracting these from the
chain files themselves:
Obtain the files hg19.2bit, danRer7.2bit and
hg19.danRer7.all.chain.gz from hgdownload:
Generate the hg19 and danRer7 chromosome sizes files:
twoBitInfo
hg19.2bit stdout | sort -k2,2n > hg19.chrom.sizes
twoBitInfo danRer7.2bit stdout | sort -k2,2n >
danRer7.chrom.sizes
Then use the chainToAxt and axtToMaf utilities to convert the
pairwise alignment from
the chain format to the MAF format:
chainToAxt
hg19.danRer7.all.chain.gz hg19.2bit danRer7.2bit stdout \
| axtToMaf stdin hg19.chrom.sizes danRer7.chrom.sizes
hg19.danRer7.maf
The hg19.danRer7.maf file will contain all the alignments of
danRer7 to hg19 in maf format.
Be aware that this hg19.danRer7.maf file will likely be over 10
GB in size. You can then use
the mafsInRegion utility on this hg19.danRer7.maf file to
extract any alignments from danRer7
to hg19 in specific locations.
If you want a more restrictive set of alignments, use the
liftOver chain file hg19ToDanRer7.over.chain.gz
instead of all the alignments. You can find the liftOver chain
file here:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToDanRer7.over.chain.gz
In general, all the different alignment formats can be converted
into each other via a pipeline of:
aTob | cTod |
... | xToy > wasA.nowY
I hope this is helpful. If you have any further questions, please
reply to
gen...@soe.ucsc.edu. All messages
sent to that address are archived on a publicly-accessible Google
Groups forum. If your question includes
sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group