Hello David,
Thank you for your question about converting our .net.axt.gz files to MAF format. We start the alignment process by building a number of smaller individual alignments (chains) and then combining them into one larger single-coverage alignment (the net), in which each base of the target genome is covered by at most one alignment. Because the .net.axt.gz files are derived from the net, they are already single-coverage, so the MAF files generated by axtToMaf are single-coverage as well, with no additional step needed. If you are interested, more information about chains and nets is available on our wiki at http://genomewiki.ucsc.edu/index.php/Chains_Nets, as well as on the description page of many Chain/Net tracks, such as this one: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=primateChainNet.
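If you would like to confirm the single-coverage property on your own files, a minimal sketch along the following lines may help. It is written for a Bourne-style shell with awk rather than csh, and it assumes the usual axt layout (a nine-field summary line per block, with the target name, start and end in fields 2 to 4) and blocks sorted by target position, as axtSort output should be. The function name check_single_coverage and the tiny sample are made up for illustration and are not part of the UCSC tools.

```shell
# check_single_coverage: read axt text on stdin and report the first block
# whose target range overlaps the previous block on the same target sequence.
# Assumes blocks are sorted by target start (hypothetical helper, not a UCSC tool).
check_single_coverage() {
    awk '
        /^#/ { next }                       # skip comment lines
        NF == 9 {                           # summary line of an axt block
            if ($2 == chrom && $3 + 0 <= prevEnd) {
                print "overlap at " $2 ":" $3 "-" $4
                bad = 1
                exit
            }
            chrom = $2
            prevEnd = $4 + 0
        }
        END { if (!bad) print "ok" }
    '
}

# Tiny made-up sample: two adjacent, non-overlapping blocks on chr1.
sample='0 chr1 100 109 chr2 500 509 + 900
ACGTACGTAC
ACGTACGTAC

1 chr1 110 119 chr2 600 609 + 850
ACGTACGTAC
ACGTTCGTAC
'
printf '%s\n' "$sample" | check_single_coverage
```

On a real file, the same function could be fed with something like: gzip -dc chr1.hg19.mm10.net.axt.gz | check_single_coverage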
One of our engineers suggests that the following example of chain/net/MAF construction, taken from the hg19/mm10 alignment, might be helpful. To run it yourself, replace the path names with the local paths to your chrom.sizes, 2bit (or nib) and all.chain.gz files.
#!/bin/csh -efx
cd /hive/data/genomes/hg19/bed/lastzMm10.2012-03-07/axtChain
# Make nets ("noClass", i.e. without rmsk/class stats which are added later):
chainPreNet hg19.mm10.all.chain.gz /scratch/data/hg19/chrom.sizes \
    /scratch/data/mm10/chrom.sizes stdout \
| chainNet stdin -minSpace=1 /scratch/data/hg19/chrom.sizes /scratch/data/mm10/chrom.sizes \
    stdout /dev/null \
| netSyntenic stdin noClass.net
# Make liftOver chains:
netChainSubset -verbose=0 noClass.net hg19.mm10.all.chain.gz stdout \
| chainStitchId stdin stdout | gzip -c > hg19.mm10.over.chain.gz
# Make axtNet for download: one .axt per hg19 seq.
netSplit noClass.net net
cd ..
mkdir axtNet
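# Note: netToAxt reads per-chromosome chain files from axtChain/chain/, which
# are not created in this excerpt (chainSplit can produce them from all.chain.gz).
foreach f (axtChain/net/*.net)
    netToAxt $f axtChain/chain/$f:t:r.chain \
        /scratch/data/hg19/nib /scratch/data/mm10/nib stdout \
    | axtSort stdin stdout \
    | gzip -c > axtNet/$f:t:r.hg19.mm10.net.axt.gz
end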
# Make mafNet for multiz: one .maf per hg19 seq.
mkdir mafNet
foreach f (axtNet/*.hg19.mm10.net.axt.gz)
    axtToMaf -tPrefix=hg19. -qPrefix=mm10. $f \
        /scratch/data/hg19/chrom.sizes /scratch/data/mm10/chrom.sizes \
        stdout \
    | gzip -c > mafNet/$f:t:r:r:r:r:r.maf.gz
end
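The output names in the two loops rely on csh filename modifiers: :t keeps the last path component and each :r strips one trailing extension, so five :r modifiers reduce chr1.hg19.mm10.net.axt.gz to chr1. For anyone porting the script to a Bourne-style shell, a rough equivalent might look like the sketch below. The helper name strip_exts and the sample path are illustrative assumptions only, not part of the UCSC tools.

```shell
# strip_exts PATH N: drop the directory part, then remove N trailing ".ext"
# pieces, mimicking csh $f:t followed by N :r modifiers (hypothetical helper).
strip_exts() {
    name=${1##*/}            # like :t (tail of the path)
    n=$2
    while [ "$n" -gt 0 ]; do
        name=${name%.*}      # like one :r (remove the last extension)
        n=$((n - 1))
    done
    printf '%s\n' "$name"
}

f=axtNet/chr1.hg19.mm10.net.axt.gz   # made-up sample path
strip_exts "$f" 5    # prints: chr1
strip_exts "$f" 1    # prints: chr1.hg19.mm10.net.axt
```

With this helper, the MAF output name in the second loop would correspond to mafNet/$(strip_exts "$f" 5).maf.gz.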
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses are archived in publicly accessible forums for the benefit of other users. If your question contains sensitive data, you may send it to genom...@soe.ucsc.edu instead.
--
Jonathan Casper
UCSC Genome Bioinformatics Group