Hi Bremen,
You have the basic flow correct. Some details added below:
> 2. Align both chromosomes with blastz
We now use lastz, an improved version of blastz
(
http://www.bx.psu.edu/miller_lab/, search for lastz) but blastz is
sufficient. The output format of blastz, LAV, must be converted to
either PSL or AXT so that axtChain can read the alignments (we have
lavToAxt and lavToPsl programs; PSL is more compact). lastz can
produce axt directly with the --output=axt option.
> 3. Chain using axtChain (QUESTION: I see this program takes two directories
> as arguments. If I wish only to compare two chromosomes, would these
> directories have only one file each?)
Yes, that would work. However, you can also use the -faQ and -faT
options, and give fasta files instead of directories. To see a
description of all axtChain options, run "axtChain" with no arguments.
Here is an example usage of -faQ and -faT:
axtChain -faQ -faT in.axt oneChrom.fa otherChrom.fa chroms.chain
Another alternative is to convert your fasta files into the compact
format 2bit like this:
faToTwoBit oneChrom.fa oneChrom.2bit
-- then you can give axtChain [and lastz but not blastz] the .2bit file
instead of the directory, and don't have to pass -faQ / -faT.
> 4. Get size of each chromosome for chain filtering using faSize
Yes. Note that the sequence name and size must appear in a
tab-separated file, so use the -detailed flag like this:
faSize -detailed oneChrom.fa > oneChrom.sizes
> 5. Sort and filter chains
> a. use chainMergeSort using chain from step 3
> b. prenet with chainPreNet using chain from 5.a., size of target
> chromosome gotten from step 4, and size of query chromosome from
> step 4 as arguments
> 6. Netting?
Yes. We pipe the output of chainPreNet to chainNet (and pipe that to
netSyntenic) like this:
chainPreNet chroms.chain oneChrom.sizes otherChrom.sizes stdout \
| chainNet stdin -minSpace=1 oneChrom.sizes otherChrom.sizes stdout /dev/null \
| netSyntenic stdin
chroms.net
> 7. Convert to .maf and use phyloFit
Yes. For historical reasons we use netToAxt | axtToMaf, we don't have
a netToMaf. You can see an example of how phyloFit was run for the
D. melanogaster Conservation track in our source tree
(kent/src/hg/makeDb/doc/dm2.txt, search for "PHASTCONS 15WAY" and then
search for phyloFit).
If you have only two species, there is not much phylogenetic
information for phyloFit, but I suppose it could still make a
substitution rate model. If you have more than two species, you will
also need multiz from
http://www.bx.psu.edu/miller_lab/ .
> 8. Run phastCons to get .wig output which can be uploaded as a conservation
> track
Yes. Adam Siepel also has a new method of scoring conservation,
phyloP, which we offer in addition to phastCons scores on newer
Conservation tracks. I have no idea whether one would be more
appropriate than the other when working with only two species.
> Since I am only comparing two chromosomes at a time, I think only a single
> chain is generated which doesn't have to be netted. Am I missing anything?
Netting is still necessary, in order to get single-coverage alignments
on which conservation scores are computed. axtChain usually produces
many chains that cover the same position on the reference
genome/chromosome. The netting process selects the highest-scoring
chain as the top level, and then fills in gaps (unaligned areas) using
the next highest-scoring chain, and then fills in that chain's gaps
using the next highest-scoring chain and so on. It is kind of like
extracting a global alignment from a sea of chained local alignments.
Max Haeussler contributed a very useful genomewiki page about
reconstructing our alignment & conservation pipeline (perhaps you have
seen it already?):
http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
Hope that helps, and please send more questions to
gen...@soe.ucsc.edu
as you have them,
Angie
----- "Bremen Braun" <
bremen...@gmail.com> wrote:
> _______________________________________________
> Genome maillist -
Gen...@lists.soe.ucsc.edu
>
https://lists.soe.ucsc.edu/mailman/listinfo/genome