Good Evening Dr. Pool:
The procedure to prepare for computing phyloP conservation scores is mostly
one of bookkeeping: keeping track of where files are and what is being
paired with what. Consistent naming schemes and directory hierarchy
layouts for the query and target genome assemblies are critical to
managing these procedures. There are a lot of moving parts.
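As a small illustration of that bookkeeping, the sketch below enumerates
every query/target pairing and derives a work directory and an output maf
file name from the assembly names alone. The directory layout and the
function and class names are hypothetical, not the UCSC convention; the
point is only that every path is computed from the names rather than
typed by hand.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PairwiseRun:
    """One query-to-target lastz job and the paths it will use."""
    target: str
    query: str
    workdir: PurePosixPath
    maf: PurePosixPath


def plan_pairwise_runs(target, queries, root="/data/alignments"):
    """Derive a work directory and output maf path for every query.

    Hypothetical layout: <root>/<target>/lastz.<query>/<target>.<query>.maf
    Every path is computed from the assembly names, never typed by hand.
    """
    seen = set()
    runs = []
    for query in queries:
        # Catch the two most common bookkeeping slips up front.
        if query == target:
            raise ValueError(f"query {query!r} is the target itself")
        if query in seen:
            raise ValueError(f"duplicate query {query!r}")
        seen.add(query)
        workdir = PurePosixPath(root) / target / f"lastz.{query}"
        maf = workdir / f"{target}.{query}.maf"
        runs.append(PairwiseRun(target, query, workdir, maf))
    return runs


if __name__ == "__main__":
    for run in plan_pairwise_runs("hg38", ["mm10", "canFam3", "galGal6"]):
        print(run.query, run.maf)
```

Because the plan is plain data, the same list can later drive the step
that collects the maf files for multiz, so no pairing is ever written out
by hand twice.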
An outline of the procedure is as follows:
1. Identify the query species genomes to align to the target genome.
2. All genomes, both query and target, must be adequately repeat-masked
   to avoid blow-ups during the pairwise alignments.
3. Run pairwise lastz alignments of all query genome assemblies to the
   target genome assembly.
4. Collect the resulting maf files to prepare for the multiz operation.
   You will need, at a minimum, a binary phylogenetic tree derived from
   taxonomy to guide the multiz operation. NCBI taxonomy is usually good
   enough for this purpose even if it isn't perfect.
5. The maf file output from multiz is the input to the phyloP operation.
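For step 4, a quick sanity check before running multiz is to confirm that
the guide tree really is binary and that its leaf names match your
assembly names. The minimal sketch below parses a plain Newick string
(names and optional branch lengths only; no quoted labels or comments)
and tests that every internal node has exactly two children. The function
names are hypothetical, and the exact tree syntax your multiz driver
expects may differ from Newick, so please check its documentation before
converting.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists (internal nodes)
    and str leaves. Branch lengths and internal labels are discarded."""
    s = text.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    s = s[:-1]
    pos = 0

    def read_label():
        # Consume text up to the next delimiter; keep only the name part.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos].split(":")[0].strip()

    def node():
        nonlocal pos
        while pos < len(s) and s[pos].isspace():
            pos += 1
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ')'
            read_label()  # drop internal-node label / branch length
            return children
        name = read_label()
        if not name:
            raise ValueError(f"empty leaf name near position {pos}")
        return name

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def is_binary(tree):
    """True if every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def leaves(tree):
    """Leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]
```

For example, "((hg38,panTro6),(mm10,rn6));" passes the check, while a
taxonomy-derived tree with an unresolved node such as
"((hg38,panTro6,gorGor6),mm10);" does not, and that node would need to be
resolved into pairs before it can guide multiz.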
You can see examples of running pairwise lastz alignments in documentation
files such as:
https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/makeDb/doc/macFas5/lastzRuns.txt
Typical multiz/phastCons/phyloP procedures are documented in:
https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/makeDb/doc/hg38/multiz30way.txt
There are many more such documented examples in the source tree. The most
recent examples are usually the best, since the procedures improve over time.
If you use genome assemblies from the GenArk collection:
https://hgdownload.soe.ucsc.edu/hubs/
or from the assemblies we host directly in the genome browser:
https://hgdownload.soe.ucsc.edu/downloads.html
they will already be adequately RepeatMasked, and their naming schemes will
allow you to display your results on these browsers by setting up a track
hub.
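GenArk assemblies are named by their NCBI assembly accession (GCA_ or
GCF_, nine digits, and a version number). The sketch below turns such an
accession into the URL of its hub.txt, assuming the directory layout used
under https://hgdownload.soe.ucsc.edu/hubs/ splits the nine digits into
three-digit groups; please verify that layout against the hubs page
before relying on it. The function name is hypothetical. UCSC database
names such as hg38 are rejected, since those assemblies are hosted in the
browser directly rather than as GenArk hubs.

```python
import re

# NCBI assembly accession: GCA_ (GenBank) or GCF_ (RefSeq), nine digits,
# then a version, e.g. GCA_009914755.4
ACCESSION_RE = re.compile(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.(\d+)")


def genark_hub_url(accession, base="https://hgdownload.soe.ucsc.edu/hubs"):
    """Build the assumed GenArk hub.txt URL for an NCBI accession."""
    m = ACCESSION_RE.fullmatch(accession)
    if m is None:
        raise ValueError(f"not an NCBI assembly accession: {accession!r}")
    prefix, a, b, c, _version = m.groups()
    # Assumed layout: <base>/<GCA|GCF>/<ddd>/<ddd>/<ddd>/<accession>/hub.txt
    return f"{base}/{prefix}/{a}/{b}/{c}/{accession}/hub.txt"
```

Using the accession itself as the name throughout your own pipeline keeps
your result files directly matchable to the GenArk hub when you build the
track hub for display.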
Please do not hesitate to request assistance with these procedures.
--Hiram
> https://www.thepoollab.org/