Compiling custom bigwig files for Phylop conservation score estimation

Allan-Hermann Pool

Sep 15, 2022, 6:44:49 PM

to gen...@soe.ucsc.edu

Hi,

My name is Allan-Hermann Pool (assistant professor at the University of Texas Southwestern Medical Center), and I am writing to ask whether you have instructions for compiling a bigWig file of phyloP conservation scores from a custom selection of species. I would like to use such a file to calculate enhancer conservation with bigWigAverageOverBed. You have very helpfully made several multi-species phyloP bigWig files available (https://hgdownload.cse.ucsc.edu/goldenpath/hg38/phyloP4way/), but I was not able to locate detailed instructions or code showing how they were generated. The closest description of the track data I found is here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons100way but it is only a general overview. Do you know whether detailed instructions or a tutorial for doing this with any chosen combination of species are available anywhere?

Thank you in advance and all the best,

Allan-Hermann Pool, Ph.D.

Assistant Professor

Department of Neuroscience

University of Texas Southwestern Medical Center

5323 Harry Hines Blvd., NB4.114

Dallas, TX 75390-9111

allan-her...@utsouthwestern.edu

+1-214-648-7753

www.thepoollab.org


Hiram Clawson

Sep 16, 2022, 2:37:25 AM

to Allan-Hermann Pool, gen...@soe.ucsc.edu

Good Evening Dr. Pool:

The procedure to prepare for constructing phyloP predictions is mostly
one of bookkeeping and keeping track of where files are and what is
being paired with what. Consistent naming schemes and directory hierarchy
layouts for the query and target genome assemblies are critical to the
management of these procedures. There are a lot of moving parts.
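
As a minimal sketch of what a consistent naming scheme could look like, the Python below derives one working directory and one maf path per target/query pair. The directory pattern, the function name pairwise_layout, and the example assembly names are illustrative assumptions, not the layout used in the UCSC procedures.

```python
from pathlib import Path


def pairwise_layout(root, target, queries):
    """Return a predictable working directory and maf path for each query.

    The "<root>/<target>/lastz.<query>/" pattern is a hypothetical
    convention chosen for this sketch only.
    """
    layout = {}
    for query in sorted(queries):
        pair_dir = Path(root) / target / f"lastz.{query}"
        layout[query] = {
            "work_dir": pair_dir,
            "maf": pair_dir / f"{target}.{query}.maf",
        }
    return layout


if __name__ == "__main__":
    # Example assembly names; substitute the ones you actually use.
    plan = pairwise_layout("alignments", "hg38", ["mm39", "rheMac10", "canFam6"])
    for query, paths in plan.items():
        print(query, paths["maf"])
```

Deriving every path from the same two names (target and query) means a later step can always find the output of an earlier one without a separate lookup table.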

An outline of the procedure is as follows:

1. Identify the query species genomes to align to the target genome.
2. All genomes, both query and target, must be adequately repeat
   masked to avoid blow-ups during pair-wise alignments.
3. Run pair-wise lastz alignments of all query genome assemblies to the
   target genome assembly.
4. Collect the resulting maf files to prepare for the multiz
   operation. You will need at least a binary taxonomic
   phylogenetic tree to guide the multiz operation. NCBI taxonomy
   is usually good enough for this purpose even if it isn't perfect.
5. The resulting maf file outputs from multiz are the input to
   the phyloP operation.
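
Step 4 asks for a binary tree. As a rough sketch, assuming the tree is available as a plain Newick string, the Python below checks that every internal node has exactly two children before the tree is used to guide multiz. The parser is deliberately minimal (unquoted names, optional branch lengths) and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names.

    Branch lengths and internal node labels are read and discarded.
    Quoted names and comments are not supported in this sketch.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_label():
        # Read a name, dropping an optional ":branch_length" suffix.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].split(":")[0]

    def read_node():
        nonlocal pos
        if text[pos] != "(":
            return read_label()  # a leaf
        pos += 1  # consume "("
        children = [read_node()]
        while text[pos] == ",":
            pos += 1  # consume ","
            children.append(read_node())
        pos += 1  # consume ")"
        read_label()  # skip optional internal label / branch length
        return tuple(children)

    return read_node()


def is_binary(tree):
    """True if every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


if __name__ == "__main__":
    # Example assembly names only; use the species in your own alignment.
    print(is_binary(parse_newick("((hg38,rheMac10),(mm39,rn7));")))
    print(is_binary(parse_newick("(hg38,rheMac10,mm39);")))
```

A tree taken from a taxonomy often contains nodes with three or more children; those need to be resolved into pairs, even arbitrarily, before the tree passes this check.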

You can see examples of running pair-wise lastz alignments in document
files such as:

https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/makeDb/doc/macFas5/lastzRuns.txt

And typical multiz/phastCons/phyloP procedures in:

https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/makeDb/doc/hg38/multiz30way.txt

There are many more such documented examples in the source tree. The most recent documented
examples are usually the best since the procedures improve over time.
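
The documents above remain the reference for the actual procedure. Purely for orientation, the Python sketch below lays out the last few commands that turn a multiz maf into a bigWig and then average it over enhancer regions. It assumes the maf covers a single chromosome, that a neutral model (.mod) file already exists, and that the phyloP options shown match your installed PHAST version (check them against phyloP --help). All file names and the function name are hypothetical.

```python
import shlex
import subprocess


def phylop_bigwig_plan(maf, chrom, neutral_model, chrom_sizes, bed, out_prefix):
    """Build the command plan: phyloP -> wigToBigWig -> bigWigAverageOverBed.

    Each step is a dict with the argv list and, where the tool writes
    to standard output, the file that output should be redirected to.
    """
    wig = f"{out_prefix}.phyloP.wig"
    bigwig = f"{out_prefix}.phyloP.bw"
    table = f"{out_prefix}.phyloP.tab"
    return [
        {
            # Option names are from the PHAST phyloP help as recalled;
            # treat them as assumptions and verify locally.
            "argv": ["phyloP", "--msa-format", "MAF", "--method", "LRT",
                     "--mode", "CONACC", "--wig-scores", "--chrom", chrom,
                     neutral_model, maf],
            "stdout": wig,
        },
        {"argv": ["wigToBigWig", wig, chrom_sizes, bigwig], "stdout": None},
        {"argv": ["bigWigAverageOverBed", bigwig, bed, table], "stdout": None},
    ]


def run_plan(plan):
    """Run each step in order, stopping at the first failure."""
    for step in plan:
        if step["stdout"] is None:
            subprocess.run(step["argv"], check=True)
        else:
            with open(step["stdout"], "w") as out:
                subprocess.run(step["argv"], stdout=out, check=True)


if __name__ == "__main__":
    plan = phylop_bigwig_plan("chr1.maf", "chr1", "neutral.mod",
                              "hg38.chrom.sizes", "enhancers.bed", "chr1")
    for step in plan:
        # Print only; call run_plan(plan) once the tools and inputs exist.
        print(shlex.join(step["argv"]))
```

Building the plan separately from running it makes it easy to inspect the exact commands, or to hand them to a cluster scheduler, before anything is executed.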

If you use genome assemblies from the GenArk collection:

https://hgdownload.soe.ucsc.edu/hubs/

or from the assemblies we host directly in the genome browser:

https://hgdownload.soe.ucsc.edu/downloads.html

they will already be adequately repeat masked, and their naming schemes
will let you display your results in these browsers through a track hub
built from your output.

Please do not hesitate to request assistance with these procedures.

--Hiram

Allan-Hermann Pool

Sep 19, 2022, 12:57:22 PM

to Hiram Clawson, gen...@soe.ucsc.edu

Excellent, thank you, Hiram, for the fast and thorough reply! Will give it a try.

Allan
