liftover files for reciprocal best alignment.

96 views
Skip to first unread message

Jianhai Chen

unread,
Sep 26, 2017, 11:35:19 AM9/26/17
to gen...@soe.ucsc.edu
Dear  experts in gen...@soe.ucsc.edu,

I am a heavy user of your genomic database. I research about mammal evolution. I have a few questions, hoping that you can help.

Firstly, I wanna confirm from you. In this site (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/), there are “hg30Toxxx.over.chain.gz” files, which are liftover files.  Using this kind of files I can generate reciprocal best net based on the codes in http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best, right? My main objective is to get reciprocal best alignment between human and other species, so I wanna confirm whether only the files with names like xxx.over.chain.gz can be used as input files for analyzing.

Secondly, in this site (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/), I find that there are 44 liftover files for 38 species (hg38Toxxx.over.chain.gz file). But currently, in http://hgdownload.cse.ucsc.edu/goldenPath/hg38/multiz100way/, the multiple alignment includes 100 genomes. How Can I get hg38Toxxx.over.chain.gz for other species except for the 44 files that has already been provided ? My currently strategy is to search files like “hg19Toxxx.over.chain.gz" in (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/), then using “liftover” tools to convert hg19 coordinates into hg38 coordinates. Am I right? Another option is to seek your kindly help to offer hg38ToXXX.over.chain.gz files for other species to me. It will be very great if you can provide the files for other species including “hg38ToTupBel.over.chain.gz, hg38ToDipOrd.over.chain.gz, hg38ToCavPri.over.chain.gz, hg38ToSpeTri.over.chain.gz, hg38ToOchPri.over.chain.gz, hg38ToMyoluc.over.chain.gz, hg38ToPteVam.over.chain.gz, hg38ToEriEur.over.chain.gz, hg38ToSorAra.over.chain.gz, hg38ToLoxAfr.over.chain.gz, hg38ToProCap.over.chain.gz, hg38ToEchTel.over.chain.gz, hg38ToDasNov.over.chain.gz, hg38ToChoHof.over.chain.gz, hg38ToTaeGut.over.chain.gz, hg38ToAnoCar.over.chain.gz, hg38ToTetNig.over.chain.gz, hg38ToFr.over.chain.gz, hg38ToGasAcu.over.chain.gz, hg38ToOryLat.over.chain.gz, hg38ToPetMar.over.chain.gz"


Thanks so much,

All the best!

Jianhai Chen

Evolution and Ecology, University of Chicago


Hiram Clawson

unread,
Sep 26, 2017, 12:02:30 PM9/26/17
to Jianhai Chen, gen...@soe.ucsc.edu
Good Morning:

You are correct, the reciprocalBest is constructed from
the over.chain file. In the case where you can not find
an over.chain file, that can be constructed from the 'all.chain'
file via the commands, for example:

======================================================================================
# Make nets ("noClass", i.e. without rmsk/class stats which are added later):
chainPreNet hg38.manLeu1.all.chain.gz /hive/data/genomes/hg38/chrom.sizes
/hive/data/genomes/manLeu1/chrom.sizes stdout \
| chainNet stdin -minSpace=1 /hive/data/genomes/hg38/chrom.sizes
/hive/data/genomes/manLeu1/chrom.sizes stdout /dev/null \
| netSyntenic stdin noClass.net

# Make liftOver chains:
netChainSubset -verbose=0 noClass.net hg38.manLeu1.all.chain.gz stdout \
| chainStitchId stdin stdout | gzip -c > hg38.manLeu1.over.chain.gz
======================================================================================

However, not all the 'all.chain' files or 'over.chain' files are available
on hgdownload for all of the 100-way alignments. You may find the rest
of the files you need from the hgdownload-test server:
http://hgdownload-test.soe.ucsc.edu/goldenPath/hg38/liftOver/

An example procedure constructing a reciprocal best to hg38 is included below.

You can find some of the reciprocal best files have already been
constructed. Check for these directories at hgdownload:

goldenPath/hg38/vsAptMan1/reciprocalBest goldenPath/hg38/vsBisBis1/reciprocalBest
goldenPath/hg38/vsCanFam3/reciprocalBest goldenPath/hg38/vsChlSab2/reciprocalBest
goldenPath/hg38/vsDanRer10/reciprocalBest goldenPath/hg38/vsEquCab2/reciprocalBest
goldenPath/hg38/vsFelCat8/reciprocalBest goldenPath/hg38/vsGorGor4/reciprocalBest
goldenPath/hg38/vsGorGor5/reciprocalBest goldenPath/hg38/vsHetGla2/reciprocalBest
goldenPath/hg38/vsMacEug2/reciprocalBest goldenPath/hg38/vsMacFas5/reciprocalBest
goldenPath/hg38/vsMicMur2/reciprocalBest goldenPath/hg38/vsMm10/reciprocalBest
goldenPath/hg38/vsMusFur1/reciprocalBest goldenPath/hg38/vsNasLar1/reciprocalBest
goldenPath/hg38/vsOrnAna2/reciprocalBest goldenPath/hg38/vsOryCun2/reciprocalBest
goldenPath/hg38/vsOviAri3/reciprocalBest goldenPath/hg38/vsPanPan1/reciprocalBest
goldenPath/hg38/vsPanTro5/reciprocalBest goldenPath/hg38/vsPapAnu2/reciprocalBest
goldenPath/hg38/vsPonAbe2/reciprocalBest goldenPath/hg38/vsRheMac3/reciprocalBest
goldenPath/hg38/vsRheMac8/reciprocalBest goldenPath/hg38/vsRn6/reciprocalBest
goldenPath/hg38/vsSaiBol1/reciprocalBest goldenPath/hg38/vsTarSyr2/reciprocalBest
goldenPath/hg38/vsTriMan1/reciprocalBest goldenPath/hg38/vsTurTru2/reciprocalBest
goldenPath/hg38/vsVicPac2/reciprocalBest goldenPath/hg38/vsXenTro7/reciprocalBest

And there are many more on the hgdownload-test server:
http://hgdownload-test.soe.ucsc.edu/goldenPath/hg38/vs*/reciprocalBest/

vsAilMel1/reciprocalBest vsAllMis1/reciprocalBest vsAmaVit1/reciprocalBest
vsAnaPla1/reciprocalBest vsAnoCar2/reciprocalBest vsApaSpi1/reciprocalBest
vsAptMan1/reciprocalBest vsAquChr2/reciprocalBest vsAraMac1/reciprocalBest
vsAstMex1/reciprocalBest vsBisBis1/reciprocalBest vsCalJac3/reciprocalBest
vsCamFer1/reciprocalBest vsCanFam3/reciprocalBest vsCapHir1/reciprocalBest
vsCavPor3/reciprocalBest vsCerSim1/reciprocalBest vsCheMyd1/reciprocalBest
vsChiLan1/reciprocalBest vsChlSab2/reciprocalBest vsChrAsi1/reciprocalBest
vsChrPic2/reciprocalBest vsColAng1/reciprocalBest vsColLiv1/reciprocalBest
vsConCri1/reciprocalBest vsCriGri1/reciprocalBest vsCriGriChoV1/reciprocalBest
vsDanRer10/reciprocalBest vsDanRer11/reciprocalBest vsEchTel2/reciprocalBest
vsEleEdw1/reciprocalBest vsEptFus1/reciprocalBest vsEquCab2/reciprocalBest
vsEriEur2/reciprocalBest vsFalChe1/reciprocalBest vsFalPer1/reciprocalBest
vsFelCat8/reciprocalBest vsFicAlb2/reciprocalBest vsFr3/reciprocalBest
vsGadMor1/reciprocalBest vsGalGal4/reciprocalBest vsGalGal5/reciprocalBest
vsGalVar1/reciprocalBest vsGasAcu1/reciprocalBest vsGeoFor1/reciprocalBest
vsGorGor3/reciprocalBest vsGorGor4/reciprocalBest vsGorGor5/reciprocalBest
vsHapBur1/reciprocalBest vsHetGla2/reciprocalBest vsJacJac1/reciprocalBest
vsLatCha1/reciprocalBest vsLepOcu1/reciprocalBest vsLepWed1/reciprocalBest
vsLoxAfr3/reciprocalBest vsMacEug2/reciprocalBest vsMacFas5/reciprocalBest
vsMacNem1/reciprocalBest vsManLeu1/reciprocalBest vsManPen1/reciprocalBest
vsMayZeb1/reciprocalBest vsMelGal1/reciprocalBest vsMelGal5/reciprocalBest
vsMelUnd1/reciprocalBest vsMesAur1/reciprocalBest vsMicMur1/reciprocalBest
vsMicMur2/reciprocalBest vsMicMur3/reciprocalBest vsMicOch1/reciprocalBest
vsMm10/reciprocalBest vsMonDom5/reciprocalBest vsMusFur1/reciprocalBest
vsMyoDav1/reciprocalBest vsMyoLuc2/reciprocalBest vsNasLar1/reciprocalBest
vsNeoBri1/reciprocalBest vsNomLeu3/reciprocalBest vsOchPri3/reciprocalBest
vsOctDeg1/reciprocalBest vsOdoRosDiv1/reciprocalBest vsOrcOrc1/reciprocalBest
vsOreNil2/reciprocalBest vsOreNil3/reciprocalBest vsOrnAna1/reciprocalBest
vsOrnAna2/reciprocalBest vsOryAfe1/reciprocalBest vsOryCun2/reciprocalBest
vsOryLat2/reciprocalBest vsOtoGar3/reciprocalBest vsOviAri3/reciprocalBest
vsPanHod1/reciprocalBest vsPanPan1/reciprocalBest vsPanPan2/reciprocalBest
vsPanTro4/reciprocalBest vsPanTro5/reciprocalBest vsPapAnu2/reciprocalBest
vsPapAnu3/reciprocalBest vsPelSin1/reciprocalBest vsPetMar2/reciprocalBest
vsPonAbe2/reciprocalBest vsPseHum1/reciprocalBest vsPteAle1/reciprocalBest
vsPteVam1/reciprocalBest vsPunNye1/reciprocalBest vsRheMac3/reciprocalBest
vsRheMac8/reciprocalBest vsRhiBie1/reciprocalBest vsRhiRox1/reciprocalBest
vsRn6/reciprocalBest vsSaiBol1/reciprocalBest vsSarHar1/reciprocalBest
vsSorAra2/reciprocalBest vsSpeTri2/reciprocalBest vsSusScr11/reciprocalBest
vsSusScr3/reciprocalBest vsTaeGut2/reciprocalBest vsTakFla1/reciprocalBest
vsTarSyr2/reciprocalBest vsTetNig2/reciprocalBest vsTriMan1/reciprocalBest
vsTupBel1/reciprocalBest vsTupChi1/reciprocalBest vsTurTru2/reciprocalBest
vsVicPac2/reciprocalBest vsXenLae2/reciprocalBest vsXenTro7/reciprocalBest
vsXenTro9/reciprocalBest vsXipMac1/reciprocalBest vsZonAlb1/reciprocalBest

--Hiram

On 9/26/17 7:05 AM, Jianhai Chen wrote:
> Dear experts in gen...@soe.ucsc.edu<mailto:gen...@soe.ucsc.edu>,
============================================================================================

#!/bin/csh -efx
# This script was automatically generated by /cluster/bin/scripts/doRecipBest.pl
# It is to be executed on hgwdev in /hive/data/genomes/hg38/bed/lastzManLeu1.2017-09-25/axtChain .
# It nets in both directions to get reciprocal best chains and nets.
# This script will fail if any of its commands fail.

cd /hive/data/genomes/hg38/bed/lastzManLeu1.2017-09-25/axtChain

# Swap hg38-best chains to be manLeu1-referenced:
chainStitchId hg38.manLeu1.over.chain.gz stdout \
| chainSwap stdin stdout \
| chainSort stdin manLeu1.hg38.tBest.chain

# Net those on manLeu1 to get manLeu1-ref'd reciprocal best net:
chainPreNet manLeu1.hg38.tBest.chain \
/hive/data/genomes/{manLeu1,hg38}/chrom.sizes stdout \
| chainNet -minSpace=1 -minScore=0 \
stdin /hive/data/genomes/{manLeu1,hg38}/chrom.sizes stdout /dev/null \
| netSyntenic stdin stdout \
| gzip -c > manLeu1.hg38.rbest.net.gz

# Extract manLeu1-ref'd reciprocal best chain:
netChainSubset manLeu1.hg38.rbest.net.gz manLeu1.hg38.tBest.chain stdout \
| chainStitchId stdin stdout \
| gzip -c > manLeu1.hg38.rbest.chain.gz

# Swap to get hg38-ref'd reciprocal best chain:
chainSwap manLeu1.hg38.rbest.chain.gz stdout \
| chainSort stdin stdout \
| gzip -c > hg38.manLeu1.rbest.chain.gz

# Net those on hg38 to get hg38-ref'd reciprocal best net:
chainPreNet hg38.manLeu1.rbest.chain.gz \
/hive/data/genomes/{hg38,manLeu1}/chrom.sizes stdout \
| chainNet -minSpace=1 -minScore=0 \
stdin /hive/data/genomes/{hg38,manLeu1}/chrom.sizes stdout /dev/null \
| netSyntenic stdin stdout \
| gzip -c > hg38.manLeu1.rbest.net.gz

# Clean up the one temp file and make md5sum:
rm manLeu1.hg38.tBest.chain
md5sum *.rbest.*.gz > md5sum.rbest.txt

# Create files for testing coverage of *.rbest.*.
netToBed -maxGap=1 manLeu1.hg38.rbest.net.gz manLeu1.hg38.rbest.net.bed
netToBed -maxGap=1 hg38.manLeu1.rbest.net.gz hg38.manLeu1.rbest.net.bed
chainToPsl manLeu1.hg38.rbest.chain.gz \
/hive/data/genomes/{manLeu1,hg38}/chrom.sizes \
/hive/data/genomes/manLeu1/manLeu1.2bit /hive/data/genomes/hg38/hg38.2bit \
manLeu1.hg38.rbest.chain.psl
chainToPsl hg38.manLeu1.rbest.chain.gz \
/hive/data/genomes/{hg38,manLeu1}/chrom.sizes \
/hive/data/genomes/hg38/hg38.2bit /hive/data/genomes/manLeu1/manLeu1.2bit \
hg38.manLeu1.rbest.chain.psl

# Verify that all coverage figures are equal:
set tChCov = `awk '{print $19;}' hg38.manLeu1.rbest.chain.psl | sed -e 's/,/\n/g' | awk 'BEGIN
{N = 0;} {N += $1;} END {printf "%d\n", N;}'`
set qChCov = `awk '{print $19;}' manLeu1.hg38.rbest.chain.psl | sed -e 's/,/\n/g' | awk 'BEGIN
{N = 0;} {N += $1;} END {printf "%d\n", N;}'`
set tNetCov = `awk 'BEGIN {N = 0;} {N += ($3 - $2);} END {printf "%d\n", N;}'
hg38.manLeu1.rbest.net.bed`
set qNetCov = `awk 'BEGIN {N = 0;} {N += ($3 - $2);} END {printf "%d\n", N;}'
manLeu1.hg38.rbest.net.bed`
if ($tChCov != $qChCov) then
echo "Warning: hg38 rbest chain coverage $tChCov != manLeu1 $qChCov"
endif
if ($tNetCov != $qNetCov) then
echo "Warning: hg38 rbest net coverage $tNetCov != manLeu1 $qNetCov"
endif
if ($tChCov != $tNetCov) then
echo "Warning: hg38 rbest chain coverage $tChCov != net cov $tNetCov"
endif

mkdir experiments
mv *.bed *.psl experiments
# Make rbest net axt's download
mkdir ../axtRBestNet
netToAxt hg38.manLeu1.rbest.net.gz hg38.manLeu1.rbest.chain.gz \
/hive/data/genomes/hg38/hg38.2bit /hive/data/genomes/manLeu1/manLeu1.2bit stdout \
| axtSort stdin stdout \
| gzip -c > ../axtRBestNet/hg38.manLeu1.rbest.axt.gz
# Make rbest mafNet for multiz
mkdir ../mafRBestNet
axtToMaf -tPrefix=hg38. -qPrefix=manLeu1. ../axtRBestNet/hg38.manLeu1.rbest.axt.gz \
/hive/data/genomes/{hg38,manLeu1}/chrom.sizes \
stdout \
| gzip -c > ../mafRBestNet/hg38.manLeu1.rbest.maf.gz
cd ../mafRBestNet
md5sum *.maf.gz > md5sum.txt
cd ../axtRBestNet
md5sum *.axt.gz > md5sum.txt

Jianhai Chen

unread,
Sep 26, 2017, 12:30:38 PM9/26/17
to Hiram Clawson, gen...@soe.ucsc.edu
Dear Hiram,

Thanks So much. You really help me a lot.

I found all the files I needed.

All the best,

Jianhai Chen


在 2017年9月26日,上午11:02,Hiram Clawson <hi...@soe.ucsc.edu> 写道:


Hiram Clawson

unread,
Dec 30, 2017, 1:14:21 AM12/30/17
to Jianhai Chen, gen...@soe.ucsc.edu
Good Evening Jianhai:

You can always find out exactly what our procedures are from the
source tree documents in:
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/makeDb/doc

in files like .../doc/*/multiz*

for example:

http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/hg/makeDb/doc/hg38/multiz30way.txt

Note the section in that documentation with the heading:

#############################################################################
# hgPal downloads (DONE - 2017-11-06 - Hiram)

There is a script there using mafGene to extract the gene sequences from
the gene track on the browser and the given MAF file.

Also please note, it is always best to use the email mail lists for
the browser group so that anyone on staff can answer your queries at
any time: http://genome.ucsc.edu/contacts.html
e.g.: gen...@soe.ucsc.edu (copied above)

--Hiram

On 12/29/17 6:52 PM, Jianhai Chen wrote:
> Dear Hiram,
>
> Thanks so much for your previous help. I have created the whole genome alignment for a lot of bird genomes based on your previous suggestion. But I have another question about alignment, hope you can give some suggestions.
>
> Now I need to extract the coding region alignment from MAF file. For example, I need amino acid alignment and CDS alignment. Is there any tools to achieve that ? I have gene annotation for all species.
>
> In the download section of UCSC browser, I found 100way alignments, which have the similar alignment files for mammals. For example, knownCanonical.exonAA.fa.gz, knownCanonical.exonNuc.fa.gz, knownGene.exonAA.fa.gz, knownGene.exonNuc.fa.gz, refGene.exonAA.fa.gz, refGene.exonNuc.fa.gz are alignment of specific regions and their translation alignment. How to get these kind of files using multiple-species-genome maf file of other species, like a lot of birds I used, which are not in UCSC database ?
>
> All the best,
>
> Jianhai Chen
> Evolution and Ecology, University of Chicago
>
>
> On Sep 26, 2017, at 11:02, Hiram Clawson <hi...@soe.ucsc.edu<mailto:hi...@soe.ucsc.edu>> wrote:
>
> Good Morning:
>
> You are correct, the reciprocalBest is constructed from
> the over.chain file. In the case where you can not find
> an over.chain file, that can be constructed from the 'all.chain'
> file via the commands, for example:
>
> ======================================================================================
> # Make nets ("noClass", i.e. without rmsk/class stats which are added later):
> chainPreNet hg38.manLeu1.all.chain.gz /hive/data/genomes/hg38/chrom.sizes /hive/data/genomes/manLeu1/chrom.sizes stdout \
> | chainNet stdin -minSpace=1 /hive/data/genomes/hg38/chrom.sizes /hive/data/genomes/manLeu1/chrom.sizes stdout /dev/null \
> | netSyntenic stdin noClass.net<http://noclass.net/>
>
> # Make liftOver chains:
> netChainSubset -verbose=0 noClass.net<http://noclass.net/> hg38.manLeu1.all.chain.gz stdout \
> Dear experts in gen...@soe.ucsc.edu<mailto:gen...@soe.ucsc.edu><mailto:gen...@soe.ucsc.edu>,

Jianhai Chen

unread,
Jan 2, 2018, 12:02:04 PM1/2/18
to gen...@soe.ucsc.edu
Dear genome browser staff,

when I run the command mafGene, using the following command:
mafGene -chrom=2L -exons gm.fa 2Lbestsingle.maf gm48.GenePred order.list stdout | gzip -c > 2Lbestsingle.AA.maf

I got the following error notice:

can't find database gm in hg.conf, should have a default named "db"

It seems that I need to produce database for multiple genome fasta files first, right ? How to produce this kind of database ?

All the best,
Jianhai


> On Dec 30, 2017, at 00:16, Jianhai Chen <jianh...@uchicago.edu> wrote:
>
> Thanks so much. I will try your suggestion.
>
> Good night!
>
> All the best,
>
> Jianhai

Jianhai Chen

unread,
Jan 2, 2018, 12:02:04 PM1/2/18
to Hiram Clawson, gen...@soe.ucsc.edu
Thanks so much. I will try your suggestion.

Good night!

All the best,

Jianhai

> On Dec 30, 2017, at 00:14, Hiram Clawson <hi...@soe.ucsc.edu> wrote:
>

Hiram Clawson

unread,
Jan 2, 2018, 1:58:59 PM1/2/18
to Jianhai Chen, gen...@soe.ucsc.edu
Good Morning Jianhai:

You will need a database for your genomes in order to run the mafGene command.
Each database will need to have four tables:

chromInfo
yourGeneTable
extFile
yourMultizNwayTable

The chromInfo table is created from your .2bit file:

twoBitInfo dbName.2bit stdout \
awk '{printf "%s\t%d\t/gbdb/dbName/dbName.2bit\n", $1, $2}' \
> dbName.chromInfo.tab

hgLoadSqlTab dbName chromInfo ~/kent/src/hg/lib/chromInfo.sql \
dbName.chromInfo.tab

The gene table is constructed from the genePred file for the genes:

hgLoadGeneProed -genePredExt dbName yourGeneTable geneFile.gp

The extFile and multiz table are loaded with the hgLoadMaf command:

hgLoadMaf -pathPrefix=/gbdb/dbName/multizNway/maf dbName multizNway

Assuming your maf file(s) is/are in the directory: /gbdb/dbName/multizNway/maf/*.maf

--Hiram

Jianhai Chen

unread,
Jan 2, 2018, 2:18:08 PM1/2/18
to Hiram Clawson, gen...@soe.ucsc.edu
Happy new year, Hiram,

Thanks so much. You are so helpful.

I will try the codes you provided.

All the best,
Jianhai Chen

Reply all
Reply to author
Forward
0 new messages