reciprocal chain

294 views
Skip to first unread message

Denas, Olgert

unread,
Sep 14, 2012, 11:37:11 AM9/14/12
to gen...@soe.ucsc.edu
Hi all,
I am having problems with human-mouse reciprocal best chains, which, if I understand correctly, are single coverage on both species.

I wrote a makefile that follows doRecipBest.pl, but the resulting chains are not single coverage on mouse. Am I doing or have I assumed) something wrong?

many thanks,
olgert

Here is the makefile

TSIZES = ../../originalPeaks/hg19.chrom.sizes
QSIZES = ../../originalPeaks/mm9.chrom.sizes

# start with human-ref'ed chains
INPUT_CHAINS = ../hg19ToMm9.over.chain

all : hg19.mm9.rBest.chain mm9.hg19.rBest.chain

# swap chains to be query-referenced
mm9.hg19.tBest.chain : $(INPUT_CHAINS)
chainStitchId $< stdout \
| chainSwap stdin stdout \
| chainSort stdin $@

# net swapped chains, tareget positions should be now single coverage
mm9.hg19.rBest.net : mm9.hg19.tBest.chain $(QSIZES) $(TSIZES)
chainPreNet $< $(QSIZES) $(TSIZES) stdout \
| chainNet -minSpace=1 -minScore=0 stdin $(QSIZES) $(TSIZES) stdout /dev/null\
| netSyntenic stdin $@

# extract chains from nets (query-referenced), this should have single coverage in mouse
mm9.hg19.rBest.chain : mm9.hg19.rBest.net
netChainSubset $< mm9.hg19.tBest.chain stdout \
| chainStitchId stdin $@

# swap chains to target-referenced
hg19.mm9.rBest.chain : mm9.hg19.rBest.chain
chainSwap $< stdout \
| chainSort stdin $@




________________________________

This e-mail message (including any attachments) is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this message (including any attachments) is strictly
prohibited.

If you have received this message in error, please contact
the sender by reply e-mail message and destroy all copies of the
original message (including attachments).

Brooke Rhead

unread,
Sep 18, 2012, 10:21:21 PM9/18/12
to Denas, Olgert, gen...@soe.ucsc.edu
Hi Olgert,

One of our engineers created the script below and tested it here for
you. It appears to be working fine. The results are available at:
http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/vsMm10/reciprocalBest/.

If you have further questions, please reply to gen...@soe.ucsc.edu.

#!/bin/csh -efx
# This script was automatically generated by
/cluster/bin/scripts/doRecipBest.pl
# It is to be executed on swarm in
/hive/data/genomes/hg19/bed/lastzMm10.2012-03-07/axtChain .
# It nets in both directions to get reciprocal best chains and nets.
# This script will fail if any of its commands fail.

cd /hive/data/genomes/hg19/bed/lastzMm10.2012-03-07/axtChain

# Swap hg19-best chains to be mm10-referenced:
chainStitchId hg19.mm10.over.chain.gz stdout \
| chainSwap stdin stdout \
| chainSort stdin mm10.hg19.tBest.chain

# Net those on mm10 to get mm10-ref'd reciprocal best net:
chainPreNet mm10.hg19.tBest.chain \
/hive/data/genomes/{mm10,hg19}/chrom.sizes stdout \
| chainNet -minSpace=1 -minScore=0 \
stdin /hive/data/genomes/{mm10,hg19}/chrom.sizes stdout /dev/null \
| netSyntenic stdin stdout \
| gzip -c > mm10.hg19.rbest.net.gz

# Extract mm10-ref'd reciprocal best chain:
netChainSubset mm10.hg19.rbest.net.gz mm10.hg19.tBest.chain stdout \
| chainStitchId stdin stdout \
| gzip -c > mm10.hg19.rbest.chain.gz

# Swap to get hg19-ref'd reciprocal best chain:
chainSwap mm10.hg19.rbest.chain.gz stdout \
| chainSort stdin stdout \
| gzip -c > hg19.mm10.rbest.chain.gz

# Net those on hg19 to get hg19-ref'd reciprocal best net:
chainPreNet hg19.mm10.rbest.chain.gz \
/hive/data/genomes/{hg19,mm10}/chrom.sizes stdout \
| chainNet -minSpace=1 -minScore=0 \
stdin /hive/data/genomes/{hg19,mm10}/chrom.sizes stdout /dev/null \
| netSyntenic stdin stdout \
| gzip -c > hg19.mm10.rbest.net.gz

# Clean up the one temp file and make md5sum:
rm mm10.hg19.tBest.chain
md5sum *.rbest.*.gz > md5sum.rbest.txt

# Create files for testing coverage of *.rbest.*.
netToBed -maxGap=1 mm10.hg19.rbest.net.gz mm10.hg19.rbest.net.bed
netToBed -maxGap=1 hg19.mm10.rbest.net.gz hg19.mm10.rbest.net.bed
chainToPsl mm10.hg19.rbest.chain.gz \
/hive/data/genomes/{mm10,hg19}/chrom.sizes \
/hive/data/genomes/mm10/mm10.2bit /hive/data/genomes/hg19/hg19.2bit \
mm10.hg19.rbest.chain.psl
chainToPsl hg19.mm10.rbest.chain.gz \
/hive/data/genomes/{hg19,mm10}/chrom.sizes \
/hive/data/genomes/hg19/hg19.2bit /hive/data/genomes/mm10/mm10.2bit \
hg19.mm10.rbest.chain.psl

# Verify that all coverage figures are equal:
set tChCov = `awk '{print $19;}' hg19.mm10.rbest.chain.psl | sed -e
's/,/\n/g' | awk 'BEGIN {N = 0;} {N += $1;} END {printf "%d\n", N;}'`
set qChCov = `awk '{print $19;}' mm10.hg19.rbest.chain.psl | sed -e
's/,/\n/g' | awk 'BEGIN {N = 0;} {N += $1;} END {printf "%d\n", N;}'`
set tNetCov = `awk 'BEGIN {N = 0;} {N += ($3 - $2);} END {printf "%d\n",
N;}' hg19.mm10.rbest.net.bed`
set qNetCov = `awk 'BEGIN {N = 0;} {N += ($3 - $2);} END {printf "%d\n",
N;}' mm10.hg19.rbest.net.bed`
if ($tChCov != $qChCov) then
echo "Warning: hg19 rbest chain coverage $tChCov != mm10 $qChCov"
endif
if ($tNetCov != $qNetCov) then
echo "Warning: hg19 rbest net coverage $tNetCov != mm10 $qNetCov"
endif
if ($tChCov != $tNetCov) then
echo "Warning: hg19 rbest chain coverage $tChCov != net cov $tNetCov"
endif

mkdir experiments
mv *.bed *.psl experiments
# Make rbest net axt's download: one .axt per hg19 seq.
netSplit hg19.mm10.rbest.net.gz rBestNet
chainSplit rBestChain hg19.mm10.rbest.chain.gz
cd ..
mkdir axtRBestNet
foreach f (axtChain/rBestNet/*.net)
netToAxt $f axtChain/rBestChain/$f:t:r.chain \
/hive/data/genomes/hg19/hg19.2bit /hive/data/genomes/mm10/mm10.2bit
stdout \
| axtSort stdin stdout \
| gzip -c > axtRBestNet/$f:t:r.hg19.mm10.net.axt.gz
end

# Make rbest mafNet for multiz: one .maf per hg19 seq.
mkdir mafRBestNet
foreach f (axtRBestNet/*.hg19.mm10.net.axt.gz)
axtToMaf -tPrefix=hg19. -qPrefix=mm10. $f \
/hive/data/genomes/{hg19,mm10}/chrom.sizes \
stdout \
| gzip -c > mafRBestNet/$f:t:r:r:r:r:r.maf.gz
end


--
Brooke Rhead
UCSC Genome Bioinformatics Group
Reply all
Reply to author
Forward
0 new messages