Whole genome alignment of dm6 to droSim1

69 views
Skip to first unread message

Aimei Dai

unread,
Nov 19, 2019, 12:18:57 PM11/19/19
to gen...@soe.ucsc.edu
To whom it may concern,

I am a PhD student from China. I want to do the whole genome alignment of Dsim to Dyak since the axt file between those two spcies cannot be download from the UCSC genome browser websites. To make it precise, I do the whole genome alignment of dm6 to droSim1 by using the runLastzChain.sh from http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto#doBlastzChainNet.pl, and the 2bit files were downloaded from UCSC websites. But I got much less aligned regions than yours. The parameters in runLastzChain.sh were as follows:

export chainFar="-minScore=3000 -linearGap=loose"
export lastzFar="C=0 E=30 H=2000 K=2200 L=4000 M=50 O=400 Q=~/whole_genome_alignment/DoBlastzChainNet/data/lastz/HoxD55.q T=2 Y=3400"
export chainParams="$chainFar"
export lastzParams="$lastzFar"
export TNAME=dm6
export QNAME=droSim1

After runLastzChain.sh, I run "chainPreNet ${TNAME}.${QNAME}.all.chain.gz ${TNAME}.chrom.sizes ${QNAME}.chrom.sizes stdout | chainNet -minSpace=1 stdin ${TNAME}.chrom.sizes ${QNAME}.chrom.sizes stdout ./dev/null | netSyntenic stdin noClass.net

netToAxt -verbose=0 noClass.net ${TNAME}.${QNAME}.all.chain.gz ${TNAME}.2bit ${QNAME}.2bit stdout | axtSort stdin stdout | gzip -c > ${TNAME}.${QNAME}.net.axt.gz" to get .net.axt file. Could you please help me to find the problems? I will appreciate it very much.

Best wishes!
Aimei DAI

----
Aimei DAI
State Key Laboratory of Biocontrol
School of Life Science
Sun Yat-sen University
Guangzhou, China

Luis Nassar

unread,
Nov 19, 2019, 8:26:28 PM11/19/19
to Aimei Dai, gen...@soe.ucsc.edu

Hello Aimei,

Thank you for your interest in the Genome Browser and for your question regarding genome alignments.

The protocol you have followed (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto) is an older method, and we now recommend our new pipeline instead (http://genomewiki.ucsc.edu/index.php/DoBlastzChainNet.pl). The new method DoBlastzChainNet.pl automatically does many of the steps, though it does have other dependencies. That being said, we will also offer advice as best we can for the old method.

Comparing the parameters used by us in our alignments with the ones you supplied yields only one difference. You used a blastz param of M=50, when we used M=0. An M=50 will exclude positions that appear in >50 alignments to some {sequence, strand} from being used as seeds in subsequent searches. You could try to set it to M=0 and see if that provides closer results.

Another possible source of difference could be where you got the underlying genome file. Where both of these genome files from our download server? E.x. http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/?

Finally, this pipeline has give many users issues in the past. You may also search our mailing list archives to troubleshoot issues that come up, here is an example of a similar question.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining

Our alignment parameters:

##aligner=lastz.v1.03.52 L=4000  Y=3400  H=2000  Q=/cluster/data/blastz/HoxD55.q  K=2200
##matrix=lastz.v1.03.52 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91
##gapPenalties=lastz.v1.03.52 O=400 E=30
##blastzParms=O=400,E=30,K=2200,L=4000,M=0

# D. melanogaster vs. D. simulans
BLASTZ=/cluster/bin/penn/lastz-distrib-1.03.52/bin/lastz
BLASTZ_H=2000
BLASTZ_Y=3400
BLASTZ_L=4000
BLASTZ_K=2200
BLASTZ_Q=/cluster/data/blastz/HoxD55.q

# TARGET - D. melanogaster
SEQ1_DIR=/hive/data/genomes/dm6/dm6.2bit
SEQ1_CHUNK=5000000
SEQ1_LAP=10000
SEQ1_LEN=/hive/data/genomes/dm6/chrom.sizes

# QUERY - D. simulans
SEQ2_DIR=/hive/data/genomes/droSim1/droSim1.2bit
SEQ2_CHUNK=5000000
SEQ2_LAP=10000
SEQ2_LEN=/hive/data/genomes/droSim1/chrom.sizes

BASE=/hive/data/genomes/dm6/bed/lastzDroSim1.2014-08-29
TMPDIR=/dev/shm

chain command parameters:

#!/bin/csh -ef
zcat ../../pslParts/$1*.psl.gz \
| axtChain -psl -verbose=0 -scoreScheme=/cluster/data/blastz/HoxD55.q  -minScore=3000 
-linearGap=loose stdin \
/hive/data/genomes/dm6/dm6.2bit \
/hive/data/genomes/droSim1/droSim1.2bit \
stdout \
| chainAntiRepeat /hive/data/genomes/dm6/dm6.2bit \
/hive/data/genomes/droSim1/droSim1.2bit \
stdin $2

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/MN2PR22MB18372B72561E06C6E233300AA64C0%40MN2PR22MB1837.namprd22.prod.outlook.com.

Aimei Dai

unread,
Nov 20, 2019, 12:20:49 PM11/20/19
to Luis Nassar, gen...@soe.ucsc.edu
Hi Luis,

Thanks for your kindly reply. I have problems in setting up the parasol batch system when running DoBlastzChainNet.pl. So I use the runLastzChain.sh. I re-run runLastzChain.sh using the parameters you told me, and I think I have found where the problems are. I got messages when run jobList: "chr2R:0-5020000 isn't in liftSpec file line 10 of stdin
chr3R:0-5020000 isn't in liftSpec file line 5 of stdin ...", and they are empty in the chain file. How can I solve this problem? 

Thanks again.

Aimei

----
Aimei DAI
State Key Laboratory of Biocontrol
School of Life Science
Sun Yat-sen University
Guangzhou, China

From: Luis Nassar <lrna...@ucsc.edu>
Sent: Wednesday, November 20, 2019 9:25 AM
To: Aimei Dai <daia...@live.com>
Cc: gen...@soe.ucsc.edu <gen...@soe.ucsc.edu>
Subject: Re: [genome] Whole genome alignment of dm6 to droSim1
 

Brian Lee

unread,
Nov 22, 2019, 3:18:24 PM11/22/19
to Aimei Dai, Luis Nassar, gen...@soe.ucsc.edu
Dear Aimei,

Thank you for using the UCSC Genome Browser, we have some questions that might help us figure out what is happening.

1) Where did you get the .2bit files? Were both of these genome files from our download server?
E.x. http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/?
http://hgdownload.soe.ucsc.edu/goldenPath/droSim1/bigZips/?

2) Do you see warnings or errors when running ./constructLiftFile.pl? If not, can you send the contents of target.lift and query.lift?

3) For your input FASTA, are you completely sure it is ours (e.g. dm6.2bit and droSim1.2bit)?

If it was from some other source, that could explain the problem, and it would be good to know what masking was applied before alignment.  Both dm6 and droSim1 have chr2R (and chr2L) sequences so the "isn't in liftSpec file line 10 of stdin" is evidence that the genome file used differs from ours.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Aimei Dai

unread,
Nov 25, 2019, 11:06:12 AM11/25/19
to Brian Lee, gen...@soe.ucsc.edu
Dear Brian,

Thank you for your kindly reply. I found the problem. The formats of first fragments (0-5020000) of  chr2R, chr2L, chr3R, chr3L and chrX in the .lift file created by constructLiftFile.pl is different to other fragments. I changed the constructLiftFile.pl and solved this problem, but other chromosomes (such as chr4) arised the same problem. The two lift files were attached. Thank you again. By the way, the 2bit files were downloaded from the sites you provided.

Sincerely,
Aimei

----
Aimei DAI
State Key Laboratory of Biocontrol
School of Life Science
Sun Yat-sen University
Guangzhou, China

From: Brian Lee <bria...@soe.ucsc.edu>
Sent: Saturday, November 23, 2019 4:17 AM
To: Aimei Dai <daia...@live.com>
Cc: Luis Nassar <lrna...@ucsc.edu>; gen...@soe.ucsc.edu <gen...@soe.ucsc.edu>
query.lift
target.lift

Hiram Clawson

unread,
Nov 26, 2019, 7:48:58 PM11/26/19
to Aimei Dai, Brian Lee, gen...@soe.ucsc.edu
Good Afternoon Aimei:

Can you please list the commands you are using to perform
this procedure and what source files you use for inputs.

The script runLastzChain.sh can be used with the '-debug' flag
to allow it to create the shell scripts for each step of the
procedure without actually performing the procedures. Then,
you can work through each script individually to perform
the steps. This would give you a guideline of how the steps
are performed in our procedures.

If you are actually working on dm6 to droSim1, you can pick
up the results of that alignment from our test server:
https://hgdownload-test.gi.ucsc.edu/goldenPath/dm6/vsDroSim1/
(note, this server does have outdated security certificates)

--Hiram
Reply all
Reply to author
Forward
0 new messages