Dear colleagues,
I am following the “Whole_genome_alignment_howto” wiki page (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto) to learn how to generate a whole genome alignment. I hope you can help me with a few steps where I got confused and stuck.
Step 2 ends with the command: hgLoadChain ci2 chainCioSav2 all.pre.chain
Is this step necessary for creating the alignment, or is it only a command for visualizing the track?
This step and a following step need ci2 and cioSav2 as input. Where can I get these databases? I found the database link for ci2, but I am not sure which file there is the database this command refers to: http://hgdownload.soe.ucsc.edu/goldenPath/ci2/database
Also, where can I find the track file chainCioSav2?
Thanks a lot.
Hello, XiaoJu.
The hgLoadChain command loads the chains into a database table so that the alignment can be viewed as a track. It is not needed to create the alignment; it is only necessary if you want to view the alignment once you have created it.
Regarding the ci2 and cioSav2 databases, tables and files, some of these exist only on our development server and some of them don’t even exist there. The wiki page was constructed merely as an example of the procedures to follow and assumes that you have your own mirror installed that includes these databases. I assume your interest is not specifically in the ci2 or cioSav2 assemblies, but in following the procedure. Also note the “Example, step1: Alignments with Blastz” section, which links to a script, http://genomewiki.ucsc.edu/images/9/93/RunLastzChain_sh.txt, that runs without any database requirements.
Please contact us again at gen...@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group
--
Thank you Luvina and UCSC Genome group,
Your hint helped me connect the dots. Now I understand why we need to install the UCSC Genome Browser in order to align sequences. I had actually already installed a Genome Browser mirror with a script. To share what I have learned with others who might also be interested in creating alignments, I will keep posting to this thread.
To install the UCSC Genome Browser, there is a script that is also mentioned in UCSC’s official installation instructions:
Download it from GitHub: https://github.com/maximilianh/browserInstall/blob/master/browserInstall.sh
It makes installation much easier, and it also offers commands to download and import the databases you are referring to. Detailed instructions can be found in the repo's README.
Back to the alignment generation.
In the step-by-step Whole genome alignment HowTo tutorial (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto), the chaining steps are followed by several more steps: Netting, Maffing, and Phastcons. However, the scripted method (http://genomewiki.ucsc.edu/images/9/93/RunLastzChain_sh.txt) seems to end at the chaining step, resulting in an all.chain.gz file.
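For anyone who wants to check what the all.chain.gz file contains before moving on, here is a minimal Python sketch that reads only the chain header lines. It assumes the standard UCSC chain format, in which each chain starts with a line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id". The function name and field names below are my own and are not part of any UCSC tool.

```python
import gzip
from collections import namedtuple

# Field names are illustrative; the order follows the UCSC chain header line.
ChainHeader = namedtuple(
    "ChainHeader",
    "score t_name t_size t_strand t_start t_end "
    "q_name q_size q_strand q_start q_end chain_id",
)


def read_chain_headers(path):
    """Yield one ChainHeader per chain in a plain or gzipped chain file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Alignment block lines and blank lines are skipped;
            # only lines beginning with "chain " describe a new chain.
            if not line.startswith("chain "):
                continue
            f = line.split()
            yield ChainHeader(
                float(f[1]), f[2], int(f[3]), f[4], int(f[5]), int(f[6]),
                f[7], int(f[8]), f[9], int(f[10]), int(f[11]), f[12],
            )
```

Summing t_end - t_start over the headers gives a rough idea of how much of the target each chain spans, although chains can overlap, so this is not a coverage figure.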
My questions are:
The Chains and nets introduction page (http://genomewiki.ucsc.edu/index.php/Chains_Nets) explains these two terms: "Net is hierarchical collection of chains". If I want to go on to the multiz step to generate a multiple alignment, do I have to do netting first, or can I start from the all.chain file? The LiftOver files downloaded from UCSC are chain files that have not been netted, right?
The Whole genome alignment HowTo tutorial actually only shows how to generate a pairwise alignment and then convert it to .maf format, is that right?
Given the results from a pairwise alignment (LiftOver), do you have any suggestions on how I can learn to create a multiple alignment?
However, for a two-species alignment, how do I define the tree?
Thank you in advance for your help.
Have a great weekend.
Ju
--
Hello Ju,
Our engineers reviewed the commands in runLastzChain.sh and it looks like the B=0 is a mistake. Thank you for bringing it to our attention! In a different script, blastz-run-ucsc, we use B=0 to handle each strand separately for lineage-specific repeat snipping. We hardly ever do that anymore, though, and the same reasoning does not apply to runLastzChain.sh because it does not include a separate run on the reverse strand. The versions of this script provided on our wiki and in the kent tree have been updated to remove the B=0 setting.
One of our engineers also notes that the other supplied lastz parameters are all subject to decisions based on the pairs involved. There is no rote formula that determines lastz parameters; there are a lot of tuning possibilities. The inclusion of B=0 was definitely a mistake on our part, but you will probably also need to adjust the other parameters for your particular alignment problem.
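As an illustration of keeping these parameters adjustable per species pair, here is a minimal Python sketch that only assembles a lastz command line from a dictionary. It assumes lastz accepts blastz-style NAME=value settings and the --format=axt option; the function name, file names, and parameter values are hypothetical placeholders rather than recommended settings, so please check them against the lastz documentation.

```python
def build_lastz_command(target, query, params, output_format="axt"):
    """Return a lastz argument list; nothing is executed here.

    params maps blastz-style names (for example "K" or "L") to values.
    The values are chosen per species pair; none are recommendations.
    """
    args = ["lastz", target, query]
    for name, value in params.items():
        args.append(f"{name}={value}")
    args.append(f"--format={output_format}")
    return args


# Hypothetical settings for one pair; note that B=0 is not included.
example_params = {"K": 2200, "L": 6000}
command = build_lastz_command("target.fa", "query.fa", example_params)
print(" ".join(command))
```

Keeping the settings in one dictionary per species pair makes it easy to see exactly which values were used for a given run and to change them without editing the script body.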
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--