Hi All,
I’m attempting to carry out pairwise alignment of a scaffold assembly (~55,000 sequences) against multiple species' genomes (some of which are also scaffold assemblies) using LASTZ. I have attempted to recreate the UCSC pipeline from the information on the wiki (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto), and I have adapted the RunLastzChain_sh code to run on our cluster. However, the number of individual runs required to align this genome to mm10 (split into 20 Mb non-overlapping sequences), for example, is prohibitively large (~10 million).
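To show where a figure of that order comes from, here is a rough back-of-the-envelope sketch. It assumes one job per query scaffold per target chunk, and it takes the mm10 size as roughly 2.7 Gb, which is an approximation on my part; the function name is purely illustrative and not part of any UCSC script.

```python
def count_alignment_jobs(n_query_parts, target_size_bp, chunk_size_bp):
    """Estimate pairwise jobs when every query part is aligned to every target chunk."""
    # Integer ceiling division: a final partial chunk still needs its own job.
    n_target_chunks = (target_size_bp + chunk_size_bp - 1) // chunk_size_bp
    return n_query_parts * n_target_chunks


# Assumed inputs: 55,000 query scaffolds, ~2.7 Gb target, 20 Mb target chunks.
jobs_per_scaffold = count_alignment_jobs(55_000, 2_700_000_000, 20_000_000)

# Hypothetical alternative: the same scaffolds bundled into 100 query files.
jobs_bundled = count_alignment_jobs(100, 2_700_000_000, 20_000_000)

print(jobs_per_scaffold, jobs_bundled)
```

Under these assumptions the job count scales linearly with the number of query parts, so bundling scaffolds into fewer query files is what brings the total down.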
The wiki page above contains the following information on how UCSC deal with scaffold genome assemblies: “However, if both genomes are spread across many more scaffolds you have to play around with the sequences; otherwise it will be millions of blastz-runs. At UCSC they first join the smaller scaffolds into one big chromsome and then run blastz on the "ChrUn"-virtual-chromosome. (this will create some false nets going from one scaffold to another)”.
I was wondering whether any of the UCSC folks would be able to elaborate on how they carry out whole genome alignments when a genome is spread over a large number of scaffolds. For example, dasNov3 vs. mm10?
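My reading of the "virtual chromosome" idea in that quote can be sketched as follows. This is only an illustration under my own assumptions, not the actual UCSC implementation: the function names, the spacer of N bases between scaffolds, and its length are all hypothetical choices. The point is that an offset table has to be kept so that alignments on the joined sequence can be mapped back to the original scaffolds.

```python
from bisect import bisect_right


def build_virtual_chrom(scaffolds, gap_len=1000):
    """Join (name, sequence) pairs into one sequence separated by runs of N.

    Returns the joined sequence and a layout of (start, end, name) tuples,
    0-based and half-open, giving each scaffold's span in the joined sequence.
    """
    parts, layout, pos = [], [], 0
    for i, (name, seq) in enumerate(scaffolds):
        if i > 0:
            # Spacer of N bases between scaffolds (length is an assumption).
            parts.append("N" * gap_len)
            pos += gap_len
        layout.append((pos, pos + len(seq), name))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), layout


def to_scaffold_coords(layout, virtual_pos):
    """Map a position on the joined sequence back to (scaffold name, offset).

    Returns None if the position falls in a spacer or outside the sequence.
    """
    starts = [start for start, _, _ in layout]
    i = bisect_right(starts, virtual_pos) - 1
    if i < 0:
        return None
    start, end, name = layout[i]
    if virtual_pos >= end:
        return None
    return name, virtual_pos - start


virtual_seq, layout = build_virtual_chrom([("s1", "ACGT"), ("s2", "GG")], gap_len=3)
print(virtual_seq, layout, to_scaffold_coords(layout, 7))
```

An alignment block whose two ends map to different scaffolds would be one of the "false nets" the wiki warns about, so a table like this would also let such blocks be flagged or split.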
Any advice at all would be greatly appreciated!
Thanks a million,
Colin
Hello Colin,
I'm sorry to hear that you're still having script issues. We are confused about this problem as well; are you able to provide us with some example files that cause the problem? You can send them to me privately to avoid sharing them with the mailing list if you prefer.
If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group