Whole Genome Alignment of Worms


邵毅

Sep 21, 2016, 10:37:59 AM
to gen...@soe.ucsc.edu
Dear UCSC group members,

I really admire your work and use it a lot. I am currently working on worm genomes and need whole genome alignments between them. I am interested in the current assemblies of these worm genomes: ce11 for C. elegans, caePb3 for C. brenneri, cb3/cb4 for C. briggsae, caeRem3/caeRem4 for C. remanei, caeJap4 for C. japonica, caeSp111 for C. tropicalis, caeAng2 for C. angaria, and Caenorhabditis_sp_5-JU800-1.0 for Caenorhabditis sp. 5 JU800. My questions are:

1) Most of the genome data are missing from the goldenPath FTP site, let alone the chain files for the alignments. Could you provide a link so that I can download the genomes (preferably soft-masked) and the whole genome alignments? I saw that there is a multiz26way for ce11 that includes the other genomes mentioned above, so there should definitely be pairwise alignments of ce11 to each of these worms, right? Could I have access to the alignment, chain, or axt files?
2) I am not sure whether cb4 for C. briggsae corresponds to Ensembl Metazoa's C. briggsae assembly CB4 (http://metazoa.ensembl.org/Caenorhabditis_briggsae/Info/Index) (http://www.ebi.ac.uk/ena/data/view/GCA_000004555.3). If not, which assembly does it correspond to? I ask because my analysis is based on Ensembl's annotation.
3) A similar question to (2), but for C. remanei: I do not know which assembly caeRem4 is. The webpage says caeRem4 is WS225, so it should be an updated version of caeRem3. But caeRem3 is already WUGSC 15.0.1, as is the Ensembl Metazoa assembly (http://metazoa.ensembl.org/Caenorhabditis_remanei/Info/Index), whose annotation corresponds to WS250, so that assembly should be current as well. I am therefore confused about which assembly corresponds to Ensembl Metazoa. Are caeRem3 and caeRem4 the same assembly?
4) Are there any suggested parameters for whole genome alignment with the LASTZ program? I would like to align C. briggsae to all the other worms and C. remanei to all the other worms. I noticed that there are parameters for cb3 to some worms and caeRem2 to some worms. Would it be safe to use the same parameters now that the assemblies have been updated? Also, what about the missing pairs, for example caeRem vs. caeJap/caeSp111/caeAng2/Caenorhabditis_sp_5-JU800-1.0 and cb vs. caeJap4/caeSp111/caeAng2/Caenorhabditis_sp_5-JU800-1.0?

I know that's a lot to ask, but I would really appreciate your help.

Best Regards,
--
Shao Yi, Graduate Student
Computational & Evolutionary Genomics Group
Institute of Zoology, CAS
 


 

Michael Paulini

Sep 21, 2016, 10:57:39 AM
to 邵毅, gen...@soe.ucsc.edu
Hi Shao Yi,

We have the output of a Progressive Cactus run on all WormBase genomes in HAL format here:

You can use the usual Cactus/HAL tools to extract the data of your choice.

The only caveat is that the random chromosome was split in the last release, so I have to produce a new HAL file for it.

Michael
--------------------------
WormBase



 

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.

Matthew Speir

Sep 22, 2016, 6:03:01 PM
to 邵毅, gen...@soe.ucsc.edu

Hi Shao Yi,

Thank you for your questions about the worm multiple alignment in the UCSC Genome Browser.

As Michael has pointed out, WormBase does provide alignments for these assemblies on their FTP site.

For your first and fourth questions, you can find the downloads for these pairwise alignments and descriptions of the lastz parameters used on our test download server: http://hgdownload-test.soe.ucsc.edu/goldenPath/ce11/.

Please note that these downloads come with the following disclaimer:

Data and tools on this site are under development, have not been reviewed for quality, and are subject to change at any time. 
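If you download the pairwise chain files from the directory above, a quick sanity check is to parse the headers and block lines. The sketch below follows the UCSC chain format (a `chain` header line with score, target name/size/strand/start/end, query name/size/strand/start/end and an id, followed by `size dt dq` lines and a final `size` line). `Chain` and `read_chains` are illustrative names, not part of the Kent tools, and the sequence names in the example are made up.

```python
import io
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, TextIO


@dataclass
class Chain:
    """One chain from a UCSC .chain file: header fields plus its alignment blocks."""
    score: float
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: int
    sizes: List[int] = field(default_factory=list)  # ungapped block lengths
    dts: List[int] = field(default_factory=list)    # gap after each block in the target
    dqs: List[int] = field(default_factory=list)    # gap after each block in the query

    @property
    def aligned_bases(self) -> int:
        return sum(self.sizes)

    def check_spans(self) -> bool:
        """Blocks plus gaps should exactly cover the header's target and query spans."""
        t_ok = self.aligned_bases + sum(self.dts) == self.t_end - self.t_start
        q_ok = self.aligned_bases + sum(self.dqs) == self.q_end - self.q_start
        return t_ok and q_ok


def read_chains(handle: TextIO) -> Iterator[Chain]:
    """Yield Chain records from an open, already decompressed chain file."""
    current: Optional[Chain] = None
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator lines and comments
        if fields[0] == "chain":
            if current is not None:
                yield current
            current = Chain(
                score=float(fields[1]),
                t_name=fields[2], t_size=int(fields[3]), t_strand=fields[4],
                t_start=int(fields[5]), t_end=int(fields[6]),
                q_name=fields[7], q_size=int(fields[8]), q_strand=fields[9],
                q_start=int(fields[10]), q_end=int(fields[11]),
                chain_id=int(fields[12]),
            )
        elif current is not None:
            # Data lines are "size dt dq"; the last line of each chain is just "size".
            current.sizes.append(int(fields[0]))
            if len(fields) == 3:
                current.dts.append(int(fields[1]))
                current.dqs.append(int(fields[2]))
    if current is not None:
        yield current


# Made-up example chain, for illustration only.
example = """chain 1000 chrI 100 + 0 30 contig1 200 + 10 45 1
10 5 10
15

"""
for c in read_chains(io.StringIO(example)):
    print(c.t_name, c.q_name, c.aligned_bases, c.check_spans())  # prints: chrI contig1 25 True
```

For the gzipped downloads, pass `gzip.open(path, "rt")` instead of a `StringIO`.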

For your second and third questions, you can typically find information about the releases on the assembly description pages. You can find that information on the test server description pages for these assemblies:

    - caeRem4: http://genome-test.soe.ucsc.edu/cgi-bin/hgGateway?db=caeRem4
    - cb4: http://genome-test.soe.ucsc.edu/cgi-bin/hgGateway?db=cb4

Since these assemblies are still only staged on our test server, some of these description pages may be incomplete. Additionally, information about the construction of the Genome Browser for these assemblies can be found in the Genome Browser source tree here: http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/makeDb/doc, in the directory or file that corresponds to the UCSC assembly name (e.g. cb4). I did some investigating for these two assemblies in our database and came up with the following:

For our cb4:
    Our cb4 sequence was downloaded as part of the WormBase WS225 release. You can find more information on their WS225 release here: http://www.wormbase.org/about/wormbase_release_WS225#10--10. You can download the files from this release here: ftp://ftp.wormbase.org/pub/wormbase/releases/WS225/.
For our caeRem4:
    It appears that the primary difference between caeRem3 and caeRem4 in the UCSC Genome Browser is that the caeRem3 assembly contains all of the contigs merged into a single "chrUn" with artificial 1000 bp gaps inserted between them, whereas the caeRem4 assembly has individual contigs. Both are based on WUGSC 15.0.1.
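Because caeRem3 and caeRem4 share the same underlying sequence, moving positions between the merged "chrUn" and the individual contigs is essentially an offset calculation. The Python sketch below assumes the contigs were concatenated in a known order starting at chrUn position 0, with exactly 1000 bp between neighbours and no leading or trailing gap; the real order and offsets should be taken from the assembly's build documentation rather than assumed. `ChrUnLayout` is a hypothetical helper, not part of any UCSC or WormBase tool, and the contig names and lengths are made up.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

GAP_BP = 1000  # artificial gap between contigs in caeRem3 chrUn, as described above


class ChrUnLayout:
    """Map between positions on a concatenated chrUn and the contigs it was built from.

    Assumes contigs are laid out in the given order, starting at chrUn position 0,
    with exactly gap_bp bases of gap between consecutive contigs. Coordinates are 0-based.
    """

    def __init__(self, contigs: List[Tuple[str, int]], gap_bp: int = GAP_BP):
        self.starts: List[int] = []
        self.names: List[str] = []
        self.lengths: List[int] = []
        self.start_of: Dict[str, int] = {}
        pos = 0
        for name, length in contigs:
            self.starts.append(pos)
            self.names.append(name)
            self.lengths.append(length)
            self.start_of[name] = pos
            pos += length + gap_bp

    def to_contig(self, chrun_pos: int) -> Optional[Tuple[str, int]]:
        """chrUn position -> (contig, position within contig); None for gaps or out of range."""
        i = bisect_right(self.starts, chrun_pos) - 1
        if i < 0:
            return None
        local = chrun_pos - self.starts[i]
        if local >= self.lengths[i]:
            return None  # falls in an artificial gap, or past the last contig
        return self.names[i], local

    def to_chrun(self, contig: str, contig_pos: int) -> int:
        """(contig, position within contig) -> chrUn position."""
        return self.start_of[contig] + contig_pos


# Hypothetical contig names and lengths, for illustration only.
layout = ChrUnLayout([("contigA", 500), ("contigB", 300), ("contigC", 200)])
print(layout.to_contig(1600))          # ('contigB', 100)
print(layout.to_contig(700))           # None (inside the gap after contigA)
print(layout.to_chrun("contigC", 10))  # 2810
```

A position that lands inside one of the artificial gaps has no counterpart on any contig, which is why `to_contig` returns `None` there instead of guessing.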

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--


Michael Paulini

Sep 23, 2016, 11:47:08 AM
to Matthew Speir, 邵毅, gen...@soe.ucsc.edu
On 22 September 2016 at 23:02, Matthew Speir <msp...@soe.ucsc.edu> wrote:

For our cb4:
    Our cb4 sequence was downloaded as part of the WormBase WS225 release. You can find more information on their WS225 release here: http://www.wormbase.org/about/wormbase_release_WS225#10--10. You can download the files from this release here: ftp://ftp.wormbase.org/pub/wormbase/releases/WS225/.
We did actually split the chrUn of the cb4 assembly into separate sequences in our WS254 release, as people used the artificial "chromosome" in ways it was never intended for (synteny / analysis results spanning the gaps).
For our caeRem4:
    It appears that the primary difference between caeRem3 and caeRem4 in the UCSC Genome Browser is that the caeRem3 assembly contains all of the contigs merged into a single "chrUn" with artificial 1000 bp gaps inserted between them, whereas the caeRem4 assembly has individual contigs. Both are based on WUGSC 15.0.1.
One side effect of changing the sequence names is, of course, that none of the WormBase tracks work on the caeRem4 version anymore, and people need to translate identifiers and coordinates between INSDC/WormBase and UCSC.

As long as we document the differences and what works with what, I don't think it causes too many problems for experienced users, as the same situation exists for other species, e.g. human, but ideally we would use the same sequence and assembly identifiers.


Michael