Mouse T2T genomes now in INSDC


Thomas Keane

Jun 24, 2024, 12:30:43 PM
to thib...@ncbi.nlm.nih.gov, ma...@ucsc.edu, Fergal Martin, gen...@soe.ucsc.edu, hcla...@ucsc.edu, lea...@ebi.ac.uk, bai...@ebi.ac.uk
Hi UCSC/Ensembl/RefSeq folks,

I'm happy to let you know that the two mouse T2T genomes for the C57BL/6J and CAST/EiJ strains are now available from INSDC via:

C57BL/6J GCA_964188535

CAST/EiJ GCA_964188545
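
For anyone scripting against these two accessions, here is a minimal sketch that builds the ENA browser links. The URL prefix is the one the ENA browser uses for these assemblies; the helper name `ena_browser_url` and the accession pattern check are illustrative assumptions, not part of any ENA client library.

```python
import re

# Accessions exactly as given in this announcement.
ASSEMBLIES = {
    "C57BL/6J": "GCA_964188535",
    "CAST/EiJ": "GCA_964188545",
}

# ENA browser prefix for viewing an assembly record.
ENA_BROWSER = "https://www.ebi.ac.uk/ena/browser/view/"

# Assumed shape of an assembly accession: GCA_/GCF_, nine digits,
# and an optional ".version" suffix.
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}(\.\d+)?")


def ena_browser_url(accession: str) -> str:
    """Hypothetical helper: return the ENA browser URL for an accession."""
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return ENA_BROWSER + accession


if __name__ == "__main__":
    for strain, accession in ASSEMBLIES.items():
        print(f"{strain}\t{accession}\t{ena_browser_url(accession)}")
```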

Some notes:
- these assemblies came from an F1 cross mouse (C57BL/6J x CAST/EiJ), where the reads were separated by strain during the assembly process.

- C57BL/6J is the paternal hap so lacks an X chromosome and we didn't assemble the Y chr (likely it is in pieces in the unassembled scaffolds).

- CAST/EiJ is the maternal hap so has an X chromosome and no Y chromosome

- the C57BL/6J genome is substantially more complete than GRCm39: all of the chromosomes are T2T and close every gap in GRCm39 (except two!). If you are interested, here is a talk from the TAGC conference which gives a good overview of the genomes.

- we did not include an MT chromosome (they are already available for both strains; I can flag the accessions if useful)

- we did do a gene build with a combination of Liftoff and BRAKER3; this was good enough for us for paper writing, but likely not comprehensive enough for the Genome Browsers. There is plenty of strain-specific RNA-Seq for both strains, happy to point you to this if helpful.

- we are in the final stages of preparing the manuscript, I would hope to submit it by the end of the summer/August. It would be really excellent if the genomes were available or close to being available for when the paper is published.
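
As a rough aid for anyone loading these, the sex-chromosome notes above can be written down as an expected chromosome set per haplotype. This is only a sketch: the labels ("1" to "19", "X") and the function name `check_chromosomes` are assumptions for illustration, and the sequence names in the actual records may differ.

```python
# The mouse has 19 autosomes; labels here are illustrative, not the record names.
AUTOSOMES = [str(i) for i in range(1, 20)]

# Expected assembled chromosomes per haplotype, following the notes above.
# Neither assembly includes MT.
EXPECTED = {
    "C57BL/6J": set(AUTOSOMES),          # paternal hap: no X, Y not assembled
    "CAST/EiJ": set(AUTOSOMES) | {"X"},  # maternal hap: X present, no Y
}


def check_chromosomes(strain, observed):
    """Hypothetical check: compare observed chromosome labels to the expected set."""
    expected = EXPECTED[strain]
    observed = set(observed)
    return {
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


if __name__ == "__main__":
    # Example: a CAST/EiJ listing that is missing X would be flagged.
    print(check_chromosomes("CAST/EiJ", AUTOSOMES))
```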

Please ask if you have more questions, I really hope you can find the time to look at loading these but completely understand there's a long queue of new and interesting genomes!

Rgds,
Thomas

Omics Section Head
Research and Services Team Leader
EMBL-EBI

Hiram Clawson

Jun 24, 2024, 1:04:41 PM
to Thomas Keane, thib...@ncbi.nlm.nih.gov, ma...@ucsc.edu, Fergal Martin, gen...@soe.ucsc.edu, hcla...@ucsc.edu, lea...@ebi.ac.uk, bai...@ebi.ac.uk
Good Morning Thomas:

Thanks for the notice.

Do you know if the GRC will be considering these assemblies as
their standard reference?

https://www.ncbi.nlm.nih.gov/grc/mouse

--Hiram

On 6/24/24 8:41 AM, Thomas Keane wrote:
> Hi UCSC/Ensembl/RefSeq folks,
>
> I'm happy to let you know that the two mouse T2T genomes for the C57BL/6J and
> CAST/EiJ strains are now available from INSDC via:
>
> C57BL/6J GCA_964188535 <https://www.ebi.ac.uk/ena/browser/view/GCA_964188535>
>
> CAST/EiJ GCA_964188545 <https://www.ebi.ac.uk/ena/browser/view/GCA_964188545>
>
> Some notes:
> - these assemblies came from a F1 cross mouse (C57BL6/J x CAST/EiJ), where the
> reads were separated by strain during the assembly process.
>
> - C57BL/6J is the paternal hap so lacks an X chromosome and we didn't assemble
> the Y chr (likely it is in pieces in the unassembled scaffolds).
>
> - CAST/EiJ is the maternal hap so has an X chromosome and no Y chromosome
>
> - the C57BL/6J genome is substantially more complete than GRCm39, all of the
> chromosomes are T2T and close every gap in GRCm39 (except two!). If you are
> interested, here is a talk
> <https://docs.google.com/presentation/d/1mdJ3NWJ3h-9OFO_u2bkEfAbR359LuVuFv-6n6dzOFtY/edit#slide=id.g2b89f17b62e_0_213> from TAGC conference which gives a good overview of the genomes.

Thomas Keane

Jun 25, 2024, 12:51:34 PM
to Hiram Clawson, thib...@ncbi.nlm.nih.gov, ma...@ucsc.edu, Fergal Martin, gen...@soe.ucsc.edu, hcla...@ucsc.edu, lea...@ebi.ac.uk, bai...@ebi.ac.uk
Hi Hiram,

The GRC certainly know about the project, I presented to the Sanger GRC folk a few times last year. They are next on my list to recontact and let them know that these T2T genomes are now available. Will let you know if they have any thoughts/plans for future adoption/maintenance of these genomes.

Rgds, Thomas

Fergal Martin

Jun 25, 2024, 12:53:33 PM
to Thomas Keane, Hiram Clawson, thib...@ncbi.nlm.nih.gov, ma...@ucsc.edu, gen...@soe.ucsc.edu, hcla...@ucsc.edu, lea...@ebi.ac.uk, bai...@ebi.ac.uk
I can weigh in on this, as I am the EBI GRC lead. There are currently no plans to change the mouse reference, for much the same reasons that we have not changed our selection of the human reference:
- The current reference is high quality (even if it is not as good)
- There are a lot of resources/projects mapped to the current reference that will not be remapped to a new reference
- There has not been much demand in the community for a new reference
- It’s very hard to get people to switch reference (though the mouse community embraces this a bit more than most)
- There’s a question as to whether pangenomics will become widespread and usable enough that references will be less important and thus not worth going through the process of updating to a new reference at this point
- Specifically from a GENCODE perspective, porting annotation between GRCm38 and GRCm39 was relatively straightforward because they essentially used the same underlying contig sets. Porting to a different assembly chain is a lot less trivial and risks putting mapping errors into tricky regions, particularly gene clusters

Anyway, those are some barriers, but that being said it is still worth raising the issue for discussion with the broader GRC group to see what people think. We have monthly calls, so you could likely come on one of those and present. If this is of interest let me know and I will get Valerie to put it on the agenda.

All the best,

Fergal

Benedict Paten

Jun 25, 2024, 6:16:47 PM
to Fergal Martin, Thomas Keane, Hiram Clawson, thib...@ncbi.nlm.nih.gov, ma...@ucsc.edu, gen...@soe.ucsc.edu, hcla...@ucsc.edu, lea...@ebi.ac.uk, bai...@ebi.ac.uk
Jumping in here, I think the plan for mouse should be the same as for human. We will not replace GRCh38 or, indeed, T2T-CHM13, rather we will integrate these references with new T2T assemblies to create a pangenome. The arguments for doing this with mouse are, from a genetic diversity point of view, at least as strong as for human. In this way the coordinates of existing references will not cease to be useful, but can instead be integrated with coordinates representing non-reference and highly divergent haplotypes. This is something we will put considerable effort into for this next round of the pangenome.

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/47B8349C-8FEC-4EEC-8595-568F8E72A88D%40ebi.ac.uk.

Matthew Speir

Jun 28, 2024, 5:26:35 PM
to Thomas Keane, gen...@soe.ucsc.edu, hcla...@ucsc.edu
Hello, Thomas. 

Thank you for letting us know that these two genomes were added to INSDC.

We have started importing these genomes and we'll send you an email once they're available. 

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---

Matthew Speir

UCSC Genome Browser, User Support
