Hic Format


jay anh

Aug 31, 2021, 7:15:44 AM
to 3D Genomics
Hi Juicer team,
In your documentation of the .hic file layout, the chromosome length "chrLength" (for the chromosomes "0-0" and "1-1") is defined as "int", which means the maximum length is 2147483647. It looks like Juicer cannot deal with a 3G genome; however, interestingly, Juicer copes correctly with a 2.8G genome, and Juicebox can show the 2.8G scale in the figure. Can you please tell me how Juicer manages big genomes (>2.1G)?

thanks
Jay

Neva Durand

Aug 31, 2021, 7:22:16 AM
to jay anh, 3D Genomics
Hello,

The total human genome size is 3B but the biggest chromosome in the human genome is chromosome 1 at 248,956,422. In Juicebox, you are only ever looking at binned values; the whole genome view is binned at many megabases.

Since other genomes do have chromosome lengths longer than the max integer size, we're moving to long in the next hic file format release.
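To make the limit concrete, here is a minimal sketch in Python (not Juicer code). It only checks whether a length fits in a signed 32-bit integer and how many bins a given resolution produces. The helper names `fits_in_int32` and `bin_count` are made up for illustration, the `"<i"` format is used purely as a range check, and 2,800,000,000 is a round stand-in for the 2.8G assembly mentioned in the question.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def fits_in_int32(length):
    """Return True if length can be packed as a signed 32-bit int."""
    try:
        struct.pack("<i", length)  # raises struct.error when out of range
        return True
    except struct.error:
        return False


def bin_count(length, resolution):
    """Number of bins needed to cover length at the given bin size (ceiling division)."""
    return -(-length // resolution)


chr1_length = 248_956_422          # human chromosome 1, as quoted above
assembly_length = 2_800_000_000    # assumed round figure for a 2.8G single-unit assembly

print(fits_in_int32(chr1_length))       # a single human chromosome is well under the limit
print(fits_in_int32(assembly_length))   # a genome-wide "chromosome" is not
print(bin_count(assembly_length, 1000)) # bin indices stay small even at 1 kb resolution
```

The point of the last line is that the number of bins is far below the int limit even when the base-pair length is not, which is why binned views are unaffected.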

Best
Neva

--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

jay anh

Aug 31, 2021, 10:03:41 AM
to 3D Genomics
I am not using the human genome. I am using Juicer and 3D-DNA to assemble an animal genome de novo; it has 2.8G of scaffold sequence. In the output from Juicer there are only two chromosomes, named "0-0" and "1-1". Chromosome "1-1" contains the whole genome (2.8G). My question is: why can Juicebox still show 2.8G (> 2147483647) on the vertical and horizontal axes, although chrLength is an "int"?
thanks

Olga Dudchenko

Sep 1, 2021, 12:41:56 AM
to 3D Genomics
Hello Jay,

Not sure I understand your question. You are saying that you used 3D-DNA for de novo assembly of a genome. 3D-DNA can produce two types of maps, the "assembly" maps and the "sandboxed" maps. The first one is the default, in which the whole genome is treated as a single unit, without breaking things into individual chromosomes, with coordinates running from 0 to the total assembly length. There will be only one chromosome, named "assembly". (I do not know what you are referring to when you say 0-0 and 1-1.)

Indeed, until very recently Juicebox/Juicebox Assembly Tools (JBAT) could not handle chromosomes longer than int, which would have been a problem for all genome-wide assembly maps (given that, e.g., a typical mammal is 3 Gb > int). 3D-DNA and JBAT, however, do not operate at single-basepair resolution, but always deal with bins of a particular size. (3D-DNA in particular typically builds maps down to 1 kb resolution.) For this reason the int issue can be worked around via scaling. 3D-DNA does this automatically: it remaps the whole genome to fit, rescaling all other relevant calculations so that the result is strictly equivalent to what one would expect from the unscaled genome when working with bins. When you load the assembly file in JBAT the scaling is recognized, and the labels are corrected in Juicebox to represent the original unscaled genome. You can find some references to this workaround on the dnazoo.org assembly pages.
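As a rough illustration of why scaling loses nothing at bin level, here is a small Python sketch. It is not the 3D-DNA code: the rule in `choose_scale` (smallest integer factor that brings the length under the int limit) and all function names are assumptions made for the example, and 2,800,000,000 is a round stand-in for a 2.8 Gb assembly.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def choose_scale(total_length, limit=INT32_MAX):
    """Hypothetical rule: smallest integer factor so total_length // scale <= limit."""
    return max(1, -(-total_length // limit))  # ceiling division


def to_scaled(position, scale):
    """Map an original base-pair coordinate into the scaled coordinate system."""
    return position // scale


def bin_index(position, resolution):
    """Index of the bin containing position at the given bin size."""
    return position // resolution


total_length = 2_800_000_000                 # assumed 2.8 Gb single-unit assembly
scale = choose_scale(total_length)           # 2 for this length
scaled_length = to_scaled(total_length, scale)

# A 1 kb bin in original coordinates is a (1000 // scale)-unit bin in scaled coordinates.
original_resolution = 1000
scaled_resolution = original_resolution // scale

position = 2_345_678_901                     # a coordinate beyond the int limit
same_bin = (bin_index(position, original_resolution)
            == bin_index(to_scaled(position, scale), scaled_resolution))
print(scale, scaled_length, same_bin)
```

As long as the bin size is a multiple of the scale factor, every position falls into the same bin before and after scaling, so the binned map is equivalent and only the axis labels need to be multiplied back by the scale.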

Hope this helps,
Olga