3D-DNA output file format

Xiang Zhou

Feb 28, 2020, 6:23:24 PM
to 3D Genomics
I just want to post this Q&A here to benefit the community.

My original question to Olga was: Could you point me to any resources describing the format / specifications of the following files:
1. *.cprops
2. *.assembly
3. *.scaffold_track.txt
4. *.superscaf_track.txt
If not, a simpler question is: How can I get the number and size of the unplaced contigs (contigs not part of a chromosome)? Thanks!

Reply from the author:

.cprops is a legacy format, now engulfed by assembly.
scaffold_track and superscaffold_track are secondary to assembly and are kept for .js users. They replicate the annotations that get loaded using the .assembly file in Juicebox Assembly Tools. These are 0-based 2D annotations outlining the boundaries of draft sequences or, potentially, their fragments (scaffold_track), and of large scaffolds composed of the draft sequences, including chromosome-length scaffolds (superscaffold_track).

There is no special annotation for chromosome-length scaffolds as compared to other scaffolds. In DNA Zoo this is solved by placing the chromosome-length ones first and recording the karyotype in the README.json file, so the two groups are easy to separate.
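For the original question about unplaced contigs, here is a minimal Python sketch based on the reply above and on the .assembly layout Olga describes further down in this thread (header lines of the form `>name index length`, then one line of signed indices per scaffold). The function name `unplaced_summary` and the `n_chromosomes` parameter are made up for illustration; the chromosome count has to come from the known karyotype (for DNA Zoo, the README.json file), and the sketch simply treats every scaffold listed after the first `n_chromosomes` as unplaced.

```python
def unplaced_summary(assembly_text, n_chromosomes):
    """Return (count, total_bp) of scaffolds listed after the first n_chromosomes.

    Assumes chromosome-length scaffolds come first in the file; n_chromosomes
    is supplied by the caller and is not read from the assembly itself.
    """
    lengths = {}    # sequence index -> length in bp
    scaffolds = []  # one list of signed indices per scaffold line
    for raw in assembly_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: >name index length
            _name, index, length = line[1:].rsplit(None, 2)
            lengths[int(index)] = int(length)
        else:
            # Scaffold line: signed indices, a minus sign means reverse complement
            scaffolds.append([int(tok) for tok in line.split()])
    sizes = [sum(lengths[abs(i)] for i in scaf) for scaf in scaffolds]
    unplaced = sizes[n_chromosomes:]
    return len(unplaced), sum(unplaced)


# Toy rawchrom-style input modelled on the example later in this thread.
# Index 4 is named fragment_3 here on the assumption that fragment numbers
# run consecutively within a sequence.
example = """\
>seq1 1 100
>seq2:::fragment_1 2 40
>seq2:::fragment_2:::debris 3 20
>seq2:::fragment_3 4 40
>seq3 5 100
1 2
5 -4
3
"""

print(unplaced_summary(example, n_chromosomes=2))
```

The sizes are plain sums of the listed record lengths, so whatever entries appear on a scaffold line are counted as they are.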

Further discussion on this is welcomed. Thanks!

Jarrod Guppy

Mar 23, 2022, 3:59:36 PM
to 3D Genomics
Hi,

I was wondering if you are able to share the format for the .assembly files again? The link posted earlier seems to be dead.

Thanks heaps for your help here,

Olga Dudchenko

Mar 25, 2022, 1:13:47 AM
to 3D Genomics
Can't seem to find it either. Just describing quickly here:

Here's an example draft, consisting of 3 sequences each 100bp in length.
---------------

Typical draft.assembly:

>seq1 1 100
>seq2 2 100
>seq3 3 100
1
2
3

----------------
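As a rough illustration of how the draft example above could be read programmatically, here is a small Python sketch. It assumes only what the example shows: lines starting with `>` carry a name, a 1-based index and a length in bp, and every other non-empty line lists the indices that make up one scaffold. The names `Record` and `parse_assembly` are hypothetical and not part of 3D-DNA.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str    # sequence (or fragment) name, without the leading ">"
    index: int   # index used to refer to the sequence on scaffold lines
    length: int  # length in bp


def parse_assembly(text):
    """Split .assembly text into header records and scaffold index lists."""
    records, scaffolds = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, index, length = line[1:].rsplit(None, 2)
            records.append(Record(name, int(index), int(length)))
        else:
            scaffolds.append([int(tok) for tok in line.split()])
    return records, scaffolds


draft = """\
>seq1 1 100
>seq2 2 100
>seq3 3 100
1
2
3
"""

records, scaffolds = parse_assembly(draft)
print(records)
print(scaffolds)
```

In the draft file each sequence is its own scaffold, so the scaffold section is just one index per line.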


Let's imagine that assembly analysis suggested that the second sequence has a misjoin in the middle: one piece needs to be concatenated to seq1, forming chr1, and the reverse complement of the second half needs to be concatenated to seq3, forming chr2. Note that misjoin detection and correction are not done at single-bp resolution, so a stretch spanning the misjoin is cut out and placed at the end as part of the non-chromosomal scaffolds.


Typical rawchrom.assembly:

>seq1 1 100
>seq2:::fragment_1 2 40
>seq2:::fragment_2:::debris 3 20
>seq2:::fragment_3 4 40
>seq3 5 100
1 2
5 -4
3
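To make the scaffold lines of the rawchrom example concrete, the sketch below expands them into an ordered, oriented layout. It is only an illustration: the data is typed in by hand rather than read from a file, the helper name `describe_layout` is made up, the fourth entry is named fragment_3 on the assumption that fragment numbers run consecutively, and a negative index is taken to mean that the piece is used as its reverse complement, as described above.

```python
# index -> (name, length in bp), copied from the rawchrom example
records = {
    1: ("seq1", 100),
    2: ("seq2:::fragment_1", 40),
    3: ("seq2:::fragment_2:::debris", 20),
    4: ("seq2:::fragment_3", 40),  # assumed name, see the note above
    5: ("seq3", 100),
}

# One list of signed indices per scaffold line
scaffolds = [[1, 2], [5, -4], [3]]


def describe_layout(scaffolds, records):
    """Return, per scaffold, a list of (name, strand, length) tuples."""
    layout = []
    for scaf in scaffolds:
        parts = []
        for signed in scaf:
            name, length = records[abs(signed)]
            strand = "-" if signed < 0 else "+"
            parts.append((name, strand, length))
        layout.append(parts)
    return layout


layout = describe_layout(scaffolds, records)
scaffold_lengths = [sum(length for _, _, length in parts) for parts in layout]

for number, (parts, total) in enumerate(zip(layout, scaffold_lengths), start=1):
    pieces = ", ".join(f"{name}({strand})" for name, strand, _ in parts)
    print(f"scaffold {number}: {total} bp = {pieces}")
```

With these numbers the two chromosome-length scaffolds each come to 140 bp, and the 20 bp debris piece forms the third, non-chromosomal scaffold.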


The _HiC.assembly is the rawchrom assembly with Ns removed from the sequence edges and gaps added between sequences that are joined into a single scaffold.


Best,

Olga


10331...@qq.com

Apr 3, 2022, 3:26:05 PM
to 3D Genomics
Sometimes, in my `.assembly` file, I find an entry called `>hic_gap`. Could you tell me what this means?

Olga Dudchenko

Apr 19, 2022, 12:57:27 AM
to 3D Genomics
I believe I have replied to this in a different thread. Thanks, -Olga