Hi,
I have mapped Hi-C data to our chromosome-level genome assembly and would like to add contig information to the assembly file.
I wrote a short wrapper script that transforms a BED file of contig locations into a .assembly-like file.
Unfortunately, Juicebox has trouble reading the resulting file. I must be missing something about the structure that Juicebox expects.
I created fragments from the initial chromosomes, like this:
initial assembly file:
>ssa01 1 174673756
>ssa09 2 160669013
>ssa10 3 126124147
>ssa13 4 114504115
>ssa11 5 111966310
modified assembly file:
>ssa01:::fragment_1 1 129591
>ssa01:::fragment_2 2 2857436
>ssa01:::fragment_3 3 781246
>ssa01:::fragment_4 4 44728362
>ssa01:::fragment_5 5 56399376
>ssa01:::fragment_6 6 1104360
>ssa01:::fragment_7 7 362161
>ssa01:::fragment_8 8 1212759
>ssa01:::fragment_9 9 3047345
I made sure that each identifier (second column) appears only once and that the fragment lengths add up to the original chromosome sizes.
Could you advise me on which characteristics a Juicebox assembly file must have for a given .hic file?
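To make the conversion and the two checks concrete, here is a minimal sketch (not my actual script). It assumes a plain three-column BED file with 0-based, half-open coordinates; the function names `bed_to_assembly_lines` and `check_assembly_lines` are made up for illustration. It only produces the `>name index length` lines shown above and does not claim to cover everything Juicebox may expect in a .assembly file.

```python
def bed_to_assembly_lines(bed_lines):
    """Turn BED records (chrom, start, end) into '>name index length' lines.

    Assumption: BED coordinates are 0-based and half-open, so the
    fragment length is simply end - start.
    """
    by_chrom = {}  # chromosome -> list of (start, end), in order of first appearance
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))

    out = []
    index = 0  # running identifier, unique across the whole file
    for chrom, intervals in by_chrom.items():
        # fragments are numbered per chromosome, ordered by start position
        for n, (start, end) in enumerate(sorted(intervals), start=1):
            index += 1
            out.append(f">{chrom}:::fragment_{n} {index} {end - start}")
    return out


def check_assembly_lines(lines, chrom_sizes):
    """Return a list of problems: duplicate identifiers or length mismatches."""
    problems = []
    seen = set()
    totals = {}
    for line in lines:
        name, index, length = line[1:].split()
        if index in seen:
            problems.append(f"duplicate identifier {index}")
        seen.add(index)
        chrom = name.split(":::")[0]
        totals[chrom] = totals.get(chrom, 0) + int(length)
    for chrom, size in chrom_sizes.items():
        if totals.get(chrom, 0) != size:
            problems.append(
                f"{chrom}: fragments sum to {totals.get(chrom, 0)}, expected {size}"
            )
    return problems


# Toy input covering only the first two fragments of ssa01 from the listing above.
bed = ["ssa01\t0\t129591", "ssa01\t129591\t2987027"]
lines = bed_to_assembly_lines(bed)
print("\n".join(lines))
print(check_assembly_lines(lines, {"ssa01": 2987027}))
```

For the two toy records this prints the first two lines of the modified file above, followed by an empty list of problems.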
I could imagine other people being interested in such a back-and-forth conversion between assembly and BED files as well, and I would be happy to share the code once it works.
By manually manipulating an assembly file, I can make Juicebox display pretty much everything (see screenshot below), so it's just a matter of finding the right semantics =)
[screenshots: original, edited]
Thank you,
Michel