Hi,
I am assembling a plant genome (2n = 2 × 19, genome size 330 Mb). The draft genome is quite complete and at pseudo-chromosome level, with 130 scaffolds. The Juicer output shows a similar number of scaffolds to the draft, but after running 3d-dna with default settings the assembly was split into 7144 scaffolds in resolved.assembly and 1444 scaffolds in FINAL.assembly, and no final.hic file was produced. Only 0.assembly has the same number of scaffolds as the draft, but that file is from before any processing such as misjoin correction and polishing, right?
Looking back at the statistics reported by Juicer, only 62% of the read pairs were used, which leaves only 39M Hi-C contacts. I am not sure whether that is too low for the default pipeline. I have two questions:
1. Do you have any idea why 3d-dna produces more scaffolds than the input had? Is it acceptable to just use the 0.hic version? (I am also quite confused by the multiple output files, and I found the manual not very informative. Is there more detailed documentation?)
2. How many Hi-C contacts would you suggest for a good Hi-C dataset? Is the low proportion of usable read pairs in my data caused by the high repeat content of this species?
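
In case it helps to reproduce the scaffold counts above, here is a minimal sketch of how they can be obtained from the .assembly files. It assumes the usual .assembly layout, in which lines starting with ">" describe individual sequence fragments and every other non-empty line lists the fragments that make up one scaffold. The function name count_scaffolds and the way file paths are passed on the command line are only illustrative, not part of 3d-dna.

```python
import sys


def count_scaffolds(assembly_path):
    """Return (n_fragments, n_scaffolds) for one .assembly file.

    Assumption: lines starting with '>' are fragment records and every
    other non-empty line is one scaffold (a list of signed fragment indices).
    """
    n_fragments = 0
    n_scaffolds = 0
    with open(assembly_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_fragments += 1
            else:
                n_scaffolds += 1
    return n_fragments, n_scaffolds


if __name__ == "__main__":
    # Paths are given on the command line, e.g. the 0, resolved and FINAL files.
    for path in sys.argv[1:]:
        fragments, scaffolds = count_scaffolds(path)
        print(f"{path}: {fragments} fragments, {scaffolds} scaffolds")
```

Running it on the 0, resolved and FINAL .assembly files side by side shows at which stage the number of scaffolds changes.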
Best wishes,
Xiaomeng