Thank you very much for developing such a nice tool for the community.
I am trying to use the 3D-DNA pipeline to assemble a plant genome that is known to be 2.6 Gb in size. I obtained a 2.7 Gb assembly from ONT long reads, with a contig N50 of around 660 kb. I then aligned Hi-C reads to the assembly using Juicer. Below are the statistics of the Hi-C alignment.
Sequenced Read Pairs: 539,516,521
Normal Paired: 213,369,608 (39.55%)
Chimeric Paired: 100,242,870 (18.58%)
Chimeric Ambiguous: 218,009,336 (40.41%)
Unmapped: 7,894,707 (1.46%)
Ligation Motif Present: 373,605,690 (69.25%)
Alignable (Normal+Chimeric Paired): 313,612,478 (58.13%)
Unique Reads: 290,267,405 (53.80%)
PCR Duplicates: 23,239,355 (4.31%)
Optical Duplicates: 105,565 (0.02%)
Library Complexity Estimate: 2,008,809,080
Intra-fragment Reads: 25,254,264 (4.68% / 8.70%)
Below MAPQ Threshold: 190,171,454 (35.25% / 65.52%)
Hi-C Contacts: 74,841,687 (13.87% / 25.78%)
Ligation Motif Present: 57,935,480 (10.74% / 19.96%)
3' Bias (Long Range): 65% - 35%
Pair Type %(L-I-O-R): 24% - 26% - 25% - 24%
Inter-chromosomal: 37,878,703 (7.02% / 13.05%)
Intra-chromosomal: 36,962,984 (6.85% / 12.73%)
Short Range (<20Kb): 33,783,522 (6.26% / 11.64%)
Long Range (>20Kb): 3,178,444 (0.59% / 1.10%)
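As a quick sanity check on the numbers above, here is a minimal sketch that recomputes a few of the ratios from the raw counts. The counts are copied from the statistics; the variable names and the `pct` helper are only illustrative and are not part of Juicer.

```python
# Counts copied from the Juicer statistics above.
sequenced = 539_516_521
unique = 290_267_405
intra_fragment = 25_254_264
below_mapq = 190_171_454
hic_contacts = 74_841_687
inter_chrom = 37_878_703
intra_chrom = 36_962_984
long_range = 3_178_444


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Unique reads split into intra-fragment, below-MAPQ, and Hi-C contacts.
assert intra_fragment + below_mapq + hic_contacts == unique
# Hi-C contacts split into inter- and intra-chromosomal pairs.
assert inter_chrom + intra_chrom == hic_contacts

summary = {
    "below_mapq_of_unique": pct(below_mapq, unique),
    "contacts_of_unique": pct(hic_contacts, unique),
    "contacts_of_sequenced": pct(hic_contacts, sequenced),
    "long_range_of_contacts": pct(long_range, hic_contacts),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

The first three values should match the percentages Juicer reports; the last one expresses long-range (>20 Kb) pairs as a share of all Hi-C contacts, which the report does not print directly.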
After the alignment, I ran the 3D-DNA pipeline to scaffold the assembly with the Hi-C data.
I used --editor-repeat-coverage 5, --editor-coarse-resolution 100000, and --splitter-coarse-stringency 30, and the run completed. Please see the attached 0.hic map.
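For reproducibility, the invocation can be written out roughly as in the sketch below. The three option values are the ones listed above; the script name `run-asm-pipeline.sh` and the two positional inputs (draft FASTA, then the Juicer merged_nodups file) are my assumption about the usual 3D-DNA call, and both file paths are hypothetical placeholders. The snippet only assembles and prints the command; it does not run it.

```python
import shlex

# Hypothetical input paths; substitute the real draft FASTA and Juicer output.
draft_fasta = "draft_assembly.fasta"
merged_nodups = "merged_nodups.txt"

# Option values taken from the run described above.
options = {
    "--editor-repeat-coverage": 5,
    "--editor-coarse-resolution": 100000,
    "--splitter-coarse-stringency": 30,
}

# Assumed script name and argument order (options first, then the two inputs).
command = ["run-asm-pipeline.sh"]
for flag, value in options.items():
    command += [flag, str(value)]
command += [draft_fasta, merged_nodups]

# Print a shell-quoted command line for inspection or copy-paste.
print(shlex.join(command))
```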
As you can see, the map is only 1400 Mb, but my assembly is 2700 Mb. What could be wrong? Also, do you have any suggestions on which parameters I could tune, since I currently don't see any chromosome-scale scaffold patterns? Thank you very much.