3d-dna: Tweaking parameters from 0.hic, wig and bed files


Ricardo Guerreiro

Sep 12, 2019, 9:07:48 AM
to 3D Genomics
Hi,

I've been reading many posts here and am still unsure how to tweak parameters towards a better assembly. Attached is iteration 0 of my assembly (1: an overview, 2: a zoomed-in picture).

It looks as if more scaffolding is still possible, but small misassemblies (see the zoom) probably disrupt the contact patterns between scaffolds. I loaded the bed and wig files but am not sure how to make decisions from here. The pipeline seems to find all mismatches very well, but it even finds them where it should not, based on drops in the depletion score.

I understand that the -r option serves to correct misassemblies, but it seems to pick them up and throw them into a garbage bin. Increasing this parameter makes the garbage area bigger and bigger, and many contigs end up away from the scaffolds where they belong.

The last two attached pictures are the raw chromosome versions of a default run (-r 3) and a -r 6 run (same color scale: 0-500-1000). Can you honestly evaluate which is better? It seems very subjective to me.

I'm not sure more -r rounds are good. Unless I can somehow better recover excluded contigs, perhaps with the seal stage?

Maybe I should be content with this and move to manual curation.


Summary of points:
 - Could 3d-dna scaffold more?
 - Is misassembly detection too sensitive?
 - Are more rounds excluding good contigs from the assembly?


Kind regards,
Ricardo

PS: I really admire you guys, great work!



r0_morescaffoldingpossible.png
evaluating_r0performance.png
canu4_arcs2x_polished_garbage.png
canu4_arcs2x_final_3rounds.png
canu4_arcs2x_final_6rounds.png

Olga Dudchenko

Sep 13, 2019, 10:34:47 PM
to 3D Genomics
Hello Ricardo,

Thanks for the kind words.

First, while I understand why it may seem logical, increasing the number of error-correction iterations is not a universally good strategy. As you've mentioned, it is guaranteed to move more and more sequence into debris, leaving less and less signal to work with in the main assembly. The reason to have several rounds is to address closely located misjoins, and except for a very few really bad cases, 2 cycles address the majority (hence the default).

The guide to understanding what's going on is your r0_morescaffoldingpossible.png pic. Your cov track is scaled weirdly, but I would bet that it is extremely non-uniform, leading to lots of sequence annotated as higher-than-expected coverage (repeats_wide.at.step.0.bed). You probably want to increase the --editor-repeat-coverage parameter. The depletion score annotates things reasonably, as far as I can tell from your zoom-in.
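
One way to check whether a lot of sequence is being annotated as high coverage is to sum the interval lengths in the repeats bed file. The following is only a minimal sketch: it assumes the file is plain BED-like text (name, start, end in the first three columns), the helper name `flagged_bp` is hypothetical, and the example intervals are made up for illustration rather than taken from this assembly.

```python
def flagged_bp(bed_lines):
    """Sum interval lengths from BED-like lines (name, start, end, ...).

    Assumption: columns 2 and 3 hold integer start/end coordinates.
    Lines that do not parse (headers, track lines, blanks) are skipped.
    Overlapping intervals are NOT merged, so overlaps are counted twice.
    """
    total = 0
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # header or track line
        total += max(0, end - start)
    return total


# Made-up example intervals, for illustration only.
example = [
    "track name=repeats",
    "assembly\t0\t50000",
    "assembly\t120000\t150000",
]
print(flagged_bp(example))  # 80000

# For a real file, something like:
# with open("repeats_wide.at.step.0.bed") as handle:
#     print(flagged_bp(handle))
```

Dividing the result by the total assembly length gives a rough idea of what fraction of the sequence is flagged at the current threshold.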

Hope this helps and good luck,

Olga

Ricardo Guerreiro

Sep 17, 2019, 8:39:04 AM
to 3D Genomics

Great, thank you for the help!

Increasing the editor-repeat-coverage parameter to 3 produced a better assembly. I tried 2.5, but it was not accepted. My coverage often varied between 0.5 and 2.5.

One problem still persists: there are many wrong chromosome assignments at the edges of the blue squares (like the one I attach here). This looks like a decision based on coarse resolutions. Does it make sense to change the resolution parameters?
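
To illustrate why the higher threshold helps when coverage stays between 0.5 and 2.5, here is a small sketch. It assumes the coverage track is normalized so that 1 means expected coverage and that a bin is flagged when it exceeds the threshold; that is a simplification for illustration and not necessarily how 3d-dna decides internally. The function name `fraction_flagged` and the bin values are hypothetical.

```python
def fraction_flagged(normalized_cov, threshold):
    """Fraction of bins whose normalized coverage exceeds the threshold.

    Assumption: coverage is expressed relative to the expected value (1.0).
    """
    if not normalized_cov:
        return 0.0
    flagged = sum(1 for c in normalized_cov if c > threshold)
    return flagged / len(normalized_cov)


# Hypothetical bins spanning the 0.5 to 2.5 range mentioned above.
bins = [0.5, 0.8, 1.0, 1.2, 1.9, 2.1, 2.3, 2.5]

for threshold in (2, 3):
    print(threshold, fraction_flagged(bins, threshold))
# 2 0.375
# 3 0.0
```

Under these assumptions, a threshold of 2 flags every bin above twice the expected coverage, while a threshold of 3 flags none of them, which would be consistent with less sequence being treated as repeats.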


Kind regards,
Ricardo
edges_resolution_problem.png

Olga Dudchenko

Sep 18, 2019, 8:25:30 AM
to 3D Genomics
Are you sure the assembly file you load matches the hic map?
Olga