Hi,
I have finished running the 3D-DNA pipeline and got excellent results: the scaffolds are in the size range of the chromosomes, as expected. Here is a screenshot:
I do have a few questions though:
1. There are a lot of debris sequences. The stats of the input and output assemblies are:
| file | Canu_MP | Canu_MP_3D-DNA |
|---|---|---|
| num_seqs | 3517 | 5026 |
| sum_len | 822873405 | 824737905 |
| min_len | 1011 | 1000 |
| avg_len | 233970.3 | 164094.3 |
| max_len | 18816920 | 80397223 |
| Q1 | 22152 | 14647 |
| Q2 | 41836 | 25000 |
| Q3 | 124424 | 47000 |
| sum_gap | 175253 | 2039753 |
| N50 | 1369959 | 51413987 |
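The combination in the table (N50 jumping from ~1.4 Mb to ~51 Mb while avg_len and Q1 to Q3 all drop) is what you would expect when a few chromosome-scale scaffolds absorb most of the sequence and the remainder is cut into many small pieces. A minimal sketch for recomputing the headline numbers from a list of scaffold lengths, assuming the standard N50 definition; seqkit's quartile and sum_gap columns are not reproduced, and its exact tie handling may differ:

```python
def assembly_stats(lengths):
    """Summarise sequence lengths, loosely mirroring `seqkit stats -a`.

    Only num_seqs, sum_len, min/avg/max and N50 are computed; the Q1/Q2/Q3
    and sum_gap columns are deliberately left out of this sketch.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: length of the sequence at which the running sum (longest first)
    # first reaches half of the total assembly length.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "num_seqs": len(ordered),
        "sum_len": total,
        "min_len": ordered[-1],
        "avg_len": round(total / len(ordered), 1),
        "max_len": ordered[0],
        "N50": n50,
    }
```

Feeding it the lengths of the input and output FASTA files gives a quick check that the comparison is like for like.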
After scaffolding, the chromosomes are in near-perfect order, but I believe the debris scaffolds are not being placed, causing a lot of false joins. I'm quite apprehensive about manually joining and curating 5015 scaffolds. I suspect there must be a better way to integrate them, or at least to significantly reduce their number. Would I have to rescaffold them?
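Before deciding whether to rescaffold or curate by hand, it may help to see how the debris is distributed by size, since most of the 5015 pieces may be too small to matter. A minimal triage sketch, assuming a plain (optionally multi-line) FASTA; the size classes and the file name `final_scaffolds.fasta` are hypothetical choices, not 3D-DNA conventions:

```python
from collections import Counter

def fasta_lengths(lines):
    """Return {record_name: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

# Hypothetical size classes for triage (checked from largest to smallest).
BINS = [(1_000_000, ">=1 Mb"), (100_000, "100 kb-1 Mb"),
        (10_000, "10-100 kb"), (0, "<10 kb")]

def size_classes(lengths):
    """Count scaffolds and total bp falling into each size class."""
    counts, bases = Counter(), Counter()
    for length in lengths.values():
        for cutoff, label in BINS:
            if length >= cutoff:
                counts[label] += 1
                bases[label] += length
                break
    return counts, bases

if __name__ == "__main__":
    with open("final_scaffolds.fasta") as fh:  # hypothetical path
        counts, bases = size_classes(fasta_lengths(fh))
    for _, label in BINS:
        print(f"{label}\t{counts[label]} scaffolds\t{bases[label]} bp")
```

If most of the count sits in the smallest class but little of the sequence does, manual curation can focus on the few larger debris scaffolds.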
2. The final 3D-DNA output contains a lot of contigs below 1 Mb. Is it recommended to adjust the settings so that they are included in the assembly? As I see it, the 5015 scaffolds outside the chromosome-scale ones contain nearly 160 Mb in total. The relevant parameters appear to be:
- `--polisher-input-size` (default 1 Mb)
- `--splitter-input-size` (default 1 Mb)
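To put a number on how much sequence sits below a given size cut-off (for example the 1 Mb default of the two options above), a minimal sketch is shown here. It assumes, without checking the 3D-DNA source, that these options act as minimum scaffold sizes for the polishing and splitting steps; the function name is hypothetical:

```python
def split_by_threshold(lengths, threshold=1_000_000):
    """Partition scaffold lengths at `threshold` bp.

    The 1 Mb default mirrors the quoted --polisher-input-size and
    --splitter-input-size defaults. Returns a tuple
    (number_below, bp_below, fraction_of_total_bp_below).
    """
    below = [x for x in lengths if x < threshold]
    total = sum(lengths)
    bp_below = sum(below)
    fraction = bp_below / total if total else 0.0
    return len(below), bp_below, fraction
```

Running it over the output scaffold lengths at a few thresholds (e.g. 1 Mb, 500 kb, 100 kb) shows how much sequence a lower setting could bring into play before committing to a rerun.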
3. In a couple of other threads, you suggested changing the following:
- `--editor-repeat-coverage` (misjoin editor threshold repeat coverage; raise it from the default of 2 to 2.5 or 3)
- `-r` (number of iterative rounds of misjoin correction)
What would you suggest I do in this case?
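To compare the suggested settings side by side, a small dry-run helper can print one command per combination instead of editing them by hand. This is a sketch under assumptions: only the option names come from the questions above, while the script name `run-asm-pipeline.sh`, the positional order `<fasta> <merged_nodups>`, and the file names `Canu_MP.fasta` and `merged_nodups.txt` are placeholders to verify against `run-asm-pipeline.sh -h` for the installed 3D-DNA version:

```python
import shlex

def build_3ddna_command(fasta, mnd, rounds=2, repeat_coverage=2,
                        polisher_size=1_000_000, splitter_size=1_000_000,
                        script="run-asm-pipeline.sh"):
    """Return a 3D-DNA argument list (hypothetical helper, nothing is executed).

    Option names follow the post; script name and positional order are
    assumptions to check against the pipeline's own usage message.
    """
    return [
        script,
        "-r", str(rounds),
        "--editor-repeat-coverage", str(repeat_coverage),
        "--polisher-input-size", str(polisher_size),
        "--splitter-input-size", str(splitter_size),
        fasta,
        mnd,
    ]

if __name__ == "__main__":
    # Print (do not run) a small sweep over the misjoin-correction settings.
    for cov in (2, 2.5, 3):
        cmd = build_3ddna_command("Canu_MP.fasta", "merged_nodups.txt",
                                  repeat_coverage=cov)
        print(shlex.join(cmd))
```

Printing rather than executing keeps each run reviewable, and each printed command can be launched in its own output directory so the resulting assemblies can be compared with the same stats as above.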
Any help is appreciated!