Unexpected FINAL assembly output


Nolan Hartwick

Jun 3, 2019, 8:46:42 PM
to 3D Genomics
I'm attempting to run juicer+3ddna on an Arabidopsis dataset to get a sense of the workflow and become familiar with the tool, but I'm getting rather strange output. I've attached a snip of my 3ddna output hic+assembly file in juicebox. While there are some errors, let's ignore them for now. There are some pretty clear chromosomes, and the superscaffolds identified by 3ddna seem mostly reasonable. The problem comes when I attempt to output the finalized scaffolded assembly using the "run-asm-pipeline-post-review.sh" script. The assembly I get out of this doesn't match the juicebox visualization at all. I'm expecting roughly 8 main scaffolds, each roughly 15 megabases in size. Instead I get a FINAL fasta file that looks as follows....

HiC_scaffold_1  115472800
HiC_scaffold_2  106259
HiC_scaffold_3  104967
HiC_scaffold_4  55898
HiC_scaffold_5  53716
HiC_scaffold_6  44144
HiC_scaffold_7  42900
HiC_scaffold_8  41191
HiC_scaffold_9  40699
HiC_scaffold_10 40541
HiC_scaffold_11 40423
HiC_scaffold_12 40352
HiC_scaffold_13 34060
HiC_scaffold_14 31959

...I'm 100% certain that my "run-asm-pipeline-post-review.sh" is generating this assembly. I've no idea what is going wrong here. I've pasted the stdout for the command below....

 
/local/ifs3_scratch/CORE/nhartwic/3d-dna/run-asm-pipeline-post-review.sh -r /home/nhartwic/scratch/core/arabidopsis/hic/3ddna_wkdir/Sail232.V333.pass.miniasm.rawchrom.assembly /h
 -r|--review flag was triggered, treating file /home/nhartwic/scratch/core/arabidopsis/hic/3ddna_wkdir/Sail232.V333.pass.miniasm.rawchrom.assembly as a JB4A review file for draft
###############
Finilizing output:
:) -p flag was triggered. Running with GNU Parallel support parameter set to true.
:) -q flag was triggered, starting calculations for 1 threshold mapping quality
:) -i flag was triggered, building mapq without
:) -c flag was triggered, will remove temporary files after completion
...Remapping contact data from the original contig set to assembly
...Building track files
...Building the hic file
Not including fragment map
Start preprocess
Writing header
Writing body
..
Writing footer

Finished preprocess
HiC file version: 8

Calculating norms for zoom BP_2500000
Calculating norms for zoom BP_1000000
Calculating norms for zoom BP_500000
Calculating norms for zoom BP_250000
Calculating norms for zoom BP_100000
Calculating norms for zoom BP_50000
Calculating norms for zoom BP_25000
Calculating norms for zoom BP_10000
Calculating norms for zoom BP_5000
Calculating norms for zoom BP_1000
Writing expected
Writing norms
Finished writing norms
... -s flag was triggered, treating all contigs/scaffolds shorter than 15000 as unattempted.
... -l flag was triggered. Output will appear with headers of the form Sail232.V333.pass.miniasm_hic_scaffold_#.
... -g flag was triggered, making gap size between scaffolded draft sequences to be equal to 500.
Analyzing the merged assembly
...trimming N overhangs
...adding gaps


...Any assistance here is appreciated.
raw_3ddna_output.PNG

Nolan Hartwick

Jun 3, 2019, 8:50:34 PM
to 3D Genomics
To be clear, all I posted was the name of each entry in the FINAL.fasta file along with its size in base pairs. The actual output looks like a normal fasta. What is obviously wrong with the output is that I have a single entry of ~115 megabases (and some weird smaller entries) instead of several entries of ~15 megabases, as the juicebox visualization implies.
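
For readers who want to produce the same kind of name/size listing from a FASTA file, here is a minimal sketch. It is not part of 3d-dna; the function name `fasta_lengths` is made up for illustration, and it assumes a plain (uncompressed) FASTA where the sequence name is the first whitespace-delimited token of each header line.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length in bp} for FASTA-formatted lines.

    Hypothetical helper for illustration; not part of 3d-dna or Juicer.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Take the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so accumulate across lines.
            lengths[name] += len(line)
    return lengths


# Small in-memory example; for a real file, pass an open file handle instead,
# e.g. `with open(path) as handle: fasta_lengths(handle)`.
demo = [">HiC_scaffold_1 extra text", "ACGT", "AC", "", ">HiC_scaffold_2", "NNN"]
for seq_name, size in fasta_lengths(demo).items():
    print(seq_name, size, sep="\t")
```

A single very large entry next to many small ones, as in the listing above, shows up immediately in this kind of output.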

Olga Dudchenko

Jun 4, 2019, 4:57:12 AM
to 3D Genomics
Hi Nolan,

Did you do the review on a Windows machine? Older versions of JBAT use the platform's default line separator, which is not \n on Windows. Open the file in a text editor that shows hidden characters and check whether there is anything weird in addition to \n. See e.g. this discussion thread:


Olga

Nolan Hartwick

Jun 4, 2019, 2:54:12 PM
to 3D Genomics
The files that I'm currently using are pre-review. None of these files were ever touched by a windows machine and no carriage return characters are present.
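
A quick way to confirm that a file contains no carriage returns, without relying on a text editor, is to count the raw `\r` bytes. This is only a sketch; the helper name `count_carriage_returns` is hypothetical, and the path you pass in would be your own .assembly or review file.

```python
def count_carriage_returns(path: str) -> int:
    """Count raw carriage-return bytes (\\r) in a file.

    Hypothetical helper for illustration. The file is opened in binary
    mode so that Python does not translate line endings while reading.
    """
    with open(path, "rb") as handle:
        return handle.read().count(b"\r")
```

A result of 0 means the file uses plain \n line endings; any other value suggests Windows-style (\r\n) or old Mac-style (\r) separators somewhere in the file.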

Nolan Hartwick

Jun 6, 2019, 5:58:44 PM
to 3D Genomics
Quick update. Juicer+3ddna ran fine with another dataset I had and produced a totally reasonable assembly. The output matched what I expected from Juicebox, anyway. I've attached the assembly file from the Arabidopsis test (the one that is producing incorrect output). Nothing seems weird here, but the scaffolded assembly produced from this file throws almost all contigs into HiC_scaffold_1.

Any assistance is appreciated.
Sail232.V333.pass.miniasm.rawchrom.assembly
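
One way to cross-check an .assembly file against the FINAL fasta is to estimate the scaffold sizes the file implies and compare them with the sizes actually written. The sketch below rests on assumptions that should be verified against the 3d-dna documentation: header lines look like `>name index length`, every other line lists the signed indices that make up one output scaffold, and a fixed 500 bp gap (the value reported by the -g message in the log above) is placed between neighbouring sequences. The function name is hypothetical, and the real pipeline may treat short or trimmed sequences differently, so expect approximate agreement at best.

```python
from typing import Dict, Iterable, List


def expected_scaffold_lengths(lines: Iterable[str], gap: int = 500) -> List[int]:
    """Estimate output scaffold sizes implied by .assembly-style lines.

    Hypothetical helper; the format assumptions are described in the
    surrounding text and are not taken from the 3d-dna source code.
    """
    seq_len: Dict[int, int] = {}  # sequence index -> length in bp
    scaffolds: List[int] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            # Assumed header layout: >name index length
            seq_len[int(fields[1])] = int(fields[2])
        else:
            # Assumed scaffold line: signed indices, sign encodes orientation.
            ids = [abs(int(token)) for token in fields]
            total = sum(seq_len[i] for i in ids) + gap * (len(ids) - 1)
            scaffolds.append(total)
    return scaffolds


# Tiny made-up example: two contigs joined into one scaffold, one left alone.
demo_assembly = [">ctg1 1 1000", ">ctg2 2 2000", ">ctg3 3 300", "1 -2", "3"]
print(expected_scaffold_lengths(demo_assembly))
```

If the sizes estimated this way look like the Juicebox picture but the FINAL fasta does not, the problem is more likely in the finalization step or its inputs than in the .assembly file itself.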

Olga Dudchenko

Jun 14, 2019, 10:08:45 PM
to 3D Genomics
Nolan,

If you are not doing review, why are you running post-review script?

Is the file you are sharing, "Sail232...", the one that produces a single sequence? The file itself seems fine to me; everything checks out. What is the full command you are running with post-review? Are you passing along the original fasta file? If yes, then the issue is perhaps with the original fasta file. I cannot reproduce it without that file, however. If you can share it, I can try to reproduce the problem and see what is going on.

Best,
Olga

Nolan Hartwick

Jun 21, 2019, 1:31:11 PM
to 3D Genomics
> If you are not doing review, why are you running post-review script?
The short answer is that I'm trying to become familiar with this software suite.

> Is the file you are sharing, "Sail232...", the one that produces a single sequence?
It doesn't produce a single sequence. It produces a set of sequences with lengths described in my original post. Almost all contigs are put into a single sequence.

> What is the full command you are running with post-review? Are you passing along the original fasta file?
Yes. Full command is detailed below...

bash /home/nhartwic/packages/3d-dna/run-asm-pipeline-post-review.sh \
    -r ~/scratch/core/arabidopsis/hic/3ddna_wkdir/Sail232.V333.pass.miniasm.rawchrom.assembly \
    ~/scratch/core/arabidopsis/hic/Sail232.V333.pass.miniasm.fasta \
    ~/scratch/core/arabidopsis/hic/juicer_wkdir/aligned/merged_nodups.txt

Here is a link to the fasta file. You should be able to download it without issue...

Olga Dudchenko

Jun 21, 2019, 3:41:22 PM
to 3D Genomics
I am sorry, but I cannot reproduce your issue. Here are the sizes I get with the files you've sent (top of the list only); they all seem to make sense. Are there any error messages?

HiC_scaffold_1 1 13071845
HiC_scaffold_2 2 9155310
HiC_scaffold_3 3 3155559
HiC_scaffold_4 4 2739286
HiC_scaffold_5 5 19119500
HiC_scaffold_6 6 14842000
HiC_scaffold_7 7 13937768
HiC_scaffold_8 8 13948232
HiC_scaffold_9 9 10984000
HiC_scaffold_10 10 12999845
HiC_scaffold_11 11 308154
HiC_scaffold_12 12 106259
HiC_scaffold_13 13 104967
HiC_scaffold_14 14 55898
HiC_scaffold_15 15 53716
HiC_scaffold_16 16 44144
HiC_scaffold_17 17 42900

Nolan Hartwick

Jun 21, 2019, 6:31:09 PM
to 3D Genomics
I don't remember any errors. Unfortunately, I don't have the log files to hand to easily reference. I'll run it again when I have a moment and see what I get. Thanks for all the help in any case.