Problem with editting assembly in JuiceBox

324 views
Skip to first unread message

yh...@cornell.edu

unread,
Apr 22, 2019, 2:01:15 PM4/22/19
to 3D Genomics

I generated .assembly file outside the 3d-dna pipeline, using generate-assembly-file-from-fasta.awk. The hic file visualized in juicebox is generated from juicer.sh using the same fasta file that generates the .assembly file as the reference genome. However, when I load the .assembly file into Juicebox, I got something like the follwoing: the editing box only appears on the first chromosome, but none of the others. What's causing this problem here?

chrom1.png

chrom2.png


Olga Dudchenko

unread,
Apr 23, 2019, 3:36:22 AM4/23/19
to 3D Genomics
Hello,

See page 5 of genome assembly cookbook (dnazoo.org/methods) on how to visualize draft genome assemblies. Your assembly is not visualize in the format compatible with Juicebox assembly tools.

Best,
Olga

yh...@cornell.edu

unread,
Apr 24, 2019, 3:00:56 PM4/24/19
to 3D Genomics
Hi, this might be dumb, but I just tried it following the instructions on the cookbook, but I got the following message
awk: /home/yh362/tools/3d-dna-master/lift/lift-input-mnd-to-asm-mnd.awk:61: fatal: error reading input file `-': Bad file descriptor
awk: /home/yh362/tools/3d-dna-master/lift/lift-input-mnd-to-asm-mnd.awk:61: fatal: cannot open file `0 Nitab01 29 0 16 Nitab01 250 1 60 101M AAGCACAGGACAACCAGTATTATCCCCCAAAGCTTACCTTCAAAGCCCCCGAAACCTATCCATATGATCCACATTCTGACCTACCAGGGAAAACAGAAAAA 2 101M GGAAGAGATGTTCAGAAAAATAAAAAGCATAGAACAATCCTTCAGGGATATGAGAGGGTTAGGAGGTCAGGTAAGTGTAGCTTACAAAGACTTGTGTTTGT HWI-ST1115:238:C421HACXX:7:1106:9377:94141/1 HWI-ST1115:238:C421HACXX:7:1106:9377:94141/2 ' for reading (File name too long)

the command I have used is:
 ./visualize/run-assembly-visualizer.sh ~/work/hi-c-Ntab-UnpairedRemoved/references/Ntab.v5.genome.assembly ~/work/hi-c-Ntab-UnpairedRemoved/aligned/merged_nodups.txt

Olga Dudchenko

unread,
Apr 24, 2019, 4:47:45 PM4/24/19
to 3D Genomics
Looks like the code has trouble reading the .assembly file and parcing it into .cpops and .asm. Is your .assembly file readable and non-empty? Is .cprops created? (Check permissions perhaps.)

Best,
Olga

yh...@cornell.edu

unread,
Apr 24, 2019, 5:24:20 PM4/24/19
to 3D Genomics
I checked the permission of .assembly, it is indeed readable and it not empty. Below is how the .assembly looks like
>Nitab01 1 193998466
>Nitab02 2 127715613
>Nitab03 3 174128732
>Nitab04 4 138068049
>Nitab05 5 160803906
>Nitab06 6 192782312
>Nitab07 7 154253237
>Nitab08 8 175743238
>Nitab09 9 115498687
>Nitab10 10 150596706
>Nitab11 11 145641717
>Nitab12 12 124301120
>Nitab13 13 144070200
>Nitab14 14 115792696
>Nitab15 15 125483021
>Nitab16 16 149004669
>Nitab17 17 176934997
>Nitab18 18 150274196
>Nitab19 19 155547736
>Nitab20 20 148891655
>Nitab21 21 106939007
>Nitab22 22 214199688
>Nitab23 23 148581656
>Nitab24 24 108526473
>Nitab0S 25 166192375
>Nitab0T 26 42634236
>Nitab0U 27 192238180
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

Olga Dudchenko

unread,
Apr 24, 2019, 5:33:41 PM4/24/19
to 3D Genomics
Do you have GNU Parallel?

yh...@cornell.edu

unread,
Apr 24, 2019, 5:34:35 PM4/24/19
to 3D Genomics
Yes, I have

Olga Dudchenko

unread,
Apr 24, 2019, 9:31:32 PM4/24/19
to 3D Genomics
Hmm..

Let's do step by step.

1)  awk -f ${path-to-3d-dna}/utils/convert-assembly-to-cprops-and-asm.awk Ntab.v5.genome.assembly

do .cprops and .asm get created? are they non-empty?

2) bash ${path-to-3d-dna}/lift/lift-input-mnd-to-asm-mnd.sh Ntab.v5.genome.cprops Ntab.v5.genome.asm Ntab.v5.genome

does a temp....mnd.txt file get created? Is it non-empty?

Best,
Olga

yh...@cornell.edu

unread,
Apr 24, 2019, 10:51:52 PM4/24/19
to 3D Genomics
Hi, thank you for timely reply! For the first command, yes, .cprops and .asm are created and they are not empty.
For the second command, I am not sure what the third argument should be? If I simply use "Ntab.v5.genome", it gives an error. I checked the GitHub source code, and it seems like the third argument is expecting a mnd file? Hope you can clarify that. 
Best,
Yilei

Olga Dudchenko

unread,
Apr 25, 2019, 10:09:41 AM4/25/19
to 3D Genomics
Sorry, not sure why this did not finish typing. It should be the path to your merged_nodups file.

Thanks,
Olga

yh...@cornell.edu

unread,
Apr 25, 2019, 4:11:48 PM4/25/19
to 3D Genomics
Hi, for the lift-input-mnd-to-asm-mnd.sh, it seems like it doesn't create any files but simply prints out a whole lot of lines. But when checking the previous running of the visualizer, I noticed that the temp....mnd.txt file is created and is not empty(16G). So I am guessing some wrapper program directs the output of lift-input-mnd-to-asm.sh to the temp...mnd.txt file?Thanks.

Olga Dudchenko

unread,
Apr 25, 2019, 9:12:31 PM4/25/19
to 3D Genomics
That's fine, the it gives output into stdout. (I was under impression from your previous output that this part was not working for you.). Then lets try to move forward. Dump all into a file: 

bash lift-input-mnd-to-asm.sh <cprops> <asm> <original_mnd> > temp.mnd.txt

now run
(calculate total length of assembly)
totlength=`awk 'FILENAME==ARGV[1]{len[$2]=$3;len[-$2]=$3;next}{for(i=1;i<=NF;i++){total+=len[$i]}}END{print total}' <cprops_file> <asm_file>`

(use juicer tools to visualize output)
bash ${path-to-3ddna}/visualize/juicebox_tools.sh pre -q 1 temp.mnd.txt out.hic <(echo "assembly "$((totlength / scale)))

Better type than copy in case weird characters.

Best,
Olga

yh...@cornell.edu

unread,
Apr 26, 2019, 3:08:30 PM4/26/19
to 3D Genomics
Got an error at the third step
yh362@mocha:~/tools/3d-dna-master/lift$ bash ../visualize/juicebox_tools.sh pre -q 1 temp.mnd.txt out.hic <(echo "assembly"$((totlength / scale)))           -bash: totlength / scale: division by 0 (error token is "scale")
Not including fragment map
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1466)
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1237)
        at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:651)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:373)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
        at juicebox.tools.HiCTools.main(HiCTools.java:86)

Olga Dudchenko

unread,
Apr 27, 2019, 11:48:30 AM4/27/19
to 3D Genomics
Show a head of your temp.mnd.txt.

yh...@cornell.edu

unread,
Apr 27, 2019, 12:28:29 PM4/27/19
to 3D Genomics

yh362@mocha:~/tools/3d-dna-master/lift$ head -n 10 temp.mnd.txt
0 assembly 1145899487 0 16 assembly 797230688 1 0 101M R1 33 53H48M R2 Id1 Id2
0 assembly 1145899701 0 0 assembly 797793521 1 29 65M R1 60 73M R2 Id1 Id2
0 assembly 1145899544 0 0 assembly 797850606 1 25 101M R1 32 60M39S R2 Id1 Id2
16 assembly 1145899600 0 16 assembly 800135539 1 10 101M R1 60 101M R2 Id1 Id2
0 assembly 1145899438 0 16 assembly 809969613 1 60 101M R1 0 23S78M R2 Id1 Id2
0 assembly 1145899762 0 16 assembly 809971216 1 60 101M R1 7 101M R2 Id1 Id2
16 assembly 1145899926 0 16 assembly 795132329 1 33 101M R1 60 101M R2 Id1 Id2
16 assembly 1145899781 0 0 assembly 797848925 1 25 101M R1 60 101M R2 Id1 Id2
16 assembly 1145899942 0 16 assembly 800254793 1 0 41M R1 31 34S67M R2 Id1 Id2
16 assembly 1145899860 0 0 assembly 800269513 1 60 23S78M R1 60 101M R2 Id1 Id2

Olga Dudchenko

unread,
Apr 29, 2019, 7:35:47 PM4/29/19
to 3D Genomics
This looks fine to me..
And your scale is ?

totlength=`awk 'FILENAME==ARGV[1]{len[$2]=$3;len[-$2]=$3;next}{for(i=1;i<=NF;i++){total+=len[$i]}}END{print total}' <cprops_file> <asm_file>`
scale=$(( 1 + $totlength / 2100000000 ))

what's the scale? Make a manual tab-separated file

echo "assembly "$((totlength/scale) > temp.txt

and try to run bash ../visualize/juicebox_tools.sh pre -q 1 temp.mnd.txt out.hic temp.txt

What are you system specks/bash? My guess is that something's weird is going on in interpreting subshells, because the file on the surface seems quite alright.

Olga

Yilei Huang

unread,
May 7, 2019, 11:27:19 PM5/7/19
to 3D Genomics
Hi, thank you for your help. I think I finally got it to work, although I am not sure why it didn't work previously.
Reply all
Reply to author
Forward
0 new messages