Re: JUICER issues at the end of the pipeline


Muhammad Shamim

Apr 3, 2022, 3:40:31 PM
to 3D Genomics
What Juicer Tools jar version did you use?
On Wednesday, March 23, 2022 at 2:59:52 PM UTC-5 nolan.r...@outlook.fr wrote:

Hi Dev, 


I am writing about several errors that occur at the end of the pipeline, which I cannot fix despite searching the internet and this group.


I'm doing a Hi-C analysis to reproduce results that another research team obtained with your pipeline. So far everything was working perfectly (or at least I managed to solve the problems), but now I have some errors.

* First, the issues:


- No reads in Hi-C contact matrices


- Unknown command: statistics


- GPUs are not installed so HiCCUPs cannot be run


- Either inter.hic or inter_30.hic were not created. Check for results

* What I've done:


- For the statistics problem, I downloaded the latest juicer_tools, but I don't know why it still doesn't work.


- I have no idea how to fix the first and second errors in the list of issues above.


* To be specific, I used the CPU version of Juicer.



Here is the output with the errors:


Picked up _JAVA_OPTIONS: -Xmx1024m -Xms1024m
Exception in thread "main" java.lang.RuntimeException: Unknown command: statistics
    at juicebox.tools.HiCTools.main(HiCTools.java:98)
Picked up _JAVA_OPTIONS: -Xmx1024m -Xms1024m
Exception in thread "main" java.lang.RuntimeException: Unknown command: statistics
    at juicebox.tools.HiCTools.main(HiCTools.java:98)
Picked up _JAVA_OPTIONS: -Xmx1024m -Xms512m
Not including fragment map
Error while reading graphs file: java.io.FileNotFoundException: /media/nolan/DATA/Documents/project/HiC/aligned/inter_hists.m (No such file or directory)
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
    at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1650)
    at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1419)
    at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:832)
    at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:582)
    at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:346)
    at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:116)
    at juicebox.tools.HiCTools.main(HiCTools.java:96)
real 19m17.843s
user 17m28.881s
sys 0m38.002s
Picked up _JAVA_OPTIONS: -Xmx1024m -Xms512m
Calculating norms for zoom BP_2500000
Calculating norms for zoom BP_1000000
Calculating norms for zoom BP_500000
Calculating norms for zoom BP_250000
Calculating norms for zoom BP_100000
Calculating norms for zoom BP_50000
Calculating norms for zoom BP_25000
Calculating norms for zoom BP_10000
Calculating norms for zoom BP_5000
Calculating norms for zoom BP_2000
Calculating norms for zoom BP_1000
Calculating norms for zoom BP_500
Calculating norms for zoom BP_200
Calculating norms for zoom BP_100
Writing expected
Writing norms
Finished writing norms
real 0m0.328s
user 0m0.512s
sys 0m0.087s
Tue Mar 22 15:48:57 CET 2022
Picked up _JAVA_OPTIONS: -Xmx1024m -Xms512m
Not including fragment map
Error while reading graphs file: java.io.FileNotFoundException: /media/nolan/DATA/Documents/project/HiC/aligned/inter_30_hists.m (No such file or directory)
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
    at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1650)
    at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1419)
    at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:832)
    at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:582)
    at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:346)
    at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:116)
    at juicebox.tools.HiCTools.main(HiCTools.java:96)
real 18m13.224s
user 16m20.493s
sys 0m36.571s
Picked up _JAVA_OPTIONS: -Xmx1024m -Xms512m
Calculating norms for zoom BP_2500000
Calculating norms for zoom BP_1000000
Calculating norms for zoom BP_500000
Calculating norms for zoom BP_250000
Calculating norms for zoom BP_100000
Calculating norms for zoom BP_50000
Calculating norms for zoom BP_25000
Calculating norms for zoom BP_10000
Calculating norms for zoom BP_5000
Calculating norms for zoom BP_2000
Calculating norms for zoom BP_1000
Calculating norms for zoom BP_500
Calculating norms for zoom BP_200
Calculating norms for zoom BP_100
Writing expected
Writing norms
Finished writing norms
real 0m0.342s
user 0m0.532s
sys 0m0.068s
/media/nolan/DATA/juicer/scripts/common/juicer_tools is post-processing Hi-C for mm10
Data read from /media/nolan/DATA/Documents/project/HiC/aligned/inter_30.hic.
Motifs read from /media/nolan/DATA/juicer/references/motif

ARROWHEAD:
Picked up _JAVA_OPTIONS: -Xmx1024m -Xms512m
Reading file: /media/nolan/DATA/Documents/project/HiC/aligned/inter_30.hic
Unable to assess map sparsity; continuing with Arrowhead
Default settings for 10kb being used
max 0.0
0 domains written to file: /media/nolan/DATA/Documents/project/HiC/aligned/inter_30_contact_domains/10000_blocks.bedpe
Arrowhead complete

HiCCUPS:
GPUs are not installed so HiCCUPs cannot be run
(-: Postprocessing successfully completed, maps too sparse to annotate or GPUs unavailable (-:
***! Error! Either inter.hic or inter_30.hic were not created
Either inter.hic or inter_30.hic were not created. Check for results

I hope I can fix these problems with your help.


Thanks in advance,


Nolan.

Lia OBINU

Jun 7, 2022, 1:01:04 AM
to 3D Genomics
Hi,
I had two similar issues using juicer_tools.jar v2.13.07 and the CPU version of juicer2, at this stage:

(-: Finished sorting all sorted files into a single merge.
(-:  Mark duplicates done successfully

...

java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
...
Calculating norms for zoom BP_2500000java.lang.NullPointerException
...
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
...
Calculating norms for zoom BP_2500000java.lang.NullPointerException
...
.//scripts/common/juicer_postprocessing.sh: option requires an argument -- g
Usage: .//scripts/common/juicer_postprocessing.sh [-h] -j <juicer_tools_file_path> -i <hic_file_path> -m <bed_file_dir> -g <genome ID>

***! Error! Either inter.hic or inter_30.hic were not created
Either inter.hic or inter_30.hic were not created.  Check  for results

I got "No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment." twice. I read that this could be caused by a discrepancy between the sequence names in the FASTA and those in the merged_nodups.txt file, but no merged_nodups.txt file was created in the aligned directory. I also used the -z and -p assembly options in my command, and I didn't use the -q option. The command looks like this:

./scripts/juicer.sh -z ./references/draft_assembly.fasta -t 32 -p assembly -D ./
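Once a merged_nodups.txt exists, one way to test the name-mismatch explanation is to list the chromosome/contig names used in it that never appear as FASTA headers. This is a minimal sketch, assuming the standard Juicer long format (columns 2 and 6 hold the two chromosome names) and taking a FASTA name to be the header text up to the first whitespace; `missing_from_fasta` is a hypothetical helper, not part of Juicer.

```shell
# Hypothetical helper: print names used in merged_nodups.txt (columns 2 and 6)
# that are not FASTA sequence names. Each missing name is printed once.
# The FASTA must be non-empty for the NR == FNR idiom to work.
# Usage: missing_from_fasta draft.fasta aligned/merged_nodups.txt
missing_from_fasta() {
  awk '
    # First file (the FASTA): remember each header name without the ">".
    NR == FNR { if (/^>/) known[substr($1, 2)] = 1; next }
    # Second file (merged_nodups.txt): check the chromosome of both read ends.
    {
      for (k = 2; k <= 6; k += 4)
        if (!($k in known) && !seen[$k]++) print $k
    }
  ' "$1" "$2"
}
```

Empty output means every chromosome name in merged_nodups.txt matches a FASTA header.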


About the second error:

.//scripts/common/juicer_postprocessing.sh: option requires an argument -- g
Usage: .//scripts/common/juicer_postprocessing.sh [-h] -j <juicer_tools_file_path> -i <hic_file_path> -m <bed_file_dir> -g <genome ID>


***! Error! Either inter.hic or inter_30.hic were not created
Either inter.hic or inter_30.hic were not created.  Check  for results


I don't understand why I should use the -g option if I used -z and -p. Moreover, it says "Either inter.hic or inter_30.hic were not created.", but I have both files in the aligned directory. Here are the files I got in the aligned directory:

ls -lh
total 18G

281 Jun  1 15:26 header
9.9K Jun  1 19:23 inter_30.hic
7.7K Jun  1 19:21 inter_30_hists.m
2.0K Jun  1 19:21 inter_30.txt
9.9K Jun  1 19:22 inter.hic
7.7K Jun  1 19:20 inter_hists.m
2.0K Jun  1 19:20 inter.txt
1.7G Jun  1 18:58 merged1.txt
1.6G Jun  1 19:16 merged30.txt
15G Jun  1 19:18 merged_dedup.bam


I would really appreciate any help. Thank you in advance!

Cheers,


Lia



Moshe Olshansky

Jun 8, 2022, 10:19:31 PM
to 3D Genomics
Hi Lia,

First of all, from the size of the .hic files you can see that they only contain the header, so there is no "real" data in them.
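To spot header-only maps like these without reading the listing by eye, one option is to list every .hic file below a size threshold. A minimal sketch; the 100 KiB cut-off and the name `small_hic_files` are arbitrary assumptions, not Juicer defaults.

```shell
# Hypothetical helper: list .hic files under a directory that are smaller than
# a threshold given in KiB (default 100). find rounds sizes up to whole KiB,
# so -size -100k matches files of at most 99 KiB.
# Usage: small_hic_files aligned
small_hic_files() {
  find "$1" -type f -name '*.hic' -size -"${2:-100}"k
}
```

On the aligned directory listed above, this would report both inter.hic and inter_30.hic (9.9K each).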

When you used the -p assembly option, was assembly a file listing all your chromosomes/contigs and their sizes? If so, how many of them do you have?
Do you have the juicer_tools.jar file? If so, try to run something like
java -jar juicer_tools.jar pre merged1.txt trial.hic assembly
and see what happens.

Lia OBINU

Jun 9, 2022, 8:02:44 AM
to 3D Genomics
Hi,
Thank you so much for your reply!

I managed to obtain the merged_nodups.txt file using juicer 1.6 instead of juicer 2.
I was told that the "-p assembly" option tells Juicer that I don't know the chromosome sizes yet, since I am doing a de novo assembly. So "assembly" should not be a file, or am I wrong?
I also ran the Juicer test and it worked perfectly, so I think the installation is OK.
Anyway, I still have some warnings/errors in the pipeline when using my data:

./scripts/juicer.sh -z ./references/draft_assembly.fasta -t 32 -p assembly -D ./
(-: Looking for fastq files...fastq files exist
(-: Aligning files matching .../fastq/*_R*.fastq*
 to genome hg19 with site file .//restriction_sites/hg19_none.txt
(-: Created .../aligned.
bwa mem ...
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 2149540 sequences (320000280 bp)...
[M::process] read 2149104 sequences (320000172 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (8953, 418269, 8715, 8962)
...
[M::mem_process_seqs] Processed 483344 reads in 480.672 CPU sec, 15.027 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem ...
[main] Real time: 4065.778 sec; CPU: 129198.060 sec
(-:  Align of ... done successfully

(-: Finished sorting all sorted files into a single merge.
Picked up _JAVA_OPTIONS: -Xmx16384m
Picked up _JAVA_OPTIONS: -Xmx16384m
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-06-06T12:51:41,619]  [Globals.java:138] [main]  Development mode is enabled
Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Not including fragment map

Start preprocess
Writing header
Writing body
.java.lang.NullPointerException
    at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:743)
    at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:436)
    at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:165)
    at juicebox.tools.HiCTools.main(HiCTools.java:94)
Picked up _JAVA_OPTIONS: -Xmx16384m
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-06-06T13:01:12,709]  [Globals.java:138] [main]  Development mode is enabled
Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Not including fragment map

Start preprocess
Writing header
Writing body
.java.lang.NullPointerException
    at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:743)
    at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:436)
    at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:165)
    at juicebox.tools.HiCTools.main(HiCTools.java:94)
.//scripts/common/juicer_tools is post-processing Hi-C for hg19
Data read from .../aligned/inter_30.hic.
Motifs read from .//references/motif

ARROWHEAD:

Picked up _JAVA_OPTIONS: -Xmx16384m
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2022-06-06T13:05:48,013]  [Globals.java:138] [main]  Development mode is enabled
Reading file: .../aligned/inter_30.hic
Using 1 CPU thread(s) for primary task

Unable to assess map sparsity; continuing with Arrowhead
Default settings for 10kb being used
No valid chromosome matrices at given resolution


HiCCUPS:

GPUs are not installed so HiCCUPs cannot be run

(-: Postprocessing successfully completed, maps too sparse to annotate or GPUs unavailable (-:
(-: Pipeline successfully completed (-:
Run cleanup.sh to remove the splits directory
Check .../aligned for results

This is what I have in my aligned directory now:

total 61G
227 Jun  6 10:29 header
26G Jun  6 12:25 merged_sort.txt
116M Jun  6 12:43 opt_dups.txt
21G Jun  6 12:43 merged_nodups.txt
4.5G Jun  6 12:43 dups.txt
2.1K Jun  6 12:49 inter.txt
8.6K Jun  6 12:49 inter_hists.m
11G Jun  6 12:49 abnormal.sam
332M Jun  6 12:49 unmapped.sam
1 Jun  6 12:51 collisions.txt
1 Jun  6 12:51 collisions_nodups.txt
2.1K Jun  6 13:01 inter_30.txt
8.2K Jun  6 13:01 inter_30_hists.m
11K Jun  6 13:05 inter_30.hic
4.0K Jun  6 13:05 inter_30_contact_domains
11K Jun  6 15:43 inter.hic

I know that my .hic files are empty and that I cannot visualise them in Juicebox at this step, but since I got the merged_nodups.txt file I tried running the 3d-dna pipeline to scaffold my draft assembly, and it worked without any errors. These were the commands:

awk -f .../3d-dna/utils/wrap-fasta-sequence.awk .../references/draft_assembly.fasta > ./draft_wrap.fasta

bash .../3d-dna/run-asm-pipeline.sh ./draft_wrap.fasta .../aligned/merged_nodups.txt 


I am working with Arabidopsis data, so I expected 5 chromosomes. This is how the .rawchrom.hic + .rawchrom.assembly look in JBAT:

[Image: Screenshot from 2022-06-09 12-13-58.png]


I am really new to this, but it looks good, right? I can see the 5 chromosomes, although in the bottom-right corner there are some other small things classified as chromosomes. What do you think about that? Here is the picture:


[Image: Screenshot from 2022-06-09 12-14-38.png]

I am sorry for asking so many questions, but this is the first time I am using these tools, and thank you so much for creating them!
  • Is there anything wrong in my pipeline, or do you think it worked well? (What I am really interested in is the final scaffolded assembly at the end of the 3d-dna pipeline.)
  • If it worked well, do you think it is necessary to manually improve the assembly using JBAT? (I watched the videos, and I cannot really identify patterns like the ones described in the assembly cookbook, such as misjoins, translocations and inversions, but the background is not completely clean anyway.)
  • What exactly are the things I see in the bottom-right corner? Should I move/remove them somehow?
Thank you so much in advance.

Lia

Moshe Olshansky

Jun 12, 2022, 10:11:52 PM
to 3D Genomics
Hi Nolan,

Sorry for the delay.

What was your original juicer command?
Can you run ls -ltrh on your aligned folder?

Regards,
Moshe.

Moshe Olshansky

Jun 12, 2022, 10:23:08 PM
to 3D Genomics
Hi Lia,

I think you are mixing up the --assembly flag, which tells Juicer that the run is for assembly purposes, and the -p flag, which specifies the location of the file listing the chromosomes/contigs and their lengths.
I think your reads are aligned to your fasta file, but since you did not use the -g flag, Juicer assumes the genome is hg19, with chromosomes 1, 2, ..., 22, X, Y, MT. Because the chromosomes/contigs in your fasta file have different names and lengths, it fails in the subsequent steps.
I suggest that you issue
scripts/juicer.sh -help
and carefully read the instructions.
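If that explanation is right, it would also account for the empty maps: records whose chromosome names are absent from the genome's chromosome list cannot be placed anywhere. A rough check, assuming the chromosome list is a two-column name/length file (the kind passed to -p) and merged_nodups.txt is in the standard Juicer long format, is to count how many records would be unusable. The function name `count_unusable` is hypothetical.

```shell
# Hypothetical helper: count merged_nodups.txt records in which either end
# (column 2 or 6) uses a name missing from column 1 of a chrom.sizes file.
# The sizes file must be non-empty for the NR == FNR idiom to work.
# Usage: count_unusable chrom.sizes aligned/merged_nodups.txt
count_unusable() {
  awk '
    NR == FNR { known[$1] = 1; next }          # first file: known names
    !($2 in known) || !($6 in known) { bad++ } # second file: unplaceable records
    END { print bad + 0 }
  ' "$1" "$2"
}
```

A count close to the total number of records (for example from wc -l) would suggest the names do not match at all.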

Best regards,
Moshe. 

Lia OBINU

Jun 13, 2022, 6:30:24 AM
to 3D Genomics
Hi Moshe,

Thank you so much. I carefully read the instructions, but the message
[genomeID] must be defined in the script, e.g. "hg19" or "mm10" (default "hg19"); alternatively, it can be defined using the -z command

suggests that if I use the -z option I don't need to use -g. From your response, however, I understand that I must use both. Am I right?
I am using a de novo assembly of Arabidopsis, obtained with flye using ONT reads and polished with racon using HiFi reads.
If I use -z path/to/my/draft/denovo_assembly.fasta, what should I indicate in the -g option?
Is the -p chromosome.size file compulsory? What kind of file should it be? if I generate the .fai file of my de novo assembly can it be used in the -p option?
Regarding the --assembly flag, I couldn't find any explanation anywhere. Could you please explain how it works?

Thank you so much.

Cheers,

Lia

Lia OBINU

Jun 13, 2022, 7:21:17 AM
to 3D Genomics
Hi Moshe,
 
Sorry, I think I understand now.
My purpose is to scaffold my de novo assembly (still at contig level), so I do not have a chromosome size file at this step.
Since I am running juicer 1.6, I should use -p assembly as shown in the cookbook, but apparently I must use both the -g and -z flags. Is it correct to run the command with my draft de novo assembly passed to both -g and -z, as follows?

./scripts/juicer.sh -g ./references/draft.fasta -z ./references/draft.fasta -t 32 -p assembly -s none -D ./ -S early 

Thank you so much in advance.


Lia

Moshe Olshansky

Jun 14, 2022, 9:45:15 PM
to 3D Genomics
Hi Lia,

What version of juicer.sh are you using? I do not see anywhere that -p assembly means anything other than that assembly is the file containing the list of chromosomes/contigs and their lengths.
Please also note the -S early option - for assembly purposes you do not need to create the hic map. All you need is the merged_nodups.bam.

Best regards,
Moshe.

Lia OBINU

Jun 15, 2022, 5:45:13 AM
to 3D Genomics
Hi Moshe,

I am using juicer 1.6 and juicer_tools.jar 2.13.07. My purpose is to scaffold my draft assembly, and I do not have any chromosome size file so far.
I am following what the genome assembly cookbook says here:
[Image: Screenshot 2022-06-15 at 10-04-43 manual_180319 - manual_180322.pdf.png]
The -p flag is mandatory; otherwise the pipeline will not start. But at this step I only have a draft assembly and no chromosome size file. For this case, I found a suggestion here in the forum to use -p assembly (the same thing the cookbook suggests): https://groups.google.com/g/3d-genomics/c/weZEqRDEq_g/m/UjVKlfGBBAAJ
I assume the -p assembly option has been replaced by the --assembly flag in juicer2, but I didn't find any documentation about this.
As far as I can see, I don't need merged_nodups.bam to scaffold the assembly with 3d-dna, only merged_nodups.txt, which is what I obtained and used to run 3d-dna.

So, running this command:

./scripts/juicer.sh -g ./references/draft.fasta -z ./references/draft.fasta -t 32 -p assembly -s none -D ./ -S early

I didn't get any errors, and I ran the 3d-dna pipeline again with these commands:
awk -f /../3d-dna/utils/wrap-fasta-sequence.awk /../juicer/references/draft.fasta > ./draft_wrapped.fasta

bash /../3d-dna/run-asm-pipeline.sh ./draft_wrapped.fasta /../juicer/aligned/merged_nodups.txt

These ran without any errors.
So I assume everything is correct so far, am I right? Please let me know if I have misunderstood something.

Anyway, this is how the Hi-C maps look in JBAT:
[Image: Screenshot from 2022-06-09 12-13-58.png]

[Image: Screenshot from 2022-06-09 12-14-38.png]
Assuming that everything so far was correct, I am now wondering what I can do to manually improve the scaffolding of the assembly using JBAT. I am not able to identify patterns of inversions, translocations, etc., as explained in the genome assembly cookbook. Also, what exactly are those small "chromosomes" in the bottom-right corner of the picture? I am expecting 5 chromosomes for Arabidopsis, and they are clearly visible, but I do not understand what those small things classified as chromosomes are.

I really appreciate your help.

Kind regards,

Lia

Lia OBINU

Jun 15, 2022, 5:49:31 AM
to 3D Genomics
Just to be clear: the images in the previous message show the rawchrom.hic + rawchrom.FINAL.assembly that I obtained after running the 3d-dna pipeline, not the .hic files produced by Juicer, which were not created this time because I used the -S early option.

Levi Bauer

Jun 15, 2022, 3:49:39 PM
to 3D Genomics
Hi Lia,
I may be misunderstanding what help you're looking for, but you can generate a chromosome size file even though you only have a draft assembly. The -p option is looking for a file that lists all of your contigs and their sizes; it isn't really a "chromosome size" file so much as a "contig size" file. I'm not part of the Aiden lab and don't have the knowledge to say whether it's fine for you to use your merged_nodups file as is, but you could always try again with a chrom.sizes file and see whether the contact map looks different. Here's how I generate the chrom.sizes file:
bioawk -c fastx '{print $name"\t"length($seq)}' references/pilon_asmv3_redo.fasta > chrom.sizes
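For anyone without bioawk, the same two-column file can be produced with plain awk. A minimal sketch, assuming (as bioawk's $name does) that the sequence name is the header text up to the first whitespace; `contig_sizes` is a hypothetical helper name.

```shell
# Hypothetical helper: print "name<TAB>length" for every record in a FASTA,
# summing sequence lengths across wrapped lines.
# Usage: contig_sizes references/draft.fasta > chrom.sizes
contig_sizes() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len    # flush the previous record
      name = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) } # ignore stray spaces and CRs
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

If samtools is installed, `samtools faidx draft.fasta` writes draft.fasta.fai, whose first two columns are the sequence name and length, so `cut -f1,2 draft.fasta.fai > chrom.sizes` should give an equivalent file.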

Here's a youtube video from the Aiden lab describing how to navigate JBAT and they also show some common examples of fixes. https://www.youtube.com/watch?v=Nj7RhQZHM18 
If you don't like clicking random links on forums (don't blame you), you can search "juicebox hic" on YouTube and it should come up. It's hard to see specific fixes from a zoomed-out view, but one example fix I can see is that your second scaffold has a contig at 25 Mbp that should be flipped.
I've posted some screenshots of fixes I've made below.

The little contigs/scaffolds in the bottom right are the debris/junk section. Basically 3d-dna threw those contigs out for some reason or another, and they all collect in the corner. You can try to manually incorporate some of them if you prefer.
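To see which sequences of the final assembly are chromosome-scale and which are small leftovers like the debris described here, a sizes file (built as above) can be sorted by length and labeled by rank. A rough sketch; tagging the top N (for example 5 for Arabidopsis) as "large" is a heuristic assumption, not how 3d-dna decides what is debris, and `label_by_rank` is hypothetical.

```shell
# Hypothetical helper: sort a "name<TAB>length" file by length (largest first)
# and tag the top N entries "large" and the rest "small" (N defaults to 5).
# Usage: label_by_rank chrom.sizes 5
label_by_rank() {
  sort -k2,2nr "$1" |
    awk -v n="${2:-5}" 'BEGIN { OFS = "\t" }
      { print $1, $2, (NR <= n + 0 ? "large" : "small") }'
}
```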

Hope I at least answered some of your questions.

photo 1: contigs in the middle have weak signal with their neighbors (you can see a white plus sign), but strong signal with the first contig (strong red signal to the left of the middle contigs). So I grab those middle contigs and move them next to the first.
[Image: dmetel_contactmap_fix_step0.PNG]
photo 2: OK, so now I've moved those middle contigs to the upper left and most things are on the diagonal. However, if you look at what is now the middle contig, you can see a "bowtie motif", basically signal off the upper-right and lower-left corners. This indicates that the contig needs to be rotated about the axis, so the top-left will become the bottom-right.
[Image: dmetel_contactmap_fix_step1.PNG]

photo 3: fixed, note how there's much less off-diagonal signal than in the first photo. 
[Image: dmetel_contactmap_fix_step2.PNG]
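Assuming the 3D-DNA .assembly convention that JBAT edits, where each scaffold line is a space-separated list of contig indices and a leading "-" marks a reversed contig, the inversion in photo 2 amounts to reversing the order of the selected contigs and flipping each one's sign. A minimal sketch of that operation on a single scaffold line; `invert_block` is hypothetical, and in practice JBAT performs this edit for you.

```shell
# Hypothetical helper: invert positions FROM..TO (1-based, inclusive) of a
# scaffold line, i.e. reverse their order and flip each contig's orientation.
# Usage: invert_block '1 2 -3 4' 2 3   -> 1 3 -2 4
invert_block() {
  printf '%s\n' "$1" | awk -v from="$2" -v to="$3" '{
    f = from + 0; t = to + 0
    for (i = 1; i <= NF; i++) out[i] = $i
    for (i = f; i <= t; i++) {
      v = $(f + t - i)                            # mirrored position in the block
      out[i] = (v ~ /^-/) ? substr(v, 2) : "-" v  # flip orientation
    }
    line = out[1]
    for (i = 2; i <= NF; i++) line = line " " out[i]
    print line
  }'
}
```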

Levi Bauer

Jun 15, 2022, 4:00:58 PM
to 3D Genomics
Never mind about the chrom.sizes, you don't need it/your hic map is fine.


Lia OBINU

Jun 16, 2022, 6:16:16 AM
to 3D Genomics
Hi Levi,

Thank you so much for your help!
I think I understand a bit better how to improve the assembly manually, even though it is really difficult to identify the motifs described in the video and in the assembly cookbook in order to correct them.
Many thanks again!

Cheers,

Lia

Moshe Olshansky

Jun 19, 2022, 6:38:23 AM
to 3D Genomics
Hi Levi,

Thank you for your help and explanations.
