Juicer CPU - merged_*.txt

Diana Sernas

Jul 27, 2018, 4:55:36 PM
to 3D Genomics
Hi all,

Juicer seems to trip up after alignment and does not produce the .hic files. I'm using the HIC003 test data and running Juicer on a single CPU.

Since all the files (merged*, inter*) exist and are nonempty, I'm not sure how to go about addressing this error. Any advice/help would be much appreciated. Thank you for your time!

$ ./juicer.sh -d MyPath/juicer/work -S dedup -p ../references/chrom.sizes -z ../references/Homo_sapiens_assembly19.fasta -D ../
(-: Looking for fastq files...fastq files exist
Picked up _JAVA_OPTIONS: -Xmx16384m
MyPath/juicer/work/aligned/merged_nodups.txt does not exist or does not contain any reads.
Picked up _JAVA_OPTIONS: -Xmx16384m
Picked up _JAVA_OPTIONS: -Xmx16384m
MyPath/juicer/work/aligned/merged_nodups.txt does not exist or does not contain any reads.
Picked up _JAVA_OPTIONS: -Xmx16384m
***! Can't find inter.hic in MyPath/juicer/work/aligned/inter_30.hic
***! Error! Either inter.hic or inter_30.hic were not created
Either inter.hic or inter_30.hic were not created.  Check MyPath/juicer/work/aligned for results

$ ls
abnormal.sam    inter.txt         merged_nodups.txt  stats_dups_hists.m
collisions.txt  inter_30.txt      merged_sort.txt    unmapped.sam
dups.txt        inter_30_hists.m  opt_dups.txt
header          inter_hists.m     stats_dups.txt

The first few lines of merged_sort.txt and merged_nodups.txt are:
0 CM000663.2 40548 0 0 CM000663.2 233057894 0 0 100M AGCATTTCCCTGGCTACCCTTTTAAAAATTGCAACCCACTTCCATCCCCATCCCCAACATGCCATATTTCCTTTCTTCTTCTTCCTTCTTCCTTTTTTTT 60 100M TACCCACTTTCAGAACTTGAGAACAGTGTCCACAAAGGTATCTTTCTCTTCAAATTGATTTCAGGAAATGGTGGTGAGTAGTCAACGTCTCCACCACCTG M00336:181:000000000-A29H6:1:1109:11292:5204/2 M00336:181:000000000-A29H6:1:1109:11292:5204/1
0 CM000663.2 67760 0 0 CM000663.2 185174 0 0 91M9S CAAAGATGCCCCAACAATACCTCCTTGTGTCTAGACAGTCATCATTATCCTTTACCTTTTTCTGTATTTATTTCTGCTCCTAAAAGGGATCGATCTCTTC 0 100M CGGCCCAAGTCTGGGTCTGGCGGGGAAGGTGTCATGGAGCCCCCTAGGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAG M00336:181:000000000-A29H6:1:1102:15179:8867/2 M00336:181:000000000-A29H6:1:1102:15179:8867/1

Cheers,

Diana Sernas

Olga Dudchenko

Jul 30, 2018, 11:27:33 AM
to 3D Genomics
Hi Diana,

It looks suspicious that the fragment numbers in the merged_nodups.txt snippet you are showing are all 0. It seems something went wrong when the fragment numbers were added. I would suggest checking the restriction_sites files and the stdout/stderr associated with that step.
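
A quick way to see how widespread the zero fragment numbers are is sketched below. It assumes the fragment indices sit in columns 4 and 8 of merged_nodups.txt, which matches the lines quoted in this thread; the function name count_zero_fragments is made up for illustration. A few zeros can be legitimate for reads that map before the first cut site, so the concern is only when (nearly) every pair reports 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count read pairs whose fragment indices are 0 on both ends.
# Assumption: columns 4 and 8 hold the fragment index of read 1 and read 2.
count_zero_fragments() {
    awk '{ total++; if ($4 == 0 && $8 == 0) zero++ }
         END { printf "%d of %d pairs have fragment index 0 on both ends\n", zero, total }' "$1"
}
```

Usage would be something like `count_zero_fragments MyPath/juicer/work/aligned/merged_nodups.txt`.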

Hope this helps,
Olga

Diana Sernas

Jul 31, 2018, 3:37:59 PM
to 3D Genomics
Hi Olga,

Thank you so much for a timely and helpful response!

I've gone back to my chrom.sizes and restriction_sites files to make sure they are compatible. I generated a new chrom.sizes file with the script posted and still get the same error. However, my merged files now look slightly different. Could you elaborate on how to check the step that adds the fragment numbers? I'm not getting "***! No $name${ext}_norm.txt file created", so am I focusing on the wrong step?

Again, thank you for your time.

Both merged files begin like so:
0 CM000663.2 14617 11 16 CM000663.2 133610 298 0 100M GAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCAGTCGTCCTC 0 49S51M GGCCCTGCCTCCTACCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCGATCTTCAACTGCAGGTGAAACGGATGCTGGTGGTGGGTGCAGGGCCGCTG M00336:181:000000000-A29H6:1:2109:29310:15678/1 M00336:181:000000000-A29H6:1:2109:29310:15678/2
16 CM000663.2 18798 20 16 CM000663.2 197239 461 0 100M ATGGGGCAAGCACTTCACAACCCCTCATGATCACGTGCAGCAGACAATGTGGCCTCTGCAGAGGGGGAACGGAGACCGGAGGCTGAGACTGGCAAGGCTG 0 100M CGATCACAACAAAGACGAATAAGACACTACACTAGCCAGGGAGAGTCTCAAAAACAACTAAACTCAAATTAAATTCATTCTACTCCAGTCATGAGTACAA M00336:181:000000000-A29H6:1:1112:18559:5882/1 M00336:181:000000000-A29H6:1:1112:18559:5882/2

$ head chrom.sizes
CM000663.2 248956422
KI270706.1 175055
KI270707.1 32032
KI270708.1 127682
KI270709.1 66860
KI270710.1 40176
KI270711.1 42210
KI270712.1 176043
KI270713.1 40745
KI270714.1 41717

hg19_MboI.txt:
CM000663.2 11160 12411 ... 248956422
KI270706.1 399 456 ... 175055
KI270707.1 121 404 ... 32032


Best,
Diana

Olga Dudchenko

Aug 1, 2018, 3:00:35 AM
to 3D Genomics
Hi Diana,

Now the restriction sites file seems OK, as does the mnd (merged_nodups) file. Just to be clear, you are showing a subset of the chrom.sizes and hg19_MboI files, right? (They should have the same number of lines.) Regarding no _norm file being created: are you by any chance trying to run this on a very small amount of data?
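
The "same number of lines" check can be extended to compare the sequence names themselves. The sketch below assumes both files are whitespace-delimited with the sequence name in the first column, as in the excerpts quoted earlier; the function name compare_chrom_names is hypothetical. It prints nothing when the two sets of names agree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report sequence names that appear in only one of the two files.
# $1 = chrom.sizes file, $2 = restriction site file (name assumed to be in column 1).
compare_chrom_names() {
    awk 'NR == FNR { sizes[$1] = 1; next }
         { sites[$1] = 1; if (!($1 in sizes)) print "only in site file: " $1 }
         END { for (name in sizes) if (!(name in sites)) print "only in chrom.sizes: " name }' "$1" "$2"
}
```

For example: `compare_chrom_names ../references/chrom.sizes ../restriction_sites/hg19_MboI.txt`.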

Best,
Olga

Diana Sernas

Aug 1, 2018, 11:26:26 AM
to odudc...@icloud.com, 3d-ge...@googlegroups.com
Hi Olga,

Correct, in the last message I showed only some lines from my chrom.sizes and hg19_MboI (my restriction site file); the actual files both have 595 lines. I'm running Juicer with the HIC003 test data from the Juicer Wiki.

Best,
Diana


--

Best,

Diana Sernas

University of California, Santa Cruz
Pure Mathematics Major
Grant Hartzog Lab | NIH-IMSD Scholar

dse...@ucsc.edu
Don't Forget To Be Awesome

Olga Dudchenko

Aug 1, 2018, 11:43:34 AM
to 3D Genomics
Hi Diana,

It seems I misread your previous post and thought that you are now getting the _norm error. It seems it is the other way around, and things are good now. Can you send the full command you are running, the head of all input files, and the full stdout and stderr?

Best,
Olga

Diana Sernas

Aug 1, 2018, 1:01:25 PM
to Ольга Дудченко, 3d-ge...@googlegroups.com
Hi Olga,

Good! I redirected my stdout to 'output.txt'. For clarification, 'MyPath' stands for the full path I'm using (through Cygwin).

$ ./juicer.sh -d MyPath/juicer/work -p ../references/chrom.sizes -z ../references/Homo_sapiens_assembly19.fasta -D .. > output.txt
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 400000 sequences (40000000 bp)...
[M::process] read 400000 sequences (40000000 bp)...
 [M::mem_process_seqs] Processed 400000 reads in 610.594 CPU sec, 559.877 real sec
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 502.328 CPU sec, 507.298 real sec
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 598.251 CPU sec, 552.027 real sec
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 494.999 CPU sec, 501.826 real sec
[M::process] read 348578 sequences (34857800 bp)...
[M::mem_process_seqs] Processed 400000 reads in 497.797 CPU sec, 494.449 real sec
[M::mem_process_seqs] Processed 348578 reads in 588.422 CPU sec, 520.413 real sec
[main] Version: 0.7.17-r1194-dirty
[main] CMD: bwa mem -t 4 ../references/Homo_sapiens_assembly19.fasta MyPath/juicer/work/splits/HIC003_S2_L001_R1_001.fastq
[main] Real time: 3161.538 sec; CPU: 3304.889 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 400000 sequences (40000000 bp)...
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 507.749 CPU sec, 529.338 real sec
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 498.344 CPU sec, 500.391 real sec
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 523.016 CPU sec, 518.313 real sec
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_process_seqs] Processed 400000 reads in 615.968 CPU sec, 545.222 real sec
[M::process] read 348578 sequences (34857800 bp)...
[M::mem_process_seqs] Processed 400000 reads in 544.578 CPU sec, 554.591 real sec
[M::mem_process_seqs] Processed 348578 reads in 433.501 CPU sec, 439.972 real sec
[main] Version: 0.7.17-r1194-dirty
[main] CMD: bwa mem -t 4 ../references/Homo_sapiens_assembly19.fasta MyPath/juicer/work/splits/HIC003_S2_L001_R2_001.fastq
[main] Real time: 3114.526 sec; CPU: 3135.671 sec
Picked up _JAVA_OPTIONS: -Xmx16384m
MyPath/juicer/work/aligned/merged_nodups.txt does not exist or does not contain any reads.
Picked up _JAVA_OPTIONS: -Xmx16384m
Picked up _JAVA_OPTIONS: -Xmx16384m
MyPath/juicer/work/aligned/merged_nodups.txt does not exist or does not contain any reads.
Picked up _JAVA_OPTIONS: -Xmx16384m

 Output.txt:
(-: Looking for fastq files...fastq files exist
(-: Aligning files matching MyPath/juicer/work/fastq/*_R*.fastq*
 to genome hg19 with site file ../restriction_sites/hg19_MboI.txt
(-: Created MyPath/juicer/work/splits and MyPath/juicer/work/aligned.
Running command bwa mem -t 4 ../references/Homo_sapiens_assembly19.fasta MyPath/juicer/work/splits/HIC003_S2_L001_R1_001.fastq > MyPath/juicer/work/splits/HIC003_S2_L001_R1_001.fastq.sam
(-:  Align of MyPath/juicer/work/splits/HIC003_S2_L001_R1_001.fastq.sam done successfully
Running command bwa mem -t 4 ../references/Homo_sapiens_assembly19.fasta MyPath/juicer/work/splits/HIC003_S2_L001_R2_001.fastq > MyPath/juicer/work/splits/HIC003_S2_L001_R2_001.fastq.sam
(-: Mem align of MyPath/juicer/work/splits/HIC003_S2_L001_R2_001.fastq.sam done successfully
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: MyPath/juicer/work/splits/HIC003_S2_L001_001.fastq.sam created successfully.
(-: Finished sorting all sorted files into a single merge.
***! Can't find inter.hic in MyPath/juicer/work/aligned/inter_30.hic
***! Error! Either inter.hic or inter_30.hic were not created
Either inter.hic or inter_30.hic were not created.  Check MyPath/juicer/work/aligned for results

$ head juicer.sh
#!/bin/bash
##########
#The MIT License (MIT)
...
shopt -s extglob
export LC_ALL=C

juicer_version="1.8.9"
#Note: switched the version from 1.5.6 to 1.8.9

$ head chrom.sizes
CM000663.2 248956422
KI270706.1 175055
KI270707.1 32032
KI270708.1 127682
KI270709.1 66860
KI270710.1 40176
KI270711.1 42210
KI270712.1 176043
KI270713.1 40745
KI270714.1 41717

$ vi Homo_sapiens_assembly19.fasta
>CM000663.2 Homo sapiens chromosome 1, GRCh38 reference primary assembly
...(Several "N")...
taaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
accctaaccctaaccctaaccctaacccaaccctaaccctaaccctaaccctaaccctaaccctaacccctaaccctaac
cctaaccctaaccctaacctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccctaaccc
taaccctaaaccctaaaccctaaccctaaccctaaccctaaccctaaccccaaccccaaccccaaccccaaccccaaccc
caaccctaacccctaaccctaaccctaaccctaccctaaccctaaccctaaccctaaccctaaccctaacccctaacccc
taaccctaaccctaaccctaaccctaaccctaaccctaacccctaaccctaaccctaaccctaaccctcgCGGTACCCTC
AGCCGGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAGGAc



Much thanks,
Diana

Olga Dudchenko

Aug 1, 2018, 1:38:02 PM
to 3D Genomics
Hi Diana,

I am still a bit confused about the input. How are you passing the restriction sites file right now, the one you have shared in the previous email? Are you editing the script directly? Or perhaps there is a -y flag input missing for the restriction sites file? I suspect this must be confusion between the hg19 defaults and your particular input.

You either want to run simply juicer.sh
(which is equivalent to running with -g hg19 -s MboI -z <juicerDir>/references/Homo_sapiens_assembly19.fasta -y <juicerDir>/restriction_sites/hg19_MboI.txt -p hg19, with sequence names 1, 2, 3 etc.)

or you run with a custom version of the hg19 reference (with CM000663.2), which seems to be what you are trying to do if I interpret your input right. In this case you want to run something like

juicer.sh -g myhg19 -s MboI -z <path to fasta file which starts with CM000663.2> -y <path to restriction sites file that starts with CM000663.2> -p <path to your chrom.sizes file that starts with CM000663.2>
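
Before launching a custom-genome run like this, it may help to confirm that the three inputs really use the same sequence names. The sketch below is only illustrative and the function names are made up. It assumes the FASTA name is the text between ">" and the first whitespace, and that the last field of each restriction site line is the sequence length, as the hg19_MboI.txt excerpt earlier in the thread suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sequence names found in FASTA header lines.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Hypothetical helper: derive a chrom.sizes table (name, length) from a site file.
# Assumption: the last field on each line is the sequence length.
sites_to_chrom_sizes() {
    awk 'BEGIN { OFS = "\t" } { print $1, $NF }' "$1"
}
```

Comparing `fasta_names <fasta> | head` with the first column of the chrom.sizes and restriction site files shows quickly whether one input uses names like 1, 2, 3 while another uses CM000663.2.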

Hope this helps,
Olga

Diana Sernas

Aug 3, 2018, 8:18:31 PM
to Ольга Дудченко, 3d-ge...@googlegroups.com
Hi Olga,

Inputting a -y flag to the restriction_sites file resulted in the same error. 
I'm embarrassed to admit I noticed my reference genome was hg38 instead of hg19. So I've run juicer with the correct hg19 indexed by bwa, appropriate chrom.sizes and MboI files, and an unaltered juicer.sh. However, I end up with the same error message. 
On the last run my input was:
Experiment description: Juicer version 1.5.6; BWA 0.7.17-r1194-dirty; 4 threads; java ; ./juicer.sh -d MyPath/juicer/work -g myHg19 -s MboI -z ../references/Homo_sapiens_assembly19.fasta -y ../restriction_sites/hg19_MboI.txt -p ../references/chrom.sizes -D ..


Olga Dudchenko

Aug 3, 2018, 9:13:37 PM
to 3D Genomics
Hi Diana,

Have you tried running just

./juicer.sh

with no other input? It will default to internally consistent options.

Thanks,
Olga

Diana Sernas

Aug 6, 2018, 4:19:23 PM
to Ольга Дудченко, 3d-ge...@googlegroups.com
Hi Olga,

I had to change the juicer directory (to "MyPath/juicer") in the script for it to run without any other inputs. Juicer then gave me the following error:

(-: Mem align of MyPath/juicer/work/splits/HIC003_S2_L001_R2_001.fastq.gz.sam done successfully
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: MyPath/juicer/work/splits/HIC003_S2_L001_001.fastq.gz.sam created successfully.
(-: Finished sorting all sorted files into a single merge.
Error: Could not find or load main class LibraryComplexity
Picked up _JAVA_OPTIONS: -Xmx16384m
Error: Unable to access jarfile MyPath/juicer/scripts/common/juicer_tools.jar
Error: Could not find or load main class LibraryComplexity
Picked up _JAVA_OPTIONS: -Xmx16384m
Error: Unable to access jarfile MyPath/juicer/scripts/common/juicer_tools.jar
***! Can't find inter.hic in MyPath/juicer/work/aligned/inter_30.hic
***! Error! Either inter.hic or inter_30.hic were not created
Either inter.hic or inter_30.hic were not created.  Check MyPath/juicer/work/aligned for results


Meanwhile, my scripts/common directory looks like this:

$ ls -l
total 31384
-rwxr-xr-x+ 1 dsern dsern     3519 Jul 24 14:52 check.sh
-rwxr-xr-x+ 1 dsern dsern    15361 Jul 24 14:53 chimeric_blacklist.awk
-rwxr-xr-x+ 1 dsern dsern     1971 Jul 24 14:53 cleanup.sh
-rwxr-xr-x+ 1 dsern dsern     3584 Jul 24 14:54 collisions.awk
-rwxr-xr-x+ 1 dsern dsern     1616 Jul 24 14:54 countligations.sh
-rwxr-xr-x+ 1 dsern dsern    13448 Jul 24 14:55 diploid.pl
-rwxr-xr-x+ 1 dsern dsern     1201 Jul 24 14:55 diploid.sh
-rwxr-xr-x+ 1 dsern dsern     2449 Jul 24 14:56 diploid_split.awk
-rwxr-xr-x+ 1 dsern dsern     5325 Jul 24 14:56 dups.awk
-rwxr-xr-x+ 1 dsern dsern     3551 Jul 24 14:56 fragment.pl
-rwxr-xr-x+ 1 dsern dsern     3726 Jul 24 14:57 fragment_4dnpairs.pl
-rwxr-xr-x+ 1 dsern dsern     3667 Jul 24 14:57 juicer_postprocessing.sh
-rwxr-xr-x+ 1 dsern dsern     2017 Jul 24 14:58 juicer_tools
-rwx--x--x+ 1 dsern dsern 32002848 Jul 25 13:50 juicer_tools.jar
-rwxrwxr-x+ 1 dsern dsern     4526 Jul 24 14:51 LibraryComplexity.class
-rwxr-xr-x+ 1 dsern dsern     6926 Jul 24 14:52 LibraryComplexity.java
-rwxr-xr-x+ 1 dsern dsern     7681 Jul 24 14:58 mega.sh
-rwxr-xr-x+ 1 dsern dsern     2276 Jul 24 14:59 relaunch_prep.sh
-rwxr-xr-x+ 1 dsern dsern    14506 Jul 24 14:59 statistics.pl
-rwxr-xr-x+ 1 dsern dsern     1751 Jul 24 14:59 stats_sub.awk


How would you suggest I change my permissions? Or do you suspect something else may be the issue?

Best,
Diana

Diana Sernas

Aug 8, 2018, 1:12:25 PM
to Ольга Дудченко, 3d-ge...@googlegroups.com
Hi all,

The earlier errors were related to my using the Cygwin64 Terminal on Windows. Passing Windows-style paths to the Java calls in the script let it run.

To address the "Could not find or load main class LibraryComplexity" error, the lines calling LibraryComplexity now start like this:
java -cp `cygpath -w "${juiceDir}/scripts/common/"` LibraryComplexity `cygpath -w $outputdir`

For juicer_tools, I translated the paths instead of changing permissions. Here is one of the five lines that call juicer_tools:
`realpath --relative-to=$(pwd) "/cygdrive/c/Users/dsern/Desktop/juiceraw/opt/juicer/scripts/common/juicer_tools"` pre -s `cygpath -w "$outputdir/inter.txt"` -g `cygpath -w "$outputdir/inter_hists.m"` -q 1 `cygpath -w "$outputdir/merged_nodups.txt"` `cygpath -w "$outputdir/inter.hic"` `cygpath -w $genomePath`
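
The repeated cygpath calls could be wrapped in a small helper so the same script still runs outside Cygwin. This is only a sketch: the function name to_native_path is made up, and it assumes that the presence of a cygpath command on the PATH is a good enough signal that paths need converting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a POSIX path to a Windows path when running under
# Cygwin (detected by the presence of cygpath), otherwise return it unchanged.
to_native_path() {
    if command -v cygpath >/dev/null 2>&1; then
        cygpath -w "$1"
    else
        printf '%s\n' "$1"
    fi
}
```

A call such as `java -cp "$(to_native_path "${juiceDir}/scripts/common/")" LibraryComplexity "$(to_native_path "$outputdir")"` would then replace the backtick form shown above.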

Now I'm getting:
Problem with creating fragment-delimited maps, NullPointerException.
This could be due to a null fragment map or to a mismatch in the chromosome name in the fragment map vis-a-vis the input file or chrom.sizes file.
Exiting.
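
For anyone who hits the same message, one thing to check is whether every chromosome name in merged_nodups.txt also appears in the restriction site file. The sketch below assumes the chromosome names are in columns 2 and 6 of merged_nodups.txt (consistent with the lines quoted earlier in the thread) and in column 1 of the site file; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list chromosome names used in merged_nodups.txt that are
# absent from the restriction site file.
# $1 = restriction site file, $2 = merged_nodups.txt
chroms_missing_from_sites() {
    awk 'NR == FNR { known[$1] = 1; next }
         { if (!($2 in known)) missing[$2] = 1
           if (!($6 in known)) missing[$6] = 1 }
         END { for (name in missing) print name }' "$1" "$2"
}
```

Empty output means every name in the merged file has an entry in the site file; any name printed points to the kind of mismatch the message describes.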

I consider this thread completed. Thank you, Olga, for your timely responses!

Best,
Diana

Olga Dudchenko

Aug 8, 2018, 2:28:11 PM
to 3D Genomics
Diana,

Glad to hear you've figured this out!

Best,
Olga