problems with command-line juicer


Susan Chacko

Sep 9, 2016, 8:31:16 AM
to 3D Genomics
Hi,
I just installed Juicer 0.9 on our cluster, and am having trouble getting a test job to run.
The split, count_ligation, align, merge and fragmerge steps seem to run fine (nothing unusual in the .err files).
The dedup step is the first one that gives an error:
----
cat: /data/susanc/juicer/aligned/a1473170723_msplit*_optdups.txt: No such file or directory
----

However, after that the msplit and stats steps seem to run ok (no errors in the .err files).
Then the hic30 step exits with the error
----
java.lang.NullPointerException
           at juicebox.tools.utils.original.ExpectedValueCalculation.<init>(ExpectedValueCalculation.java:109)
           at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:201)
           at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:94)
           at juicebox.tools.HiCTools.main(HiCTools.java:79)
-----

Any ideas about what might be the problem?

Thanks,
Susan.

Neva Durand

Sep 9, 2016, 8:41:07 AM
to Susan Chacko, 3D Genomics
Hello Susan,

The optdups error just means that no optical duplicates were found, which can happen depending on your sequencer (optical duplicate detection only looks at the read name). It's not a big deal: the library complexity estimate will be off, and all the duplicates will end up in the dups.txt file.

The second error looks like a restriction site file issue to me. What does your restriction site file look like?  What genome did you align to?  The chromosome names have to match. 
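
One way to check the chromosome-name match described above is to compare the first column of the restriction site file with the first column of the chrom.sizes file. The sketch below is only an illustration. It assumes that each line of the restriction site file starts with a chromosome name followed by cut-site positions, and that chrom.sizes holds one name and one length per line. The function names are hypothetical and are not part of Juicer.

```python
def chrom_names(lines):
    """Collect the first whitespace-delimited token of every non-empty line."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def compare_chrom_names(site_lines, sizes_lines):
    """Return (names only in the site file, names only in chrom.sizes)."""
    site = chrom_names(site_lines)
    sizes = chrom_names(sizes_lines)
    return sorted(site - sizes), sorted(sizes - site)


def compare_files(site_path, sizes_path):
    """Convenience wrapper that reads both files from disk (paths are up to you)."""
    with open(site_path) as site, open(sizes_path) as sizes:
        return compare_chrom_names(site, sizes)
```

If both returned lists are empty, the two files use the same chromosome names. A mismatch such as `chr1` in one file and `1` in the other would show up as one entry in each list.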

Also note that the latest version of Juicer is 1.5; you can download it from GitHub.

Best
Neva
--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/58143336-d05e-4082-a828-1ab092f5f835%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Susan Chacko

Sep 10, 2016, 11:07:12 AM
to Neva Durand, 3D Genomics

Thanks, Neva. I'm in the process of downloading Juicer 1.5 and making the modifications to run on our cluster. I'll also check on the restriction file. Will let you know how it goes.

Susan

Neva Durand

Sep 13, 2016, 12:55:21 AM
to Susan C, 3D Genomics
Hi Susan,

That error also happens when all the reads map to the same fragment. Would you mind posting a few lines of output?

Yes, we agree that our documentation needs a lot of work - thank you for the feedback.  I think a test set with all those components would indeed be very useful.  We do have one for the AWS Juicer, but it's hard to find - buried in those notes - and hard to explore.  I will change this over to Box so it's easier, but in the meantime, here's a thread about the AWS links:  https://groups.google.com/forum/#!topic/3d-genomics/2EpC2Helcxg

Best
Neva


Susan C

Sep 13, 2016, 4:34:30 AM
to Neva Durand, 3D Genomics

I made some progress with Juicer 1.5: the juicer.sh job now runs all the way to the hic30 job without errors.
However, the hic30 job gives the error:
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
----

I set the mapq filter to '-q 20' (decreased from -q 30) but got the same result. What is an appropriate value?

After that, the hiccups-wrap step fails with:
***! Can't find folder /usr/local/apps/juicer/juicer-1.5/SLURM/references/motif

What is the motif file, and where do I get it from?

Is there a sample dataset available for Juicer 1.5 that I could use for testing? It would be really useful to have a downloadable sample that contained a fastq file or two, hg19, the chrom.sizes and a restriction site file and motif file, all of which were consistent with what Juicer expects. It would help us to figure out what's wrong with our own setup.

Susan

Muhammad Shamim

Sep 13, 2016, 3:34:00 PM
to 3D Genomics
Hi Susan,

Here's an updated url that shows the setup on the AWS version of juicer and includes some sample datasets: https://bcm.box.com/v/juicerawsmirror
These links are also on the AWS S3 mirror (described under the "Installation/Directory Structure" tab here), but Box allows for better exploration of the file structure.

The motif folder should contain CTCF(/RAD21/SMC3) ChIP-Seq tracks (see example in the Box link).
A more detailed description of MotifFinder can be found under the "Finding DNA Motifs for Loops (MotifFinder)" tab here.
Note that MotifFinder can also be run at a later time as long as the loop list is generated by HiCCUPS.

But it seems from the printout that the inter_30.hic file is not being created.
Can you confirm this?

Best,


Susan Chacko

Sep 13, 2016, 3:37:45 PM
to 3d-ge...@googlegroups.com


That's correct, the *.hic files are not being generated. I'm not sure where to look for the problem, so any ideas are much appreciated.

Susan.

Muhammad Shamim

Sep 13, 2016, 3:54:42 PM
to 3D Genomics
What does the merged_nodups.txt file show? (can you print the first 10 lines here)
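
A request like this can be answered with a few lines of Python. The sketch below prints the first lines of the file and also estimates how many read pairs fall on the same fragment, which is one of the two causes named in the "No reads in Hi-C contact matrices" error. The column positions are an assumption about the merged_nodups.txt layout (strand, chromosome, position and fragment for each read end, in that order) and should be checked against your own file. The function names are hypothetical.

```python
from itertools import islice

# Assumed 0-based column layout of a merged_nodups.txt line:
# 0 str1, 1 chr1, 2 pos1, 3 frag1, 4 str2, 5 chr2, 6 pos2, 7 frag2, ...
CHR1, FRAG1, CHR2, FRAG2 = 1, 3, 5, 7


def head(lines, n=10):
    """Return the first n lines without their trailing newline."""
    return [line.rstrip("\n") for line in islice(lines, n)]


def same_fragment_fraction(lines):
    """Fraction of read pairs whose two ends share chromosome and fragment."""
    total = same = 0
    for line in lines:
        fields = line.split()
        if len(fields) <= FRAG2:
            continue  # skip short or malformed lines
        total += 1
        if fields[CHR1] == fields[CHR2] and fields[FRAG1] == fields[FRAG2]:
            same += 1
    return same / total if total else 0.0


def report(path, n=10):
    """Print the first n lines of the file, then the same-fragment fraction."""
    with open(path) as handle:
        for line in head(handle, n):
            print(line)
    with open(path) as handle:
        print("same-fragment fraction:", same_fragment_fraction(handle))
```

Calling `report("merged_nodups.txt")` from the aligned directory would print the requested lines. A same-fragment fraction close to 1.0 would point toward a fragment assignment problem rather than the MAPQ filter.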

Susan Chacko

Sep 14, 2016, 9:32:17 AM
to 3d-ge...@googlegroups.com

Hi Muhammad,

I was communicating offline with Neva Durand, and the problem turned out to be that the chrom.sizes file was space-separated rather than tab-separated. I had generated it according to an earlier post on this forum.

Once I tab-separated the chrom.sizes file, my juicer.sh pipeline ran to the end.
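
For anyone hitting the same problem, the conversion can be sketched in a few lines of Python. This is a minimal illustration, not the exact steps used in this thread: it assumes every non-empty line of chrom.sizes holds a chromosome name and an integer length separated by any whitespace, and the function names are hypothetical.

```python
def to_tab_separated(lines):
    """Yield 'name<TAB>length' lines from whitespace-separated input lines."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2 or not fields[1].isdigit():
            raise ValueError(f"line {number}: expected 'name length', got {line!r}")
        yield "\t".join(fields) + "\n"


def convert_file(src_path, dst_path):
    """Write a tab-separated copy of src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(to_tab_separated(src))
```

Writing to a new file rather than overwriting the original keeps the old version available for comparison. Lines that are already tab-separated pass through unchanged.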

Thanks!

Susan.

Neva Durand

Sep 14, 2016, 9:34:08 AM
to Susan Chacko, 3D Genomics
Thanks for updating us!

Since this has now come up a few times, I think we need to support both tab- and space-delimited files. We'll push that change soon.

Best
Neva
