merged_nodups.txt as input to juicebox


tyler borrman

Sep 10, 2016, 7:30:30 PM
to 3D Genomics
Hi,

I've been troubleshooting getting juicer to run on our LSF cluster using the example HIC003 data. 
I get the following error when I try to create the inter.hic file by running:


/juicer/scripts/juicebox pre -s /juicer/work/HIC003/aligned/inter.txt -g /juicer/work/HIC003/aligned/inter_hists.m -q 1 /juicer/work/HIC003/aligned/merged_nodups.txt /juicer/work/HIC003/aligned/inter.hic /juicer/references/chrom.sizes;

Error:

Picked up _JAVA_OPTIONS: -Xmx16384m
Skipping chr1 249250621
Skipping chr2 243199373
Skipping chr3 198022430
Skipping chr4 191154276
Skipping chr5 180915260
Skipping chr6 171115067
Skipping chr7 159138663
Skipping chr8 146364022
Skipping chr9 141213431
Skipping chr10 135534747
Skipping chr11 135006516
Skipping chr12 133851895
Skipping chr13 115169878
Skipping chr14 107349540
Skipping chr15 102531392
Skipping chr16 90354753
Skipping chr17 81195210
Skipping chr18 78077248
Skipping chr19 59128983
Skipping chr20 63025520
Skipping chr21 48129895
Skipping chr22 51304566
Skipping chrX 155270560
Skipping chrY 59373566
Skipping chrM 16572
Not including fragment map
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1397)
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1168)
        at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:582)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:313)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:223)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:98)
        at juicebox.tools.HiCTools.main(HiCTools.java:77)
 
I noticed that my merged_nodups.txt does not have the 11-column format described in the documentation for juicebox pre <infile>.

Example line of merged_nodups.txt:

0 chr1 10005 0 0 chr1 249240258 64394 0 100M CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA 0 46M1I21M32S GGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGGGGTAGGGGTAGGGGGTGGGGTAGGGATAGGGATAGGGGTAGGGGT M00336:181:000000000-A29H6:1:1110:11312:16525/1 M00336:181:000000000-A29H6:1:1110:11312:16525/2 

Is that close to the problem?

Thanks,
Tyler

Neva Durand

Sep 11, 2016, 5:53:44 AM
to tyler borrman, 3D Genomics
Hello Tyler, 

From the error it looks like you've somehow transposed your chrom.sizes file and merged_nodups.txt file.  That is, it's taking as input the chrom.sizes file.  

The merged_nodups.txt file looks fine (it's the long format).  Could you double check your files?  Perhaps try with mm9 or mm10 instead of the chrom.sizes file (I assume you aligned to human so this would possibly cause other errors but would help troubleshoot).
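
One quick way to double check the file layout is to tally how many whitespace-separated fields each line has. The sketch below is only an illustration: the function name count_fields is made up here, and it assumes a POSIX shell with awk and sort available. Tyler's example line has 16 fields, so a healthy file of the same kind would be expected to report a single field count.

```shell
# Tally how many whitespace-separated fields each line of a file has.
# Prints one "<number_of_lines> <number_of_fields>" pair per distinct
# field count, sorted by field count.
# count_fields is a hypothetical helper name, not part of Juicer.
count_fields() {
    awk '{ n[NF]++ } END { for (k in n) print n[k], k }' "$1" | sort -k2,2n
}

# Usage (path taken from the thread):
# count_fields /juicer/work/HIC003/aligned/merged_nodups.txt
```

If more than one field count shows up, some lines are probably truncated or malformed and worth inspecting by hand.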


--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/5099f31a-71e3-4438-9421-26802a79c6f5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

tyler borrman

Sep 12, 2016, 4:00:48 PM
to 3D Genomics

Here are the errors from running juicebox pre on merged_nodups.txt with a genome ID instead of the chrom.sizes file:

../../scripts/juicebox pre aligned/merged_nodups.txt aligned/inter.hic hg19
Not including fragment map
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1397)
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1168)
        at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:582)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:313)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:223)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:98)
        at juicebox.tools.HiCTools.main(HiCTools.java:77)

../../scripts/juicebox pre aligned/merged_nodups.txt aligned/inter.hic mm9
Not including fragment map
Start preprocess
Writing header
Writing body
.java.lang.ArrayIndexOutOfBoundsException: 99
        at juicebox.tools.utils.original.ExpectedValueCalculation.addDistance(ExpectedValueCalculation.java:184)
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.incrementCount(Preprocessor.java:1296)
        at juicebox.tools.utils.original.Preprocessor$MatrixPP.incrementCount(Preprocessor.java:1138)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:382)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:223)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:98)
        at juicebox.tools.HiCTools.main(HiCTools.java:77)

I don't have the mouse genome in my /references folder. Does juicebox pre require the genome fasta and bwa index files to be in the /references folder? Also, to use the "hg19" genome ID in the juicebox pre command, do the fasta files in the /references folder need a specific name?

-Tyler

Neva Durand

Sep 12, 2016, 4:34:37 PM
to tyler borrman, 3D Genomics
Hi Tyler,

hg19 didn't work because the chromosomes in the built-in hg19 are named "1", "2", "3" instead of "chr1", "chr2", "chr3".

You could use a sed command to replace every "chrN" with "N", something like:

sed 's/chr//g' <merged_nodups.txt >new_merged_nodups.txt
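
A more conservative variant, in case "chr" could appear elsewhere on a line (in a read name, for example), is to touch only the chromosome columns. This is a sketch under the assumption that the chromosome names sit in fields 2 and 6, as they do in Tyler's example line; the function name strip_chr_prefix is made up for illustration.

```shell
# Strip a leading "chr" from the two chromosome columns only (assumed to be
# fields 2 and 6, as in the example line in this thread), leaving sequences
# and read names untouched.
# Note: because awk rebuilds each line after a field is modified, the output
# fields are joined by single spaces.
# strip_chr_prefix is a hypothetical helper name, not part of Juicer.
strip_chr_prefix() {
    awk '{ sub(/^chr/, "", $2); sub(/^chr/, "", $6); print }' "$1"
}

# Usage:
# strip_chr_prefix merged_nodups.txt > new_merged_nodups.txt
```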

The references won't matter at the juicebox pre stage; the Juicebox command line tools don't look at the reference.

Is your chrom.sizes file tab delimited?

Best
Neva


tyler borrman

Sep 12, 2016, 8:35:56 PM
to 3D Genomics
Ah! Making chrom.sizes tab-delimited fixed it.

Thanks Neva!
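
For anyone hitting the same "Skipping chr1 ..." lines, here is a rough sketch of how a space-separated chrom.sizes could be checked and converted. It assumes a two-column file (name, length) and a POSIX shell with awk; the function names are made up for illustration and are not part of Juicer.

```shell
# Rewrite a two-column chrom.sizes file so the columns are separated by a
# single tab, whatever whitespace separated them before.
# to_tab_delimited is a hypothetical helper name.
to_tab_delimited() {
    awk -v OFS='\t' '{ print $1, $2 }' "$1"
}

# Print every line that does not have exactly two tab-separated fields.
# No output means the file looks tab-delimited.
# check_tab_delimited is a hypothetical helper name.
check_tab_delimited() {
    awk -F'\t' 'NF != 2' "$1"
}

# Usage (path taken from the thread; the output file name is an example):
# check_tab_delimited /juicer/references/chrom.sizes
# to_tab_delimited /juicer/references/chrom.sizes > chrom.sizes.tab
```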

