Input String Error


Chirag Krishna

Jul 5, 2017, 9:23:10 AM
to 3D Genomics
Hello,

I'm trying to run hiccups and keep running into this same error: 

java.lang.NumberFormatException: For input string: "X"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.parseInt(Integer.java:527)
        at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:140)
        at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:194)
        at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:493)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:371)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
        at juicebox.tools.HiCTools.main(HiCTools.java:86)
Could not read hic file: null


The command I tried to run was:  java -jar ~/Programs/juicer_tools_linux_0.8.jar hiccups -m 100 -c 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y -r 100000 /home/krc3004/input_2_avp_mod_awk_3.hic /home/krc3004

It seems that the error is with lines containing chromosome X.  Here is a small subset of the file used to create input_2_avp_mod_awk_3.hic using pre (which ran with no errors).  Any tips would be much appreciated.  Thank you!

1 0 2 0 1 0 2 1
1 0 10 0 1 0 10 1
1 0 3 0 1 0 3 1
1 0 10 0 1 0 10 1
1 0 5 0 1 0 17 1
1 0 5 0 1 0 5 1
1 0 4 0 1 0 4 1
1 0 X 0 1 0 X 1

Neva Durand

Jul 5, 2017, 9:26:59 AM
to Chirag Krishna, 3D Genomics

Hello,

Did you try looking at your hic file in Juicebox? If you're using the 8-field format, the column order should be

str1 chr1 pos1 frag1 str2 chr2 pos2 frag2

See https://github.com/theaidenlab/juicer/wiki/Pre#short-format

I’m actually very surprised that this didn’t cause an error for Pre.
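
A minimal sketch of how a line could be checked against the field order quoted above. It is only an illustration, not part of Juicer: the helper name `check_short_line` is hypothetical, and it assumes that the strand, position and fragment columns must be integers while the chromosome columns may be names such as "X".

```python
# Field order of the 8-field short format, as quoted in the message above.
SHORT_FIELDS = ["str1", "chr1", "pos1", "frag1", "str2", "chr2", "pos2", "frag2"]
# Assumption: strand, position and fragment columns must parse as integers,
# while chromosome columns are free-form names such as "X".
INTEGER_FIELDS = {"str1", "pos1", "frag1", "str2", "pos2", "frag2"}


def check_short_line(line):
    """Return a list of problems found in one short-format line (empty if none)."""
    tokens = line.split()
    if len(tokens) != len(SHORT_FIELDS):
        return ["expected 8 fields, found %d" % len(tokens)]
    problems = []
    for name, token in zip(SHORT_FIELDS, tokens):
        if name in INTEGER_FIELDS:
            try:
                int(token)
            except ValueError:
                problems.append("%s is not an integer: %r" % (name, token))
    return problems


if __name__ == "__main__":
    # First sample is a line from the original post; second has "X" in the chr columns.
    for sample in ["1 0 X 0 1 0 X 1", "1 X 0 0 1 X 0 1"]:
        print(sample, "->", check_short_line(sample) or "ok")
```

Read this way, the line "1 0 X 0 1 0 X 1" from the first message has "X" in the position columns, which would be consistent with the NumberFormatException in the stack trace.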

Best
Neva


--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/7017a39f-3163-40a7-bf00-7c465f65b6ba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Chirag Krishna

Jul 5, 2017, 9:51:15 AM
to 3D Genomics, chirk...@gmail.com
Hi Neva,

Thanks very much for your reply.  You are right, there is something wrong with the input.  Here is what I tried to use (small subset):

J00118:127:H5H7FBBXX:5:1211:11282:31312 1 13065 - 7 39768357 + 457
J00118:127:H5H7FBBXX:5:2218:18944:16207 1 13075 - 3 196831554 + 370
J00118:127:H5H7FBBXX:5:1106:16691:11319 1 13472 + 9 115509326 + 180
J00118:127:H5H7FBBXX:5:1209:11891:3354 1 13478 + 4 220011 - 220
J00118:127:H5H7FBBXX:5:2207:14620:6255 1 13485 - 12 574609 - 167
J00118:127:H5H7FBBXX:6:1207:21968:15645 1 13485 + 15 55848167 - 222


I then tried to use the hicpro2juicebox script from HiCPro, like so:

sh ~/Programs/HiC-Pro-install/HiC-Pro_2.8.1_devel/bin/utils/hicpro2juicebox.sh -i avp_mod -g /data/leslie/krc3004/hichip/hg19.sizes -j ~/Programs/juicer_tools_linux_0.8.jar -r /home/krc3004/enz_mod.bed

where enz_mod.bed looks like this:

1 0 11159 HIC_1_1 0 +
1 11159 12410 HIC_1_2 0 +
1 12410 12460 HIC_1_3 0 +
1 12460 12685 HIC_1_4 0 +


I also made sure that hg19.sizes uses the same chromosome names as the other files.  However, I get the following error when running the script:

Generating Juicebox input files ...
Running Juicebox ...
Start preprocess
Writing header
Writing body
java.io.IOException: Unexpected column count.  Only 11 or 16 columns supported.  Check file format
        at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:108)
        at juicebox.tools.utils.original.AsciiPairIterator.<init>(AsciiPairIterator.java:70)
        at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:487)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:371)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
        at juicebox.tools.HiCTools.main(HiCTools.java:86)
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1466)
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1237)
        at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:651)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:373)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
        at juicebox.tools.HiCTools.main(HiCTools.java:86)
done


I'm not sure what it means by 11 or 16 column format.  There is a similar thread here: https://github.com/nservant/HiC-Pro/issues/49. I'm not sure whether I've misunderstood the awk commands in Nicolas' script.  Thanks very much again for your help!
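
One way to see what column layout a file actually has is to tally the number of whitespace-separated columns per line. The sketch below is a generic diagnostic, not something from Juicer or HiC-Pro; `column_counts` is a hypothetical helper name, and the two sample lines are copied from this message.

```python
from collections import Counter


def column_counts(lines):
    """Count how many lines have each number of whitespace-separated columns."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[len(line.split())] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "J00118:127:H5H7FBBXX:5:1211:11282:31312 1 13065 - 7 39768357 + 457",
        "1 0 11159 HIC_1_1 0 +",
    ]
    # Prints a mapping from column count to number of lines with that count.
    print(dict(column_counts(sample)))
```

If the tally shows a count other than the ones the error message lists, or more than one count in the same file, that points at where the conversion went wrong.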

Best,
Chirag

Neva Durand

Jul 5, 2017, 9:55:36 AM
to Chirag Krishna, 3D Genomics
Hello,

The input file formats are detailed on the page I sent you ( https://github.com/theaidenlab/juicer/wiki/Pre ).  Could you explain the columns of your input file?  The most common formats are 11 or 16 columns but we do have others as well.

Best
Neva


Muhammad Saad Shamim

Jul 5, 2017, 1:17:00 PM
to Neva Durand, Chirag Krishna, 3D Genomics
Quick comment - there's no reason to be using the -c chromosome flag if you want to run HiCCUPS on all chromosomes. It'll do that by default.
The only time you should be passing in chromosomes for the -c flag is if you only want to run on a small number of chromosomes (e.g. -c 1,2).

Chirag Krishna

Jul 5, 2017, 2:00:04 PM
to 3D Genomics, ne...@broadinstitute.org, chirk...@gmail.com
Hi Neva, Muhammad, 

Thank you both for your suggestions. The input file I am using is the all-valid-pairs file from HiC-Pro.  In particular, the format is:

read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / strand_reads2 / fragment_size (taken from http://nservant.github.io/HiC-Pro/MANUAL.html#run-hic-pro-in-sequential-mode)
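
As a rough illustration of that layout, the sketch below splits one line into named fields. The names `ValidPair` and `parse_valid_pair` are hypothetical, and it assumes the fields are whitespace-separated and that any columns after the eighth can be ignored.

```python
from collections import namedtuple

# Field names follow the HiC-Pro description quoted above.
ValidPair = namedtuple(
    "ValidPair",
    ["read_name", "chr1", "pos1", "strand1", "chr2", "pos2", "strand2", "fragment_size"],
)


def parse_valid_pair(line):
    """Parse the first eight whitespace-separated fields of a valid-pairs line."""
    f = line.split()
    if len(f) < 8:
        raise ValueError("expected at least 8 fields, found %d" % len(f))
    # Chromosomes stay as strings (so "X" is fine); positions and size become ints.
    return ValidPair(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]), f[6], int(f[7]))


if __name__ == "__main__":
    line = "J00118:127:H5H7FBBXX:5:1211:11282:31312 1 13065 - 7 39768357 + 457"
    print(parse_valid_pair(line))
```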

I checked the link provided by Neva, and it looks like the closest analog is the "short" format, but I'm not sure what frag1 and frag2 should be.  Thanks again for your help!

Best,
Chirag




Neva Durand

Jul 5, 2017, 2:08:07 PM
to Chirag Krishna, 3D Genomics
Hello Chirag,

I would suggest putting the reads into a six column format (str1 chr1 pos1 str2 chr2 pos2) and then running fragment.pl: https://github.com/theaidenlab/juicer/blob/master/CPU/common/fragment.pl

This will return an eight column format.  
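
A minimal sketch of the first step, rearranging a HiC-Pro valid-pairs line into the six columns named above. The function name `to_six_columns` is hypothetical, and the strand encoding (0 for "+", 1 for "-") is an assumption that should be checked against the Pre documentation. Any sorting that Pre may require is not handled here.

```python
def to_six_columns(valid_pair_line):
    """Convert one HiC-Pro valid-pairs line to 'str1 chr1 pos1 str2 chr2 pos2'."""
    fields = valid_pair_line.split()
    # HiC-Pro order: read name, chr1, pos1, strand1, chr2, pos2, strand2, ...
    _name, chr1, pos1, strand1, chr2, pos2, strand2 = fields[:7]
    # Assumption: strand is encoded as 0 for "+" and 1 for "-".
    str1 = "0" if strand1 == "+" else "1"
    str2 = "0" if strand2 == "+" else "1"
    return " ".join([str1, chr1, pos1, str2, chr2, pos2])


if __name__ == "__main__":
    line = "J00118:127:H5H7FBBXX:5:1211:11282:31312 1 13065 - 7 39768357 + 457"
    print(to_six_columns(line))
```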

You can create the restriction site file with this script: https://github.com/theaidenlab/juicer/blob/master/misc/generate_site_positions.py

Or, if you're using one of the genome/restriction site combinations in this directory, you can just download the file:

If on the other hand, you don't care about reads that map to the same restriction fragment (we throw these out and it does affect things like normalization and loop calling), you can put dummy values in for the fragment value.  You would put in "0" for the first fragment and "1" for the second, e.g.
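
The dummy-fragment idea can be sketched as follows. This is an illustration rather than the example from the original message: `add_dummy_fragments` is a hypothetical helper that takes a six column line (str1 chr1 pos1 str2 chr2 pos2) and inserts "0" and "1" in the frag1 and frag2 positions of the eight column format.

```python
def add_dummy_fragments(six_column_line):
    """Turn 'str1 chr1 pos1 str2 chr2 pos2' into the 8-field short format."""
    str1, chr1, pos1, str2, chr2, pos2 = six_column_line.split()
    # Dummy fragment values as suggested above: 0 for the first read, 1 for the second.
    return " ".join([str1, chr1, pos1, "0", str2, chr2, pos2, "1"])


if __name__ == "__main__":
    print(add_dummy_fragments("1 1 13065 0 7 39768357"))
```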

Best
Neva

Chirag Krishna

Jul 5, 2017, 2:14:34 PM
to 3D Genomics, chirk...@gmail.com
Hi Neva,

Thank you very much!  I will try this and report back.  Currently I have my own restriction site file (enz_mod.bed, in my commands above), so I'll see if that works.  Many thanks again for your help.

Best,
Chirag

Neva Durand

Jul 5, 2017, 2:16:07 PM
to Chirag Krishna, 3D Genomics
Hello Chirag,

You will need to convert your bed file to our format: https://github.com/theaidenlab/juicer/wiki/Pre#restriction-site-file-format
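
A rough sketch of such a conversion, under stated assumptions: it assumes the target file has one line per chromosome, with the chromosome name followed by the ascending site positions, and that each site is the BED fragment end plus an offset of 1. Both points should be checked against the wiki page linked above before use. The name `bed_to_site_lines` and the `offset` parameter are hypothetical.

```python
def bed_to_site_lines(bed_lines, offset=1):
    """Collapse BED fragments into one 'chrom site site ...' line per chromosome.

    Assumption: each site is the fragment end coordinate plus `offset`.
    """
    sites = {}  # chromosome name -> list of site positions, in first-seen order
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, end = fields[0], int(fields[2])
        sites.setdefault(chrom, []).append(end + offset)
    return [
        " ".join([chrom] + [str(p) for p in sorted(positions)])
        for chrom, positions in sites.items()
    ]


if __name__ == "__main__":
    bed = [
        "1 0 11159 HIC_1_1 0 +",
        "1 11159 12410 HIC_1_2 0 +",
        "1 12410 12460 HIC_1_3 0 +",
        "1 12460 12685 HIC_1_4 0 +",
    ]
    for out_line in bed_to_site_lines(bed):
        print(out_line)
```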

Best
Neva
