Juicebox - command line tools error


Ankita Nand

Aug 29, 2016, 3:24:38 PM
to 3D Genomics
Hi,


I was trying to make a .hic file from my contact file, which I put in this format:

<readname> <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2>

But it gives me this error:

 

Warning: Unable to process fragment file. Pre will continue without fragment file.
Start preprocess
Writing header
Writing body
java.lang.NumberFormatException: For input string: "HWIST560:195:C7601ACXX:5:1213:5161:48978"
     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
     at java.lang.Integer.parseInt(Integer.java:580)
     at java.lang.Integer.parseInt(Integer.java:615)
     at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:151)
     at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:194)
     at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:493)
     at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:371)
     at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
     at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:106)
     at juicebox.tools.HiCTools.main(HiCTools.java:83)

 

 

My restriction fragment file looks like:

chr1    1       16009   HiC_AAGCTT_0|hg19|chr1:1-16009  0
chr1    16010   24573   HiC_AAGCTT_1|hg19|chr1:16010-24573      1
chr1    24574   27983   HiC_AAGCTT_2|hg19|chr1:24574-27983      2
chr1    27984   30431   HiC_AAGCTT_3|hg19|chr1:27984-30431      3
chr1    30432   32155   HiC_AAGCTT_4|hg19|chr1:30432-32155      4
chr1    32156   32776   HiC_AAGCTT_5|hg19|chr1:32156-32776      5
chr1    32777   37754   HiC_AAGCTT_6|hg19|chr1:32777-37754      6
chr1    37755   38371   HiC_AAGCTT_7|hg19|chr1:37755-38371      7
chr1    38372   38793   HiC_AAGCTT_8|hg19|chr1:38372-38793      8
chr1    38794   39257   HiC_AAGCTT_9|hg19|chr1:38794-39257      9
chr1    39258   43604   HiC_AAGCTT_10|hg19|chr1:39258-43604     10
chr1    43605   46457   HiC_AAGCTT_11|hg19|chr1:43605-46457     11
chr1    46458   52421   HiC_AAGCTT_12|hg19|chr1:46458-52421     12
chr1    52422   56817   HiC_AAGCTT_13|hg19|chr1:52422-56817     13
chr1    56818   58749   HiC_AAGCTT_14|hg19|chr1:56818-58749     14
chr1    58750   58956   HiC_AAGCTT_15|hg19|chr1:58750-58956     15
chr1    58957   59358   HiC_AAGCTT_16|hg19|chr1:58957-59358     16
chr1    59359   75367   HiC_AAGCTT_17|hg19|chr1:59359-75367     17
chr1    75368   83361   HiC_AAGCTT_18|hg19|chr1:75368-83361     18
chr1    83362   84051   HiC_AAGCTT_19|hg19|chr1:83362-84051     19
…….

 

 

My input file looks like:

HWIST560:195:C7601ACXX:5:2314:15009:20817 0 1 14928 0 16 1 20598066 4368
HWIST560:195:C7601ACXX:5:1103:13213:53983 0 1 14923 0 16 1 86591161 22752
HWIST560:195:C7601ACXX:5:2105:6950:49014 0 1 14918 0 16 1 93350627 24949
HWIST560:195:C7601ACXX:5:1213:5161:48978 16 1 10020 0 16 4 35317199 205728
HWIST560:195:C7601ACXX:5:2216:20926:71156 0 1 14924 0 0 6 90505063 335825
HWIST560:195:C7601ACXX:5:1209:18569:84151 0 1 14915 0 16 9 20383153 455953
HWIST560:195:C7601ACXX:5:1306:11391:18662 0 1 14913 0 0 9 122656661 480262
HWIST560:195:C7601ACXX:5:1315:8591:2634 0 1 14920 0 16 12 118930944 597013
HWIST560:195:C7601ACXX:5:1216:11807:3137 16 1 13348 0 16 13 99950685 625683
HWIST560:195:C7601ACXX:5:1203:15364:97987 0 1 14926 0 16 14 90757852 651681
HWIST560:195:C7601ACXX:5:1311:12711:49651 0 1 14930 0 16 17 21034332 703402
HWIST560:195:C7601ACXX:5:2203:13092:31583 16 1 13355 0 16 17 69911952 715220
HWIST560:195:C7601ACXX:5:2209:1128:93894 16 1 16118 1 16 9 98095139 472952
HWIST560:195:C7601ACXX:5:2312:5063:61956 16 1 30893 4 16 3 79285981 159525
HWIST560:195:C7601ACXX:5:2205:19681:3867 0 1 52175 12 16 9 13980937 453957
HWIST560:195:C7601ACXX:5:1209:17582:7851 0 1 56797 13 16 1 92471231 24697
HWIST560:195:C7601ACXX:5:1215:5268:12207 0 1 56787 13 0 4 156436745 243450
….

 

I tried this command:

java -Xms512m -Xmx2048m -jar juicebox_tools.7.5.jar pre -f hg19__AAGCTT.txt hic_HBCRACKHiC-K562-DN-R1__hg19.validpair.txt.gz hic_HBCRACKHiC-out.hic hg19

Please let me know what is incorrect in my input file that causes the NumberFormatException on my read name, and how I can correct it. Also, what format does the restriction fragment file need to be in so that it can be processed?

I’ll highly appreciate any help in this regard!

Muhammad Saad Shamim

Aug 29, 2016, 5:37:35 PM
to Ankita Nand, 3D Genomics
Hi Ankita,

I hope you are doing well!

The format of the fragment file should be something like this:

<chr_name> <list of fragment indices>
chrX 1 570 1378 1622 1883 1889 ...
chrY 1 470 1278 1522 1873 1989 ...
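
For a BED-like fragment file such as the one in the first post, a conversion to this one-line-per-chromosome layout could be sketched in Python as follows. This is only an illustration under stated assumptions: it takes each fragment's end coordinate (third column) as the position to list, which should be checked against the Juicer documentation, and `bed_to_site_lines` is a hypothetical helper name, not part of Juicebox.

```python
def bed_to_site_lines(bed_lines):
    """Group BED-like fragment rows into one '<chr> <pos> <pos> ...' line per chromosome.

    Assumption: the end coordinate (3rd column) of each fragment is the
    position to list. Verify this against the tool's documentation.
    """
    sites = {}  # chromosome name -> list of end coordinates (dicts keep insertion order)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated rows
        chrom, end = fields[0], int(fields[2])
        sites.setdefault(chrom, []).append(end)
    return [
        "{} {}".format(chrom, " ".join(str(p) for p in sorted(ends)))
        for chrom, ends in sites.items()
    ]


if __name__ == "__main__":
    example = [
        "chr1    1       16009   HiC_AAGCTT_0|hg19|chr1:1-16009  0",
        "chr1    16010   24573   HiC_AAGCTT_1|hg19|chr1:16010-24573      1",
    ]
    for out_line in bed_to_site_lines(example):
        print(out_line)
```
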

The other error is most likely because the read name is in the wrong column for one of your lines. Could you maybe confirm this with a quick awk script? We'll update the error message as well so that juicer provides more information.
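
A quick check along those lines, written here in Python rather than awk, might look like the sketch below. It assumes the 9-column layout from the first post (read name first), whitespace-separated fields, and that the strand, position, and fragment columns must be integers while chromosome names may be non-numeric. `find_bad_lines` and its parameters are hypothetical names, not part of Juicebox.

```python
def find_bad_lines(lines, expected_fields=9, int_columns=(1, 3, 4, 5, 7, 8)):
    """Return (line_number, reason) for rows that do not match the expected layout.

    int_columns are 0-based indices of the fields that must parse as integers
    (str1, pos1, frag1, str2, pos2, frag2 in the assumed 9-column layout).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != expected_fields:
            problems.append(
                (number, "expected {} fields, found {}".format(expected_fields, len(fields)))
            )
            continue
        for col in int_columns:
            try:
                int(fields[col])
            except ValueError:
                problems.append(
                    (number, "column {} is not an integer: {!r}".format(col + 1, fields[col]))
                )
                break  # one report per line is enough
    return problems


if __name__ == "__main__":
    sample = [
        "HWIST560:195:C7601ACXX:5:1213:5161:48978 16 1 10020 0 16 4 35317199 205728",
        "16 HWIST560:195:C7601ACXX:5:1213:5161:48978 1 10020 0 16 4 35317199 205728",
    ]
    for number, reason in find_bad_lines(sample):
        print(number, reason)
```

For a gzipped input, the lines can be read with Python's `gzip.open(path, "rt")` and passed to the function directly.
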

Best,


Ankita Nand

Aug 29, 2016, 6:17:12 PM
to 3D Genomics, nanda...@gmail.com
Hi Muhammad,

Thanks a lot for getting back to me! I don't think the error I am getting is caused by the read name being in the wrong column. I checked this, as shown in the attached file. You can also see that the error string "HWIST560:195:C7601ACXX:5:1213:5161:48978" is in the fourth row from the top of my file.

Please have a look and let me know further.

Thanks,
Ankita
error string.tiff

Ankita Nand

Aug 30, 2016, 1:33:54 PM
to 3D Genomics, nanda...@gmail.com
Hi Muhammad,

I tried with 8 columns (<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2>), but I ran into another error:

[ankita@localhost hic]$ java -Xms512m -Xmx2048m -jar juicebox_tools.7.5.jar pre  noReadname_hic_HBCRACKHiC-K562-DN-R1__hg19.validpair.txt.gz hic_HBCRACKHiC-out.hic hg19
Not including fragment map

Start preprocess
Writing header
Writing body
.........Error: the chromosome combination 1_9 appears in multiple blocks


As I understood it, we just have to make an input file that contains all the valid pair interactions with their positions, so my file does contain lines like:

0       1       568175  141     0       9       107376510       475766
16      1       567498  141     16      9       113646575       477626
16      1       710174  187     0       9       108222909       476013
0       1       714649  188     0       9       103211344       474407
16      1       720662  188     0       9       124488377       480743
0       1       746618  192     16      9       27770225        458377
0       1       746626  192     0       9       74258954        465986
0       1       746626  192     0       9       74258954        465986

So the 1_9 combination appears in multiple places, mostly with different positions, but in one place the same line appears twice. Is the error caused by these identical entries?

Please let me know!

Thanks,
Ankita

Muhammad Shamim

Aug 31, 2016, 5:40:27 PM
to 3D Genomics, nanda...@gmail.com
Hey Ankita,

The input should be sorted so that all the reads for each chromosome pair are grouped together (i.e. 1-1, 1-2, etc.), and on every line the chromosome of the first read end should be less than or the same as the chromosome of the second read end.
These posts also deal with this error:

Regarding duplicates, Juicebox does not remove them: any repeated lines in the input file are treated as separate contacts. To remove duplicates, you'll need to deduplicate separately or use the scripts in Juicer.
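
The sorting rule described above, plus an optional removal of exact duplicate lines, could be sketched in Python roughly as follows. This assumes the 8-column layout (<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2>), a file small enough to sort in memory, and a chromosome ranking given by the hypothetical `DEFAULT_ORDER` list. Dropping identical lines is a simplification for illustration and is not the deduplication that the Juicer scripts perform.

```python
# Hypothetical chromosome ranking; adjust it to match the genome in use.
DEFAULT_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def chrom_key(name, order):
    """Rank a chromosome by its index in `order`; unknown names sort last, by name."""
    return (order.index(name), "") if name in order else (len(order), name)


def normalize(fields, order):
    """Swap the two read ends if the first end's chromosome ranks after the second's."""
    first, second = fields[:4], fields[4:8]
    if chrom_key(first[1], order) > chrom_key(second[1], order):
        first, second = second, first
    return first + second


def sort_pairs(lines, order=DEFAULT_ORDER, drop_exact_duplicates=False):
    """Return the lines grouped by chromosome pair, then ordered by position."""
    rows = [normalize(line.split(), order) for line in lines if line.strip()]
    rows.sort(key=lambda r: (chrom_key(r[1], order), chrom_key(r[5], order),
                             int(r[2]), int(r[6])))
    if drop_exact_duplicates:
        seen = set()
        unique = []
        for row in rows:
            key = tuple(row)
            if key not in seen:  # keep only the first copy of identical rows
                seen.add(key)
                unique.append(row)
        rows = unique
    return [" ".join(row) for row in rows]


if __name__ == "__main__":
    sample = [
        "0 9 107376510 475766 0 1 568175 141",
        "0 1 746626 192 0 9 74258954 465986",
        "0 1 14928 0 16 1 20598066 4368",
        "0 1 746626 192 0 9 74258954 465986",
    ]
    for out_line in sort_pairs(sample, drop_exact_duplicates=True):
        print(out_line)
```

For full-size valid-pair files, a disk-based sort is likely a better fit than loading every line into memory.
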

Best,
Muhammad S Shamim

Ankita Nand

Sep 2, 2016, 5:14:12 AM
to Muhammad Shamim, 3D Genomics
Awesome! Thank you so much, Muhammad, for your help! I was able to create the .hic file and view the heatmap in Juicebox as well.

Best,
Ankita
