Converting HiC output files to Juicebox input

Shane

Mar 2, 2017, 2:57:51 PM
to HiC-Pro
Dear all,

I ran into a problem when converting the HiC output to Juicebox input to visualize my data.

The following is the command line I used:

~/software/HiC-Pro_2.7.8/bin/utils/hicpro2juicebox.sh -t ./tmp -i hic_output/data/input_1/input_1_allValidPairs -g ~/software/HiC-Pro_2.7.8/annotation/chrom_hg19.sizes -j ~/software/Juicebox.jar -r ~/software/HiC-Pro_2.7.8/annotation/HindIII_resfrag_hg19.bed

I then got the following errors:
Generating Juicebox input files ...
sort: write failed: /tmp/sort40hrUe: No space left on device
Running Juicebox ...
Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using 'gwdu102:0.0' as the value of the DISPLAY variable.
at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
at sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:65)
at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:115)
at java.security.AccessController.doPrivileged(Native Method)
at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:74)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:260)
at java.awt.GraphicsEnvironment.createGE(GraphicsEnvironment.java:102)
at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:81)
at sun.awt.X11.XToolkit.<clinit>(XToolkit.java:123)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:260)
at java.awt.Toolkit$2.run(Toolkit.java:860)
at java.awt.Toolkit$2.run(Toolkit.java:855)
at java.security.AccessController.doPrivileged(Native Method)
at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:854)
at sun.swing.SwingUtilities2.getSystemMnemonicKeyMask(SwingUtilities2.java:2020)
at javax.swing.plaf.basic.BasicLookAndFeel.initComponentDefaults(BasicLookAndFeel.java:1158)
at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:431)
at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:148)
at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1577)
at javax.swing.UIManager.setLookAndFeel(UIManager.java:539)
at javax.swing.UIManager.setLookAndFeel(UIManager.java:579)
at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1349)
at javax.swing.UIManager.initialize(UIManager.java:1459)
at javax.swing.UIManager.maybeInitialize(UIManager.java:1426)
at javax.swing.UIManager.getUI(UIManager.java:1006)
at javax.swing.JLabel.updateUI(JLabel.java:275)
at javax.swing.JLabel.<init>(JLabel.java:164)
at javax.swing.JLabel.<init>(JLabel.java:235)
at juicebox.windowui.DisabledGlassPane.<init>(DisabledGlassPane.java:47)
at juicebox.MainWindow.<clinit>(MainWindow.java:53)
done !

The error suggests that there is not enough space for the sorting step, although I am sure that I have enough.

I also checked the tmp directory and found two files, 15780_resfrag.juicebox and 15780_allValidPairs.pre_juicebox_sorted, but the sorted file is empty.

Can anybody help me? Thanks in advance.

nservant

Mar 2, 2017, 3:06:38 PM
to HiC-Pro
Hi,
Indeed, it looks like the sort step fails. But this is a disk space issue in your system tmp directory, i.e. /tmp/, not ./tmp/.
Could you try:

awk '{$4=$4!="+"; $7=$7!="+"; split($9, frag1, "_"); split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[3], $7, $5, $6, frag2[3], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[3], $4, $2, $3, frag1[3], $12, $11}'  hic_output/data/input_1/input_1_allValidPairs > ./tmp/test.out

sort -k3,3d  -k7,7d -S 90 ./tmp/test.out > ./tmp/test_allValidPairs.pre_juicebox_sorted

These are the commands run by hicpro2juicebox.sh.
I expect the sort to crash ...
N

Shane

Mar 2, 2017, 3:20:30 PM
to HiC-Pro
Hi Nicolas,

Thanks for the quick reply, I totally agree. I had also noticed that /tmp is not my ./tmp; this root issue is always annoying.
I am now rerunning the same command as before in a big-memory queue, hopefully it works. At the same time, I am also trying the awk/sort commands you gave me, and I will post the results.
Thanks.

nservant

Mar 2, 2017, 3:33:52 PM
to HiC-Pro
Otherwise, if it fails, you can edit the hicpro2juicebox.sh script directly and add the -T option to the sort commands (lines 118 and 120) to specify which temporary directory sort should use.
Does that make sense?
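A minimal sketch of the sort step with -T added. In GNU sort, `-T DIR` tells sort where to write its intermediate files, so pointing it at a disk with free space keeps them out of `/tmp`. The scratch directory `./tmp/sort_scratch` and the three demo records are made up for illustration; the sort keys are the ones from the command above.

```shell
#!/usr/bin/env bash
# Sketch: run the sort step with -T so temporary files avoid /tmp.
set -euo pipefail

scratch_dir="./tmp/sort_scratch"   # hypothetical directory on a disk with free space
mkdir -p "$scratch_dir"

# Demo stand-in for the awk output (chromosomes are in columns 3 and 7).
cat > ./tmp/demo_test.out <<'EOF'
r2 0 chr2 500 3 1 chr2 900 4 42 42
r1 0 chr1 100 1 0 chr3 300 2 42 42
r3 1 chr1 200 1 0 chr1 800 5 30 42
EOF

# Same sort keys as hicpro2juicebox.sh, plus -T pointing at the scratch directory.
sort -T "$scratch_dir" -k3,3d -k7,7d ./tmp/demo_test.out \
    > ./tmp/demo_allValidPairs.pre_juicebox_sorted
```

In the real script, the equivalent change is adding `-T "$scratch_dir"` to the existing sort calls; the output order is unaffected, only where the temporary chunks are written.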

Shane

Mar 2, 2017, 3:47:19 PM
to HiC-Pro

Yep, that also makes sense. I really need to learn more shell stuff :(

The task is still running; I will post the results here once it finishes.
Thanks so much, N.

Best,
X

Shane

Mar 3, 2017, 5:15:47 AM
to HiC-Pro
Hi Nicolas,

Thanks for your help; on a fat node I got a sorted valid pairs file.

But now I have another problem when trying to generate the .hic file. The command I used is:

java -jar ~/software/Juicebox.jar pre -f 11230_resfrag.juicebox test_allValidPairs.pre_juicebox_sorted input1.hic ~/software/HiC-Pro_2.7.8/annotation/chrom_hg19.sizes

I ran into several problems before I could finally forward the X11 display to my PC, but it is working now.
However, all I get is a Juicebox window; no file with a .hic suffix is generated.

Do you have any suggestions for this? 
A LOT OF THANKS.

Best,
X

nservant

Mar 3, 2017, 5:53:29 AM
to HiC-Pro
Hi,
I think you are using Juicebox itself, and not the Juicebox command line tools (juicebox_clt)?
You need to use the command line tools to generate the .hic file.
N

Shane

Mar 3, 2017, 5:37:28 PM
to HiC-Pro

Lol, you are right.
Now I got the correct .hic file, thanks a lot, Nicolas.

Best wishes,
X

Mandy Wong

Apr 4, 2018, 12:32:14 AM
to HiC-Pro
Hi Nicolas,

I have been trying to run this script with the following command:
 ~/newHiC-Pro/HiC-Pro_2.10.0/bin/utils/hicpro2juicebox.sh -i /scratch/users/mkmwong/Rao2014/hic_results/data/sample1/sample1_allValidPairs -g ~/newHiC-Pro/HiC-Pro_2.10.0/annotation/chrom_hg19.sizes  -j /share/PI/ashbym/Juicebox_1.8.8.jar -r ~/newHiC-Pro/HiC-Pro_2.10.0/annotation/hg19_mboi.bed  -o /scratch/users/mkmwong/

However, it seems like it's taking a very long time to run. The only thing that was printed to the terminal was:
HiC-Pro format > 2.7.5 detected ...

Generating Juicebox input files ...

The files I got were 204280_resfrag.juicebox (63 MB) and an empty pre_juicebox_sorted file. Was the execution of the script somehow halted?

nservant

Apr 4, 2018, 3:25:54 AM
to HiC-Pro
Hi Mandy,
Yes, it can take a long time. In my experience, there are two things you can try:
1/ Remove the -r option. It is useful if you want to see your maps at restriction-fragment resolution, but that also requires a huge sequencing depth.
2/ Check the list of chromosomes in the chrom_hg19.sizes file. I have noticed that the more chromosomes you have, the longer it takes.
If you have any 'random' chromosomes, I would suggest removing them.

N
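The chromosome clean-up suggested in point 2/ could be sketched as below. It assumes UCSC hg19-style contig names (`*_random`, `chrUn_*`, `*_hap*`); the file names `demo_chrom_hg19.sizes` and `chrom_hg19.main.sizes` are hypothetical, and the pattern should be checked against your own sizes file. The reduced file would then be passed to the script with -g; it is worth testing on a small subset that the conversion accepts it.

```shell
#!/usr/bin/env bash
# Sketch: drop unplaced/alternative contigs from a chrom.sizes file.
set -euo pipefail

sizes="demo_chrom_hg19.sizes"   # hypothetical; use your chrom_hg19.sizes
if [ ! -e "$sizes" ]; then
    # Demo content: name <TAB> length, with hg19-style contig names.
    printf 'chr1\t249250621\nchr1_gl000191_random\t106433\nchrUn_gl000220\t161802\nchr6_apd_hap1\t4622290\nchrX\t155270560\nchrM\t16571\n' > "$sizes"
fi

# Remove *_random, chrUn_* and *_hap* entries; every other line is kept.
grep -v -E '_random|^chrUn_|_hap' "$sizes" > chrom_hg19.main.sizes
cut -f1 chrom_hg19.main.sizes
```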

Mandy Wong

Apr 4, 2018, 7:18:44 PM
to HiC-Pro
Hi,

I realized that I made the same mistake as above: using the actual Juicebox jar. I looked up the juicebox_clt jar, but it seems a newer version has been released as juicer_tools.1.8.9_jcuda.0.8.jar. I used that instead, but I still have the problem described below:

I noticed that after running for 12 hours, the code still hasn't gone past this step:

     awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS | sort -k3,3d  -k7,7d -S 90 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted

and the .pre_juicebox_sorted file is still empty (0 KB). Making changes (1) and (2) that you suggested above still gives the same result. Could something be wrong with the format of the allValidPairs file that is causing the problem?


Here is the head of my allValidPairs file:


SRR1658673.259994198	chr1	10776	+	chr7	131533799	+	570	HIC_chr1_1	HIC_chr7_319776	0	42
SRR1658673.72579757	chr1	12259	+	chr15	93563685	-	341	HIC_chr1_2	HIC_chr15_188640	2	40
SRR1658676.125252832	chr1	13028	+	chr4	134982856	+	426	HIC_chr1_6	HIC_chr4_308064	30	42
SRR1658676.120863062	chr1	13028	-	chr15	102348304	-	454	HIC_chr1_6	HIC_chr15_209860	31	42
SRR1658676.152863214	chr1	13028	-	chr18	77468066	-	361	HIC_chr1_6	HIC_chr18_177304	30	42
SRR1658673.90423160	chr1	13028	-	chr19	560618	+	439	HIC_chr1_6	HIC_chr19_1329	0	2
SRR1658673.201320596	chr1	13028	-	chr19	16679875	+	297	HIC_chr1_6	HIC_chr19_54907	31	42
SRR1658676.103541592	chr1	13029	+	chr3	78354693	-	456	HIC_chr1_6	HIC_chr3_194482	6	23
SRR1658676.219111220	chr1	13029	-	chr11	111314436	-	393	HIC_chr1_6	HIC_chr11_265725	31	42
SRR1658676.81193276	chr1	13031	+	chr15	86012106	+	365	HIC_chr1_6	HIC_chr15_168511	11	42


The file is 70 GB...

nservant

Apr 5, 2018, 3:11:40 AM
to HiC-Pro
Hi,
Yes indeed. The command line tools have now moved into Juicer.
So the jar should be this one:
https://github.com/theaidenlab/juicer/wiki/Juicer-Tools-Quick-Start

Regarding your issue, I'm a bit confused by the fact that the awk command did not finish.
My feeling is that this is more related to sorting a 70 GB file.

To double-check that, could you please run the same command on the head of your file?
And then the same on your entire file, but without the sort?
Thanks
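The two checks suggested here (the awk step on the first lines of the file, then on the whole file with no sort) could be scripted roughly as follows. The awk program is copied from the hicpro2juicebox.sh excerpt quoted earlier in the thread; the demo records, the file names and the 1000-line head are illustrative assumptions. The demo input is only written if the file does not already exist.

```shell
#!/usr/bin/env bash
# Sketch: time the awk conversion on its own, first on a head, then on the full file.
set -euo pipefail

VALIDPAIRS="demo_allValidPairs"   # hypothetical; point this at your real file
mkdir -p ./tmp

if [ ! -e "$VALIDPAIRS" ]; then
    # Two demo records in allValidPairs layout (tab-separated, 12 columns).
    printf 'SRR.1\tchr7\t500\t+\tchr1\t100\t-\t300\tHIC_chr7_10\tHIC_chr1_2\t30\t42\n' > "$VALIDPAIRS"
    printf 'SRR.2\tchr1\t200\t-\tchr1\t900\t+\t250\tHIC_chr1_3\tHIC_chr1_9\t42\t42\n' >> "$VALIDPAIRS"
fi

# awk program from hicpro2juicebox.sh: strand "+" -> 0, anything else -> 1;
# keep the last "_"-separated field of each fragment name; write the read end
# whose chromosome name compares smaller (as a string) first.
to_pre_juicebox() {
    awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); }
         $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }
         $5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' "$1"
}

# Check 1: awk only, on the first 1000 lines.
head -n 1000 "$VALIDPAIRS" > ./tmp/head_allValidPairs
time to_pre_juicebox ./tmp/head_allValidPairs > ./tmp/head.pre_juicebox

# Check 2: awk only, on the whole file, with no sort afterwards.
time to_pre_juicebox "$VALIDPAIRS" > ./tmp/full.pre_juicebox
wc -l ./tmp/full.pre_juicebox
```

If check 2 finishes in reasonable time, the slow part is the sort rather than the reformatting.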

Mandy Wong

Apr 5, 2018, 12:46:47 PM
to HiC-Pro
When I run it only on the head of the file, I get this error message:

Problem with creating fragment-delimited maps, NullPointerException.

This could be due to a null fragment map or to a mismatch in the chromosome name in the fragment map vis-a-vis the input file or chrom.sizes file.

Exiting.

done !


Does that mean there is a formatting error in my file?

nservant

Apr 5, 2018, 1:33:10 PM
to HiC-Pro
OK, but this is a Juicebox error.
It means that the data got through the awk command.
Did you try to run only the awk command on your valid pairs file, without the sort?
My feeling is that sorting the 70 GB file is the issue ...

Mandy Wong

Apr 5, 2018, 2:47:25 PM
to HiC-Pro
Your feeling was right... it started running as soon as I removed the sort. :-|

nservant

Apr 5, 2018, 3:02:44 PM
to HiC-Pro
Good. If it runs without the sort, that part is fine.
Otherwise, how much RAM do you have? Do you have access to a more powerful machine?

Mandy Wong

Apr 5, 2018, 3:08:56 PM
to HiC-Pro
I'm running things on the Sherlock cluster at Stanford, and I think I can get about 100 GB. I might be able to get more if needed.

nservant

Apr 5, 2018, 3:22:56 PM
to HiC-Pro
That's already huge. It might be interesting to investigate why sorting this file takes so much time.
One idea would be to specify the temporary directory of sort with the -T option.
I think that by default, sort writes its temporary files to your current folder. If for any reason you do not have enough disk space, it could crash.
Let me know if I can help.
If necessary, you can send me your file in a private message and I can try it here.
Best wishes

Mandy Wong

Apr 5, 2018, 3:33:08 PM
to HiC-Pro
I assume that the temporary file you are talking about is

${TEMP}/$$_allValidPairs.pre_juicebox_sorted, which is in ./tmp, where . is the directory I ran the script from. Is it odd that, when I include the sort after the awk command, the script shows no sign of crashing (quitting and returning to the command prompt), and yet the pre_juicebox_sorted file stays at 0 KB the whole time? Or is this what we would expect if the sort never finished and the program crashed?


Thank you so much for your help with running the script and the whole pipeline :)

nservant

Apr 5, 2018, 3:43:20 PM
to HiC-Pro
Not really ... when you run the 'sort' command, I think it writes its own temp files (see the sort manual for details); that's why I was wondering whether you have enough space in your current folder.
And indeed, my feeling is that the sort never writes its output.
So finally, did it run without the sort?

Mandy Wong

Apr 5, 2018, 4:03:09 PM
to HiC-Pro
Yes, it runs without the sort! I think the free space in my current folder is about 28 TB ...

nservant

Apr 5, 2018, 4:39:21 PM
to HiC-Pro
OK, so that's fine.
Could you try removing the -S 90? With this option, sort is allowed to use 90% of your RAM ... it might be too much.
N
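One detail about `-S` that may matter here: in GNU coreutils sort, a size with no suffix is read in units of 1024 bytes, so `-S 90` requests a buffer of about 90 KiB rather than 90% of RAM. A percentage needs an explicit `%` suffix (for example `-S 50%`), and `M`/`G` suffixes give absolute sizes. With a very small buffer, a 70 GB input would be split into a very large number of temporary runs that then have to be merged, which could contribute to the slowness seen here. The sketch below assumes GNU sort; the `50%` value and the demo file names are illustrative, not settings tested on this dataset.

```shell
#!/usr/bin/env bash
# Sketch (GNU coreutils sort assumed): give -S an explicit unit.
set -euo pipefail

mkdir -p ./tmp
# Two demo records in the pre-Juicebox layout (chromosomes in columns 3 and 7).
printf 'b 0 chr2 1 1 0 chr2 5 2 42 42\na 0 chr1 1 1 0 chrX 5 2 42 42\n' > ./tmp/demo.out

# "-S 90" would request 90 KiB (1024-byte units are the default).
# "-S 50%" requests half of physical memory; "-S 8G" would request 8 GiB.
sort -S 50% -k3,3d -k7,7d ./tmp/demo.out > ./tmp/demo.sorted
```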

Mandy Wong

Apr 5, 2018, 5:23:44 PM
to HiC-Pro
Here are all the things I have tried:

1. 

awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS > ./tmp/test.out

##sort -k3,3d  -k7,7d -S 90 ./tmp/test.out > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted


The first step works; the second step doesn't.

2.

##awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS > ./tmp/test.out

##sort -k3,3d  -k7,7d ./tmp/test.out > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted


Again, the sort doesn't work.

3. awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted

This works, but then it runs into trouble when running Juicebox: the run terminates and the file is gone.

4. awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS | sort -k3,3d  -k7,7d  > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted

This just runs into the same problem: the sorted file is empty...

pengt...@gmail.com

Feb 26, 2019, 3:05:34 AM
to HiC-Pro
Hi,
I think I have run into the same problem. It reported this error:

sort: write failed: /tmp/sortFyOmgy: No space left on device

and the memory information for my machine is:

              total        used        free      shared  buff/cache   available
Mem:           251G         36G         80G        175M        134G        198G
Swap:          4.0G        1.8G        2.2G

So is it because the memory is too small and I have to move to a more powerful machine?
On Friday, April 6, 2018 at 3:02:44 AM UTC+8, nservant wrote:

nservant

Mar 1, 2019, 6:12:40 AM
to HiC-Pro
I think this is more a disk space issue than a RAM issue ...
N
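The `free` output above reports about 198 GB of available memory, which fits the disk-space reading of the `/tmp/sort...` error. When -T is not given, GNU sort writes its temporary files to `$TMPDIR`, or to `/tmp` if that variable is unset; exporting TMPDIR before running hicpro2juicebox.sh should therefore redirect them too, unless the script overrides it (not checked here). A rough pre-flight check could compare the free space there with the input size. Treating the input size as the temporary space needed is a rule-of-thumb assumption, and `demo_sizecheck_input` is a hypothetical file name.

```shell
#!/usr/bin/env bash
# Sketch: compare free disk space where sort puts temp files with the input size.
set -euo pipefail

input="demo_sizecheck_input"       # hypothetical; point at your allValidPairs file
[ -e "$input" ] || printf 'demo\n' > "$input"   # tiny demo file so the sketch runs

tmp_dir="${TMPDIR:-/tmp}"          # GNU sort's temp location when -T is not given

# Input size in KiB, rounded up.
need_kb=$(( ($(wc -c < "$input") + 1023) / 1024 ))
# POSIX df output in 1 KiB blocks; field 4 of the second line is the available space.
free_kb=$(df -Pk "$tmp_dir" | awk 'NR == 2 {print $4}')

echo "input: ${need_kb} KiB; available in ${tmp_dir}: ${free_kb} KiB"
if [ "$free_kb" -lt "$need_kb" ]; then
    echo "Probably too little space in ${tmp_dir}; try sort -T DIR or set TMPDIR." >&2
fi
```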

Maharshi Chakravortee

May 16, 2019, 9:24:32 AM
to HiC-Pro
Hi Nicolas,

Thanks again for helping out so much.


I'm running the hicpro2juicebox.sh script on an SGE cluster with 10 cores and 10 GB of RAM each, with the following command:

PATH/bin/utils/hicpro2juicebox.sh -i PATH/hic_results/data/samples/samples.allValidPairs -g PATH/annotation/chrmSizes.GRCh38 -j PATH/juicer_tools_1.11.04_jcuda.0.8.jar -o PATH/hic_results/

The Juicer version I used is 1.11.4 (DEV). The process is still running (I'm not sure how long it should take). Should I qdel it and use the 1.8.9 (stable) version instead? The disk I'm writing to has about 1.5 TB left; in your experience, should that be enough?


Thanks a lot!

Maharshi Chakravortee

May 16, 2019, 4:38:14 PM
to HiC-Pro
UPDATE: It worked, thanks! 