Hi,
I'm having trouble running the cluster juicer.sh script for HiC reads. I'm just using a subsample of practice data (genome size ~4Mb) before I use my real data (genome size ~800Mb).
I'm running juicer.sh with:
$MYSCRATCH/cluster_juicer/scripts/juicer.sh -C 7333940 -q workq -l workq -Q 1440 -L 1440 -g cherry -s HindIII -A pawsey0149 -z $MYSCRATCH/cluster_juicer/references/cherry_pilon3.fasta -y $MYSCRATCH/cluster_juicer/restriction_sites/cherry_HindIII.txt -p $MYSCRATCH/cluster_juicer/references/chrom.sizes -d $MYSCRATCH/cluster_juicer/work -D $MYSCRATCH/cluster_juicer
(lines are messed up a bit here though)
It stalls at the output "srun: job 5609221 queued and waiting for resources". I waited a few hours and nothing happened, so I hit Ctrl+C and the following errors appeared. I've tried this several times. Around four jobs show up in squeue after it starts, but they disappear after a few seconds to a minute.
debug/head-5609218.err shows errors for spack in a few lines: e.g. /var/spool/slurmd/job5609218/slurm_script: line 10: spack: command not found
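A quick way to see which of these tools a job can actually find is a check like the one below. This is only a sketch: the function name check_tools is made up for illustration, and the tool list is my guess based on the load_* variables in juicer.sh. To reflect what the compute node sees, it would need to run inside a submitted job rather than on the login node.

```shell
# Report whether each named tool is visible on PATH.
# check_tools is a hypothetical helper, not part of juicer.sh.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Tool list is an assumption based on the load_* variables in the script.
check_tools spack bwa awk java
```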
I made the following edits to juicer.sh to try and get it working
1. Line 115. Added my own load_java and commented out the spack commands.
My cluster has bwa 0.7.17 and GNU awk 4.1.0 loaded by default. The compiler is PrgEnv-cray/6.0.4 by default, but I also got the same errors when changing to PrgEnv-gnu/6.0.4 before running the script. I'm not sure how to add this compiler swap to the juicer.sh script if that's needed.
I commented out $load_bwa, $load_awk, $load_gpu anywhere else they appeared in the script.
2. Line 123. Changed queue names to workq since that is the only partition in my cluster.
24 hours is the time limit for this partition, so I also searched for all "time" or "-t" options and changed any that were above 24 hours.
3. Line 369. Changed memory to below 58G. It was 80G here previously.
I also searched for all "mem" in the script and changed these to 58G if they were above this (64G is the max for my cluster). I assumed 8 CPUs for the mem-per-cpu options.
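For the mem-per-cpu values, the arithmetic I used can be sketched as below. The function name mem_per_cpu_mb is just for illustration, and the 8 CPUs figure is my assumption, not something the script states.

```shell
# Per-CPU memory in MB that keeps a job under a total memory cap.
# Arguments: total memory in GB, number of CPUs (8 is an assumption here).
mem_per_cpu_mb() {
    total_gb=$1
    cpus=$2
    echo $(( total_gb * 1024 / cpus ))
}

mem_per_cpu_mb 58 8   # prints 7424
```

So a 58G cap over 8 CPUs corresponds to something like --mem-per-cpu=7424M. The partition's actual limits can be confirmed with scontrol show partition workq.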
Running the modified juicer script gives similar problems:
This time there are no errors in the output of cat debug/head*, though.
cat debug/split* shows that the splitting seemed to work:
Content of the created splits folder:
The HIC_tmp and aligned directories get created but are empty.
I figured out that this (line 576) is the line causing the program to wait at "srun job __ queued and waiting for resources":
srun -c 1 -p "$queue" -t 1 -o $debugdir/wait-%j.out -e $debugdir/wait-%j.err -d $dependsplit -J "${groupname}_wait" sleep 1
But it looks like the -d $dependsplit dependency should be fulfilled, judging by the split* output above. If I remove -d $dependsplit, it runs without pausing, but I get several errors like "srun: error: Unable to resolve "nid01185": Unknown host" before the spam of "sbatch: error: Batch job ...". nid01185 is the node my group uses on the cluster. I tried adding -N 1 and -n 1 to the srun command to limit it to one node and one task, but that didn't change anything.
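One way to check whether the dependency is really satisfied is to look up the state of each job listed in $dependsplit. The sketch below assumes $dependsplit has the form "afterok:JOBID:JOBID..." (I haven't confirmed that); the helper names dep_job_ids and show_dep_states are made up for illustration.

```shell
# Split a Slurm dependency string such as "afterok:111:222" into job IDs,
# one per line. The "afterok:" prefix is an assumption about $dependsplit.
dep_job_ids() {
    echo "$1" | tr ':' '\n' | grep -E '^[0-9]+$'
}

# Print the recorded state of each job the wait step depends on.
show_dep_states() {
    for jid in $(dep_job_ids "$1"); do
        if command -v sacct >/dev/null 2>&1; then
            sacct -j "$jid" --format=JobID,JobName,State,ExitCode
        else
            echo "sacct not available; job $jid not checked"
        fi
    done
}
```

Calling show_dep_states "$dependsplit" just before the srun line would show whether the split jobs ended as COMPLETED or as FAILED/CANCELLED. With an afterok dependency, a failed or cancelled split job leaves the wait step pending indefinitely, and squeue -j <jobid> -o "%i %T %r" on the stuck job would then report a reason such as DependencyNeverSatisfied.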
Any ideas? Sorry for the long post, but I was thinking more information is better for error solving.