Dear Neva,
Could you please help me with the following issue? I installed the SLURM version of Juicer and am trying to analyze Hi-C data starting from fastq files. Apart from a few issues (the job submission to remove duplicates failed and I had to resubmit it manually), I eventually reached the final stage of the pipeline (-S final). There I hit a problem: for 3 of my 4 input files, I get the following error during generation of the .hic files:
java.lang.NumberFormatException: For input string: "MT"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:149)
at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:194)
at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:493)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:371)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:108)
at juicebox.tools.HiCTools.main(HiCTools.java:86)
I am running juicer with the following command:
/projects/ag-papan/Yulia/Sources/juicer/juicer/SLURM/scripts/juicer.sh -z /projects/ag-papan/Yulia/Sources/juicer/opt/juicer/references/Homo_sapiens_assembly19.fasta -p /projects/ag-papan/Yulia/Sources/juicer/opt/juicer/references/hg19.chrom.sizes -y /projects/ag-papan/Yulia/Sources/juicer/opt/juicer/restriction_sites/hg19_NcoI.txt -s NcoI -S final
The .hic file generation stops immediately after the exception is thrown.
Could you please recommend a solution to this problem?
My guess is that something went wrong in the deduplication step for the split files containing MT reads.
Thank you very much for your support.
Looking forward to hearing from you soon.
Kind regards,
Yulia
Hello Yulia,
This is most likely a problem in your merged_nodups file. It looks like there’s a chromosome name where Juicer expects a number; usually this occurs when the merging failed. One easy thing to check is if all lines have 16 fields:
awk 'NF==16' merged_nodups.txt > new_merged_nodups.txt
This writes only the 16-field lines to new_merged_nodups.txt, dropping any offending lines; the original file is left unchanged.
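Before filtering, it can help to see how many lines are affected and what they look like. The following is a minimal sketch that uses the same 16-field criterion as the awk command above; the function names count_bad_lines and show_bad_lines are made up for illustration and are not part of Juicer:

```shell
# Hypothetical helpers (not part of Juicer): inspect lines that do not
# have exactly 16 whitespace-separated fields, without modifying the file.

# Print the number of offending lines (0 if the file is clean).
count_bad_lines() {
    awk 'NF != 16 { bad++ } END { print bad + 0 }' "$1"
}

# Print the first few offending lines, prefixed with their line numbers.
# The optional second argument is how many lines to show (default 5).
show_bad_lines() {
    awk 'NF != 16 { print NR ": " $0 }' "$1" | head -n "${2:-5}"
}
```

For example, count_bad_lines merged_nodups.txt prints 0 for a file in which every line has 16 fields, and show_bad_lines merged_nodups.txt shows where in the file the damage starts.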
You might want to check if the problem occurred earlier. Have a look in your debug folder at the “.err” files. The “align.err” files should all end with a statement about how long the alignment took, and the “.err” files from other jobs should be empty.
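The check of the “.err” files can be roughly scripted as below. This is only a sketch: it assumes the alignment logs match the pattern *align*.err inside the debug folder, which may differ in your installation, and since the exact wording of the timing statement is not given here, it simply prints the last line of each alignment log for you to read. The function names are made up for illustration:

```shell
# Hypothetical helpers (not part of Juicer). The *align*.err naming
# pattern is an assumption; adjust it to match your debug folder.

# List .err files from non-alignment jobs that are NOT empty.
nonempty_err_files() {
    for f in "$1"/*.err; do
        case "$f" in *align*) continue ;; esac
        if [ -s "$f" ]; then
            echo "$f"
        fi
    done
}

# Print the last line of every alignment .err file for visual inspection.
align_err_tails() {
    for f in "$1"/*align*.err; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$f" "$(tail -n 1 "$f")"
    done
}
```

Running nonempty_err_files debug should print nothing if the other jobs finished cleanly, and align_err_tails debug should show a timing statement for every alignment job.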
If the above awk script produces a differently sized file (check via ls -l), then you know the merged_nodups was corrupt. You should then run the same script on merged_sort to check if it is also corrupt. If it’s not corrupt, rerun starting in the dedup stage. If it is corrupt, there are further debugging steps I could explain; one hint is to look at the *norm.res.txt files in splits and make sure they are correct.
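The decision steps above can be sketched as a small script. It treats a file as corrupt if any line does not have exactly 16 fields, which is equivalent to the filtered file differing in size from the original. The function names are made up for illustration, and the two file paths are passed as arguments because the exact file names in your output directory are an assumption here:

```shell
# Hypothetical helpers (not part of Juicer).

# Succeed (exit status 0) if the file contains at least one line
# that does not have exactly 16 fields.
has_bad_lines() {
    awk 'NF != 16 { found = 1; exit } END { exit !found }' "$1"
}

# Usage: triage <merged_nodups file> <merged_sort file>
# Prints which of the cases described above applies.
triage() {
    if ! has_bad_lines "$1"; then
        echo "merged_nodups looks intact"
    elif ! has_bad_lines "$2"; then
        echo "merged_sort looks intact: rerun from the dedup stage"
    else
        echo "merged_sort is also corrupt: check earlier stages"
    fi
}
```

A call might look like triage merged_nodups.txt merged_sort.txt, run from the directory that holds those files.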
Hope that helps.
Best
Neva