Hi 3D Genomics group,
Thanks for creating the genome assembly tools and the 3D Genomics forum.
I am having an issue running the 3d-dna assembly pipeline on a draft genome assembly (the step before modifying the assembly in Juicebox Assembly Tools): ./3d-dna/run-asm-pipeline.sh draft.fa merged_nodups.txt. The log file contains 2964 lines of the error message "tail: invalid number of bytes: ‘+’". I believe this error comes from the tail commands at lines 62 and 64 of the script construct-fasta-from-asm.sh.
Here is a description of the -c option of the "tail" command:
"The -c option is less tolerant than the -n option. That is, there is no default number of bytes, and thus some integer must be supplied. Also, the letter c cannot be omitted as can the letter n, because in such case tail would interpret the hyphen and integer combination as the -n option. Thus, for example, the following would produce an error message something like
tail: aardvark: invalid number of bytes: tail -c aardvark" ---
http://www.linfo.org/tail.html
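Based on the usage information for "tail", it seems that the tail commands at lines 62 and 64 of construct-fasta-from-asm.sh did not receive their -c argument properly. The ‘+’ in the error message suggests that ${index[${contig}]} expanded to an empty string, leaving only the plus sign: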
tail -c +${index[${contig}]} ${fasta} | awk '$0~/>/{exit}1' | awk -f ${pipeline}/utils/reverse-fasta.awk -
tail -c +${index[${contig}]} ${fasta} | awk '$0~/>/{exit}1'
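To illustrate what I think is happening, here is a minimal sketch that reproduces the same failure outside the pipeline. Everything in it is an assumption made for illustration: the file contents, the contig names, the two-column "name offset" index layout, and the helper functions lookup_offset and extract_contig are hypothetical and are not taken from the 3d-dna scripts. It only shows that an empty offset turns the command into "tail -c +", and how a guard could report the missing contig instead.

```shell
# Hypothetical reproduction; file names, contig names, and the two-column
# index layout are assumptions, not the actual 3d-dna temp index format.
fasta=$(mktemp)
index_file=$(mktemp)
printf '>contig_1\nACGT\n>contig_2\nTTTT\n' > "$fasta"
# ">contig_1" plus its newline is 10 bytes, so the sequence starts at byte 11.
printf 'contig_1 11\n' > "$index_file"

lookup_offset() {
    # Print the byte offset recorded for contig $1; print nothing if absent.
    awk -v name="$1" '$1 == name { print $2 }' "$index_file"
}

extract_contig() {
    # Same tail | awk shape as the pipeline step, with a guard added in front.
    offset=$(lookup_offset "$1")
    if [ -z "$offset" ]; then
        echo "no index entry for $1" >&2
        return 1
    fi
    tail -c +"$offset" "$fasta" | awk '$0~/>/{exit}1'
}

# Known contig: tail starts at the sequence and awk stops at the next header.
extract_contig contig_1

# Missing contig, no guard: the offset is empty, so tail only sees "+"
# and rejects it as an invalid byte count.
tail -c +"$(lookup_offset contig_9)" "$fasta" 2>&1 | head -n 1
```

If this is the mechanism, each of the 2964 error lines would correspond to one lookup that returned nothing, so comparing the names in the index with the names in the .assembly file might show which entries are missing.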
But I can't figure out the root cause of the "invalid number of bytes" error. I have tried possible solutions from related posts in this forum and from the theaidenlab/3d-dna GitHub issues, but none of them resolved it.
However, I found that the temp index file (32.3 MB) created (and later removed) by construct-fasta-from-asm.sh seems larger than usual, and there are many hidden characters (^@, which is how less displays NUL bytes) before the last line of the index file (please see the attached screenshot of tmp.index viewed in less). I think this might give some clues about the "tail" issue.
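The ^@ characters shown by less are NUL bytes. A small sketch of how they can be counted in a file is below; the function name count_nul_bytes and the demo file are hypothetical, and I am only assuming that running the same check on draft.fa and on a saved copy of the temp index could show whether the NUL bytes are already in the input or only appear in the index.

```shell
count_nul_bytes() {
    # Delete every byte except NUL, then count what is left.
    # The trailing tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Hypothetical demo file standing in for tmp.index or draft.fa:
# two index-like lines with three NUL bytes in front of the last one.
demo=$(mktemp)
printf 'contig_1 11\n\000\000\000contig_2 21\n' > "$demo"

count_nul_bytes "$demo"
```

A count of 0 for draft.fa together with a non-zero count for the index would point at the indexing step rather than at the wtdbg2 output.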
By the way, the draft.fa file was generated by a colleague of mine using wtdbg2.
Here is my environment for running 3d-dna:
- lastz (version 1.04.00 released 20170312)
- java version "1.7.0_45"
- GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
- GNU Awk 5.1.0, API: 3.0
- GNU coreutils sort version 8.31
- Python 2.7.8
- GNU parallel 20200722
Attached are the log file, the rawchrom.assembly file, the Hi-C map (derived from rawchrom.hic and rawchrom.assembly), and a screenshot of the tmp.index file viewed in less; the tmp.index file itself was too large to upload.
This is my first time posting a question on the Google forum, so please bear with me if I didn't make myself clear, and let me know if I missed any essential information.
Thanks!
Regards,
Xingzheng Li