I am working on creating a custom repeat library. I want to use the ProtExcluder.pl script, found on the maker wiki at
to trim possible gene sequences out of the default RepeatModeler output for my genome. I'm getting some errors, and the output shows that no sequences were removed from my RepeatModeler library. I'm wondering if anyone has experience with this script and can help me understand the errors.
I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx output (default output format, BLAST 2.2.31+) like this:
ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr
- I get an output file repeat_fastanoProtFinal that contains exactly the same sequences as the input repeat_fasta.
- stderr has these errors:
Can't exec "binaries/esl-sfetch": No such file or directory at /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17.
Can not open the seqfile /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq
mergeunmatchedregion.pl seqfile
Illegal division by zero at /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122.
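One way to confirm the first point (that nothing was removed) is to compare sequence IDs between the input and output files. This is only a minimal sketch: `fasta_ids` is a helper name made up for illustration, and `repeat_fasta` is the placeholder file name from the command above.

```shell
# Hypothetical helper: list sorted sequence IDs (first word of each FASTA header).
fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort
}

# File names follow the command above; repeat_fasta is a placeholder input name.
if [ -f repeat_fasta ] && [ -f repeat_fastanoProtFinal ]; then
    fasta_ids repeat_fasta > input.ids
    fasta_ids repeat_fastanoProtFinal > output.ids
    # comm -23 prints IDs found only in the input, i.e. sequences that were removed.
    comm -23 input.ids output.ids
fi
```

Empty output from `comm` would mean every input ID is still present in the output file.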
ProtExcluder.pl created a bunch of files in the directory where it tries, unsuccessfully, to access the fnolow50seq file. That file does not exist, though there are files whose names end in fnolow50seqm, fnolow50seqmGC, and fnolow50seqmns.
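Since the first error names a relative path ("binaries/esl-sfetch"), and a relative path with a slash in it is resolved against the current working directory rather than searched on PATH, a quick check is whether that path exists from where I ran the command versus from the install directory. A minimal sketch, assuming the install directory shown in the errors above; `check_helper` is a name made up for illustration.

```shell
# Hypothetical helper: report whether a relative helper path resolves from a directory.
check_helper() {
    dir=$1
    rel=$2
    if [ -x "$dir/$rel" ]; then
        echo "found: $dir/$rel"
    else
        echo "missing: $dir/$rel"
    fi
}

# Install directory taken from the stderr messages above; adjust as needed.
PROTEX_DIR=/share/apps/genomics/ProtExcluder1.1

# Check from the current directory, then from the install directory.
check_helper "$PWD" binaries/esl-sfetch
check_helper "$PROTEX_DIR" binaries/esl-sfetch
```

If the binary is missing in both places, it may simply never have been built or installed there; if it is present only under the install directory, the working directory is the likely culprit.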
Any help would be appreciated! I could write a script to do this but would rather use an already debugged one to save time. Thanks!
Matt Simenc
Der Evolutionary Genomics Lab
California State University, Fullerton