Maria,
The okayset/ files from your tr2aacds run come from an older version of tr2aacds, which may be incompatible with tr2ncrna.
The current tr2aacds from evigene/20200520 *should* produce an okayset1st/ folder, then an okayset/ folder containing only the name.okay.{mrna,cds,aa} sequence files. The tr2ncrna program wants those, as well as the dropset/ folder of discarded coding sequences (which includes non-coding sequences). I can't tell whether you found a software bug or, more likely, a data processing problem, or something else. You may want to retry with the current tr2aacds from the 20200520 set (I should have released an update since then, but have been stuck on adding genome assembly "gnodes" calculations to Evigene).
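Before rerunning anything, it may help to check whether your data folder already has the layout described above. This is only a rough sketch: check_evg_layout is a hypothetical helper, not part of Evigene, and it assumes "name" is the prefix of your name.okay.* files and that non-empty files are what tr2ncrna needs.

```shell
#!/bin/bash
# Hypothetical helper (not part of Evigene): report whether a tr2aacds
# output folder has okayset/name.okay.{mrna,cds,aa} and a dropset/ folder.
# usage: check_evg_layout /path/to/data name
check_evg_layout() {
  local datad="$1" name="$2" missing=0 f
  for f in "okayset/$name.okay.mrna" "okayset/$name.okay.cds" "okayset/$name.okay.aa"; do
    # -s is true only for files that exist and are not empty
    if [ ! -s "$datad/$f" ]; then
      echo "MISSING file: $f"
      missing=$((missing + 1))
    fi
  done
  if [ ! -d "$datad/dropset" ]; then
    echo "MISSING folder: dropset/"
    missing=$((missing + 1))
  fi
  if [ "$missing" -eq 0 ]; then
    echo "OK: layout looks usable for tr2ncrna"
  fi
  # return status is the number of missing items (0 means all present)
  return "$missing"
}
```

For example, check_evg_layout "$datad" myspecies prints one MISSING line per absent item. Extra sequence files in okayset/ from an older tr2aacds version are not detected by this check.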
If you want worked examples of Evigene, including tr2ncrna, I suggest two sources:
a. the Sra2Genes example that produces a full run of Evigene, including assemblies, through to the publication set, as found here:
See the run_plant1kYYPE.txt text file for details; these plant species RNA samples are small and run quickly.
b. a recent worked example for a beetle with a large genome is here:
- Don Gilbert
Here is the run_tr2aacds cluster script from bbeetle20 you may adapt:
#! /bin/bash
# usage: env trset=myspecies_allinput.tr datad=path/to/data sbatch run_tr2aacds.sh
## --- gnodes_setup.sh for Slurm ---
#SBATCH --job-name="sra2genes_pipe"
#SBATCH --output="sra2genes_pipe.%j.log"
#SBATCH --partition=shared
#SBATCH --ntasks-per-node=14
#SBATCH --nodes=1
#SBATCH -t 23:55:00
#SBATCH --export=ALL
ncpu=14
maxmem=164000
# not opts in tr2aacds4 but in component apps
export idprefix=BemtraEVm
export ORGANISM=Bembidion_haplogonum
if [ -z "$datad" ]; then echo "Please set datad=/path/to/data"; exit 1; fi
if [ -z "$trset" ]; then echo "Please set trset=input.tr"; exit 1; fi
# path on the original cluster, kept for reference; set your own path on the next line
# evigenes=/oasis/projects/nsf/ind114/ux455375/chrs/evigenes/sra2genes_testdrive/bio/apps/evigene/scripts
export evigenes=YOUR_PATH/evigene/scripts
export PATH=YOUR_PATH/ncbi/bin:$PATH
export PATH=YOUR_PATH/exonerate/bin:$PATH
export PATH=YOUR_PATH/cdhit/bin:$PATH
evapp=$evigenes/prot/tr2aacds4.pl
# testing tr2aacds4 -reorient == DO_RESOLVESENSE for genes/trclass_resolve_strandmix.pl stg2 call
# DO_RESOLVESENSE option: reor_nomaybe=1 turns off ambiguous fwd/rev prots, returning to 1:1 prot/rna
export DO_RESOLVESENSE=1
traopts="-log"
addopt=""
if [ "X" != "X$addopt" ]; then traopts="$traopts $addopt"; fi
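# stop here if the data folder is not reachable, rather than running in the wrong place
cd "$datad" || { echo "Cannot cd to $datad"; exit 1; }
echo "#START $(date)"
echo $evapp -NCPU $ncpu -MAXMEM $maxmem $traopts -cdna "$trset"
$evapp -NCPU $ncpu -MAXMEM $maxmem $traopts -cdna "$trset"
echo "#DONE : $(date)"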
Here is the run_tr2ncrna cluster script from that bbeetle20 worked example, which you might adapt:
#! /bin/bash
# usage: env trset=inputset/name.tr mrna=okayset/name.okay.mrna datad=path/to/data sbatch run_evgtr2ncrna.sh
## --- gnodes_setup.sh for Slurm ---
#SBATCH --job-name="sra2genes_pipe"
#SBATCH --output="sra2genes_pipe.%j.log"
#SBATCH --partition=shared
#SBATCH --ntasks-per-node=14
#SBATCH --nodes=1
#SBATCH -t 23:55:00
#SBATCH --export=ALL
# reduce ncpu used, so as not to use up all memory
ncpu=10
maxmem=64000
if [ -z "$datad" ]; then echo "Please set datad=/path/to/data"; exit 1; fi
if [ -z "$mrna" ]; then echo "Please set mrna=okayset/name.okay.mrna"; exit 1; fi
if [ -z "$trset" ]; then echo "Please set trset=input.tr"; exit 1; fi
export evigenes=YOUR_PATH/evigene/scripts
export PATH=YOUR_PATH/ncbi/bin:$PATH
export PATH=YOUR_PATH/exonerate/bin:$PATH
# TEST_OKCDS still needs testing, but seems to help
export TEST_OKCDS=1
evgapp=$evigenes/genes/tr2ncrna.pl
evopts="-debug -log"
if [ "X" != "X$opts" ]; then evopts="$evopts $opts"; fi
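# stop here if the data folder is not reachable, rather than running in the wrong place
cd "$datad" || { echo "Cannot cd to $datad"; exit 1; }
echo "#START $(date)"
echo $evgapp $evopts -ncpu $ncpu -mrna "$mrna" -trset "$trset"
$evgapp $evopts -ncpu $ncpu -mrna "$mrna" -trset "$trset"
echo "#DONE $(date)"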
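Since tr2ncrna needs the okayset/ and dropset/ output of tr2aacds, the two scripts have to run in that order. One way to arrange this under Slurm might look like the sketch below. It only prints the two submission commands for review and does not submit anything. Assumptions that are mine, not from the worked example: evg_submit_plan is a hypothetical helper, the scripts are saved as run_tr2aacds.sh and run_evgtr2ncrna.sh, the okay mRNA file is named after the input .tr file, and paths contain no spaces. The sbatch options --parsable and --dependency=afterok are standard Slurm, and the environment variables reach the jobs because both scripts set --export=ALL.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the two sbatch commands,
# tr2aacds first, then tr2ncrna only if the first job ends successfully.
# usage: evg_submit_plan /path/to/data inputset/name.tr
evg_submit_plan() {
  local datad="$1" trset="$2" name
  # assumption: okayset files take the name of the input file minus its .tr suffix
  name=$(basename "$trset" .tr)
  # --parsable makes sbatch print just the job id, captured here as $jobid
  echo "jobid=\$(env datad=$datad trset=$trset sbatch --parsable run_tr2aacds.sh)"
  # afterok: start the second job only after the first exits with status 0
  echo "env datad=$datad trset=$trset mrna=okayset/$name.okay.mrna sbatch --dependency=afterok:\$jobid run_evgtr2ncrna.sh"
}
```

Running evg_submit_plan /path/to/data inputset/name.tr prints two lines that you can check, in particular the mrna= path, before pasting them into a shell on the cluster.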