How to run tr2ncrna.pl


María Torres Sánchez

Jun 23, 2021, 3:18:09 PM
to EvidentialGene
Dear Dr. Gilbert,

First, let me thank you for compiling Evigene!

I want to run the tr2ncrna.pl script from the newest version of Evigene (evigene/20200520) on the results of my tr2aacds.pl run.

I have some questions about the input files, since so far I have not been able to run the script successfully. From the usage information, I understand that I need an assembly file from the inputset folder with a .tr extension (myassembly.tr) and a file from the okayset folder with a .mrna extension (myassembly.okay.mrna).

tr2ncrna.pl -input myassembly.tr -mrna okayset/myassembly.okay.mrna

Nevertheless, in my output folders, I don't have files with these extensions. Please see below the files that the tr2aacds.pl script created.


[mtorressanchez@login2 inputset]$ ls
combined_assembly.aa  combined_assembly.aa.qual  combined_assembly.cds

[mtorressanchez@login2 okayset]$ ls
combined_assembly.okalt.aa  combined_assembly.okalt.cds  combined_assembly.okalt.tr  combined_assembly.okay.aa combined_assembly.okay.cds  combined_assembly.okay.tr

Any suggestions?
Best wishes,

Maria

Don Gilbert

Jun 23, 2021, 3:51:29 PM
to EvidentialGene
Maria,

The okayset/ files from your tr2aacds run look like they come from an older version of tr2aacds, which may be incompatible with tr2ncrna.
Current tr2aacds from evigene/20200520 *should* produce an okayset1st/ folder, then an okayset/ folder with only name.okay.{mrna,cds,aa} sequence files.  The tr2ncrna program wants those, as well as the dropset/ folder of discarded coding sequences (which include non-coding sequences).  I can't tell whether you found a software bug or, more likely, a data processing problem, or something else.  You may want to retry with the current tr2aacds from the 20200520 set (I should have released an update since then, but have been stuck on adding genome assembly "gnodes" calculations to Evigene).
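The layout described above can be checked before launching tr2ncrna. The following is only a minimal sketch: check_evg_layout is a hypothetical helper, not part of Evigene, and it assumes the data directory and file prefix are passed as arguments. It tests only for the files and folders named in this reply (okayset/name.okay.{mrna,cds,aa} and dropset/).

```shell
#!/bin/bash
# Hypothetical helper (not part of Evigene): report whether a tr2aacds
# output directory has the files and folders named in the reply above.
# Usage: check_evg_layout /path/to/data name
check_evg_layout() {
  local datad=$1 name=$2 missing=0 f
  # Sequence files expected in okayset/ (must exist and be non-empty)
  for f in "okayset/$name.okay.mrna" "okayset/$name.okay.cds" "okayset/$name.okay.aa"; do
    if [ ! -s "$datad/$f" ]; then
      echo "MISSING: $f"
      missing=1
    fi
  done
  # Folder of discarded sequences that tr2ncrna also reads
  if [ ! -d "$datad/dropset" ]; then
    echo "MISSING: dropset/"
    missing=1
  fi
  if [ "$missing" -eq 0 ]; then
    echo "OK: layout looks usable for tr2ncrna"
  fi
  return "$missing"
}
```

For the run shown in the question, something like `check_evg_layout /path/to/data combined_assembly` would list okayset/combined_assembly.okay.mrna as missing, which matches the symptom reported.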

If you want worked examples of Evigene including tr2ncrna, I suggest two sources:
  a. the Sra2Genes example that produces a full run of Evigene, including assemblies, through to the publication set, as found here:
     see the run_plant1kYYPE.txt text file for details; these plant species RNA samples are small and run quickly
  b. a recent worked example with a beetle with a large genome is here:
     with the data set and all Evigene files listed (tr2aacds and tr2ncrna) in bbeetle20sra2genes.tar.list

- Don Gilbert

Here is the run_tr2aacds cluster script from bbeetle20 you may adapt:
#! /bin/bash
# env trset=myspecies_allinput.tr datad=path/to/data qsub -q normal run_tr2aacds.sh
## --- gnodes_setup.sh for Slurm ---   
#SBATCH --job-name="sra2genes_pipe"
#SBATCH --output="sra2genes_pipe.%j.log"
#SBATCH --partition=shared
#SBATCH --ntasks-per-node=14
#SBATCH --nodes=1
#SBATCH -t 23:55:00
#SBATCH --export=ALL

ncpu=14
maxmem=164000

# not opts in tr2aacds4 but in component apps
export idprefix=BemtraEVm
export ORGANISM=Bembidion_haplogonum

if [ "X" = "X$datad" ]; then echo Please set datad=/path/to/data; exit 1; fi
if [ "X" = "X$trset" ]; then echo Please set trset=input.tr; exit 1; fi

# original cluster path, superseded by the export on the next line:
# evigenes=/oasis/projects/nsf/ind114/ux455375/chrs/evigenes/sra2genes_testdrive/bio/apps/evigene/scripts
export evigenes=YOUR_PATH/evigene/scripts
export PATH=YOUR_PATH/ncbi/bin:$PATH
export PATH=YOUR_PATH/exonerate/bin:$PATH
export PATH=YOUR_PATH/cdhit/bin:$PATH
evapp=$evigenes/prot/tr2aacds4.pl

# testing tr2aacds4 -reorient == DO_RESOLVESENSE for genes/trclass_resolve_strandmix.pl stg2 call
# DO_RESOLVESENSE option: reor_nomaybe=1  turns off ambiguous fwd/rev prots, returning to 1:1 prot/rna
export DO_RESOLVESENSE=1

traopts="-log"
addopt=""
if [ "X" != "X$addopt" ]; then traopts="$traopts $addopt"; fi

# stop if the data directory is missing, rather than running in the wrong place
cd "$datad/" || exit 1
echo "#START `date` "
echo $evapp -NCPU $ncpu -MAXMEM $maxmem $traopts -cdna $trset
$evapp -NCPU $ncpu -MAXMEM $maxmem $traopts -cdna $trset
echo "#DONE : `date`"


Here is the run_tr2ncrna cluster script from that bbeetle20 worked example, which you might adapt:
#! /bin/bash
# env trset=inputset/name.tr  mrna=okayset/name.okay.mrna datad=path/to/data  qsub -q normal run_evgtr2ncrna.sh
## --- gnodes_setup.sh for Slurm ---   
#SBATCH --job-name="sra2genes_pipe"
#SBATCH --output="sra2genes_pipe.%j.log"
#SBATCH --partition=shared
#SBATCH --ntasks-per-node=14
#SBATCH --nodes=1
#SBATCH -t 23:55:00
#SBATCH --export=ALL

# reduce ncpu used, dont use up all mem..
ncpu=10
maxmem=64000

if [ "X" = "X$datad" ]; then echo Please set datad=/path/to/data; exit 1; fi
if [ "X" = "X$mrna" ]; then echo Please set mrna=okayset/name.okay.mrna; exit 1; fi
if [ "X" = "X$trset" ]; then echo Please set trset=input.tr; exit 1; fi

export evigenes=YOUR_PATH/evigene/scripts
export PATH=YOUR_PATH/ncbi/bin:$PATH
export PATH=YOUR_PATH/exonerate/bin:$PATH

# TEST_OKCDS still needs tests, seems to help
export TEST_OKCDS=1
evgapp=$evigenes/genes/tr2ncrna.pl
evopts="-debug -log"
if [ "X" != "X$opts" ]; then evopts="$evopts $opts"; fi

# stop if the data directory is missing, rather than running in the wrong place
cd "$datad/" || exit 1
echo "#START `date` "
echo $evgapp $evopts -ncpu $ncpu  -mrna $mrna -trset $trset
$evgapp $evopts -ncpu $ncpu  -mrna $mrna -trset $trset
echo "#DONE `date` "
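The header comment of this script shows a qsub call, while its directives are #SBATCH lines for Slurm. On a Slurm cluster the same variables could be passed through the environment to sbatch, which hands the caller's environment to the job by default (the script also sets --export=ALL). The sketch below only prints such a command line. tr2ncrna_submit_cmd is a hypothetical helper, and /path/to/data and myspecies are placeholders; the file name run_evgtr2ncrna.sh is the one used in the script's own header comment. The trset and mrna paths are given relative to datad because the script changes into that directory before running.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) an sbatch command line that sets
# the variables the tr2ncrna cluster script above checks for.
# Usage: tr2ncrna_submit_cmd /path/to/data name
tr2ncrna_submit_cmd() {
  local datad=$1 name=$2
  echo "env datad=$datad trset=inputset/$name.tr mrna=okayset/$name.okay.mrna sbatch run_evgtr2ncrna.sh"
}

# Placeholder values; replace with your own data directory and file prefix
tr2ncrna_submit_cmd /path/to/data myspecies
```

Printing the command first makes it easy to confirm that the trset and mrna files exist under datad before anything is submitted.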


Don Gilbert

Jun 23, 2021, 10:29:18 PM
to EvidentialGene
Maria,

It may be that the tr2aacds.pl program from 2020.05 has a problem that I have since corrected; there is an updated version from 2020.09 here
tr2aacds.pl is now the same script as tr2aacds4.pl.

---------- Forwarded message ---------
From: María Torres Sánchez <torressan...@gmail.com>
Date: Wed, Jun 23, 2021 at 6:01 PM
Subject: Re: [evidentialgene] Re: How to run tr2ncrna.pl
To: Don Gilbert <gilbert...@gmail.com>


Dear Dr. Gilbert,

Thank you for your reply!

I ran the tr2aacds script with the latest update of Evigene (evigene/20200520). This is the version information of the script from my log file:

EvidentialGene tr2aacds.pl VERSION 2022.01.20


Should I run the tr2aacds4 script? 


Thank you again.


Best wishes,


Maria



 


--
don gilbert - www.bio.net - bioinformatics - indiana.u.