Error while running traacds4.2aa.pl script

Aaqib Javid

Dec 13, 2022, 8:33:32 AM

to EvidentialGene

Hi all,

Could you please help me understand this error?

Thank you.

evigene/scripts/prot/tr2aacds4_22a.pl -tidy -NCPU 12 -MAXMEM 50000 -logfile -MINAA=90 -debug -cdnaseq merged-trinity.all/Trinity-Rhodiola.tr

#t2ac: CMD= mv merged-trinity.all/Trinity-Rhodiolanrcd1x_db.log tmpfiles/Trinity-Rhodiolanrcd1x_db.log

#t2ac: CMD= mv merged-trinity.all/Trinity-Rhodiola.alntab tmpfiles/Trinity-Rhodiola.alntab

#t2ac: CMD= mv merged-trinity.all/Trinity-Rhodiola.adupfilt.log tmpfiles/Trinity-Rhodiola.adupfilt.log

#t2ac: tidyup erase: n=65, merged-trinity.all/Trinity-Rhodiola_split/Trinity-Rhodiola.tr.split1.fa merged-trinity.all/Trinity-Rhodiola_split/Trinity-Rhodiola.tr.split2.fa merged-trinity.all/Trinity-Rhodiola_split/Trinity-Rhodiola.tr.split3.fa merged-trinity.all/Trinity-Rhodiola_split/Trinity-Rhodiola.tr.split4.fa merged-trinity.all/Trinity-Rhodiola_split/Trinity-Rhodiola.tr.split5.fa ..

#t2ac: ERR: missing cdslbast tmpfiles/merged-trinity.all/Trinity-Rhodiolanrcd1x-self98.blastn

#t2ac: DONE at date= Tue Dec 13 14:14:01 CET 2022

Don Gilbert

Dec 13, 2022, 1:38:20 PM

to Aaqib Javid, EvidentialGene

Your problem is here: '-cdnaseq merged-trinity.all/Trinity-Rhodiola.tr'

Don't put the sub-directory merged-trinity.all/ in the input path; it messes up how tr2aacds finds its data and creates new data. Run the script from the directory that contains Trinity-Rhodiola.tr instead.

This should work:

evigene/scripts/prot/tr2aacds4_22a.pl -tidy -NCPU 12 -MAXMEM 50000 -logfile -MINAA=90 -debug -cdnaseq Trinity-Rhodiola.tr
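
One way to follow this advice without moving files is to change into the directory that holds the transcripts and pass only the file name. The wrapper below is a minimal sketch and not part of EvidentialGene: run_in_seq_dir is a hypothetical helper name, and it assumes the script path you pass is absolute, since a relative path such as evigene/scripts/... would no longer resolve after the cd.

```shell
# Hypothetical helper (not part of EvidentialGene): run a command from the
# directory that contains the transcript file, so that -cdnaseq receives a
# bare file name with no sub-directory in front of it.
run_in_seq_dir() {
  seq_path=$1
  shift
  seq_dir=$(dirname "$seq_path")
  seq_file=$(basename "$seq_path")
  # Subshell, so the caller's working directory is left unchanged.
  ( cd "$seq_dir" && "$@" -cdnaseq "$seq_file" )
}

# Example (assumes $evigene holds an absolute path to the EvidentialGene install):
# run_in_seq_dir merged-trinity.all/Trinity-Rhodiola.tr \
#   "$evigene/scripts/prot/tr2aacds4_22a.pl" -tidy -NCPU 12 -MAXMEM 50000 -logfile -MINAA=90 -debug
```

The output directories (okayset, tmpfiles and so on) should then be created next to the transcript file rather than relative to wherever the command was launched.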

--

don gilbert - www.bio.net - bioinformatics - indiana.u.

Aaqib Javid

Dec 13, 2022, 4:24:03 PM

to EvidentialGene

Hi,

Thank you for your reply. Yes, I tried that and it worked.

Sorry, I have one more question: I'm working with a non-model plant and want to annotate the okay.cds. Could you please explain the command lines below?

Note: I have local SwissProt and UniRef90 databases, but is there a better way to restrict annotation to a selected lineage, as eggNOG does?

ref protein blastp x evg okayset:
env aaset=okayset/$pt.aa refaa=refset/$refaa ncpu=20 datad=`pwd` prog=./run_evgaablast.sh sbatch srun_comet.sh
$evigene/scripts/prot/namegenes.pl -blast $aabltab -refnames refset/$refaa.names -out $pt.names

Thank you. Your efforts are worth more than appreciation!

Don Gilbert

Dec 20, 2022, 1:58:20 PM

to Aaqib Javid, EvidentialGene

The SRA2Genes pipeline in EvidentialGene includes protein annotation via blast to reference proteins, and also a step that finds BUSCO conserved proteins. It is a minimal but usable annotation.

You need to provide the reference protein set. I suggest between 2 and 5 species with some claim to reference status for your organism. For a plant, the model Arabidopsis is very useful, plus the 1-3 species nearest to yours that you think are "well-annotated", which means accurate protein names as well as accurate proteins.
Public databases with good reference proteins are UniProt, which has the most valuable annotations per gene. Also NCBI, EBI, DDBJ have good reference protein sets. For plants, Phytozome is good. These all offer taxonomy searches so you can locate nearest species.

Here are some documents and help for sra2genes,
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/
run_plant1kYYPE.txt = worked example from The 1000 Plant RNA samples

SRA2Genes is partly described in, and was used for, this paper:
Genes of the pig, Sus scrofa, reconstructed with EvidentialGene, doi: 10.7717/peerj.6374
See also https://sourceforge.net/p/evidentialgene/blog/2020/03/reference-protein-annotation-with-namegenes/

EvidentialGene uses reference protein homologies of your transcript assembly both for annotation and for selecting a best gene set. The basic method for this is in the SRA2Genes omni-pipeline, as STEP8_refblastgenes. You can use evigene sra2genes to make a Unix script for the step 8 blastp:
a. You need an evg tr2aacds data set, matching your "runname":
runname.trclass and okayset/runname.okay.aa

b. You need reference proteins, located in the files
refset/refgenes.aa and refset/refgenes.aa.names
aa.names is a simple table: ID <tab> protein name, like this
AT1G01010.1 NAC domain containing protein 1
AT1G01020.1 ARV1 family protein

c. need ncbi program blastp and makeblastdb on unix PATH
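
Before generating the step 8 script, it can save time to confirm that the inputs in a-c are in place. The function below is a minimal sketch and not part of EvidentialGene: check_step8_inputs is a hypothetical name, it takes the file locations from the list above, and it assumes you run it from the directory that holds okayset/ and refset/.

```shell
# Hypothetical pre-flight check (not part of EvidentialGene): report which of
# the inputs listed in a-c are missing. Returns non-zero if anything is wrong.
check_step8_inputs() {
  runname=$1
  status=0
  # a. tr2aacds outputs, b. reference proteins and their names table
  for f in "$runname.trclass" "okayset/$runname.okay.aa" \
           refset/refgenes.aa refset/refgenes.aa.names; do
    if [ ! -s "$f" ]; then
      echo "missing or empty file: $f"
      status=1
    fi
  done
  # c. NCBI BLAST+ programs on PATH
  for prog in blastp makeblastdb; do
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "not on PATH: $prog"
      status=1
    fi
  done
  # Every non-empty names line should be: ID <tab> protein name
  if [ -s refset/refgenes.aa.names ]; then
    bad=$(awk -F'\t' 'length($0) > 0 && NF < 2 { n++ } END { print n + 0 }' \
          refset/refgenes.aa.names)
    if [ "$bad" -gt 0 ]; then
      echo "refset/refgenes.aa.names: $bad line(s) without a tab-separated name"
      status=1
    fi
  fi
  return "$status"
}

# Example: check_step8_inputs cacao19crncbi_mrna && echo "inputs look ready"
```

A names table whose columns are separated by spaces rather than a tab is an easy mistake to make, which is why the last check counts lines that lack a tab.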

The following command will make a Unix cluster shell script, run_s08_evgblastp.sh:

$evigene/scripts/evgpipe_sra2genes4v.pl -runstep start8 -runname cacao19crncbi_mrna -species cacao -SRAids SRR00000 -ncpu 2

This produces a script to be run on your computer(s): run_s08_evgblastp.cacao19crncbi_mrna.sh
The cacao19crncbi_mrna.names result table from namegenes.pl has reference names and blast scores for okayset proteins, like this:

Mod3plEVm000001t1 Midasin-like protein 100%,5400/5400,5400 RefID:AT1G67120.2
Mod3plEVm000001t2 Midasin-like protein 100%,5393/5393,5393 RefID:AT1G67120.1
Mod3plEVm000002t1 Auxin transport protein (BIG) 100%,5098/5098,5098 RefID:AT3G02260.1
Mod3plEVm000002t2 Auxin transport protein (BIG) 100%,5087/5087,5087 RefID:AT3G02260.4
Mod3plEVm000003t1 Zinc finger%2C C3HC4 type (RING finger) family protein 100%,4706/4706,4706 RefID:AT5G23110.1
Mod3plEVm000004t1 Pleckstrin (PH) domain-containing protein 100%,4219/4219,4219 RefID:AT4G17140.3
Mod3plEVm000004t2 Pleckstrin (PH) domain-containing protein 100%,4218/4218,4218 RefID:AT4G17140.2
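
A table like this is easy to post-process with standard Unix tools. The function below is a minimal sketch and not part of EvidentialGene: filter_names_by_pct is a hypothetical name, and it assumes the table has four tab-separated columns (transcript ID, protein name, score field, RefID) as the rows above suggest. It keeps rows whose leading percentage in the score field meets a threshold.

```shell
# Hypothetical filter (not part of EvidentialGene): keep rows of a namegenes
# table whose leading percentage in column 3 is at least min_pct, and print
# transcript ID, reference ID and protein name, tab-separated.
# Assumes four tab-separated columns per row.
filter_names_by_pct() {
  names_file=$1
  min_pct=${2:-90}   # default threshold is an arbitrary illustration
  awk -F'\t' -v min="$min_pct" '
    NF >= 4 {
      pct = $3
      sub(/%.*/, "", pct)          # "100%,5400/5400,5400" -> "100"
      refid = $4
      sub(/^RefID:/, "", refid)    # "RefID:AT1G67120.2" -> "AT1G67120.2"
      if (pct + 0 >= min + 0) print $1 "\t" refid "\t" $2
    }' "$names_file"
}

# Example: filter_names_by_pct cacao19crncbi_mrna.names 80 > strong_hits.tsv
```

If your table turns out to use a different column layout, adjust the field numbers in the awk program accordingly.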

Main steps in run_s08_evgblastp are:
blastp $blopt -outfmt 7 -db $dbaa -query $qfile -out $onam.blastp
$evigenes/makeblastscore3.pl $mbaopts $aablast > $aabltab # $aabltab is a table of blast hits
$evigenes/prot/namegenes.pl $ngopt -blast $aabltab -refnames $refaa.names -out $nameout
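
To see how these steps fit together for one run name, here is a dry-run sketch that only prints the commands instead of running them. It is not the generated script: print_step8_cmds and the intermediate file names are hypothetical, the makeblastdb line is my assumption about how the blast database in $dbaa gets built, and $blopt, $mbaopts and $ngopt are printed as literal placeholders because their values come from the generated run_s08_evgblastp script.

```shell
# Dry-run sketch (hypothetical, not the generated EvidentialGene script):
# print the step 8 commands for one run name without executing anything.
print_step8_cmds() {
  runname=$1
  refaa=refset/refgenes.aa              # reference proteins, as in item b
  qfile=okayset/$runname.okay.aa        # okayset proteins, as in item a
  aablast=$runname-refgenes.blastp      # illustrative output name
  aabltab=$runname-refgenes.btall       # illustrative output name
  nameout=$runname.names
  # Assumed step: build a protein blast database from the reference set.
  printf '%s\n' "makeblastdb -in $refaa -dbtype prot"
  # The three main steps; option variables are left as literal placeholders.
  printf '%s\n' "blastp \$blopt -outfmt 7 -db $refaa -query $qfile -out $aablast"
  printf '%s\n' "\$evigenes/makeblastscore3.pl \$mbaopts $aablast > $aabltab"
  printf '%s\n' "\$evigenes/prot/namegenes.pl \$ngopt -blast $aabltab -refnames $refaa.names -out $nameout"
}

# Example: print_step8_cmds cacao19crncbi_mrna
```

Comparing this printout with the generated script is a quick way to check which files each step reads and writes before submitting it to a cluster.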
