The SRA2Genes pipeline in EvidentialGene includes protein annotation by BLAST against reference proteins, plus a step that finds BUSCO conserved proteins. The result is a minimal but usable annotation.
You need to provide the reference protein set. I suggest proteins from 2 to 5 species with some claim to reference status for your organism. For a plant, the model Arabidopsis is very useful, plus the 1-3 species nearest to yours that you think are "well-annotated", which means accurate protein names as well as accurate proteins.
Good public sources of reference proteins include UniProt, which has the most valuable annotations per gene, and NCBI, EBI and DDBJ, which also have good reference protein sets. For plants, Phytozome is good. These all offer taxonomy searches, so you can locate the nearest species.
Here are some documents and help for sra2genes:
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/ run_plant1kYYPE.txt = worked example from The 1000 Plant RNA samples
SRA2Genes is partly described in, and was used for, this paper:
Genes of the pig, Sus scrofa, reconstructed with EvidentialGene, doi: 10.7717/peerj.6374
See also
https://sourceforge.net/p/evidentialgene/blog/2020/03/reference-protein-annotation-with-namegenes/
EvidentialGene uses reference protein homologies of your transcript assembly both for annotations and for selecting a best gene set. The basic method for this is in the SRA2Genes omni-pipeline, as STEP8_refblastgenes. You can use evigene sra2genes to make a unix script for the step 8 blastp:
a. need evg tr2aacds data set, matching your "runname" :
runname.trclass and okayset/runname.okay.aa
b. need reference proteins, located in file
refset/refgenes.aa and refset/refgenes.aa.names
aa.names is a simple two-column table: ID <tab> protein name, like this
AT1G01010.1 NAC domain containing protein 1
AT1G01020.1 ARV1 family protein
c. need NCBI programs blastp and makeblastdb on unix PATH
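The three requirements above can be checked before generating the step 8 script. This is only a minimal sketch: check_step8_inputs is a hypothetical helper, not part of EvidentialGene, and it assumes you run it from the directory that holds runname.trclass, okayset/ and refset/.

```shell
#!/bin/sh
# Sketch of a pre-flight check for the step 8 inputs (hypothetical helper,
# not an EvidentialGene script). Run from the directory holding okayset/ and refset/.
# Usage: check_step8_inputs RUNNAME
# Prints one line per problem; returns non-zero if anything is missing.
check_step8_inputs() {
  runname="$1"
  missing=0
  # (a) tr2aacds outputs and (b) the reference protein set
  for f in "$runname.trclass" "okayset/$runname.okay.aa" \
           refset/refgenes.aa refset/refgenes.aa.names; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      missing=1
    fi
  done
  # (b) every line of aa.names should be ID <tab> protein name
  if [ -s refset/refgenes.aa.names ]; then
    if ! awk -F'\t' 'NF < 2 || $1 == "" { bad = 1 } END { exit bad + 0 }' \
         refset/refgenes.aa.names; then
      echo "refset/refgenes.aa.names: expected ID <tab> name on every line"
      missing=1
    fi
  fi
  # (c) NCBI BLAST+ programs on PATH
  for prog in blastp makeblastdb; do
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "not on PATH: $prog"
      missing=1
    fi
  done
  return "$missing"
}
```

For the example run used here, the call would be: check_step8_inputs cacao19crncbi_mrna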
This will make a unix cluster shell script, run_s08_evgblastp.sh:
$evigene/scripts/evgpipe_sra2genes4v.pl -runstep start8 -runname cacao19crncbi_mrna -species cacao -SRAids SRR00000 -ncpu 2
It produces this script, to be run on your computer(s):
run_s08_evgblastp.cacao19crncbi_mrna.sh
The cacao19crncbi_mrna.names result table from namegenes.pl has reference names and blast scores for okayset proteins, like this:
Mod3plEVm000001t1 Midasin-like protein 100%,5400/5400,5400 RefID:AT1G67120.2
Mod3plEVm000001t2 Midasin-like protein 100%,5393/5393,5393 RefID:AT1G67120.1
Mod3plEVm000002t1 Auxin transport protein (BIG) 100%,5098/5098,5098 RefID:AT3G02260.1
Mod3plEVm000002t2 Auxin transport protein (BIG) 100%,5087/5087,5087 RefID:AT3G02260.4
Mod3plEVm000003t1 Zinc finger%2C C3HC4 type (RING finger) family protein 100%,4706/4706,4706 RefID:AT5G23110.1
Mod3plEVm000004t1 Pleckstrin (PH) domain-containing protein 100%,4219/4219,4219 RefID:AT4G17140.3
Mod3plEVm000004t2 Pleckstrin (PH) domain-containing protein 100%,4218/4218,4218 RefID:AT4G17140.2
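A names table like the one above can be scanned with awk to list proteins whose reference match is weak. This is a rough sketch under two assumptions: the columns are tab-separated, and the score column starts with a percent value as in the rows shown. names_below_pct is a hypothetical helper, not part of EvidentialGene.

```shell
#!/bin/sh
# Sketch: list rows of a namegenes .names table whose leading percent score
# is below a cutoff. Assumes 4 tab-separated columns:
#   protein ID, reference name, score string ("100%,5400/5400,5400"), RefID:xxx
# Usage: names_below_pct FILE MINPCT
# Output: protein ID <tab> percent <tab> reference ID
names_below_pct() {
  awk -F'\t' -v min="$2" '
    NF >= 4 {
      pct = $3; sub(/%.*/, "", pct)        # "100%,5400/5400,5400" -> "100"
      ref = $4; sub(/^RefID:/, "", ref)    # "RefID:AT1G67120.2" -> "AT1G67120.2"
      if (pct + 0 < min + 0) print $1 "\t" pct "\t" ref
    }' "$1"
}
```

For example, names_below_pct cacao19crncbi_mrna.names 50 would print the okayset proteins with a score below 50%. No output means every named protein is at or above the cutoff.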
Main steps in run_s08_evgblastp are:
blastp $blopt -outfmt 7 -db $dbaa -query $qfile -out $onam.blastp
$evigenes/makeblastscore3.pl $mbaopts $aablast > $aabltab # $aabltab is a table of blast hits
$evigenes/prot/namegenes.pl $ngopt -blast $aabltab -refnames $refaa.names -out $nameout
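Before the scoring and naming steps, the raw blastp output can be sanity-checked by hand. The sketch below is not what makeblastscore3.pl does; it only picks the highest-bitscore hit per query, assuming the default 12 tab-separated columns that blastp writes with -outfmt 7 (query, subject, % identity, ..., evalue, bit score). best_blast_hits is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: top hit per query from a blastp -outfmt 7 table (default 12 columns).
# Not a replacement for makeblastscore3.pl, only a quick look at the raw hits.
# Usage: best_blast_hits FILE
# Output: query <tab> subject <tab> percent identity <tab> bit score, sorted by query
best_blast_hits() {
  awk -F'\t' '
    /^#/ || NF < 12 { next }               # skip the comment lines -outfmt 7 adds
    !($1 in best) || $12 + 0 > best[$1] + 0 {
      best[$1] = $12                        # column 12 is the bit score
      line[$1] = $1 "\t" $2 "\t" $3 "\t" $12
    }
    END { for (q in line) print line[q] }' "$1" | sort
}
```

Queries that never appear in the output had no hit to the reference set, so they will also get no reference name.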