Hello,
I am currently working on comparing de novo transcriptomes from three species, based on orthologous genes. To do so, after assembly, I clustered the sequences with cd-hit and then ran the EvidentialGene software to filter them.
The BUSCO results are very good at each step (assembly, cd-hit, EvidentialGene), except for one of the transcriptomes, which loses about 60% of its sequences and shows only 54% completeness on BUSCO after the EvidentialGene step.
I have tested several configurations with different parameters, such as:
MINAA=30, MINAA=20, MINAA=15
pHeterozygosity=1
I have also tried other parameters, but I still cannot obtain a good completeness score for this particular transcriptome.
I was wondering if there is anything I can adjust or any specific approach I should try to resolve this issue. I would greatly appreciate any advice or suggestions that could help me improve the completeness of this transcriptome after the EvidentialGene step.
Thank you in advance for your help.
Juliette
Hello Don Gilbert,
After testing Evigene on the raw assembly, I obtained 70% BUSCO gene completeness. To better understand this significant loss compared with BUSCO on the raw assembly (97%), I ran BUSCO on the dropped Evigene classes that show the most loss. BUSCO found 42% completeness for the perfect-duplicate class (perfdupl in the log). I am hesitant to add these sequences to those in the okayset results, because the dropped perfect-duplicate sequences still represent 15% of the total sequences. I'm really afraid of losing biological information.
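For reference, here is a minimal Python sketch of how the dropped sequences of one class could be pulled out for a separate BUSCO run. It assumes that the first three tab-separated columns of Trinity.trclass are the transcript ID, the okay/drop action, and the class name; that layout should be checked against the actual file. The function names and the demo records are hypothetical. In real use, the inputs would be Trinity.trclass and dropset/Trinity.drop.aa as named in the log.

```python
def ids_in_class(trclass_lines, wanted_class, action="drop"):
    """Collect IDs whose action and class match (assumed 3-column prefix)."""
    ids = set()
    for line in trclass_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines that do not fit the assumed layout
        seq_id, act, cls = fields[0], fields[1], fields[2]
        if act == action and cls == wanted_class:
            ids.add(seq_id)
    return ids


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            seq_id = line[1:].split()[0]
            keep = seq_id in keep_ids
        if keep:
            yield line


# Made-up demo data standing in for Trinity.trclass and Trinity.drop.aa.
demo_trclass = [
    "tr1\tokay\tmain\ttr1\n",
    "tr2\tdrop\tperfdupl\ttr1\n",
    "tr3\tdrop\tparthi\ttr1\n",
]
demo_fasta = [">tr1 len=3\n", "MKV\n", ">tr2 len=3\n", "MKV\n", ">tr3\n", "MK\n"]

dup_ids = ids_in_class(demo_trclass, "perfdupl")
dup_records = list(filter_fasta(demo_fasta, dup_ids))
print("".join(dup_records), end="")
```

The resulting FASTA can then be given to BUSCO on its own, or concatenated with the okayset files to see how much completeness the class restores.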
Thank you again for your help,
Have a nice day,
Juliette
On Wed, Apr 9, 2025 at 10:23, berger juliette <bergerju...@gmail.com> wrote:
Dear Don Gilbert,
I hope you're doing well. I am writing to share the results I obtained after performing an analysis with Evigene on the raw assembly of Cryptocercus. I used the following command:
perl /home/juliette/softwares/evigene/evigene/scripts/prot/tr2aacds.pl -mrna Trinity.fasta -NCPU=2 -MAXMEM=5000 -MINAA=30 -logfile evigene_test1.log
Results obtained:
BUSCO score:
After running cd-hit + Evigene, I obtained a 54% completeness score on busco.
After running Evigene on the raw assembly, the score slightly increased to 70%.
However, this score remains relatively low compared to the 97% completeness I obtained from the raw assembly.
Comparison with other species: When comparing with the two other species I assembled (Salganea and Tryonicus), here are the BUSCO scores after raw assembly + Evigene:
S. incerta: 96% completeness
Tryonicus sp.: 96% completeness
In contrast, for Cryptocercus, despite metrics indicating a superior assembly quality (see below), the BUSCO score after Evigene remains relatively low (70%).
Assembly metrics:
Specimen                                   C. punctulatus   S. incerta   Tryonicus sp.
Number of transcripts                      599,525          732,825      873,967
GC content (%)                             39.44            36.08        36.90
N50                                        1,012            817          955
Ex90N50 (bp)                               1,545            1,516        1,296
Backmapping rate (%)                       > 97             > 97         > 97
Full-length transcripts (> 80% coverage)   5,369            4,491        4,657
BUSCO completeness (%)                     97               96           96
Issue:
Despite the overall better metrics for Cryptocercus, the BUSCO score after Evigene is significantly lower than for the other species. I am therefore perplexed as to the cause of this discrepancy, and I would like to know if you have any insights or suggestions on additional steps I could take to improve this score.
I am also attaching the complete Evigene log file for Cryptocercus so you can examine it in more detail.
Thank you in advance for your help. I look forward to your suggestions.
Best regards,
Juliette
#t2ac: EvidentialGene tr2aacds.pl VERSION 2022.04.05
#t2ac: CMD: tr2aacds.pl -mrna Trinity.fasta -NCPU=2 -MAXMEM=5000 -MINAA=30 -logfile evigene_test1.log
#t2ac: app=blastn, path=/usr/bin/blastn
#t2ac: app=makeblastdb, path=/usr/bin/makeblastdb
#t2ac: app=fastanrdb, path=/usr/bin/fastanrdb
#t2ac: app=cd-hit-est, path=/home/juliette/cd-hit-v4.8.1-2019-0228/cd-hit-est
#t2ac: app=cd-hit, path=/home/juliette/cd-hit-v4.8.1-2019-0228/cd-hit
#t2ac: evigeneapp=cdna_bestorf.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../cdna_bestorf.pl
#t2ac: evigeneapp=prot/traa2cds.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../prot/traa2cds.pl
#t2ac: evigeneapp=prot/aaqual.sh, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../prot/aaqual.sh
#t2ac: evigeneapp=rnaseq/asmrna_dupfilter4.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../rnaseq/asmrna_dupfilter4.pl
#t2ac: evigeneapp=rnaseq/asmrna_altreclass4.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../rnaseq/asmrna_altreclass4.pl
#t2ac: evigeneapp=makeblastscore.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../makeblastscore.pl
#t2ac: evigeneapp=prot/cdsqual.sh, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../prot/cdsqual.sh
#t2ac: evigeneapp=genes/blasttrset2exons2.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../genes/blasttrset2exons2.pl
#t2ac: evigeneapp=genes/trclass2pubset.pl, path=/home/juliette/softwares/evigene/evigene/scripts/prot/../genes/trclass2pubset.pl
#t2ac: BEGIN with cdnaseq= Trinity.fasta date= Wed Apr 9 07:57:28 CEST 2025
#t2ac: bestorf_cds== Trinity.cds nrec= 607922
#t2ac: isStrandedRNA=0, f:494/r:506
#t2ac: nonredundant_cds== Trinitynr.cds nrec= 550587
#t2ac: nonredundant_reassignbest= 0 of 0
#t2ac: add_consensus_idset n=0 from .//Trinity_clustered.consensus
#t2ac: cds -extend2utr 900,60 bp in Trinitynrxu.cds
#t2ac: nofragments_cds== Trinitynrxucd1.cds nrec= 543803
#t2ac: blastn_cds= Trinitynrcd1x-self98.blastn
#t2ac: skip step4.1 aadup clustering
#t2ac: CMD= /home/juliette/softwares/evigene/evigene/scripts/prot/../rnaseq/asmrna_dupfilter4.pl -aasize Trinity.aa.qual -CDSALIGN -blastab Trinitynrcd1x-self98.blastn -aconsensus Trinity.consensus -pCDSOK=20 -pCDSBAD=20 -ALTFRAG=0.5 -outeqtab Trinity.alntab -outclass Trinity.trclass >Trinity.adupfilt.log 2>&1
#t2ac: asmdupfilter_cds= Trinity.trclass
# Class Table for Trinity.trclass
class %okay %drop okay drop
althi 2.8 6.1 10602 23090
althi1 3 2.4 11208 9242
althinc 3.1 0 11786 0
altmfrag 0.2 1.7 759 6666
altmid 0.7 5.1 2760 19359
main 4.4 1.4 16403 5338
mainnc 5.5 0 20581 0
noclass 7.1 20.9 26588 78211
noclassnc 11.9 0 44517 0
parthi 0 4.5 0 16884
parthi1 0 1.2 0 4587
perfdupl 0 15.3 0 57335
perffrag 0 1.8 0 6807
smallorf 0 0 0 0
---------------------------------------------
total 38.9 61 145204 227519
=============================================
# AA-quality for okay set of Trinity.aa.qual (no okalt): all and longest 1000 summary
okay.top n=1000; average=1602; median=1430; min,max=1094,5317; nfull=922; sum=1602884; gaps=0,0
okay.all n=108089; average=121; median=68; min,max=20,5317; nfull=83113; sum=13129864; gaps=0,0
#t2ac: asmdupfilter_fileset= Trinity.okay.tr Trinity.okalt.tr Trinity.drop.tr Trinity.okay.aa Trinity.okalt.aa Trinity.drop.aa Trinity.okay.cds Trinity.okalt.cds Trinity.drop.cds
#t2ac: tidyup output folders: okayset dropset inputset tmpfiles
#t2ac: CMD= mv Trinity.okay.tr okayset/Trinity.okay.tr
#t2ac: CMD= mv Trinity.okalt.tr okayset/Trinity.okalt.tr
#t2ac: CMD= mv Trinity.okay.aa okayset/Trinity.okay.aa
#t2ac: CMD= mv Trinity.okalt.aa okayset/Trinity.okalt.aa
#t2ac: CMD= mv Trinity.okay.cds okayset/Trinity.okay.cds
#t2ac: CMD= mv Trinity.okalt.cds okayset/Trinity.okalt.cds
#t2ac: CMD= mv Trinity.drop.tr dropset/Trinity.drop.tr
#t2ac: CMD= mv Trinity.drop.aa dropset/Trinity.drop.aa
#t2ac: CMD= mv Trinity.drop.cds dropset/Trinity.drop.cds
#t2ac: CMD= mv Trinity.cds inputset/Trinity.cds
#t2ac: CMD= mv Trinity.aa inputset/Trinity.aa
#t2ac: CMD= mv Trinity.aa.qual inputset/Trinity.aa.qual
#t2ac: CMD= mv Trinitynr.cds tmpfiles/Trinitynr.cds
#t2ac: CMD= mv Trinitynr.aa tmpfiles/Trinitynr.aa
#t2ac: CMD= mv Trinitynrxu.cds tmpfiles/Trinitynrxu.cds
#t2ac: CMD= mv Trinitynrxucd1.cds tmpfiles/Trinitynrxucd1.cds
#t2ac: CMD= mv Trinitynrxucd1.cds.clstr tmpfiles/Trinitynrxucd1.cds.clstr
#t2ac: CMD= mv Trinitynrxucd1.log tmpfiles/Trinitynrxucd1.log
#t2ac: CMD= mv Trinitynrcd1x.cds tmpfiles/Trinitynrcd1x.cds
#t2ac: CMD= mv Trinitynrcd1x-self98.blastn tmpfiles/Trinitynrcd1x-self98.blastn
#t2ac: CMD= mv Trinitynrcd1x_db.log tmpfiles/Trinitynrcd1x_db.log
#t2ac: CMD= mv Trinity.alntab tmpfiles/Trinity.alntab
#t2ac: CMD= mv Trinity.adupfilt.log tmpfiles/Trinity.adupfilt.log
#t2ac: CMD= env outcds=1 /home/juliette/softwares/evigene/evigene/scripts/prot/../prot/cdsqual.sh tmpfiles/Trinitynrcd1x.cds
#t2ac: CMD= /home/juliette/softwares/evigene/evigene/scripts/prot/../makeblastscore.pl -pIDENTMIN 99.999 -pmin 0.01 -CDSSPAN -showspan=2 -tall -sizes tmpfiles/Trinitynrcd1x.cds.qual tmpfiles/Trinitynrcd1x-self98.blastn > tmpfiles/Trinitynrcd1x-self100.btall
#t2ac: CMD= /home/juliette/softwares/evigene/evigene/scripts/prot/../genes/trclass2pubset.pl -onlypub -norealt -noaltdrops -log -debug -class Trinity.trclass
#t2ac: CMD= sort -k7,7nr -k2,2 -k6,6nr -k1,1 tmpfiles/Trinitynrcd1x-self100.btall | env pubids=publicset/Trinity.pubids debug=1 /home/juliette/softwares/evigene/evigene/scripts/prot/../genes/blasttrset2exons2.pl > tmpfiles/Trinitynrcd1x.exontab
#t2ac: CMD= /home/juliette/softwares/evigene/evigene/scripts/prot/../genes/trclass2pubset.pl -noaltdrops -exontab tmpfiles/Trinitynrcd1x.exontab -log -debug -class Trinity.trclass
#t2ac: tidy okayset => okayset1st, stage2 reduction => okayset
#t2ac: DONE at date= Wed Apr 9 08:03:15 CEST 2025
#t2ac: ======================================
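To see which classes account for most of the dropped sequences, the class table from the log above can be tallied with a short Python sketch. The counts are copied from the log; the function name `parse_class_table` is hypothetical, and the percentages are recomputed from the counts, so they may differ from the log's own columns in the last digit.

```python
CLASS_TABLE = """\
althi 2.8 6.1 10602 23090
althi1 3 2.4 11208 9242
althinc 3.1 0 11786 0
altmfrag 0.2 1.7 759 6666
altmid 0.7 5.1 2760 19359
main 4.4 1.4 16403 5338
mainnc 5.5 0 20581 0
noclass 7.1 20.9 26588 78211
noclassnc 11.9 0 44517 0
parthi 0 4.5 0 16884
parthi1 0 1.2 0 4587
perfdupl 0 15.3 0 57335
perffrag 0 1.8 0 6807
smallorf 0 0 0 0
"""


def parse_class_table(text):
    """Return {class: (okay_count, drop_count)} from 5-column rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # ignore anything that is not a data row
        rows[parts[0]] = (int(parts[3]), int(parts[4]))
    return rows


rows = parse_class_table(CLASS_TABLE)
total_okay = sum(okay for okay, _ in rows.values())
total_drop = sum(drop for _, drop in rows.values())
total = total_okay + total_drop

# Rank classes by how many sequences they drop.
by_drop = sorted(rows.items(), key=lambda item: item[1][1], reverse=True)
for name, (okay, drop) in by_drop[:3]:
    print(f"{name}: {drop} dropped ({100 * drop / total:.1f}% of all classified)")
```

In this run the three largest drop classes are noclass, perfdupl, and althi, so those are the natural candidates for a per-class BUSCO check.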