Re: [evidentialgene] Is it possible to search CDS just in the + strand using tr2aacds.pl?

Don Gilbert

Jan 5, 2023, 3:22:10 PM
to Salvador Gonzalez Juarez, EvidentialGene

Use this version, which now correctly detects stranded RNA and will soon become the default:
evigene22may07/scripts/prot/tr2aacds4_22a.pl  This is in the current evigene.tar bundle, or also here
tr2aacds.pl -strandedrna   : auto|yes|no;
default auto == maybe, tests cds-orient, reliable but must recalc cds if stranded
If you know that your RNA data is, or is not, stranded, add -stranded yes (or no). If you don't know, the program will detect it (all CDS are in the same direction), but that needs an extra compute step.
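
For readers who want to check CDS orientation themselves before choosing yes or no, here is a minimal Python sketch. It only tallies the strand= field of CDS FASTA headers in the format quoted in this thread. It is not how tr2aacds.pl does its own test; the function names and the 0.95 cutoff are assumptions made up for illustration.

```python
import re
from collections import Counter

# Matches the "strand=+;" or "strand=-;" field of a .cds FASTA header.
STRAND_RE = re.compile(r"\bstrand=([+-]);")

def strand_counts(lines):
    """Count CDS orientations over the FASTA header lines in an iterable of lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            match = STRAND_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

def majority_fraction(counts):
    """Fraction of CDS that share the most common orientation (0.0 if none found)."""
    total = counts["+"] + counts["-"]
    return max(counts["+"], counts["-"]) / total if total else 0.0

def looks_stranded(counts, min_fraction=0.95):
    """Hypothetical rule of thumb: nearly all CDS point the same way."""
    return majority_fraction(counts) >= min_fraction

# The two headers quoted in this thread, used as a tiny demonstration input.
headers = [
    ">AMEX70DD.1.1 type=CDS; aalen=314,28%,partial5-utrbad; clen=3268; strand=+; offs=1-945; codepot=Code/0.0059;",
    ">AMEX70DD.10.1 type=CDS; aalen=116,52%,complete-utrpoor; clen=670; strand=-; offs=611-261; codepot=Code/0.0047;",
]
counts = strand_counts(headers)
print(dict(counts), majority_fraction(counts), looks_stranded(counts))
```

To run it on a real file, pass an open file handle instead of the list, for example the inputset/AmexT_v50_stringtie-merge.cds file mentioned in the question below.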

On Thu, Jan 5, 2023 at 4:48 AM Salvador Gonzalez Juarez <salvado...@gmail.com> wrote:
Dear Dr. Gilbert,

First, let me thank you very much for making EvidentialGene.

I am using tr2aacds.pl (v22may07) for collapsing long-read transcriptome assemblies from different samples in a non-model organism. 

$ tr2aacds.pl -NCPU=${task.cpus} -MAXMEM=${mem_MB} -MINAA=${params.minaaLength} -logfile -debug -species=Amex -noutrorf -cdnaseq AmexT_${params.transcriptomeVersion}_stringtie-merge.fa

At earlier steps of my assembly pipeline I determine the strandedness of each read, aiming to get a stranded transcriptome. By analyzing the output of tr2aacds.pl, I noticed that each FASTA header of the file "inputset/AmexT_v50_stringtie-merge.cds" indicates the strand in which the CDS was found:

>AMEX70DD.1.1 type=CDS; aalen=314,28%,partial5-utrbad; clen=3268; strand=+; offs=1-945; codepot=Code/0.0059;

>AMEX70DD.10.1 type=CDS; aalen=116,52%,complete-utrpoor; clen=670; strand=-; offs=611-261; codepot=Code/0.0047;

While this option could allow me to find ORFs in transcripts whose orientation was incorrectly determined, I would like to ask if there is a way to inhibit the search for ORFs in the negative strand of transcripts.

Kind regards,

Salvador

--
don gilbert - www.bio.net - bioinformatics - indiana.u.

Don Gilbert

Jul 19, 2023, 8:46:03 PM
to EvidentialGene
The current release of the EvidentialGene tr2aacds.pl pipeline for RNA assembly now auto-detects stranded RNA (including long-read RNA). The version "tr2aacds4_22a.pl" is replaced by tr2aacds4.pl, and tr2aacds.pl is a symbolic link to it.

Find updated script code at
as evigene23jul15.tar

Please note that in addition to evigene/scripts/prot/tr2aacds.pl for coding-sequence gene assemblies, there is
evigene/scripts/genes/tr2ncrna.pl, a separate pipeline for non-coding gene sets, which is run after tr2aacds to recover the non-coding RNA genes of your transcript assembly.

The evigene/scripts/evgpipe_sra2genes4v.pl pipeline script includes both steps above, coding and non-coding, along with other basic steps in assembling RNA fragments into complete, validated gene sets. evgpipe_sra2genes can be run for just the steps that you want, skipping others, as it writes scripts that call on the components of evigene as needed. It warrants updating, but only after I finish the tr2dupgene portion described below...

There are other updates, notably to the Gnodes pipeline for DNA measures of genome assemblies. This includes an unfinished portion that will determine duplicated genes in transcript assemblies, using DNA evidence along with the RNA assembly.
This classification of duplicated genes, based on DNA evidence, is still a difficult problem in genome informatics, one for which I don't think there is a suitable solution as yet. I hope that this update, when ready, will provide one.

evigene/scripts/genes/tr2dupgeneclass1a.pl
opts: -cds tr2inputs.cds -pubids tr2okset.pubids
-genex tr2okset1cds_dnareads_gnodes.genexcopy
[-refids reference_genes.idtab ]
-outname tr2okset_dupgenes1a

-refids reference_genes.idtab option to compare refgenes (trid locusid) w/ dupgeneclass table, eg w/ NCBI gene refsets built on chrasm,

Requires Evigene data, from SRA2Genes or tr2aacds, with inputset, okayset, dropset transcripts + tables
Requires Gnodes genecds.genexcopy results of DNA depth for gene cds, eg genoasm/gnodes_pipe.pl -cds okayset/main.cds -reads xxx.fastq
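
As a rough illustration of how the options listed above fit together, the following Python sketch assembles a command line from them without executing anything. The helper name dupgene_command and the choice to invoke the script through perl are assumptions; the option names, script path and example file names are the ones given in this post, and the script itself is described as unfinished, so treat this as a sketch only.

```python
import shlex

def dupgene_command(cds, pubids, genex, outname, refids=None,
                    script="evigene/scripts/genes/tr2dupgeneclass1a.pl"):
    """Build an argument list from the options listed in the post (nothing is run)."""
    # Assumption: the script is launched with "perl"; adjust to your install.
    cmd = ["perl", script, "-cds", cds, "-pubids", pubids, "-genex", genex]
    if refids is not None:
        # -refids is shown in brackets in the post, so it is treated as optional.
        cmd += ["-refids", refids]
    cmd += ["-outname", outname]
    return cmd

cmd = dupgene_command(
    cds="tr2inputs.cds",
    pubids="tr2okset.pubids",
    genex="tr2okset1cds_dnareads_gnodes.genexcopy",
    outname="tr2okset_dupgenes1a",
)
print(shlex.join(cmd))
```

Passing refids="reference_genes.idtab" adds the optional reference-gene comparison to the same command.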
