Errors using evgmrna2tsa.pl


Feng Tao

Jan 28, 2020, 5:48:02 PM
to EvidentialGene
Dear Don (and other readers),

I am currently constructing a Phragmites transcriptome. Given my data constraints, I would like to construct as good a transcriptome as I can, and I have followed your approach to this.

So far, I have used four assemblers: Trinity, Trans-ABySS, Shannon and SOAPdenovo. I combined their assemblies and reformatted them with the trformat.pl script.

Then I followed the instructions to reduce the combined assembly with tr2aacds.pl, using this command:
perl evigene/scripts/prot/tr2aacds.pl -mrnaseq Phragmites_RNA/all_assemlies/Jan_20_no_cdhit/all_reformat.tr -NCPU=8 -MAXMEM 200000 1>Phragmites_RNA/all_assemlies/Jan_20_no_cdhit/tr2aacds.log 2>Phragmites_RNA/all_assemlies/Jan_20_no_cdhit/tr2aacds.err

I obtained the folders all_reformatnrcd1_blsplit, all_reformat_split, dropset and okset, and the files all_reformatnrcd1_db.perf, all_reformat.trclass and all_reformat.trclass.sum.txt.

After this, I tried to run the evgmrna2tsa.pl script as:
perl evigene17dec14/scripts/evgmrna2tsa.pl -onlypubset -idprefix PaustralisEVm -class Phragmites_RNA/all_assemlies/Jan_20_cdhit/all_reformat.trclass 

However, it fails with these errors:
#m2t: EvidentialGene mrna2tsa.pl VERSION 2018.06.18
#m2t: FATAL Missing -mrna .

So my question is: what did I do wrong, and can you help me solve it?

I am still a newbie in this area, although I have checked the documents in this discussion group and some on the EvidentialGene website. My final goal is to construct the transcriptome, annotate it, and compare gene expression across different tissues. Do you have any suggestions on what I should keep in mind in the following analyses?

Thanks for your help! Looking forward to your reply.

Feng

Don Gilbert

Jan 28, 2020, 9:12:39 PM
to Feng Tao, EvidentialGene
Feng,

There may have been an error in the tr2aacds results, or evgmrna2tsa may be failing to find the files that tr2aacds left behind.
Your log files may report whether tr2aacds had errors finishing.
Try changing to the directory with the file all_reformat.trclass, which should have a subdirectory okayset/.
Check that okayset/ contains large sequence files, likely named all_reformat.okay.aa, .cds and .tr, and all_reformat.okalt.aa, .cds and .tr.
Then retry evgmrna2tsa.pl with these steps:

cd Phragmites_RNA/all_assemlies/Jan_20_no_cdhit/
ls -l all_reformat.trclass
ls -l okayset/
# do files have data? okayset/all_reformat.okay.aa  okayset/all_reformat.okalt.aa  okayset/all_reformat.okay.tr ..
# assuming this is path to your evigene folder: $HOME/evigene17dec14/scripts/
# run this command, the -debug -log options add information to log file
perl $HOME/evigene17dec14/scripts/evgmrna2tsa.pl -debug -log -onlypubset -idprefix PaustralisEVm -class all_reformat.trclass
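The "# do files have data?" check above can be scripted. This is a minimal sketch, a hypothetical bash function that is not part of EvidentialGene. It assumes the okayset file names described above (PREFIX.okay.aa/.cds/.tr and PREFIX.okalt.aa/.cds/.tr, plain or .gz); the tr2aacds sample layout in this thread names the transcript files .cdna instead, so adjust the extension list if yours differ.

```shell
# Hypothetical helper (not part of EvidentialGene): check_okayset PREFIX [DIR]
# Prints one line per expected file and returns 0 only if DIR/PREFIX.trclass
# and all okayset/PREFIX.{okay,okalt}.{aa,cds,tr} files exist and are
# non-empty (a gzipped .gz copy also counts).
check_okayset() {
  local prefix="$1" dir="${2:-.}" missing=0 kind ext f
  if [ ! -s "$dir/$prefix.trclass" ]; then
    echo "MISSING $dir/$prefix.trclass"
    missing=1
  fi
  for kind in okay okalt; do
    for ext in aa cds tr; do
      f="$dir/okayset/$prefix.$kind.$ext"
      if [ -s "$f" ] || [ -s "$f.gz" ]; then
        echo "ok      $f"
      else
        echo "MISSING $f (or empty)"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, from the Jan_20_no_cdhit folder: check_okayset all_reformat && perl $HOME/evigene17dec14/scripts/evgmrna2tsa.pl -debug -log -onlypubset -idprefix PaustralisEVm -class all_reformat.trclass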

If it works now, there will be a new publicset/ folder containing reformatted sequences and tables of gene locus IDs and information.

- Don Gilbert


--
You received this message because you are subscribed to the Google Groups "EvidentialGene" group.
To unsubscribe from this group and stop receiving emails from it, send an email to evidentialgen...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/evidentialgene/bfe59f24-a6a9-4f9d-9704-97acfb0c04b5%40googlegroups.com.


--
don gilbert - www.bio.net - bioinformatics - indiana.u.

Feng Tao

Jan 30, 2020, 11:34:04 AM
to EvidentialGene
Thanks for the kind reply, Don.

My tr2aacds log file looks like this:
# Class Table for Phragmites_RNA/all_assemlies/cdhit/all_reformat.trclass
class           %okay   %drop   okay    drop
althi           4.6     6.2     161192  216335
althi1          4       10.1    139198  352626
althia2         0       0.38    0       13367
altmfrag        0.6     0.7     21745   27818
altmfraga2      0.07    0.06    2686    2175
altmid          0.7     0.9     24519   33336
altmida2        0.07    0.04    2594    1620
main            2.8     4.5     98697   156976
maina2          0.17    0.12    6091    4260
noclass         2.4     18.6    85406   649485
noclassa2       0.01    0.08    535     3109
parthi          0       7.9     0       276427
parthi1         0       1.7     0       62438
parthia2        0       0.31    0       11077
perfdupl        0       15.4    0       537639
perffrag        0       16.8    0       586512
---------------------------------------------
total           15.6    84.3    542663  2935200
=============================================
# AA-quality for okay set of Phragmites_RNA/all_assemlies/cdhit/all_reformat.aa.qual (no okalt): all and longest 1000 summary
okay.top         n=1000; average=1252; median=1090; min,max=890,5268; nfull=894; sum=1252583; gaps=31,0
okay.all         n=190729; average=113; median=72; min,max=30,5268; nfull=130791; sum=21702940; gaps=4955,0
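To read the class table above, it can help to total the okay and drop columns and split the okay rows into primary (main, noclass) and alternate (alt*) classes. That split is an assumption, but it is consistent with this log: main + maina2 + noclass + noclassa2 = 98697 + 6091 + 85406 + 535 = 190729, the n= of the okay.all line, which excludes okalt. A minimal awk sketch wrapped in a hypothetical shell function:

```shell
# Hypothetical helper: summarize_trclass_table
# Reads a tr2aacds class table (as printed in the log above) on stdin.
# Sums the okay and drop count columns, and splits okay counts into
# primary (main*, noclass*) and alternate (alt*) classes.
summarize_trclass_table() {
  awk '
    # Data rows have 5 fields with integer counts in fields 4 and 5;
    # the header, the total row and separator lines are skipped.
    NF == 5 && $1 != "class" && $1 != "total" && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
      okay += $4
      drop += $5
      if ($1 ~ /^(main|noclass)/) primary += $4
      else if ($1 ~ /^alt/) alt += $4
    }
    END {
      printf "okay=%d drop=%d input=%d\n", okay, drop, okay + drop
      printf "okay_primary=%d okay_alt=%d\n", primary, alt
      if (okay + drop > 0) printf "pct_okay=%.1f\n", 100 * okay / (okay + drop)
    }'
}
```

Fed the table above on stdin (for example with summarize_trclass_table < tr2aacds.log, assuming no other five-column numeric lines in the log), it should print okay=542663 drop=2935200 input=3477863, okay_primary=190729 okay_alt=351934 and pct_okay=15.6, matching the total row.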

The error file is: 
#t2ac: EvidentialGene tr2aacds.pl VERSION 2018.06.18
#t2ac: CMD: tr2aacds.pl  -mrnaseq Phragmites_RNA/all_assemlies/cdhit/all_reformat.tr -NCPU=8 -MAXMEM 2000

I do have all_reformat.trclass and okayset/. I reran evgmrna2tsa, and it now outputs the files marnalt.tab, mrna2tsa.info, pubids, pubids.old and pubids.realt.log.

The err file contains:
#m2t: EvidentialGene mrna2tsa.pl VERSION 2018.06.18
#m2t: Warn: make_annotab missing publicset/all_reformat.mrna_pub.aa
#m2t: ERR: openRead publicset/all_reformat.mrna_pub.aa
#m2t: ERR: openRead publicset/all_reformat.mrna_pub.cds

In the end I should have the *.mrna_pub.fa output, right? I do not have it yet.

Thanks for your help! 

Don Gilbert

Jan 30, 2020, 4:21:13 PM
to Feng Tao, EvidentialGene
Feng,

I think the problem you found is that the output files from tr2aacds.pl were not moved into the subdirectories okayset/, dropset/, inputset/ and tmpfiles/.
This is what the -tidy option of tr2aacds.pl does, and evgmrna2tsa.pl expects the sequence files in those subdirectories. You can see this layout
in this evigene_tr2aacds_test sample:

 arath_TAIR10_20101214up.cdna.gz    : input transcript set

 arath_TAIR10_20101214up.tr2aacds.log   : from -log option
 arath_TAIR10_20101214up.trclass    : tr2aacds output class table

 dropset/   : redundant, dropped transcripts  
 inputset/  : aa and cds sequence of input
 okayset/   : non-redundant sequences
   arath_TAIR10_20101214up.okalt.aa.gz    
   arath_TAIR10_20101214up.okalt.cdna.gz  
   arath_TAIR10_20101214up.okalt.cds.gz  
   arath_TAIR10_20101214up.okay.aa.gz    
   arath_TAIR10_20101214up.okay.cdna.gz  
   arath_TAIR10_20101214up.okay.cds.gz    
 tmpfiles/  : work data files

evgmrna2tsa.pl requires these files: arathup.trclass, and the okayset/arathup.okay* and arathup.okalt* sequence files
(they don't need to be gzipped .gz files).

If you move your tr2aacds output sequences into an okayset/ subdirectory, as above, then this command may work:

  $evigene/scripts/evgmrna2tsa.pl -onlypubset -idprefix ArathEGr -class arath_TAIR10_20101214up.trclass
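A manual stand-in for the okayset/ part of that layout might look like the sketch below: a hypothetical bash function (not an EvidentialGene script) that moves loose PREFIX.okay.* and PREFIX.okalt.* files into okayset/, skipping any file already present there. It only handles okayset/; Don names the tr2aacds.pl -tidy option as what produces the full layout.

```shell
# Hypothetical helper: move_okay_seqs PREFIX [DIR]
# Moves loose DIR/PREFIX.okay.* and DIR/PREFIX.okalt.* files into DIR/okayset/,
# skipping any file whose name already exists there. Nothing is deleted.
move_okay_seqs() {
  local prefix="$1" dir="${2:-.}" f name moved=0
  mkdir -p "$dir/okayset"
  for f in "$dir/$prefix".okay.* "$dir/$prefix".okalt.*; do
    [ -e "$f" ] || continue            # pattern matched no files
    name=$(basename "$f")
    if [ -e "$dir/okayset/$name" ]; then
      echo "skip: okayset/$name already exists"
      continue
    fi
    mv "$f" "$dir/okayset/" && moved=$((moved + 1))
  done
  echo "moved $moved file(s) into $dir/okayset/"
}
```

For example: move_okay_seqs all_reformat Phragmites_RNA/all_assemlies/Jan_20_no_cdhit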

I suggest you move the other data files in the directory with all_reformat.trclass, including the results of prior attempts with evgmrna2tsa.pl, into another, temporary directory.
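One way to do that without deleting anything is sketched below: a hypothetical bash function that keeps PREFIX.trclass and the okayset/, dropset/, inputset/ and tmpfiles/ subdirectories in place and moves every other entry (for example publicset/ and the pubids files from earlier runs, and the input .tr) into a holding directory you choose. Which entries to keep is an assumption based on the layout and file requirements described in this message.

```shell
# Hypothetical helper: set_aside_others PREFIX HOLD_DIR [DIR]
# Keeps DIR/PREFIX.trclass and the okayset/, dropset/, inputset/ and tmpfiles/
# subdirectories, and moves every other (non-hidden) entry of DIR into HOLD_DIR.
# Entries whose name already exists in HOLD_DIR are skipped; nothing is deleted.
set_aside_others() {
  local prefix="$1" hold="$2" dir="${3:-.}" f name
  mkdir -p "$hold"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue            # DIR is empty
    [ "$f" -ef "$hold" ] && continue   # HOLD_DIR may sit inside DIR
    name=$(basename "$f")
    case "$name" in
      "$prefix.trclass"|okayset|dropset|inputset|tmpfiles) continue ;;
    esac
    if [ -e "$hold/$name" ]; then
      echo "skip: $hold/$name already exists"
      continue
    fi
    mv "$f" "$hold/"
  done
}
```

For example, with a hypothetical holding folder: set_aside_others all_reformat ../Jan_20_old_attempts .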

 - Don Gilbert




Feng Tao

Feb 19, 2020, 3:43:25 PM
to EvidentialGene
Hi, Don. Thanks for your help; I finally obtained the results from evgmrna2tsa.pl. So the products should be aa, aa_pub.fa, cds, cds_pub.fa, mrna, mrna_pub.fa and ann.txt, right? For the downstream analyses and transcriptome assessment, I should use mrna_pub.fa.

I also checked the number of transcripts in the reconstructed transcriptome: are 594317 transcripts too many for a plant?
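One way to put a transcript count in context is to count how many distinct gene loci those transcripts represent, since alternate transcripts of one locus are counted separately. A minimal sketch, assuming EvidentialGene public IDs of the form idprefix + number + t + alternate number (for example PaustralisEVm000001t1 and PaustralisEVm000001t2 for two transcripts of one locus); the function name is hypothetical.

```shell
# Hypothetical helper: count_pub_loci FASTA
# Counts FASTA records and distinct gene loci, assuming IDs of the form
# <idprefix><number>t<altnumber>, where everything before the final
# "t<digits>" names the locus.
count_pub_loci() {
  awk '
    /^>/ {
      id = substr($1, 2)           # first header word without the ">"
      n++
      locus = id
      sub(/t[0-9]+$/, "", locus)   # strip the trailing t<altnumber>
      if (!(locus in seen)) { seen[locus] = 1; loci++ }
    }
    END { printf "transcripts=%d loci=%d\n", n, loci }
  ' "$1"
}
```

For example, assuming the file is named like those in the earlier error messages: count_pub_loci publicset/all_reformat.mrna_pub.fa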