Dear Don,
After a couple of years' hiatus I'm jumping back into transcriptomics temporarily. I've successfully run 'tr2aacds.pl' on my "over-assembly", resulting in the expected 'okayset', 'inputset', 'dropset' etc.
In the past, my next step was to clean up the names of the okayset files using your script:
perl $evigene/evgmrna2tsa.pl -onlypubset -idprefix My_species -class evigene_formatted_assembly.trclass
When I went to do this using the most recent version of Evigene, I got a deprecation notice. I was informed that:
However, as far as I can tell, tr2aacds.pl (a symlink to tr2aacds4.pl) did not call the subprogram trclass2pubset.pl when I ran it: there were no "publicset" files. Instead, I next ran trclass2pubset.pl manually, but I'm a bit unsure of what to provide for its options (specifically those I've bolded):
$evigene/genes/trclass2pubset.pl
EvidentialGene trclass2pubset -trclass myspp.trclass [-idprefix EVGm ]
makes tables of public ids and main-alt linkage, from results of tr2aacds
opts: -idprefix Thecc1EG
  -mrna myspp.mrna -names mrna.names -keepdrop keep_drop_ids.table -preserveOldIds=old.pubids
  -nosizesort -[no]pubsortseq -debug
version 2020.03.15
Not knowing the correct options, I ran it from within the directory where I first ran tr2aacds.pl, as:
$evigene/genes/trclass2pubset.pl -trclass evigene_formatted_assembly.trclass -idprefix Rtaylori
In the resulting publicset/ directory there are the nicely renamed transcriptome files, as desired. However, the number of sequences does not appear to match the number in the corresponding okayset/ files, as I had hoped it would. For example:
$ grep -c ">" publicset/evigene_formatted_assembly.cds_pub.fa
149053
$ grep -c ">" okayset/evigene_formatted_assembly.okay.cds
96931
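To compare every file in both directories rather than just the CDS pair above, a minimal sketch like the following could tally the FASTA headers per file. The function names count_seqs and tally_dirs are hypothetical helpers of my own, not part of Evigene, and the sketch assumes it is run from the directory containing okayset/ and publicset/. Non-FASTA files in those directories will simply report 0.

```shell
# Count FASTA records (lines beginning with ">") in a single file.
count_seqs() {
    grep -c '^>' "$1"
}

# Print "count<TAB>path" for every regular file in each given directory.
# Directories that do not exist are skipped silently.
tally_dirs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            printf '%s\t%s\n' "$(count_seqs "$f")" "$f"
        done
    done
}

# Assumed directory names, taken from the tr2aacds.pl output described above.
tally_dirs okayset publicset
```

Anchoring the pattern with '^>' counts only header lines, so a stray ">" inside a description line is not counted as a separate record.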
Questions:
1. Am I running these correctly?
2. Why doesn't the number of sequences match?
Thanks,
Charles