James,
Thank you very much for helping resolve this. Some mistake caused two people to use the wrong invocation "-trset dropset/XXX.drop.tr",
which may be my error in the documents. Do you recall your reason for using it?
-- Don Gilbert
The documents for tr2ncrna need improvement; a worked example would help. Here are some notes:
* Needs more explanation in tr2ncrna_about.txt
-- Only calls long ncRNA (>= 300 nt by default). Use other tools to find short ncRNA (e.g. tRNA, snoRNA, microRNA); maybe recommend some.
-- The final ncRNA sequence set is in ncrnaset/Name.ncrna_pub.fa. If you wish, it can be added to the final mRNA set, okayset/Name.mrna.fa, to make a single RNA sequence set. The two sets have consistent but distinct gene IDs and alternate form numbers (ncRNA IDs start after the mRNA IDs). The ncRNA headers have 'type=ncRNA;' versus 'type=mRNA;' in the mrna.fa, along with other annotations; 'oid=AT1G72645.1' is your original transcript ID.
-- This is an approximate classification of long ncRNA; it will include some coding transcripts with long UTRs. It should also contain a full, non-redundant set of the long ncRNA genes in your transcript set. Using this on the mRNA set of the model weed (AT) calls about 1000 as ncRNA, or 2200 as you found using their full RNA gene set. These are mostly short-coding, long-UTR transcripts; those models may include joined coding + ncRNA genes, or may be long-UTR coding genes, which are hard to distinguish.
-- Consensus, or agreement across transcript assembler methods, is a valuable piece of evidence for ncRNA calls. Many users rely on one assembler, e.g. Trinity, but even if that is *best* (I disagree), use of 3 or more RNA assemblers provides a consensus measure that works, as Adam Voshall pointed out to me and others.
-- Uses step-wise subtraction of mRNA, too-short transcripts, and likely artifacts
-- The tr2ncrna.log file lists the steps with counts of retained transcripts;
the *.trids files list the original IDs kept at each step:
S1 9354 at18tair_mrna.nomrna.trids : not in okayset/at18tair_mrna.okay.mrna
S2 3487 at18tair_mrna.nodropbigcds.trids : remove drops with large CDS
S3 1658 at18tair_mrna.notokmrna.trids : no aligns with mrna (putative ncrna)
S4 1247 at18tair_mrna.longnok.trids : long enough to avoid artifacts
S5 1067 at18tair_mrna.ncrna_pub.trids : final calls, after drop likely duplicate ncrna models
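
As a rough illustration of the step-wise subtraction, the counts in the step list above can be turned into a tally of how many transcripts each step removes. This is only a sketch: parse_steps and removed_per_step are hypothetical helpers, not part of tr2ncrna, and they assume lines of the form "S<n> <count> <file> : <description>" as shown here. The input total of 48147 is taken from the kept=N/48147 values in the log.

```python
import re

# Step summary lines copied from the list above.
STEP_LINES = """\
S1 9354 at18tair_mrna.nomrna.trids : not in okayset/at18tair_mrna.okay.mrna
S2 3487 at18tair_mrna.nodropbigcds.trids : remove drops with large CDS
S3 1658 at18tair_mrna.notokmrna.trids : no aligns with mrna (putative ncrna)
S4 1247 at18tair_mrna.longnok.trids : long enough to avoid artifacts
S5 1067 at18tair_mrna.ncrna_pub.trids : final calls, after drop likely duplicate ncrna models
"""

# Assumed line layout: step label, retained count, file name, " : ", description.
STEP_RE = re.compile(r"^(S\d+)\s+(\d+)\s+(\S+)\s+:\s+(.*)$")


def parse_steps(text):
    """Return a list of (step, retained_count, file_name, description)."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            step, count, fname, note = m.groups()
            steps.append((step, int(count), fname, note))
    return steps


def removed_per_step(steps, n_input):
    """Return (step, kept, removed) where removed is relative to the prior step."""
    out = []
    prev = n_input
    for step, count, _fname, _note in steps:
        out.append((step, count, prev - count))
        prev = count
    return out


if __name__ == "__main__":
    # 48147 is the total input transcript count reported in the log.
    for step, kept, removed in removed_per_step(parse_steps(STEP_LINES), 48147):
        print(f"{step}: kept {kept}, removed {removed}")
```
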
at18tair_mrna.tr2ncrna.log
#ncrna: BEGIN with input= at18tair_mrna.fa date= Wed Nov 9 15:29:17 EST 2022
#ncrna: tr2ncrna( at18tair_mrna.fa, okayset/at18tair_mrna.okay.mrna)
S1 #ncrna: remove_mrna_oids kept=9354/48147, at18tair_mrna.nomrna.tr
S2 #ncrna: remove_bigcdsdrops kept=3505/48147, at18tair_mrna.nodropbigcds.tr
S3 #ncrna: remove_mrna_aligned kept=1658/48147, at18tair_mrna.notokmrna.tr
S4 #ncrna: long_seqs(>=300) kept=1247/48147, at18tair_mrna.longnok.tr
S5 #ncrna: ncrna_selfalign count=1247/48147, at18tair_mrna.longnok.self97.blastn
S5 #ncrna: altparclass loci=1039 for 1247 tr, in at18tair_mrna.longnok.dgclass.tab
S5 #ncrna: nagree=0 in at18tair_mrna.nodropbigcds.tr.consensus ** No multiple-assembler consensus here
S5 #ncrna: make_allevdtab nevd=1240/1247 in at18tair_mrna.longnok.allevd.tab
S5 #ncrna: ncrna_classgenes ngene=1032,ok=1067,drop=173/1247 in at18tair_mrna.longnok.allevd.ncrna_class
S5 #ncrna: ncrna_pubseqset kept=1067/48147, in at18tair_mrna.ncrna_pub.fa
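
Returning to the note on combining the ncRNA and mRNA sets: below is a minimal sketch of how the two FASTA files might be concatenated into one RNA set while tallying the type= tags. The function names and the output file name are hypothetical, "Name" stands for your own data set prefix, and the code assumes only that the headers carry 'type=ncRNA;' or 'type=mRNA;' as described above. It is not part of tr2ncrna.

```python
import re
from collections import Counter

# Assumption: headers carry a tag such as 'type=mRNA;' or 'type=ncRNA;'.
TYPE_RE = re.compile(r"\btype=(\w+)")


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif header is not None:
                seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def combine_rna_sets(mrna_fa, ncrna_fa, out_fa):
    """Write mRNA then ncRNA records to out_fa; return record counts per type tag."""
    counts = Counter()
    seen_ids = set()
    with open(out_fa, "w") as out:
        for path in (mrna_fa, ncrna_fa):
            for header, seq in read_fasta(path):
                fields = header.split()
                seq_id = fields[0] if fields else header
                # The two sets should have distinct IDs; stop if they do not.
                if seq_id in seen_ids:
                    raise ValueError(f"duplicate ID across sets: {seq_id}")
                seen_ids.add(seq_id)
                m = TYPE_RE.search(header)
                counts[m.group(1) if m else "untyped"] += 1
                # Each sequence is written on one line (no line wrapping).
                out.write(f">{header}\n{seq}\n")
    return counts


if __name__ == "__main__":
    import os

    # Replace "Name" with your data set prefix; Name.allrna.fa is a made-up output name.
    mrna, ncrna = "okayset/Name.mrna.fa", "ncrnaset/Name.ncrna_pub.fa"
    if os.path.exists(mrna) and os.path.exists(ncrna):
        print(combine_rna_sets(mrna, ncrna, "Name.allrna.fa"))
```

The duplicate-ID check is there because the combined set relies on the ncRNA IDs not colliding with the mRNA IDs; if it raises, the two inputs probably came from different runs.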