OK, so if superTranscripts are not good for annotation, what should I do with my Corset output? fetchClusterSeqs.py is meant for working with a subset or clusters of interest. But I am not working with a subset. What I want to do is take my assembled transcripts from Trinity, remove the redundancy by clustering the transcripts into genes (I see people use CD-HIT for this too, but I'd like to use Corset), and then use these clusters ("genes") for all downstream RNA-seq applications such as ORF prediction, annotation, quantification, mapping, DESeq, GO and KEGG enrichment, etc.
I just don't understand what I should do if building superTranscripts with Lace is not good for functional annotation. Specifically, I don't understand what to do with clusters that contain more than one transcript: how do I use those for the downstream applications mentioned above? Are they all Corset "genes", or do I somehow have to filter among them? For example, should I take the longest transcript in each cluster as its representative?
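To make the "longest transcript" idea concrete, this is roughly what I have in mind. It is only a minimal Python sketch, assuming the Corset clusters file has two tab-separated columns (transcript ID, then cluster ID) and that the transcript IDs match the first word of the Trinity FASTA headers. The function names and the toy data are made up for illustration, and I don't know whether this is the recommended approach.

```python
def read_fasta_lengths(lines):
    """Return {transcript_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-separated word of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def longest_per_cluster(cluster_lines, lengths):
    """Return {cluster_id: transcript_id}, keeping the longest transcript per cluster.

    Assumption: each line is "<transcript_id>\t<cluster_id>".
    """
    best = {}
    for line in cluster_lines:
        line = line.strip()
        if not line:
            continue
        transcript, cluster = line.split("\t")
        if transcript not in lengths:
            continue  # transcript missing from the FASTA, skip it
        if cluster not in best or lengths[transcript] > lengths[best[cluster]]:
            best[cluster] = transcript
    return best


# Toy data (hypothetical IDs): three transcripts falling into two clusters.
fasta = [
    ">TRINITY_DN1_c0_g1_i1 len=8", "ACGTACGT",
    ">TRINITY_DN1_c0_g1_i2 len=12", "ACGTACGTACGT",
    ">TRINITY_DN2_c0_g1_i1 len=4", "ACGT",
]
clusters = [
    "TRINITY_DN1_c0_g1_i1\tCluster-0.0",
    "TRINITY_DN1_c0_g1_i2\tCluster-0.0",
    "TRINITY_DN2_c0_g1_i1\tCluster-1.0",
]

representatives = longest_per_cluster(clusters, read_fasta_lengths(fasta))
for cluster, transcript in sorted(representatives.items()):
    print(cluster, transcript)
```

With real data I would pass open file handles instead of the toy lists. Under this scheme the number of "genes" would simply be the number of distinct cluster IDs, with one representative transcript each.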
To be even more specific: in one study there were 138,752 Trinity transcripts, and using Corset the authors arrived at a final set of 72,826 genes, which was used for all further analysis. I don't understand how to arrive at that (lower) number if I have multiple transcripts assigned to the same cluster ID. I hope I explained this well, and I apologise if it's a silly question :)
Best,
Lada