I'm hoping someone can answer a question or two of mine.
I've just been using Corset to cluster transcripts from both Trinity and rnaSPAdes assemblies. To do so, I first ran Salmon v0.13 without the --validateMappings flag. I then ran Corset with the -m flag set to 0 in order to preserve all transcripts. However, Corset still appears to be filtering out transcripts in some fashion, as the "corset-clusters.txt" file contains fewer lines (transcripts) than the input transcriptomes and the Salmon quant files. I realise lowly expressed transcripts might not be useful in many situations, but I'm looking for orthologues between species, and a gene that is expressed at a low level in one species might be interesting if it is expressed at a high level in another species.
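To see which transcripts are missing, rather than just comparing line counts, something along these lines might help. It is only a minimal sketch. It assumes the Salmon output is a standard tab-separated quant.sf with Name and NumReads columns, and that the Corset clusters file has two tab-separated columns (transcript ID, then cluster ID). The function names and the file paths in the usage comment are made up for illustration.

```python
import csv
import sys


def read_quant_reads(quant_path):
    """Return {transcript_id: NumReads} from a Salmon quant.sf file.

    Assumes the usual tab-separated header with 'Name' and 'NumReads'.
    """
    reads = {}
    with open(quant_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            reads[row["Name"]] = float(row["NumReads"])
    return reads


def read_clustered_ids(clusters_path):
    """Return the set of transcript IDs in a Corset clusters file.

    Assumes one transcript per line: transcript_id <TAB> cluster_id.
    """
    ids = set()
    with open(clusters_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0]:
                ids.add(fields[0])
    return ids


def missing_transcripts(quant_path, clusters_path):
    """Return {transcript_id: NumReads} for transcripts absent from the clusters file."""
    reads = read_quant_reads(quant_path)
    clustered = read_clustered_ids(clusters_path)
    return {tid: n for tid, n in reads.items() if tid not in clustered}


# Hypothetical usage: python find_missing.py quant.sf corset-clusters.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    for tid, n in sorted(missing_transcripts(sys.argv[1], sys.argv[2]).items()):
        print(f"{tid}\t{n}")
```

Printing NumReads next to each missing ID would at least show whether the dropped transcripts are the ones with no reads assigned to them.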
Can Corset be set to retain all transcripts? Or, if not, how is Corset determining which transcripts to dump?
Hope that makes sense.