Missing transcripts?


John Henry

Feb 14, 2021, 8:14:23 PM
to corset-project
Hi there,

I'm hoping someone can answer a question of mine.

I've just been using Corset to cluster transcripts from both Trinity and rnaSPAdes assemblies. To do so, I first ran Salmon v0.13 without the --validateMappings flag, then ran Corset with the -m flag set to 0 in order to preserve all transcripts. However, Corset still appears to be filtering out some transcripts: the "corset-clusters.txt" file contains fewer lines (transcripts) than the input transcriptomes and the Salmon quant files. I realise lowly expressed transcripts might not be useful in many situations, but I'm looking for orthologues between species, and a gene expressed at a low level in one species might be interesting if it is expressed at a high level in another.

Can Corset be set to retain all transcripts? Or, if not, how is Corset determining which transcripts to dump?
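
For anyone wanting to see exactly which transcripts went missing, here is a minimal Python sketch that compares the IDs in the assembly FASTA against the IDs in the clusters file. It assumes the clusters file is tab-separated with the transcript ID in the first column, and that the transcript ID is the first word of each FASTA header; the function names are made up for illustration and are not part of Corset.

```python
import sys


def fasta_ids(lines):
    """Return the set of sequence IDs (first word of each '>' header line)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def clustered_ids(lines):
    """Return transcript IDs from the first tab-separated column.

    Assumption: each non-empty line looks like '<transcript_id>\t<cluster_id>'.
    """
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            ids.add(line.split("\t")[0])
    return ids


def missing_transcripts(fasta_lines, cluster_lines):
    """Return a sorted list of IDs present in the FASTA but absent from the clusters."""
    return sorted(fasta_ids(fasta_lines) - clustered_ids(cluster_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders): python missing.py assembly.fasta corset-clusters.txt
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as clusters:
        for transcript_id in missing_transcripts(fasta, clusters):
            print(transcript_id)
```

The printed IDs can then be looked up in the Salmon quant files to check how many reads each one received.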

Hope that makes sense.

Many thanks,
John


Nadia Davidson

Feb 14, 2021, 8:25:22 PM
to corset-project
Hi John,

When you run Salmon to get the "eq" files and then run Corset, all transcripts with 0 reads are removed. This is a consequence of how the intermediate data is stored (only transcripts with read counts are output). If you use bowtie for mapping and then run Corset (https://github.com/Oshlack/Corset/wiki/Example#using-bowtie), in theory it should report all transcripts. However, this is much slower, and although those transcripts will be reported, they will not be clustered together with others: Corset uses the reads, rather than the transcript sequences, to cluster transcripts together. I hope this answer is helpful.
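
Since zero-read transcripts would end up on their own either way, one possible workaround (a sketch only, not a Corset feature) is to keep the Salmon-based run and append the dropped transcripts afterwards as single-member clusters. The Python below assumes the cluster table has already been read into (transcript ID, cluster ID) pairs; the `add_singletons` name and the `Unclustered-` label prefix are hypothetical choices for illustration.

```python
def add_singletons(cluster_rows, all_ids, prefix="Unclustered-"):
    """Append every transcript missing from cluster_rows as its own cluster.

    cluster_rows: iterable of (transcript_id, cluster_id) pairs.
    all_ids: iterable of every transcript ID in the assembly.
    prefix: hypothetical label prefix for the added single-member clusters.
    """
    rows = list(cluster_rows)
    seen = {transcript_id for transcript_id, _ in rows}
    # Sort so the appended rows come out in a reproducible order.
    extra = [(tid, prefix + tid) for tid in sorted(set(all_ids) - seen)]
    return rows + extra


if __name__ == "__main__":
    clustered = [("t1", "Cluster-0.0"), ("t3", "Cluster-0.0")]
    for transcript_id, cluster_id in add_singletons(clustered, ["t1", "t2", "t3"]):
        print(f"{transcript_id}\t{cluster_id}")
```

The existing rows are left untouched, so downstream tools that read the two-column table should see the same clusters plus one extra row per previously dropped transcript.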

Cheers,
Nadia.

John Henry

Feb 14, 2021, 8:30:16 PM
to corset-project
Hi Nadia,

Thanks for your incredibly fast response! Your explanation is exactly what I was after. 

Cheers,
John
