How to interpret Salmon outputs - lowly expressed contigs?

Chrissie Madden

Jul 25, 2017, 8:28:19 PM

to Sailfish Users Group

Good morning,

I'm sorry if this is a totally silly question, but I've searched all the forums and can't find anything that shows how to filter lowly expressed contigs/transcripts from the Salmon output. I have run Salmon against my de novo assembly, mapping the concatenated R1.fasta and R2.fasta from all of my samples back to the assembly. I get a lovely output from Salmon:

Name	Length	EffectiveLength	TPM	NumReads
comp0_c0_seq1	399	251.91	0.0826051	5
comp1_c0_seq1	296	152.045	0.218978	8
comp2_c0_seq1	222	86.0457	0.145102	3
comp3_c0_seq1	255	114.534	0	0
comp3_c0_seq2	278	135.34	0.0307508	1

However, I can't find any instructions on how to filter out my lowly expressed contigs. I have ~360,000 contigs.
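For reference, here is a minimal sketch of reading a quant.sf file like the one above and keeping only contigs at or above a TPM cutoff. It uses only the Python standard library; the helper names (`read_quant`, `filter_by_tpm`) and the 0.1 TPM cutoff are hypothetical choices for illustration, not a Salmon feature or a recommended threshold.

```python
import csv
import io

# The five-row excerpt shown above, tab-separated like Salmon's quant.sf.
QUANT_SF = """Name\tLength\tEffectiveLength\tTPM\tNumReads
comp0_c0_seq1\t399\t251.91\t0.0826051\t5
comp1_c0_seq1\t296\t152.045\t0.218978\t8
comp2_c0_seq1\t222\t86.0457\t0.145102\t3
comp3_c0_seq1\t255\t114.534\t0\t0
comp3_c0_seq2\t278\t135.34\t0.0307508\t1
"""

def read_quant(handle):
    """Parse a quant.sf stream into dicts with numeric fields."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "Name": rec["Name"],
            "Length": int(rec["Length"]),
            "EffectiveLength": float(rec["EffectiveLength"]),
            "TPM": float(rec["TPM"]),
            # NumReads is an estimate and can be fractional, so keep it as float.
            "NumReads": float(rec["NumReads"]),
        })
    return rows

def filter_by_tpm(rows, min_tpm):
    """Keep contigs whose TPM is at least min_tpm (hypothetical cutoff)."""
    return [r for r in rows if r["TPM"] >= min_tpm]

rows = read_quant(io.StringIO(QUANT_SF))
kept = filter_by_tpm(rows, min_tpm=0.1)
print([r["Name"] for r in kept])
```

On the real file, `io.StringIO(QUANT_SF)` would be replaced with `open("quant.sf", newline="")`.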

I can work out a % expression (NumReads / total number of contigs × 100), but a 1% cutoff is far too high: it eliminated 97% of the contigs! What have other people done to filter out lowly expressed contigs from Salmon output?
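Since a single 1% cutoff removed far more than expected, it can help to compute the percentage exactly as described above and print the fraction of contigs each candidate cutoff would remove before picking one. This is a minimal sketch; the helper names and the cutoffs tried are hypothetical.

```python
# NumReads per contig, taken from the quant.sf excerpt above.
NUM_READS = {
    "comp0_c0_seq1": 5,
    "comp1_c0_seq1": 8,
    "comp2_c0_seq1": 3,
    "comp3_c0_seq1": 0,
    "comp3_c0_seq2": 1,
}

def percent_expression(num_reads):
    """NumReads / (total number of contigs) * 100, as described in the post."""
    n_contigs = len(num_reads)
    return {name: reads / n_contigs * 100 for name, reads in num_reads.items()}

def fraction_removed(pct, cutoff):
    """Fraction of contigs whose percentage falls below the cutoff."""
    removed = sum(1 for value in pct.values() if value < cutoff)
    return removed / len(pct)

pct = percent_expression(NUM_READS)
for cutoff in (1, 50, 100):
    print(cutoff, fraction_removed(pct, cutoff))
```

With only five rows the percentages are large; on the full assembly the divisor would be ~360,000, so the same code would need much smaller cutoffs to remove a comparable fraction.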

All the instructions I've seen mention filtering at >1% per component (IsoPct), but those come from RSEM, whose output my data doesn't have (I only get the columns shown above, no RSEM columns such as IsoPct).
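An IsoPct-like value can be approximated from Salmon's TPM column by expressing each contig's TPM as a percentage of the summed TPM of its component. This minimal sketch assumes that a contig's component is its name with the trailing `_seqN` removed (e.g. `comp3_c0_seq2` belongs to `comp3_c0`); that grouping, the helper names, and the >1% cutoff are assumptions for illustration, not RSEM's own implementation.

```python
from collections import defaultdict

# TPM per contig, taken from the quant.sf excerpt above.
TPM = {
    "comp0_c0_seq1": 0.0826051,
    "comp1_c0_seq1": 0.218978,
    "comp2_c0_seq1": 0.145102,
    "comp3_c0_seq1": 0.0,
    "comp3_c0_seq2": 0.0307508,
}

def component_of(name):
    """Hypothetical helper: strip the trailing '_seqN' to get the component ID."""
    return name.rsplit("_seq", 1)[0]

def iso_pct(tpm):
    """Each contig's TPM as a percentage of its component's total TPM."""
    totals = defaultdict(float)
    for name, value in tpm.items():
        totals[component_of(name)] += value
    pct = {}
    for name, value in tpm.items():
        total = totals[component_of(name)]
        # Components with zero total expression get 0% rather than a division error.
        pct[name] = 100.0 * (value / total) if total > 0 else 0.0
    return pct

pct = iso_pct(TPM)
kept = sorted(name for name, p in pct.items() if p > 1.0)
print(kept)
```

Note that single-isoform components always score 100% here, so a per-component filter like this only prunes minor isoforms; it would usually be combined with an overall expression cutoff.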

Thank you in advance,

Chrissie
