How to interpret Salmon outputs - lowly expressed contigs?

Chrissie Madden

Jul 25, 2017, 8:28:19 PM
to Sailfish Users Group
Good morning,

I'm sorry if this is a totally silly question, but I've searched all the forums and can't find anywhere that shows how to filter lowly expressed contigs/transcripts from the Salmon output. I ran Salmon against my de novo assembly, mapping the concatenated R1.fasta and R2.fasta from all of my samples back to the assembly. I get a lovely output from Salmon:

Name           Length  EffectiveLength  TPM        NumReads
comp0_c0_seq1  399     251.91           0.0826051  5
comp1_c0_seq1  296     152.045          0.218978   8
comp2_c0_seq1  222     86.0457          0.145102   3
comp3_c0_seq1  255     114.534          0          0
comp3_c0_seq2  278     135.34           0.0307508  1

However, I can't find any instructions on how to filter my lowly expressed contigs. I have ~360,000 contigs.

I can work out the % expression (NumReads / total contigs * 100), but a 1% expression cutoff is far too high, as it eliminates 97% of the contigs! What have other people done to filter out lowly expressed contigs from Salmon output?
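To make the question concrete, here is a minimal Python sketch of a plain cutoff filter on the Salmon table. It assumes the quantification file is tab-delimited with the column names shown above; the function name `filter_by_abundance` and the cutoff values are purely illustrative and are not recommended thresholds.

```python
import csv
import io

# Sample rows copied from the post. The file is assumed to be tab-delimited.
SAMPLE = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "comp0_c0_seq1\t399\t251.91\t0.0826051\t5\n"
    "comp1_c0_seq1\t296\t152.045\t0.218978\t8\n"
    "comp2_c0_seq1\t222\t86.0457\t0.145102\t3\n"
    "comp3_c0_seq1\t255\t114.534\t0\t0\n"
    "comp3_c0_seq2\t278\t135.34\t0.0307508\t1\n"
)


def filter_by_abundance(handle, min_tpm, min_reads):
    """Return names of contigs whose TPM and NumReads both meet the cutoffs.

    The cutoffs are supplied by the caller; nothing here says what a
    sensible value is for a given dataset.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["Name"]
        for row in reader
        if float(row["TPM"]) >= min_tpm and float(row["NumReads"]) >= min_reads
    ]


# Illustrative cutoffs only (hypothetical values, not a recommendation).
kept = filter_by_abundance(io.StringIO(SAMPLE), min_tpm=0.1, min_reads=3)
print(kept)
```

For a real run, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle on the Salmon output.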

All the instructions I've seen mention filtering on >1% per-component expression (IsoPct), but those are from RSEM packages, whose output my data doesn't include (I only get the first 4 columns, no RSEM etc.).
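For reference, a per-component percentage similar in spirit to RSEM's IsoPct (a transcript's share of its parent component's abundance) could be derived from the TPM column alone. The Python sketch below is one possible approach, not an established Salmon workflow. It assumes contig names follow the `compX_cY_seqZ` pattern seen above, with `compX_cY` as the component, and that the file is tab-delimited; `component_of` and `percent_of_component` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the post. The file is assumed to be tab-delimited.
SAMPLE = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "comp0_c0_seq1\t399\t251.91\t0.0826051\t5\n"
    "comp1_c0_seq1\t296\t152.045\t0.218978\t8\n"
    "comp2_c0_seq1\t222\t86.0457\t0.145102\t3\n"
    "comp3_c0_seq1\t255\t114.534\t0\t0\n"
    "comp3_c0_seq2\t278\t135.34\t0.0307508\t1\n"
)


def component_of(name):
    # Assumption: names look like compX_cY_seqZ and compX_cY is the component.
    return name.rsplit("_seq", 1)[0]


def percent_of_component(handle):
    """Map each contig name to its percentage of the component's total TPM."""
    rows = list(csv.DictReader(handle, delimiter="\t"))

    # First pass: total TPM per component.
    totals = defaultdict(float)
    for row in rows:
        totals[component_of(row["Name"])] += float(row["TPM"])

    # Second pass: each contig's share; components with zero TPM give 0%.
    result = {}
    for row in rows:
        total = totals[component_of(row["Name"])]
        tpm = float(row["TPM"])
        result[row["Name"]] = 100.0 * tpm / total if total > 0 else 0.0
    return result


pct = percent_of_component(io.StringIO(SAMPLE))
# Contigs below an illustrative 1% per-component cutoff.
low = [name for name, p in pct.items() if p < 1.0]
print(low)
```

In the five sample rows, only `comp3` has more than one contig, so it is the only component where the percentage differs between contigs.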

Thank you in advance,
Chrissie

