Good morning,
I'm sorry if this is a totally silly question, but I've searched all the forums and I can't find anywhere that shows how to filter lowly expressed contigs/transcripts from Salmon output. I ran Salmon against my de novo assembly, mapping the concatenated R1.fasta and R2.fasta from all of my samples back to the assembly. I get a lovely output from Salmon:
Name | Length | EffectiveLength | TPM | NumReads
comp0_c0_seq1 | 399 | 251.91 | 0.0826051 | 5
comp1_c0_seq1 | 296 | 152.045 | 0.218978 | 8
comp2_c0_seq1 | 222 | 86.0457 | 0.145102 | 3
comp3_c0_seq1 | 255 | 114.534 | 0 | 0
comp3_c0_seq2 | 278 | 135.34 | 0.0307508 | 1
However, I can't find any instructions on how to filter my lowly expressed contigs. I have ~360,000 contigs.
I can work out the % expression (NumReads/ total contigs*100), but a 1% cutoff is far too high, as it eliminates 97% of the contigs! What have other people done to filter out lowly expressed contigs from Salmon?
All instructions I've seen mention filtering >1% per-component (IsoPct) but those are from RSEM packages, which my data doesn't produce (I only get the first 4 columns, no RSEM etc).
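To make the question concrete, here is a minimal Python sketch of the kind of filter I mean, using only the five rows shown above. It is not a recommended method: the function names are made up for illustration, the cutoffs (`min_tpm=0.1`, `min_reads=1`) are arbitrary placeholders, and I'm assuming "% expression" means each contig's NumReads as a share of the summed NumReads. I'm also assuming the real Salmon table is tab-separated with the same five column names.

```python
import csv
import io

# The five example rows from the post, written as tab-separated text.
# Assumption: the real Salmon table has the same header and is tab-separated.
SAMPLE = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "comp0_c0_seq1\t399\t251.91\t0.0826051\t5\n"
    "comp1_c0_seq1\t296\t152.045\t0.218978\t8\n"
    "comp2_c0_seq1\t222\t86.0457\t0.145102\t3\n"
    "comp3_c0_seq1\t255\t114.534\t0\t0\n"
    "comp3_c0_seq2\t278\t135.34\t0.0307508\t1\n"
)


def read_quant(handle):
    """Parse a tab-separated quantification table into a list of dicts."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append(
            {
                "Name": rec["Name"],
                "TPM": float(rec["TPM"]),
                "NumReads": float(rec["NumReads"]),
            }
        )
    return rows


def percent_of_reads(rows):
    """Each contig's NumReads as a percentage of the summed NumReads."""
    total = sum(r["NumReads"] for r in rows)
    if total == 0:
        return {r["Name"]: 0.0 for r in rows}
    return {r["Name"]: 100.0 * r["NumReads"] / total for r in rows}


def filter_rows(rows, min_tpm, min_reads):
    """Keep rows meeting both (hypothetical) cutoffs, in input order."""
    return [r for r in rows if r["TPM"] >= min_tpm and r["NumReads"] >= min_reads]


rows = read_quant(io.StringIO(SAMPLE))
pct = percent_of_reads(rows)

# Placeholder cutoffs, chosen only to show the mechanics.
kept = filter_rows(rows, min_tpm=0.1, min_reads=1)
kept_names = [r["Name"] for r in kept]

for name in kept_names:
    print(name, round(pct[name], 2))
```

On the real data I would pass an open file handle to `read_quant` instead of the `io.StringIO` sample. What I don't know is which column (TPM or NumReads) and which cutoff people actually use.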
Thank you in advance,
Chrissie