Thanks Ken. You're right, I filtered out lowly supported transcripts with an FPKM cutoff of 1. Since I have no previous experience, I had planned to try other approaches, but from what I found, filtering based on expression is probably the best way. You mentioned TransDecoder, so I'd be happy to hear your opinion on my ORF prediction with TransDecoder.
I made a de novo transcriptome assembly using Illumina reads (PE, 100 bp) generated from a strand-specific library (FR), then filtered out lowly supported transcripts as mentioned above. I tried to find ORFs using the TransDecoder tool within the Trinity package (20140717). On one hand, given the strand-specific RNA-seq library, I used the -S flag as recommended in the TransDecoder guide, which generated 13430 peptide sequences (longest_orfs.pep), whereas without -S the longest_orfs.pep file contains 46823 sequences. Does this mean that many ORFs are located on the minus strand? Is that normal for such a library? On the other hand, based on blastx of the assembly against UniProt, many hits were on the reverse strand, since qstart was greater than qend. In your professional view, do these results confirm that many ORFs are located on the minus strand?
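To put a number on "many hits were on the reverse strand", the blastx comparison could be tallied roughly as below. This is only a minimal sketch: it assumes the blastx results are in tabular format (-outfmt 6) with the default column order, where qstart and qend are the 7th and 8th columns, and it keeps only the first hit listed for each transcript. The function name count_hit_strands and the example rows are hypothetical and are not real results from this assembly.

```python
from collections import Counter


def count_hit_strands(lines):
    """Count plus/minus-strand hits in blastx tabular output.

    Assumes default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Only the first hit listed for each query (transcript) is counted.
    """
    counts = Counter()
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query = fields[0]
        if query in seen:
            continue  # a hit for this transcript was already counted
        seen.add(query)
        qstart, qend = int(fields[6]), int(fields[7])
        # In blastx output, qstart > qend means the hit is on the reverse strand
        counts["minus" if qstart > qend else "plus"] += 1
    return counts


# Hypothetical rows for illustration only (not real data)
example = [
    "c1_g1_i1\tsp|P00001|A\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t200",
    "c1_g1_i1\tsp|P00002|B\t60.0\t100\t40\t0\t300\t1\t1\t100\t1e-20\t90",
    "c2_g1_i1\tsp|P00003|C\t75.0\t147\t36\t0\t450\t10\t5\t151\t1e-40\t160",
    "c3_g1_i1\tsp|P00004|D\t90.0\t200\t20\t0\t900\t301\t1\t200\t1e-90\t350",
]
print(count_hit_strands(example))
```

For a real run, an open file handle can be passed instead of the list, for example `count_hit_strands(open("blastx.outfmt6"))`, where the file name is again just a placeholder.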
I'm new to this field, so please help me figure out what happened. It has made me doubt whether the library is really strand-specific. I would greatly appreciate it if you could share what you know.
Many thanks,