Hi Kris,
Responses below:
On Sun, Dec 11, 2022 at 3:40 PM Kris Alavattam <
kalav...@gmail.com> wrote:
>
> Thanks, Brian—yes, that's very helpful. When you mention "the option to require sufficient overlap among alignments to assemble to again mitigate the neighboring fusion transcript issue," you mean adjusting the --stringent_alignment_overlap parameter when running Launch_PASA_pipeline.pl—is that correct?
Yes, that's right.
>
> In the trial experiments I'm running with PASA, I'm using .fasta files from genome-guided and genome-free Trinity (but nothing from StringTie/Cufflinks/etc. yet), which makes me wonder whether it would help to increase the value for --stringent_alignment_overlap from 30.0 to something higher. (Currently, I'm calling Launch_PASA_pipeline.pl with --stringent_alignment_overlap 30.0, following the advice here.) If I understand things correctly, a higher percentage overlap for --stringent_alignment_overlap could/would mitigate the false identification of fusion transcripts that result from working with data from small, gene-dense genomes such as S. cerevisiae; is that right?
>
It will keep PASA itself from adding to the problem, for sure, but it
won't address cases where the input transcripts are already fused. The
30% setting is probably fine.
> A little experimental context could be helpful here: We're working with an S. cerevisiae knock-out model that increases global antisense transcription, and we want to accurately identify these ncRNA transcripts and use the custom annotations in downstream analyses. In our work so far, we see a lot of both fusion and (apparently) fragmentary transcripts. Do you think that adjusting the value for --stringent_alignment_overlap could be useful in this context? Or perhaps leaving the --stringent_alignment_overlap at 30.0 is reasonable? Thinking of this, I'm reminded also of the --gene_overlap option available in Launch_PASA_pipeline.pl (which should be called together with the -L flag and --annots_gff3 option). Could calling Launch_PASA_pipeline.pl with --gene_overlap set to some value be potentially useful in this context?
Given the high quality of the reference annotations for S. cerevisiae,
using --gene_overlap is easily justified.
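For reference, here is a rough sketch of how the two options discussed above could be combined in one call, written as a small Python helper that only builds the argument list. The file names and the 50.0 value for --gene_overlap are placeholders for illustration, not recommendations, and the remaining options (-c, -C, -R, -g, -t) are the usual ones from the PASA documentation; please check them against your installed version before running anything.

```python
import shlex


def build_pasa_command(config, genome, transcripts,
                       annots_gff3=None,
                       stringent_overlap=30.0,
                       gene_overlap=None):
    """Assemble a Launch_PASA_pipeline.pl argument list (sketch only)."""
    cmd = [
        "Launch_PASA_pipeline.pl",
        "-c", config,        # alignment-assembly config file
        "-C", "-R",          # create the database, run alignment/assembly
        "-g", genome,
        "-t", transcripts,
        "--stringent_alignment_overlap", str(stringent_overlap),
    ]
    if gene_overlap is not None:
        # --gene_overlap is used together with -L and --annots_gff3,
        # so the reference annotation file is required here.
        if annots_gff3 is None:
            raise ValueError("gene_overlap requires annots_gff3")
        cmd += ["-L", "--annots_gff3", annots_gff3,
                "--gene_overlap", str(gene_overlap)]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own.
    cmd = build_pasa_command(
        "alignAssembly.config", "genome.fasta", "trinity_combined.fasta",
        annots_gff3="reference_annotation.gff3",
        gene_overlap=50.0,  # illustrative value only
    )
    print(shlex.join(cmd))
```

Printing the command first, rather than executing it, makes it easy to inspect the exact call before committing to a long run.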
If I remember correctly, there aren't many introns in S. cerevisiae, so
splice signals will rarely be available to set transcript orientation.
For antisense transcripts to be properly identified as such, you'd need
to carefully take into account the transcribed orientation based on the
aligned orientation, with Trinity run in strand-specific mode to ensure
proper transcript orientation during reconstruction. Just something to
be aware of, but you've probably already dealt with this given you're
deep into the process.
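To make the orientation point concrete, here is a minimal sketch of the kind of check I have in mind: compare the aligned strand of each transcript with the strand of the annotated gene it overlaps. It assumes the transcripts came from a strand-specific assembly, so that the aligned strand can be taken as the transcribed strand; the Feature type, the labels, and the coordinates are made up for illustration and are not part of PASA or Trinity.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_bp(a: Feature, b: Feature) -> int:
    """Number of bases shared by two features on the same chromosome."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify(transcript: Feature, gene: Feature) -> str:
    """Label a transcript relative to one annotated gene.

    Only meaningful if transcript.strand reflects the transcribed
    strand (i.e. a strand-specific assembly).
    """
    if overlap_bp(transcript, gene) == 0:
        return "no_overlap"
    return "sense" if transcript.strand == gene.strand else "antisense"


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    gene = Feature("chrI", 1000, 2000, "+")
    print(classify(Feature("chrI", 1500, 2500, "-"), gene))
    print(classify(Feature("chrI", 1200, 1800, "+"), gene))
    print(classify(Feature("chrI", 3000, 3500, "-"), gene))
```

If the assembly is not strand-specific, the strand field of an unspliced transcript is essentially arbitrary, and a check like this would report spurious antisense calls.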
Hope this helps,
~b