Hey
I wanted to try something like this: run STAR once, and then use the outputs it produces both for downstream quantification (via RSEM or Salmon) and for fusion finding (via STAR-Fusion).
I first set the parameters on STAR as recommended by the RSEM toolkit (the Encode pipeline parameters).
Then I started adding in parameters as recommended by the STAR-Fusion toolkit (by looking at the SF code to see which parameters it uses).
Then I wanted to see how the quantification results (i.e. TPM) change after setting these additional parameters on STAR.
So far I have analyzed this for only one sample, and wanted to ask a few questions before continuing.
So, after running my quantification tool for the different parameter configurations, I calculated the Pearson correlation coefficient of the TPM values (if some other measure is better, please do let me know) against a baseline configuration where only the quantification parameters are set (Encode pipeline) and no STAR-Fusion parameters are present at all.
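For reference, here is a minimal sketch of how such a comparison could be computed. It is not the exact script I used: the function names and the toy TPM values are made up for illustration, and the inputs are assumed to be plain dicts mapping a gene/transcript ID to TPM. Besides Pearson on raw TPM, it also reports Pearson on log2(TPM + 1) and Spearman rank correlation, since raw-TPM Pearson can be dominated by a few highly expressed genes.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


def compare_tpm(baseline, variant):
    """Compare two {id: TPM} dicts on the IDs present in both."""
    ids = sorted(set(baseline) & set(variant))
    b = [baseline[i] for i in ids]
    v = [variant[i] for i in ids]
    return {
        "pearson_raw": pearson(b, v),
        "pearson_log2": pearson([math.log2(t + 1) for t in b],
                                [math.log2(t + 1) for t in v]),
        "spearman": spearman(b, v),
    }


if __name__ == "__main__":
    # Toy values only, not results from my sample.
    baseline = {"geneA": 0.0, "geneB": 5.2, "geneC": 130.0, "geneD": 12.5}
    variant = {"geneA": 0.1, "geneB": 4.8, "geneC": 118.0, "geneD": 14.0}
    for name, value in compare_tpm(baseline, variant).items():
        print(f"{name}: {value:.4f}")
```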
When I add all the parameters that STAR-Fusion 1.9 recommends, I get a Pearson coefficient of 0.95 (for this particular sample).
Then I started playing around with the parameters, and noted that the parameters that affect this the most are the peOverlap parameters (i.e. --peOverlapNbasesMin and --peOverlapMMp).
The STAR defaults are 0 and 0.01, while the STAR-Fusion code sets them to 12 and 0.1, respectively.
If I leave these two parameters at their default values (keeping all the other STAR-Fusion parameters), I get a Pearson coefficient of 0.99.
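To make the two configurations being compared explicit, here is a small sketch of how the argument lists could be assembled. Only the two peOverlap flags and their values come from the description above. The helper names and the `base_args` placeholder are hypothetical, and the remaining Encode and STAR-Fusion arguments are deliberately left out.

```python
# peOverlap settings as described above: STAR defaults vs. STAR-Fusion values.
PE_OVERLAP_DEFAULTS = {"--peOverlapNbasesMin": "0", "--peOverlapMMp": "0.01"}
PE_OVERLAP_STAR_FUSION = {"--peOverlapNbasesMin": "12", "--peOverlapMMp": "0.1"}


def star_args(base_args, overrides):
    """Return a new argument list: base_args followed by each flag/value pair."""
    args = list(base_args)
    for flag, value in overrides.items():
        args += [flag, value]
    return args


def configurations(base_args):
    """Build the two variants that differ only in the peOverlap parameters."""
    return {
        "peoverlap_default": star_args(base_args, PE_OVERLAP_DEFAULTS),
        "peoverlap_star_fusion": star_args(base_args, PE_OVERLAP_STAR_FUSION),
    }


if __name__ == "__main__":
    # Placeholder base command; the real one would carry the genome, reads,
    # and the rest of the Encode / STAR-Fusion parameters.
    base_args = ["STAR", "--quantMode", "TranscriptomeSAM"]
    for name, args in configurations(base_args).items():
        print(name, " ".join(args))
```

Keeping everything except these two flags identical between runs is what lets the difference in correlation be attributed to the peOverlap settings alone.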
So I just wanted to write up my observations and possibly get an opinion on the feasibility of running such a pipeline.
Running alignment once and then doing downstream analysis would surely save a lot of time/compute-power, hence my wish to try this.
The peOverlap parameters aren't in the Encode pipeline (they probably didn't even exist when the Encode pipeline was first specified), but it doesn't seem to me that I should always explicitly leave them at their defaults, even for quantification. Based on the description of the peOverlap parameters, merging overlapping mates from reads with short insert sizes doesn't seem like something critical that should be avoided (maybe even the opposite?).
If you have any comments on this (Alex or Brian), I'll gladly listen to your advice!
Thank you in advance for any answers as well!