Hi Alex,
Thank you for providing the clarification and for suggesting the changes to improve the analysis.
I performed the analysis as discussed: 1pass with the GTF annotation, and 2pass with the GTF annotation plus the combined and filtered SJ_out from all samples. Further, for my curiosity's sake, I performed variation calling on the processed alignments (read grouping, indel realignment and base quality recalibration) from both 1pass and 2pass.
I filtered the variations as suggested in the GATK RNA-Seq best practices and used vcftools to compare them. The comparison result is as follows:
Found 153308 SNPs common to both files.
Found 2265 SNPs only in main file.
Found 2894 SNPs only in second file.
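For anyone who wants to cross-check counts like these without vcftools, a site-level comparison can be sketched in a few lines of Python. This is only a minimal sketch and not a description of what vcftools does internally: it assumes sites are matched on chromosome and position alone (alleles are ignored), and the function names `read_sites` and `compare_sites` are made up for illustration.

```python
import gzip


def read_sites(path):
    """Return the set of (chrom, pos) sites found in a VCF file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    sites = set()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the column header and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            sites.add((chrom, int(pos)))
    return sites


def compare_sites(main_sites, second_sites):
    """Count sites shared by both sets and sites private to each set."""
    return {
        "common": len(main_sites & second_sites),
        "only_main": len(main_sites - second_sites),
        "only_second": len(second_sites - main_sites),
    }
```

It could be called as `compare_sites(read_sites("1pass.filtered.vcf"), read_sites("2pass.filtered.vcf"))`, where both file names are placeholders. Because only chromosome and position are compared, the counts may differ from a tool that also matches on alleles.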
There are two-thousand-odd variations specific to each of 1pass and 2pass. I loaded the alignments in IGV to check some of the variations that were specific to 1pass and to 2pass.
The variations specific to 2pass displayed similar coverage and a similar number of alternate bases in both alignments, yet they were called as variations in 2pass but not in 1pass.
An example screenshot is attached as 2pass_specific_variation.png.
The variations specific to 1pass displayed much higher coverage in the 1pass alignment than in the 2pass alignment. The reads may have been realigned to a different location in 2pass.
An example screenshot is attached as 1pass_specific_variation.png.
One thing I noted in both alignments was that quite a lot of reads were tagged as "Alignment NOT primary" by IGV. These must be multi-mapped reads. Do you think it is better to align without multi-mapping, or to remove multi-mapping reads before performing variation calling?
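To make the second option concrete, removing such reads from SAM text could look roughly like the Python sketch below. It rests on two assumptions: that non-primary alignments carry the SAM flag bit 0x100, and that the aligner writes an `NH:i:` tag giving the number of reported loci for each read. The function names are hypothetical, and this is not a recommendation of either option.

```python
SECONDARY = 0x100  # SAM flag bit meaning "not primary alignment"


def nh_value(fields):
    """Return the NH tag (number of reported alignments) or None if absent."""
    for tag in fields[11:]:  # optional tags start at the 12th column
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def keep_alignment(sam_line, unique_only=True):
    """Decide whether one SAM alignment line should be kept.

    Secondary alignments are always dropped. With unique_only=True, primary
    alignments of multi-mapped reads (NH > 1) are dropped as well.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if int(fields[1]) & SECONDARY:
        return False
    if unique_only:
        nh = nh_value(fields)
        if nh is not None and nh > 1:
            return False
    return True


def filter_sam(lines, unique_only=True):
    """Yield header lines unchanged plus the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, unique_only):
            yield line
```

With `unique_only=False` only the non-primary records are removed, so each multi-mapped read still contributes its primary alignment; with the default, multi-mapped reads are removed entirely.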
The only reason I am comparing 1pass and 2pass is that this analysis extends an earlier one, in which we merged and filtered the reported genes and transcripts from 50 samples to obtain the GTF annotation used here. We have already performed 1pass with that GTF annotation, so I wanted to see how that alignment differs from 2pass, which uses the GTF as well as the SJ reported in 1pass.
Looking forward to your comments on the comparison of 1pass and 2pass. Thank you.
Regards,
Veer
PS: Actually, the alignment statistics report that only 3-4% of reads are multi-mapped, so not that many.