Dear Alex and others,
We are using RNA-seq reads to calculate allele-specific expression (+ quantify gene/isoform expression). We are using RSEM (rsem-calculate-expression) with STAR alignment and outputting a genome aligned .bam file (output-genome-gam, rsem).
Then from the genome aligned (and sorted) bam file, we are using a custom pysam pileup script to count reference and alternative alleles based on a input vcf (containing variants).
However, when running rsem/star multiple times (identical settings), the results are not fully reproducible. The bam files are not identical and the count results are not identical (see plot of difference between alt counts for consecutive runs)..
I suspect that this has to do with the fact that STAR is not fully deterministic in the way multimapping reads are output as primary/secondary alignments and the order of them (as described in the STAR manual under multimappers). Also, since we are running multiple threads, setting the seed does also not seem like a valid solution.
Do you have any suggestions on how to make this process deterministic (each run is identical)?
We are running Rsem v1.3.0 & STAR v.2.5.3a
Best Regards,
Andreas Hoff