Hi Daniel,
Thanks for providing a thorough description of your observations (and of how you processed the data). The difference you're seeing is almost certainly a result of running salmon with the --useVBOpt option. When run with the variational Bayesian EM, salmon places a weak but uniform prior of 1e-3 per nucleotide across the entire transcriptome. The practical effect of this prior is that it reduces the variance among very low expression transcripts (i.e. it acts as a sort of regularizer); however, it does this by preferring to keep these transcripts expressed at a consistent but low abundance. Thus, the overall results returned by the default optimizer (the EM) will tend to be sparser, while those returned by the VBEM optimizer will tend to be less sparse (in exactly the manner you are seeing here).
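To make the intuition concrete, here is a toy sketch (not salmon's actual implementation) contrasting a plain EM update with a VBEM update under a uniform Dirichlet prior. The three "transcripts" A, B, and C, the read counts, and the per-transcript prior value are all made up for illustration; transcript C attracts only multi-mapping reads shared with A:

```python
# Toy EM vs. VBEM abundance estimation over three hypothetical transcripts.
# Transcript C has no uniquely mapping reads; it only shares reads with A.
from math import exp, log

def digamma(x):
    """Digamma via the standard recurrence plus an asymptotic expansion."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    return r + log(x) - 1.0 / (2.0 * x) - 1.0 / (12.0 * x * x)

# Each read lists the transcripts it maps to equally well.
reads = [("A",)] * 50 + [("B",)] * 50 + [("A", "C")] * 20
names = ["A", "B", "C"]

def e_step(weights):
    """Fractionally assign each read proportionally to transcript weights."""
    counts = {t: 0.0 for t in names}
    for r in reads:
        tot = sum(weights[t] for t in r)
        for t in r:
            counts[t] += weights[t] / tot
    return counts

def em(iters=1000):
    """Plain EM: iterate expected counts -> normalized abundances."""
    alpha = {t: 1.0 / 3 for t in names}
    for _ in range(iters):
        c = e_step(alpha)
        alpha = {t: c[t] / len(reads) for t in names}
    return alpha

def vbem(prior=1.0, iters=1000):
    """VBEM with a uniform Dirichlet prior. Here 'prior' is per transcript
    (a stand-in for salmon's per-nucleotide --vbPrior, which is scaled by
    transcript length)."""
    counts = {t: 0.0 for t in names}
    for _ in range(iters):
        gamma = {t: prior + counts[t] for t in names}
        w = {t: exp(digamma(gamma[t])) for t in names}
        counts = e_step(w)
    tot = sum(prior + counts[t] for t in names)
    return {t: (prior + counts[t]) / tot for t in names}
```

With the EM update, C's abundance decays geometrically toward zero; the VBEM estimate keeps C at a small but nonzero abundance whose level is set by the prior, so shrinking the prior makes the VB estimate sparser.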
I would venture to guess that if you drop the --useVBOpt option, the very low abundance peak will go away. Alternatively, you would likely also see it go away if you set a sparser prior (controlled by the --vbPrior option, whose argument is interpreted as the default per-nucleotide, transcriptome-wide prior). Interestingly, the difference you see between these approaches is consistent with the general differences you'd tend to observe between optimization approaches of differing prior strength (e.g. the default VB prior for salmon should give you sparsity similar to BitSeq, while the EM algorithm should give you sparsity similar to RSEM / kallisto). Please let me know if this explains the difference, and if my description makes sense to you.

As an aside, I would argue that it is actually still an active research question which approach is "better". A (uniform) prior tends to stabilize variance, especially at low abundance, but the cost paid for this is a bias toward less sparse estimates and the calling of certain absent transcripts as expressed at very low levels. Conversely, the EM algorithm tends to produce much sparser estimates, and therefore calls 0's as 0's more often, but at the price of substantially higher variance at the low end of expression. We (and others) are exploring the downstream effects of these different optimization approaches, but I've yet to find a single, comprehensive metric that properly captures a perfectly intuitive and practical notion of "accuracy".
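Concretely, the comparison runs might look something like this (the index and read paths are placeholders; --useVBOpt and --vbPrior are the actual salmon options, and 1e-5 below is just an arbitrary example of a sparser prior than the 1e-3 default):

```shell
# Default optimizer (EM): sparser estimates.
salmon quant -i txp_index -l A -1 reads_1.fq -2 reads_2.fq -o quant_em

# VBEM with the default weak uniform prior (1e-3 per nucleotide).
salmon quant -i txp_index -l A -1 reads_1.fq -2 reads_2.fq --useVBOpt -o quant_vb

# VBEM with a sparser (smaller) prior.
salmon quant -i txp_index -l A -1 reads_1.fq -2 reads_2.fq --useVBOpt --vbPrior 1e-5 -o quant_vb_sparse
```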
Best,
Rob