Hi Nick,
Thanks for putting this together! I'll do my best to address each of the scenarios.
Scenario 1: You're correct that the behavior of SlidingTimeWindowReservoir is more intuitive. That doesn't make ExponentiallyDecayingReservoir any less correct; its output just needs to be interpreted in the context of the rate. Perhaps we should consider changing the default reservoir type?
Scenario 2: If I understand the behavior of the emulator, the first 9960 of 10000 executions last 30ms and the remaining 40 last 15000ms? If so, then the observed behavior is correct. You will have seen a spike in the 99.9th percentile, but the 95th percentile will not move unless you change 9960 on line 135 to 9500 or less. The 95th percentile is simply less sensitive to such spikes. Let's say you have 3 readings: 30, 30, 15000. The median (50th percentile) is 30. Similarly, if you have 100 readings, the first 96 of which are 30 and the last 4 of which (the 97th through 100th) are 15000, the 95th percentile is also 30.
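To make the arithmetic concrete, here is a rough sketch using a plain nearest-rank percentile over an unsampled array. Note the assumptions: PercentileDemo, percentile and readings are names I made up for illustration, and nearest-rank is not necessarily how the library computes quantiles (the reservoirs sample, and the snapshot may interpolate between neighbouring values), so treat this as an illustration of the reasoning rather than of the library's code.

```java
import java.util.Arrays;

public class PercentileDemo {
    // Nearest-rank percentile (an assumption for illustration): the value at
    // rank ceil(q * n) in the sorted readings, with ranks starting at 1.
    static long percentile(long[] readings, double q) {
        long[] sorted = readings.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Builds `fast` readings of 30ms followed by `slow` readings of 15000ms.
    static long[] readings(int fast, int slow) {
        long[] r = new long[fast + slow];
        Arrays.fill(r, 0, fast, 30L);
        Arrays.fill(r, fast, fast + slow, 15000L);
        return r;
    }

    public static void main(String[] args) {
        long[] r = readings(9960, 40);
        System.out.println("p95   = " + percentile(r, 0.95));
        System.out.println("p99.9 = " + percentile(r, 0.999));
    }
}
```

With 9960 fast and 40 slow readings, this prints 30 for the 95th percentile and 15000 for the 99.9th, which matches what you observed.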
Scenario 3: I'll need to parse this a bit more. I can't speak to the accuracy of the sampling, but I will say that it isn't intended to be 100% accurate. It is intended to be highly performant while being accurate enough for the purpose of monitoring a system's responsiveness.
Ryan