John: Given your description of your experiment, did you manipulate (select) the RTT by choosing different pairs of Azure data centres? If so, how did you control for possible differences in packet loss, or concurrent congestion, across these scenarios? Did you monitor packet loss?
You commented that you did not replicate the tests with TCP, which left me wondering about the results, and whether variations in packet loss played into them. IMO it would be good not only to run the same experiments using TCP, but to run both sets of experiments somewhat contemporaneously. For instance, alternate back-n-forth between running a TCP experiment, then a QUIC experiment, then a TCP experiment, etc. You should probably continue this alternation until you see low variation for TCP, and low variation for QUIC. You probably can't run the two tests concurrently, or else congestion from one will impact the other <sigh>.
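The alternation-until-stable idea can be sketched as a small harness. This is a minimal illustration, not a complete benchmark: `run_tcp` and `run_quic` are hypothetical callables standing in for whatever actually launches one test run and returns a throughput sample, and the 5% coefficient-of-variation threshold is an arbitrary placeholder you would tune to your own noise tolerance.

```python
import statistics

def coefficient_of_variation(samples):
    """CV = stdev / mean; a unitless measure of run-to-run variation."""
    return statistics.stdev(samples) / statistics.mean(samples)

def alternate_until_stable(run_tcp, run_quic, cv_threshold=0.05,
                           min_runs=5, max_runs=50):
    """Alternate TCP and QUIC runs until BOTH show low variation.

    run_tcp / run_quic are callables that perform one test run and
    return a single throughput sample (e.g. goodput in Mbit/s).
    Stops once each protocol's coefficient of variation drops below
    cv_threshold, or after max_runs pairs.
    """
    tcp, quic = [], []
    for _ in range(max_runs):
        tcp.append(run_tcp())     # run the two protocols back-to-back,
        quic.append(run_quic())   # never concurrently
        if (len(tcp) >= min_runs
                and coefficient_of_variation(tcp) < cv_threshold
                and coefficient_of_variation(quic) < cv_threshold):
            break
    return tcp, quic
```

In use you would pass in functions that shell out to your actual TCP and QUIC test clients; interleaving them back-to-back (rather than doing all TCP runs, then all QUIC runs) means both protocols see roughly the same background traffic conditions.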
It is also probably interesting to monitor packet loss, and make sure that UDP packets are not being mistreated on one of the paths.
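One cheap way to watch for loss on a path is to scrape the summary line that `ping` prints. A minimal sketch, assuming the Linux iputils `ping` output format (other implementations word the summary differently):

```python
import re

def parse_ping_loss(ping_output):
    """Extract the packet-loss percentage from ping's summary line.

    Assumes the Linux iputils format, e.g.:
      '10 packets transmitted, 9 received, 10% packet loss, time 9012ms'
    Returns the loss as a float percentage, or None if no summary found.
    """
    m = re.search(r'([\d.]+)% packet loss', ping_output)
    return float(m.group(1)) if m else None
```

Running this against ping output collected alongside each test run (and, ideally, a UDP-based probe as well, since ICMP and UDP may be policed differently) would flag a path where UDP packets are being mistreated.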
Most commonly, when experiments like yours are done, they are performed in a very controlled (laboratory?) environment. When SPDY was being developed/tested, Mike Belshe even avoided using an Intranet, and instead ran tests on a physical private network, avoiding ANY chance of noise from unexpected concurrent traffic. Mike worked very hard not only to reduce/remove external spurious traffic, but also to reduce (eliminate?) concurrent-process CPU costs on his stacked pile of machines, which I affectionately called the "Belshe-Borg!" When running over the Internet, unexpected variations in concurrent flows can wreak havoc on the consistency of results.
IMO, unless your goal is to debug performance problems in Azure (and potential Azure focused optimizations!), I'd steal a page from Belshe's playbook, and certainly avoid using cloud computing for this class of experimentation. ...but... YMMV.
p.s., You might also want to monitor the RTT *during* your experiments. If there is bufferbloat variation on one of the paths (due to larger buffers?), that can also have a large impact on the reported results. Here again, I'd watch to see that UDP packets were not being treated differently from TCP packets (monitor RTT for both types of streams). It is more than conceivable (for example) that an "attempt" at fair queuing in some router/switch misunderstood flows of UDP packets, resulting in more or less bufferbloat for TCP vs. UDP.
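A simple way to quantify bufferbloat from RTT samples is to compare the RTT measured while the path is idle against the RTT measured while a transfer is in flight. A sketch, assuming you have collected two lists of RTT samples in milliseconds (how you collect them — ping, TCP timestamps, QUIC spin bit — is up to you):

```python
import statistics

def rtt_inflation(idle_rtts_ms, loaded_rtts_ms):
    """Ratio of under-load RTT to idle RTT, using medians to resist outliers.

    A ratio well above 1.0 during a transfer suggests standing queueing
    delay (bufferbloat) on the path. Computing it separately for a TCP
    flow and a contemporaneous UDP/QUIC flow can reveal differential
    queue treatment of the two traffic types.
    """
    return statistics.median(loaded_rtts_ms) / statistics.median(idle_rtts_ms)
```

If the inflation ratio differs markedly between the TCP and UDP measurements on the same path at roughly the same time, that is a hint that some middlebox is queuing the two traffic types differently.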
Opinions expressed are my own, and not that of my company.