But looking at those Prometheus graphs, I see culebre and nidhogg having no problems re-rendering tiles, while odin and ysera do. So maybe even if those tiles are invalidated on e.g. odin, that machine is currently unable to catch up on all those invalidation jobs?
As I understand it, if the dirty queue is full (8K requests), as it is on odin and ysera during the day, additional render requests get dropped. They are re-issued the next time the tile is requested, and may then land in the queue or get dropped again.
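To make that concrete, here is a minimal sketch of a bounded queue that drops requests once it is full. It is only an illustration of the behaviour described above, not the render daemon's actual code: the class `DirtyQueue`, its method names, and the exact capacity of 8192 (my reading of "8K") are all assumptions.

```python
from collections import deque


class DirtyQueue:
    """Hypothetical bounded FIFO of dirty tiles; a full queue drops new requests."""

    def __init__(self, capacity=8192):  # assumed value for "8K"
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0

    def offer(self, tile):
        """Try to enqueue a tile; return False if the request was dropped."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(tile)
        return True

    def render_next(self):
        """Take the oldest queued tile, or None if nothing is waiting."""
        return self.pending.popleft() if self.pending else None


# Tiny capacity so the drop is easy to see.
q = DirtyQueue(capacity=2)
results = [q.offer(t) for t in ["tile-a", "tile-b", "tile-c"]]
```

With a capacity of 2, the third request is dropped and only gets another chance when someone requests that tile again, by which time the queue may or may not have room.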
The changes are published in the minutely replication feed, which is consumed by each of the (currently eight) render servers. Each server updates its database and marks any affected tiles as dirty. All of that normally happens within a few minutes of the change being made.
Everything else is actually driven from the client when you browse an area: your browser makes a request to a Fastly CDN node, and that node will either return a cached tile or ask a render server for it.
The rendering continues in the background though, so the next time a CDN node asks it will likely get the new version, unless the render server has been so busy that it has been dropping render requests. Missing tiles take priority over dirty ones, so they are less likely to suffer from being dropped.
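A rough sketch of that prioritisation, purely for illustration: two bounded queues, with the missing-tile queue always drained first. The class `RenderScheduler`, the `kind` strings, and the idea that each queue has its own capacity are assumptions on my part, not a description of how the render servers are really implemented.

```python
from collections import deque


class RenderScheduler:
    """Hypothetical scheduler: missing tiles are rendered before dirty ones."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # assumed: each queue has its own limit
        self.missing = deque()
        self.dirty = deque()

    def submit(self, tile, kind):
        """Queue a render request; return False if that queue is full."""
        queue = self.missing if kind == "missing" else self.dirty
        if len(queue) >= self.capacity:
            return False  # request dropped, retried on a later tile request
        queue.append(tile)
        return True

    def next_job(self):
        """Pick the next tile to render, preferring missing tiles."""
        if self.missing:
            return self.missing.popleft()
        if self.dirty:
            return self.dirty.popleft()
        return None


s = RenderScheduler(capacity=1)
accepted = [
    s.submit("dirty-1", "dirty"),
    s.submit("dirty-2", "dirty"),      # dirty queue already full
    s.submit("missing-1", "missing"),  # separate queue, still has room
]
order = [s.next_job(), s.next_job(), s.next_job()]
```

In this toy setup a busy server drops the second dirty request but still accepts and renders the missing tile first, which matches the behaviour described above.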
Given that Vector Tiles are under way, I do not think this is worth
hunting down an arcane bug somewhere. But before people go nuts seeing
dirty tiles even after having cleared out all browser caches, it might
be fair to say that most likely some corner cases simply exist.