We have multiple ways of computing quantiles now. Let me explain the different ones:
1) Client-side quantiles via summaries.
If you measure many latencies of the same kind (like request latencies on an HTTP server instance) and the observations are more frequent than the scrape interval, you can use summaries. The upside: no quantile computation is needed in the Prometheus server, and no buckets need to be chosen manually as with histograms. The downside: you can't aggregate precomputed quantiles across dimensions, so this is only useful if you care about individual instances and no sub-dimensions, as opposed to e.g. the latency of an entire service.
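To make that concrete, here is what querying a client-side summary quantile could look like (metric and label names are illustrative, assuming a summary configured with a 0.9 quantile objective):

```
# 90th-percentile request latency, precomputed by the client library.
# Note: averaging or summing this across instances would be statistically wrong.
http_request_duration_seconds{quantile="0.9"}
```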
2) histogram_quantile() based on histograms
With histograms, the client only counts observations falling into a set of predefined buckets, and the Prometheus server estimates the x-th quantile from those bucket counts at query time. Unlike summary quantiles, bucket counts can be aggregated across instances and other dimensions before computing the quantile; the tradeoff is that you have to choose suitable buckets up front, and the accuracy of the estimate depends on that choice.
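A typical query could look like this (metric name is illustrative; the "le" label is the bucket boundary label that histograms expose):

```
# Estimated 90th-percentile request latency over the last 5m,
# aggregated across all instances before computing the quantile:
histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```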
3) quantile_over_time()
This returns the x-th quantile *over time* for each input series. It answers questions like "what was the 90th-percentile run time of my batch job over the last 7d?", where the run time is stored in a single gauge rather than a summary or histogram (because the job runs much less often than the scrape interval, so there is no need to cram multiple observations into one scrape interval via some client-side aggregation).
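For that example question, the query could look like this (the gauge name is illustrative):

```
# 90th-percentile batch job run time over the last 7 days,
# with each run's duration stored in a plain gauge:
quantile_over_time(0.9, my_batch_job_runtime_seconds[7d])
```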
4) quantile()
This returns the x-th quantile at *one point in time* across *multiple* series. Say you have 100 nodes and want to know the 90th-percentile CPU usage across all of them; then you could use "quantile by(job) (0.9, rate(my_cpu_usage_seconds[5m]))".
Does that make sense?