[PATCH v4 0/4] Prometheus enhancement

50 views
Skip to first unread message

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 23, 2022, 10:39:56 AM6/23/22
to seastar-dev@googlegroups.com
Replaces: Add summary metrics support

Motivation:
Histograms are the Prometheus killer. They are big to send over the
network and are big to store in Prometheus.
Histograms' size is always an issue, but with Seastar, it becomes even trickier.
Typically, each shard would collect its histograms, so the overall data
is multiplied by the number of shards.

This series addresses the need to report quantile information like latency
without generating massive metrics reports.

A summary is a Prometheus metric type that holds a quantile summary (i.e.
p95, p99).

The downside of summaries is that they cannot be aggregated, which is
needed for a distributed system (i.e., calculate the p99 latency of a cluster).

The series adds four tools for Prometheus performance:
1. Add summaries.
2. Optionally, remove empty metrics. It's common to register metrics for
optional services. It is now possible to mark those metrics as
skip_when_empty and they will not be reported if they were never used.
3. Allow aggregating metrics. The most common case is reporting per-node
metrics instead of per shard. For example, for multi-nodes quantile calculation,
we need a per-node histogram. It is now possible to mark a registered
metric for aggregation. The metrics layer will aggregate this
metric based on a list of labels. (Typically, this will be by shard,
but it could be any other combination of labels).
4. Reuse the stringstream instead of recreating an object on each
iteration.

git https://github.com/amnonh/seastar/tree/metrics_summary

V4
Aggregation and skip_when_empty are now generalized in a sense that it's
a per metric configuration and can be used on any metric, not just
histogram.

The 'aggregate by' now accept labels instead of string which makes the API
cleaner.

The stringstream that is used inside Prometheus text reporting is now
reused instead of recreating on each iteration.

The aggregator object in Prometheus was rename and some comments were
added to make its use clearer.

V3
The main issue with the previous version was that it was too specific.
This version breaks the functionality and pushes some of the logic to
whoever uses the metrics layer.

Summaries are now just another kind of histogram, with no special magic
around them and no need for their own type.

A user can use any combination of labels for aggregation.
It's currently only applied to histograms, but we can do it with other
metrics if it will be helpful.

In the current implementation, to report a summary per shard and a single
histogram per node, the user of the metrics layer would register two metrics,
one for the summary and one for the histogram.

Amnon Heiman (4):
metrics.hh: Support SUMMARY type
metrics.cc: missing count and sum when aggregating histograms
metrics: support aggregate by labels and skip_when_empty
prometheus.cc: Optimize reporting

include/seastar/core/metrics.hh | 31 ++++-
include/seastar/core/metrics_api.hh | 10 +-
src/core/metrics.cc | 28 ++++-
src/core/prometheus.cc | 173 ++++++++++++++++++++++------
4 files changed, 195 insertions(+), 47 deletions(-)

--
2.31.1

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 23, 2022, 10:39:57 AM6/23/22
to seastar-dev@googlegroups.com
This patch adds a missing part to how histograms are being aggregated,
it needs to aggregate the sum and count as well.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
src/core/metrics.cc | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/core/metrics.cc b/src/core/metrics.cc
index 98a14874a0..ea39b3e72d 100644
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc
@@ -365,6 +365,8 @@ histogram& histogram::operator+=(const histogram& c) {
buckets[i].count += c.buckets[i].count;
}
}
+ sample_count += c.sample_count;
+ sample_sum += c.sample_sum;
return *this;
}

--
2.31.1

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 23, 2022, 10:39:57 AM6/23/22
to seastar-dev@googlegroups.com
This patch add support for the summary type on the metrics layer.

A summary is a different kind of histogram, it's buckets are percentile
so the reporting layer (i.e. Prometheus for example) would know to
report it correctly.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
include/seastar/core/metrics.hh | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
index 450db26944..8d8dca3a8f 100644
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh
@@ -256,6 +256,7 @@ enum class data_type : uint8_t {
REAL_COUNTER,
GAUGE,
HISTOGRAM,
+ SUMMARY,
};

template <bool callable, typename T>
@@ -595,6 +596,18 @@ impl::metric_definition_impl make_histogram(metric_name_type name,
return {name, {impl::data_type::HISTOGRAM, "histogram"}, make_function(std::forward<T>(val), impl::data_type::HISTOGRAM), d, {}};
}

+/*!
+ * \brief create a summary metric.
+ *
+ * Summaries are a different kind of histograms. It reports in quantiles.
+ * For example, the p99 and p95 latencies.
+ */
+template<typename T>
+impl::metric_definition_impl make_summary(metric_name_type name,
+ description d, T&& val) {
+ return {name, {impl::data_type::SUMMARY, "summary"}, make_function(std::forward<T>(val), impl::data_type::SUMMARY), d, {}};
+}
+

/*!
* \brief create a total_bytes metric.
--
2.31.1

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 23, 2022, 10:39:58 AM6/23/22
to seastar-dev@googlegroups.com
Aggregate labels are a mechanism for reporting aggregated results. Most
commonly it allows to report one histogram per node instead of per
shard.

This patch adds an option to mark a metric with a vector of labels.
That vector will be part of the metric meta-data so the reporting
layer would be able to aggregate over it.

Skip when empty, means that metrics that are not in used, will not be
reported.

A common scenario is that user register a metrics, but that metrics is
never used.
The most common case is histogram and summary but it it can also happen
with counters.

This patch adds an option to mark a metric with skip_when_empty.
When done so, if a metric was never used (true for histogram, counters
and summary) it will not be reported.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
include/seastar/core/metrics.hh | 18 ++++++++++++++++--
include/seastar/core/metrics_api.hh | 10 +++++++---
src/core/metrics.cc | 26 +++++++++++++++++++++-----
3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
index 8d8dca3a8f..570cc3651e 100644
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh
@@ -341,7 +341,16 @@ class metric_value {
const histogram& get_histogram() const {
return std::get<histogram>(u);
}
-
+ /*!
+ * \brief return true if this is metrics was never used
+ *
+ * Histograms, Summaries and counters are ever growing by nature, so
+ * it possible to check if they have been used or not.
+ */
+ bool is_empty() const {
+ return ((_type == data_type::HISTOGRAM || _type == data_type::SUMMARY) && get_histogram().sample_count == 0) ||
+ ((_type == data_type::COUNTER || _type == data_type::REAL_COUNTER) && d() == 0);
+ }
private:
static void ulong_conversion_error(double d);
};
@@ -359,16 +368,21 @@ struct metric_definition_impl {
metric_function f;
description d;
bool enabled = true;
+ bool _skip_when_empty = false;
+ std::vector<std::string> aggregate_labels;
std::map<sstring, sstring> labels;
metric_definition_impl& operator ()(bool enabled);
metric_definition_impl& operator ()(const label_instance& label);
+ metric_definition_impl& aggregate(const std::vector<label>& labels);
+ metric_definition_impl& skip_when_empty(bool skip=true);
metric_definition_impl& set_type(const sstring& type_name);
metric_definition_impl(
metric_name_type name,
metric_type type,
metric_function f,
description d,
- std::vector<label_instance> labels);
+ std::vector<label_instance> labels,
+ std::vector<label> aggregate_labels = {});
};

class metric_groups_def {
diff --git a/include/seastar/core/metrics_api.hh b/include/seastar/core/metrics_api.hh
index 6843df47ab..c2e3e8e2fd 100644
--- a/include/seastar/core/metrics_api.hh
+++ b/include/seastar/core/metrics_api.hh
@@ -152,6 +152,7 @@ struct metric_family_info {
metric_type_def inherit_type;
description d;
sstring name;
+ std::vector<std::string> aggregate_labels;
};


@@ -161,6 +162,7 @@ struct metric_family_info {
struct metric_info {
metric_id id;
bool enabled;
+ bool skip_when_empty;
};


@@ -185,7 +187,7 @@ class registered_metric {
metric_function _f;
shared_ptr<impl> _impl;
public:
- registered_metric(metric_id id, metric_function f, bool enabled=true);
+ registered_metric(metric_id id, metric_function f, bool enabled=true, bool skip_when_empty=false);
virtual ~registered_metric() {}
virtual metric_value operator()() const {
return _f();
@@ -198,7 +200,9 @@ class registered_metric {
void set_enabled(bool b) {
_info.enabled = b;
}
-
+ void set_skip_when_empty(bool skip) {
+ _info.skip_when_empty = skip;
+ }
const metric_id& get_id() const {
return _info.id;
}
@@ -330,7 +334,7 @@ class impl {
return _value_map;
}

- void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled);
+ void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, bool skip_when_empty, const std::vector<std::string>& aggregate_labels);
void remove_registration(const metric_id& id);
future<> stop() {
return make_ready_future<>();
diff --git a/src/core/metrics.cc b/src/core/metrics.cc
index ea39b3e72d..cf8ef1a5af 100644
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc
@@ -118,9 +118,10 @@ bool label_instance::operator!=(const label_instance& id2) const {
label shard_label("shard");
namespace impl {

-registered_metric::registered_metric(metric_id id, metric_function f, bool enabled) :
+registered_metric::registered_metric(metric_id id, metric_function f, bool enabled, bool skip_when_empty) :
_f(f), _impl(get_local_impl()) {
_info.enabled = enabled;
+ set_skip_when_empty(skip_when_empty);
_info.id = id;
}

@@ -146,7 +147,8 @@ metric_definition_impl::metric_definition_impl(
metric_type type,
metric_function f,
description d,
- std::vector<label_instance> _labels)
+ std::vector<label_instance> _labels,
+ std::vector<label> _aggregate_labels)
: name(name), type(type), f(f)
, d(d), enabled(true) {
for (auto i: _labels) {
@@ -155,6 +157,7 @@ metric_definition_impl::metric_definition_impl(
if (labels.find(shard_label.name()) == labels.end()) {
labels[shard_label.name()] = shard();
}
+ aggregate(_aggregate_labels);
}

metric_definition_impl& metric_definition_impl::operator ()(bool _enabled) {
@@ -172,6 +175,18 @@ metric_definition_impl& metric_definition_impl::set_type(const sstring& type_nam
return *this;
}

+metric_definition_impl& metric_definition_impl::aggregate(const std::vector<label>& _labels) {
+ aggregate_labels.reserve(_labels.size());
+ std::transform(_labels.begin(), _labels.end(),std::back_inserter(aggregate_labels),
+ [](const label& l) { return l.name(); });
+ return *this;
+}
+
+metric_definition_impl& metric_definition_impl::skip_when_empty(bool skip) {
+ _skip_when_empty = skip;
+ return *this;
+}
+
std::unique_ptr<metric_groups_def> create_metric_groups() {
return std::make_unique<metric_groups_impl>();
}
@@ -186,7 +201,7 @@ metric_groups_impl& metric_groups_impl::add_metric(group_name_type name, const m

metric_id id(name, md._impl->name, md._impl->labels);

- get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled);
+ get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty, md._impl->aggregate_labels);

_registration.push_back(id);
return *this;
@@ -327,8 +342,8 @@ std::vector<std::vector<metric_function>>& impl::functions() {
return _current_metrics;
}

-void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled) {
- auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled);
+void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, bool skip_when_empty, const std::vector<std::string>& aggregate_labels) {
+ auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled, skip_when_empty);
sstring name = id.full_name();
if (_value_map.find(name) != _value_map.end()) {
auto& metric = _value_map[name];
@@ -344,6 +359,7 @@ void impl::add_registration(const metric_id& id, const metric_type& type, metric
_value_map[name].info().d = d;
_value_map[name].info().inherit_type = type.type_name;
_value_map[name].info().name = id.full_name();
+ _value_map[name].info().aggregate_labels = aggregate_labels;
_value_map[name][id.labels()] = rm;
}
dirty();
--
2.31.1

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 23, 2022, 10:40:00 AM6/23/22
to seastar-dev@googlegroups.com
This patch adds multiple functionality to Prometheus reporting:
1. Add summary reporting. Summaries are used in seastar to report aggregated
percentile information (for example p95 and p99)
The main usage is to report per-shard summary of a latency
histograms.
2. Support aggregated metrics. With an aggregated metrics,
Prometheus would aggregate multiple metrics based on labels and
would report the result. Usually this would be for reporting a single
latency histogram per node instead of per shard. But it could be used
for counters and gauge as well.
3. Skip empty counters, histograms and summaries. It's a common practice to
register lots of metrics even if they are not being used.
Histograms have a huge effect on performance, so not reporting an empty
histogram is a great performance boost both for the application and
for the Prometheus server.
This is true for Summaries and Counters as well, marking a metrics
with skip_when_empty would mean Prometheus will not report those
metrics.
4. As an optimization, the stringstream that is used per metric is
reused and clear insted of recreated.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
src/core/prometheus.cc | 173 ++++++++++++++++++++++++++++++++---------
1 file changed, 136 insertions(+), 37 deletions(-)

diff --git a/src/core/prometheus.cc b/src/core/prometheus.cc
index 937eec7572..20352c8459 100644
--- a/src/core/prometheus.cc
+++ b/src/core/prometheus.cc
@@ -51,6 +51,8 @@ static std::string to_str(seastar::metrics::impl::data_type dt) {
return "counter";
case seastar::metrics::impl::data_type::HISTOGRAM:
return "histogram";
+ case seastar::metrics::impl::data_type::SUMMARY:
+ return "summary";
}
return "untyped";
}
@@ -63,6 +65,7 @@ static std::string to_str(const seastar::metrics::impl::metric_value& v) {
case seastar::metrics::impl::data_type::COUNTER:
return std::to_string(v.i());
case seastar::metrics::impl::data_type::HISTOGRAM:
+ case seastar::metrics::impl::data_type::SUMMARY:
break;
}
return ""; // we should never get here but it makes the compiler happy
@@ -478,14 +481,123 @@ metric_family_range get_range(const metrics_families_per_shard& mf, const sstrin

}

+void write_histogram(std::stringstream& s, const config& ctx, const sstring& name, const seastar::metrics::histogram& h, std::map<sstring, sstring> labels) {
+ add_name(s, name + "_sum", labels, ctx);
+ s << h.sample_sum;
+ s << "\n";
+ add_name(s, name + "_count", labels, ctx);
+ s << h.sample_count;
+ s << "\n";
+
+ auto& le = labels["le"];
+ auto bucket = name + "_bucket";
+ for (auto i : h.buckets) {
+ le = std::to_string(i.upper_bound);
+ add_name(s, bucket, labels, ctx);
+ s << i.count;
+ s << "\n";
+ }
+ labels["le"] = "+Inf";
+ add_name(s, bucket, labels, ctx);
+ s << h.sample_count;
+ s << "\n";
+}
+
+void write_summary(std::stringstream& s, const config& ctx, const sstring& name, const seastar::metrics::histogram& h, std::map<sstring, sstring> labels) {
+ if (h.sample_sum) {
+ add_name(s, name + "_sum", labels, ctx);
+ s << h.sample_sum;
+ s << "\n";
+ }
+ if (h.sample_count) {
+ add_name(s, name + "_count", labels, ctx);
+ s << h.sample_count;
+ s << "\n";
+ }
+ auto& le = labels["quantile"];
+ for (auto i : h.buckets) {
+ le = std::to_string(i.upper_bound);
+ add_name(s, name, labels, ctx);
+ s << i.count;
+ s << "\n";
+ }
+}
+/*!
+ * \brief a helper class to aggregate metrics over labels
+ *
+ * This class sum multiple metrics based on a list of labels.
+ * It returns one or more metrics each aggregated by the aggregate_by labels.
+ *
+ * To use it, you define what labels it should aggregate by and then pass to
+ * it metrics with their labels.
+ * For example if a metrics has a 'shard' and 'name' labels and you aggregate by 'shard'
+ * it would return a map of metrics each with only the 'name' label
+ *
+ */
+class metric_aggregate_by_labels {
+ std::vector<std::string> _labels_to_aggregate_by;
+ std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value> _values;
+public:
+ metric_aggregate_by_labels(std::vector<std::string> labels) : _labels_to_aggregate_by(std::move(labels)) {
+ }
+ /*!
+ * \brief add a metric
+ *
+ * This method gets a metric and its labels and add it to the aggregated metric.
+ * For example, if a metric has the labels {'shard':'0', 'name':'myhist'} and we are aggregating
+ * over 'shard'
+ * The metric would be added to the aggregated metric with labels {'name':'myhist'}.
+ *
+ */
+ void add(const seastar::metrics::impl::metric_value& m, std::map<sstring, sstring> labels) {
+ for (auto&& l : _labels_to_aggregate_by) {
+ labels.erase(l);
+ }
+ if (_values.find(labels) == _values.end()) {
+ _values.emplace(labels, m);
+ } else {
+ _values[labels] += m;
+ }
+ }
+ const std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value>& get_values() const {
+ return _values;
+ }
+ bool empty() const {
+ return _values.empty();
+ }
+};
+
+std::string get_value_as_string(std::stringstream& s, const mi::metric_value& value) {
+ std::string value_str;
+ try {
+ value_str = to_str(value);
+ } catch (const std::range_error& e) {
+ seastar_logger.debug("prometheus: write_text_representation: {}: {}", s.str(), e.what());
+ value_str = "NaN";
+ } catch (...) {
+ auto ex = std::current_exception();
+ // print this error as it's ignored later on by `connection::start_response`
+ seastar_logger.error("prometheus: write_text_representation: {}: {}", s.str(), ex);
+ std::rethrow_exception(std::move(ex));
+ }
+ return value_str;
+}
+
future<> write_text_representation(output_stream<char>& out, const config& ctx, const metric_family_range& m) {
return seastar::async([&ctx, &out, &m] () mutable {
bool found = false;
+ std::stringstream s;
for (metric_family& metric_family : m) {
auto name = ctx.prefix + "_" + metric_family.name();
found = false;
- metric_family.foreach_metric([&out, &ctx, &found, &name, &metric_family](auto value, auto value_info) mutable {
- std::stringstream s;
+ metric_aggregate_by_labels aggregated_values(metric_family.metadata().aggregate_labels);
+ bool should_aggregate = !metric_family.metadata().aggregate_labels.empty();
+ metric_family.foreach_metric([&s, &out, &ctx, &found, &name, &metric_family, &aggregated_values, should_aggregate](auto value, auto value_info) mutable {
+ s.clear();
+ s.str("");
+ if (value_info.skip_when_empty && value.is_empty()) {
+ return;
+ }
if (!found) {
if (metric_family.metadata().d.str() != "") {
s << "# HELP " << name << " " << metric_family.metadata().d.str() << "\n";
@@ -493,48 +605,35 @@ future<> write_text_representation(output_stream<char>& out, const config& ctx,
s << "# TYPE " << name << " " << to_str(metric_family.metadata().type) << "\n";
found = true;
}
- if (value.type() == mi::data_type::HISTOGRAM) {
- auto&& h = value.get_histogram();
- std::map<sstring, sstring> labels = value_info.id.labels();
- add_name(s, name + "_sum", labels, ctx);
- s << h.sample_sum;
- s << "\n";
- add_name(s, name + "_count", labels, ctx);
- s << h.sample_count;
- s << "\n";
-
- auto& le = labels["le"];
- auto bucket = name + "_bucket";
- for (auto i : h.buckets) {
- le = std::to_string(i.upper_bound);
- add_name(s, bucket, labels, ctx);
- s << i.count;
- s << "\n";
- }
- labels["le"] = "+Inf";
- add_name(s, bucket, labels, ctx);
- s << h.sample_count;
- s << "\n";
+ if (should_aggregate) {
+ aggregated_values.add(value, value_info.id.labels());
+ } else if (value.type() == mi::data_type::SUMMARY) {
+ write_summary(s, ctx, name, value.get_histogram(), value_info.id.labels());
+ } else if (value.type() == mi::data_type::HISTOGRAM) {
+ write_histogram(s, ctx, name, value.get_histogram(), value_info.id.labels());
} else {
add_name(s, name, value_info.id.labels(), ctx);
- std::string value_str;
- try {
- value_str = to_str(value);
- } catch (const std::range_error& e) {
- seastar_logger.debug("prometheus: write_text_representation: {}: {}", s.str(), e.what());
- value_str = "NaN";
- } catch (...) {
- auto ex = std::current_exception();
- // print this error as it's ignored later on by `connection::start_response`
- seastar_logger.error("prometheus: write_text_representation: {}: {}", s.str(), ex);
- std::rethrow_exception(std::move(ex));
- }
- s << value_str;
+ s << get_value_as_string(s, value);
s << "\n";
}
out.write(s.str()).get();
thread::maybe_yield();
});
+ if (!aggregated_values.empty()) {
+ for (auto&& h : aggregated_values.get_values()) {
+ s.clear();
+ s.str("");
+ if (h.second.type() == mi::data_type::HISTOGRAM) {
+ write_histogram(s, ctx, name, h.second.get_histogram(), h.first);
+ } else {
+ add_name(s, name, h.first, ctx);
+ s << get_value_as_string(s, h.second);
+ s << "\n";
+ }
+ out.write(s.str()).get();
+ thread::maybe_yield();
+ }
+ }
}
});
}
--
2.31.1

Vlad Lazar

<vlad@redpanda.com>
unread,
Jun 23, 2022, 2:02:26 PM6/23/22
to seastar-dev
Nice patch set! Thank you for incorporating Ben's comments. I tested it against
our project and it all seems to work fine. I've got a small nit comment in line,
but otherwise looks good to me.

Should 'clear' be called on 'aggregate_labels' here? I know that 'aggregate' is only supposed to
be called on this object once, but it would be good to guard against misuse.

Ben Pope

<ben@redpanda.com>
unread,
Jun 24, 2022, 7:48:48 AM6/24/22
to seastar-dev@googlegroups.com
0 seems like a valid value. Perhaps `metric_value::u` could gain a
`std::monostate` and empty check?

> + }
> private:
> static void ulong_conversion_error(double d);
> };
> @@ -359,16 +368,21 @@ struct metric_definition_impl {
> metric_function f;
> description d;
> bool enabled = true;
> + bool _skip_when_empty = false;

This could be an `seastar::bool_class` to help prevent mistakes.

Ben Pope

<ben@redpanda.com>
unread,
Jun 24, 2022, 7:49:24 AM6/24/22
to seastar-dev@googlegroups.com


On 23/06/2022 15:39, 'Amnon Heiman' via seastar-dev wrote:
Is SUMMARY type missing here?

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 24, 2022, 10:40:59 AM6/24/22
to Ben Pope, seastar-dev
As explained in the header, for counters, histogram, real counter and summary, zero means it was never used.

All of them by definition have an ever growing counter (that's why it's not valid for gauge).


> +    }
>   private:
>       static void ulong_conversion_error(double d);
>   };
> @@ -359,16 +368,21 @@ struct metric_definition_impl {
>       metric_function f;
>       description d;
>       bool enabled = true;
> +    bool _skip_when_empty = false;

This could be an `seastar::bool_class` to help prevent mistakes.
It's an internal class, what kind of mistake could it be?

--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/374e347e-24a1-46b2-b702-7165dab25029%40redpanda.com.

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 24, 2022, 10:42:02 AM6/24/22
to Ben Pope, seastar-dev
By design, summaries cannot be aggregated, aggregating a summary returns a meaningless result.
 

> +                    } else {
> +                        add_name(s, name, h.first, ctx);
> +                        s << get_value_as_string(s, h.second);
> +                        s << "\n";
> +                    }
> +                    out.write(s.str()).get();
> +                    thread::maybe_yield();
> +                }
> +            }
>           }
>       });
>   }

--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.

Benny Halevy

<bhalevy@scylladb.com>
unread,
Jun 29, 2022, 2:25:02 AM6/29/22
to Amnon Heiman, seastar-dev@googlegroups.com
On Thu, 2022-06-23 at 17:39 +0300, 'Amnon Heiman' via seastar-dev wrote:
> This patch adds a missing part to how histograms are being aggregated,
> it needs to aggregate the sum and count as well.

If this patches fixes a bug, let's open a github issue for it
and mark it Fixed here so that it can tracked for backporting
into scylla seastar submodule release branches.

Benny Halevy

<bhalevy@scylladb.com>
unread,
Jun 29, 2022, 2:38:24 AM6/29/22
to Amnon Heiman, seastar-dev@googlegroups.com
On Thu, 2022-06-23 at 17:39 +0300, 'Amnon Heiman' via seastar-dev wrote:
nit: s/this is metrics/this metric/

> +     *
> +     * Histograms, Summaries and counters are ever growing by nature, so
> +     * it possible to check if they have been used or not.

nit: s/it/it is/

> +     */
> +    bool is_empty() const {

const noexcept please.

> +        return ((_type == data_type::HISTOGRAM || _type == data_type::SUMMARY) && get_histogram().sample_count == 0) ||
> +                ((_type == data_type::COUNTER || _type == data_type::REAL_COUNTER) && d() == 0);
> +    }
>  private:
>      static void ulong_conversion_error(double d);
>  };
> @@ -359,16 +368,21 @@ struct metric_definition_impl {
>      metric_function f;
>      description d;
>      bool enabled = true;
> +    bool _skip_when_empty = false;

I agree with other comment on this thread about
using a bool_class here.

> +    std::vector<std::string> aggregate_labels;
>      std::map<sstring, sstring> labels;
>      metric_definition_impl& operator ()(bool enabled);
>      metric_definition_impl& operator ()(const label_instance& label);
> +    metric_definition_impl& aggregate(const std::vector<label>& labels);
> +    metric_definition_impl& skip_when_empty(bool skip=true);

skip_when_empty sounds like a predicate.
Better call this method set_skip_when_empty like the respective registered_metric method.
can be noexcept
There's a bit too much going on in this patch.
Can you break into two for skip_when_empty vs. aggregate labels?

>          : name(name), type(type), f(f)
>          , d(d), enabled(true) {
>      for (auto i: _labels) {
> @@ -155,6 +157,7 @@ metric_definition_impl::metric_definition_impl(
>      if (labels.find(shard_label.name()) == labels.end()) {
>          labels[shard_label.name()] = shard();
>      }
> +    aggregate(_aggregate_labels);
>  }
>  
>  metric_definition_impl& metric_definition_impl::operator ()(bool _enabled) {
> @@ -172,6 +175,18 @@ metric_definition_impl& metric_definition_impl::set_type(const sstring& type_nam
>      return *this;
>  }
>  
> +metric_definition_impl& metric_definition_impl::aggregate(const std::vector<label>& _labels) {
> +    aggregate_labels.reserve(_labels.size());
> +    std::transform(_labels.begin(), _labels.end(),std::back_inserter(aggregate_labels),
> +            [](const label& l) { return l.name(); });
> +    return *this;
> +}
> +
> +metric_definition_impl& metric_definition_impl::skip_when_empty(bool skip) {

noexcept

Benny Halevy

<bhalevy@scylladb.com>
unread,
Jun 29, 2022, 2:58:14 AM6/29/22
to Amnon Heiman, seastar-dev@googlegroups.com
On Thu, 2022-06-23 at 17:39 +0300, 'Amnon Heiman' via seastar-dev wrote:
s << '\n' should be a bit more efficient.
That said, why not implement the above in a single statement, like:
s << h.sample_sum << '\n';
add empty line for consistency.

> +/*!
> + * \brief a helper class to aggregate metrics over labels
> + *
> + * This class sum multiple metrics based on a list of labels.
> + * It returns one or more metrics each aggregated by the aggregate_by labels.
> + *
> + * To use it, you define what labels it should aggregate by and then pass to
> + * it metrics with their labels.
> + * For example if a metrics has a 'shard' and 'name' labels and you aggregate by 'shard'
> + * it would return a map of metrics each with only the 'name' label
> + *
> + */
> +class metric_aggregate_by_labels {
> +    std::vector<std::string> _labels_to_aggregate_by;
> +    std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value> _values;
> +public:
> +    metric_aggregate_by_labels(std::vector<std::string>  labels) : _labels_to_aggregate_by(std::move(labels)) {
> +    }
> +    /*!
> +     * \brief add a metric
> +     *
> +     * This method gets a metric and its labels and add it to the aggregated metric.

nit: s/add/adds/

> +     * For example, if a metric has the labels {'shard':'0', 'name':'myhist'} and we are aggregating
> +     * over 'shard'
> +     * The metric would be added to the aggregated metric with labels {'name':'myhist'}.
> +     *
> +     */
> +    void add(const seastar::metrics::impl::metric_value& m, std::map<sstring, sstring> labels) {
> +        for (auto&& l : _labels_to_aggregate_by) {
> +            labels.erase(l);
> +        }
> +        if (_values.find(labels) == _values.end()) {
> +            _values.emplace(labels, m);

I think that passing std::move(labels) can avoid an extraneous copy here.

> +        } else {
> +            _values[labels] += m;

You can use the iterator returned by _values.find above.
The [] operator will need to search again as this could
be pretty heavy with enough labels.

> +        }
> +    }
> +    const std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value>& get_values() const {

noexcept

> +        return _values;
> +    }
> +    bool empty() const {

noexcept

> +        return _values.empty();
> +    }
> +};
> +
> +std::string get_value_as_string(std::stringstream& s, const mi::metric_value& value) {
> +    std::string value_str;
> +    try {
> +        value_str = to_str(value);
> +    } catch (const std::range_error& e) {
> +        seastar_logger.debug("prometheus: write_text_representation: {}: {}", s.str(), e.what());

But this function is get_value_as_string.
The error message can be confusing.

> +        value_str = "NaN";
> +    } catch (...) {
> +        auto ex = std::current_exception();
> +        // print this error as it's ignored later on by `connection::start_response`
> +        seastar_logger.error("prometheus: write_text_representation: {}: {}", s.str(), ex);

same here
s << '\n' as commented above.

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 29, 2022, 10:35:21 AM6/29/22
to seastar-dev@googlegroups.com
Replaces: Add summary metrics support

Motivation:
Histograms are the Prometheus killer. They are big to send over the
network and are big to store in Prometheus.
Histograms' size is always an issue, but with Seastar, it becomes even trickier.
Typically, each shard would collect its histograms, so the overall data
is multiplied by the number of shards.

This series addresses the need to report quantile information like latency
without generating massive metrics reports.

A summary is a Prometheus metric type that holds a quantile summary (i.e.
p95, p99).

The downside of summaries is that they cannot be aggregated, which is
needed for a distributed system (i.e., calculate the p99 latency of a cluster).

The series adds four tools for Prometheus performance:
1. Add summaries.
2. Optionally, remove empty metrics. It's common to register metrics for
optional services. It is now possible to mark those metrics as
skip_when_empty and they will not be reported if they were never used.
3. Allow aggregating metrics. The most common case is reporting per-node
histogram instead of per shard. For example, for multi-nodes quantile calculation,
we need a per-node histogram. It is now possible to mark a registered
metric for aggregation. The metrics layer will aggregate this
metric based on a list of labels. (Typically, this will be by shard,
but it could be any other combination of labels).
4. Reuse the stringstream instead of recreating an object on each
iteration.

git https://github.com/amnonh/seastar/tree/metrics_summary

V5
Main changes:
* split the aggregate and skip_on_empty into two patches
* Use bool_class for skip_when_empty this also allows using the parenthesis
operator to mark that a metric should use skip_when_empty.
* Fix some typos in the comments
* Add noexcept when needed
* Change "\n" to '\n' when possible.

V4
Aggregation and skip_when_empty are now generalized in a sense that it's
a per metric configuration and can be used on any metric, not just
histogram.

The 'aggregate by' now accept labels instead of string which makes the API
cleaner.

The stringstream that is used inside Prometheus text reporting is now
reused instead of recreating on each iteration.

The aggregator object in Prometheus was rename and some comments were
added to make its use clearer.

V3
The main issue with the previous version was that it was too specific.
This version breaks the functionality and pushes some of the logic to
whoever uses the metrics layer.

Summaries are now just another kind of histogram, with no special magic
around them and no need for their own type.

A user can use any combination of labels for aggregation.
It's currently only applied to histograms, but we can do it with other
metrics if it will be helpful.

In the current implementation, to report a summary per shard and a single
histogram per node, the user of the metrics layer would register two metrics,
one for the summary and one for the histogram.


Amnon Heiman (5):
metrics.hh: Support SUMMARY type
metrics.cc: missing count and sum when aggregating histograms
metrics: support skip_when_empty
metrics: support aggregate by labels
prometheus.cc: Optimize reporting

include/seastar/core/metrics.hh | 33 +++++-
include/seastar/core/metrics_api.hh | 10 +-
src/core/metrics.cc | 33 +++++-
src/core/prometheus.cc | 169 +++++++++++++++++++++-------
4 files changed, 198 insertions(+), 47 deletions(-)

--
2.31.1

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 29, 2022, 10:35:22 AM6/29/22
to seastar-dev@googlegroups.com
This patch add support for the summary type on the metrics layer.

A summary is a different kind of histogram, it's buckets are percentile
so the reporting layer (i.e. Prometheus for example) would know to
report it correctly.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
include/seastar/core/metrics.hh | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
index 450db26944..8d8dca3a8f 100644
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 29, 2022, 10:35:24 AM6/29/22
to seastar-dev@googlegroups.com
This patch adds a missing part to how histograms are being aggregated,
it needs to aggregate the sum and count as well.

Fixes #1114

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
src/core/metrics.cc | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/core/metrics.cc b/src/core/metrics.cc
index 98a14874a0..ea39b3e72d 100644
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 29, 2022, 10:35:24 AM6/29/22
to seastar-dev@googlegroups.com
Skip when empty, means that metrics that are not in used, will not be
reported.

A common scenario is that a user register a metric, but that metric is
never used.
The most common case is histogram and summary but it is also valid
for counters.

This patch adds an option to mark a metric with skip_when_empty.
When done so, if a metric was never used (true for histogram, counters
and summaries) it will not be reported.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
include/seastar/core/metrics.hh | 15 +++++++++++++++
include/seastar/core/metrics_api.hh | 9 ++++++---
src/core/metrics.cc | 19 +++++++++++++++----
3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
index 8d8dca3a8f..3e91d779b0 100644
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh
@@ -30,6 +30,7 @@
#include <map>
#include <seastar/core/metrics_types.hh>
#include <seastar/util/std-compat.hh>
+#include <seastar/util/bool_class.hh>

/*! \file metrics.hh
* \brief header for metrics creation.
@@ -105,6 +106,7 @@ class double_registration : public std::runtime_error {
using metric_type_def = sstring; /*!< Used to hold an inherit type (like bytes)*/
using metric_name_type = sstring; /*!< The metric name'*/
using instance_id_type = sstring; /*!< typically used for the shard id*/
+using skip_when_empty = bool_class<class skip_when_empty_tag>;

/*!
* \brief Human-readable description of a metric/group.
@@ -342,6 +344,16 @@ class metric_value {
return std::get<histogram>(u);
}

+ /*!
+ * \brief return true if this metric was never used
+ *
+ * Histograms, Summaries and counters are ever growing by nature, so
+ * it is possible to check if they have been used or not.
+ */
+ bool is_empty() const noexcept {
+ return ((_type == data_type::HISTOGRAM || _type == data_type::SUMMARY) && get_histogram().sample_count == 0) ||
+ ((_type == data_type::COUNTER || _type == data_type::REAL_COUNTER) && d() == 0);
+ }
private:
static void ulong_conversion_error(double d);
};
@@ -359,9 +371,12 @@ struct metric_definition_impl {
metric_function f;
description d;
bool enabled = true;
+ skip_when_empty _skip_when_empty = skip_when_empty::no;
std::map<sstring, sstring> labels;
metric_definition_impl& operator ()(bool enabled);
metric_definition_impl& operator ()(const label_instance& label);
+ metric_definition_impl& operator ()(skip_when_empty skip) noexcept;
+ metric_definition_impl& set_skip_when_empty(bool skip=true) noexcept;
metric_definition_impl& set_type(const sstring& type_name);
metric_definition_impl(
metric_name_type name,
diff --git a/include/seastar/core/metrics_api.hh b/include/seastar/core/metrics_api.hh
index 6843df47ab..6cf2d68178 100644
--- a/include/seastar/core/metrics_api.hh
+++ b/include/seastar/core/metrics_api.hh
@@ -161,6 +161,7 @@ struct metric_family_info {
struct metric_info {
metric_id id;
bool enabled;
+ skip_when_empty skip_when_empty;
};


@@ -185,7 +186,7 @@ class registered_metric {
metric_function _f;
shared_ptr<impl> _impl;
public:
- registered_metric(metric_id id, metric_function f, bool enabled=true);
+ registered_metric(metric_id id, metric_function f, bool enabled=true, skip_when_empty skip=skip_when_empty::no);
virtual ~registered_metric() {}
virtual metric_value operator()() const {
return _f();
@@ -198,7 +199,9 @@ class registered_metric {
void set_enabled(bool b) {
_info.enabled = b;
}
-
+ void set_skip_when_empty(skip_when_empty skip) noexcept {
+ _info.skip_when_empty = skip;
+ }
const metric_id& get_id() const {
return _info.id;
}
@@ -330,7 +333,7 @@ class impl {
return _value_map;
}

- void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled);
+ void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip);
void remove_registration(const metric_id& id);
future<> stop() {
return make_ready_future<>();
diff --git a/src/core/metrics.cc b/src/core/metrics.cc
index ea39b3e72d..b26248e9d8 100644
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc
@@ -118,9 +118,10 @@ bool label_instance::operator!=(const label_instance& id2) const {
label shard_label("shard");
namespace impl {

-registered_metric::registered_metric(metric_id id, metric_function f, bool enabled) :
+registered_metric::registered_metric(metric_id id, metric_function f, bool enabled, skip_when_empty skip) :
_f(f), _impl(get_local_impl()) {
_info.enabled = enabled;
+ _info.skip_when_empty = skip;
_info.id = id;
}

@@ -167,11 +168,21 @@ metric_definition_impl& metric_definition_impl::operator ()(const label_instance
return *this;
}

+metric_definition_impl& metric_definition_impl::operator ()(skip_when_empty skip) noexcept {
+ _skip_when_empty = skip;
+ return *this;
+}
+
metric_definition_impl& metric_definition_impl::set_type(const sstring& type_name) {
type.type_name = type_name;
return *this;
}

+metric_definition_impl& metric_definition_impl::set_skip_when_empty(bool skip) noexcept {
+ _skip_when_empty = skip_when_empty(skip);
+ return *this;
+}
+
std::unique_ptr<metric_groups_def> create_metric_groups() {
return std::make_unique<metric_groups_impl>();
}
@@ -186,7 +197,7 @@ metric_groups_impl& metric_groups_impl::add_metric(group_name_type name, const m

metric_id id(name, md._impl->name, md._impl->labels);

- get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled);
+ get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty);

_registration.push_back(id);
return *this;
@@ -327,8 +338,8 @@ std::vector<std::vector<metric_function>>& impl::functions() {
return _current_metrics;
}

-void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled) {
- auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled);
+void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip) {
+ auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled, skip);
sstring name = id.full_name();
if (_value_map.find(name) != _value_map.end()) {
auto& metric = _value_map[name];
--
2.31.1

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 29, 2022, 10:35:25 AM6/29/22
to seastar-dev@googlegroups.com
Aggregate by labels is a mechanism for reporting aggregated results. Most
commonly it is used to report one histogram per node instead of per
shard, but it can be used for counters, and gauges.

This patch adds an option to mark a metric with a vector of labels.
That vector will be part of the metric meta-data so the reporting
layer would be able to aggregate over it.
---
include/seastar/core/metrics.hh | 5 ++++-
include/seastar/core/metrics_api.hh | 3 ++-
src/core/metrics.cc | 16 +++++++++++++---
3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
index 3e91d779b0..1902b4e08b 100644
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh
@@ -372,10 +372,12 @@ struct metric_definition_impl {
description d;
bool enabled = true;
skip_when_empty _skip_when_empty = skip_when_empty::no;
+ std::vector<std::string> aggregate_labels;
std::map<sstring, sstring> labels;
metric_definition_impl& operator ()(bool enabled);
metric_definition_impl& operator ()(const label_instance& label);
metric_definition_impl& operator ()(skip_when_empty skip) noexcept;
+ metric_definition_impl& aggregate(const std::vector<label>& labels) noexcept;
metric_definition_impl& set_skip_when_empty(bool skip=true) noexcept;
metric_definition_impl& set_type(const sstring& type_name);
metric_definition_impl(
@@ -383,7 +385,8 @@ struct metric_definition_impl {
metric_type type,
metric_function f,
description d,
- std::vector<label_instance> labels);
+ std::vector<label_instance> labels,
+ std::vector<label> aggregate_labels = {});
};

class metric_groups_def {
diff --git a/include/seastar/core/metrics_api.hh b/include/seastar/core/metrics_api.hh
index 6cf2d68178..a7dd0b52f9 100644
--- a/include/seastar/core/metrics_api.hh
+++ b/include/seastar/core/metrics_api.hh
@@ -152,6 +152,7 @@ struct metric_family_info {
metric_type_def inherit_type;
description d;
sstring name;
+ std::vector<std::string> aggregate_labels;
};


@@ -333,7 +334,7 @@ class impl {
return _value_map;
}

- void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip);
+ void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip, const std::vector<std::string>& aggregate_labels);
void remove_registration(const metric_id& id);
future<> stop() {
return make_ready_future<>();
diff --git a/src/core/metrics.cc b/src/core/metrics.cc
index b26248e9d8..8c5e38337f 100644
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc
@@ -147,7 +147,8 @@ metric_definition_impl::metric_definition_impl(
metric_type type,
metric_function f,
description d,
- std::vector<label_instance> _labels)
+ std::vector<label_instance> _labels,
+ std::vector<label> _aggregate_labels)
: name(name), type(type), f(f)
, d(d), enabled(true) {
for (auto i: _labels) {
@@ -156,6 +157,7 @@ metric_definition_impl::metric_definition_impl(
if (labels.find(shard_label.name()) == labels.end()) {
labels[shard_label.name()] = shard();
}
+ aggregate(_aggregate_labels);
}

metric_definition_impl& metric_definition_impl::operator ()(bool _enabled) {
@@ -178,6 +180,13 @@ metric_definition_impl& metric_definition_impl::set_type(const sstring& type_nam
return *this;
}

+metric_definition_impl& metric_definition_impl::aggregate(const std::vector<label>& _labels) noexcept {
+ aggregate_labels.reserve(_labels.size());
+ std::transform(_labels.begin(), _labels.end(),std::back_inserter(aggregate_labels),
+ [](const label& l) { return l.name(); });
+ return *this;
+}
+
metric_definition_impl& metric_definition_impl::set_skip_when_empty(bool skip) noexcept {
_skip_when_empty = skip_when_empty(skip);
return *this;
@@ -197,7 +206,7 @@ metric_groups_impl& metric_groups_impl::add_metric(group_name_type name, const m

metric_id id(name, md._impl->name, md._impl->labels);

- get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty);
+ get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty, md._impl->aggregate_labels);

_registration.push_back(id);
return *this;
@@ -338,7 +347,7 @@ std::vector<std::vector<metric_function>>& impl::functions() {
return _current_metrics;
}

-void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip) {
+void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip, const std::vector<std::string>& aggregate_labels) {
auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled, skip);
sstring name = id.full_name();
if (_value_map.find(name) != _value_map.end()) {
@@ -355,6 +364,7 @@ void impl::add_registration(const metric_id& id, const metric_type& type, metric

Amnon Heiman

<amnon@scylladb.com>
unread,
Jun 29, 2022, 10:35:26 AM6/29/22
to seastar-dev@googlegroups.com
This patch adds multiple functionality to Prometheus reporting:
1. Add summary reporting. Summaries are used in seastar to report aggregated
percentile information (for example p95 and p99)
The main usage is to report per-shard summary of a latency
histograms.
2. Support aggregated metrics. With an aggregated metrics,
Prometheus would aggregate multiple metrics based on labels and
would report the result. Usually this would be for reporting a single
latency histogram per node instead of per shard. But it could be used
for counters and gauge as well.
3. Skip empty counters, histograms and summaries. It's a common practice to
register lots of metrics even if they are not being used.
Histograms have a huge effect on performance, so not reporting an empty
histogram is a great performance boost both for the application and
for the Prometheus server.
This is true for Summaries and Counters as well, marking a metric
with skip_when_empty would mean Prometheus will not report the
metric if it was never used.
4. As an optimization, the stringstream that is used per metric is
reused and clear insted of recreated.

Signed-off-by: Amnon Heiman <am...@scylladb.com>
---
src/core/prometheus.cc | 168 +++++++++++++++++++++++++++++++----------
1 file changed, 130 insertions(+), 38 deletions(-)

diff --git a/src/core/prometheus.cc b/src/core/prometheus.cc
index 937eec7572..eeb995dac8 100644
--- a/src/core/prometheus.cc
+++ b/src/core/prometheus.cc
@@ -51,6 +51,8 @@ static std::string to_str(seastar::metrics::impl::data_type dt) {
return "counter";
case seastar::metrics::impl::data_type::HISTOGRAM:
return "histogram";
+ case seastar::metrics::impl::data_type::SUMMARY:
+ return "summary";
}
return "untyped";
}
@@ -63,6 +65,7 @@ static std::string to_str(const seastar::metrics::impl::metric_value& v) {
case seastar::metrics::impl::data_type::COUNTER:
return std::to_string(v.i());
case seastar::metrics::impl::data_type::HISTOGRAM:
+ case seastar::metrics::impl::data_type::SUMMARY:
break;
}
return ""; // we should never get here but it makes the compiler happy
@@ -478,14 +481,118 @@ metric_family_range get_range(const metrics_families_per_shard& mf, const sstrin

}

+void write_histogram(std::stringstream& s, const config& ctx, const sstring& name, const seastar::metrics::histogram& h, std::map<sstring, sstring> labels) noexcept {
+ add_name(s, name + "_sum", labels, ctx);
+ s << h.sample_sum << '\n';
+
+ add_name(s, name + "_count", labels, ctx);
+ s << h.sample_count << '\n';
+
+ auto& le = labels["le"];
+ auto bucket = name + "_bucket";
+ for (auto i : h.buckets) {
+ le = std::to_string(i.upper_bound);
+ add_name(s, bucket, labels, ctx);
+ s << i.count << '\n';
+ }
+ labels["le"] = "+Inf";
+ add_name(s, bucket, labels, ctx);
+ s << h.sample_count << '\n';
+}
+
+void write_summary(std::stringstream& s, const config& ctx, const sstring& name, const seastar::metrics::histogram& h, std::map<sstring, sstring> labels) noexcept {
+ if (h.sample_sum) {
+ add_name(s, name + "_sum", labels, ctx);
+ s << h.sample_sum << '\n';
+ }
+ if (h.sample_count) {
+ add_name(s, name + "_count", labels, ctx);
+ s << h.sample_count << '\n';
+ }
+ auto& le = labels["quantile"];
+ for (auto i : h.buckets) {
+ le = std::to_string(i.upper_bound);
+ add_name(s, name, labels, ctx);
+ s << i.count << '\n';
+ }
+}
+/*!
+ * \brief a helper class to aggregate metrics over labels
+ *
+ * This class sum multiple metrics based on a list of labels.
+ * It returns one or more metrics each aggregated by the aggregate_by labels.
+ *
+ * To use it, you define what labels it should aggregate by and then pass to
+ * it metrics with their labels.
+ * For example if a metrics has a 'shard' and 'name' labels and you aggregate by 'shard'
+ * it would return a map of metrics each with only the 'name' label
+ *
+ */
+class metric_aggregate_by_labels {
+ std::vector<std::string> _labels_to_aggregate_by;
+ std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value> _values;
+public:
+ metric_aggregate_by_labels(std::vector<std::string> labels) : _labels_to_aggregate_by(std::move(labels)) {
+ }
+ /*!
+ * \brief add a metric
+ *
+ * This method gets a metric and its labels and adds it to the aggregated metric.
+ * For example, if a metric has the labels {'shard':'0', 'name':'myhist'} and we are aggregating
+ * over 'shard'
+ * The metric would be added to the aggregated metric with labels {'name':'myhist'}.
+ *
+ */
+ void add(const seastar::metrics::impl::metric_value& m, std::map<sstring, sstring> labels) noexcept {
+ for (auto&& l : _labels_to_aggregate_by) {
+ labels.erase(l);
+ }
+ std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value>::iterator i = _values.find(labels);
+ if ( i == _values.end()) {
+ _values.emplace(std::move(labels), m);
+ } else {
+ i->second += m;
+ }
+ }
+ const std::unordered_map<std::map<sstring, sstring>, seastar::metrics::impl::metric_value>& get_values() const noexcept {
+ return _values;
+ }
+ bool empty() const noexcept {
+ return _values.empty();
+ }
+};
+
+std::string get_value_as_string(std::stringstream& s, const mi::metric_value& value) noexcept {
+ std::string value_str;
+ try {
+ value_str = to_str(value);
+ } catch (const std::range_error& e) {
+ seastar_logger.debug("prometheus: get_value_as_string: {}: {}", s.str(), e.what());
+ value_str = "NaN";
+ } catch (...) {
+ auto ex = std::current_exception();
+ // print this error as it's ignored later on by `connection::start_response`
+ seastar_logger.error("prometheus: get_value_as_string: {}: {}", s.str(), ex);
@@ -493,48 +600,33 @@ future<> write_text_representation(output_stream<char>& out, const config& ctx,
s << "# TYPE " << name << " " << to_str(metric_family.metadata().type) << "\n";
found = true;
}
- if (value.type() == mi::data_type::HISTOGRAM) {
- auto&& h = value.get_histogram();
- std::map<sstring, sstring> labels = value_info.id.labels();
- s << "\n";
+ s << get_value_as_string(s, value) << '\n';
}
out.write(s.str()).get();
thread::maybe_yield();
});
+ if (!aggregated_values.empty()) {
+ for (auto&& h : aggregated_values.get_values()) {
+ s.clear();
+ s.str("");
+ if (h.second.type() == mi::data_type::HISTOGRAM) {
+ write_histogram(s, ctx, name, h.second.get_histogram(), h.first);
+ } else {
+ add_name(s, name, h.first, ctx);
+ s << get_value_as_string(s, h.second) << '\n';

Benny Halevy

<bhalevy@scylladb.com>
unread,
Jun 29, 2022, 11:28:25 AM6/29/22
to Amnon Heiman, seastar-dev@googlegroups.com
LGTM

Nadav Har'El

<nyh@scylladb.com>
unread,
Jun 30, 2022, 4:23:27 AM6/30/22
to Amnon Heiman, seastar-dev
I see (in clang 14.0.0 on my shiny new Fedora 36) compilation errors:

                 from /home/nyh/seastar.maintainer/src/core/reactor.cc:41:
/home/nyh/seastar.maintainer/include/seastar/core/metrics_api.hh:165:21: error: declaration of ‘seastar::metrics::skip_when_empty seastar::metrics::impl::metric_info::skip_when_empty’ changes meaning of ‘skip_when_empty’ [-fpermissive]
  165 |     skip_when_empty skip_when_empty;
      |                     ^~~~~~~~~~~~~~~
In file included from /home/nyh/seastar.maintainer/include/seastar/core/metrics_api.hh:24:
/home/nyh/seastar.maintainer/include/seastar/core/metrics.hh:109:7: note: ‘skip_when_empty’ declared here as ‘using skip_when_empty = class seastar::bool_class<seastar::metrics::skip_when_empty_tag>’
  109 | using skip_when_empty = bool_class<class skip_when_empty_tag>;
      |       ^~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.

You can send the next version as a pull request :-)

--
Nadav Har'El
n...@scylladb.com


--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.

Nadav Har'El

<nyh@scylladb.com>
unread,
Jun 30, 2022, 4:41:12 AM6/30/22
to Amnon Heiman, seastar-dev
I didn't understand the goal of this patch, which sounds so general that I think I need a specific example...

If I have a per-shard metric, and want to aggregate it to a per-node metric, what does this new "aggregate_labels" contain?  The name of the "shard" label? Something else?
Is there any other use for this option that is not about aggregating shards into nodes?

--
Nadav Har'El
n...@scylladb.com

On Wed, Jun 29, 2022 at 5:35 PM Amnon Heiman <am...@scylladb.com> wrote:
--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.

Commit Bot

<bot@cloudius-systems.com>
unread,
Jun 30, 2022, 7:41:35 AM6/30/22
to seastar-dev@googlegroups.com, Amnon Heiman
From: Amnon Heiman <am...@scylladb.com>
Committer: Amnon Heiman <am...@scylladb.com>
Branch: master

metrics.hh: Support SUMMARY type

This patch add support for the summary type on the metrics layer.

A summary is a different kind of histogram, it's buckets are percentile
so the reporting layer (i.e. Prometheus for example) would know to
report it correctly.

Signed-off-by: Amnon Heiman <am...@scylladb.com>

---
diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh

Commit Bot

<bot@cloudius-systems.com>
unread,
Jun 30, 2022, 7:41:37 AM6/30/22
to seastar-dev@googlegroups.com, Amnon Heiman
From: Amnon Heiman <am...@scylladb.com>
Committer: Amnon Heiman <am...@scylladb.com>
Branch: master

metrics.cc: missing count and sum when aggregating histograms

This patch adds a missing part to how histograms are being aggregated,
it needs to aggregate the sum and count as well.

Fixes #1114

Signed-off-by: Amnon Heiman <am...@scylladb.com>

---
diff --git a/src/core/metrics.cc b/src/core/metrics.cc
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc

Commit Bot

<bot@cloudius-systems.com>
unread,
Jun 30, 2022, 7:41:38 AM6/30/22
to seastar-dev@googlegroups.com, Amnon Heiman
From: Amnon Heiman <am...@scylladb.com>
Committer: Amnon Heiman <am...@scylladb.com>
Branch: master

metrics: support skip_when_empty

Skip when empty, means that metrics that are not in used, will not be
reported.

A common scenario is that a user register a metric, but that metric is
never used.
The most common case is histogram and summary but it is also valid
for counters.

This patch adds an option to mark a metric with skip_when_empty.
When done so, if a metric was never used (true for histogram, counters
and summaries) it will not be reported.

Signed-off-by: Amnon Heiman <am...@scylladb.com>

---
diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh
@@ -30,6 +30,7 @@
#include <map>
#include <seastar/core/metrics_types.hh>
#include <seastar/util/std-compat.hh>
+#include <seastar/util/bool_class.hh>

/*! \file metrics.hh
* \brief header for metrics creation.
@@ -105,6 +106,7 @@ public:
using metric_type_def = sstring; /*!< Used to hold an inherit type (like bytes)*/
using metric_name_type = sstring; /*!< The metric name'*/
using instance_id_type = sstring; /*!< typically used for the shard id*/
+using skip_when_empty = bool_class<class skip_when_empty_tag>;

/*!
* \brief Human-readable description of a metric/group.
@@ -342,6 +344,16 @@ public:
return std::get<histogram>(u);
}

+ /*!
+ * \brief return true if this metric was never used
+ *
+ * Histograms, Summaries and counters are ever growing by nature, so
+ * it is possible to check if they have been used or not.
+ */
+ bool is_empty() const noexcept {
+ return ((_type == data_type::HISTOGRAM || _type == data_type::SUMMARY) && get_histogram().sample_count == 0) ||
+ ((_type == data_type::COUNTER || _type == data_type::REAL_COUNTER) && d() == 0);
+ }
private:
static void ulong_conversion_error(double d);
};
@@ -359,9 +371,12 @@ struct metric_definition_impl {
metric_function f;
description d;
bool enabled = true;
+ skip_when_empty _skip_when_empty = skip_when_empty::no;
std::map<sstring, sstring> labels;
metric_definition_impl& operator ()(bool enabled);
metric_definition_impl& operator ()(const label_instance& label);
+ metric_definition_impl& operator ()(skip_when_empty skip) noexcept;
+ metric_definition_impl& set_skip_when_empty(bool skip=true) noexcept;
metric_definition_impl& set_type(const sstring& type_name);
metric_definition_impl(
metric_name_type name,
diff --git a/include/seastar/core/metrics_api.hh b/include/seastar/core/metrics_api.hh
--- a/include/seastar/core/metrics_api.hh
+++ b/include/seastar/core/metrics_api.hh
@@ -161,6 +161,7 @@ struct metric_family_info {
struct metric_info {
metric_id id;
bool enabled;
+ skip_when_empty should_skip_when_empty;
};


@@ -185,7 +186,7 @@ class registered_metric {
metric_function _f;
shared_ptr<impl> _impl;
public:
- registered_metric(metric_id id, metric_function f, bool enabled=true);
+ registered_metric(metric_id id, metric_function f, bool enabled=true, skip_when_empty skip=skip_when_empty::no);
virtual ~registered_metric() {}
virtual metric_value operator()() const {
return _f();
@@ -198,7 +199,9 @@ public:
void set_enabled(bool b) {
_info.enabled = b;
}
-
+ void set_skip_when_empty(skip_when_empty skip) noexcept {
+ _info.should_skip_when_empty = skip;
+ }
const metric_id& get_id() const {
return _info.id;
}
@@ -330,7 +333,7 @@ public:
return _value_map;
}

- void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled);
+ void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip);
void remove_registration(const metric_id& id);
future<> stop() {
return make_ready_future<>();
diff --git a/src/core/metrics.cc b/src/core/metrics.cc
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc
@@ -118,9 +118,10 @@ bool label_instance::operator!=(const label_instance& id2) const {
label shard_label("shard");
namespace impl {

-registered_metric::registered_metric(metric_id id, metric_function f, bool enabled) :
+registered_metric::registered_metric(metric_id id, metric_function f, bool enabled, skip_when_empty skip) :
_f(f), _impl(get_local_impl()) {
_info.enabled = enabled;
+ _info.should_skip_when_empty = skip;
_info.id = id;
}

@@ -167,11 +168,21 @@ metric_definition_impl& metric_definition_impl::operator ()(const label_instance
return *this;
}

+metric_definition_impl& metric_definition_impl::operator ()(skip_when_empty skip) noexcept {
+ _skip_when_empty = skip;
+ return *this;
+}
+
metric_definition_impl& metric_definition_impl::set_type(const sstring& type_name) {
type.type_name = type_name;
return *this;
}

+metric_definition_impl& metric_definition_impl::set_skip_when_empty(bool skip) noexcept {
+ _skip_when_empty = skip_when_empty(skip);
+ return *this;
+}
+
std::unique_ptr<metric_groups_def> create_metric_groups() {
return std::make_unique<metric_groups_impl>();
}
@@ -186,7 +197,7 @@ metric_groups_impl& metric_groups_impl::add_metric(group_name_type name, const m

metric_id id(name, md._impl->name, md._impl->labels);

- get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled);
+ get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty);

_registration.push_back(id);
return *this;
@@ -327,8 +338,8 @@ std::vector<std::vector<metric_function>>& impl::functions() {
return _current_metrics;
}

-void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled) {
- auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled);
+void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip) {
+ auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled, skip);
sstring name = id.full_name();
if (_value_map.find(name) != _value_map.end()) {

Commit Bot

<bot@cloudius-systems.com>
unread,
Jun 30, 2022, 7:41:38 AM6/30/22
to seastar-dev@googlegroups.com, Amnon Heiman
From: Amnon Heiman <am...@scylladb.com>
Committer: Amnon Heiman <am...@scylladb.com>
Branch: master

metrics: support aggregate by labels

Aggregate by labels is a mechanism for reporting aggregated results. Most
commonly it is used to report one histogram per node instead of per
shard, but it can be used for counters, and gauges.

This patch adds an option to mark a metric with a vector of labels.
That vector will be part of the metric meta-data so the reporting
layer would be able to aggregate over it.

---
diff --git a/include/seastar/core/metrics.hh b/include/seastar/core/metrics.hh
--- a/include/seastar/core/metrics.hh
+++ b/include/seastar/core/metrics.hh
@@ -372,18 +372,21 @@ struct metric_definition_impl {
description d;
bool enabled = true;
skip_when_empty _skip_when_empty = skip_when_empty::no;
+ std::vector<std::string> aggregate_labels;
std::map<sstring, sstring> labels;
metric_definition_impl& operator ()(bool enabled);
metric_definition_impl& operator ()(const label_instance& label);
metric_definition_impl& operator ()(skip_when_empty skip) noexcept;
+ metric_definition_impl& aggregate(const std::vector<label>& labels) noexcept;
metric_definition_impl& set_skip_when_empty(bool skip=true) noexcept;
metric_definition_impl& set_type(const sstring& type_name);
metric_definition_impl(
metric_name_type name,
metric_type type,
metric_function f,
description d,
- std::vector<label_instance> labels);
+ std::vector<label_instance> labels,
+ std::vector<label> aggregate_labels = {});
};

class metric_groups_def {
diff --git a/include/seastar/core/metrics_api.hh b/include/seastar/core/metrics_api.hh
--- a/include/seastar/core/metrics_api.hh
+++ b/include/seastar/core/metrics_api.hh
@@ -152,6 +152,7 @@ struct metric_family_info {
metric_type_def inherit_type;
description d;
sstring name;
+ std::vector<std::string> aggregate_labels;
};


@@ -333,7 +334,7 @@ public:
return _value_map;
}

- void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip);
+ void add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip, const std::vector<std::string>& aggregate_labels);
void remove_registration(const metric_id& id);
future<> stop() {
return make_ready_future<>();
diff --git a/src/core/metrics.cc b/src/core/metrics.cc
--- a/src/core/metrics.cc
+++ b/src/core/metrics.cc
@@ -147,7 +147,8 @@ metric_definition_impl::metric_definition_impl(
metric_type type,
metric_function f,
description d,
- std::vector<label_instance> _labels)
+ std::vector<label_instance> _labels,
+ std::vector<label> _aggregate_labels)
: name(name), type(type), f(f)
, d(d), enabled(true) {
for (auto i: _labels) {
@@ -156,6 +157,7 @@ metric_definition_impl::metric_definition_impl(
if (labels.find(shard_label.name()) == labels.end()) {
labels[shard_label.name()] = shard();
}
+ aggregate(_aggregate_labels);
}

metric_definition_impl& metric_definition_impl::operator ()(bool _enabled) {
@@ -178,6 +180,13 @@ metric_definition_impl& metric_definition_impl::set_type(const sstring& type_nam
return *this;
}

+metric_definition_impl& metric_definition_impl::aggregate(const std::vector<label>& _labels) noexcept {
+ aggregate_labels.reserve(_labels.size());
+ std::transform(_labels.begin(), _labels.end(),std::back_inserter(aggregate_labels),
+ [](const label& l) { return l.name(); });
+ return *this;
+}
+
metric_definition_impl& metric_definition_impl::set_skip_when_empty(bool skip) noexcept {
_skip_when_empty = skip_when_empty(skip);
return *this;
@@ -197,7 +206,7 @@ metric_groups_impl& metric_groups_impl::add_metric(group_name_type name, const m

metric_id id(name, md._impl->name, md._impl->labels);

- get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty);
+ get_local_impl()->add_registration(id, md._impl->type, md._impl->f, md._impl->d, md._impl->enabled, md._impl->_skip_when_empty, md._impl->aggregate_labels);

_registration.push_back(id);
return *this;
@@ -338,7 +347,7 @@ std::vector<std::vector<metric_function>>& impl::functions() {
return _current_metrics;
}

-void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip) {
+void impl::add_registration(const metric_id& id, const metric_type& type, metric_function f, const description& d, bool enabled, skip_when_empty skip, const std::vector<std::string>& aggregate_labels) {
auto rm = ::seastar::make_shared<registered_metric>(id, f, enabled, skip);
sstring name = id.full_name();
if (_value_map.find(name) != _value_map.end()) {

Commit Bot

<bot@cloudius-systems.com>
unread,
Jun 30, 2022, 7:41:40 AM6/30/22
to seastar-dev@googlegroups.com, Amnon Heiman
From: Amnon Heiman <am...@scylladb.com>
Committer: Amnon Heiman <am...@scylladb.com>
Branch: master

prometheus.cc: Optimize reporting

This patch adds multiple functionality to Prometheus reporting:
1. Add summary reporting. Summaries are used in seastar to report aggregated
percentile information (for example p95 and p99)
The main usage is to report per-shard summary of a latency
histograms.
2. Support aggregated metrics. With an aggregated metrics,
Prometheus would aggregate multiple metrics based on labels and
would report the result. Usually this would be for reporting a single
latency histogram per node instead of per shard. But it could be used
for counters and gauge as well.
3. Skip empty counters, histograms and summaries. It's a common practice to
register lots of metrics even if they are not being used.
Histograms have a huge effect on performance, so not reporting an empty
histogram is a great performance boost both for the application and
for the Prometheus server.
This is true for Summaries and Counters as well, marking a metric
with skip_when_empty would mean Prometheus will not report the
metric if it was never used.
4. As an optimization, the stringstream that is used per metric is
reused and clear insted of recreated.

Signed-off-by: Amnon Heiman <am...@scylladb.com>

---
diff --git a/src/core/prometheus.cc b/src/core/prometheus.cc
--- a/src/core/prometheus.cc
+++ b/src/core/prometheus.cc
@@ -51,6 +51,8 @@ static std::string to_str(seastar::metrics::impl::data_type dt) {
return "counter";
case seastar::metrics::impl::data_type::HISTOGRAM:
return "histogram";
+ case seastar::metrics::impl::data_type::SUMMARY:
+ return "summary";
}
return "untyped";
}
@@ -63,6 +65,7 @@ static std::string to_str(const seastar::metrics::impl::metric_value& v) {
case seastar::metrics::impl::data_type::COUNTER:
return std::to_string(v.i());
case seastar::metrics::impl::data_type::HISTOGRAM:
+ case seastar::metrics::impl::data_type::SUMMARY:
break;
}
return ""; // we should never get here but it makes the compiler happy
@@ -478,63 +481,152 @@ metric_family_range get_range(const metrics_families_per_shard& mf, const sstrin
+ /*!
+ if (value_info.should_skip_when_empty && value.is_empty()) {
+ return;
+ }
if (!found) {
if (metric_family.metadata().d.str() != "") {
s << "# HELP " << name << " " << metric_family.metadata().d.str() << "\n";
}
Reply all
Reply to author
Forward
0 new messages