Greetings SIG-Auth folks,
During today's biweekly meeting, we discussed the possible deprecation and removal of the apiserver_envelope_encryption_key_id_hash_total metric. I proposed removing it for four reasons. First, it no longer works as intended. Second, it causes a performance bottleneck due to lock contention. Third, the information the metric was meant to convey can be obtained by other means. Fourth, we do not know of many users who rely on this metric.
It no longer works as intended because the kube-apiserver does not know when a key is no longer in use. As a result, querying the metric can yield inaccurate information.
The performance bottleneck caused by lock contention is documented here:
To paraphrase Mo's comment (https://github.com/kubernetes/kubernetes/issues/127772#issuecomment-2386433795): it used to be possible to delete the metrics, `get` the encrypted resources to repopulate them, and then query the metrics for an accurate report of the key IDs in use. This is no longer possible, and the only remaining way to use this metric to get the information we intended it to convey (the list of key IDs in use) is to restart the API server. That is not a reasonable recommendation to make.
Rather than recommending that users restart the API server, we can ask them to read the list of key IDs directly from etcd, since that information is persisted there alongside each encrypted object.
A brief discussion on today's call revealed that none of us are using this metric today. If this is a misunderstanding and you depend on this metric in some way, please let us know ASAP so that we can discuss how to move forward.
Thanks,
Peter Engelbert