> Assuming the second metric goes missing how is the binary expression evaluated exactly?
The same as it always is. Remember that the left-hand side and the right-hand side are both vectors, containing zero or more values, each value having a distinct set of labels. Note what the documentation says:
vector1 and vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped. The metric name and values are carried over from the left-hand side vector.
Therefore, if the RHS of "and" is an empty vector, then the result of the entire "and" expression is an empty vector - since there is nothing in vector2 for vector1 to match.
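To make that concrete, here is a minimal sketch of "and" in Python. It is a toy model, not how Prometheus is implemented: a vector is just a dict keyed by label set, metric names are left out, and vec() and and_op() are names I made up for illustration.

```python
# Toy model of a PromQL instant vector:
# {frozenset of (label, value) pairs: sample value}.
# Illustration only; this is not Prometheus code.

def vec(*samples):
    """Build a vector from (labels_dict, value) pairs."""
    return {frozenset(labels.items()): value for labels, value in samples}

def and_op(lhs, rhs):
    """vector1 and vector2: keep LHS elements whose label set also appears in RHS.

    Values come from the LHS; the RHS values are never looked at.
    """
    return {labels: value for labels, value in lhs.items() if labels in rhs}

foo = vec(({"job": "a"}, 10), ({"job": "b"}, 20))
bar = vec(({"job": "a"}, 1))

print(and_op(foo, bar))    # only the job="a" element survives, with foo's value
print(and_op(foo, vec()))  # empty RHS: nothing can match, so the result is empty
```

The second call is your "second metric goes missing" case: the filter is not ignored, it removes everything.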
> In the "normal" case, i.e. "foo and bar" we would not have points but in the case of "absent(foo) and bar", from my tests, it seems to me the "bar" filtering is simply ignored.
I don't understand what you mean by that. Can you give examples of the LHS and the RHS vectors, and the combined expression, which don't behave how you expect?
Note that "foo and bar" and "absent(foo) and bar" will both be empty if bar is empty, as just described.
"absent(foo)" is an unusual function:
- if the input vector has one or more values, i.e. any non-empty vector, its output is an empty vector (no values)
- if the input vector is empty, its output is a one-element vector with a single value "1". The label set of that value depends on the exact form of the expression inside the parentheses; it tries to do "the right thing" but at worst you could have value 1 with empty label set {}
In your case,
absent(our_metric{environment="pro",service="bar",stack="foo"})
will return
{environment="pro",service="bar",stack="foo"} 1
i.e. a single-element vector with empty metric name, those labels, and the value 1.
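The two rules for absent() can be sketched in Python like this. It is only a toy model under my own assumptions: the function takes the equality matchers as a separate dict (real PromQL derives them from the selector), and the "instance" label is made up for the example.

```python
# Toy model of absent(); illustration only, not Prometheus code.
# A vector is {frozenset of (label, value) pairs: sample value}.

def absent(input_vector, equality_matchers):
    """Empty result if the input has any element; otherwise a single element
    with value 1 whose labels are taken from the equality matchers."""
    if input_vector:
        return {}
    return {frozenset(equality_matchers.items()): 1}

matchers = {"environment": "pro", "service": "bar", "stack": "foo"}

# our_metric exists (the extra "instance" label is hypothetical).
present = {frozenset({**matchers, "instance": "host1"}.items()): 42}

when_present = absent(present, matchers)  # metric exists: empty vector
when_missing = absent({}, matchers)       # metric missing: one element, value 1
no_matchers = absent({}, {})              # nothing to derive labels from: {} 1

print(when_present, when_missing, no_matchers)
```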
Going back to the whole original expression:
absent(our_metric{environment="pro",service="bar",stack="foo"}) and on(stack, environment) up{service="bar",source="app"} == 1
ISTM that is saying you want to generate an alert if our_metric{environment="pro",service="bar",stack="foo"} is missing, but only if metric up{service="bar",source="app"} exists *and* has value 1. That means the alert is suppressed if either:
(a) up{service="bar",source="app"} exists but its value is not 1
(b) up{service="bar",source="app"} does not exist - i.e. that expression returns an empty vector. ("up" is a special metric in prometheus; if it doesn't exist, it means there is no configured scrape job with those labels)
If that's not what you want, then think about what you actually want, and then how to express that. For example, if you want to suppress the alert in case (a) but not in case (b), then you can do this:
absent(our_metric{environment="pro",service="bar",stack="foo"}) unless on(stack, environment) up{service="bar",source="app"} != 1
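If it helps to see the difference between the two forms, here is a rough Python sketch that runs both against the three possible states of "up" while our_metric is missing. Again this is a toy model and not Prometheus code; and_on(), unless_on(), where() and match_key() are names I invented for it.

```python
# Toy model: an instant vector is {frozenset of (label, value) pairs: sample value}.
# Illustration only; this is not Prometheus code.

def match_key(labels, on):
    """Values of the on(...) labels; a missing label counts as the empty string."""
    as_dict = dict(labels)
    return tuple(as_dict.get(name, "") for name in on)

def and_on(lhs, rhs, on):
    """lhs and on(...) rhs: keep LHS elements that have at least one RHS match."""
    rhs_keys = {match_key(labels, on) for labels in rhs}
    return {l: v for l, v in lhs.items() if match_key(l, on) in rhs_keys}

def unless_on(lhs, rhs, on):
    """lhs unless on(...) rhs: keep LHS elements that have no RHS match."""
    rhs_keys = {match_key(labels, on) for labels in rhs}
    return {l: v for l, v in lhs.items() if match_key(l, on) not in rhs_keys}

def where(vector, predicate):
    """Comparison filter such as `== 1`: keep elements whose value passes."""
    return {l: v for l, v in vector.items() if predicate(v)}

ON = ("stack", "environment")
ALERT_LABELS = {"environment": "pro", "service": "bar", "stack": "foo"}
UP_LABELS = {**ALERT_LABELS, "source": "app"}

# What absent(our_metric{...}) returns while our_metric is missing.
absent_result = {frozenset(ALERT_LABELS.items()): 1}

up_cases = {
    "up is 1": {frozenset(UP_LABELS.items()): 1},
    "up is 0": {frozenset(UP_LABELS.items()): 0},  # case (a)
    "up missing": {},                              # case (b)
}

fires_with_and = {}
fires_with_unless = {}
for name, up in up_cases.items():
    # absent(...) and on(stack, environment) up{...} == 1
    fires_with_and[name] = bool(and_on(absent_result, where(up, lambda v: v == 1), ON))
    # absent(...) unless on(stack, environment) up{...} != 1
    fires_with_unless[name] = bool(unless_on(absent_result, where(up, lambda v: v != 1), ON))

print(fires_with_and)
print(fires_with_unless)
```

The two forms only disagree in the "up missing" row: the "and" form stays silent there, the "unless" form fires.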
------
If you don't mind, I will make an observation about the use of "and on(...)". Since the LHS and RHS are vectors, an expression needs to identify corresponding values in the LHS vector and the RHS vector, to generate a vector of results. The on(...) part is for when the LHS and RHS vectors don't have exactly the same label sets, and you need to ignore some labels when matching them up. I think you know all this already.
I find your expression rather confusing, because:
- we know that any values in the LHS vector must have labels {environment="pro",service="bar",stack="foo"}
- we know that any values in the RHS vector must have labels {service="bar",source="app"}
- "on(stack,environment)" says to pair up LHS and RHS values where the "stack" and "environment" labels match
- therefore, the RHS vector must also have stack="foo" and environment="pro"
- "and" is a set operator, so the matching is many-to-many: an LHS value is kept if at least one RHS value has the same (stack,environment) pair, and it doesn't matter how many RHS values share that pair (*)
Therefore, implicitly I think all of (environment, service, stack) must match, i.e. this expression is the same as:
absent(our_metric{environment="pro",service="bar",stack="foo"}) and on(environment, service, stack) up{environment="pro",service="bar",stack="foo",source="app"} == 1
And this can be simplified to:
absent(our_metric{environment="pro",service="bar",stack="foo"}) and on(environment, service, stack) up{source="app"} == 1
I find the second version easier to read and reason about, because the environment/service/stack matching is all in one place, but you may disagree :-)
(*) Arithmetic and comparison operators between two vectors are stricter: they match one-to-one unless you add group_left or group_right, and the query fails with an error if several values on one side share the same on(...) labels. The set operators "and", "or" and "unless" don't have this restriction, so duplicate matches are not another reason for the alert to fail to trigger.
For example, the LHS here can only ever be a single-element vector, and it is still returned (once) if up{service="bar",source="app"} == 1 returns multiple values with the same pair of (stack,environment) labels, like this:
up{environment="pro",service="bar",stack="foo",source="app",xxx="yyy"} 1
up{environment="pro",service="bar",stack="foo",source="app",xxx="zzz"} 1