> I do not really understand how expr works in prom rules - is it something that simply evaluates to either 1 or 'true' as a go bool type?
No. It's not boolean logic at all.
PromQL works with *vectors*: a vector contains zero or more values, each with a distinct set of labels. An alert fires whenever the vector is non-empty, regardless of the value. That is, a value of 0 triggers an alert just as much as a value of 1000. It's the presence or absence of a value which controls alerting.
Take, for example, the PromQL query "foo". It might return the following, all current values of metric foo:
foo{instance="aaa"} 7
foo{instance="bbb"} 3
foo{instance="ccc"} 1
That's a vector with three values.
Now take the PromQL query "foo > 2". It returns a vector with two values:
foo{instance="aaa"} 7
foo{instance="bbb"} 3
If you use "foo > 2" as an alerting expression, then you'll have two alerts firing. If the value of foo{instance="bbb"} drops to 2 or less, then the alerting expression returns an instant vector with only one value, so the bbb alert resolves, but the aaa alert continues.
This is the reason why "resolved" messages show the most recent value which triggered the alert, not the current (non-alerting) value. The current value is below the threshold, so is filtered out entirely from the PromQL results.
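If it helps, here is a rough Python model of that filtering behaviour. It is only a sketch: representing a vector as a dict keyed by label set, and the helper names labels, greater_than and firing_alerts, are my own invention for illustration and not anything Prometheus provides.

```python
# An instant vector modelled as {label set: value}.
# A label set is a frozenset of (name, value) pairs (hypothetical representation).
def labels(**kv):
    return frozenset(kv.items())

foo = {
    labels(instance="aaa"): 7,
    labels(instance="bbb"): 3,
    labels(instance="ccc"): 1,
}

def greater_than(vector, threshold):
    """Mimic `vector > threshold`: keep the elements that pass, drop the rest."""
    return {ls: v for ls, v in vector.items() if v > threshold}

def firing_alerts(vector):
    """One alert per element in the result, whatever its value."""
    return sorted(dict(ls)["instance"] for ls in vector)

print(firing_alerts(greater_than(foo, 2)))  # ['aaa', 'bbb']

# bbb drops to the threshold: its element vanishes from the result,
# so there is no "current value" left to report when the alert resolves.
foo[labels(instance="bbb")] = 2
print(firing_alerts(greater_than(foo, 2)))  # ['aaa']
```

Note that the comparison never produces a true/false result; it only decides which elements survive.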
Now, an expression like count({__name__=~"tcpsocket(.+)Inbound"}) also gives a vector as its result. If there are no timeseries inside the parentheses, then it is the empty vector. If there are one or more timeseries, then you get a single-element vector containing a single value (which is the count of timeseries) and an empty label set. You can try this for yourself in the PromQL query browser:
count({__name__=~"blah_nonexistent(.*)"}) # empty result
count({__name__=~"node_filesystem(.*)"}) # {} 1234 where {} means "empty label set"
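The same behaviour as a small Python sketch, assuming a vector is modelled as a dict keyed by label set (my own illustrative representation, not a Prometheus API). The point it shows is that counting nothing gives an empty vector, not a zero.

```python
EMPTY = frozenset()  # the empty label set, shown as {} in the query browser

def count(vector):
    """Mimic PromQL count() with no by/without clause."""
    if not vector:
        return {}                  # empty in, empty out: there is no element holding 0
    return {EMPTY: len(vector)}    # a single element with an empty label set

series = {
    frozenset({("instance", "aaa")}): 7,
    frozenset({("instance", "bbb")}): 3,
}

print(count(series))  # {frozenset(): 2}
print(count({}))      # {}
```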
Now, when you do a binary operation between two vector values, by default the result vector has one entry for every label set which matches exactly between the LHS and RHS vectors. Any label set on the LHS which is not matched on the RHS, or vice versa, is discarded and gives no value in the result vector. But in this case, since the LHS and RHS will (almost) always each have a single entry with an empty label set, they will match.
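A rough Python sketch of that matching rule for the "!=" operator, again modelling a vector as a dict keyed by label set (an assumption of mine for illustration; the function name not_equal is hypothetical, and the metric name is left out of the label sets for simplicity):

```python
EMPTY = frozenset()  # the empty label set

def not_equal(lhs, rhs):
    """Mimic `lhs != rhs` between two vectors with default one-to-one matching.

    An LHS element survives only if the RHS has an element with exactly the
    same label set AND the two values differ. The result keeps the LHS value.
    """
    return {ls: v for ls, v in lhs.items() if ls in rhs and v != rhs[ls]}

print(not_equal({EMPTY: 5}, {EMPTY: 7}))  # {frozenset(): 5}  counts differ
print(not_equal({EMPTY: 5}, {EMPTY: 5}))  # {}                counts equal
print(not_equal({EMPTY: 5}, {}))          # {}                nothing to match on the RHS
```

The last case is the awkward one: an empty vector on either side gives an empty result, even though the count has in some sense changed.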
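Therefore, what I think you want is simply the following (note that the offset modifier has to follow the selector directly, so it goes inside the count() parentheses):
expr: count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) != count({__name__=~"tcpsocket(.+)Inbound"})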
That should do what you want *unless* __name__=~"tcpsocket(.+)Inbound" matches no timeseries at all, in which case the vector will be empty (on either the LHS or the RHS) and therefore the count() will be empty, and there's nothing to match to the other side. If this is an important case for you then you can fake up a vector with empty labels:
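expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) or vector(0)) != (count({__name__=~"tcpsocket(.+)Inbound"}) or vector(0))
The parentheses matter here. "!=" binds more tightly than "or", so a trailing "... != ... or vector(0)" would be parsed as "(... != ...) or vector(0)", which always returns a value, and the alert would fire permanently.
Again, PromQL's "or" operator doesn't behave like a boolean expression. What "or" does is to match the vectors on the LHS and the RHS:
- for any value on the LHS, use the value and label set from the LHS in the result (whether or not it matches something in the RHS)
- for any value on the RHS whose label set does not exist in the LHS, add it to the result.
vector(0) is a static value: an instant vector containing one element whose label set is empty with value 0. So if count() returns nothing, "... or vector(0)" substitutes an element with empty label set and value 0, which gives the other side of the "!=" something to match. The alert then also fires when the number of matching timeseries goes from zero to non-zero or vice versa. As with any comparison, the value reported is the one from the LHS, so it will be 0 if there were no matching timeseries 30 seconds ago.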
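The two rules for "or" can be sketched in Python like this. As before, the dict-keyed-by-label-set model and the names vector and or_ are my own illustration, not part of Prometheus.

```python
EMPTY = frozenset()  # the empty label set

def vector(value):
    """Mimic vector(v): one element, empty label set."""
    return {EMPTY: value}

def or_(lhs, rhs):
    """Mimic `lhs or rhs`: all of the LHS, plus RHS elements whose label set is new."""
    result = dict(lhs)                       # rule 1: every LHS element is kept as-is
    for label_set, value in rhs.items():
        result.setdefault(label_set, value)  # rule 2: add only unseen label sets
    return result

aaa = frozenset({("instance", "aaa")})

print(or_({}, vector(0)))               # {frozenset(): 0}  empty LHS gets the fallback
print(or_({EMPTY: 5}, vector(0)))       # {frozenset(): 5}  LHS wins on a matching label set
print(len(or_({aaa: 7}, vector(0))))    # 2                 different label sets: both kept
```

So "or vector(0)" only acts as a fallback when the LHS has no element with an empty label set.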