Hello,
We saw a lot of these errors in our Prometheus servers this morning, after a compaction had already finished around 08:00 and a TSDB reload finished around 08:11:
====================
level=info ts=2019-10-25T07:00:24.094Z caller=compact.go:496 component=tsdb msg="write block" mint=1571976000000 maxt=1571983200000 ulid=01DR......... duration=24.051855065s
level=info ts=2019-10-25T07:00:30.328Z caller=head.go:596 component=tsdb msg="head GC completed" duration=2.1240968s
level=info ts=2019-10-25T07:00:58.919Z caller=head.go:666 component=tsdb msg="WAL checkpoint complete" first=4857 last=4871 duration=28.590403845s
====================
Around the same time, the CPU usage of the Prometheus pods spiked sharply, as you can see in the attached graphs. We run two replicas (on completely different K8s nodes), and both spiked in CPU at the same time. As a result, Grafana was unable to pull any metrics from Prometheus, and we saw errors in Grafana.
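(For reference, the CPU graphs come from a query along these lines, using Prometheus's standard self-instrumentation metric; the `job` label here is just whatever our self-scrape job happens to be called:)

```promql
# Per-pod CPU usage of the Prometheus processes, averaged over 5m
rate(process_cpu_seconds_total{job="prometheus"}[5m])
```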
There is a Thanos sidecar in the same pod doing a "remote read" from the Prometheus instances; could this be related in any way?
I have found one or two other reports of a similar error, but nothing exactly the same, so I am wondering if anyone has seen this before?
Thanks.