These are the two types of errors that I see in the distributor. There are no error logs in the ingesters (only 400 errors in the ingesters):
level=warn ts=2020-09-11T15:14:15.55091129Z caller=logging.go:62 traceID=1e80d0d72c7dfb18 msg="POST /api/prom/push (500) 11.40001159s Response: \"context canceled\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 74202; Content-Type: application/x-protobuf; User-Agent: Prometheus/2.16.0; X-Forwarded-For: 10.254.178.57; X-Forwarded-Host: cortex.devops.app.umusic.net; X-Forwarded-Port: 80; X-Forwarded-Proto: http; X-Prometheus-Remote-Write-Version: 0.1.0; X-Real-Ip: 10.254.178.57; X-Request-Id: aa786f8ba1483741acdcbb8503f9fb0d; X-Scheme: http; X-Scope-Orgid: eks-11; "
level=warn ts=2020-09-11T15:14:09.942532161Z caller=logging.go:62 traceID=69a628f39a21de24 msg="POST /api/prom/push (500) 6.100572749s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5908; Content-Type: application/x-protobuf; User-Agent: Prometheus/2.13.1; X-Forwarded-For: 10.104.33.77; X-Forwarded-Host: cortex.devops.app.umusic.net; X-Forwarded-Port: 80; X-Forwarded-Proto: http; X-Prometheus-Remote-Write-Version: 0.1.0; X-Real-Ip: 10.104.33.77; X-Request-Id: 3859a4b2f0e3b3badc281b95c9d7b852; X-Scheme: http; X-Scope-Orgid: eks-13; "
Prometheus logs:
ts=2020-09-11T15:32:05.667Z caller=dedupe.go:112 component=remote level=error remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="non-recoverable error" count=361 err="context canceled"
ts=2020-09-11T15:32:05.667Z caller=dedupe.go:112 component=remote level=error remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="non-recoverable error" count=60 err="context canceled"
ts=2020-09-11T15:32:05.635Z caller=dedupe.go:112 component=remote level=error remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="Failed to flush all samples on shutdown"
ts=2020-09-11T15:32:02.947Z caller=dedupe.go:112 component=remote level=error remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="non-recoverable error" count=1000 err="server returned HTTP status 400 Bad Request: user=aws10-eks: sample timestamp out of order; last timestamp: 1599838222.874, incoming timestamp: 1599838162.874 for series {__name__=\"kube_pod_status_ready\", app_kubernetes_io_instance=\"kube-state-metrics\", app_kubernetes_io_managed_by=\"H"
ts=2020-09-11T15:32:02.665Z caller=dedupe.go:112 component=remote level=error remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="non-recoverable error" count=1000 err="server returned HTTP status 400 Bad Request: user=aws10-eks: sample timestamp out of order; last timestamp: 1599838222.874, incoming timestamp: 1599838162.874 for series {__name__=\"kube_secret_info\", app_kubernetes_io_instance=\"kube-state-metrics\", app_kubernetes_io_managed_by=\"Helm\","
........
ts=2020-09-11T15:01:22.707Z caller=dedupe.go:112 component=remote level=error remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="Remote storage resharding" from=3 to=5
level=info ts=2020-09-11T15:00:08.014Z caller=head.go:731 component=tsdb msg="WAL checkpoint complete" first=232 last=234 duration=1.254153897s
level=info ts=2020-09-11T15:00:06.759Z caller=head.go:661 component=tsdb msg="head GC completed" duration=77.995686ms
level=info ts=2020-09-11T15:00:06.314Z caller=compact.go:496 component=tsdb msg="write block" mint=1599825600000 maxt=1599832800000 ulid=01EHYTWDPEX8SSCGBQT4PVCP95 duration=2.908463458s
ts=2020-09-11T14:36:42.706Z caller=dedupe.go:112 component=remote level=info remote_name=435af2 url=http://cortex.devops.local.int/api/prom/push/aws10-eks msg="Remote storage resharding" from=2 to=3
I also want to increase the remote-write retry backoff time so I don't end up in the same situation.
What is the right way to express 30 minutes in the Prometheus config? Is min_backoff: 30m correct?
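To make the question concrete, here is a rough sketch of the remote_write block I have in mind. It rests on some assumptions: the URL is the one from the logs above, and the values are only illustrative. As far as I understand the Prometheus docs, min_backoff is the initial retry delay (default 30ms) and is doubled on every retry until it reaches max_backoff, so the 30m value may belong on max_backoff rather than min_backoff. The duration syntax "30m" itself should be valid.

```yaml
remote_write:
  - url: http://cortex.devops.local.int/api/prom/push/aws10-eks
    queue_config:
      # Initial delay before retrying a failed send; doubled on each retry.
      # 30ms is the documented default, kept here as an assumption.
      min_backoff: 30ms
      # Upper bound for the retry delay. 30m is an illustrative value,
      # not a tested recommendation.
      max_backoff: 30m
```

Note that the backoff only applies to errors Prometheus treats as recoverable (5xx responses). The "non-recoverable error" entries above, such as the 400 "sample timestamp out of order" responses, are presumably dropped rather than retried, so a longer backoff would not change those.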
I'm open to any recommendations for Cortex (what could be misconfigured so that I'm getting the messages above in the distributor?)