Thank you for these hints,
I am using the Tempo Helm chart with a very basic config, only Azure storage. I think the serviceMonitor Helm value does not work fully. I am using the following Tempo Helm values file:
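A minimal sketch of such a values file, assuming the monolithic grafana/tempo chart (the storage account, container, and key below are placeholders, and the exact key names can vary between chart and Tempo versions):

```yaml
# values.yaml -- minimal Azure-only sketch, placeholders throughout
tempo:
  storage:
    trace:
      backend: azure
      azure:
        container_name: tempo-traces          # placeholder container
        storage_account_name: mytempostorage  # placeholder account
        storage_account_key: "${STORAGE_KEY}" # placeholder, inject from a secret

# Ask the chart to create a ServiceMonitor so Prometheus scrapes Tempo itself
serviceMonitor:
  enabled: true
```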
On the other hand, I do have the metrics from the source of Tempo's spans, the OTel Collector, over the same 6 h time range as in the first post, and it looks like a max burst of 50 spans:
[screenshot: OTel Collector span-rate graph over the 6 h range]
Tempo 1.4 was just released today. This release does include some memory improvements during compaction, but we are still struggling with the same basic problem as above.
New in Grafana Tempo 1.4: introducing the metrics-generator (Grafana blog)
Release v1.4.0 · grafana/tempo · GitHub
We have switched to the microservices deployment and our workload is much, much larger. Our Grafana Tempo deployment uses about 20 GB of memory combined across the microservices, and we ingest 10k spans per minute!
Memory snapshot of Tempo at 10k spans per minute:
[screenshot: Tempo memory snapshot]
I am using Grafana 9.5.2 with Tempo 2.0.1 in Docker. When I click Explore, select Tempo, type in a trace ID to search, and click Run query, I get an error. How can I get to the root of what is causing this? I have tried the Tempo 2.1.1 image with the same results.
Check this metric: tempo/distributor.go at 52462ae0d47cdf5818a4daf36e4ac5f47e6bbf60 · grafana/tempo · GitHub. It should show ingester push failures broken down by ingester, which will help us narrow down which ingester is failing.
Hi, the metric tempo_distributor_ingester_append_failures_total means the distributor component had trouble forwarding traffic to the ingesters. More detail will be in the distributor logs, possibly the error "pusher failed to consume trace data". Based on your screenshot, it looks like some traffic was OK, because the bottom-left panel, Ingester Traces Created, has data.
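For reference, a hedged sketch of a Prometheus rule that watches this metric per ingester (the rule name, window, and threshold are just illustrative):

```yaml
# tempo-distributor-rules.yaml -- illustrative alerting rule
groups:
  - name: tempo-distributor
    rules:
      - alert: TempoIngesterAppendFailures
        # Break the failure rate out per ingester to see which one rejects pushes.
        expr: sum by (ingester) (rate(tempo_distributor_ingester_append_failures_total[5m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tempo distributor is failing to push spans to {{ $labels.ingester }}"
```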
12 months ago I was playing with this awesome combo and published my experiments here: GitHub - florinpatrascu/elixir_grafana_loki_tempo: A very simple Elixir demo/project used for experimenting with Grafana Tempo and Loki #opentelemetry #observability #tracing. Maybe you can find something useful in it?! HTH
We are using the Grafana Agent for tracing and metrics (we also used to use it for logging, but have since moved to the Fly logs shipper to capture log messages that are generated by Fly infrastructure as well).
We also need to change the JAEGER_AGENT_HOST variable in HotROD (hotrod-deployment.yaml) to tempo for the correct identification of traces. An incorrect or missing value may lead to errors.
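As a sketch, the relevant part of hotrod-deployment.yaml would look roughly like this (the container name and image tag are illustrative):

```yaml
# hotrod-deployment.yaml (excerpt) -- container details are illustrative
containers:
  - name: hotrod
    image: jaegertracing/example-hotrod:latest
    env:
      # Point the HotROD tracer at the Tempo service so spans reach
      # Tempo's Jaeger-compatible receiver.
      - name: JAEGER_AGENT_HOST
        value: tempo
```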
It is possible to use the Grafana Tempo tracing backend exposing the Jaeger API. tempo-query is a Jaeger storage plugin. It accepts the full Jaeger query API and translates these requests into Tempo queries.
When the Tempo instance is deployed with the needed configuration, you have to set meshConfig.defaultConfig.tracing.zipkin.address in Istio to the Tempo distributor service and the Zipkin port. Tanka will deploy the service at distributor.tempo.svc.cluster.local:9411.
Now you are ready to configure the meshConfig.defaultConfig.tracing.zipkin.address field in your Istio installation. It needs to be set to port 9411 of the Tempo distributor service. For the previous example, this value will be tempo-smm-distributor.tempo.svc.cluster.local:9411.
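As a sketch, using the IstioOperator API (the metadata and namespace are illustrative; the address follows the example above):

```yaml
# istio-tracing.yaml -- illustrative IstioOperator excerpt
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-tracing
  namespace: istio-system
spec:
  meshConfig:
    defaultConfig:
      tracing:
        zipkin:
          # Zipkin receiver exposed by the Tempo distributor service
          address: tempo-smm-distributor.tempo.svc.cluster.local:9411
```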
Now you need to configure the in_cluster_url setting in Kiali to access the Jaeger API. You can point to port 16685 to use gRPC, or 16686 if not. For the given example, the value would be -ssm-query-frontend.tempo.svc.cluster.local:16685.
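A hedged sketch of the corresponding Kiali CR (the query-frontend hostname is an assumption reconstructed from the example above; adjust it to your actual service name):

```yaml
# kiali-cr.yaml -- illustrative excerpt; the hostname is an assumption
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  external_services:
    tracing:
      enabled: true
      use_grpc: true  # port 16685 is the gRPC endpoint; use 16686 with use_grpc: false
      in_cluster_url: http://tempo-ssm-query-frontend.tempo.svc.cluster.local:16685
```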
The following TempoStack CR enables authentication with multitenancy and configures two tenants named dev and prod. The example uses S3 object storage with the secret tempostack-dev-minio. Refer to the documentation to learn how to set it up.
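A sketch of such a TempoStack CR, loosely following the operator documentation (the tenant IDs are example UUIDs, and the stack name and namespace match the route and URL mentioned below):

```yaml
# tempostack.yaml -- sketch; tenant IDs are example UUIDs
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
  namespace: observability
spec:
  storage:
    secret:
      name: tempostack-dev-minio
      type: s3
  tenants:
    mode: openshift
    authentication:
      - tenantName: dev
        tenantId: "1610b0c3-c509-4592-a256-a1871353dbfa"
      - tenantName: prod
        tenantId: "1610b0c3-c509-4592-a256-a1871353dbfb"
  template:
    gateway:
      enabled: true
    queryFrontend:
      jaegerQuery:
        enabled: true
```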
The operator creates a tempo-simplest-gateway route to access the UI; however, we need to create a ClusterRole to allow users to access the UI. The following ClusterRole gives all authenticated OpenShift users access to the dev tenant. The Jaeger UI can then be accessed at tempo-simplest-gateway-observability.<OpenShift base domain>/api/traces/v1/dev/search.
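A sketch of that ClusterRole, plus a binding for all authenticated users (the object names follow the operator documentation style and are illustrative):

```yaml
# rbac.yaml -- sketch granting authenticated users read access to the dev tenant
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tempostack-traces-reader
rules:
  - apiGroups: ['tempo.grafana.com']
    # The tenant name is modelled as the resource; trace reads use the 'get' verb
    resources: ['dev']
    resourceNames: ['traces']
    verbs: ['get']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tempostack-traces-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tempostack-traces-reader
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
```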
Having a diagram that displays all the elements involved in a request through your microservices makes it faster to find bugs or to understand what happened in your system when running a postmortem analysis. By reducing that time you increase efficiency, so your developers can keep working and delivering more business requirements. In my personal opinion, that is the key point: increasing business productivity. Traceability, and in this case the Grafana stack, helps you accomplish that.