I'm running 3 queriers with 12 CPUs each behind 2 query frontends, but I'm only getting around 500MB processed per second. At this speed Loki metric queries are unusable, and after testing Loki for 2 months we are highly disappointed with the performance we are getting.

I suspect there is a bottleneck between S3 and the queriers, but I'm not sure about that at all.

What actions go into increasing query speeds? I removed all caches from the configuration so I can better test the S3-to-querier speed.

Logcli stats results of a 4-hour query:
Ingester.TotalReached 0
Ingester.TotalChunksMatched 0
Ingester.TotalBatches 0
Ingester.TotalLinesSent 0
Ingester.HeadChunkBytes 0B
Ingester.HeadChunkLines 0
Ingester.DecompressedBytes 0
Ingester.DecompressedLines 0
Ingester.CompressedBytes 0B
Ingester.TotalDuplicates 0
Store.TotalChunksRef 450
Store.TotalChunksDownloaded 450
Store.ChunksDownloadTime 25.736060467s
Store.HeadChunkBytes 0B
Store.HeadChunkLines 0
Store.DecompressedBytes 7.5GB
Store.DecompressedLines 42448956
Store.CompressedBytes 1.5GB
Store.TotalDuplicates 6294356
Summary.BytesProcessedPerSecond 586MB
Summary.LinesProcessedPerSecond 3297466
Summary.TotalBytesProcessed 7.5GB
Summary.TotalLinesProcessed 42448956
Summary.ExecTime 12.873201912s
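For reference, the ratios implied by these stats can be recomputed from the reported numbers. This is only a rough sketch: it assumes GB and MB are decimal units, it uses the rounded values as logcli printed them, and the variable names are illustrative rather than anything Loki defines. Since Store.ChunksDownloadTime is larger than Summary.ExecTime, it is presumably accumulated across parallel sub-queries, so the download rate below is per unit of summed download time, not wall-clock time.

```python
# Values copied from the logcli stats above (rounded as printed).
# Assumption: 1GB = 1e9 bytes and 1MB = 1e6 bytes.
decompressed_bytes = 7.5e9        # Store.DecompressedBytes
compressed_bytes = 1.5e9          # Store.CompressedBytes
total_lines = 42_448_956          # Store.DecompressedLines
duplicate_lines = 6_294_356       # Store.TotalDuplicates
chunks = 450                      # Store.TotalChunksDownloaded
exec_time_s = 12.873201912        # Summary.ExecTime
download_time_s = 25.736060467    # Store.ChunksDownloadTime

# Decompressed bytes processed per second of wall-clock query time.
throughput_mb_s = decompressed_bytes / exec_time_s / 1e6

# How much the chunks expand after decompression.
compression_ratio = decompressed_bytes / compressed_bytes

# Fraction of decompressed lines that were counted as duplicates.
duplicate_share = duplicate_lines / total_lines

# Average compressed chunk size fetched from the object store.
avg_chunk_mb = compressed_bytes / chunks / 1e6

# Compressed bytes fetched per second of (summed) download time.
download_mb_s = compressed_bytes / download_time_s / 1e6

print(f"throughput:        {throughput_mb_s:.1f} MB/s")
print(f"compression ratio: {compression_ratio:.1f}x")
print(f"duplicate lines:   {duplicate_share:.1%}")
print(f"avg chunk size:    {avg_chunk_mb:.2f} MB compressed")
print(f"download rate:     {download_mb_s:.1f} MB/s of download time")
```

The throughput figure comes out slightly below the reported Summary.BytesProcessedPerSecond because the byte totals above are rounded.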
My Loki configuration:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_concurrent_streams: 1000
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_server_idle_timeout: 120s
  http_server_write_timeout: 1m
distributor:
  ring:
    kvstore:
      store: memberlist
ingester:
  chunk_encoding: snappy
  chunk_block_size: 262144
  chunk_target_size: 4000000
  chunk_idle_period: 15m
  lifecycler:
    heartbeat_period: 5s
    join_after: 30s
    num_tokens: 512
    ring:
      heartbeat_timeout: 1m
      kvstore:
        store: memberlist
      replication_factor: 3
    final_sleep: 0s
  max_transfer_retries: 60
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 67108864
  remote_timeout: 1s
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  max_outstanding_per_tenant: 1024
frontend_worker:
  frontend_address: frontend:9096
  grpc_client_config:
    max_send_msg_size: 104857600
  parallelism: 12
limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 10
  ingestion_rate_mb: 5
  ingestion_rate_strategy: local
  max_global_streams_per_user: 10000
  max_query_length: 12000h
  max_query_parallelism: 254
  max_streams_per_user: 0
  reject_old_samples: true
  reject_old_samples_max_age: 168h
querier:
  query_ingesters_within: 2h
query_range:
  align_queries_with_step: true
  cache_results: false
  max_retries: 5
  parallelise_shardable_queries: false
  split_queries_by_interval: 15m
compactor:
  working_directory: /opt/loki/compactor
  shared_store: s3
  compaction_interval: 30m
schema_config:
  configs:
    - from: 2020-06-10
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: test_
        period: 24h
storage_config:
  aws:
    bucketnames: bucket_name
    endpoint: endpoint
    region: region
    access_key_id: mysecret_key_id
    secret_access_key: mysecret_access_key
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: true
    s3forcepathstyle: true
  boltdb-shipper:
    active_index_directory: /opt/loki/boltdb-shipper-active
    cache_location: /opt/loki/boltdb-shipper-cache
    shared_store: s3
memberlist:
  abort_if_cluster_join_fails: false
  bind_addr:
    - the_bind_ip_address
  bind_port: 7946
  join_members:
    - ip_address:7946
    - off_all_the_loki_components:7946
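
For reference, a back-of-the-envelope estimate of how many sub-queries this configuration produces for the 4-hour query, compared with the CPUs available on the queriers. This is a sketch under stated assumptions, not a description of Loki internals: it assumes that with parallelise_shardable_queries set to false the query frontend issues roughly one sub-query per split_queries_by_interval window (step alignment may add one at the boundaries), and the variable names are illustrative.

```python
import math
from datetime import timedelta

query_range = timedelta(hours=4)          # length of the test query
split_interval = timedelta(minutes=15)    # query_range.split_queries_by_interval
queriers = 3                              # number of querier instances
cpus_per_querier = 12                     # CPUs per querier

# Assumption: sharding is disabled, so each split window yields one sub-query.
sub_queries = math.ceil(query_range / split_interval)

# Total CPUs available across all queriers.
total_cpus = queriers * cpus_per_querier

print(f"sub-queries: {sub_queries}")
print(f"total CPUs:  {total_cpus}")
print(f"sub-queries per CPU: {sub_queries / total_cpus:.2f}")
```

If each sub-query keeps roughly one worker busy, fewer sub-queries than CPUs would mean part of the querier capacity sits idle for a query of this length, regardless of how fast S3 delivers chunks.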