Increase Bytes Processed per Second

גיל מרי

Jul 20, 2021, 2:29:53 AM
to lokiproject

I'm running 3 queriers with 12 CPUs each behind 2 query frontends, but I'm only getting around 500 MB processed per second. At this speed Loki metric queries are unusable, and after testing Loki for 2 months we are very disappointed with the performance we are getting.

I suspect there is a bottleneck between S3 and the queriers, but I'm not sure.

What goes into increasing query speed? I removed all caches from the configuration so I can better test the S3-to-querier throughput.


logcli stats for a 4-hour query:

Ingester.TotalReached 0
Ingester.TotalChunksMatched 0
Ingester.TotalBatches 0
Ingester.TotalLinesSent 0
Ingester.HeadChunkBytes 0B
Ingester.HeadChunkLines 0
Ingester.DecompressedBytes 0
Ingester.DecompressedLines 0
Ingester.CompressedBytes 0B
Ingester.TotalDuplicates 0
Store.TotalChunksRef 450
Store.TotalChunksDownloaded 450
Store.ChunksDownloadTime 25.736060467s
Store.HeadChunkBytes 0B
Store.HeadChunkLines 0
Store.DecompressedBytes 7.5GB
Store.DecompressedLines 42448956
Store.CompressedBytes 1.5GB
Store.TotalDuplicates 6294356
Summary.BytesProcessedPerSecond 586MB
Summary.LinesProcessedPerSecond 3297466
Summary.TotalBytesProcessed 7.5GB
Summary.TotalLinesProcessed 42448956
Summary.ExecTime 12.873201912s
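
As a sanity check on these stats, here is a minimal Python sketch that recomputes the derived figures from the raw values. It assumes decimal units (1 GB = 1e9 bytes) and that the rounded 7.5GB / 1.5GB values are close enough; the variable names are only illustrative. Since ChunksDownloadTime is larger than ExecTime, it is presumably accumulated across parallel workers, so the download rate below is a per-worker-ish lower bound rather than the real aggregate S3 throughput.

```python
# Values copied from the logcli stats above (rounded as printed by logcli).
# Assumption: decimal units, i.e. 1 GB = 1e9 bytes.
decompressed_bytes = 7.5e9            # Store.DecompressedBytes
compressed_bytes = 1.5e9              # Store.CompressedBytes
chunks_downloaded = 450               # Store.TotalChunksDownloaded
chunk_download_seconds = 25.736060467 # Store.ChunksDownloadTime (likely summed over workers)
exec_seconds = 12.873201912           # Summary.ExecTime
total_lines = 42_448_956              # Summary.TotalLinesProcessed

# Decompressed throughput as seen by the query engine.
throughput_mb_s = decompressed_bytes / exec_seconds / 1e6
lines_per_s = total_lines / exec_seconds

# How well snappy compressed the data, and how big the chunks are on S3.
compression_ratio = decompressed_bytes / compressed_bytes
avg_chunk_mb = compressed_bytes / chunks_downloaded / 1e6

# Compressed bytes fetched per second of accumulated download time.
download_mb_s = compressed_bytes / chunk_download_seconds / 1e6

print(f"throughput:        {throughput_mb_s:.0f} MB/s")
print(f"lines per second:  {lines_per_s:.0f}")
print(f"compression ratio: {compression_ratio:.1f}x")
print(f"avg chunk size:    {avg_chunk_mb:.2f} MB compressed")
print(f"download rate:     {download_mb_s:.0f} MB/s per accumulated download second")
```

The recomputed throughput lands close to the reported Summary.BytesProcessedPerSecond, so the headline number is simply total decompressed bytes divided by wall-clock time.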



My Loki Configuration:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_concurrent_streams: 1000
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_server_idle_timeout: 120s
  http_server_write_timeout: 1m

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  chunk_encoding: snappy
  chunk_block_size: 262144
  chunk_target_size: 4000000
  chunk_idle_period: 15m
  lifecycler:
    heartbeat_period: 5s
    join_after: 30s
    num_tokens: 512
    ring:
      heartbeat_timeout: 1m
      kvstore:
        store: memberlist
      replication_factor: 3
    final_sleep: 0s
  max_transfer_retries: 60

ingester_client:
  grpc_client_config:
    max_recv_msg_size: 67108864
  remote_timeout: 1s

frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  max_outstanding_per_tenant: 1024

frontend_worker:
  frontend_address: frontend:9096
  grpc_client_config:
    max_send_msg_size: 104857600
  parallelism: 12

limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 10
  ingestion_rate_mb: 5
  ingestion_rate_strategy: local
  max_global_streams_per_user: 10000
  max_query_length: 12000h
  max_query_parallelism: 254
  max_streams_per_user: 0
  reject_old_samples: true
  reject_old_samples_max_age: 168h
 
querier:
  query_ingesters_within: 2h

query_range:
  align_queries_with_step: true
  cache_results: false
  max_retries: 5
  parallelise_shardable_queries: false
  split_queries_by_interval: 15m

compactor:
  working_directory: /opt/loki/compactor
  shared_store: s3
  compaction_interval: 30m

schema_config:
  configs:
    - from: 2020-06-10
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: test_
        period: 24h

storage_config:
  aws:
    bucketnames: bucket_name
    endpoint: endpoint
    region: region
    access_key_id: mysecret_key_id
    secret_access_key: mysecret_access_key
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: true
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /opt/loki/boltdb-shipper-active
    cache_location: /opt/loki/boltdb-shipper-cache
    shared_store: s3

memberlist:
  abort_if_cluster_join_fails: false
  bind_addr:
    - the_bind_ip_address
  bind_port: 7946
  join_members:
    - ip_address:7946
    - of_all_the_loki_components:7946
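
To reason about how much of the querier capacity a single query can use with this configuration, here is a rough Python sketch of the fan-out arithmetic. It assumes each querier handles `parallelism` sub-queries at a time in total (it ignores that there are two frontends), and that with `parallelise_shardable_queries: false` the only splitting is by `split_queries_by_interval`. Treat it as a back-of-the-envelope estimate, not a description of Loki's scheduler.

```python
import math
from datetime import timedelta

# Values taken from the configuration and the test query above.
query_range = timedelta(hours=4)        # length of the test query
split_interval = timedelta(minutes=15)  # query_range.split_queries_by_interval
queriers = 3
worker_parallelism = 12                 # frontend_worker parallelism
max_query_parallelism = 254             # limits_config.max_query_parallelism

# Number of sub-queries the frontend produces when it splits only by time.
subqueries = math.ceil(query_range / split_interval)

# Assumption: one sub-query per worker slot, slots = queriers * parallelism.
worker_slots = queriers * worker_parallelism

# Sub-queries that can actually run at once, and slots left without work.
concurrent = min(subqueries, worker_slots, max_query_parallelism)
idle = worker_slots - concurrent

print(f"sub-queries: {subqueries}, worker slots: {worker_slots}")
print(f"running concurrently: {concurrent}, idle slots: {idle}")
```

Under these assumptions a 4-hour query produces fewer sub-queries than there are worker slots, so part of the querier capacity would sit idle for this test regardless of S3 speed.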