Enhance Availability with Thanos Time-Based Sharding: Efficient Resource Utilization

What is Time-Based Sharding?

Thanos time-based sharding splits Prometheus metrics data into time-based segments (shards), such as daily, weekly, or monthly. This enhances performance and scalability by managing and querying large volumes of historical data more efficiently.

Benefits of Time-Based Sharding:
1. Improved Query Performance: Limits the amount of data scanned for time-specific queries, speeding up response times.
2. Scalability: Facilitates horizontal scaling by distributing the data served across multiple Store Gateway instances.
3. Efficient Resource Utilization: Independent management and querying of each shard optimize computational and storage resources.
4. Simplified Data Management: Allows effective application of data retention policies.
5. Independent Resource Configuration: Different resources can be allocated based on shard size and index.

Example Scenario
Before implementing time-based sharding, we ran 6 storegateway replicas, each with identical resources and the same data on a separate PVC (0-5), which consumed excessive resources. Time-based sharding made our setup more reliable.

We create the sharding index from the latest to the oldest data because it is easier to split the oldest index if needed without having to rebuild all the sharding indexes.
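The ordering argument above can be sketched in a few lines of Python. This is only an illustration: `split_oldest` is a hypothetical helper, the ranges are example (max, min) pairs, and the `-12w` boundary is an assumption chosen for the demo. Because the oldest range sits last, splitting it only changes the final entry and appends one new entry; every earlier index keeps its range.

```python
def split_oldest(partitions, boundary):
    """Split the last (oldest) partition at `boundary`, leaving the rest as-is."""
    *newer, (newest, oldest) = partitions
    return newer + [(newest, boundary), (boundary, oldest)]


# Illustrative (max, min) pairs ordered from the newest shard to the oldest.
current = [("", "-2d"), ("-2d", "-1w"), ("-1w", "-6w"), ("-6w", "")]

# Hypothetical split: the oldest shard is divided at 12 weeks.
updated = split_oldest(current, "-12w")

for index, (max_time, min_time) in enumerate(updated):
    print(index, repr(max_time), repr(min_time))
```

Indexes 0 to 2 are identical before and after the split, so only the last shard is affected and one new shard is added.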

TL;DR

Thanos supports time-based sharding, with the freedom to configure resources for each shard individually.

Below is an example time-based sharding configuration deployed to Kubernetes with Helm.

storegateway:
  enabled: true
  logLevel: info
  logFormat: json

  replicaCount: 2

  resources:
    limits:
      cpu: 2
      memory: 3.6Gi # 1.2x request
    requests:
      cpu: 1
      memory: 3Gi

  persistence:
    enabled: true
    storageClass: faster
    accessModes:
    - ReadWriteOnce
    size: 64Gi

  sharded:
    enabled: true
    timePartitioning:
      # One store for data newer than 2 days
      - max: ""
        min: -2d
      # One store for data newer than 1 week and older than 2 days
      - max: -2d
        min: -1w
      # One store for data newer than 6 weeks and older than 1 week
      - max: -1w
        min: -6w
      # One store for data older than 6 weeks
      - max: -6w
        min: ""
        resources:
          limits:
            cpu: 2
            memory: 9.6Gi # 1.2x request
          requests:
            cpu: 1
            memory: 8Gi
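
To check that the four ranges above cover the whole timeline without gaps, the following Python sketch maps the age of a piece of data to the shard index that would serve it. It is a simplified model, not Thanos code: `parse_age` and `shard_for` are hypothetical helpers, only the `h`, `d` and `w` units are handled, and data is treated as a single point in time, so a block that straddles a boundary is not modeled.

```python
from datetime import timedelta

# Units handled by this sketch only; the real configuration accepts more.
UNITS = {"h": timedelta(hours=1), "d": timedelta(days=1), "w": timedelta(weeks=1)}


def parse_age(value):
    """Return the age a relative bound refers to, or None for an open bound."""
    if value == "":
        return None
    if not value.startswith("-") or value[-1] not in UNITS:
        raise ValueError(f"unsupported bound: {value!r}")
    return int(value[1:-1]) * UNITS[value[-1]]


# (max, min) pairs in the same order as timePartitioning above.
PARTITIONS = [("", "-2d"), ("-2d", "-1w"), ("-1w", "-6w"), ("-6w", "")]


def shard_for(age, partitions=PARTITIONS):
    """Return the index of the first partition whose range contains `age`."""
    for index, (newest, oldest) in enumerate(partitions):
        newest_age = parse_age(newest)
        oldest_age = parse_age(oldest)
        newer_ok = newest_age is None or age >= newest_age
        older_ok = oldest_age is None or age < oldest_age
        if newer_ok and older_ok:
            return index
    raise LookupError("no partition covers this age")


for age in (timedelta(hours=6), timedelta(days=3), timedelta(weeks=3), timedelta(weeks=10)):
    print(age, "-> shard", shard_for(age))
```

In this model every age lands in exactly one shard, and the last index (data older than 6 weeks) is the one given the larger memory request in the values above.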

Ref: https://thanos.io/tip/thanos/quick-tutorial.md/#components

That's all, I hope this helps.

Estu~
