[RELEASE] Scylla Monitoring Stack 3.5.1

Amnon Heiman

<amnon@scylladb.com>
Nov 15, 2020, 8:56:10 AM
to scylladb-dev, ScyllaDB users
The Scylla team announces the release of Scylla Monitoring Stack 3.5.1
Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.5.1 supports:
  • Scylla Open Source versions 3.3, 4.0, 4.1, and 4.2

  • Scylla Enterprise versions 2019.x and 2020.x

  • Scylla Manager 2.1.x and 2.2.x


Bug Fixes
  • sstable reads units are wrong #1125
  • Links on the "Nodes" table on "Overview" don't go to the correct place #1122
  • Hack that added support for the old and new ports doesn't work with the Agent graphs on the Manager dashboard #1120
  • The -N flag is ignored in start-all.sh #1116
  • Remove the avg line from the multi-graph panels in the Overview dashboard #1115
  • The advanced dashboard has a dot in the uid #1112
  • Average read latency is miscalculated #1110
  • Manager showing "offline" despite metrics being present #1109
  • branch-3.5 grafana failure #1104
  • Alternator: "Average UpdateItem latency by Instance", show data in seconds and not milliseconds #1101
  • Alternator dashboard doesn't have GetItem Latencies #1100

Known issues

Following the Scylla Manager 2.2 port change, Prometheus is configured to scrape both the old port and the new one to help during the migration.
This was found to cause issues when the port in scylla_manager_server.yml is changed to the new port, 5090.

After upgrading Scylla Manager to version 2.2, we suggest editing prometheus/prometheus.yml.template and removing the scylla_manager1 job from it.
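
The edit suggested above can also be scripted. The following is a minimal sketch, assuming the template lists its scrape jobs as YAML list items that start with `- job_name:`. The sample text, addresses, and the `remove_job` helper are hypothetical and are not taken from the shipped prometheus.yml.template, so check the result against your own file before using it.

```python
import re

# Matches the first line of a scrape job, e.g. "- job_name: scylla_manager1".
JOB_START = re.compile(r"^(\s*)-\s*job_name:\s*['\"]?([^'\"\s#]+)")


def remove_job(template_text, job_name):
    """Return template_text without the scrape job called job_name."""
    kept = []
    skip_indent = None  # indentation of the job being dropped, or None
    for line in template_text.splitlines():
        match = JOB_START.match(line)
        if match:
            indent, name = match.groups()
            skip_indent = len(indent) if name == job_name else None
        elif skip_indent is not None and line.strip():
            # A non-blank line indented no deeper than the job's dash ends the job.
            if len(line) - len(line.lstrip()) <= skip_indent:
                skip_indent = None
        if skip_indent is None:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical sample; the addresses are placeholders.
SAMPLE = """\
scrape_configs:
- job_name: scylla
  static_configs:
  - targets: ['10.0.0.1:9180']
- job_name: scylla_manager
  static_configs:
  - targets: ['10.0.0.9:5090']
- job_name: scylla_manager1
  static_configs:
  - targets: ['10.0.0.9:0000']
"""

cleaned = remove_job(SAMPLE, "scylla_manager1")
print(cleaned)
```

The sketch works on plain text rather than a parsed YAML document so that comments and template placeholders in the file are left untouched.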

Amnon Heiman

<amnon@scylladb.com>
Nov 24, 2020, 2:53:28 AM
to scylladb-dev, ScyllaDB users
The Scylla team announces the release of Scylla Monitoring Stack 3.5.2
Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.5.2 supports:
  • Scylla Open Source versions 3.3, 4.0, 4.1, and 4.2

  • Scylla Enterprise versions 2019.x and 2020.x

  • Scylla Manager 2.1.x and 2.2.x


    Bug Fixes
    • Alternator dashboard node-table should use the names of the new dashboards #1134
    • start-grafana.sh looks for the docker IP which breaks on Podman #1145
    • prometheus.yml.template: remove the second manager job #1150

    Notice to users who update Scylla-Manager to version 2.2

    Following the Scylla Manager 2.2 port change, you will need to update scylla_manager_server.yml with the new port.


    Amnon Heiman

    <amnon@scylladb.com>
    Jan 18, 2021, 4:20:48 AM
    to scylladb-dev, ScyllaDB users

    The Scylla team is pleased to announce the release of Scylla Monitoring Stack 3.6.


    Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.6 supports:

    • Scylla Open Source versions 4.1, 4.2 and 4.3

    • Scylla Enterprise versions 2019.x and 2020.x

    • Scylla Manager 2.1.x and 2.2.x


      New in Scylla Monitoring Stack 3.6

      • Adding the Advisor section #1162

      The Advisor is a new concept in Scylla Monitoring. It identifies potential problems and notifies you about them. The Advisor section in the Overview dashboard has two parts: the first lists detected issues, such as unprepared statements; the second indicates how balanced the system is. When the cluster works properly, all nodes and shards should behave the same, so an outlier shard may be the result of a problem. For example, if the number of CQL connections varies between shards, it indicates a driver configuration issue.
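
To illustrate the balance idea described above, here is a minimal sketch of outlier detection over per-shard values. It is not the Advisor's actual logic: the `find_outlier_shards` helper, the median-based rule, and the 50% tolerance are all assumptions chosen for the example.

```python
from statistics import median


def find_outlier_shards(connections_per_shard, tolerance=0.5):
    """Return the shard ids whose value deviates from the median
    by more than `tolerance`, expressed as a fraction of the median."""
    mid = median(connections_per_shard.values())
    if mid == 0:
        # With a zero median, any shard holding connections stands out.
        return sorted(s for s, v in connections_per_shard.items() if v != 0)
    return sorted(
        shard
        for shard, value in connections_per_shard.items()
        if abs(value - mid) / mid > tolerance
    )


# Hypothetical CQL connection counts, keyed by shard id.
balanced = {0: 12, 1: 11, 2: 12, 3: 13}
skewed = {0: 12, 1: 11, 2: 12, 3: 40}

print(find_outlier_shards(balanced))  # no shard stands out
print(find_outlier_shards(skewed))    # shard 3 stands out
```

The median is used instead of the mean so that a single heavily loaded shard does not shift the baseline it is compared against.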

       

      • Use Loki as data source #1147

      Grafana Loki is a log aggregation system inspired by Prometheus. The monitoring stack uses Loki to generate alerts and metrics from logs. Note that it does not act as a centralized monitoring system. In Scylla Monitoring, Loki gets the logs through rsyslog, so make sure to configure the rsyslog client on the Scylla servers.

      • Add Scylla Open Source 4.3 dashboards #1144

      • New look to the node table #1097

      The node tables are part of the Datacenter section in the Overview dashboard. The table is now more organized and more informative. 

      (Screenshot: the node table when a node joins the cluster.)

      • Collapsible rows #973

      Collapsible rows are now used in various places on the dashboard. You can open them for additional information.

      • New Lightweight Transactions (LWT) metrics for the dashboard #936

      LWT operations involve multiple Paxos messages. New panels in the LWT section now show the number of Paxos messages, which gives insight into the actual traffic involved in LWT operations.

      • Easy way to capture the entire dashboard, in one click #248

      At the bottom of each dashboard, there are now two buttons: one to report an issue on the page and another to take a snapshot of the dashboard as a downloadable image file.

      • Support dynamic intervals #957

      Many graphs on the dashboards use a rate interval: some activity measured over a period of time. There has been a long discussion in the Grafana community as to which interval to use for a given time range.

      In general, when looking at graphs of different time ranges (e.g., last hour vs. last week), the rate interval should suit the range being displayed.

      Grafana 7.2 came with a dynamic interval to solve this issue. You can read more about it here.
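
As a rough illustration of the dynamic interval, the sketch below follows the rule Grafana documents for its `$__rate_interval` variable: the larger of the panel interval plus the scrape interval, and four times the scrape interval. The 20-second scrape interval and the function name are assumptions for the example, and Grafana's own rounding of intervals is left out.

```python
def rate_interval(interval_seconds, scrape_interval_seconds):
    """Approximate Grafana's $__rate_interval:
    max($__interval + scrape interval, 4 * scrape interval)."""
    return max(
        interval_seconds + scrape_interval_seconds,
        4 * scrape_interval_seconds,
    )


SCRAPE_INTERVAL = 20  # seconds; illustrative value

# Short time range: the panel interval is small, so the floor of
# four scrape intervals applies.
print(rate_interval(4, SCRAPE_INTERVAL))

# Long time range: the panel interval dominates, so the rate window
# grows with the range being displayed.
print(rate_interval(600, SCRAPE_INTERVAL))
```

The floor of four scrape intervals keeps enough samples inside the window for a rate to be computed on short ranges, while the first term stretches the window on long ranges so that no samples fall between data points.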

      • Grafana: Use UTC by default #1065

      Times in graphs are now displayed in UTC instead of the browser's local time.

      • Upgrade to Grafana 7.3.5 #1061

       

      Operational Changes

      • Configure rsyslog on the Scylla hosts. Scylla Monitoring uses Loki to generate metrics and alerts from logs, and Loki gets the logs from rsyslog. For the full functionality to work, you need an rsyslog agent running on each of the Scylla machines, with the Scylla Monitoring server added as an rsyslog target.

      • Use docker-compose as an optional replacement for start-all.sh #273

      • A command line option to add Prometheus targets #1197

       


      Bug Fixes

      • Passing --no-loki got illegal option: --error #1152

      Amnon Heiman

      <amnon@scylladb.com>
      Feb 8, 2021, 1:39:42 PM
      to scylladb-dev, ScyllaDB users
      The Scylla team announces the release of Scylla Monitoring Stack 3.6.1
      Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.6.1 supports:
      • Scylla Open Source versions 4.1, 4.2, 4.3 and 4.4

      • Scylla Enterprise versions 2019.x and 2020.x

      • Scylla Manager 2.1.x and 2.2.x


        Bug Fixes
        • Write latency and write count should not include hints/streaming scheduling group #1265
        • Update all Advisor / CQL dashboard queries to take into account only user-generated queries and not internal ones #1263
        These bug fixes are relevant for Scylla Open Source 4.2, 4.3, and 4.4 users and for Scylla Enterprise 2020.1 users.

        Amnon Heiman

        <amnon@scylladb.com>
        Mar 15, 2021, 4:40:06 AM
        to scylladb-dev, ScyllaDB users
        The Scylla team announces the release of Scylla Monitoring Stack 3.6.2
        Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.6.2 supports:
        • Scylla Open Source versions 4.1, 4.2, 4.3 and 4.4

        • Scylla Enterprise versions 2019.x and 2020.x

        • Scylla Manager 2.1.x and 2.2.x


          Bug Fixes
          • Timeouts and latencies per shards panels are missing #1294
          • Non Token Aware queries for counters - A work around for #804

          Amnon Heiman

          <amnon@scylladb.com>
          Mar 22, 2021, 4:42:56 PM
          to scylladb-dev, ScyllaDB users
          The Scylla team announces the release of Scylla Monitoring Stack 3.6.3
          Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.6.3 supports:
          • Scylla Open Source versions 4.1, 4.2, 4.3 and 4.4

          • Scylla Enterprise versions 2019.x and 2020.x

          • Scylla Manager 2.1.x, 2.2.x and 2.3.x


          New Dashboards
          • Scylla Manager 2.3.x
          Bug Fixes
          • loki container breaks -A option in start-all.sh #1326

          Amnon Heiman

          <amnon@scylladb.com>
          Apr 26, 2021, 4:57:40 AM
          to scylladb-dev, ScyllaDB users

          The Scylla team is pleased to announce the release of Scylla Monitoring Stack 3.7.


          Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.7 supports:

          • Scylla Open Source versions 4.2, 4.3 and 4.4

          • Scylla Enterprise versions 2019.x, 2020.x and 2021.x

          • Scylla Manager 2.2.x, 2.3.x


            Version updates in Scylla Monitoring Stack 3.7

            • Set Prometheus version to 2.25.2 #1333

            • Update the Alertmanager plugin to 1.0 #1288

            • Switch the Alertmanager to the new table panels #1071

            New in Scylla Advisor

            • New Advisor feature: more detailed advice. (Learn more about Scylla Advisor here.) 


            New in Scylla Monitoring Stack 3.7

            • Overview dashboard enhancements:

              • Add manager task progress indication to the overview dashboard #1250

            The Manager progress is now part of the header rows. (Screenshot: how a backup looks.)

            • Hinted handoffs being accumulated/being sent - annotation #1258

            When a node is temporarily down, the updates that would have been sent to it are stored as hints; when the node is up again, those hints are sent. This translates to extra load on other nodes. There are optional annotations that mark when hints are being stored and sent.
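
The store-then-replay behaviour described above can be pictured with a toy model. This is only a sketch of the general idea, not Scylla's implementation: the `HintingCoordinator` class and its methods are hypothetical names made up for the example.

```python
from collections import defaultdict, deque


class HintingCoordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.up = {name: True for name in replicas}
        self.data = {name: {} for name in replicas}
        self.hints = defaultdict(deque)  # replica name -> pending writes

    def mark_down(self, name):
        self.up[name] = False

    def write(self, key, value):
        for name, is_up in self.up.items():
            if is_up:
                self.data[name][key] = value
            else:
                # The replica is unreachable: keep the update as a hint.
                self.hints[name].append((key, value))

    def mark_up(self, name):
        """Bring a replica back and replay its hints; return how many were sent."""
        self.up[name] = True
        replayed = 0
        while self.hints[name]:
            key, value = self.hints[name].popleft()
            self.data[name][key] = value
            replayed += 1
        return replayed


coordinator = HintingCoordinator(["node1", "node2", "node3"])
coordinator.mark_down("node3")
coordinator.write("k1", 1)
coordinator.write("k2", 2)
pending = len(coordinator.hints["node3"])  # hints accumulated while down
replayed = coordinator.mark_up("node3")    # hints sent once the node is back
print(pending, replayed)
```

Both phases do extra work on the healthy nodes, first storing and later sending, which is why the dashboard offers an annotation for each.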

            • Secondary Indexes/Materialized Views background-built - annotation #1257

            When adding a Secondary Index or a Materialized View to an existing table, the new index or view is built in the background, which adds extra load on the nodes. You can use the MV annotation to see when a Materialized View or Secondary Index is being built.

             

            • Provide an indication of coordinator / replica errors per node #1229

            • Visually present error/no error on the node table #1035

            The node table, found in the DC section of the Overview dashboard, can indicate when there are CQL optimization warnings and when there are errors on the node.

             

            • CQL Dashboard

              • Add CQL errors to dashboards #1276

            Scylla 4.4 comes with additional CQL errors. The new CQL Errors panel, found on the CQL dashboard, shows those errors. Please note that, for clarity, only active errors are shown.

            • Add more info to client table (Scylla Open Source 4.4) #1259

            Scylla Open Source 4.4 adds additional information to the client table found on the CQL dashboard.

            • Update Scylla Manager Dashboard #1180

            The manager dashboard got a facelift. It now shows the last success and last failure of Backup and Repair tasks.

             

            • Panel for Scylla HWLB #907

            Heat Weighted Load Balancing (HWLB) is an optimization mechanism that distributes queries according to the probability a requested value will be in the cache.
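
One way to picture this is a weighted random choice, where a replica with a warmer cache receives a larger share of the reads. The sketch below is an illustration under that assumption only; the `pick_replica` function and the hit rates are hypothetical and do not describe how Scylla implements HWLB.

```python
import random


def pick_replica(cache_hit_rates, rng):
    """Pick a replica with probability proportional to its cache hit rate."""
    names = list(cache_hit_rates)
    weights = [cache_hit_rates[name] for name in names]
    if sum(weights) == 0:
        # No cache information at all: fall back to a uniform choice.
        return rng.choice(names)
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so the run is repeatable
rates = {"warm_node": 0.9, "cold_node": 0.1}  # hypothetical hit rates

counts = {name: 0 for name in rates}
for _ in range(1000):
    counts[pick_replica(rates, rng)] += 1

print(counts)  # the warm node receives most of the reads
```

A node that has just restarted has a cold cache, so under a scheme like this it would take only a small share of reads until its cache warms up.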

             

            Bug Fixes

            • Loki container breaks -A option in start-all.sh #1326

            • Inconsistent use of legends in DC panel #1290

            • Disk space alerts not working due to wrong metrics used #1282

            • When using a private network Loki and Alertmanager do not work #1252
