[design-review.ganeti] push by ultrot...@google.com - Add Status reporting system design... on 2012-10-11 14:07 GMT

gan...@googlecode.com
Oct 11, 2012, 10:07:57 AM
to ganeti-...@googlegroups.com
Revision: 6e945cb67f9f
Author: Guido Trotter <ultr...@google.com>
Date: Thu Oct 11 07:06:32 2012
Log: Add Status reporting system design

Signed-off-by: Guido Trotter <ultr...@google.com>

http://code.google.com/p/ganeti/source/detail?r=6e945cb67f9f&repo=design-review

Added:
/doc/design-status.rst

=======================================
--- /dev/null
+++ /doc/design-status.rst Thu Oct 11 07:06:32 2012
@@ -0,0 +1,224 @@
+====================
+Ganeti status report
+====================
+
+.. contents:: :depth: 4
+
+This is a design document detailing the implementation of a Ganeti
+status report system that can be queried by a monitoring system to
+calculate health information for a Ganeti cluster.
+
+Current state and shortcomings
+==============================
+
+There is currently no monitoring support in Ganeti. While we don't want
+to build something like Nagios or Pacemaker as part of Ganeti, it would
+be useful if such tools could easily extract information from a Ganeti
+machine in order to take actions (for example, logging an outage for
+future reporting or alerting a person or system about it).
+
+Proposed changes
+================
+
+Each Ganeti node should export a status page that can be queried by a
+monitoring system. This status page will be exported on a network port
+and will be encoded in JSON (simple text) over HTTP.
+
+JSON is the natural choice because Ganeti already depends on it, so no
+extra libraries are needed, as opposed to what would happen for XML or
+some other markup format.
+
+Location of status report
+-------------------------
+
+The status report will be available from all nodes, and will cover all
+node-local resources. This allows more real-time information to be
+available, at the cost of querying all nodes.
+
+Information reported
+--------------------
+
+The status report system will report on the following basic information:
+
+- Instances status
+- Instance disks status
+- Node instance storage status
+- Ganeti daemons status, CPU usage, memory footprint
+- Hypervisor resources report (memory, CPU, network interfaces)
+- Node OS resources report (memory, CPU, network interfaces)
+- Information from a plugin system
+
+Instances status
+++++++++++++++++
+
+At the moment each node knows which instances are running on it and
+which instances it is primary for, but not the reason why an instance
+might not be running. On the other hand we don't want to distribute full
+instance "admin" status information to all nodes, because of the
+performance impact this would have. As such we propose the following:
+
+- Instance shutdown and startup RPC operations (and other operations
+ that affect the instance status) will be modified to report their
+ activity to the status system on the local node. The status system can
+ then use this information to know if a change in status has been
+ triggered by Ganeti or by a different issue (e.g. a crash).
+- Operations that affect the instance status will also pass a "reason"
+ to the status system, when available, that the status system can
+ export as part of its report. This makes it possible to distinguish an
+ admin request from an end user request, provided the relevant reason
+ is populated correctly. By default Ganeti will just use what it knows
+ about the source of the request: for example a cli shutdown operation
+ will have "cli:shutdown" as a reason, a cli failover operation will
+ have "cli:failover". Operations coming from the remote API will use
+ "rapi" instead of "cli". Of course it will be possible to set
+ arbitrary reasons that Ganeti could not know about (e.g.
+ "scheduled-maintenance").
+
+Instance Disk status
+++++++++++++++++++++
+
+While Ganeti is aware of the details of the instance disks, these
+depend on the disk type (drbd, rbd, plain), so it wouldn't be good to
+export only those to the monitoring system. As such the disk status will
+be split into a "generic" disk status (healthy or error), while more
+details will be provided by the relevant storage driver.
+
+For error conditions the storage type driver can specify further
+information. For example it can report that a drbd device is
+disconnected, or that a backend instance disk is missing.
+
+Ganeti daemons status
++++++++++++++++++++++
+
+Ganeti will report what information it has about its own daemons: this
+includes memory usage, uptime, CPU usage. This should allow identifying
+possible problems with the Ganeti system itself: for example memory
+leaks, crashes and high resource utilization should be evident from
+analyzing this information.
+
+Ganeti daemons will also be able to export extra internal information to
+the status reporting, through the plugin system (see below).
+
+Node instance storage status
+++++++++++++++++++++++++++++
+
+The node will also report on all storage types it knows about for the
+current node (this is right now hardcoded to the enabled storage types,
+and in the future will be tied to the enabled storage pools for the
+nodegroup). For this kind of information too we will report both a
+generic health status (healthy or error) for each type of storage, and
+some more generic statistics (free space, used space, total visible
+space). In addition, type-specific information can be exported: for
+example in case of error the nature of the error can be disclosed as
+type-specific information. Examples of these are "backend pv
+unavailable" for lvm storage, "unreachable" for network based storage or
+"filesystem error" for filesystem based implementations.
+
+Hypervisor resources report
++++++++++++++++++++++++++++
+
+Each hypervisor has a view of system resources that sometimes is
+different from the one the OS sees (for example in Xen the node OS,
+running as Dom0, has access to only part of those resources). In this
+section we'll report all information we can in a "non hypervisor
+specific" way. Each hypervisor can then add extra specific information
+that is not generic enough to be abstracted.
+
+Node OS resources report
+++++++++++++++++++++++++
+
+Since Ganeti assumes it's running on Linux, it's useful to export some
+basic information as seen by the host system. This includes number and
+status of CPUs, memory, filesystems and network interfaces.
+
+Note that we won't go into any hardware specific details (e.g. querying a
+node RAID is outside the scope of this, and can be implemented as a
+plugin) but we can easily just report the information above, since it's
+standard enough across all systems.
+
+Plugin system
++++++++++++++
+
+The status system will be equipped with a plugin system, which can be
+used to export specific local information. Plugins will take the form of
+scripts whose output will be included, plain text files which will be
+inserted into the report, or local unix or network sockets from which
+the information has to be read. This should allow most flexibility for
+implementing an efficient system, while being able to keep it as simple
+as possible.
+
+The plugin system is expected to be used by local installations to
+export any installation specific information that they want to be
+monitored. Examples are countless and include information on specific
+hardware, system configuration, etc.
+
+
+Format of the query
+-------------------
+
+The query will be an HTTP GET request on a particular port. At the
+beginning it will only be possible to query the full status report.
+
+
+Format of the report
+--------------------
+
+TBD (this part needs to be completed with the format of the JSON and the
+types of the various variables exported, as they get evaluated and
+decided)
+
+
+Mode of operation
+-----------------
+
+In order to be able to report information fast, the status daemon will
+keep an in-memory or on-disk cache of the status, which will be returned
+when queries are made. The status system will then periodically check
+resources to make sure the status is up to date.
+
+Parts of the report will be queried at different rates, depending on:
+
+- how often they vary (or we expect them to vary)
+- how fast they are to query
+- how important their freshness is
+
+Of course the last parameter is installation specific, and while we'll
+try to have defaults it will be configurable. The first two can instead
+be used adaptively, to query a certain resource more or less often
+depending on their observed values.
+
+
+Implementation place
+--------------------
+
+The status daemon will be implemented as a standalone Haskell daemon. In
+the future it should be easy to merge multiple daemons into one with
+multiple entry points, should we find out it saves resources and doesn't
+impact functionality.
+
+Future work
+===========
+
+As a future step it can be useful to "centralize" all this reporting
+data in a single place. This for example can be just the master node, or
+all the master candidates. We will evaluate doing this after the first
+node-local version has been developed and tested.
+
+Another possible change is replacing the "read-only" RPCs with queries
+to the status system, thus having only one way of collecting information
+from the nodes, both for a monitoring system and for Ganeti itself.
+
+One extra feature we may need is a way to query for only sub-parts of
+the report (e.g. instances status only). This can be done by passing
+arguments to the HTTP GET, which will be defined when we get to this
+functionality.
+
+Finally, the autorepair system (see the :doc:`autorepair system design
+<design-autorepair>`) can be expanded to use the status system as a
+source of information to decide which repairs it can perform.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
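
The "Mode of operation" section of the design above describes a cache in
which different parts of the report are refreshed at different rates. A
minimal sketch of that idea follows. It is written in Python purely for
brevity (the design itself calls for a Haskell daemon), and every name in
it (CachedSection, StatusCache, the section names and the intervals) is a
hypothetical choice for illustration, not part of Ganeti. The report
layout is likewise an assumption, since the design leaves the JSON format
TBD. The adaptive tuning of intervals mentioned in the design is not
modelled; intervals are fixed here.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


class CachedSection:
    """One part of the report, refreshed at its own (fixed) interval."""

    def __init__(self, collect: Callable[[], Any], interval: float) -> None:
        self.collect = collect          # hypothetical collector callable
        self.interval = interval        # seconds between refreshes
        self.value: Any = None          # last collected value
        self.last_refresh: Optional[float] = None  # None: never collected

    def refresh_if_stale(self, now: float) -> bool:
        """Re-run the collector if the cached value is too old."""
        stale = (self.last_refresh is None
                 or now - self.last_refresh >= self.interval)
        if stale:
            self.value = self.collect()
            self.last_refresh = now
        return stale


class StatusCache:
    """In-memory cache from which the full report is served."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock              # injectable, to ease testing
        self.sections: Dict[str, CachedSection] = {}

    def register(self, name: str, collect: Callable[[], Any],
                 interval: float) -> None:
        self.sections[name] = CachedSection(collect, interval)

    def refresh(self) -> List[str]:
        """Refresh stale sections; return the names that were refreshed."""
        now = self.clock()
        return [name for name, section in self.sections.items()
                if section.refresh_if_stale(now)]

    def report(self) -> str:
        """Serialize the cached values without touching any resource."""
        cached = {name: s.value for name, s in self.sections.items()}
        return json.dumps(cached, sort_keys=True)
```

For instance, registering an "instances" collector with a 5 second
interval and a "node_os" collector with a 60 second interval means that a
refresh() call 10 seconds after the first one re-runs only the former,
while report() keeps returning the cached value for the latter.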