Announcing Bosun, an alerting system backed by OpenTSDB

Kyle Brandt

Nov 10, 2014, 3:41:39 PM
to open...@googlegroups.com
Matt Jibson and I have been working for about a year now on a new monitoring system backed by OpenTSDB, called Bosun. We have also made a collector called scollector. The main highlight is alerting. You can find out more at the following two links if you are interested:


Thanks!
Kyle 

ManOLamancha

Nov 21, 2014, 8:06:39 PM
to open...@googlegroups.com
That looks awesome guys! Do you have any data on how much throughput your setup can handle? Great job! 

Kyle Brandt

Nov 26, 2014, 12:18:02 PM
to open...@googlegroups.com
We threw some money at this, so we may not be representative (the HBase cluster has SSDs, and Bosun has a crap ton of CPU and memory). But right now we move 15k datapoints a second through Bosun (which Bosun can nicely answer for us):

The biggest blocker on scalability is actually that sometimes we load result sets that are too big for the browser (web interface). We should have that fixed in a week or two. How well the alerts scale really depends on your evaluation frequency and how complicated the alerts are (how many OpenTSDB queries they make and how much processing Bosun has to do).

Currently everything gets relayed through bosun. This is primarily because we need to track the state of things for a few reasons:

1. For autocompletion etc. on the graphing page
2. To trigger unknown when an instance of an alert (which is the alert name plus OpenTSDB's tag set) has no data for the duration of the query (times 2, I think)
3. Bosun handles metadata itself
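
As a rough illustration of item 2, here is a minimal Go sketch of tracking when each alert instance last had data and flagging the silent ones as unknown. It is not Bosun's actual code: the `tracker` type, the `alertKey` format, and the use of Unix-second timestamps are assumptions made for the example, and the factor of two is taken from the "times 2, I think" remark above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alertKey builds an identifier for one alert instance: the alert name plus
// the tag set, rendered in sorted order so equal tag sets give equal keys.
// The exact key format is an assumption made for this sketch.
func alertKey(alert string, tags map[string]string) string {
	pairs := make([]string, 0, len(tags))
	for k, v := range tags {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return alert + "{" + strings.Join(pairs, ",") + "}"
}

// tracker remembers when each alert instance last had data, in Unix seconds.
type tracker struct {
	lastSeen map[string]int64
}

func newTracker() *tracker {
	return &tracker{lastSeen: make(map[string]int64)}
}

// touch records that an instance returned data at time ts.
func (t *tracker) touch(key string, ts int64) {
	if ts > t.lastSeen[key] {
		t.lastSeen[key] = ts
	}
}

// unknown returns, in sorted order, the instances that have had no data
// for more than twice the query duration.
func (t *tracker) unknown(now, queryDuration int64) []string {
	var out []string
	for key, seen := range t.lastSeen {
		if now-seen > 2*queryDuration {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	t := newTracker()
	t.touch(alertKey("cpu.high", map[string]string{"host": "web01"}), 1000)
	t.touch(alertKey("cpu.high", map[string]string{"host": "web02"}), 1500)
	// With a 300s query window, anything silent for more than 600s is unknown.
	fmt.Println(t.unknown(1700, 300))
}
```

Keeping this state in memory on the relay is what makes the check cheap, at the cost of every datapoint having to pass through the process that holds the map.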

So there are a few things that I think OpenTSDB can do, but we didn't use them because when we started development this stuff was just too rough (also, having it in memory on the Bosun server makes it really fast). The other thing that impacts Bosun's performance is that all reduction functions are done by Bosun, since OpenTSDB doesn't support these operations natively. So basically Bosun has to slurp in all the data for most of the things it does. None of this is actually a real problem for us at our current scale, but I thought you might find it interesting.
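
To make the cost of client-side reduction concrete, here is a minimal Go sketch of reducing a full series to a single number after it has been fetched. The `series` type and the `reduceAvg` and `reduceMax` names are hypothetical and only illustrate the idea; they are not Bosun's implementation.

```go
package main

import "fmt"

// series maps Unix-second timestamps to values, roughly the shape of one
// time series returned by a query.
type series map[int64]float64

// reduceAvg collapses a series to its arithmetic mean.
// The second return value is false when the series is empty.
func reduceAvg(s series) (float64, bool) {
	if len(s) == 0 {
		return 0, false
	}
	var sum float64
	for _, v := range s {
		sum += v
	}
	return sum / float64(len(s)), true
}

// reduceMax collapses a series to its largest value.
// The second return value is false when the series is empty.
func reduceMax(s series) (float64, bool) {
	first := true
	var m float64
	for _, v := range s {
		if first || v > m {
			m = v
			first = false
		}
	}
	return m, !first
}

func main() {
	s := series{1416000000: 10, 1416000015: 20, 1416000030: 30}
	a, _ := reduceAvg(s)
	m, _ := reduceMax(s)
	fmt.Println(a, m)
}
```

Every datapoint in the window has to cross the network before either function can run, which is why the reduction step dominates when the backend cannot do it itself.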

Also be sure to check out scollector: https://github.com/bosun-monitor/scollector . It doesn't have to be used with Bosun. It is a single binary, so there are no dependencies. Windows is just as important as Linux, and scollector runs natively on Windows as well. So I think there are some advantages there over tcollector.

Keep working away at OpenTSDB, bosun depends on it! :-)

Best,
Kyle

Peter Speybrouck

Nov 27, 2014, 8:40:08 AM
to open...@googlegroups.com
Kyle,

That scollector seems interesting.
How much work would it be to add an option to send data to influxDB?
InfluxDB also uses an HTTP API, so I guess that part can easily be adapted. However, since tags are not implemented there yet, I suppose the metrics need to be adapted as well to get around this.
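
One way to "adapt the metrics" for a backend without tags would be to fold the tag set into the metric name. The Go sketch below shows that idea only; the `flatten` function and the dotted naming scheme are assumptions for illustration, not something scollector provides.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// flatten folds a tag set into the metric name for a backend without tags.
// Tags are appended as key.value pairs in sorted key order so the same
// tag set always yields the same name. Dots inside tag values are replaced
// with underscores so they cannot be confused with the separator.
func flatten(metric string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := []string{metric}
	for _, k := range keys {
		parts = append(parts, k, strings.ReplaceAll(tags[k], ".", "_"))
	}
	return strings.Join(parts, ".")
}

func main() {
	name := flatten("os.cpu", map[string]string{"host": "web01", "type": "idle"})
	fmt.Println(name)
}
```

The trade-off is that wildcard queries such as host=* are no longer possible on the backend side; the structure survives only as a naming convention.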

Peter

Kyle Brandt

Dec 1, 2014, 7:52:49 PM
to open...@googlegroups.com
Bosun ties into OpenTSDB in particular in a few ways:

 - The query function q(...) expects an OpenTSDB query
 - Alerts are instantiated based on Tag Sets
 - Alerts scale well because you define alerts on things like host=*, and get back a time series tagged by each host (which can then be an alert instance).
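
The last two points can be sketched in Go as follows: one wildcard query returns one series per host, and each series becomes its own alert instance with its own state. The `result` type, the thresholds, and the status names are hypothetical; the values are assumed to have already been reduced to a single number per series.

```go
package main

import (
	"fmt"
	"sort"
)

// result is one series from a wildcard query such as host=*, already
// reduced to a single number (for example, an average over the window).
type result struct {
	host  string  // value of the host tag for this series
	value float64 // reduced value
}

// status classifies one value against warning and critical thresholds.
func status(v, warn, crit float64) string {
	switch {
	case v >= crit:
		return "critical"
	case v >= warn:
		return "warning"
	default:
		return "normal"
	}
}

// evaluate turns each tagged result into its own alert instance, keyed by
// alert name plus tag set.
func evaluate(alert string, results []result, warn, crit float64) map[string]string {
	out := make(map[string]string, len(results))
	for _, r := range results {
		out[alert+"{host="+r.host+"}"] = status(r.value, warn, crit)
	}
	return out
}

func main() {
	results := []result{{"web01", 42}, {"web02", 85}, {"db01", 97}}
	states := evaluate("cpu.high", results, 80, 95)

	keys := make([]string, 0, len(states))
	for k := range states {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, states[k])
	}
}
```

A single alert definition therefore covers every host the query returns, and new hosts are picked up without touching the configuration.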

So, you could extend Bosun to use other storage backends as long as they support tag sets in a similar way. As for the OpenTSDB query function q(), you could just make an alternate like influxq() or something. If someone does something like this, I don't think Matt or I would be against merging it, but we are not likely to do the work ourselves.
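
A minimal Go sketch of what "an alternate like influxq()" could look like structurally: each backend supplies a query function that returns series carrying tag sets, and the function name selects the backend. Everything here is hypothetical, including the `taggedSeries` type and the registry; the two backends are in-memory stand-ins that make no network calls.

```go
package main

import (
	"errors"
	"fmt"
)

// taggedSeries is the common shape every backend must return: a tag set
// identifying the series plus its timestamped values.
type taggedSeries struct {
	Tags   map[string]string
	Values map[int64]float64
}

// queryFunc is the signature shared by all backend query functions.
type queryFunc func(query string) ([]taggedSeries, error)

// fakeOpenTSDB stands in for an OpenTSDB-backed q(); it returns fixed data.
func fakeOpenTSDB(query string) ([]taggedSeries, error) {
	return []taggedSeries{
		{Tags: map[string]string{"host": "web01"}, Values: map[int64]float64{1: 10}},
		{Tags: map[string]string{"host": "web02"}, Values: map[int64]float64{1: 20}},
	}, nil
}

// fakeInflux stands in for a hypothetical influxq(); it returns fixed data.
func fakeInflux(query string) ([]taggedSeries, error) {
	return []taggedSeries{
		{Tags: map[string]string{"host": "web01"}, Values: map[int64]float64{1: 11}},
	}, nil
}

// queryFuncs maps expression-level function names to backends.
var queryFuncs = map[string]queryFunc{
	"q":       fakeOpenTSDB,
	"influxq": fakeInflux,
}

// run looks up a query function by name and executes it.
func run(name, query string) ([]taggedSeries, error) {
	f, ok := queryFuncs[name]
	if !ok {
		return nil, errors.New("unknown query function: " + name)
	}
	return f(query)
}

func main() {
	for _, name := range []string{"q", "influxq"} {
		res, err := run(name, "example query")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "returned", len(res), "series")
	}
}
```

As long as the returned series carry tag sets, the rest of the alerting logic would not need to know which backend produced them.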

Best,
Kyle

ManOLamancha

Feb 11, 2015, 12:40:25 AM
to open...@googlegroups.com
Hey Matt & Kyle, they're interested in your work on Bosun for HBaseCon 2015 if you'd like to present. Email me if you want the details or sign up at http://www.hbasecon.com/?elq=e1529a8a73f0413989cdab424ae57eb3&elqCampaignId=792. Thanks!

Kyle Brandt

Mar 14, 2015, 2:03:16 PM
to open...@googlegroups.com
I'm not sure what sort of talk we would give, since honestly we don't have a great handle on the HBase side. We are basically using Cloudera on CentOS with a few additions: snapshotting and some basic memory tuning. So I imagine the cluster isn't even optimally configured. If someone gave a talk on tuning HBase for OpenTSDB I would like to watch it :-)

Bryan Hernandez

Mar 14, 2015, 3:23:27 PM
to Kyle Brandt, open...@googlegroups.com
"If someone gave a talk on tuning HBase for OpenTSDB I would like to watch it :-)"  I'll second that!

Best,

Bryan

Yarden Bar

Mar 22, 2015, 12:09:57 PM
to open...@googlegroups.com
Hey Matt & Kyle,
As I didn't find a dedicated Bosun mailing list or Google group, I thought this thread would be the most relevant place for my questions.

In the backends section of the Bosun configuration, you mentioned that Elasticsearch can be used as a backend.
  • Does that mean that Bosun should populate ES indices?
  • I've read the scollector documentation, but didn't understand what type of metrics it can collect from Elasticsearch.
  • How can I use an Elasticsearch query to feed a Bosun alert? (A query like "the count of 404s in the last hour" with thresholds for warning and critical states.)
On another topic, can you recommend an IRC channel, Google group, or mailing list for Bosun questions that are not about OpenTSDB?
Sorry if I misplaced this post, but I really didn't find any other place.

Thanks in advance for your answers.
Yarden

Kyle Brandt

Mar 23, 2015, 4:19:44 PM
to open...@googlegroups.com
#bosun-monitor on irc.freenode.org is a bridge to our Slack room. If you email me or send me a Twitter direct message with your email address, I will authorize you for the room. I also follow the bosun tags on Server Fault and Stack Overflow. Finally, we currently don't mind questions as issues on GitHub.
  • Does that mean that Bosun should populate ES indices?
No, Bosun is just looking for indexes with the default logstash naming scheme, name-date, where the date is in the format parsed by time.Parse("2006.01.02", date).
  • I've read the scollector documentation, but didn't understand what type of metrics it can collect from Elasticsearch.
A lot, but you have to read the code in cmd/scollector/collectors/elasticsearch.go.
  • How can I use an Elasticsearch query to feed a Bosun alert? (A query like "the count of 404s in the last hour" with thresholds for warning and critical states.)
See the lscount() and lsstat() functions under configuration at http://bosun.org
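
For the index naming scheme mentioned in the first answer (name-date, with the date in Go's "2006.01.02" layout), here is a minimal Go sketch of splitting an index name into its prefix and day. The `indexDate` helper is hypothetical and is not taken from Bosun's source; only the layout string comes from the answer.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// indexDate splits an index name of the form name-date into its name and
// its day, where the date uses the layout "2006.01.02" (for example
// "logstash-2015.03.23"). The split is on the last hyphen so that names
// containing hyphens still work.
func indexDate(index string) (string, time.Time, error) {
	i := strings.LastIndex(index, "-")
	if i < 0 {
		return "", time.Time{}, fmt.Errorf("no date suffix in %q", index)
	}
	day, err := time.Parse("2006.01.02", index[i+1:])
	if err != nil {
		return "", time.Time{}, err
	}
	return index[:i], day, nil
}

func main() {
	name, day, err := indexDate("logstash-2015.03.23")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, day.Format("2006-01-02"))
}
```

Indexes whose suffix does not parse as a date return an error, so they can simply be skipped when selecting which indexes cover a time range.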

One other little tidbit: elastic querying is currently broken :-/ I hoped to work on it this weekend but didn't, so I'll shoot for tomorrow morning.

Divya Nagaraj

Mar 16, 2018, 3:44:20 AM
to OpenTSDB
Hi Kyle,

I need some information on Bosun performance.

  • How many alerts can a single instance of Bosun handle at a given time, and at what run frequency?

Please suggest a way to measure this, and explain how Bosun's own metrics can help assess its performance.

Thanks in advance.
