Horizon Randomly Stops Reading

Matthew Slane

Feb 24, 2015, 10:27:19 AM
to skyli...@googlegroups.com
Hi,

We've put up a test VM to experiment with Skyline. We've discovered that after a while Horizon stops receiving metrics and the Redis collection becomes empty. Restarting Horizon fixes the issue. We're improving our metrics and information to see if we can get to the bottom of what's happening, but in the meantime, has anyone had a similar experience and found a fix that might save us some work?

Thanks

Matt

earthgecko

Feb 24, 2015, 11:50:47 AM
to skyli...@googlegroups.com
Hi Matt

We have had this issue with Horizon since the beginning (along with Analyzer connections timing out). We handle it with a couple of scripts that parse the Horizon and Analyzer logs and restart the services as appropriate when they encounter the trigger conditions.

I have created some gists to highlight these. They are run via cron every minute. The gists ARE examples only, but they define the patterns. We also use monit for the Skyline services, and the scripts handle race conditions with monit. They are RHEL-based, so please note service/init differences etc.

Hope these help, they have worked for us since 20140114.

horizon - https://gist.github.com/earthgecko/5588dc17c8ebe2a7c082
analyzer - https://gist.github.com/earthgecko/ec181cc95dbfce9a9bc6
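For readers who cannot open the gists, the general pattern described above (scan recent log lines for trigger conditions, restart the service on a match) can be sketched roughly as follows. This is not the content of the gists: the trigger patterns, the log path, and the restart command are all hypothetical placeholders, and the real trigger conditions are the ones the gists define.

```python
import re
import subprocess
from collections import deque

# Hypothetical trigger patterns, for illustration only.
# The real patterns are the ones defined in the gists.
TRIGGER_PATTERNS = (
    re.compile(r"connection (timed out|refused)", re.IGNORECASE),
    re.compile(r"queue is empty", re.IGNORECASE),
)


def needs_restart(recent_lines, patterns=TRIGGER_PATTERNS):
    """Return True if any recent log line matches any trigger pattern."""
    return any(p.search(line) for line in recent_lines for p in patterns)


def tail(path, max_lines=200):
    """Return the last max_lines lines of a text file."""
    with open(path, "r", errors="replace") as fh:
        return list(deque(fh, maxlen=max_lines))


def check_and_restart(log_path, restart_cmd, dry_run=True):
    """Restart the service if the log tail shows a trigger condition.

    log_path and restart_cmd are supplied by the caller; nothing here
    assumes a particular Skyline layout or init system.
    """
    if not needs_restart(tail(log_path)):
        return False
    if not dry_run:
        subprocess.run(list(restart_cmd), check=True)
    return True
```

A cron entry would call something like `check_and_restart("/path/to/horizon.log", ["service", "horizon", "restart"], dry_run=False)` once a minute (both arguments are placeholders). A real script also has to remember what it has already acted on, for example by only looking at lines written since the last restart, otherwise the same old log line triggers a restart on every run.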

Abe Stanway

Feb 24, 2015, 12:03:33 PM
to earthgecko, skyli...@googlegroups.com
There is reconnection logic in Horizon - is it a bug in that? See line 169 in listen.py.

Or does the listen.py process quit entirely?

Sent from not my laptop

Michał Łowicki

Mar 16, 2015, 11:12:20 AM
to skyli...@googlegroups.com, gary....@of-networks.co.uk
There was a bug that occurred when the listener lost its connection to, for example, carbon-relay. It is fixed in https://github.com/earthgecko/skyline/commit/2045bdd479cd4d2b3415f631adb603f48270c441.

I opened a pull request at https://github.com/etsy/skyline/pull/115, but I don't think it will be handled soon.
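
As a generic illustration of the failure mode discussed in this thread, here is a minimal sketch of a TCP listener that goes back to accepting connections when its peer drops, instead of stalling or exiting. It is not the code in listen.py and not the content of the commit above. The names `read_exact` and `serve` and the `max_connections` parameter are made up for this example, and the 4-byte big-endian length prefix is an assumed framing.

```python
import socket


def read_exact(conn, n):
    """Read exactly n bytes; return None if the peer closed the connection."""
    chunks = []
    remaining = n
    while remaining:
        chunk = conn.recv(remaining)
        if not chunk:  # an empty read means the peer went away
            return None
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(conn):
    """Read one length-prefixed frame; return None on a lost connection."""
    try:
        header = read_exact(conn, 4)
        if header is None:
            return None
        return read_exact(conn, int.from_bytes(header, "big"))
    except OSError:  # reset, timeout, and similar socket errors
        return None


def serve(server_sock, handle, max_connections=None):
    """Accept peers one at a time and pass each payload to handle().

    When a peer disconnects, fall through to accept() again so that a
    reconnecting sender is picked up. max_connections exists only so
    that a demo or test can stop the loop.
    """
    served = 0
    while max_connections is None or served < max_connections:
        conn, _addr = server_sock.accept()
        served += 1
        with conn:
            while True:
                payload = read_frame(conn)
                if payload is None:
                    break  # peer is gone; wait for the next connection
                handle(payload)
```

The important part is that an empty `recv()` is treated as "peer gone" rather than as data, so the loop neither spins on a dead socket nor gives up listening.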