Adding a score plugin to the BOSH Health Monitor

mei...@gmail.com

Jan 10, 2015, 3:33:40 AM

to bosh...@cloudfoundry.org

Hi

I'm part of the open source project score (http://www.openscore.io), and I have an idea for integrating score with BOSH.

score is a workflow engine that eases the creation of process-based orchestration and automation. We also introduced slang (score-language), a YAML-based DSL for building flows on top of score.

I think adding a score plugin to the BOSH Health Monitor could give BOSH users a way to create custom, more elaborate monitoring and remediation workflows, similar to the BOSH Resurrector.

WDYT? Do you see value in this?
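To make the proposal more concrete, here is a minimal Python sketch of what such a plugin might do: match incoming alerts by title and start a remediation flow for each match. Everything in it is hypothetical (the `Alert` fields, `ScoreFlowPlugin`, `register`, the example alert title, and treating a flow as a plain callable); it does not use the real Health Monitor plugin interface or any score API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    # Hypothetical alert shape; these fields are assumptions, not the
    # Health Monitor's actual event format.
    deployment: str
    job: str
    title: str


# A "flow" here is any callable that takes an alert and returns a run id.
Flow = Callable[[Alert], str]


@dataclass
class ScoreFlowPlugin:
    """Hypothetical plugin that routes Health Monitor alerts to score flows."""
    routes: Dict[str, Flow] = field(default_factory=dict)
    runs: List[str] = field(default_factory=list)

    def register(self, keyword: str, flow: Flow) -> None:
        # Trigger `flow` for any alert whose title contains `keyword`.
        self.routes[keyword] = flow

    def process(self, alert: Alert) -> List[str]:
        # Start every matching flow and record the returned run ids.
        started = [flow(alert) for kw, flow in self.routes.items() if kw in alert.title]
        self.runs.extend(started)
        return started
```

For example, registering `"process is not running"` with a restart flow means only alerts with that phrase in their title start a run; all other alerts are ignored.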

Meir Wahnon

Dmitriy Kalinin

Jan 12, 2015, 12:51:18 PM

to bosh...@cloudfoundry.org, mei...@gmail.com

It sounds interesting, though it's hard to tell without listing more concrete cases. Do you guys have some remediation steps in mind?

Meir Wahnon

Jan 12, 2015, 3:11:45 PM

to Dmitriy Kalinin, bosh...@cloudfoundry.org

Hi Dmitriy,

Thanks for your reply.

Sure, we have some use cases:

Removing old/unused docker images from a DockerHost
Rollback app to previous version
Rollback app configuration to default
Restarting the service
Restarting server
Scaling up
DB purging/migrating data

We have seen users of our commercial product (Operations Orchestration) use this pattern.

Dmitriy Kalinin

Jan 12, 2015, 5:51:27 PM

to Meir Wahnon, bosh...@cloudfoundry.org

Here are some quick thoughts on each of the use cases. HM is mostly concerned with tasks that happen in the background and are more or less regular operations, e.g. bringing back a VM that was deleted or just disappeared.

Removing old/unused docker images from a DockerHost

> This sounds like something that would be handled by an individual release, e.g. a docker-release that is responsible for the Docker daemon.

Rollback app to previous version

> This is a core feature of BOSH, controlled by the `bosh deploy` mechanism. Not sure it makes sense to trigger it as a result of certain failures.

Rollback app configuration to default

> Not sure what "default configuration" means. When an operator deploys a release with BOSH, it is configured as the operator requested.

Restarting the service

> This already happens automatically inside VMs with the help of monit. Each VM has its own set of processes monitored by monit.

Restarting server

> Restarting a server does not typically solve many problems in BOSH land, since monit is already trying to restart specific processes. I could see this potentially being useful, but it would be more easily implemented (probably a few lines) in the existing Resurrector code.

Scaling up

> This is an interesting one, though the metrics you currently get (CPU, RAM, etc.) are not necessarily good scaling factors.

DB purging/migrating data

> This would typically happen as part of the specific releases, e.g. running migrations.

I'll think a bit more about each one of those cases.
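The "Restarting server" point above treats a server restart as an escalation on top of monit's per-process restarts. A minimal Python sketch of such an escalation policy, assuming a hypothetical `RestartEscalator` that is fed process-failure timestamps: it leaves failures to monit until one job fails `max_failures` times within a sliding time window, then recommends recreating the VM. None of these names come from the Resurrector code, and the thresholds are illustrative.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RestartEscalator:
    """Hypothetical policy: leave single process restarts to monit, but
    recommend recreating the VM when a job keeps failing."""

    def __init__(self, max_failures: int, window_seconds: float) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # job name -> timestamps (in seconds) of its recent failures
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, job: str, now: float) -> str:
        times = self.failures[job]
        times.append(now)
        # Forget failures that have slid out of the time window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_failures:
            times.clear()  # start counting afresh after escalating
            return "recreate_vm"
        return "wait_for_monit"
```

Failures are tracked per job, so a flapping process on one VM does not cause another VM to be recreated.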

Meir Wahnon

Jan 13, 2015, 4:30:23 PM

to Dmitriy Kalinin, bosh...@cloudfoundry.org

You raised very good points.

Rollback app configuration to default

> Not sure what "default configuration" means. When an operator deploys a release with BOSH, it is configured as the operator requested.

-> I meant that maybe the flow could restore the deployment's environment variables to their values from the previous deployment.

Scaling up

> This is an interesting one, though the metrics you currently get (CPU, RAM, etc.) are not necessarily good scaling factors.

-> Which metrics do you find important for scaling? Maybe HTTP request latency?

Other possibilities for integration:

1. Extend drain scripts

2. Extend the BOSH Health Monitor with more elaborate health checks?

WDYT?
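On the scaling question above, here is a minimal Python sketch of a latency-driven scaling decision. It assumes latency samples come from some source outside BOSH (which, as noted in this thread, only reports CPU, RAM, etc.), and it uses one simple heuristic: scale the instance count in proportion to the observed 95th-percentile latency divided by a target. The function names, the target, and the bounds are all hypothetical.

```python
import math
from typing import Sequence


def p95_latency_ms(samples_ms: Sequence[float]) -> float:
    """95th percentile of the samples, using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def desired_instances(current: int, samples_ms: Sequence[float], target_ms: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Scale the instance count by the ratio of observed p95 to target latency."""
    observed = p95_latency_ms(samples_ms)
    wanted = math.ceil(current * observed / target_ms)
    # Clamp so one noisy window cannot scale without bound.
    return max(min_instances, min(max_instances, wanted))
```

The same rule scales down when latency is well under the target; a real policy would likely add hysteresis or a cooldown to avoid oscillating.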

mei...@gmail.com

Mar 2, 2015, 12:21:05 PM

to bosh...@cloudfoundry.org, dkal...@pivotal.io, mei...@gmail.com

Hi,

Any response on this?

Do you think there is value in such an integration?

On Tuesday, January 13, 2015 at 23:30:23 UTC+2, Meir Wahnon wrote:

Dmitriy Kalinin

Mar 3, 2015, 6:38:47 PM

to bosh...@cloudfoundry.org, dkal...@pivotal.io, mei...@gmail.com

Sorry, this thread got lost in the pile. Replies inline...

On Monday, March 2, 2015 at 9:21:05 AM UTC-8, mei...@gmail.com wrote:

Hi,

Any response on this?
Do you think there is value in such an integration?

On Tuesday, January 13, 2015 at 23:30:23 UTC+2, Meir Wahnon wrote:
You raised very good points.

Rollback app configuration to default
> Not sure what "default configuration" means. When an operator deploys a release with BOSH, it is configured as the operator requested.
-> I meant that maybe the flow could restore the deployment's environment variables to their values from the previous deployment.

BOSH does not have a notion of env vars. It deals with config files, which are already on the VMs and get rolled back if the operator chooses to.

Scaling up
> This is an interesting one, though the metrics you currently get (CPU, RAM, etc.) are not necessarily good scaling factors.
-> Which metrics do you find important for scaling? Maybe HTTP request latency?

I would imagine it depends on the kinds of applications being managed. For example, for apps pushed to CF there is an autoscaling service that lets users set up a time schedule; I don't remember whether it deals with HTTP requests. Since BOSH itself does not know what kind of software it deployed, there is currently no way to get more detailed metrics. In the future I could see BOSH polling some kind of API to determine the optimal capacity and adjusting the deployments accordingly, but I think we would want to see how this relates to Diego [1] first.

[1] https://github.com/cloudfoundry-incubator/diego-release

Other possibilities for integration:
1. Extend drain scripts
2. Extend the BOSH Health Monitor with more elaborate health checks?

I think it's hard to predict whether such integrations with score make sense at this point, until someone actually runs a BOSH-operated environment and openscore side by side for some time and has specific problems to solve. A lot of assumptions shift when BOSH is involved, since it hides the automation of typical tasks.

Meir Wahnon

Mar 4, 2015, 11:43:30 AM

to Dmitriy Kalinin, bosh...@cloudfoundry.org

Hi Dmitriy,

So, maybe a different approach:

Is there a need among CF users to integrate it with other tools?

For example, pre/post deployment hooks (using a notification API after app deployment)?
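The deployment-hook idea can be sketched in Python as a small registry that runs "pre" hooks before a deployment and "post" hooks after it succeeds. Nothing here corresponds to an existing CF or BOSH API: `DeploymentHooks`, `on`, and `deploy` are hypothetical, and a real post-deployment hook would presumably call a notification endpoint (or start a score flow) instead of the in-memory callables used here.

```python
from typing import Callable, Dict, List

# A hook receives the app name and the phase it is running in.
Hook = Callable[[str, str], None]


class DeploymentHooks:
    """Hypothetical registry of pre/post deployment hooks."""

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Hook]] = {"pre": [], "post": []}

    def on(self, phase: str, hook: Hook) -> None:
        if phase not in self.hooks:
            raise ValueError(f"unknown phase: {phase}")
        self.hooks[phase].append(hook)

    def deploy(self, app_name: str, do_deploy: Callable[[str], None]) -> None:
        for hook in self.hooks["pre"]:
            hook(app_name, "pre")
        do_deploy(app_name)
        # Post hooks run only if do_deploy returned without raising.
        for hook in self.hooks["post"]:
            hook(app_name, "post")
```

Skipping post hooks when the deployment raises keeps a "deployment finished" notification from firing for a failed deploy; a failure hook would be a natural third phase.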

Dmitriy Kalinin

Mar 4, 2015, 1:40:06 PM

to bosh...@cloudfoundry.org, dkal...@pivotal.io, mei...@gmail.com

That would be a question to ask on vcap-dev and see if anyone is interested.
