Use of spec.index in jobs

Mike Youngstrom

Mar 18, 2014, 1:44:43 PM

to vcap...@cloudfoundry.org

We deploy our Cloud Foundry installation as several individual BOSH deployments. This is mostly because we have to deploy to multiple vSphere datacenters, and there is a one-to-one relationship restriction between a BOSH director and a vSphere datacenter.

This causes us problems when Cloud Foundry uses "spec.index" to configure a job, since spec.index starts again at 0 in each BOSH deployment.

spec.index is used in several places in cf-release today:

* In all syslog_forwarder.conf.erb files.

* In the Cloud Controller: https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/monit#L19

* In DEA Logging agent: https://github.com/cloudfoundry/cf-release/blob/master/jobs/dea_logging_agent/templates/dea_logging_agent.json.erb#L2

* In Loggregator: https://github.com/cloudfoundry/cf-release/blob/master/jobs/loggregator/templates/loggregator.json.erb#L6

We currently survive by adding tons of custom "offset" properties to these jobs, then incrementing the index by the offset for each of our deployments, similar to what gorouter does today:

https://github.com/cloudfoundry/cf-release/blob/master/jobs/gorouter/templates/gorouter.yml.erb#L33
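
A rough sketch of the offset workaround, using Ruby's standard-library ERB class. This is only an illustration: the `index_offset` property name, the `Spec` struct, and the template text are hypothetical stand-ins, not the actual gorouter template or the context BOSH uses when rendering job templates.

```ruby
require "erb"

# Hypothetical stand-in for the `spec` object available in job templates.
Spec = Struct.new(:index)

# Hypothetical template line: the effective index is the per-deployment
# offset plus the deployment-local spec.index.
TEMPLATE = ERB.new("index: <%= index_offset + spec.index %>")

# Render the template for one instance of a job.
def render_index(spec_index, index_offset)
  spec = Spec.new(spec_index)
  TEMPLATE.result(binding)
end

# Deployment A uses offset 0 and deployment B uses offset 10, so instance 0
# of the same job gets a different effective index in each deployment.
puts render_index(0, 0)
puts render_index(0, 10)
```

The offset only keeps indices apart as long as no deployment runs more instances of a job than the gap between offsets.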

I've attempted to submit a PR that added offset to a component in the past but it was rejected:

https://github.com/cloudfoundry/cf-release/pull/131

Though the justification for that PR was to fix collector indexing, recent use of "spec.index" is impacting actual functionality in Cloud Foundry more and more. If we submit PRs today that add offsets to components, would they get accepted? And would the CF team consider officially supporting offsets in future uses of spec.index, so we don't have to search for spec.index surprises when we merge? Or are there any other recommendations for how we can handle this issue?

thanks,

Mike

James Bayer

Mar 19, 2014, 7:38:58 AM

to vcap...@cloudfoundry.org

Mike, I'm out of office the rest of this week, but from what I understand the VMware vSphere CPI is about to support multiple datacenter/cluster options. I would like to have a solution to this problem, and if it's not addressed while I'm out, we can revisit it next week.

Mike Youngstrom

Mar 19, 2014, 12:13:19 PM

to vcap...@cloudfoundry.org

Thanks James. I might add that, in addition to the vSphere CPI forcing us to have multiple deployments, we actually like having multiple deployments for another reason.

For whatever reason, with BOSH we've frequently run into situations where we have had to delete a deployment in order to fix one problem or another: persistent disks not attaching after a stemcell upgrade, our vSphere administrators deleting stemcell snapshot images, BOSH director upgrades going bad for one reason or another, etc.

In such situations it is nice to know that we can delete a deployment and apps running on DEAs with routers in another deployment won't experience downtime.

So, although the vSphere CPI not supporting multiple datacenters is the main reason we have multiple deployments, we've actually grown somewhat fond of the practice. :) That said, if cf-release wishes to require that all CF installations have all components in a single deployment, we will probably follow suit, even though we think there is some value in breaking a CF installation into multiple BOSH deployments.

Mike

Iwasaki Yudai

Mar 19, 2014, 4:20:55 PM

to vcap...@cloudfoundry.org

There is another problem with using indices in templates.

The CCng job template uses indices to choose one server (instance) to run the db:migrate task [1]. However, when the CCng job template is used by multiple jobs, the migration task is executed more than once. You may want to create two CCng jobs, for example, to deploy CCng on two zones, like ccng_a and ccng_b. In this case, both ccng_a/0 and ccng_b/0 run the migration task.

This causes no problem for now, as db:migrate has no side effects even if it is executed more than once.

However, it seems dangerous to use indices to ensure that a procedure is executed on a single server. Somebody may inject a process with side effects in the future.

I'm wondering if BOSH errand jobs would solve this kind of problem.

[1] https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/templates/cloud_controller_ng_ctl.erb#L73-L80
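
To make the double execution concrete, here is a minimal sketch in plain Ruby. It is a simplified model, not the actual cloud_controller_ng_ctl logic: the `Instance` struct and the `runs_migration?` guard are hypothetical, and the job names ccng_a and ccng_b are the ones from the example above.

```ruby
# Hypothetical model of an instance: job name plus index within that job.
Instance = Struct.new(:job, :index)

# Simplified guard in the spirit of "only index 0 runs the migration".
def runs_migration?(instance)
  instance.index == 0
end

# Two jobs built from the same template, two instances each.
INSTANCES = [
  Instance.new("ccng_a", 0), Instance.new("ccng_a", 1),
  Instance.new("ccng_b", 0), Instance.new("ccng_b", 1)
].freeze

# Names of all instances that would run the migration.
def migrators(instances)
  instances.select { |i| runs_migration?(i) }
           .map { |i| "#{i.job}/#{i.index}" }
end

puts migrators(INSTANCES).join(", ")
```

Because the guard looks only at the index, it selects one instance per job rather than one instance overall.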

Mike Youngstrom

Mar 19, 2014, 4:27:16 PM

to vcap...@cloudfoundry.org

Right, we have solved the CC migration issue along with all the other spec.index issues by applying an offset and updating the job template to use that offset. So the CC job in one zone has index 0 and the CC in the other zone has index 10. We patch the cloud controller startup script to use the offset (https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/templates/cloud_controller_ng_ctl.erb#L73).

That has worked fine for us so far. But it is error-prone to go into all the jobs, find their uses of 'spec.index', and patch them all to use an offset.

Mike

Mike Youngstrom

Mar 27, 2014, 1:50:29 PM

to vcap...@cloudfoundry.org

Bump James or team. I know you're busy and back from vacation. It'd be great if this thread could get some attention.

Mike

Mark Kropf

Mar 31, 2014, 5:33:52 PM

to vcap...@cloudfoundry.org

Mike,

I just synced up again with the team on this, and the guidance Maria provided in the PR is still valid. We would prefer not to bring the notion of more than one deployment into the job. Would the tagging solution proposed in the PR work for your use case?

Mark

Mike Youngstrom

Mar 31, 2014, 5:42:42 PM

to vcap...@cloudfoundry.org

The tagging solution only applied to collector and is not the main point of this email. The bigger problem now is that spec.index is being used in more places in cf-release.

For example, if I have 2 deployments I will have 2 cloud_controller_ng 0 components. Each of them will execute db migrations with logic like this: https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/monit#L19

Another example: Loggregator uses spec.index. Will loggregator work if I have 2 deployments, each with a loggregator:0, working together in the same Cloud Foundry system? https://github.com/cloudfoundry/cf-release/blob/master/jobs/loggregator/templates/loggregator.json.erb#L8

spec.index is being used more and more to uniquely identify a component in an entire cloud foundry system. But it actually only uniquely identifies a component in a bosh deployment. Our cloud foundry system consists of multiple deployments. Do you see the problem?

Mike

Mike Youngstrom

Mar 31, 2014, 6:12:53 PM

to vcap...@cloudfoundry.org

Perhaps it could help in some cases if the jobs in cf-release could reference the deployment name to create unique identifiers where needed. If loggregator were enhanced to support a string for its index you could change:

https://github.com/cloudfoundry/cf-release/blob/master/jobs/loggregator/templates/loggregator.json.erb#L8

To something like (not sure how to get the deployment name in a template):

"Index": "<%= deployment.name %>:<%= spec.index %>",

Then that would uniquely identify this component in an entire multi-deployment system. That would work for us.
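
A small sketch of that idea, rendered with Ruby's standard-library ERB and checked with the JSON library. It is not the real loggregator template: the deployment name is passed in as a plain argument because how a template would obtain it is left open above, and the `Spec` struct and the method name are hypothetical.

```ruby
require "erb"
require "json"

# Hypothetical stand-in for the `spec` object available in job templates.
Spec = Struct.new(:index)

# Render a JSON fragment whose "Index" is a string combining the
# deployment name and the deployment-local index.
def render_loggregator_index(deployment_name, spec_index)
  spec = Spec.new(spec_index)
  composite = "#{deployment_name}:#{spec.index}"
  # to_json quotes and escapes the string so the output stays valid JSON.
  ERB.new('{"Index": <%= composite.to_json %>}').result(binding)
end

puts render_loggregator_index("cf-dc1", 0)
puts render_loggregator_index("cf-dc2", 0)
```

The two rendered fragments differ even though both instances have index 0, which is the property needed across deployments. Any consumer that expects a numeric index would have to be changed to accept a string.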

That wouldn't work for the CC db migration solution though. Perhaps simply adding a flag to the job telling that deployment to run the migration in addition to spec.index == 0 would work for that.
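
The flag idea could look roughly like this in Ruby. The `run_migrations` property is a hypothetical name, and this is a sketch of the proposed guard rather than anything that exists in cf-release.

```ruby
# Decide whether an instance should run the migration: the deployment must
# be flagged for migrations AND the instance must be index 0 of its job.
def should_migrate?(spec_index, run_migrations)
  run_migrations && spec_index == 0
end

# Deployment 1 is flagged, deployment 2 is not; both have an index-0 instance.
puts should_migrate?(0, true)   # flagged deployment, index 0
puts should_migrate?(0, false)  # unflagged deployment, index 0
puts should_migrate?(1, true)   # flagged deployment, index 1
```

Only one deployment should set the flag, which moves the uniqueness decision from the template to whoever writes the deployment manifests.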

There are several options. Offset is the one we hack in now since it works in all cases spec.index is referenced.

What I'm mainly looking for is:

1. Acknowledgment that our multi-deployment use cases are valid.

2. A strategy to support multi-deployment systems with cf-release.

Mike

Johannes Tuchscherer

Apr 3, 2014, 1:12:29 PM

to vcap...@cloudfoundry.org

We could change the usage of spec.index in the loggregator jobs and use either a unique identifier containing the deployment name or an offset, but there are also other jobs (uaa, login, etcd, dea) that use spec.index in the same way as loggregator. I didn't dig too deep into the code of each component, but it looks like the index, in most cases, is used to announce the component to NATS. I am afraid that changing the format of the announce message would break downstream components that subscribe to this message.

Also, taking the spec.index out of the syslog_forwarder is certainly possible, but not desirable. We use it to make it easy to identify the unique source of a syslog message in your syslog drain. You could achieve the same goal with the IP address that is part of the syslog message, but it is just not as convenient.

Unfortunately, I can't answer whether your deployment use-case is valid or not. I have very little vSphere experience. I think the bosh team might have an answer for that.

Mike Youngstrom

Apr 3, 2014, 3:52:35 PM

to vcap...@cloudfoundry.org

Thanks for the response Johannes. It is good to know that loggregator could work with something other than a spec.index.

And I see the value in using spec.index. We, however, are stuck in a situation where the vSphere CPI doesn't support cross-datacenter deploys. We need cross-datacenter deploys. cf-release, because it uses spec.index, doesn't support cross-datacenter deploys.

Since this thread is short on ideas, let me propose one.

Adding offsets to the spec.index works for us and wouldn't impact single deployment installations. Will the runtime team support cf-release PRs that add offsets to spec.index references?

If not, is there an alternative solution?

I started a thread [1] on bosh-users about the vsphere cpi, multiple datacenters and multiple deployments to gather input from that side of the fence.

Mike

[1] https://groups.google.com/a/cloudfoundry.org/d/msg/bosh-users/fT5r93_IRpk/RZ3M3wdRBGsJ

Mike Youngstrom

Apr 8, 2014, 3:00:22 PM

to vcap...@cloudfoundry.org

I recently had a live discussion with the runtime team and this issue has been resolved. It turns out that cf-release uses the job name + index to uniquely identify components, not the index alone. In our multiple deployments we were using the same job name across deployments, which caused the issues.

I was unaware that spiff-based deployments have essentially the same issue, since each zone must have a unique job and hence a spec.index that starts over. The difference is that multi-zone deployments in a single BOSH deploy force you to have different job names, whereas our multi-BOSH deployment situation did not.

Our solution is to use unique job names for each of our deployments to our different vcenters and that should match what spiff based deployments are doing for multi-zone deployments.
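
As a rough illustration of why unique job names resolve the collisions, here is a Ruby sketch. The `Instance` struct, the "job/index" identifier format, and the job and deployment names are assumptions made for the example, not the exact identifiers cf-release components use.

```ruby
# Hypothetical model of a deployed instance.
Instance = Struct.new(:deployment, :job, :index)

# Identifier built from job name + index, as described above.
def component_id(instance)
  "#{instance.job}/#{instance.index}"
end

# Return the identifiers that occur more than once across all deployments.
def colliding_ids(instances)
  instances.group_by { |i| component_id(i) }
           .select { |_id, group| group.size > 1 }
           .keys
end

# Same job name in both deployments: the identifiers collide.
SAME_NAMES = [
  Instance.new("dc1", "loggregator", 0),
  Instance.new("dc2", "loggregator", 0)
].freeze

# A unique job name per deployment: no collisions.
UNIQUE_NAMES = [
  Instance.new("dc1", "loggregator_dc1", 0),
  Instance.new("dc2", "loggregator_dc2", 0)
].freeze

p colliding_ids(SAME_NAMES)
p colliding_ids(UNIQUE_NAMES)
```

A check along these lines could be run over the job lists of all deployment manifests before deploying, to catch a job name that was reused by accident.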

Looking back at the messages I now see how the Pivotal team and I were not connecting and I feel foolish. :) Hope this thread will be useful to others.

Thanks,

Mike

Dr Nic Williams

Apr 11, 2014, 1:45:27 PM

to vcap...@cloudfoundry.org

Thanks Mike for summarizing the workaround!

--

Dr Nic Williams

Stark & Wayne LLC - consultancy for Cloud Foundry users

http://drnicwilliams.com

http://starkandwayne.com

cell +1 (415) 860-2185

twitter @drnic
