Diego Update


Onsi Fakhouri

Apr 1, 2015, 9:01:24 PM
to vcap...@cloudfoundry.org

Hey vcap-dev,

Diego’s march towards production-readiness continues. I wanted to send a quick update to the list with some resources and some information about upcoming features that we’ll be adding. Apologies for the length of this e-mail!

Diego Resources

We’ve created github.com/cloudfoundry-incubator/diego. This is a simple landing page that directs you to various Diego-related resources. We ask that any general Diego-related issues be opened on this repo.

In particular, I’d like to highlight two things:

  • The Migration Guide includes details around migrating from the DEAs to Diego and I’d recommend any and all CF operators take a look through this.
  • The Docker Support Notes describe how Diego implements support for Docker.

In terms of timelines we have a number of features still to go to close the gap between Diego and the DEAs. More importantly we are working through a suite of performance tests to validate that Diego will scale to the initial targets we have in mind (we hope to support ~400 Cells at first and to have a vision for hitting the low 1000s of Cells). We anticipate that the results of these performance tests will lead us to make several breaking changes and would like these to land before calling Diego ready for production. As that time approaches we will let the list know. For now Diego remains in beta.

During the beta period we will not be making strong uptime guarantees during a Diego deploy. In fact some deploys may require manual intervention. Diego completely supports downtimeless rolling deploys at this time; however, we’re reserving the right to break backward compatibility instead of bearing the cost of making all our changes backward compatible during the beta period.

Upcoming Features

These are likely to land during the beta period.

SSH

Diego does not currently support the cf files API. We plan to resolve this by broadening the scope of access to the containers and giving developers the ability to have full SSH access into running containers. Along with this will come the ability to:

  • get a console session on a running instance
  • do local port-forwarding
  • scp files out of containers

If there is interest I would be happy to provide the architectural details in a separate e-mail. And, yes, we plan on making SSH access configurable (by space & application) so that operators can disable it if they desire. If it proves feasible we are also considering enabling/disabling SSH access at a finer-grained level (e.g. allow SCP but not console access).

There are stories in Diego’s tracker backlog. Search for the ssh label.

Placement Constraints

We plan on extending Diego and the CC API to support Placement Constraints. The idea here is relatively simple: individual Diego cells can be assigned arbitrary tags and users can specify a PlacementConstraint (simply a set of tags) on their applications. Diego will ensure that only Cells that have the tags associated with the application’s PlacementConstraint will host the application.

We plan on modelling this like the ApplicationSecurityGroups are modelled. PlacementConstraints are specified by a CF admin on a per-space basis. We will support different PlacementConstraints for staging vs running. We will allow setting a blanket default PlacementConstraint that will apply when none is specified.

For our first iteration we will only support PlacementConstraints that specify tags that must be present on a host Cell. One can imagine extending this to specify tags that are preferred or that should be disallowed.
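To make the matching rule concrete, here is a rough Python sketch of the behaviour described above. It is an illustration under stated assumptions, not Diego or CC code: the function names `resolve_constraint` and `eligible_cells`, the `(space, phase)` dictionary layout, and the sample tags are all hypothetical.

```python
# Hypothetical sketch of first-iteration PlacementConstraint matching.
# None of these names come from Diego or the CC API.

def resolve_constraint(space_constraints, space, phase, default=frozenset()):
    """Return the tag set for a space and phase ("staging" or "running").

    Falls back to the blanket default when the space specifies nothing.
    """
    return frozenset(space_constraints.get((space, phase), default))


def eligible_cells(cells, constraint):
    """Keep only the Cells whose tags include every tag in the constraint."""
    return sorted(name for name, tags in cells.items() if constraint <= set(tags))


# Made-up sample data: Cell name -> tags assigned by the operator.
cells = {
    "cell-1": {"ssd", "dmz"},
    "cell-2": {"ssd"},
    "cell-3": set(),
}
space_constraints = {("space-a", "running"): {"ssd", "dmz"}}

running = resolve_constraint(space_constraints, "space-a", "running")
staging = resolve_constraint(space_constraints, "space-a", "staging", default={"ssd"})
print(eligible_cells(cells, running))
print(eligible_cells(cells, staging))
```

In this sketch an empty constraint matches every Cell, which is one possible reading of what happens when neither a space-level constraint nor a default is set.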

If you’d like to learn more - there are stories in Diego’s tracker backlog. Search for the +placement-constraints label.

Thanks for reading!  As always, feedback is welcome :)

Onsi


Etourneau Gwenn

Apr 1, 2015, 10:00:04 PM
to vcap...@cloudfoundry.org, ofak...@pivotal.io
Hi Onsi,

I am interested in the ssh design. Can you send me the design notes?

The placement pool constraints look very interesting too. Does that mean the placement pool feature will not be implemented for the DEAs? I ask since I think Diego will become the default for CF at some point.


Thanks

shashidhara td

Apr 2, 2015, 12:19:15 AM
to vcap...@cloudfoundry.org, ofak...@pivotal.io

Hi onsi,
  I too am interested in the design of the ssh feature. Can you share it, please?

Thanks
Shashi

--
You received this message because you are subscribed to the Google Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/2102f87e-9922-4057-8fba-eda37d858b4f%40cloudfoundry.org.

To unsubscribe from this group and stop receiving emails from it, send an email to vcap-dev+u...@cloudfoundry.org.

James Bayer

Apr 2, 2015, 12:36:40 AM
to vcap...@cloudfoundry.org, Onsi Fakhouri
placement pools will not be implemented in DEAs, just diego.





--
Thank you,

James Bayer

Etourneau Gwenn

Apr 2, 2015, 1:12:10 AM
to vcap...@cloudfoundry.org, Onsi Fakhouri
Thanks James :)


Alberto A. Flores

Apr 2, 2015, 1:43:18 AM
to vcap...@cloudfoundry.org, Onsi Fakhouri
Hi Onsi,

Could you add me to the list to get a copy of the architectural changes to implement SSH? I'm wondering how something like this would compare to (or prevent) tmate-like operations. Thoughts?

Nice work!

Alberto
Twitter: albertoaflores

simon.j...@springer.com

Apr 2, 2015, 4:28:40 AM
to vcap...@cloudfoundry.org, ofak...@pivotal.io
I would also be interested in the SSH implementation details.

Matthew Sykes

Apr 2, 2015, 8:13:32 AM
to vcap...@cloudfoundry.org
Wow; quite a bit of interest in ssh...

The work we're doing now is based on a prototype I put together in February. It consists of a few parts:

- A thin ssh daemon that can be deployed in the container
- An ssh proxy that can be bosh deployed
- A load balancing tier that distributes requests to the proxies
- A cli plugin to help with the user experience

When a user wishes to access a container, they ssh to the load balancer at port 2222. (We used haproxy for our work.) The load balancer then connects to one of the ssh proxies and forwards the stream. All proxies share the same host key and, eventually, the public key fingerprint can be verified against data in /v2/info.

At the proxy, we authenticate the client and determine which container the user wants to access. We do this by having the client provide a username that indicates the destination and a password that is a valid domain credential.

For cf, we use 'cf:${app-guid}/${index}' as the username and the oauth bearer token as the password. Authentication and authorization involves contacting the CC with the bearer token and getting the instance information associated with the app to find the host and port of the app instance the user wants to access.

For diego, we use 'diego:${instance-guid}' for the username and '${receptor-user}:${receptor-password}' for the password. We connect to the receptor with the provided receptor credentials and get the host and port of the requested container instance.
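As a rough illustration of the two credential formats just described, the proxy-side parsing could look something like the Python sketch below. This is not the prototype's code: `Target`, `parse_username`, and `split_receptor_password` are hypothetical names, and the error handling is an assumption.

```python
# Hypothetical parser for the proxy usernames and receptor password described above.
from typing import NamedTuple, Optional, Tuple


class Target(NamedTuple):
    domain: str            # "cf" or "diego"
    guid: str              # app guid (cf) or instance guid (diego)
    index: Optional[int]   # instance index, only present for cf


def parse_username(username: str) -> Target:
    """Parse 'cf:<app-guid>/<index>' or 'diego:<instance-guid>'."""
    domain, sep, rest = username.partition(":")
    if not sep or not rest:
        raise ValueError("expected '<domain>:<target>'")
    if domain == "cf":
        guid, sep, index = rest.rpartition("/")
        if not sep or not guid or not index.isdigit():
            raise ValueError("expected 'cf:<app-guid>/<index>'")
        return Target("cf", guid, int(index))
    if domain == "diego":
        return Target("diego", rest, None)
    raise ValueError(f"unknown domain: {domain!r}")


def split_receptor_password(password: str) -> Tuple[str, str]:
    """Split '<receptor-user>:<receptor-password>' at the first colon."""
    user, sep, secret = password.partition(":")
    if not sep:
        raise ValueError("expected '<receptor-user>:<receptor-password>'")
    return user, secret
```

For the cf case the password is the oauth bearer token and is passed through untouched, so only the diego password needs splitting here.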

With authentication complete, we establish an ssh connection to a daemon in the container. The daemon listens on 2222 inside the container and it was downloaded and configured by actions in the desired LRP. It runs as an unprivileged user and can only access container resources as that user.

For this second hop, we use a key pair that is configured at the proxy to connect to the container.

With both connections established, we wire the ssh connection from the end user and the ssh connection to the container together in the proxy. Since we're in the middle, we can see the protocol flows and apply policy. For example, we can turn off port forwarding or shell access while still allowing sftp for file access.
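To sketch what applying policy in the middle might mean, here is a small Python example. The channel and request names ("session", "direct-tcpip", "shell", "exec", "pty-req", "subsystem") are the standard ones from the SSH connection protocol (RFC 4254); the `Policy` class, the `is_allowed` function, and the choice to treat "exec" as shell-level access are assumptions made for illustration, not the proxy's actual design.

```python
# Hypothetical policy check for a proxy sitting between two SSH connections.
# Request/channel names follow RFC 4254; everything else is an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allow_shell: bool = True
    allow_port_forwarding: bool = True
    allow_sftp: bool = True


def is_allowed(policy, kind, name, subsystem=""):
    """kind is "channel" or "request"; name is the SSH channel/request type."""
    if kind == "channel" and name == "direct-tcpip":
        # Opened by the client for local port forwarding.
        return policy.allow_port_forwarding
    if kind == "request" and name in ("shell", "exec", "pty-req"):
        return policy.allow_shell
    if kind == "request" and name == "subsystem":
        return subsystem == "sftp" and policy.allow_sftp
    # Plain session channels are needed by both shell and sftp;
    # anything else is denied by default.
    return kind == "channel" and name == "session"


files_only = Policy(allow_shell=False, allow_port_forwarding=False)
print(is_allowed(files_only, "request", "subsystem", "sftp"))
print(is_allowed(files_only, "request", "shell"))
```

Denying unknown request types by default keeps the sketch conservative; a real proxy would need to decide how to treat things like environment or window-change requests.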

Finally, to make things easy on the end user, we have a cf cli plugin. This plugin uses the go ssh client to access ssh.${domain} at port 2222 and authenticate with the app guid and index requested by the user and the oauth token of the client.

Putting it all together, you get this: https://asciinema.org/a/17065

Hopefully that helps explain where we are and if there are any concerns.

Thanks.




--
Matthew Sykes
matthe...@gmail.com

Mike Youngstrom

Apr 2, 2015, 1:38:43 PM
to vcap...@cloudfoundry.org
Nice update.  I'm excited about the ssh and port forwarding support.  That is very high on the list of desired features in my org.

Question about Placement Constraints.  One of our requirements for this is the need to control which placement tags a router can route instances to.  For example, if we have 2 tags "internal" and "DMZ" we'd like to have a router in our DMZ zone only route traffic to instances tagged as "DMZ".

Also with placement constraints will you support an application being able to specify multiple tags and if so will diego attempt to evenly divide instances across the multiple tags?  Or will we only support a single tag for an app and force the user to create multiple apps with the same route if they want an app in more than one pool?

Thanks,
Mike


Onsi Fakhouri

Apr 2, 2015, 5:42:46 PM
to vcap...@cloudfoundry.org
> Question about Placement Constraints.  One of our requirements for this is the need to control which placement tags a router can route instances to.  For example, if we have 2 tags "internal" and "DMZ" we'd like to have a router in our DMZ zone only route traffic to instances tagged as "DMZ".

This sounds pretty straightforward and could be a follow-on feature, though it’s not going to be part of the first iteration on the PlacementConstraints.

 

> Also with placement constraints will you support an application being able to specify multiple tags and if so will diego attempt to evenly divide instances across the multiple tags?  Or will we only support a single tag for an app and force the user to create multiple apps with the same route if they want an app in more than one pool?

You may specify multiple tags on an application but these will be ANDed together. To be clear: if an app has tags [alpha, beta] then it will only be scheduled on Cells that have both alpha and beta tags. Diego will find the set of Cells that satisfy this constraint and then distribute the application evenly among those Cells. To be double clear: if you have some Cells tagged alpha and others tagged beta the application will not be placed. While we could make the logic a bit more complex and include the ability to OR tags together I’d prefer to keep the first pass simple.

With that said, it sounds like part of the problem you are trying to solve is to ensure that an application is spread out across different pools of Cells. We currently support this using zones. Cells can be assigned a zone (typically an availability zone though you are free to interpret this as you wish) and Diego will do its darndest to distribute application instances for a given app across zones. This behavior plays nicely with the PlacementConstraints we’re proposing. Cells that satisfy a PlacementConstraint may span multiple zones. Diego will find those Cells and then distribute the application across zones. In a sense you could think of zones as a special kind of tag that the end-user specifies in a PlacementConstraint; Diego knows to distribute across zone tags.
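A deliberately simplified Python sketch of that behaviour: AND the tags to find the eligible Cells, then spread instances across zones. Diego's real scheduling takes far more into account (available resources, for instance), and the `place` function, the round-robin strategy, and the sample data are all assumptions for illustration.

```python
# Hypothetical sketch: AND the tags, then spread instances across zones.
from collections import defaultdict
from itertools import cycle


def place(cells, constraint, instances):
    """cells maps Cell name -> (zone, tags). Returns the chosen Cell names."""
    by_zone = defaultdict(list)
    for name in sorted(cells):
        zone, tags = cells[name]
        if set(constraint) <= set(tags):   # every tag must be present
            by_zone[zone].append(name)
    if not by_zone:
        return []                          # no Cell satisfies the constraint
    # Round-robin over zones, and over the Cells inside each zone.
    zone_cells = {zone: cycle(names) for zone, names in by_zone.items()}
    zones = cycle(sorted(zone_cells))
    return [next(zone_cells[next(zones)]) for _ in range(instances)]


# Made-up sample data.
cells = {
    "c1": ("z1", {"alpha", "beta"}),
    "c2": ("z1", {"alpha", "beta"}),
    "c3": ("z2", {"alpha", "beta"}),
    "c4": ("z2", {"alpha"}),
}
print(place(cells, {"alpha", "beta"}, 4))
```

With this data, `c4` is never chosen because it lacks the beta tag, and the four instances alternate between zones z1 and z2.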

Onsi


 

Mike Youngstrom

Apr 3, 2015, 11:50:20 AM
to vcap...@cloudfoundry.org
Thanks for the clarifications Onsi.  I think the functionality you describe regarding tags and zones will work well for what we have in mind.

I understand that full pinning of a router to a Placement(s) isn't in the current plans.  However, if the opportunity arises to help this functionality along, like perhaps adding placement data to published route data, it would be great if you could give it some consideration.

Thanks!
Mike

Onsi Fakhouri

Apr 3, 2015, 11:53:57 AM
to vcap...@cloudfoundry.org
On Fri, Apr 3, 2015 at 8:49 AM, Mike Youngstrom <you...@gmail.com> wrote:
> Thanks for the clarifications Onsi.  I think the functionality you describe regarding tags and zones will work well for what we have in mind.
>
> I understand that full pinning of a router to a Placement(s) isn't in the current plans.  However, if the opportunity arises to help this functionality along, like perhaps adding placement data to published route data, it would be great if you could give it some consideration.

The way things work now the requested routes and the placement constraints will be, in a sense, packaged together in Diego.  We would need to give the route-emitter the brains to know how to handle the placement constraints to pick out candidate routers to update.

Onsi
 

Phil Whelan

Apr 3, 2015, 7:20:44 PM
to vcap...@cloudfoundry.org
Hi Onsi, Matthew,

Thanks for the great update! Sorry, a bunch of questions at once, which are mainly things I spotted going through the Trackers for these new features.

https://www.pivotaltracker.com/n/projects/1003146/stories/90748242


> As a consumer of Diego, I can craft an LRP that allows me to spin up an SSH daemon

I noticed that on this ticket there is a lot of chatter about https://github.com/cloudfoundry/cfdreddbot but I can't access that. What is this repo?

> Changing the PlacementConstraint associated with a space is allowed. At this point the applications in the space need to be restarted but CC has no way to orchestrate this. The same issue exists with the Application Security Groups.

Could you clarify why the CC has no way to orchestrate restarting the applications in a space? Is it just that the functionality to gracefully migrate the app instances to other Cells does not exist?

Thanks!
Phil

Onsi Fakhouri

Apr 3, 2015, 7:33:05 PM
to vcap...@cloudfoundry.org

Hi Phil,

> Hi Onsi, Matthew,
>
> Thanks for the great update! Sorry, a bunch of questions at once, which are mainly things I spotted going through the Trackers for these new features.
>
> https://www.pivotaltracker.com/n/projects/1003146/stories/90748242

It’s a private link that the team uses to communicate internally.

> > As a consumer of Diego, I can craft an LRP that allows me to spin up an SSH daemon
>
> Noticed, on this ticket there is a lot of chatter about https://github.com/cloudfoundry/cfdreddbot but I can't access that. What is this repo?

cfdreddbot monitors the various CF repos and asks contributors to sign the CLA (if necessary).

> > Changing the PlacementConstraint associated with a space is allowed. At this point the applications in the space need to be restarted but CC has no way to orchestrate this. The same issue exists with the Application Security Groups.
>
> Could you clarify why the CC has no way to orchestrate restarting the applications of space? Is it just that the functionality to gracefully migrate the app instances to other Cells does not exist?

There are three classes of changes that can happen to an app:

  • changes that require a restage (e.g. changing the stack, updating the application bits, changing the allocated memory)
  • changes that require a restart (e.g. changing the placement constraints or the application security groups)
  • changes that require neither (e.g. changing routes, changing # of instances)

CC, today, relies on the client to orchestrate changes in the first two categories. For example, when you cf scale -m XX, the cf client creates a new set of application instances with the desired memory, waits for them to come up, then shuts the older, smaller instances down. Similarly, when you cf push, the cf client stops the existing application, stages the new one, then launches the new application (hence the need to orchestrate a blue-green deploy independently via separate applications).
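The three classes of changes listed above can be written down as a small lookup, sketched here in Python. The field names, the `Action` enum, and the rule that a mixed set of changes needs the most disruptive action are assumptions for illustration; this is not how CC or the cf client is implemented.

```python
# Hypothetical mapping from the kind of change to the action it requires,
# following the three classes listed above.
from enum import Enum


class Action(Enum):
    RESTAGE = "restage"
    RESTART = "restart"
    NONE = "none"


REQUIRED_ACTION = {
    "stack": Action.RESTAGE,
    "app_bits": Action.RESTAGE,
    "memory": Action.RESTAGE,
    "placement_constraint": Action.RESTART,
    "security_groups": Action.RESTART,
    "routes": Action.NONE,
    "instances": Action.NONE,
}

# Ordered from least to most disruptive (an assumption of this sketch).
_SEVERITY = [Action.NONE, Action.RESTART, Action.RESTAGE]


def required_action(changed_fields):
    """Return the most disruptive action needed for a set of changed fields."""
    actions = [REQUIRED_ACTION[field] for field in changed_fields]
    return max(actions, key=_SEVERITY.index, default=Action.NONE)


print(required_action(["routes", "security_groups"]))
```

A client that orchestrates these changes could use such a table to decide whether it has to restage, restart, or do nothing after an update.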

Onsi


 

Phil Whelan

Apr 3, 2015, 10:05:51 PM
to vcap...@cloudfoundry.org

Jack Cai

Apr 9, 2015, 1:36:43 PM
to vcap...@cloudfoundry.org
This is all exciting. For the health check support in Diego, will it continue to support droplet.yaml to specify a state file? If yes, how does this work for continuous health check?

Jack


Onsi Fakhouri

Apr 9, 2015, 3:15:44 PM
to vcap...@cloudfoundry.org
Hey Jack,

We aren't supporting the state file health check at this point.  And, yes, that wouldn't be particularly useful for a continuous health check.

Onsi

Guillaume Berche

Apr 22, 2015, 10:32:23 AM
to vcap...@cloudfoundry.org, ofak...@pivotal.io
Thanks Onsi and Matthew for these additional details.


> Diego does not currently support the cf files API. We plan to resolve this by broadening the scope of access to the containers and giving developers the ability to have full SSH access into running containers.


One nice thing about the cf files and more generally the current CF APIs is that they worked reasonably well with firewalls/HTTP proxies which in some orgs block non-HTTP(S) traffic to the internet, and sometimes across intranets within the organization. Also they're fairly well supported in the SDKs (cf-java-client) and in some IDEs [1]. Lastly, they could safely be opened to users without much risk of side effects on the app, which remained immutable.

If SSH interactive access and port forwarding become the mainstream solution to operate and troubleshoot apps (cf files, replacement for the previous DEBUG and CONSOLE ports), it will be useful for users behind such firewalls to be able to configure the diego ssh plugin to use HTTP/SOCKS proxies to reach public CF instances. If the diego ssh cli plugin uses the regular local host ssh binaries, maybe tweaking the .ssh config file to add options for host ssh.${domain} might be sufficient.

Also, current cf files clients will have to implement a mechanism similar to the diego ssh plugin's, and bundle an scp client.

It would be interesting to find ways to preserve the guarantee that the app instances remain immutable even though they may be accessed in read mode:
- In terms of the fine-grained controls that the ssh proxy supports and exposes to CF users, is it likely that the proxy/container could allow SCP source mode [2] only, i.e. allowing reading files from the container but not writing to it (i.e. ACL parity with cf files)?
- Similarly, it might be useful to allow local port forwarding (for connecting to a JMX or debugger port in the container) while not granting interactive ssh access, although JMX access may have side effects on the app instance.
- A more radical approach could be similar to the Google App Engine managed VM instance lock/unlock feature [3]: a call to a new "unlock" command on a CF app instance would be necessary to get SSH access to it. CF then considers this instance "tainted"/untrusted, as it may have deviated from the pushed content, and does not act on it anymore (i.e. does not monitor its bound $PORT or root process exit, which may be handy for diagnosing it as one wishes). When the "lock" command is requested on this instance, CF destroys the tainted instance and recreates a fresh "trusted" one.

The SSH access will add lots of flexibility for operating individual instances, which is great. Are there still plans to support TCP routing to instances for other use cases (such as apps needing to receive production traffic on additional ports, potentially non-HTTP/WebSocket)? I could not find a trace of it in the diego backlog. Similar question for tunnelling to service instances (previously cf tunnel) when they are not publicly addressable.

Thanks,

Guillaume.

[1] http://docs.run.pivotal.io/buildpacks/java/sts.html#view-file
[2] http://en.wikipedia.org/wiki/Secure_copy#How_it_works
[3] https://cloud.google.com/appengine/docs/managed-vms/host-env#changing_management



On Thursday, April 2, 2015 at 3:01:24 AM UTC+2, Onsi Fakhouri wrote:

James Bayer

Apr 22, 2015, 11:15:25 AM
to vcap...@cloudfoundry.org, Onsi Fakhouri
guillaume,

onsi is on vacation for the remainder of this week. ssh policy is something we're planning to have in some way. for example, turn the feature on and off globally, later have finer grained policies [1] possibly applied at a space level rather than per-app.

GE engineers that have dojo'ed with the diego team are planning on starting some TCP routing work in the may timeframe. the basic idea is to have an endpoint service that lets you request a specific port (or get assigned any available ip:port if you don't care). once you have that you'd be able to match that tcp endpoint to a particular app port with a route and have that exposed. we would like to not use gorouter for this. the detailed planning for this is still forthcoming.
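since the detailed planning is still forthcoming, the following python sketch only illustrates the "request a specific port, or get any available one" idea in the abstract. the `PortPool` class, its methods, and the port range are entirely hypothetical and do not describe a real or planned cf api.

```python
# Hypothetical allocator for the "request a port" idea; not a real CF API.
class PortPool:
    def __init__(self, first, last):
        # Ports first..last (inclusive) start out free.
        self._free = set(range(first, last + 1))

    def reserve(self, port=None):
        """Reserve a specific port, or the lowest free one when port is None."""
        if port is None:
            if not self._free:
                raise LookupError("no ports available")
            port = min(self._free)
        elif port not in self._free:
            raise LookupError(f"port {port} is not available")
        self._free.remove(port)
        return port

    def release(self, port):
        """Return a previously reserved port to the pool."""
        self._free.add(port)


pool = PortPool(60000, 60002)
print(pool.reserve(60001))  # a specific port
print(pool.reserve())       # any available port
```

a reserved port would then be matched to a particular app port with a route, as described above; that mapping step is left out of the sketch.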

your points about "cf files" working well with http are something we are interested in preserving if we can, however we may decide that it's not worth the cost involved given the flexibility of the new ssh functionality. once we expose the ssh functionality we'll gather feedback and make a more informed decision, but "cf files" is not expected to work on diego initially.





--
Thank you,

James Bayer

Guillaume Berche

Apr 22, 2015, 11:46:19 AM
to vcap-dev
great, thanks James!

