Mentioning now because Adam Jacob said (hopefully not out of context?)
on the Windows automation thread:
"If you have the luxury of starting from scratch (or nearly scratch)
the way you want to model app-deployment is as a part of system
configuration"
Ignoring the practicalities/limitations of existing systems, is this
where everyone else would start?
Are there any reasons to continue with push tools like capistrano or
well loved bash|fabric|ant scripts for deployment except for inertia?
G
--
Gareth Rushgrove
Web Geek
In an ideal world, I would probably use the pull/async model, but I
would still split the target servers into clusters and roll out to one
cluster at a time, with a way to roll back in case of errors. This all
is much easier said than done, but I would love to find out how people
do it, if they do it.
Grig
On 17 May 2011 20:16, Grig Gheorghiu <grig.gh...@gmail.com> wrote:
> I still think it makes sense to use push-based tools for application
> deployment, unless you have hundreds of servers that need to be
> updated.
Is there a line in the sand here? > 100 servers do x, < 100 servers do
y. Or is it never that clear cut? OK, so everyone thinks their
application is unique but but how do you determine that number? Just
feeling as you grow or is it based on some quantifiable properties?
Throughput, database read/write ratio, chosen
webserver/framework/language/platform.
G
On Tuesday, May 17, 2011 at 1:07 PM, gareth rushgrove wrote:
> Something that interests me (in that I change my mind about the
> answer) and was discussed a little at the last devopsdays Europe is
> where application deployment and configuration management collide.
>
> Mentioning now because Adam Jacob said (hopefully not out of context?)
> on the Windows automation thread:
>
> "If you have the luxury of starting from scratch (or nearly scratch)
> the way you want to model app-deployment is as a part of system
> configuration"
>
> Ignoring the practicalities/limitations of existing systems, is this
> where everyone else would start?
It's more than just system configuration. The configuration is there to facilitate a system state that you've defined as policy for your infrastructure. When you're looking at the entire infrastructure itself, every application component needs to be integrated with everything else. This means treating the application itself as a managed resource that should be in a particular state, such as "deployed".
When scaling from a small environment to a large one, I've found it is much easier to reason about application deployment when it is included in the rest of system "configuration." It also means you have one canonical location in your "infrastructure as code" to look for how applications are deployed. Push-based application deployment tends to be prone to failure at scale in my experience.
--
Joshua Timberman, jos...@housepub.org
http://twitter.com/jtimberman | http://jtimberman.posterous.com
So what's the definition of small and large in this scenario?
And what causes the "failure at scale"?
Is it easier/better to go with a push based mechanism below that
threshold? Or is it better to always go for a pull/integrated
approach?
G
>
> --
> Joshua Timberman, jos...@housepub.org
> http://twitter.com/jtimberman | http://jtimberman.posterous.com
>
--
- Two repositories to manage configuration.
- Integration into the development ecosystem
- Multi-step deployment process
Having two repositories to manage configuration is almost a
non-starter. Let's assume we have the following classes of
configuration "atoms":
- Volatile
- Environment-specific
- Fairly static
In most cases, the static stuff we can ignore. It can be managed as
part of the normal codebase and most likely will be packaged as part
of a jar somewhere. However, when we get to the environment specific
stuff, we now need to resort to ERB templates that chef can populate.
Those are stored in a different repo than the code base. We can't use
any fancy submodule magic because the main code base is SVN and the
chef repo is git. So now, when we add a new configuration atom that
needs to be environment aware, we have to manage it in TWO places. We
need a local copy in the SVN repo that developers can use for local
development and a templated copy in Chef. This will almost always fall
down when a configuration setting doesn't get added to the template.
However assuming we get that worked out, we now have these highly
volatile configurations that, if we're lucky, aren't environment
specific. If they are, we've just created additional overhead. The
"solution" is probably to use something like Vagrant but that's a
mindset change in the midst of another mindset change.
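One way to catch the case where a setting reaches the developers' local copy but never makes it into the template is a small check at build time. The following is only a sketch under assumptions: it assumes the local config is a Java-style .properties file and that the ERB template keeps the same key=value layout. The function names are hypothetical and the parser is deliberately simplified.

```python
def property_keys(text):
    """Return the set of keys found in Java-style .properties text.

    Simplified on purpose: handles 'key=value' and 'key: value' lines,
    skips blank lines and '#'/'!' comments, ignores line continuations.
    """
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        # Split on whichever separator appears first in the line.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if positions:
            keys.add(line[:min(positions)].strip())
    return keys


def missing_from_template(local_text, template_text):
    """Keys developers use locally that the template never defines."""
    return sorted(property_keys(local_text) - property_keys(template_text))


if __name__ == "__main__":
    local = "db.url=jdbc:mysql://localhost/app\ncache.ttl=60\n"
    template = "db.url=<%= @db_url %>\n"
    print(missing_from_template(local, template))
```

Failing the build when the returned list is non-empty would at least turn the silent drift into a visible error, though it does nothing about having two repositories in the first place.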
Then we have the issue of integration into the development ecosystem.
How do we manage those artifacts in Maven when they're needed in Chef?
Do we define dependencies in two places, once in Maven and once in a
Data Bag? Or do we simply build with Maven and generate system
packages to install? Right now we package our configurations as Maven
artifacts because that's the ecosystem the developers use and know.
The Java application cookbook in Chef is awesome but it's fairly
simplistic. For any moderately complex Java application, it's not just
a single war. There are various settings that live outside the war file -
log4j settings, Spring property files. If we package those configs in
the war, we're no farther along than we were. So is the solution to
deploy using the maven artifacts (config tarball + war file) and then
overwrite the configs with Chef? That feels like an antipattern.
Finally, the multi-step deployment process. This is something we're
struggling with now. We, like many people, don't run in daemon mode.
Updating the data bag and then logging on to a given server to stage
the rollout doesn't make sense. Especially at any amount of scale.
Knife is a great tool but it's not a tool I want to use to manage
client runs across 100s or 1000s of machines.
In my perfect world, code commits trigger action and everything from
that point on is automated. Even if we were starting from scratch, I
don't think the tight integration makes sense for the Java world. I'm
pretty bullish (as everyone knows) on differentiating certain types of
configurations. Our solution so far has been to do things like using
haproxy on local machines to load balance across various service
clusters (so essentially every app server that needs to talk to FOO
services talks to localhost:someport). This allows us to manage that
environment aware and volatile configuration with Chef without
impacting the developer workflow. The rest of the stuff we're moving
into a system like Noah.
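To make the localhost haproxy idea concrete, here is a minimal sketch of rendering one haproxy "listen" section per service from a list of backends. It is an illustration only: the service name, port, and addresses are made up, and in the setup described above the file would be produced by a Chef template rather than by Python.

```python
def render_listen(service, local_port, backends):
    """Render one haproxy 'listen' section bound to localhost.

    backends is a list of (name, host, port) tuples. In practice the
    list would come from the config management tool, not be hard-coded.
    """
    lines = [
        f"listen {service}",
        f"    bind 127.0.0.1:{local_port}",
        "    mode tcp",
        "    balance roundrobin",
    ]
    for name, host, port in backends:
        # 'check' turns on haproxy health checks for this server.
        lines.append(f"    server {name} {host}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_listen("foo", 9001, [("foo1", "10.0.0.11", 8080),
                                      ("foo2", "10.0.0.12", 8080)]))
```

The application only ever needs to know localhost:9001, so the list of backends can change per environment without touching anything the developers ship.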
Hope my rambling made some sense ;)
--
John E. Vincent
http://about.me/lusis
Grig
Then it would be great if people advocating 'do everything with your
config mgmt tool of choice' (Joshua, Jordan and Patrick so far) would
go into some detail about the exact workflow that they apply during an
application rollout.
Grig
Awesome, that's exactly the level of detail I was hoping for. Thanks
for sharing!
Grig
There's a number of reasons for this, most of them having to do with psychology and how many things humans can track in their heads at any given point in time.
> And what causes the "failure at scale"?
>
> Is it easier/better to go with a push based mechanism below that
> threshold? Or is it better to always go for a pull/integrated
> approach?
For all three of these questions, see push vs pull at infrastructures.org
http://www.infrastructures.org/bootstrap/pushpull.shtml
The short answer is, because sometimes hosts are down or otherwise unreachable, especially in a large distributed infrastructure.
Here's a good example. One of the reasons I was hired for my current
company was to "kaching-ify" the environment. However retrofitting
that onto the existing environment is next to impossible. It's not a
bad environment. It just wasn't designed with automatic scaling in
mind. I eventually decided that the best bet, and the one that
everyone is cool with, is to replace all of our existing EC2 instances
for a given component at a time. I went through the gamut of tagging
machines as legacy vs. non-legacy and trying to write cookbooks that
took that into account. It wasn't happening.
So in essence I *AM* starting from scratch. I'm moving configurations
around per-application as I described. I'm using Jenkins for
self-service builds. However there's still the disconnect around
managing application configuration and the overlap with system
configuration (what I'm calling environment-aware configuration) that
makes sense at scale. As I said, the ultimate goal is code deploys are
triggered by commit message and scaling is ENTIRELY automatic.
--
To clarify my actual position: I think the goal here is a fully
automated infrastructure, soup to nuts. That means if I have new
systems to roll out, the entire process is automated - in the places
where it isn't, it's a documented thing and part of the business
process.
Secondary to this is the desire to be able to re-build, from scratch,
the entire business from nothing but source code, an application data
backup, and bare metal hardware. You can do that with multiple tools -
I've found that if I start with that goal in mind, I rarely need a
second step.
Push/Pull is, in my mind, a red herring here - you can push if you
want to with tools like Chef and Puppet, just like you can with
fabric/func/capistrano. Which you choose is largely a function of the
application itself, and the workflow needs you have around
orchestrating the various state transitions.
For 99% of the people I've talked to, they think about the common
transitions inherent in things like Application Deployment as gates
and phases: you have a phase that happens on some set of systems, that
completes a gate (think a guard statement, like "all the servers have
copied the war file") that moves them on to the next phase ("all the
servers reload tomcat"). You have lots of variation around what
happens with failure, and how much (and when) something needs to be
guarded.
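The gates-and-phases model can be sketched in a few lines. This is a hypothetical illustration, not how Chef, capistrano, or fabric implement it: each phase is a callable run against every host, and the gate is simply "every host succeeded" before the next phase may start.

```python
def run_phases(hosts, phases):
    """Run phases in order; the gate between phases needs every host to pass.

    phases is a list of (name, action) pairs where action(host) returns
    True on success. Returns (completed_phase_names, failed_hosts).
    """
    completed = []
    for name, action in phases:
        failed = [h for h in hosts if not action(h)]
        if failed:
            # Gate not satisfied: stop before the next phase starts.
            return completed, failed
        completed.append(name)
    return completed, []


if __name__ == "__main__":
    copied = set()

    def copy_war(host):
        copied.add(host)
        return True

    def reload_tomcat(host):
        # Only meaningful once the war file is in place.
        return host in copied

    hosts = ["app1", "app2", "app3"]
    print(run_phases(hosts, [("copy war", copy_war),
                             ("reload tomcat", reload_tomcat)]))
```

The variation mentioned above lives in what happens when a gate fails: this sketch just stops, but retrying, skipping the failed hosts, or rolling back are all equally plausible policies.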
As an example, we roll out updates to Opscode entirely with Chef, but
we trigger the event - essentially we push. We built the application
to be free of the need for very many gates - each individual server
can update itself without having to worry about orchestrating with
others. Now, we gained that ability intentionally - the ability to
operate our application was in our minds when we built it.
In the real world application deployment is all about the application
- the devil is entirely in the details. If you get to start from
scratch, I would recommend you put your effort in to making sure the
application can be deployed with as little orchestration as possible.
If you don't get to start from scratch, then I would do what fits the
existing workflow best first, and work towards the world where you can
remove orchestration. :)
Adam
--
Opscode, Inc.
Adam Jacob, Chief Product Officer
T: (206) 619-7151 E: ad...@opscode.com
>Every puppet run installs the latest 'deployment' deb. So two builds +
>published packages and a change to the deployment config == new application
>version deployed.
How do you handle rollbacks?
Using system level packaging, you can't do this atomically.
And in your case, it sounds like upgrades are eventually consistent by design.
--dan
--------------------------------------------------------------
<dsully> please describe web 2.0 to me in 2 sentences or less.
<jwb> you make all the content. they keep all the revenue.
--
We've talked a bit about this before, but not in public. :)
There are lots of different kinds of state - active state, desired
state, etc etc.
Nothing inherent in Chef, or any other tool, makes it so you couldn't
use it to solve the problems you describe. With Chef, you could be having
the builds triggered by Jenkins updating the desired state of your
application through the API, so that when they get deployed they are
tightly coupled. You could be doing it on a per-version basis. You
could model it as baselines with differentials. All sorts of ways.
The bare bones reality of it is that you have to deal with the world
the way you find it, and you must leave it a better (read more
efficient and pleasant) place than you found it. I love the explosion
of tooling in the space - it's a reflection of the reality that
everyone's environment is different, and no two applications are alike.
The best thing you can do is step back from the problem (read
implementation) for a minute, and think about the actual business
case. You're doing that clearly: I want to go from commit to
deployment with no steps in between, including scale. I would argue
that whether you're doing it all with Chef or not, you're getting the
same net effect - one inflection point that kicks off a series of
relatively autonomous actions that bring themselves as close to fully
functioning as possible, and that any steps that are left are *also*
happening, and it's just a matter of time before everyone agrees on
the new world order.
As someone with a blog titled after a psychology paper from the 1950s
about this exact issue
(http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two)
I like your answer :)
The question becomes how do you work out this number for your given
team/architecture? Also, that number should have a cool sounding name.
G
>> And what causes the "failure at scale"?
>>
>> Is it easier/better to go with a push based mechanism below that
>> threshold? Or is it better to always go for a pull/integrated
>> approach?
>
> For all three of these questions, see push vs pull at infrastructures.org
>
> http://www.infrastructures.org/bootstrap/pushpull.shtml
>
> The short answer is, because sometimes hosts are down or otherwise unreachable, especially in a large distributed infrastructure.
>
> --
> Joshua Timberman, jos...@housepub.org
> http://twitter.com/jtimberman | http://jtimberman.posterous.com
>
--
I've always thought the push vs. pull is a red herring in terms of the critical part of the conversation, partially because it's so easy to switch from one to the other - e.g., Puppet supports both just fine, using http, mcollective, capistrano, or whatever you want.
What I've become more interested in recently is focusing on the decision - who makes it, how it propagates, etc. For system stuff, we seem to be pretty comfortable having code deployed automatically, within half an hour or so, but as Darrin says, app developers generally prefer to be able to control when and how an app gets deployed.
Even doing that deployment could be done relatively easily in a pull model - e.g., we're working with one customer to have their Puppet clients checking in every 60 seconds instead of every half hour, and they update based on db state, so it's all pull but centrally controlled and fits perfectly into a developer workflow without needing to use parallel ssh or whatever.
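A rough sketch of "all pull but centrally controlled": each client checks in on a short interval, asks a central store which version it should be running, and converges if it differs. This is not Puppet's implementation; fetch_desired and install are placeholders for whatever the real system provides, and the dictionary in the demo stands in for the database.

```python
import time


def converge_once(host, installed, fetch_desired, install):
    """One check-in: install the centrally chosen version if it differs.

    fetch_desired(host) and install(host, version) are placeholders
    (a database query, a package install, a config management run, ...).
    Returns the version installed after this check-in.
    """
    desired = fetch_desired(host)
    if desired is not None and desired != installed:
        install(host, desired)
        return desired
    return installed


def agent_loop(host, installed, fetch_desired, install, interval=60, rounds=None):
    """Check in every `interval` seconds; rounds=None means run forever."""
    n = 0
    while rounds is None or n < rounds:
        installed = converge_once(host, installed, fetch_desired, install)
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return installed


if __name__ == "__main__":
    desired_state = {"app1": "1.4.2"}   # stands in for the central db
    log = []
    final = agent_loop("app1", "1.4.1", desired_state.get,
                       lambda h, v: log.append((h, v)), interval=0, rounds=2)
    print(final, log)
```

The decision (which version, and when) lives entirely in the central store, so a developer changing one row is the deploy trigger; the clients never need to be reachable from the outside.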
--
Be wary of the man who urges an action in which he himself incurs no
risk. -- Joaquin Setanti
---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- http://about.me/lak
Maybe it's the use of the words new and from scratch here, but
incremental deployments (maybe even just a few lines or at least a few
commits) are more common. Both these scenarios sound like it's just
for new servers or for initial rollouts of brand new systems?
>
> Push/Pull is, in my mind, a red herring here - you can push if you
> want to with tools like Chef and Puppet, just like you can with
> fabric/func/capistrano. Which you choose is largely a function of the
> application itself, and the workflow needs you have around
> orchestrating the various state transitions.
>
Glad someone said that. I've recently been using fabric to trigger
chef/puppet runs on demand on relevant machines with some success.
> For 99% of the people I've talked to, they think about the common
> transitions inherent in things like Application Deployment as gates
> and phases: you have a phase that happens on some set of systems, that
> completes a gate (think a guard statement, like "all the servers have
> copied the war file") that moves them on to the next phase ("all the
> servers reload tomcat"). You have lots of variation around what
> happens with failure, and how much (and when) something needs to be
> guarded.
>
> As an example, we roll out updates to Opscode entirely with Chef, but
> we trigger the event - essentially we push. We built the application
> to be free of the need for very many gates - each individual server
> can update itself without having to worry about orchestrating with
> others. Now, we gained that ability intentionally - the ability to
> operate our application was in our minds when we built it.
>
> In the real world application deployment is all about the application
> - the devil is entirely in the details. If you get to start from
> scratch, I would recommend you put your effort in to making sure the
> application can be deployed with as little orchestration as possible.
> If you don't get to start from scratch, then I would do what fits the
> existing workflow best first, and work towards the world where you can
> remove orchestration. :)
So, sometimes the real world is annoying. I'm keen on ignoring that in
the spirit of idealism.
In the back of my mind when I posted this thread was whether anyone
fancies putting together some "best practice" examples of deployment?
Taking a really simple application (let's say a single file wsgi python
application, php file or sinatra app) what would the 'perfect'
deployment mechanism look like?
>
> Adam
>
> --
> Opscode, Inc.
> Adam Jacob, Chief Product Officer
> T: (206) 619-7151 E: ad...@opscode.com
>
--
>> How do you handle rollbacks?
>
>apt-get handles downgrades just fine (via puppet's package provider).
>>
>> Using system level packaging, you can't do this atomically.
>>
>See above :)
Right - apt can't atomically switch. You need to remove & re-install.
Not as easy or as fast as changing a symlink & restarting your process.
What happens when engineering screws up (will happen), and the released code
isn't working right? Do you roll back, or do you fail forward? What sort of
downtime do you incur?
Does your eventual update model allow this type of symptom to be shown on
only the hosts it's been deployed to, and then you make your rollback/forward
choice?
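For comparison, the symlink switch mentioned above can be sketched as follows. This assumes a capistrano-style layout (a releases directory with one subdirectory per version and a "current" symlink), which is an assumption for illustration; it also assumes a POSIX filesystem, where renaming over an existing name is atomic. Restarting the process is left out.

```python
import os


def activate_release(releases_dir, version, link_name="current"):
    """Point <releases_dir>/<link_name> at <releases_dir>/<version>.

    The new symlink is created under a temporary name and then renamed
    over the old one, so readers see either the old release or the new
    one. Returns the previous target (or None) for use in a rollback.
    """
    link = os.path.join(releases_dir, link_name)
    previous = os.readlink(link) if os.path.islink(link) else None
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(version, tmp)   # relative target, resolved inside releases_dir
    os.replace(tmp, link)      # rename over the old symlink
    return previous


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        for v in ("1.0", "1.1"):
            os.mkdir(os.path.join(d, v))
        activate_release(d, "1.0")
        old = activate_release(d, "1.1")   # deploy
        activate_release(d, old)           # roll back to what was there
        print(os.readlink(os.path.join(d, "current")))
```

Rolling back is the same operation as deploying, just with the previous target, which is the property a remove-and-reinstall package downgrade does not give you.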
Nah - I mean more if the project starts from scratch, not if the
servers do. When you talk about application deployment and server
configuration, it's all about what the application needs. If you're
managing one that was already built, the fastest path will be to
automate what is already in place: which means automating the existing
(perhaps inferior) workflow. If you have the luxury of defining that
workflow, my experience has been that the workflow that has as few
points of orchestration as possible kicks ass, from an operator point
of view. :)
> So, sometimes the real world is annoying. I'm keen on ignoring that in
> the spirit of idealism.
Not ignoring it is part of my idealism. :)
> In the back of my mind when I posted this thread was whether anyone
> fancies putting together some "best practice" examples of deployment?
> Taking a really simple application (let's say a single file wsgi python
> application, php file or sinatra app) what would the 'perfect'
> deployment mechanism look like?
The simpler you make it, the less useful it is. And the more
complex you make it, the more people will argue that your solution
is baked in with too many complexities. You can't win for losing on
this road. We have good examples of deploying all of the above in
multi-server environments that scale well, and we have people who have
looked at those same solutions and found them either too complex or too
simplistic.
The reason capistrano/fabric/func are ubiquitous in this space is that
they provide really easy access to the common gates/phases primitives.
The perfect tool for me would wrap the value of Chef's desired state
management with the trivial gates/phases and active state inspection
you want. I think Chef+Noah actually goes a very long way down this
road.
Best,
So how do you segment people when it comes to what they are looking
for? I'll come back to number of servers because it's been mentioned,
but anything else?
I think when it comes to examples the problem is more trying to not
introduce constraints and trying to be too generic. An example that is
tailored for <10 servers for a Rails environment is probably going to
get different folks interested and different feedback than a 100
server Python setup or a 1000 server Java setup.
So more constraints in this area are a big win in my view. I'm not
looking for one example to rule them all, more lots of examples that
we, as a community, think are best, given different constraints. The
question to me is: is the number of permutations manageable for a set
of examples?
Also, some feedback on this topic is always going to be negative
because it might involve change (also generalisms). I'm happy to
ignore some of that. As mentioned, this thread is wholly idealistic :)
>
> The reason capistrano/fabric/func are ubiquitous in this space is that
> they provide really easy access to the common gates/phases primitives.
> The perfect tool for me would wrap the value of Chef's desired state
> management with the trivial gates/phases and active state inspection
> you want. I think Chef+Noah actually goes a very long way down this
> road.
Anyone have an example of this working? Or disagree that this is the ideal?
(obviously substituting Chef for Chef/Puppet and Noah for
Noah/ZooKeeper for technology agnostic watchers)
G
>
> Best,
> Adam
>
> --
> Opscode, Inc.
> Adam Jacob, Chief Product Officer
> T: (206) 619-7151 E: ad...@opscode.com
>
--
devops-t...@googlegroups.com wrote on 05/17/2011 04:28:41 PM:
> From: Adam Jacob <ad...@opscode.com>
> Nah - I mean more if the project starts from scratch, not if the
> servers do. When you talk about application deployment and server
> configuration, it's all about what the application needs. If you're
> managing one that was already built, the fastest path will be to
> automate what is already in place: which means automating the existing
> (perhaps inferior) workflow. If you have the luxury of defining that
> workflow, my experience has been that the workflow that has as few
> points of orchestration as possible kicks ass, from an operator point
> of view. :)
Yeah, I think that in highly complex environments you just end up needing orchestration. Sure, if I'm Google or Facebook and I've built some huge pool of stateless apps, it's great... But us enterprise guys have big ass hairballs of junk we have to deal with a lot of the time, and tight orchestration is often just the only tenable way to do something. Including "human workflow steps", to use a BPELy term, where oh say a DBA who hates automation because they hate people has to manually do something... Also I'm glad everyone is always so sure of their code that they figure they can dribble it out into production at random, but eek!
We normally resort to a 'rolling deploy' where we put the code to X of Y boxes, test it, and redeploy previous code if it fails, or keep rolling if it doesn't. Not DevOps nirvana but you go to war with the army you have.
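A minimal sketch of that kind of rolling deploy, under assumptions: deploy and healthy are placeholders for the real mechanics (a push tool, a config management run, a smoke test), and on failure this version puts the previous code back on every box touched so far, which is one possible reading of "redeploy previous code".

```python
def rolling_deploy(hosts, batch_size, new, old, deploy, healthy):
    """Deploy `new` to hosts in batches; on a failed check, restore `old`.

    deploy(host, version) and healthy(host) are placeholders for the
    real mechanics. Returns (succeeded, hosts_touched).
    """
    touched = []
    for i in range(0, len(hosts), batch_size):
        batch = hosts[i:i + batch_size]
        for h in batch:
            deploy(h, new)
            touched.append(h)
        if not all(healthy(h) for h in batch):
            # Put the previous code back on everything touched so far.
            for h in touched:
                deploy(h, old)
            return False, touched
    return True, touched


if __name__ == "__main__":
    state = {}
    ok, touched = rolling_deploy(
        ["web1", "web2", "web3", "web4"], 2, "v2", "v1",
        deploy=state.__setitem__,
        healthy=lambda h: h != "web3",
    )
    print(ok, touched, state)
```

Human workflow steps fit the same shape: healthy can just as well block until somebody signs off on the batch.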
Ernest
______________________
UN-altered REPRODUCTION and DISSEMINATION of
this IMPORTANT information is ENCOURAGED.
Right - it's about complexity of the process/environment as you found
it, not necessarily application complexity.
I worked for a company that had a 20+ step minimum deployment process,
with several of the steps tied directly to specific compliance
line-items. Any attempt to alter the 20 steps would have meant a
ridiculous push that would probably have died on the vine. Instead, we
automated the 20 that already existed, including the existing
compliance components, so that we had the victory we needed to change
things: once we owned it, dropping down to 5 steps under the hood was
easy.
Generally, I think people are looking for congruence between the
complexity they perceive in their request and the complexity of the
implementation of it. Personal, cultural, and experiential factors
settle what this balance ends up looking like.
More concretely, I believe those who are used to dragging and dropping
images of applications into an eclipse server view will have a certain
set of expectations on what deployment looks and feels like vs the
thousand node condor job vs autosys vs a big stinky n-tier vs...
They'll have their most common deployment experience as a measuring
stick and anything more complex will be too much and less would be too
naive. This is the opportunity of PaaS.
The service part of PaaS, IMHO, is marrying up expectations of the
user, with the right level of detail, in an API form, possibly
multiple API views per culture. Some cultures will care about reusing
tech they already bought into, or auditability over all. Offerings
needn't use the same technology for different deployment use cases and
scale, but it is certainly simpler on the backend to have pliable
technologies that cover vast areas. As I think Luke mentioned, I
agree the user is less concerned with mechanics of push/pull vs being
in control, and having reliable state transitions. Many technologies can work
either way, and the point of raising abstraction is to help give a
better view to the user and more options to the implementor.
So, anyway, the perfect deployment matches the requestor's expectations
with relevant execution, and to me this is also a goal of PaaS.
thanks for the fun thread.
-A
I understand that your deploys are handled by puppet and that your deploys
are staggered during the puppet run window (default is 30 minutes. I
don't know how often yours runs.). Based on what you say below, am I
correct in inferring that your application is in an inconsistent state for
N minutes while each of your servers eventually runs the latest puppet
manifest? Can you elaborate on how this plays out practically during
upgrades that cause noticeable site change to end users and/or breakage to
the site (such as a deploy requiring a database schema change)?
I purposefully haven't automated my deploys via puppet in order to avoid
having an inconsistent application state during a puppet run window, so
I'm interested to see how you deal with such deployment scenarios in your
environment.
regards,
Scott
On Tue, 17 May 2011 16:21:35 -0400, Jordan Sissel <j...@semicomplete.com>
wrote:
This is another great use case for things like dark launching - you
put in feature flags, and a way to twiddle features on or off. That
means you can roll the deploys out at your leisure, and make a single
application change to launch the feature in a coordinated fashion. You
don't *have* to tie things up to the way you orchestrate the
deployment.
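A minimal sketch of the feature-flag idea, with hypothetical names throughout. The flag store here is in-memory; a real one would read shared state (a database row, a config service) so that every server flips at the same moment regardless of when it received the code.

```python
class FeatureFlags:
    """In-memory flag store, standing in for shared state that all
    servers would read in a real deployment."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def enabled(self, name):
        # Unknown flags default to off, so new code ships dark.
        return self._flags.get(name, False)

    def set(self, name, value):
        self._flags[name] = bool(value)


def render_checkout(flags):
    """The new code path is deployed everywhere but only used once launched."""
    if flags.enabled("new_checkout"):
        return "new checkout"
    return "old checkout"


if __name__ == "__main__":
    flags = FeatureFlags()
    print(render_checkout(flags))      # code is deployed, feature is dark
    flags.set("new_checkout", True)    # the coordinated launch
    print(render_checkout(flags))
```

The deploy window stops mattering to end users because both old and new servers behave the same until the flag changes.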
With Chef doing deploys and the 0.10 release, we send a signal to the
running daemons, to trigger the run. You still won't necessarily get
gates, but you can get much closer intervals.
> Doing perfectly-timed, zero-interruption upgrades is pretty hard, has wild variation in solution, and depends on the right design decisions being made through your entire stack.
One additional point to the pull vs push discussion, along the line of what Jordan said.
I recently came off a project where pull deploy was used to distribute the application code to the cluster of 5000+ nodes. The process worked great except when it came to switching to the new version, then the out of sync problem really hit them. They were losing real money during these releases. The solution they were headed to: a pull model to get all the bits on the box and a very tight push model to signal the symlink change. The signaling was custom built and relied on established connections to the "signalling" agent.
The point is that their deployment process required this type of sophistication because there was a financial incentive to it and the organization, at least the release engineering/system administration organizations, got behind finding a solution. The next step was to look at the app code and figure out how they could do it seamlessly, but all in due time. The organization was still getting *all* elements of the business units to line up.
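The split between pulling the bits and pushing the switch can be sketched as a tiny in-memory model. This is an assumption-heavy illustration, not their custom signalling system: the Node class and its methods are hypothetical, and the network side (established connections to an agent) is left out entirely.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.staged = None    # version fetched onto disk (pull side)
        self.active = None    # version currently serving traffic

    def pull(self, version):
        """The node fetches the bits on its own schedule."""
        self.staged = version

    def flip(self, version):
        """Handle the pushed 'switch now' signal (e.g. a symlink change)."""
        if self.staged != version:
            raise RuntimeError(f"{self.name} has not staged {version}")
        self.active = version


def signal_switch(nodes, version):
    """Send the flip only once every node reports the version staged.

    Returns the names of lagging nodes; an empty list means all flipped.
    """
    lagging = [n.name for n in nodes if n.staged != version]
    if lagging:
        return lagging        # not ready; nothing is switched
    for n in nodes:
        n.flip(version)
    return []


if __name__ == "__main__":
    nodes = [Node("n1"), Node("n2")]
    nodes[0].pull("2.0")
    print(signal_switch(nodes, "2.0"))   # n2 has not pulled yet
    nodes[1].pull("2.0")
    print(signal_switch(nodes, "2.0"), [n.active for n in nodes])
```

The slow, bandwidth-heavy part tolerates being out of sync; only the cheap flip needs to be tightly timed.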
-Noah
> Something that interests me (in that I change my mind about the
> answer) and was discussed a little at the last devopsdays Europe is
> where application deployment and configuration management collide.
>
> Mentioning now because Adam Jacob said (hopefully not out of context?)
> on the Windows automation thread:
>
> "If you have the luxury of starting from scratch (or nearly scratch)
> the way you want to model app-deployment is as a part of system
> configuration"
>
> Ignoring the practicalities/limitations of existing systems, is this
> where everyone else would start?
For me, it’s actually become more entrenched. Regardless of the tooling you use, thinking systemically about deployment in the context of the entire lifecycle, from cradle to grave, is the best thing you can do for yourself. Some tools are easier to work with than others in that systemic sense.
Adam
--
You received this message because you are subscribed to the Google Groups "devops-toolchain" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
devops-toolcha...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Mentioning now because Adam Jacob said (hopefully not out of context?)
on the Windows automation thread:
"If you have the luxury of starting from scratch (or nearly scratch)
the way you want to model app-deployment is as a part of system configuration"
Sorry to join this conversation so late, but for those who advocate packaging apps and using configuration management to deploy, I'm wondering how you handle stateful deployment interactions? Things like taking servers out of loadbalancers while their caches rewarm, making sure that servers running the new version of an app are talking to database servers with the schema changes, etc.
I think you’ll find that, in general, they don’t have them at all. The way the system works is so deeply embedded, culturally and systemically, that routing around it as a requirement of a given piece of software isn’t really feasible. Essentially, they find another way.
Adam
From: devops-t...@googlegroups.com [mailto:devops-t...@googlegroups.com]
On Behalf Of Ryan Miller
Sent: Friday, April 19, 2013 10:31 AM
To: devops-t...@googlegroups.com
Subject: Re: Application deployment vs system configuration
Schlomo--that sounds like a really mature toolchain, and while a lot of people have built something similar (myself included) it's awesome that you've open sourced yours so we can look at it and/or build on it. But in the end it's a stateful deploy tool. Which is fine; it's just that some people (who I respect) in this thread have suggested they don't need/use one and I'm curious about how.
Jeff--but basically everybody has tight state somewhere. I mean, Facebook and Google Docs run on MySQL. What is true is that they've (or at least Facebook has, who seem to talk about it more) come up with really generic schemas so they don't need many schema changes, and are super-disciplined about feature-flagging. What worries me is they say they still have occasional non-flaggable / tightly coupled releases, and what "continuous deployment" culture has taught us is that rare is scary. The bigger you are, and the more advanced your culture, the more rare and dangerous those occasional super-stateful deploys are going to be. So I'm curious about if/how people manage to get around them.
Ryan
--
Yeah – I’m not saying it never happens. Simply that it gets more and more rare as the systems themselves become more and more understood, and the cultural/engineering backlash for changes that require it gets larger over time.
Pragmatism is the order of the day here – if upgrading Hadoop requires you to roll that way, man, roll that way. :)
The big suggestion to our customers is that they start thinking holistically about the system, not about the workflow. Take Nagios as an example. The only reason to rebuild that config is because you aren’t monitoring the real status – you are looking at observable side effects. So imagine a world where your monitoring system is hitting a smarter status endpoint, which knows about things like “I’m about to upgrade”. Now Nagios is removed from your orchestration flow – the status endpoint of the thing taking the action (the node being upgraded, in this case) makes it possible to observe its state. The Nagios server assumes that promise (that the server is the source of truth) will be kept. So – first in a deployment means first to have its status changed, which means when Nagios checks next, it’s “taken out” automatically. Work hard on eliminating the hand of external state manipulation in your infrastructure, and the benefits are legion.
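To make the status-endpoint idea concrete, here is a rough sketch under stated assumptions: the node serves its own state as JSON at `/status`, and the monitor only alerts when the node says it should be in service but the application check fails. The state names, the `/status` path, and the helper functions are all hypothetical; this is not a Nagios plugin or anything Chef ships.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical node-local state; "in_service" and "upgrading" are made-up values.
STATE = {"state": "in_service"}


def begin_upgrade() -> None:
    """Called by the node itself just before it starts upgrading."""
    STATE["state"] = "upgrading"


def finish_upgrade() -> None:
    """Called by the node once the upgrade is done and the app is back."""
    STATE["state"] = "in_service"


class StatusHandler(BaseHTTPRequestHandler):
    """Serves the node's own view of its state at /status."""

    def do_GET(self) -> None:
        if self.path != "/status":
            self.send_error(404)
            return
        body = json.dumps(STATE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        # Keep the sketch quiet; a real service would log requests.
        pass


def should_alert(status_doc: dict, app_healthy: bool) -> bool:
    """Monitor-side rule: alert only if the node claims to be in service
    and the application check still fails."""
    return status_doc.get("state") == "in_service" and not app_healthy
```

To try it, serve the handler with `http.server.HTTPServer(("127.0.0.1", 8080), StatusHandler).serve_forever()` (port chosen arbitrarily). The point of the design is that the upgrade process flips the node's own state; nothing external has to edit the monitoring config.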
Note the first part, though: pragmatics win out. But you never get to the goal if you don’t know what it is, and in my opinion, the goal is an easy to reason about, easy to operate, well understood infrastructure. And at scale (both in terms of systems and humans) that’s almost always one like I’ve described. In the small, it might not be – pretty easy to grok what happens in that Fabric/Capistrano script when you can hold the list of tasks, servers, and humans in your head.
Best,
Adam
From: devops-t...@googlegroups.com [mailto:devops-t...@googlegroups.com]
On Behalf Of Ryan Miller
Sent: Friday, April 19, 2013 11:17 AM
To: devops-t...@googlegroups.com
Subject: Re: Application deployment vs system configuration
Adam,
--
Ryan--
Let’s break them down.
First, the Java upgrade and Tomcat restart. The problem here is that the “infrastructure” upgrade (Java) is decoupled from the application that is (in theory) passively receiving it (Tomcat). So now we have a new thing we have to track – not just is the application available, but are we doing something outside that we know has an impact. Let’s move our application state thingy into a new bucket - one that runs outside the app server. Now we have the process that triggers our Java update notify this system (which is local to it) that such an activity is occurring, and our status moves on nicely.
Only doing so many at a time is a function of how your system groups delivery of desired state descriptions. In Chef, you do this through environments – in the future, there will likely be even cooler mechanisms. But using environments goes a real, real long way.
Let’s rock kernel upgrades, which require a reboot. The hard part here is getting your status to understand that the status endpoints themselves are going to go away for a bit, but that it’s a normally scheduled event. This one is harder to push to the edges, since we don’t have a thing running we can keep consistent – it’s really a thing that can only be observed by an external entity. So let’s solve this with one more piece of small code – a service that takes a notification from a server of a planned reboot, and that automatically suppresses notifications for a period of time (since, hey, let’s face it: we can’t suppress them forever, because we care). Pre-reboot, we notify this service of our impending demise.
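That "one more piece of small code" could look roughly like the sketch below: a registry that records a planned reboot per host and suppresses alerts for a fixed window, after which suppression expires on its own. The class name, method names, and the 600-second default are assumptions for illustration only.

```python
import time
from typing import Dict, Optional


class PlannedRebootRegistry:
    """Tracks planned reboots and suppresses alerts for a bounded window."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        # Hypothetical default; pick a window that covers a normal reboot.
        self.window_seconds = window_seconds
        self._expires_at: Dict[str, float] = {}

    def notify_planned_reboot(self, host: str, now: Optional[float] = None) -> float:
        """Called by the host just before it reboots. Returns the expiry time."""
        now = time.time() if now is None else now
        expiry = now + self.window_seconds
        self._expires_at[host] = expiry
        return expiry

    def is_suppressed(self, host: str, now: Optional[float] = None) -> bool:
        """True while the host is inside its planned-reboot window."""
        now = time.time() if now is None else now
        expiry = self._expires_at.get(host)
        if expiry is None:
            return False
        if now >= expiry:
            # Window is over: stop suppressing, because we still care.
            del self._expires_at[host]
            return False
        return True
```

A monitor would consult `is_suppressed(host)` before paging. If the host is still down after the window closes, alerts fire as usual, so a reboot that never comes back is not hidden.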
Getting the hang of it?
Best,
Adam
From: devops-t...@googlegroups.com [mailto:devops-t...@googlegroups.com]
On Behalf Of Ryan Miller
Sent: Friday, April 19, 2013 11:45 AM
To: devops-t...@googlegroups.com
Subject: Re: Application deployment vs system configuration
Ok, so I see how that works for the appserver--you monitor the status page and the app itself, and only fire on app failure if the status page thinks the app should be working. I like that. But it seems like there are lots of times when it won't work--say you need a new Java version, so you have to restart Tomcat, which takes some time. (And just as much so for lower-level things, where you have to upgrade the kernel, database, etc). And there's obviously the problem of making sure that you only do so many servers at a time, check before doing more, etc, which I haven't figured out how to naturally do in the autonomic model.
None of which is to say that culture isn't the most important thing, or that tooling integration isn't getting better. But I would like to move more things to the autonomic model, because it's less painful in general.
Ryan
--
Ryan--
Right. End of the day, orchestration is necessary.
Just remember the first rule of orchestration: first, try and eliminate the orchestrator. Use this question: “How can I change this workflow so that we no longer need an orchestrator?”
Adam
From: devops-t...@googlegroups.com [mailto:devops-t...@googlegroups.com]
On Behalf Of John Vincent
Sent: Friday, April 19, 2013 12:24 PM
To: devops-t...@googlegroups.com
Subject: Re: Application deployment vs system configuration
I've backed off a BIT on how much orchestration I want to automate in the last year. I mentioned it a bit at ChefConf but the fully automated orchestration people want is... not what they want. When you start getting into the weeds, you can quickly end up with dependency loops, unintended changes due to transitive deps and other stuff.
--
Orchestration is a lot like mutable state - keep it minimal and only use it when you have to. When you do that, it makes everything else you're doing almost trivial.
What I am trying to understand is how one can cope with and improve a situation when the funding for an architecture rethink happens about once a decade.
After four years of a major re-architecture effort with significant changes, we can only think of two or three times when we made a conscious effort to hold other things off while we brought in a major change. Each time after we did it, we had several of our lead engineers pointing out that it wasn’t necessary. In fact, it got to being a point of pride where lead engineers would bring in a major change through our standard processes. It might require testing a few components together offline before committing, or a little extra testing up front to reduce the risk, but then major changes would come through the queue along with everything else.
Stu
The C3 project’s purpose was to replace the entire family of Chrysler payroll programs. It didn’t accomplish that. Many years later, the subsequent project to replace the entire family of payroll programs has also not accomplished it. We now realize what should have been done.
We should have replaced the broken bits, one at a time, most valuable first.
For the record, I super like and respect Stu :)
I think incremental improvement is your only hope here - and incrementally inching towards CD isn't realistic. You'll get better - you won't be unrecognizable.
Absolutely! On many, many fronts, this is true. On business velocity? Not so much.
> My view is that things are usually so bad there is ample room for improvement - even order of magnitude improvement - without a "burn the ships" approach. Release automation tools are a part of that, but mostly as a catalyst to fix the thinking and culture.
> What I am trying to understand is how one can cope with and improve a situation when the funding for an architecture rethink happens about once a decade.
What if, whenever we implemented new requirements, we never touched legacy stuff but always implemented it in an architecture that _was_ deployable, testable, etc? If that required moving functionality over from existing legacy stuff, we'd find a way to port the smallest possible amount of functionality over that would let us achieve our goal. Occasionally we'd have to modify the legacy stuff to create a little API, but we're talking minimal invasiveness.
Over time, the old stuff would be doing less and less. A couple of years in, we could write little tools that instrumented it to discover which code paths were getting executed, and delete big chunks of code.
It turns out people actually do this - it's called the Strangler Application pattern. It's how Amazon moved from a big ball of mud to a service oriented API.
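As a rough illustration of the Strangler Application idea, the sketch below puts a thin facade in front of the legacy system: paths that have been ported go to new handlers, everything else falls through to legacy, and legacy hits are counted so unused code paths can be found later. The `StranglerFacade` class and the path-prefix routing are assumptions for the example, not a description of how Amazon or anyone in this thread did it.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Handler = Callable[[str], str]


class StranglerFacade:
    """Routes requests to migrated handlers, falling back to the legacy system."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}
        # Instrumentation: how often each path still reaches legacy code.
        self.legacy_hits: Counter = Counter()

    def migrate(self, prefix: str, handler: Handler) -> None:
        """Move one slice of functionality over to the new architecture."""
        self._migrated[prefix] = handler

    def handle(self, path: str) -> str:
        matches = [p for p in self._migrated if path.startswith(p)]
        if matches:
            # Longest matching prefix wins, so narrow slices can be ported first.
            return self._migrated[max(matches, key=len)](path)
        self.legacy_hits[path] += 1
        return self._legacy(path)

    def unused_legacy_paths(self, known_paths: Iterable[str]) -> List[str]:
        """Known legacy paths that have never been hit: candidates for deletion."""
        return [p for p in known_paths if self.legacy_hits[p] == 0]
```

Each new requirement becomes one more `migrate(...)` call rather than a change to the legacy code, and the hit counts give some evidence for which big chunks of old code can eventually be deleted.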
My experience is that the Big Rewrite is almost always a total disaster, whereas incremental ways to get us from A to B are low risk.
The hard bit is changing the mindset of people in the organization to understand that the best way to get from A to B is incrementally, not through some big-bang project, and then teaching everybody how to do it - BEFORE the shit hits the fan and we're about to be outflanked by our competitors. Oh, and read Toyota Kata.
> Where I disagree is the notion that one cannot execute incrementally and apply appropriate compromise to build credibility and momentum among a skeptical management that could just offshore the lot and set things back another 10 years.
You can't. If they don't see it now, they may in 2, 3 or 5 years. Pushing towards a goal they fundamentally don't agree with isn't doing a good job. Help them achieve the goals they do understand if you like, but don't kid yourself (or them) that having mediocre interim goals leads to magnificent future results: it does not. You know this in your bones.
We had demonstrated major success in a lean, continuous delivery approach on our bulk commodity demand systems, and saved another doomed offshore project with this approach.
But the alternative, a massive $300m+ rewrite on SAP, won - and was viewed as much less risky by (the new) executive management and board of directors.
> But the alternative, a massive $300m+ rewrite on SAP, won - and was viewed as much less risky by (the new) executive management and board of directors.
I'm all for using COTS if you don't have to do any customization. If you have to customize a system that is not designed for customization (on purpose, so consultants can make a ton of money) you are entering a world of pain. I like to point people who think otherwise at this case study (for Telstra, an enormous telco): http://www.zdnet.com/keep-it-simple-stupid-telstraclear-1339307482/. I am guessing your management failed to grasp this.
The reason execs don't like dealing with IT is because they don't understand it and they don't think it's a core part of their business. That's already not true, and it's about to become a lot more not truer. Fortunately some people are realizing this. GM pioneered outsourcing, and now they've realized it's not a good idea:
GM’s reasons for doing this may well apply to many other firms too. “IT has become more pervasive in our business and we now consider it a big source of competitive advantage,” says Randy Mott, GM’s chief information officer, who has been responsible for the reversal of the outsourcing strategy. While the work was being done by outsiders, he said most of the resources that GM was devoting to IT were spent on keeping things going as they were rather than on thinking up new ways of doing them. The company reckons that having its IT work done mostly in-house and nearby will give it more flexibility and speed and encourage more innovation. [1]
I have a dose of skepticism given the negative reviews that have been leaking out of HP
> I have a dose of skepticism given the negative reviews that have been leaking out of HP
Well that's the problem, it's so easy to fuck it up :(
BTW how is it insourcing if HP are doing it?!
I think there's also (and guess what my preference is)
d) team up with a courageous executive who actually wants to get things done.
Sometimes these people are brought in and only have a short amount of time to prove themselves. Sometimes they get kicked out again after a year or two, but in that time they manage to make lasting course changes (I have at least one example of this happening at a very large company). Sometimes you can use a) to achieve d).
"Shadow IT" is basically d) in action. The trick is leveraging it to transform the rest of IT.