[snip..]
You are correct that EMA was Digital's kick at tackling the
difficult world of how to create a proactive management
strategy that resolves issues before
they impact the business. Tivoli (IBM), OpenView (HP, now
more or less retired in favour of OneView) and CA are all
other examples of more recent offerings.
Fwiw, you could take the EMA architecture from Digital,
change a few slides, terms and technologies and you would
have a modern strategy for how to proactively manage
private clouds.
DevOps is mostly industry hype, similar to SOA, SDN and
other more recent terms. It has some core ideas that are
ok, but when you peel back the covers, there is lots of
vaporware because at the core it requires huge changes in
the way multiple different Developer and Operations
groups work in a company. Getting a few developer / OPS
groups to agree on new workflows, strategies and tools
might be possible, but any more groups than that is next
to impossible.
The proper OPS strategy has not changed in 30+ years, but
the huge difficulty remains that IT shops think buying
cool products is the answer to all of this. It's not. At
the core remains that every site is different and you need
to look at each CI device, determine which alerts are
critical or not, and forward them to an Operations Bridge
for screening, filtering, escalating etc. Then, with a
smart ticketing strategy in place, you can work out how
tickets are created and escalated, and how group
communications are handled.
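As a rough sketch of that per-CI screening step (not how any
particular product does it), a site could keep its own table
of which alert types count as critical for each CI and only
forward those to the Operations Bridge. The CI names, alert
kinds and the screen() helper below are all made up for
illustration.

```python
from dataclasses import dataclass

# Hypothetical per-site policy: which alert kinds are critical for which CI.
# Every name here is illustrative and would differ on each site.
CRITICAL_ALERTS = {
    "core-router-1": {"link_down", "power_supply_failed"},
    "db-server-7": {"disk_full", "service_down"},
}


@dataclass
class Alert:
    ci: str    # configuration item that raised the alert
    kind: str  # alert type as reported by the monitoring tool


def screen(alerts):
    """Keep only the alerts the site has marked critical for that CI."""
    return [a for a in alerts if a.kind in CRITICAL_ALERTS.get(a.ci, set())]


incoming = [
    Alert("core-router-1", "link_down"),
    Alert("core-router-1", "fan_speed_changed"),  # not critical here, dropped
    Alert("db-server-7", "disk_full"),
    Alert("unknown-switch", "link_down"),         # CI has no policy, dropped
]

# Only these would be forwarded to the Operations Bridge.
forwarded = screen(incoming)
```

The table is the part that takes the time: somebody has to
sit down with each group and agree on what goes in it.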
This is really tough stuff politically because you need
the Network, Server, Storage, App, DB, facility (security
as well in some companies) groups all agreeing on a
strategy that ensures things like -
- when there is a network issue, a ticket is created and
assigned to the network group, but communications are sent
to other groups that a network issue may be impacting
their service, but it is being worked. The common approach
today is that a network issue happens, impacts everyone,
and everyone jumps on their favorite tools to determine if
the incident is their problem or not.
- if a router fails, but it is a paired router for HA,
then notify the Operations group screen via yellow alert
that a router is down, the service is still up, but there
is something that needs to be addressed. If the same
router is not paired, then the same alert shows up on the
OPS screen as red.
This is all custom coding and custom workflows on each
site. There are also tools that overlap between server,
storage and network management - which ones will be used
for each potential failure scenario? Different App groups
will have different monitoring/custom tools as well.
Again, needs planning and consensus building.
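To make the two rules above a bit more concrete, here is a
minimal sketch, assuming a simple in-house model of devices
and groups. The Device fields, group names and the
handle_failure() helper are hypothetical and not taken from
EMA, OpenView, Tivoli or any other product. It shows the
yellow/red decision based on whether an HA peer is still up,
and the ticket going to the owning group while the other
groups only get a heads-up.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    owner_group: str            # group that gets the ticket
    ha_peer_up: bool = False    # True if a redundant partner is still serving
    dependent_groups: list = field(default_factory=list)  # groups to notify


def classify(device):
    """Yellow if a healthy HA peer keeps the service up, red otherwise."""
    return "yellow" if device.ha_peer_up else "red"


def handle_failure(device):
    """Build the ticket for the owning group plus notices for the others."""
    severity = classify(device)
    ticket = {
        "assigned_to": device.owner_group,
        "ci": device.name,
        "severity": severity,
    }
    notices = [
        f"{group}: {device.name} is down ({severity}); "
        f"{device.owner_group} is working it"
        for group in device.dependent_groups
    ]
    return ticket, notices


paired = Device("rtr-a", "network", ha_peer_up=True,
                dependent_groups=["server", "app"])
single = Device("rtr-b", "network", ha_peer_up=False,
                dependent_groups=["db"])

ticket, notices = handle_failure(paired)
```

The code is the easy part. Agreeing on who owns which CI and
who is on each notification list is where the effort goes.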
To do this properly, the old saying goes that for every $1
spent on a mgmt. product, there should be $3 spent on
the services (internal or external) to properly integrate
that tool.
And yes, similar to why OpenView, Tivoli etc. are tough to
implement, this was a big reason why EMA did not take off.
Technically EMA is a very sound strategy. HP's recent
OneView architecture is actually not bad (with the
exception that it emphasizes x86-64 support only):
http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=4AA5-3811ENW
A pretty good video on why you need an Operations Bridge
can be found here: (HP marketing video, but has good,
solid technical strategy)
https://www.youtube.com/watch?v=ZLwfNtBu1sg