So when watching another view, like the default state view with all computers,
they are grey, but when looking at the state of the agent they are green?
If you go to one of your agents, and look in the ops mgr event viewer log,
is there anything? Like communication problems with the management server?
Have you placed the RMS into maintenance mode?
Anders Bengtsson
Microsoft MVP - System Center Operations Manager
www.contoso.se
> Please could someone assist me in identifying my problem with the Health
> Service Watcher. All servers are grey and critical, but the Agent
> State shows all servers as green and healthy. What troubleshooting
> steps would anyone advise?
>
> Any advice would be much appreciated.
>
> Thanks
> Paul Kirkup
Thanks for the quick reply. I have confirmed the rms is not in maintenance
mode. Here are some more details on our current situation...
We have a simple opsmgr setup with one domain, one rms server and one sql
server. One monitoring database with a 14-day grooming interval, no gateway
or warehouse required. We monitor approx 30 windows 2003 servers and have
been running fine for approx 3 months with the following management packs...
DNS
DHCP
AD
WSUS
SQL 2005
SCCM.
I have returned from holidays to find all servers in grey on the
Monitoring > Computers state - this was resolved by removing duplicate SPNs
from AD. But opsmgr is not generating any alerts due to the error below (I
think).
The root management server (HealthService) is running but has reported
limited functionality soon after 10/03/2009 10:53:52. The specific reason
code is 49 and description is " The health manager has detected that entity
state collection has stalled. ".
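For anyone checking the duplicate SPN point above, here is a minimal sketch of how duplicates could be spotted once the SPNs per account have been collected (for example from `setspn -L <account>` output or the `servicePrincipalName` attribute). The account names, host names and the `find_duplicate_spns` helper are made up for illustration; this is not a tool shipped with OpsMgr.

```python
from collections import defaultdict


def find_duplicate_spns(spns_by_account):
    """Return {spn: sorted accounts} for SPNs registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            # SPNs are compared case-insensitively, so normalise before grouping.
            owners[spn.strip().lower()].add(account)
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}


# Hypothetical inventory: account name -> SPNs registered on that account.
inventory = {
    "RMS01$": ["MSOMHSvc/rms01.example.local", "MSOMHSvc/rms01"],
    "svc-opsmgr": ["MSOMHSvc/RMS01.example.local"],
    "SQL01$": ["MSSQLSvc/sql01.example.local:1433"],
}

duplicates = find_duplicate_spns(inventory)
for spn, accounts in duplicates.items():
    print(spn, "->", ", ".join(accounts))
```

In this made-up inventory the health service SPN for the RMS is registered on both the computer account and a service account, which is the kind of clash that would need cleaning up.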
On one of the clients the following events are logged...
The OpsMgr Connector connected to <rms>.<domain>, but the connection was
closed immediately after authentication occurred. The most likely cause of
this error is that the agent is not authorized to communicate with the
server, or the server has not received configuration. Check the event log on
the server for the presence of 20000 events, indicating that agents which are
not approved are attempting to connect.
Followed by...
OpsMgr has returned to communicating with it's primary host <rms>.<domain>
And...
OpsMgr has received new configuration for management group OpsMgrCJP1 from
the Configuration Service. The new state cookie is "A4 1C 52 D8 8B 74 BD 82
78 56 7A 07 67 CE BF 81 70 F6 D5 8B "
Approximately every hour, clients log...
The Health Service has deleted one or more items for management group
"OpsMgrCJP1" which could not be sent in 1440 minutes.
I'm quite lost here as to the cause. It reads as though the client lost its
connection, then recovered it, downloaded the latest configuration, but still
cannot connect?
Many thanks
Paul Kirkup
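A quick arithmetic check on the "could not be sent in 1440 minutes" event quoted above: 1440 minutes is 24 hours, so an agent logging it has been holding data it could not deliver for at least a day. The sketch below only does the unit conversion; the constant and function names are made up.

```python
from datetime import timedelta

RETENTION_MINUTES = 1440  # figure quoted in the agent's event text


def retention_window(minutes):
    """Convert the retention figure from the event text into a timedelta."""
    return timedelta(minutes=minutes)


window = retention_window(RETENTION_MINUTES)
hours = window.total_seconds() / 3600
print(f"Items older than {hours:.0f} hours are being discarded")
```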
--
-- Marius Sutara [MSFT] (Developer)
-- http://blogs.msdn.com/MariusSutara
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of attachments are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
"Paul Kirkup" <PaulK...@discussions.microsoft.com> wrote in message
news:682E379D-9E50-4FB4...@microsoft.com...
I have restarted the rms server and this has not changed the situation. I
also tried to clear out the health service store, but again this had no
effect.
"Marius Sutara [MSFT]" wrote:
> The runtime on your RMS is not seeing an ack delivered for the entity state change it
> posted. When you restart, runtime will try to post previous data again and
> may be able to clear this issue. Let me know if still an issue after that
> ....
--
-- Marius Sutara [MSFT] (Developer)
-- http://blogs.msdn.com/MariusSutara
"Paul Kirkup" <PaulK...@discussions.microsoft.com> wrote in message
news:04B2325A-1090-4AEB...@microsoft.com...
Sorry for the delay in response - I've been away on other business. I have
looked at the database and I have to admit I'm not sure what I'm looking for.
I have opened an alert within ops console and changed the owner and alert
status - both of those fields are shown correctly when viewing the database
alert table. Is there another table I should be looking at or does this
answer your question?
We don't use scom for historical data; its prime usage is live monitoring
only at this point in time. Is it worth removing scom from the domain and
starting again with a new installation? I have all the management packs and
customizations exported.
Many thanks
Paul Kirkup