Event 2115 Microsoft.SystemCenter.DataWarehouse.CollectEventData

MedeBay

Apr 11, 2008, 12:06:01 PM
The system has been running great for many months. All of a sudden, 3 days
ago, this started:

Event 2115 Microsoft.SystemCenter.DataWarehouse.CollectEventData


SCOM 2007 SP1 RTM. 2 gateways, 2 MS, 1 RMS.

Both MSs are having the same problem, which appears to be related to the
data warehouse. Here are the event details. It looks like about 18 hours
since a response:

Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Date: 4/11/2008
Time: 8:34:52 AM
User: N/A
Computer: SJD-ENTMO-013
Description:
A Bind Data Source in Management Group mongo has posted items to the
workflow, but has not received a response in 65149 seconds. This indicates a
performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance : SJD-ENTMO-013.corp.ebay.com
Instance Id : {ABDFD60F-AA66-8094-11ED-41A4CC309397}
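
As a quick check on the "18 hours" figure, here is a minimal Python sketch that pulls the elapsed seconds and the workflow id out of a 2115 description like the one above. The function name `parse_2115` and the regular expressions are illustrative assumptions based only on the text of this event; they are not part of any SCOM tooling.

```python
import re

def parse_2115(description: str) -> dict:
    """Extract elapsed seconds and workflow id from a 2115 event description."""
    # Collapse wrapped lines so the patterns can match across line breaks.
    text = " ".join(description.split())
    seconds = re.search(r"not received a response in (\d+) seconds", text)
    workflow = re.search(r"Workflow Id : (\S+)", text)
    if seconds is None or workflow is None:
        raise ValueError("text does not look like a 2115 description")
    elapsed = int(seconds.group(1))
    return {
        "workflow_id": workflow.group(1),
        "seconds": elapsed,
        "hours": round(elapsed / 3600, 1),
    }

sample = """A Bind Data Source in Management Group mongo has posted items to the
workflow, but has not received a response in 65149 seconds. This indicates a
performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance : SJD-ENTMO-013.corp.ebay.com"""

info = parse_2115(sample)
print(info["workflow_id"], info["seconds"], info["hours"])
```

65149 seconds divided by 3600 comes to roughly 18.1 hours, which matches the estimate in the post.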

MedeBay

Apr 11, 2008, 12:23:01 PM
I followed these instructions (quoted below) and will see if they help. My
only concern is whether this actually resolves anything or just covers it up.

It appears we may be hitting a cache resolution error that we have been
trying to catch for a while. This is about the CollectEventData workflow. We
have had 3 reports of it happening before. The error is very hard to catch,
and we're including a fix in SP2 to avoid it. There are two ways to resolve
the problem in the meantime. Since the error happens very rarely, you can
just restart the Health Service on the affected Management Server. Or you can
prevent it from blocking the workflow by creating overrides in the following
way:

1) Launch the Console, switch to the Authoring space and click "Rules"
2) In the top right-hand side of the screen click "Change Scope"
3) Select "Data Warehouse Connection Server" in the list of types, then click
"Ok"
4) Find the "Event data collector" rule in the list of rules
5) Right-click the "Event data collector" rule, select Overrides/Override the
Rule/For all objects of type...
6) Set Max Execution Attempt Count to 10
7) Set Execution Attempt Timeout Interval Seconds to 6

That way, if the DW event writer fails to process an event batch for about a
minute, it will discard the batch. 2115 events related to
DataWarehouse.CollectEventData should go away after you apply these
overrides. BTW, while you're at it, you may want to override "Max Batches To
Process Before Maintenance Count" to 50, since it appears you have a
relatively large environment. We think 50 is a better default setting than
SP1's 20 in this case, and we'll switch the default to 50 in SP2.
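
The "about a minute" figure follows from the two override values. A minimal arithmetic sketch in Python, assuming the discard window is simply the attempt count times the per-attempt timeout (the helper name `discard_window_seconds` is made up for illustration, and the real workflow may add time per attempt):

```python
def discard_window_seconds(max_attempts: int, attempt_timeout_s: int) -> int:
    """Rough upper bound on how long a batch is retried before being discarded."""
    # Assumption: attempts run back to back, each bounded by the timeout.
    return max_attempts * attempt_timeout_s

# Override values from steps 6 and 7 above.
window = discard_window_seconds(max_attempts=10, attempt_timeout_s=6)
print(f"batch discarded after roughly {window} seconds")
```

With 10 attempts and a 6-second interval the window works out to 60 seconds.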

Hope this helps, and sorry it took me so much time to get to it.

--
Vitaly Filimonov [MSFT]

MedeBay

Apr 11, 2008, 12:35:01 PM
OK, so that did not fix it; the time is creeping up again.

After restarting the Health Service on both MSs, I noticed they both gave
this error:

Event Type: Error
Event Source: Health Service Modules
Event Category: Data Warehouse
Event ID: 31551
Date: 4/11/2008
Time: 9:19:49 AM
User: N/A
Computer: SJD-ENTMO-013
Description:

Failed to store data in the Data Warehouse. The operation will be retried.
Exception 'KeyNotFoundException': The given key was not present in the
dictionary.

One or more workflows were affected by this.

Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance name: SJD-ENTMO-013.corp.ebay.com
Instance ID: {ABDFD60F-AA66-8094-11ED-41A4CC309397}
Management group: mongo

Any ideas?

MedeBay

Apr 11, 2008, 1:00:01 PM
So now it seems the original problem is either resolved or masked by the
overrides. But now I continue to get 31551 and 31552 errors.

Any known issues with this?

Diego Zamora

Apr 13, 2008, 11:50:01 AM
MedeBay,
I'm having almost the same issue in my environment and have a PSS case open
with Microsoft right now. This has been identified as an application bug. We
have looked at the SAN (where the DB is located) and have not found anything,
the RMS is "stable", and everything was pointing to a bug. As I said, that
was Microsoft's final answer on Friday. I will post any news as I get it
next week.

Vitaly Filimonov [MSFT]

Apr 14, 2008, 6:51:25 PM
The overrides I provided on the other thread resolve this problem. Please
confirm that you have set the overrides and that the new version of the
DefaultUser MP was downloaded to your management servers. Once the overrides
take effect, the CollectEventData rule stops reporting 2115 events.

The overrides cause the rule to drop data if it cannot write it in several
tries. That makes events flow again. The "KeyNotFoundException" problem
happens very rarely, and if it does, the affected events will be dropped.
We'll be including a fix in SP2 to prevent this problem in a different way,
so that the events causing it won't be dropped.
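
To make the drop-after-several-tries behavior concrete, here is a minimal Python sketch of the general pattern. It is not the SCOM implementation: `write_with_drop`, its parameters, and the use of Python's `KeyError` as a stand-in for the .NET `KeyNotFoundException` are all assumptions made for illustration.

```python
import time

def write_with_drop(batch, write, max_attempts=10, wait_s=6, sleep=time.sleep):
    """Try write(batch) up to max_attempts times.

    Returns True if the batch was stored, False if it was dropped.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write(batch)
            return True
        except KeyError:
            # Stand-in for KeyNotFoundException: wait, then retry,
            # unless this was the last allowed attempt.
            if attempt < max_attempts:
                sleep(wait_s)
    # All attempts failed: drop the batch so later batches can flow.
    return False

def always_missing(batch):
    """Hypothetical writer that fails every time, like a poisoned batch."""
    raise KeyError("event key not in cache")

waits = []  # record the waits instead of actually sleeping
stored = write_with_drop(["evt1"], always_missing, sleep=waits.append)
print("stored:", stored, "waits:", len(waits))
```

Without a limit on attempts, a batch that always fails would block the workflow indefinitely, which is what the growing 2115 response time reflects. With the limit, the failing batch is discarded and later batches proceed.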

--
Vitaly Filimonov [MSFT]
-------------------------------------------
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

drebdreb

Sep 8, 2008, 10:43:00 AM
Hi,

I am having the same issue.
Could you please post the link to the "DefaultUser MP"?

Thanks
