Zenoss 4.2.5 - Connection refused. Check zeneventserver status on Daemons


Satay Epic

May 7, 2022, 12:57:30 PM
to Zenoss Core
Hello,

I see this error pop up often in the yellow band at the top of the UI. Sometimes I get a long Python traceback in the UI, with the last line being

"ZepConnectionError: Timed out connecting to service."

Has anyone seen this happen and know the cause of the problem?

Thanks

Jane Curry

May 8, 2022, 8:07:34 AM
to Zenoss Core
Is this intermittent and it then goes away? 

Fundamentally it is telling you that the events subsystem is either stressed or broken.  Do you actually still receive events?

If you can provide a bit more information, we might be able to help.

Cheers,
Jane

Satay Epic

May 9, 2022, 11:45:50 AM
to Zenoss Core
Yes, it is intermittent but occurs quite frequently. Yes, the events are being received, as I see them in the event console.

In the 'zeneventserver' log, I don't see any errors/warnings other than the usual table adding and pruning messages. I suspect some trouble with the Zope DB (latency?), which is hosted on the same host.

One observation is that zeneventserver also seems to spike the system load frequently.

So how do I validate whether the Zope DB is healthy and optimal?

Thanks

Jane Curry

May 9, 2022, 12:44:48 PM
to Zenoss Core
Is this a Zenoss 4.x or a 6.x??  If a 6.x then try giving more resources to zeneventserver (do you have a red spot beside zeneventserver in Control Center?).
If it is Zenoss 4 then you may just be short on resources - what does the Unix top command have to say about CPU / RAM / swap?

Another possibility is that you have bad event transforms that are putting an unacceptable load on the zeneventd daemon and that, in turn, is stressing zeneventserver.  Try using a filter of "transform" for the summary field in the Events Console.

Unless you have lots of event transforms that do lots of checking of attributes of devices / components, then it is less likely to be a problem with the Zope ZODB database.  zep is the events database and it seems more likely that this is the culprit.

Cheers,
Jane

Satay Epic

May 17, 2022, 9:35:31 PM
to Zenoss Core
It seems "zeneventserver" is consuming a lot of memory: its VSS is around 32 GB, while the host has 16 GB of physical RAM. I'm going through the transforms to ensure they are written properly. I do see a "Rule" warning (incorrect value "oid") for some transforms. The rule is like evt.oid.startswith(<OID numbers>)

Going to try the Zenoss Toolbox to see whether the DBs are okay.

May need to do some tuning as well.

Is there a way to run a transform interactively via zendmd or directly from Python code?

Thanks

Jane Curry

May 18, 2022, 1:31:03 PM
to Zenoss Core

Hmm. My zeneventserver is greedy but nothing like that greedy.  However, it is the only part of Zenoss that is written in Java rather than Python.

Have you checked for a bad transform in the Event Console?  Set a filter on the summary field of "transform".

Try the toolbox tools but they only check the Zenoss Database (ZODB) not the zep events database and I think your issue is with events - unless you have a transform that is possibly querying the ZODB for attributes of a device and that object is broken??  Just possible but unlikely.

Do you have my "Event Management for Zenoss Core 4" paper - https://www.skills-1st.co.uk/papers/jane/zenoss4-events/ ?  Chapter 8 is on Testing & Debugging and provides a little help with trying to test transforms.  Basically, if you run the zeneventd daemon in debug mode then everything is in the zeneventd.log but it is exceedingly verbose.  I have really only succeeded with this technique when I can get the system down to only producing the one or two events I am interested in - setting virtually all of your devices temporarily to Decommissioned Production Status can help.

Before you do that, have you checked your rabbit queues?  The events subsystem is a pipeline where the events initially entering the system are in the zep.rawevents queue.  If that has more than zero events then the pipeline is blocked and is often a bad rule or transform.  As the zenoss user, try:

[zenoss@zen42 HttpMonitor]$ rabbitmqctl -p /zenoss list_queues
Listing queues ...
celery    0
zenoss.queues.zep.modelchange    0
zenoss.queues.zep.signal    0
zenoss.queues.zep.migrated.summary    0
zenoss.queues.zep.rawevents    0
zenoss.queues.zep.heartbeats    0
eventForwarder    0
zenoss.queues.zep.zenevents    0
zen42.class.example.org.celeryd.pidbox    0
zenoss.queues.zep.migrated.archive    0
...done.
 
The zenevents queue takes processed events from zeneventd to zeneventserver, which then saves events in the zep database and makes them available to the Event Console GUI.  zeneventserver also uses the signal queue to send events to zenactiond, which drives triggers and notifications, so you might also check the log for zenactiond to see if there is anything nasty going on there that is blocking your pipeline.  More info on this architecture in chapter 2 of the paper.
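
The queue check can also be scripted. What follows is only a minimal sketch in Python: it parses text in the same layout as the `rabbitmqctl -p /zenoss list_queues` output shown above and lists the queues that hold messages. The function names `parse_queue_depths` and `backlogged` are made up for this example, and the counts in `SAMPLE` are illustrative, not captured from a real system.

```python
def parse_queue_depths(output):
    """Parse `rabbitmqctl list_queues` text into {queue_name: message_count}."""
    depths = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like "<name> <count>"; banner lines such as
        # "Listing queues ..." and "...done." do not match and are skipped.
        if len(parts) == 2 and parts[1].isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


def backlogged(depths, limit=0):
    """Return (name, count) pairs for queues above `limit`, largest first."""
    over = [(name, count) for name, count in depths.items() if count > limit]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Illustrative sample only; paste real rabbitmqctl output here instead.
SAMPLE = """Listing queues ...
celery\t0
zenoss.queues.zep.rawevents\t1200
zenoss.queues.zep.zenevents\t0
zenoss.queues.zep.signal\t55000
...done.
"""

if __name__ == "__main__":
    for name, count in backlogged(parse_queue_depths(SAMPLE)):
        print("{0}: {1} messages waiting".format(name, count))
```

A backlog in rawevents points at zeneventd (rules and transforms), while a backlog in signal points at zenactiond (triggers and notifications).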

Hope some of that helps,
Jane

jstan...@gmail.com

May 19, 2022, 9:52:55 AM
to Zenoss Core
It would be worth finding the log traceback in the event.log. That would help point us in the right direction.

Also, do you happen to see any weird events in the console? Maybe look at any events assigned to localhost, 127.0.0.1 and the device name of the RM.

It might be worth clearing out the Lucene index:

stop zeneventserver (I'd suggest stopping zenoss completely)
rm -rf /opt/zenoss/var/zeneventserver/index/
start zeneventserver/zenoss

jstanley

May 19, 2022, 10:19:16 AM
to Zenoss Core

Satay Epic

unread,
May 24, 2022, 11:56:35 PM5/24/22
to Zenoss Core
Thank you both for your replies. It does seem that ZEP is overwhelmed for some reason. There are quite a number of events coming in (not a continuous bombardment, though), but I don't think the number of events is high enough to cause trouble for ZEP, as I filtered the events based on the transforms. I just fixed an event transform for the class whose devices receive more traps than any other class. The bad code caused around 80K outstanding critical events. All RabbitMQ queues stay between 0 and 300 (I have a Nagios monitor in place). After I fixed the bad transform, I closed 10% of those outstanding events; the "signal" queue backlog went up to 55K and took about 2.5 hours to clear.
I had to cancel the event close task after 10% to minimize the operational impact.

Based on Jane's comment about the event flow mechanism, I suspect it is the ZEP DB / MySQL that is not performing well.

The "event.log" does occasionally show the error "Cannot connect to ZEP", along with urllib3 connection warnings. It is mostly seen when I click on "Event Classes" in the UI, and that's when the system load starts climbing. When I move away from "Event Classes" in the UI, the load drops back to normal.
I don't see anything odd with the events in the console.

I might try the ZEP index clear option next; I hope it won't cause any trouble. I'll go through the other event transforms as well as a sanity check.

Thanks



jstanley

May 25, 2022, 11:11:32 AM
to Zenoss Core
Transforms are handled by zeneventd and notifications handled by zenactiond. A few things I would look at:

  1. What do your triggers look like? How many do you have and how many events are being processed by them (ie: how many notifications?)
  2. Do you have any transforms that connect to zep? Update an event, check event counts, etc
  3. How many events do you have in the event console? (open/closed/etc)
  4. Do you get the slow down or high load when viewing the event console?


Satay Epic

May 25, 2022, 10:54:34 PM
to Zenoss Core
There are 13 triggers and 26 notifications in total (command + email).
No, there is no transform that connects to ZEP.
I see around 500K open events (New and Critical) instead of the 80K I mentioned earlier. All of those are for one event class, the one whose transform I fixed yesterday. So I have to close those 500K events in phases.
I will disable the notifications so there won't be load on the system or email spam to the users, but I'm worried about losing notifications for incoming events.
The Event Console view seems normal, with no system load increase while I'm in it.

Thanks

Satay Epic

May 26, 2022, 7:52:26 AM
to Zenoss Core
Is it possible to set a threshold for notifications (command and email), so that one is sent only when the event count crosses that threshold, and only one email is sent each for a new and a cleared event?

Thanks

jstanley

May 26, 2022, 9:30:06 AM
to Zenoss Core
In your trigger, you can set a value on the event count.

For a notification, you can select only send on initial occurrence and disable send clear.

This will only send one notification and no notification for when the event clears (if you want the clear sent, select send clear)

jstanley

May 26, 2022, 9:30:39 AM
to Zenoss Core
Out of curiosity, what event class had all the events and what is the transform?

Satay Epic

May 31, 2022, 10:43:26 PM
to Zenoss Core
Thanks. I'm looking at the trigger options/rules and I think that will help the notifications as well.

The event class is for ASR and the transform is based on the eventClassKey. The transform has arrays of the "OK", "Critical" and "Warning" events matching the eventClassKeys. I came to know that the ASR core platform MIB returns "Clear" events as "Info" events.
Whoever wrote the transform probably wasn't aware of that, which caused the huge backlog. One of the devices has over 1M "Info" events. To avoid the system overhead that the trigger and the notification apparently cause when I attempt to close those old events, I'm going to try closing the events (all types) in batches using the JSON API while the trigger is disabled.

I also have to figure out a way to close the "Info" events after a week. Not sure if that is possible.
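
Closing in batches through the JSON API could be structured roughly as below. This is a sketch under assumptions, not a tested client: the router path `/zport/dmd/evconsole_router`, the `EventsRouter` action, its `close` method and the `evids` parameter are taken from my understanding of the Zenoss JSON API and should be verified against the installed version. The code only builds the request bodies and sends nothing; the event IDs are placeholders.

```python
import json

# Assumed endpoint path; verify against your Zenoss version before use.
ROUTER_PATH = "/zport/dmd/evconsole_router"


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def close_request(evids, tid):
    """Build the JSON body for one 'close' call (assumed router method name)."""
    return json.dumps({
        "action": "EventsRouter",
        "method": "close",
        "data": [{"evids": list(evids)}],
        "type": "rpc",
        "tid": tid,
    })


if __name__ == "__main__":
    evids = ["evid-{0}".format(n) for n in range(5)]  # placeholder event IDs
    for tid, chunk in enumerate(batches(evids, 2), start=1):
        # Each body would be POSTed to ROUTER_PATH with a pause in between,
        # so that the signal queue has time to drain.
        print(close_request(chunk, tid))
```

Keeping the batch size small and watching the signal queue between batches limits the kind of 55K backlog seen earlier.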

jstanley

Jun 1, 2022, 12:53:26 PM
to Zenoss Core
You could modify the transform to make them into Clears (severity = 0) and duplicates then get dropped by the collector. (This is how I would do it)
Or you could turn on event aging for info .. under Advanced > Events

There is also a way to use the mapping zProperty zEventClearClasses, but I have had issues with that method in the past.
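
The first option (turning the "Info" clears into severity 0) might look something like the sketch below. It is written as a standalone Python script so it can be tried outside Zenoss: `StubEvent` is a stand-in for the `evt` object a transform receives, and the keys in `CLEAR_KEYS` are hypothetical placeholders, not real ASR eventClassKeys. In an actual transform only the body of `transform()` would be used, with `evt` already in scope.

```python
class StubEvent(object):
    """Stand-in for the `evt` object passed to a transform; not a Zenoss class."""

    def __init__(self, eventClassKey, severity):
        self.eventClassKey = eventClassKey
        self.severity = severity


# Hypothetical keys: replace with the eventClassKeys the MIB really sends.
CLEAR_KEYS = set(["asrLinkUp", "asrFanOk"])

SEVERITY_CLEAR = 0  # Zenoss severity "Clear"
SEVERITY_INFO = 2   # Zenoss severity "Info"


def transform(evt):
    """Demote Info events whose key marks a recovery to severity Clear."""
    key = getattr(evt, "eventClassKey", "")  # tolerate events without the field
    if evt.severity == SEVERITY_INFO and key in CLEAR_KEYS:
        evt.severity = SEVERITY_CLEAR
    return evt


if __name__ == "__main__":
    for key, sev in [("asrLinkUp", 2), ("asrLinkDown", 5), ("unrelated", 2)]:
        result = transform(StubEvent(key, sev))
        print("{0}: severity {1} -> {2}".format(key, sev, result.severity))
```

Using `getattr` with a default also avoids the kind of error raised when a rule or transform reads an attribute, such as `evt.oid`, that is not present on every event.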

Satay Epic

Dec 14, 2022, 12:06:10 AM
to Zenoss Core
That was the very reason: event aging was set up with "Info" as its severity, which explains why the "Info" events were stacking up. I changed it to "Error" and updated the aging / archive "limits" to 100 so as not to overwhelm ZEP and the DB.

Thanks for your help!


