Yes, each shadownaemon process needs some memory to run; 50-100 MB per process is a good figure to plan with.
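
To make the budget concrete, here is a rough Python sketch of the arithmetic. It uses the 600 MB free and the 7 backends that did not start, both taken from your message below. It treats per-process usage as a fixed number, which is an assumption: real usage depends on how large each remote site is.

```python
def memory_needed_mb(processes: int, per_process_mb: int) -> int:
    """Total memory for `processes` shadownaemon instances at a fixed size each (assumed)."""
    return processes * per_process_mb


def fits(free_mb: int, processes: int, per_process_mb: int) -> bool:
    """True if all instances fit into the currently free memory."""
    return memory_needed_mb(processes, per_process_mb) <= free_mb


free_mb = 600   # free memory reported in the thread
remaining = 7   # backends whose shadownaemon did not start

# Check the low end, the observed value and the high end of the estimate.
for per_process in (50, 60, 100):
    need = memory_needed_mb(remaining, per_process)
    print(f"{per_process} MB/process: {need} MB needed, "
          f"fits in {free_mb} MB: {fits(free_mb, remaining, per_process)}")
```

At the observed 60 MB per process, the 7 remaining instances need 420 MB and fit. At the upper end of the 50-100 MB range they would need 700 MB and would not.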
The "crash" looks like the autogenerated config is somehow wrong. Each shadownaemon pulls in all hosts and services
from the remote site via livestatus and writes out a dummy naemon object config. It then tries to start with those
objects. It seems this fails in some cases here.
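
To see what a shadownaemon receives, you can send the same kind of query by hand. The following is a minimal Python sketch, not shadownaemon's own code. It assumes the remote livestatus is reachable over TCP, for example at the 172.27.0.91:822 address from the Thruk log below. It also asks for the standard `ResponseHeader: fixed16` header, whose first three characters are the status code (200 on success) followed by the body length. The function names and the column list are only illustrations.

```python
import socket

HEADER_LEN = 16  # fixed16: 3-digit status, space, 11-char body length, newline


def parse_fixed16_header(header: bytes) -> tuple[int, int]:
    """Split a fixed16 response header into (status_code, body_length)."""
    if len(header) != HEADER_LEN:
        raise ValueError(f"expected {HEADER_LEN} header bytes, got {len(header)}")
    text = header.decode("ascii")
    return int(text[0:3]), int(text[4:15])


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("livestatus closed the connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def query(host: str, port: int, request: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send one livestatus request and return (status_code, body).

    The request must contain `ResponseHeader: fixed16`; without it no header
    is sent and parsing the reply will fail.
    """
    # A livestatus request is terminated by an empty line.
    request = request.rstrip("\n") + "\n\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("utf-8"))
        status, length = parse_fixed16_header(recv_exact(sock, HEADER_LEN))
        body = recv_exact(sock, length).decode("utf-8", errors="replace")
    return status, body
```

For example, `query("172.27.0.91", 822, "GET services\nColumns: host_name description\nResponseHeader: fixed16\n")`. Any status other than 200 signals an error, and the body then carries livestatus's error message instead of data.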
Could you investigate and verify whether the hosts, services, and servicegroup mentioned in the error messages were
created correctly?
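
One way to check this is to compare the servicegroup members in the generated objects.cfg with the services actually defined in that file. Below is a rough Python sketch built on several assumptions. The file uses the usual `define <type> { ... }` layout with one `key value` pair per line. Each service definition names a single host. Servicegroup `members` are written as comma-separated `host,service` pairs. Both function names are made up for illustration.

```python
import re

DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{$")


def parse_objects(text: str) -> list[tuple[str, dict[str, str]]]:
    """Return (object_type, attributes) for every `define <type> { ... }` block."""
    objects: list[tuple[str, dict[str, str]]] = []
    current_type = None
    attrs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and full-line comments
        if current_type is None:
            match = DEFINE_RE.match(line)
            if match:
                current_type, attrs = match.group(1), {}
            continue
        if line == "}":
            objects.append((current_type, attrs))
            current_type = None
            continue
        # "key    value with spaces" -> key, "value with spaces"
        key, *rest = line.split(None, 1)
        attrs[key] = rest[0].strip() if rest else ""
    return objects


def missing_servicegroup_members(text: str) -> list[tuple[str, str, str]]:
    """List (servicegroup, host, service) members with no matching service definition."""
    objects = parse_objects(text)
    services = {
        (attrs.get("host_name"), attrs.get("service_description"))
        for obj_type, attrs in objects
        if obj_type == "service"
    }
    missing = []
    for obj_type, attrs in objects:
        if obj_type != "servicegroup":
            continue
        items = [item.strip() for item in attrs.get("members", "").split(",") if item.strip()]
        # members alternate host,service,host,service,...
        for host, service in zip(items[0::2], items[1::2]):
            if (host, service) not in services:
                missing.append((attrs.get("servicegroup_name", "?"), host, service))
    return missing
```

Run it on /var/cache/naemon/1c326/tmp/objects.cfg from the log below, for example `print(missing_servicegroup_members(open("/var/cache/naemon/1c326/tmp/objects.cfg").read()))`. If the result contains an entry such as `("ADE_WAN", "ADE_01UTA02", "WAN")`, that service was never written to the file.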
On 01/03/15 14:45, Fabrice Le Dorze wrote:
>
> Hi again
> No problem, I'm trying it on our lab server. I'm very interested in this functionality, as we plan to have more and more backends, some of them with a lot of hosts/services.
> So I'm happy to run any tests that help with debugging.
>
> I suspected a memory shortage.
> I have 9 backends plus the local Naemon, but only 2 shadownaemons start.
> Each shadownaemon uses around 60 MB. About 600 MB remains free for the 7 remaining backends, which would need about 420 MB (7 × 60 MB). Am I wrong?
>
> I just removed all the backend ID directories from shadow_naemon_dir and restarted.
> All of them were recreated properly.
>
> The 2 working ones have a shadownaemon.log like this:
>     [1425215585] livestatus: Naemon Livestatus 1.0.0-naemon Socket: '/var/cache/naemon//adc60/live'
>     [1425215585] livestatus: Finished initialization. Further log messages go to /var/cache/naemon/adc60/tmp/livestatus.log
>     [1425215585] Event broker module '/usr/lib/naemon/naemon-livestatus/livestatus.so' initialized successfully.
>     [1425215586] TIMEPERIOD TRANSITION: 24x7;-1;0
>     [1425215586] started caching 91.151.62.65:822 to /var/cache/naemon//adc60/live
>
> The non-working ones have a shadownaemon.log like this:
>     [1425215585] query failed: 400
>     query:
>     ---
>     GET services
>     ---
>     [1425215585] Error: Could not find a service matching host name 'ADE_01UTA02' and description 'WAN' (config file '/var/cache/naemon/1c326/tmp/objects.cfg', starting on line 473)
>     [1425215585] Error: Could not expand members specified in servicegroup 'ADE_WAN' (config file '/var/cache/naemon/1c326/tmp/objects.cfg', starting at line 473)
>     [1425215591] query failed: 400
>     query:
>     ---
>
> even though the configuration files in /var/cache/naemon/tmp seem to be OK.
>
> The Thruk debug log shows the periodic restart attempts for the failed shadownaemons:
>     [2015/03/01 14:21:59][s-hypervision0][ERROR][Thruk.Utils.Livecache] shadownaemon Veon1 RMS for peer d8487 (172.27.0.91:822) crashed, restarting...
>     [2015/03/01 14:21:59][s-hypervision0][DEBUG][Thruk.Utils.Livecache] /usr/bin/shadownaemon -d -i 172.27.0.91:822 -o /var/cache/naemon//d8487 -l /usr/lib/naemon/naemon-livestatus/livestatus.so >> /var/cache/naemon//d8487/tmp/shadownaemon.log 2>&1
>
> I tried it manually:
>     /usr/bin/shadownaemon -v -d -i 172.27.0.91:822 -o /var/cache/naemon//d8487 -l /usr/lib/naemon/naemon-livestatus/livestatus.so