command/module promise laments about return code 1


Xander Cage

Mar 14, 2023, 8:45:04 AM
to help-cfengine
Hi,

So I have this simple module promise which calls a Python script (the module). I don't know why, but from one day to the next it started complaining about a non-zero return code and
therefore goes to the failed state.

the promise:

```
bundle agent cfe_internals_checkup
{
  commands:
    "$(sys.workdir)/modules/cfe_db_check"
      args => "",
      module => "true",
      contain => command_timeout(30),
      comment => "run the check meck fuck script",
      classes => cfe_db_module;

  reports:
    #CFE_DB_MODULE_KEPT::
    #  "Time: $(sys.date) - Bundle: $(this.bundle) - Message: cfe_db_check was running successfully.";
    CFAGENT_HANGS::
      "Time: $(sys.date) - Bundle: $(this.bundle) - Message: Agent hangs, The Purge was initiated.";
    CFE_DB_ERRORS_FATAL::
      "Time: $(sys.date) - Bundle: $(this.bundle) - Message: CFE internal Databases are bonkers, cfengine killed, cleanup triggered.";
    CFE_DB_IN_DIFFICULTIES::
      "Time: $(sys.date) - Bundle: $(this.bundle) - Message: CFE internal Databases are in danger.";
    #CFE_DB_OK::
    #  "Time: $(sys.date) - Bundle: $(this.bundle) - Message: CFE internal Databases are good.";
}

body classes cfe_db_module
{
  kept_returncodes => { "0" };
  promise_kept => { "CFE_DB_MODULE_KEPT" };
}
```
outcome:
```
root@aixstp17t1: /var/cfengine/outputs # /var/cfengine/bin/cf-agent -KI -b cfe_internals_checkup
    info: Using command line specified bundlesequence
    info: Executing 'timeout=30s' ... '/var/cfengine/modules/cfe_db_check '
    info: Command related to promiser '/var/cfengine/modules/cfe_db_check' returned code '1' not defined as promise kept, not kept or repaired; setting to failed
    info: Completed execution of '/var/cfengine/modules/cfe_db_check '
```

executing the python module directly in shell clearly returns zero...

```
root@aixstp17t1: /var/cfengine/outputs # /var/cfengine/modules/cfe_db_check;echo $?
+CFE_DB_OK
0
```
Including 1 in kept_returncodes silences this, but I want to know the reason for it.

The CFEngine version is 3.18.2.

Any idea what's wrong here?

Nick Anderson

Mar 15, 2023, 2:02:37 PM
to Xander Cage, help-c...@googlegroups.com

Xander Cage <christia...@itsv.at> writes:

executing the python module directly in shell clearly returns zero…

```
root@aixstp17t1: /var/cfengine/outputs # /var/cfengine/modules/cfe_db_check;echo $?
+CFE_DB_OK
0
```
including 1 in kept_returncodes will silence this but i want to know the reason for this.

cfengine version is 3.18.2

an idea whats wrong here…

I guess your cfe_db_check script runs cf-check? I suspect that whatever is happening is related to running it while agents are in motion; when you run it directly from the shell, the agent is not in motion.

Care to share a gist or something with the source of your cfe_db_check?

I recall, but can't immediately find a reference for the specifics, that some of the cf-check functionality is performed automatically by the agent. For example, if there is a corrupt LMDB, the agent will automatically attempt to export and import the data and, failing that, just move it out of the way and let it regenerate.
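For anyone debugging a similar case: the return code the agent evaluates is that of the module script itself, so it helps if the script reports the cf-check result through a class and keeps its own exit status separate. Below is a minimal Python sketch of that idea. It is not the poster's script; the cf-check path, the choice of the `diagnose` subcommand, the function names, and the mapping of exit statuses to the class names from the policy above are all assumptions for illustration.

```python
import subprocess
import sys

# Assumed location and subcommand; adjust to the local install.
CF_CHECK = ["/var/cfengine/bin/cf-check", "diagnose"]


def run_check(cmd):
    """Run the check; return (exit_status, output). Status is None if it could not start."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        return None, str(exc)
    return proc.returncode, proc.stdout


def module_output(status):
    """Translate the check's exit status into a module protocol line.

    A leading '+' defines a class. The class names are taken from the
    policy in this thread; the mapping itself is hypothetical.
    """
    if status == 0:
        return "+CFE_DB_OK"
    return "+CFE_DB_IN_DIFFICULTIES"


def main(cmd=CF_CHECK, log=sys.stderr):
    status, output = run_check(cmd)
    print(module_output(status))
    if status != 0:
        # Keep a record of what the check reported, for later comparison
        # between an agent run and a manual run from the shell.
        log.write("check %r exited with %r:\n%s\n" % (cmd, status, output))
    # The module did its job, so it exits 0 whatever the check found.
    return 0
```

Called as `sys.exit(main())` at the end of the script, the promise outcome then reflects whether the module ran, while the database state is carried by the classes.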

Xander Cage

Mar 16, 2023, 4:48:21 AM
to help-cfengine
Hi Nick,

clairvoyant *g*... yes, there is indeed a cf-check call...

gist -> https://gist.github.com/flynn1973/b50011df9cbf9b6a0cdb59838a145ad4

Nick Anderson

Mar 16, 2023, 2:21:58 PM
to Xander Cage, help-c...@googlegroups.com

Xander Cage <christia...@itsv.at> writes:

Hi Nick,

clairvoyant g…yes there is indeed a cf-check call…

Epic: bo...@hellskitchen.org

So, it looks like this was perhaps introduced back in 3.12. I would consider disabling it as I believe there has been more robust automatic lmdb issue detection and remediation introduced into the agent.

Again, I haven't searched deeply, but as I recall, when the agent starts up and encounters an error with the LMDBs, the repair happens automatically (not for all errors, but at least for the ones that cf-check repair would fix).

Also, since that is poking at the LMDBs while the agent is running, if you want to keep it, I would consider moving its execution outside agent runs, to cron or something. Perhaps peek at the watchdog stuff; it looks like it might be a good fit for that sort of thing.
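If the check does move out to cron, a small wrapper could keep the 30 second limit from the policy and leave the result somewhere it can be inspected later. The following Python sketch assumes exactly that and nothing more: the function names and the status-file idea are hypothetical, and it is not part of CFEngine or of the watchdog.

```python
import subprocess


def run_with_timeout(cmd, seconds=30):
    """Return the command's exit status, or None if it timed out or could not start."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return proc.returncode


def write_status(path, status):
    """Record the last result as a single line: the exit status, or 'unknown'."""
    with open(path, "w") as fh:
        fh.write("unknown\n" if status is None else "%d\n" % status)
```

A cron job would call `run_with_timeout()` with the check command and pass the result to `write_status()` with a path of its choosing; both the command and the path are left to the site.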

Vratislav Podzimek

Mar 17, 2023, 4:45:27 AM
to help-c...@googlegroups.com
On Thu, 2023-03-16 at 13:16 -0500, 'Nick Anderson' via help-cfengine wrote:
> Xander Cage <christia...@itsv.at> writes:
> > Hi Nick,
> >
> > clairvoyant g…yes there is indeed a cf-check call…
> >
> > gist -> https://gist.github.com/flynn1973/b50011df9cbf9b6a0cdb59838a145ad4
> >
> Epic: bo...@hellskitchen.org
> So, it looks like this was perhaps introduced back in 3.12. I would consider disabling it as I
> believe there has been more robust automatic lmdb issue detection and remediation introduced into
> the agent.
> Again, I haven't searched deeply but as I recall when the agent starts going if it encounters an
> error with the lmdbs (not all errors, but the one that cf-check repair would fix at least) happen
> automatically.
Yes, quite a lot has been done regarding LMDB corruption detection and handling. See these PRs for
details:
https://github.com/cfengine/core/pull/3873
https://github.com/cfengine/core/pull/3880


--
Vratislav