Hi,
We have been working with CFEngine for quite some time and everything generally works fine.
But there is one issue that comes up from time to time and makes me pull my hair out. It mostly happens when I start removing keys with the cf-key command.
I cannot figure out how authentication works when you bootstrap a client to the policy hub. I cannot connect the dots between the key files in /var/cfengine/ppkeys (the localhost and root-MD5 files on both server and client), the lastseen database, and so on.
The error I receive is:
root@laptop:~# cf-agent --bootstrap cfengine0102
notice: Bootstrap mode: implicitly trusting server, use --trust-server=no if server trust is already established
error: Failed to establish TLS connection: underlying network error (Connection reset by peer)
error: No suitable server found
error: Failed to establish TLS connection: underlying network error (Connection reset by peer)
error: No suitable server found
R: This autonomous node assumes the role of voluntary client
R: Failed to copy policy from policy server at 10.1.16.45:/var/cfengine/masterfiles
Please check
* cf-serverd is running on 10.1.16.45
* CFEngine version on the policy hub is 3.6.0 or latest - otherwise you need to tweak the protocol_version setting
* network connectivity to 10.1.16.45 on port 5308
* masterfiles 'body server control' - in particular allowconnects, trustkeysfrom and skipverify
* masterfiles 'bundle server' -> access: -> masterfiles -> admit/deny
It is often useful to restart cf-serverd in verbose mode (cf-serverd -v) on 10.1.16.45 to diagnose connection issues.
When updating masterfiles, wait (usually 5 minutes) for files to propagate to inputs on 10.1.16.45 before retrying.
R: Did not start the scheduler
Running cf-serverd -d or cf-serverd -v on the server side reveals only the following output:
Nov 18 11:52:44 cfengine0102 cf-serverd[7752]: CFEngine(server) Remote host '10.3.130.107' not in allowconnects, denying connection
This makes no sense to me, since we allow the entire 10.x.x.x IP range in allowconnects (both client and server are in the same network), and in the trustkeysfrom slist I even allow all IP addresses. It looks like the cf-serverd process does not pick up this configuration, which is even stranger because it worked before I started fiddling with the cf-key command.
Def.cf:
"acl" slist => {
# Allow everything in my own domain.
# Note that this:
# 1. requires def.domain to be correctly set
# 2. will cause a DNS lookup for every access
# ".*$(def.domain)",
# Assume /16 LAN clients to start with
"$(sys.policy_hub)/8",
# "2001:700:700:3.*",
# "217.77.34.18",
# "217.77.34.19",
},
...
"trustkeysfrom" slist => {
# COMMENT THE NEXT LINE OUT AFTER ALL MACHINES HAVE BEEN BOOTSTRAPPED.
"0.0.0.0/0", # allow any IP
},
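As a sanity check of the acl entry above, here is a minimal Python sketch (run outside CFEngine) that confirms the client address from the cf-serverd log falls inside the /8 built from the hub address. It assumes $(sys.policy_hub) expands to 10.1.16.45, as the bootstrap output suggests. It only checks the CIDR arithmetic and says nothing about how cf-serverd itself evaluates the list.

```python
import ipaddress

# Assumption: $(sys.policy_hub) expands to the hub IP seen in the bootstrap output.
hub = "10.1.16.45"
# Client address taken from the cf-serverd log line.
client = "10.3.130.107"

# strict=False lets us build the network from a host address plus a prefix length.
network = ipaddress.ip_network(f"{hub}/8", strict=False)
client_in_acl = ipaddress.ip_address(client) in network

print(f"acl network: {network}, client covered: {client_in_acl}")
```

So on paper the client is covered by the range, which is why the "not in allowconnects" message is so confusing.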
As far as I can tell, the key trust between the policy hub and the client no longer works. It looks like the hub still holds a reference to this particular host. Even after removing cf_lastseen.lmdb and all keys in /var/cfengine/ppkeys, regenerating the keys, rebootstrapping the policy server to itself (which works), and restarting the services, there still seems to be a reference to this client host.
I have no clue how to "reset" the policy server's key trust for one host, or even for multiple hosts. For example: do I need to remove the lastseen database, the generated keys, all root-MD5 keys, and so on, then stop the services and rebootstrap the policy server to itself?
Or can someone explain in simple terms what happens with key trust during bootstrap?
Thanks,
Tom