Proposal: abandon current cf_promises_validated mechanism

Neil Watson

Jun 26, 2013, 5:12:36 PM
to help-cfengine
I propose the current cf_promises_validated mechanism be abandoned. It's
undocumented and unreliable. The desired functionality can be
accomplished using native Cfengine policy rather than compiled voodoo.

Relevant Bug:
https://cfengine.com/dev/issues/1541

Proposed solution:
https://gist.github.com/neilhwatson/5871500

--
Neil Watson
Linux/UNIX Consultant
http://watson-wilson.ca

Nicolas Charles

Jun 26, 2013, 6:15:33 PM
to help-c...@googlegroups.com
On 26/06/2013 23:12, Neil Watson wrote:
> I propose the current cf_promises_validated mechanism be abandoned. It's
> undocumented and unreliable. The desired functionality can be
> accomplished using native Cfengine policy rather than compiled voodoo.
>
> Relevant Bug:
> https://cfengine.com/dev/issues/1541
>
> Proposed solution:
> https://gist.github.com/neilhwatson/5871500
>
I agree, using promises rather than hardcoded feature makes the
behaviour much more adaptable to custom needs.
We ended up using this kind of approach, as we don't put everything in
masterfiles, and so couldn't rely on the cf_promises_validated to
determine when to copy.

Nicolas

Ted Zlatanov

Jun 26, 2013, 7:06:17 PM
to Nicolas Charles, help-c...@googlegroups.com
On Thu, 27 Jun 2013 00:15:33 +0200 Nicolas Charles <nicolas...@normation.com> wrote:

NC> On 26/06/2013 23:12, Neil Watson wrote:
>> I propose the current cf_promises_validated mechanism be abandoned. It's
>> undocumented and unreliable. The desired functionality can be
>> accomplished using native Cfengine policy rather than compiled voodoo.

There may be some edge cases that require compiled code, but I agree
that it would be nice to put it all in policy.

>> Relevant Bug:
>> https://cfengine.com/dev/issues/1541

This is a documentation bug. I think Neil's proposal merits a new
ticket, "related" to 1541.

>> Proposed solution:
>> https://gist.github.com/neilhwatson/5871500
>>
NC> I agree, using promises rather than hardcoded feature makes the
NC> behaviour much more adaptable to custom needs.
NC> We ended up using this kind of approach, as we don't put everything in
NC> masterfiles, and so couldn't rely on the cf_promises_validated to
NC> determine when to copy.

To be clear, Neil is proposing to do the promises validation in policy,
not to abandon the mechanism. From the client side things would look
the same IIUC. So in fact by changing update.cf or failsafe.cf you
could implement Neil's proposal against a new validation marker file,
however it's generated on the server.

I think that's the safest route for now. We should have some proof of
resiliency and scalability before we propose such a mechanism as the
default because it would affect so many users. Nicolas is not a good
use case because he's (I'm guessing) talking about Rudder, which has a
completely different update model and doesn't use masterfiles.

Neil, can you organize a group to try your proposal and report on how
well it works after a few weeks? That would be really helpful.

Thanks!
Ted

Bas van der Vlies

Jun 27, 2013, 9:50:42 AM
to Nicolas Charles, <help-cfengine@googlegroups.com>

On 27 jun. 2013, at 00:15, Nicolas Charles <nicolas...@normation.com> wrote:
+1. Now we can have the default behavior expressed in CFEngine syntax and, if needed, adjust it to our own needs ;-)



---
SURFsara has a new telephone number: +31 20 800 1300.

Bas van der Vlies
| Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam
| T +31 (0) 20 800 1300 | bas.van...@surfsara.nl | www.surfsara.nl |




Nicolas Charles

Jun 28, 2013, 8:52:12 AM
to Ted Zlatanov, help-c...@googlegroups.com
Actually, I am not only talking about Rudder; it's a need I've seen on a
lot of occasions, especially when promises are not updated but
templates or "reference" files are. These are usually not in the
masterfiles folder, but in a "control" or "template" directory, or
whatever sounds logical at the time, so a promise has to be built to
check whether they have been updated before distributing them.
Having a standard promise for updating promises and dependencies would
be awesome. It would allow everyone to reuse the same skeleton,
rather than rebuilding everything and hitting the same pitfalls (like
checking hashes on thousands of files).

Nicolas

Ted Zlatanov

Jun 28, 2013, 9:58:06 AM
to Nicolas Charles, help-c...@googlegroups.com
On Fri, 28 Jun 2013 14:52:12 +0200 Nicolas Charles <nicolas...@normation.com> wrote:

NC> Actually, I am not only talking about Rudder; it's a need I've seen on
NC> a lot of occasions; especially when promises are not updated, but
NC> templates or "references" files are. These are usually not in the
NC> masterfiles folder, but in a "control", "template" directory, or
NC> whatever sounds logical at this time; and so there is the need to
NC> build a promise to check that they are updated to distribute them.

Right. I've seen this need as well.

NC> Having a standard promise for updating promises and dependencies would
NC> be awesome, and it would allow everyone to reuse the same skeleton,
NC> rather than rebuilding everything and having the same pitfall (like
NC> checking hashes on thousands of files)

I agree. A VCS commit ID would be perfect as the way to identify a
"release" which is what I think we're talking about. Then the problem
is reduced to capturing the unique VCS release ID in
cf_promises_validated.

That mechanism doesn't cover live editing of masterfiles, but I feel
strongly that we should not explicitly support that use case, and that
outside of CFEngine training, it should be actively discouraged.

Ted

Loïc Pefferkorn

Oct 4, 2013, 2:58:12 PM
to help-c...@googlegroups.com
Hello,

Any update on this?

Moreover, a visible mechanism through policies will also help newcomers
to CFEngine to understand the whole synchronization process.

--
Loic

Neil Watson

Oct 4, 2013, 3:02:44 PM
to help-c...@googlegroups.com
On Fri, Oct 04, 2013 at 08:58:12PM +0200, Loïc Pefferkorn wrote:
>Any update on this ?

Loic,

I haven't gotten any farther with it. Since I don't use
cf_promises_validated, it is not high on my priorities.

--
Neil H Watson
http://evolvethinking.com/evolve-thinkings-free-cfengine-library/
Hardening with Cfengine http://evolvethinking.com/products/
VIM and Cfengine https://github.com/neilhwatson/vim_cf3

Eystein Måløy Stenberg

Oct 4, 2013, 3:20:26 PM
to help-c...@googlegroups.com
Hi,

Isn't the main problem that cf_promises_validated is integrated into the
base masterfiles? I can see some uses for it as you will get a variable
on when the policy was last updated locally on each server (use in motd,
etc.).

If it is not in the main policy, then users can choose to optimize with
it and leverage it in policy, but won't be surprised by it.

Does that make sense?
Eystein

Neil Watson

Oct 4, 2013, 3:32:03 PM
to help-c...@googlegroups.com
On Fri, Oct 04, 2013 at 12:20:26PM -0700, Eystein Måløy Stenberg wrote:
>Isn't the main problem that cf_promises_validated is integrated into
>the base masterfiles? I can see some uses for it as you will get a

I think the main problem is that cpv behaves unexpectedly due to bugs or
lack of understanding. How and when the file is generated, and how it is
used, is not documented in any useful detail.

If the hard coding is removed, and cf-agent policy is instead used to
generate and check cpv, most of the bug problems will probably go away.
Further, we'll better understand how cpv actually works. Currently it's
black magic.

Mike Svoboda

Oct 4, 2013, 3:54:52 PM
to Neil Watson, help-c...@googlegroups.com
We drop files into /var/cfengine/masterfiles from "external sources" via
our policy servers executing rsync commands to pull data from other
places. Clients just perform regular cfengine file transfers and have no
idea what files are / are not SVN controlled.

Because of this, we never used cf_promises_validated because data could be
updating under masterfiles that isn't a cfengine policy update / SVN
action.


Also, we always want to run in "worst state." The beautiful thing about
using select_class over 4 policy servers, is this happens:

Schedule Min00, Min30
Splaytime 25m

(Approximated)
Min00-08 <--- machines hit MPS1
Min08-15 <--- machines hit MPS2
Min15-20 <--- machines hit MPS3
Min20-25 <--- machines hit MPS4


So, at any given time, really only one MPS is being hit for Cfengine
traffic -- but then it has the majority of the schedule to allow
cf-serverd to "recover" if it got super busy. I think this was one of
the major "wins" for using cf_promises_validated. It gave cf-serverd a
chance to recover if it ever got super busy. Just by adding more policy
servers, we're able to give cf-serverd some breathing room, but always do
a MD5 comparison against every file under masterfiles.

If you want to see how we perform CFEngine network transfers using
select_class, jump to 22:30 in this video:
http://youtu.be/zYSLBbFWlT8

Thanks
Mike

Erlend Leganger

Oct 5, 2013, 1:02:06 PM
to help-cfengine

On 26 June 2013 23:12, Neil Watson <cfen...@watson-wilson.ca> wrote:
> I propose the current cf_promises_validated mechanism be abandoned. It's
> undocumented and unreliable. The desired functionality can be
> accomplished using native Cfengine policy rather than compiled voodoo.

What is the problem with cpv, in a few words? I don't deal with this file other than deleting it when I want to force an update of a client's policy.

Neil Watson

Oct 5, 2013, 1:33:30 PM
to help-cfengine
On Sat, Oct 05, 2013 at 07:02:06PM +0200, Erlend Leganger wrote:
More than a few have problems with cpv refreshing constantly even when
there has been no change. This is extra difficult to troubleshoot
because the cpv is hard coded rather than managed by CFEngine policy.

David Lee

Oct 7, 2013, 5:04:49 AM
to help-c...@googlegroups.com, cfen...@watson-wilson.ca
As a thought-exercise, let's turn this the other way up.

Suppose, for a moment, that nothing like "cf_promises_validated" yet existed in CFEngine.

Suppose it were our task to invent it, ex nihilo, from a clean sheet.

What, from our perspective, would its purpose be? Decide. Then, before doing anything else, provide supporting documentation for our proposal.

What might other people in other places want from it? How might we generalise it to include such potential use? Then, likewise, document it.

Where should its implementation be?  Should it be within the C codebase or as cfengine policy?

Now...

...back to life, back to reality.

From my run-ins with cpv, it seems to have fallen short in these regards.  Authoritative documentation seems non-existent.

Its implementation goes against CFEngine's own design philosophy.  CFEngine wants us to put all system config under its control, to be specified as policy.   CFEngine even asks us to consider migrating stuff that would normally have gone into cron, to be specified as policy.   All this aims to provide the end-user with great flexibility and control.  All good.

Yet this "policy" of cpv seems to break CFE's own rules and instead go into the core code, lacking all flexibility.

If, in our thought-exercise of absent cpv, a user were to propose the current implementation and documentation, would it be accepted?

-- David Lee

Mark Burgess

Oct 7, 2013, 5:16:14 AM
to help-c...@googlegroups.com

Hi David,

the original idea behind the promises_validated marker was to avoid the need to perform a lengthy, server-intensive search for changed policy files in a large policy file tree.

Suppose you are an organization with thousands of hosts, and possibly hundreds of policy files. If you checked every file for every client on the server every five minutes, that would be computationally very expensive, as each check would require a contentious server-side search. The result is a scaling bottleneck.

The idea of the validation file was as a server-side certification that there was something worth searching for. By having a single file with known location and name, the search is reduced to a trivial time-stamp "stat", which is hundreds of times cheaper. That scales easily to thousands of hosts every five minutes. Only if the validation "certificate" has changed does the agent bother to perform an update. This can only work if a certain discipline is maintained, of course.

  https://cfengine.com/archive/manuals/st-scale#Scalable-policy-strategy

This mechanism is what allows CFEngine to roll out changes in under five minutes on average in a massive environment without trying to "push". Does this make sense?
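
For illustration only, here is a minimal Python sketch of the marker idea described in this message. It is not CFEngine's actual implementation (which lives in the C code); the function names `marker_changed`, `full_tree_digests` and `check_for_update` are hypothetical, and the marker is simply assumed to be a file inside the policy tree. The point is that the common case costs one `stat` call, and the expensive walk over the tree only happens when the marker's timestamp has moved.

```python
import hashlib
import os


def marker_changed(marker_path, last_seen_mtime):
    """Cheap check: a single stat() on one well-known marker file."""
    current = os.stat(marker_path).st_mtime_ns
    return current != last_seen_mtime, current


def full_tree_digests(root):
    """Expensive check: walk the whole tree and hash every file."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def check_for_update(root, marker_name, last_seen_mtime):
    """Only pay for the full scan when the marker's timestamp moved."""
    marker_path = os.path.join(root, marker_name)
    changed, current = marker_changed(marker_path, last_seen_mtime)
    if not changed:
        return None, current  # nothing worth searching for
    return full_tree_digests(root), current
```

A caller would keep the returned timestamp between runs and pass it back in as `last_seen_mtime`; a result of `None` means the tree was not scanned at all.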

In the future, this file could actually contain a list of files that differ. Somehow, this never moved forward.

M

Neil Watson

Oct 7, 2013, 9:23:32 AM
to help-c...@googlegroups.com
On Fri, Oct 04, 2013 at 08:58:12PM +0200, Loïc Pefferkorn wrote:
>Any update on this ?
>
>Moreover, a visible mechanism through policies will also help
>newcomers to CFEngine to understand the whole synchronization
>process.

Loic,

I came up with this. It is not well tested.

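body common control
{
      bundlesequence => { "main" };
}

# NOTE: the bodies detect_all_change, if_repaired and recurse are not
# shown in this post; they are assumed to match the definitions in the
# CFEngine standard library.

bundle agent main
{
  vars:
      # Patterns for the input files whose changes trigger revalidation.
      "inputs" slist => { ".*?\.txt", ".*?\.cf" };

  files:
      # Watch every matching file under inputs for content changes.
      "${sys.workdir}/inputs"
        file_select  => by_name( "@{inputs}" ),
        changes      => detect_all_change,
        classes      => if_repaired( "validate_inputs" ),
        depth_search => recurse( "inf" );

      # Rewrite the marker file with the current date; this promise
      # depends on the cf-promises command promise below.
      "${sys.workdir}/inputs/cpv"
        edit_defaults => empty,
        create        => 'true',
        depends_on    => { "main_commands_cf_promises" },
        edit_line     => append_if_no_line( "${sys.date}" );

  commands:
    validate_inputs::
      # Validate the policy only when a watched file changed.
      "${sys.cf_promises} -c"
        handle => "main_commands_cf_promises";
}

bundle edit_line append_if_no_line(str)
{
  insert_lines:

      "$(str)"
        comment => "Append a line to the file if it doesn't already exist";
}

body edit_defaults empty
{
      empty_file_before_editing => "true";
      edit_backup               => "false";
}

body file_select by_name(names)
{
      leaf_name   => { @(names) };
      # The original post was cut off after leaf_name; file_result and
      # the closing brace follow the standard library's by_name body.
      file_result => "leaf_name";
}
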

newatson@ltipc682:~/.cfagent/inputs$ cf-agent -IKf ./cpv.cf
2013-10-07T09:04:07-0400 error: Hash 'md5' for '/home/newatson/.cfagent/inputs/cpv.cf' changed!
2013-10-07T09:04:07-0400 error: /main/files/'${sys.workdir}/inputs': Updating hash for '/home/newatson/.cfagent/inputs/cpv.cf' to 'MD5=9b 5650c58da8bbe9d8041083a3544c8'
2013-10-07T09:04:07-0400 error: Hash 'sha1' for '/home/newatson/.cfagent/inputs/cpv.cf' changed!
2013-10-07T09:04:07-0400 error: /main/files/'${sys.workdir}/inputs': Updating hash for '/home/newatson/.cfagent/inputs/cpv.cf' to 'SHA=99 1f3c86756c008f0854daf84fbc7697b7214b4'
2013-10-07T09:04:07-0400 info: Executing 'no timeout' ... '"/home/newatson/.cfagent/bin/cf-promises" -c'
2013-10-07T09:04:11-0400 info: Completed execution of '"/home/newatson/.cfagent/bin/cf-promises" -c'
2013-10-07T09:04:11-0400 info: /main/files/'${sys.workdir}/inputs/cpv': Edit file '/home/newatson/.cfagent/inputs/cpv'

newatson@ltipc682:~/.cfagent/inputs$ cf-agent -IKf ./cpv.cf

newatson@ltipc682:~/.cfagent/inputs$ cat cpv
Mon Oct 7 09:04:06 2013

Ted Zlatanov

Oct 7, 2013, 4:29:12 PM
to help-c...@googlegroups.com
On Mon, 7 Oct 2013 02:04:49 -0700 (PDT) David Lee <davi...@ecmwf.int> wrote:

DL> As a thought-exercise, let's turn this the other way up.
DL> Suppose, for a moment, that nothing like "cf_promises_validated" yet
DL> existed in CFEngine.

DL> Suppose it were our task to invent it, ex nihilo, from a clean sheet.

I'd make it something entirely contained in cf-serverd.

That daemon is constantly running so it could tell (using
fam/inotify/whatever if the platform supports it) if files and
directories have changed without scanning the whole tree; that makes a
big difference in some situations.

cf-serverd also could run `cf-promises' to generate the validation
knowledge, and could respond to SIGHUP or whatever to recheck the
policies. So cf_promises_validated would be just a string you read from
the server.

Another orthogonal direction is that the server could broadcast "hello,
the latest checksum/commit ID/timestamp is X" on a well-known channel
(perhaps through Avahi). Then the agents can choose to update if they
don't match X.

Ted

Brian Bennett

Oct 7, 2013, 4:47:46 PM
to help-c...@googlegroups.com, help-c...@googlegroups.com
I like it being in cf-serverd.

I don't like the broadcast (multicast), especially via avahi. I have systems that are not on the same subnet as the hub so this wouldn't work at all for me.

--
Brian

Ted Zlatanov

Oct 7, 2013, 5:00:25 PM
to help-c...@googlegroups.com
On Mon, 7 Oct 2013 13:47:46 -0700 Brian Bennett <bah...@digitalelf.net> wrote:

BB> I like it being in cf-serverd.
BB> I don't like the broadcast (multicast), especially via avahi. I have systems that are not on the same subnet as the hub so this wouldn't work at all for me.

It's all voluntary; the broadcast avoids having thousands of clients
hitting the server.

Multicast can be efficiently retransmitted across subnets; most of
today's routers support the relevant protocols. But I certainly agree
it should not be the only update mechanism.

Ted

Mike Svoboda

Oct 7, 2013, 5:41:43 PM
to help-c...@googlegroups.com
Why not just offer a single object via cf-serverd that contains:

Absolute path : md5sum

Of every file under /var/cfengine/masterfiles? There's some thread within
cf-serverd that constantly performs a recursive search under
/var/cfengine/masterfiles and updates this map (configurable).

The server and client transfer this map (single file/object transfer).
The client looks at the md5sum of the map, compares it to the md5sum of
the file on local disk, and determines which files it needs to update.
The client performs a recursive search to figure out what it has a md5sum
for on disk, on every transfer.

The client and server then transfer a minimal amount of files -- those
objects containing a different md5sums from what cf-serverd had in the map
compared to what the client had on disk.

How is this different than what happens now? The server and client don't
perform a md5 comparison of every object on every client connection. The
server just has to keep an updated map and offer this single file out to
clients. The server only performs file transfers for a minimal amount of
objects that the client requests when the client detects a md5sum that's
different from the map that the server offered.
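
As a rough illustration of the map comparison proposed above, here is a small Python sketch. It is not part of cf-serverd; `build_map` and `paths_to_fetch` are hypothetical names, and file contents are held in in-memory dictionaries purely to keep the example short. The server publishes one `path : md5sum` map, the client builds the same kind of map for what it has on disk, and only the differing paths are requested.

```python
import hashlib


def build_map(files):
    """Map each path to the md5 hex digest of its content (bytes)."""
    return {path: hashlib.md5(data).hexdigest() for path, data in files.items()}


def paths_to_fetch(server_map, client_map):
    """Paths whose digest is missing or different on the client."""
    return sorted(
        path
        for path, digest in server_map.items()
        if client_map.get(path) != digest
    )


# Hypothetical example data: one file changed, one file is new.
server_files = {
    "inputs/promises.cf": b"new policy",
    "inputs/update.cf": b"unchanged",
    "modules/report": b"brand new",
}
client_files = {
    "inputs/promises.cf": b"old policy",
    "inputs/update.cf": b"unchanged",
}

needed = paths_to_fetch(build_map(server_files), build_map(client_files))
```

With this data `needed` holds the changed file and the new file, while `inputs/update.cf` is never transferred. Files that exist only on the client are ignored here; whether they should be purged is a separate policy decision.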

Neil Watson

Oct 7, 2013, 6:14:33 PM
to help-c...@googlegroups.com
On Mon, Oct 07, 2013 at 09:41:43PM +0000, Mike Svoboda wrote:
>Why not just offer a single object via cf-serverd that contains:
>
>Absolute path : md5sum
>
>Of every file under /var/cfengine/masterfiles? There's some thread within
>cf-serverd that constantly performs a recursive search under
>/var/cfengine/masterfiles and updates this map (configurable).

/var/cfengine/masterfiles should not be hard coded. Often users serve
inputs from multiple masterfiles. A single server may serve out multiple
git branches of masterfiles, e.g.
/var/cfengine/repositories/dev
/var/cfengine/repositories/qa
/var/cfengine/repositories/prod

Ideally you want to be able to validate any and all directories.

Ted Zlatanov

Oct 8, 2013, 10:05:43 AM
to help-c...@googlegroups.com
On Mon, 7 Oct 2013 21:41:43 +0000 Mike Svoboda <msvo...@linkedin.com> wrote:

MS> Why not just offer a single object via cf-serverd that contains:
MS> Absolute path : md5sum

MS> Of every file under /var/cfengine/masterfiles? There's some thread within
MS> cf-serverd that constantly performs a recursive search under
MS> /var/cfengine/masterfiles and updates this map (configurable).

MS> The server and client transfer this map (single file/object transfer).
MS> The client looks at the md5sum of the map, compares it to the md5sum of
MS> the file on local disk, and determines which files it needs to update.
MS> The client performs a recursive search to figure out what it has a md5sum
MS> for on disk, on every transfer.

MS> The client and server then transfer a minimal amount of files -- those
MS> objects containing a different md5sums from what cf-serverd had in the map
MS> compared to what the client had on disk.

MS> How is this different than what happens now? The server and client don't
MS> perform a md5 comparison of every object on every client connection. The
MS> server just has to keep an updated map and offer this single file out to
MS> clients. The server only performs file transfers for a minimal amount of
MS> objects that the client requests when the client detects a md5sum that's
MS> different from the map that the server offered.

I don't see a reason to reinvent rsync, which already has very efficient
differential transfers and checksumming (including --size-only to
explicitly turn off checksums).

My idea has a single in-memory variable, so the server can respond
quickly to clients. That's consistent with the current
cf_promises_validated mechanism, which requires no checksums on the
client side to find out if an update needs to be pulled.

The goal IMO should be to accomplish the client-server update check in
one TCP packet, ideally under 700 bytes or so. It's even better if it's
done over UDP (multicast ideally), but as soon as you start growing that
initial update check scope you're moving towards the rsync model, which
doesn't scale linearly.
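
To make the one-packet budget concrete, here is a hedged Python sketch. The wire format (an 8-byte timestamp, a 2-byte length, then the release ID) is invented for this example and is not a CFEngine protocol; `encode_release`, `decode_release` and `needs_update` are hypothetical names. It only shows that a commit ID plus a timestamp fits comfortably within the roughly 700 bytes mentioned above.

```python
import struct

MAX_PAYLOAD = 700          # size budget taken from the message above
HEADER = "!QH"             # assumed layout: timestamp (u64), id length (u16)


def encode_release(release_id, timestamp):
    """Pack a release announcement into one small payload."""
    raw = release_id.encode("utf-8")
    payload = struct.pack(HEADER, timestamp, len(raw)) + raw
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("release announcement exceeds the one-packet budget")
    return payload


def decode_release(payload):
    """Return (release_id, timestamp) from a payload built by encode_release."""
    timestamp, length = struct.unpack_from(HEADER, payload, 0)
    offset = struct.calcsize(HEADER)
    return payload[offset:offset + length].decode("utf-8"), timestamp


def needs_update(payload, local_release_id):
    """The client pulls an update only if the announced ID differs from its own."""
    release_id, _timestamp = decode_release(payload)
    return release_id != local_release_id
```

A 40-character commit hash produces a 50-byte payload under this layout, so the check leaves plenty of headroom whether it travels over TCP or UDP.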

Ted

Ted Zlatanov

Oct 8, 2013, 10:11:11 AM
to help-c...@googlegroups.com
On Mon, 7 Oct 2013 18:14:33 -0400 Neil Watson <cfen...@watson-wilson.ca> wrote:

NW> /var/cfengine/masterfiles should not be hard coded. Often users serve
NW> inputs from multiple masterfiles. Single server may serve out multiple
NW> git branches of masterfiles. e.g.
NW> /var/cfengine/repositories/dev
NW> /var/cfengine/repositories/qa
NW> /var/cfengine/repositories/prod

NW> Ideally you want to be able to validate any and all directories.

We're moving towards `$(sys.masterdir)' instead of a hard-coded location
in 3.6 but ultimately this has to be specified by the user in `update.cf'.

Ted

Mike Svoboda

Oct 8, 2013, 10:14:51 AM
to help-c...@googlegroups.com

So, instead of cf_promises_validated which is file based, you want
something memory based? What problem does this solve that
cf_promises_validated doesn't already provide?

With cf_promises_validated and how it's currently designed, you assume that
this file gets updated when cf-promises detects policy updates. This
isn't the case. We have several examples where our policy servers grab
data from external sources via wget or rsync, drop it into
/var/cfengine/masterfiles, and assume clients are going to pick up that
data as soon as it's available. cf-promises isn't going to detect that
something has changed.

Also, once your in-memory variable is multicasted out or whatever -- then
clients have to perform a full recursive MD5 comparison with cf-serverd?
Yes, maybe offering a small payload to tell clients "Hey! I have new
things!" is more efficient, but then clients have to perform a full
recursive MD5 sum crawl with cf-serverd!


If you move towards the model of maintaining a map of md5 objects, clients
can selectively say -- cf-serverd, I'm grabbing file object 12345. That's
all I need. I don't need to have you perform an md5sum comparison of the
other 10,000 objects under $(masterfiles).

Yes, cf-serverd has to offer the md5sum map out to every client at request
time, but it's actually far less network overhead when an update needs to
happen. It just has to offer one object; it doesn't have to satisfy a
full filesystem scan and the network overheads associated with that.






Mike Svoboda

Oct 8, 2013, 10:03:37 AM
to Neil Watson, help-c...@googlegroups.com
Yup, just trying to communicate an example.

Although the cf-serverd would build a complete md5sum map of
$(masterfiles), wherever that is -- it should also be able to just offer
the client a subset of the map.

I.e. if the client wants to perform four file transfers from cf-serverd
of:

$(masterfiles)/inputs
$(masterfiles)/modules
$(masterfiles)/site-XYZ
$(masterfiles)/site-QRS


Each file transfer should request a different chunk of the map with a
configurable depth_search. Cf-serverd understands that when the client
requests $(masterfiles)/inputs with depth_search("inf"), cf-serverd
doesn't need to give clients md5sums of objects under the site-XYZ
directory.
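
A small Python sketch of serving only a chunk of the map, under stated assumptions: `subset` is a hypothetical helper, paths use `/` separators, and `max_depth=None` stands in for a depth_search of "inf". The depth counting here (0 means files directly inside the requested directory) is this example's own convention, not necessarily how CFEngine counts depth.

```python
def subset(digest_map, directory, max_depth=None):
    """Entries under `directory`, optionally limited to `max_depth` levels below it."""
    prefix = directory.rstrip("/") + "/"
    selected = {}
    for path, digest in digest_map.items():
        if not path.startswith(prefix):
            continue  # belongs to another directory, e.g. site-XYZ
        depth = path[len(prefix):].count("/")  # 0 = directly inside directory
        if max_depth is None or depth <= max_depth:
            selected[path] = digest
    return selected


# Hypothetical map as the server might hold it (digests shortened).
full_map = {
    "masterfiles/inputs/promises.cf": "a1",
    "masterfiles/inputs/lib/stdlib.cf": "b2",
    "masterfiles/site-XYZ/site.cf": "c3",
}

inputs_only = subset(full_map, "masterfiles/inputs")
```

Here `inputs_only` contains the two entries under `masterfiles/inputs` and nothing from `site-XYZ`, which is the behavior the paragraph above asks of cf-serverd.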


How is this approach different than rsync? Well, rsync is designed as a
two-machine network transfer. Machine A requests objects from machine B, and
they both compute the md5sum map of what they have on disk at request
time. The md5sum hashtable that cf-serverd would offer here would always
be kept up to date, and would be offered to thousands of clients immediately at
client request time. It's pre-built. cf-serverd always maintains an
updated md5sum hashtable that it offers to all machines. Clients
determine what objects they want to pull from the server depending on the
md5sums of the objects they have on disk.

Mike Svoboda

Oct 8, 2013, 11:21:33 AM
to help-c...@googlegroups.com
So, here's an idea, and I think you're onto the right idea of having a
two-stage file transfer.

* cf-serverd continuously performs a filesystem scan under
  $(sys.masterfiles) to update its hashtable.
* If it detects an md5sum update for an object, it updates your in-memory
  variable (uuid?) saying something new is available.

I personally would avoid multicast. Several shops implement
firewalls that will block this traffic. Also, I'm not sure what the code
complexity is here on implementing multicast traffic, but it may not be
trivial.

When a client looks to perform an update, this could be the workflow:

1.  cf-agent wakes up, sees that the previous file transfer with a policy
    server had uuid 12345.
2.  Query cf-serverd for the current uuid. cf-serverd offers uuid 45678. <--
    agent realizes that something changed in $(sys.masterfiles) on the policy
    server.
3.  cf-agent downloads the hashtable map from cf-serverd for
    $(sys.masterfiles)/inputs.
4a. cf-agent performs a recursive search on $(sys.inputs), compares
    with the hashtable it grabbed from cf-serverd, and determines that file
    XYZ needs updating.
4b. cf-agent performs a recursive search on $(sys.inputs), compares
    with the hashtable, but realizes nothing has changed. Maybe
    $(sys.masterfiles)/modules had an updated file?
5.  If 4a, cf-agent requests file XYZ from cf-serverd.
6.  cf-agent saves uuid 45678 as the last known good state of cf-serverd.
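
The workflow above can be sketched in Python as a toy simulation. This is only an illustration of the proposal, not existing cf-serverd or cf-agent behavior: the `PolicyServer` and `Agent` classes, their attributes, and the use of in-memory dictionaries instead of real files and network calls are all assumptions made for the example.

```python
import hashlib
import uuid


def digest_table(files):
    """Map each path to the md5 hex digest of its content."""
    return {path: hashlib.md5(data).hexdigest() for path, data in files.items()}


class PolicyServer:
    """Stand-in for cf-serverd: holds files, a digest table and a change id."""

    def __init__(self, files):
        self.files = dict(files)
        self.table = None
        self.change_id = None
        self._refresh()

    def _refresh(self):
        table = digest_table(self.files)
        if table != self.table:          # only issue a new id on a real change
            self.table = table
            self.change_id = uuid.uuid4().hex

    def publish(self, path, data):
        self.files[path] = data
        self._refresh()


class Agent:
    """Stand-in for cf-agent running the two-stage update."""

    def __init__(self):
        self.files = {}
        self.last_change_id = None

    def update(self, server):
        """Return the list of paths fetched during this run."""
        if server.change_id == self.last_change_id:   # steps 1-2: cheap id check
            return []
        table = dict(server.table)                    # step 3: fetch the table
        local = digest_table(self.files)              # step 4: hash local files
        stale = sorted(p for p, d in table.items() if local.get(p) != d)
        for path in stale:                            # step 5: fetch stale files
            self.files[path] = server.files[path]
        self.last_change_id = server.change_id        # step 6: remember the id
        return stale
```

In this sketch a run against an unchanged server costs a single id comparison, and a run after a change transfers the table plus only the files whose digests differ.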


It's funny you mentioned bittorrent. Facebook also uses bittorrent to push
code. It does have value if you need to push a massively large amount of
data (gigabytes) to several thousand machines. We are actually
evaluating using bittorrent as well in a new project to distribute data to
end nodes.

For the most part, cf-serverd doesn't serve large objects. It just has
thousands of small policy files, configuration files, and other tiny
objects that it has to move around. Unless you need to move a massive
amount of data, I don't know if this has much value.





On 10/8/13 11:04 AM, "Ted Zlatanov" <t...@lifelogs.com> wrote:

>On Tue, 8 Oct 2013 14:14:51 +0000 Mike Svoboda <msvo...@linkedin.com> wrote:
>
>MS> So, instead of cf_promises_validated which is file based, you want
>MS> something memory based? What problem does this solve that
>MS> cf_promises_validated doesn't already provide?
>
>It scales better than transferring a disk file and can be updated in
>memory by another thread without locking a disk file.
>
>You can also have multiple "client profiles" and have a separate
>timestamp for each one.
>
>Finally, I don't like depending on magic files, personally.
>
>MS> With cf_promises_validated and how its currently designed, you assume
>that
>MS> this file gets updated when cf-promises detects policy updates. This
>MS> isn't the case. We have several examples where our policy servers
>grab
>MS> data from external sources via WGET or RSYNC, drop it into
>MS> /var/cfengine/masterfiles, and assume clients are going to pick up
>that
>MS> data as soon as its available. Cf-promises isn't going to detect that
>MS> something has changed.
>
>Right; my idea is to have cf-serverd see filesystem notifications.
>Running cf-promises is a separate task and could be done only for some
>"client profile" as, essentially, a pre-commit hook.
>
>MS> Also, once your in-memory variable is multicasted out or whatever --
>then
>MS> clients have to perform a full recursive MD5 comparison with
>cf-serverd?
>MS> Yes, maybe offering a small payload to tell clients "Hey! I have new
>MS> things!" is more efficient, but then clients have to perform a full
>MS> recursive MD5 sum crawl with cf-serverd!
>
>The secondary transfer can use rsync, a differential map as you
>described, or the current protocol. I'm talking about the primary
>transfer, the one where 100K clients are hitting you to check for
>updates. That code path should be optimized first and be most
>scalable. My claim is that if you do the primary transfer in one
>packet, that lets the server breathe a bit on the secondary transfers.
>
>MS> If you move towards the model of maintaining a map of md5 objects,
>clients
>MS> can selectively say -- cf-serverd, I'm grabbing file object 12345.
>That's
>MS> all I need. I don't need to have you perform a md5sum comparison of
>the
>MS> other 10,000 objects under $(masterfiles).
>
>MS> Yes, cf-serverd has to offer the md5sum map out to every client at
>request
>MS> time -- its actually far less network overhead when an update needs to
>MS> happen. It just has to offer one object, it doesn't have to satisfy
>a
>MS> full filesystem scan and the network overheads associated with that.
>
>I think that's a fine way to do the secondary transfer, but don't know
>if it's better than pure rsyncd or the current mechanism; I'd test it.
>An integrated solution is probably simplest and keeping track of a
>secondary file index is probably a bit of extra work, but I really have
>no strong opinion here.
>
>Another direction for cf-serverd is to use something like
>https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bi
>ttorrent
>
>A truly distributed strategy would work very well in geographically
>distributed environments, but you can't predict the load on individual
>clients as well.
>
>Ted

Neil Watson

Oct 8, 2013, 11:26:34 AM10/8/13
to help-c...@googlegroups.com
On Tue, Oct 08, 2013 at 03:21:33PM +0000, Mike Svoboda wrote:
>So, here's an idea, and I think you're onto the right idea of having a
>two-stage file transfer.
>
>* cf-serverd continuously performs a filesystem scan under
>$(sys.masterfiles) to update its hashtable. If it detects a md5sum

Should be @{sys.masterfiles} because some servers serve masterfiles from
multiple locations.

Ted Zlatanov

Oct 8, 2013, 11:04:21 AM10/8/13
to help-c...@googlegroups.com
MS> If you move towards the model of maintaining a map of md5 objects, clients
MS> can selectively say -- cf-serverd, I'm grabbing file object 12345. That's
MS> all I need. I don't need to have you perform a md5sum comparison of the

Ted Zlatanov

Oct 8, 2013, 12:03:10 PM10/8/13
to help-c...@googlegroups.com
On Tue, 8 Oct 2013 15:21:33 +0000 Mike Svoboda <msvo...@linkedin.com> wrote:

MS> So, here's an idea, and I think you're onto the right idea of having a
MS> two-stage file transfer.

(Note the credit for that design goes to others; I am just suggesting
some incremental improvements like caching the UUID in memory.)

MS> * cf-serverd continuously performs a filesystem scan under
MS> $(sys.masterfiles) to update its hashtable. If it detects a md5sum
MS> update for an object it:
MS> * Updates your in-memory variable (uuid?) saying something new is
MS> available. I personally would avoid multicast. Several shops implement
MS> firewalls that will block this traffic. Also, I'm not sure what the code
MS> complexity is here on implementing multicast traffic, but it may not be
MS> trivial.

My idea with Avahi was to hook into existing frameworks, like we already
do for hub discovery at bootstrap time. Bare multicast works just as
well and is easy to program, but any transport (SNMP, TCP, pure ICMP,
NNTP, snail mail) would work.

MS> When a client looks to perform a update, this could be the workflow:

MS> 1. Cf-agent wakes up, sees that previous file transfer with a policy
MS> server had uuid 12345
MS> 2. Query cf-serverd for current uuid. Cf-serverd offers uuid 45678. <--
MS> agent realizes that something changed on $(sys.masterfiles) on the policy
MS> server.
MS> 3. Cf-agent downloads hashtable map from cf-serverd for
MS> $(sys.masterfiles)/inputs
MS> 4a. Cf-agent performs a recursive search on $(sys.inputs), compares
MS> with the hashtable it grabbed from cf-serverd, and determines that file
MS> XYZ needs updating.
MS> 4b. Cf-agent performs a recursive search on $(sys.inputs), compares
MS> with the hashtable, but realizes nothing has changed. Maybe
MS> $(sys.masterfiles)/modules had an updated file?
MS> if 4a then, 5. Cf-agent requests file X from cf-serverd
MS> 6. Cf-agent saves uuid 45678 as last known good state of cf-serverd.

Yes, that all makes sense. It seems to be just an optimization of the
current method, using a file index, so it's not very objectionable.

You're missing the other thread that monitors the filesystem. It needs
to run a pre-commit verification hook on the policies before updating
the UUID.
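
A minimal sketch of that idea, assuming the watcher thread hands each detected change to a small object that publishes a new id only when a verification callback accepts it. `PolicyState`, `on_change`, and the `validate` callback are hypothetical names; in practice the callback might run cf-promises, but that part is left out here.

```python
import threading


class PolicyState:
    """Holds the id served to clients; updated only after validation passes."""

    def __init__(self, validate):
        self._validate = validate      # callable(snapshot) -> bool, supplied by the caller
        self._lock = threading.Lock()
        self._state_id = None

    def on_change(self, snapshot, new_id):
        """Called by the filesystem watcher; returns True if the id was published."""
        if not self._validate(snapshot):
            return False               # keep serving the last known good id
        with self._lock:
            self._state_id = new_id
        return True

    def current(self):
        """Called by the thread answering client queries."""
        with self._lock:
            return self._state_id
```

Clients keep seeing the previous id until a changed policy set passes the check, so a broken edit on the server never triggers a download.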

Ted

Mike Svoboda

Oct 8, 2013, 12:13:35 PM10/8/13
to help-c...@googlegroups.com
> You're missing the other thread that monitors the filesystem. It needs
> to run a pre-commit verification hook on the policies before updating
> the UUID.



I would leave that up to the git / svn repo as a pre-commit hook before
allowing the production change to happen. You could implement this, but
it's probably unnecessary -- or at least make it a configurable option. A
production commit should never be allowed with a syntax error. Invalid
bits shouldn't ever make their way onto your policy server to begin with.

Also, with different versions of cf-agent (3.5 vs 3.4.4), a syntax error
on your client may not be a syntax error on your policy server. So, your
policy server running 3.4 would allow a policy to be grabbed by your 3.5
client, thereby throwing syntax errors.

Ted Zlatanov

Oct 8, 2013, 3:10:56 PM10/8/13
to help-c...@googlegroups.com
On Tue, 8 Oct 2013 16:13:35 +0000 Mike Svoboda <msvo...@linkedin.com> wrote:

>> You're missing the other thread that monitors the filesystem. It needs
>> to run a pre-commit verification hook on the policies before updating
>> the UUID.

MS> I would leave that up to the git / svn repo as a pre-commit hook before
MS> allowing the production change to happen. You could implement this, but
MS> its probably unnecessary -- or at least make it a configurable option. A
MS> production commit should never be allowed with a syntax error. Invalid
MS> bits shouldn't ever make their way on your policy server to begin with.

Yes, it would be configurable (in this theoretical redesign) per client
profile.

MS> Also, with different versions of cf-agent (3.5 vs 3.4.4), a syntax error
MS> on your client may not be a syntax error on your policy server. So, your
MS> policy server running 3.4 would allow a policy to be grabbed by your 3.5
MS> client, thereby throwing syntax errors.

Of course. Assume we care about this situation :)

Ted

Laurent Raufaste

Oct 9, 2013, 12:55:37 AM10/9/13
to
I just read the thread so sorry if I'm missing something obvious, but I think MS has a good solution.
It could be even simpler and more generic by using a text file with a format like:

/var/cfengine/masterfiles/ 305bfb753c979dc6a03adbbc12c6f268
/var/cfengine/masterfiles/base_files.cf 45cc3b2bb2ae8f7a830d6be50c1025d6
/var/cfengine/masterfiles/controls/ be7515b2034c3cc4af0649f80f68b580
/var/cfengine/masterfiles/cf_agent.cf 3b31eb2667b3bf2e653b68f57f6c1374
[...]
/var/cfengine/masterfiles/update.cf 961cc288f7f4aac03d6a324ed6e9ba6c

A path ending with a / is a folder, and the hash of a folder is the md5sum of the hashes of all the files and folders included in it.
This way, the first line is enough to compare the content of a whole masterfiles folder, and this hash is the only one that cf-agent and cf-serverd exchange at first.
Like MS said, if the hash is different, or if cf-agent does not have a hash, they compare the list to know what's different.
And the comparison can be optimized by looking at folders first (paths ending with a /) and looking for files only in folders with different hashes.
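
A rough sketch of that folder-first comparison, assuming both sides have already loaded the text file into a dict of path -> hash. `parent_of` and `changed_files` are hypothetical helper names, the hashes in the usage below are placeholders, and entries that exist only on the local side (deletions) are ignored.

```python
def parent_of(path):
    """Return the enclosing folder entry, e.g. '/a/b/' for '/a/b/c' or '/a/b/c/'."""
    trimmed = path.rstrip("/")
    return trimmed[: trimmed.rfind("/") + 1]


def changed_files(local, remote, root):
    """List files under root whose hash differs, skipping folders whose hash matches."""
    if local.get(root) == remote.get(root):
        return []                           # identical subtree, no need to look inside
    changed = []
    for path, digest in sorted(remote.items()):
        if parent_of(path) != root:
            continue                        # only direct children of this folder
        if path.endswith("/"):
            changed += changed_files(local, remote, path)   # descend only if it differs
        elif local.get(path) != digest:
            changed.append(path)
    return changed
```

For example, with a remote list where only `/m/x/` and the root have new hashes, `changed_files(local, remote, "/m/")` never inspects the files under `/m/y/`.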

This is close to what you guys were talking about (in-memory UUID), but this way there is no special case: it's all the same behavior, the file format is explicit and consistent (and can be used for other things), and cf-agent and cf-serverd first compare the first line (small data), then the rest as needed.

On the server side, this should only be updated every X min, like now, and an update can be forced manually when you want a new policy distributed fast, like now too.

Also, IMHO, using multicast, avahi or being able to change the location of /var/cfengine/masterfiles are off-topic here, no offense ;)

Loïc Pefferkorn

Oct 9, 2013, 1:00:46 AM10/9/13
to help-c...@googlegroups.com
On 07/10/2013 15:23, Neil Watson wrote:
> On Fri, Oct 04, 2013 at 08:58:12PM +0200, Loïc Pefferkorn wrote:
>> Any update on this ?
>>
>> Moreover, a visible mechanism through policies will also help
>> newcomers to CFEngine to understand the whole synchronization process.
>
> Loic,
>
> I'm came up with this. It is not well testing.
>

Thank you Neil, I will try it.

Cheers,
Loic

Mike Svoboda

Oct 9, 2013, 4:39:37 PM10/9/13
to Laurent Raufaste, help-c...@googlegroups.com

Maybe instead of doing a md5sum of a folder, why not do a md5sum of the entire hashtable and use that as the policy server's "uuid"? I think that might get at the point you're trying to drive, and it's a good one. If you're going to use some object to communicate state change (or that state hasn't changed), it might as well have value. An example of this is below.

Using just a md5sum of the masterfiles directory itself would only detect changes if your directory structure is flat. If you have subdirectories and make a modification there, the masterfiles directory itself wouldn't have a different md5sum.

The hashtable itself would update with the new file object, so the md5sum of the hashtable object would change.


You could then use this to  very easily tie into clustering.   If you had 3+ policy servers, the policy servers could communicate with each other and determine that they all have the same hashtable md5sum — meaning — their contents of $(sys.masterfiles) is identical across all of your policy servers.   Your clients could contact any of the policy servers and you could guarantee that they would be grabbing the same data, regardless of which node they hit.


An example:

Policy server 1 -- hashtable md5sum 12345
Policy server 2 -- hashtable md5sum 12345
Policy server 3 -- hashtable md5sum 234567


Policy server 1 and 2 are in quorum. With matching hashtable md5sums, it's guaranteed that they are serving the same masterfiles dataset. Policy server 1 and 2 announce that policy server 3 is out of quorum and needs to update its working set of masterfiles. If policy server 3, after update (svn update / git checkout), is unable to achieve quorum with policy server 1/2, then cf-serverd is shut down on policy server 3 so invalid data isn't served to clients.
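
The quorum check in the example could look something like this. It is a sketch only: `out_of_quorum` is a hypothetical name, and "quorum" is assumed here to mean a strict majority of servers reporting the same hashtable md5sum, which the example above implies but does not spell out.

```python
from collections import Counter


def out_of_quorum(digests):
    """digests: server name -> hashtable md5sum. Return servers outside the majority."""
    if not digests:
        raise ValueError("no policy servers reported a digest")
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no majority digest, cannot decide who is stale")
    return sorted(name for name, digest in digests.items() if digest != winner)
```

With the three servers above, `out_of_quorum({"ps1": "12345", "ps2": "12345", "ps3": "234567"})` returns `["ps3"]`, which is the node that would need to update or stop serving.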






From: Laurent Raufaste <anal...@gmail.com>
Date: Wednesday, October 9, 2013 12:54 AM
To: "help-c...@googlegroups.com" <help-c...@googlegroups.com>
Subject: Re: [help-cfengine] Re: Proposal: abandon current cf_promises_validated mechanism

I just read the thread so sorry if I'm missing something obvious, but I think MS has a good solution.
It could even be more simpler/generic by using text file with a format like:

/var/cfengine/masterfiles/ 305bfb753c979dc6a03adbbc12c6f268
/var/cfengine/masterfiles/base_files.cf 305bfb753c979dc6a03adbbc12c6f268
/var/cfengine/masterfiles/controls/ 305bfb753c979dc6a03adbbc12c6f268
/var/cfengine/masterfiles/cf_agent.cf 305bfb753c979dc6a03adbbc12c6f268
[...]
/var/cfengine/masterfiles/update.cf

Laurent Raufaste

Oct 9, 2013, 4:54:40 PM10/9/13
to Mike Svoboda, help-c...@googlegroups.com
Yes, sorry if this was not clear.
The textfile format is:
/var/cfengine/masterfiles/ 305bfb753c979dc6a03adbbc12c6f268
/var/cfengine/masterfiles/base_files.cf 8fa14cdd754f91cc6554c9e71929cce7
/var/cfengine/masterfiles/controls/ 2d917f5d1275e96fd75e6352e26b1387
/var/cfengine/masterfiles/cf_agent.cf d4319fefc66c701f24c875afda6360d6
[...]
/var/cfengine/masterfiles/update.cf 09353387931db36c8af0e1b3658ddffe

"A file ending with a / is a folder, and the hash of a folder is the md5sum of all the hash of all the files/folder included in it."
So for a folder, the hash is not the md5sum of the folder or the files, but the md5sum of all the hash of files/folder under it in the text file.

In the case above, the hash of /var/cfengine/masterfiles/ would be the md5sum of /var/cfengine/masterfiles/base_files.cf down to /var/cfengine/masterfiles/update.cf in the text file, subfolders and subfiles included. It's very fast to compute (you need to find all the lines starting with the current folder, / included) and it means that you only need to md5 the files in masterfiles on the filesystem.
--
Laurent Raufaste
<http://www.glop.org/>

Laurent Raufaste

Oct 9, 2013, 5:20:07 PM10/9/13
to
Let me try with an explicit example:

/folder_1/ can be seen as the masterfiles folder
hash_X can be md5sum like d41d8cd98f00b204e9800998ecf8427e
# are comments to explain the line

Here's the file format

/folder_1/ hash_1 # = hash(hash_A + hash_B + hash_2 + hash_C + hash_D + hash_3 + hash_4 + hash_E + hash_5 + hash_F + hash_G) # Every hash below, as everything is under this folder
/folder_1/file_A hash_A # = hash(file_content(file_A))
/folder_1/file_B hash_B # = hash(file_content(file_B))
/folder_1/folder_2/ hash_2 # = hash(hash_C + hash_D + hash_3 + hash_4 + hash_E) # Everything under /folder_1/folder_2/
/folder_1/folder_2/file_C hash_C # = hash(file_content(file_C))
/folder_1/folder_2/file_D hash_D # = hash(file_content(file_D))
/folder_1/folder_2/folder_3/ hash_3 # = hash() # Nothing, empty folder, but this still has a hash
/folder_1/folder_2/folder_4/ hash_4 # = hash(hash_E) # file_E is the only file included in this folder
/folder_1/folder_2/folder_4/file_E hash_E # = hash(file_content(file_E))
/folder_1/folder_5/ hash_5 # = hash(hash_F + hash_G)
/folder_1/folder_5/file_F hash_F # = hash(file_content(file_F))
/folder_1/folder_5/file_G hash_G # = hash(file_content(file_G))

This way the same rule applies to all files/folders in the file, even the first one, and you can still exchange only the global hash (hash_1) to see if there is something different.
Or any hash below it if you are only interested in some subfolder or some file.

Folder hashes are a hash of the hashes of everything under them (files+folders)
File hashes are only the hash of their file content
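
Here is a small sketch of how such a list could be computed, assuming the hashes under a folder are concatenated in sorted path order (the format above does not fix an order, so that part is an assumption). `build_manifest` and `md5_hex` are hypothetical names, and the tree is passed in as plain dicts instead of being read from disk.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def build_manifest(files, folders):
    """files: path -> content bytes; folders: paths ending with '/'. Returns path -> hash."""
    # File hashes are only the hash of the file content.
    manifest = {path: md5_hex(body) for path, body in files.items()}
    # Deepest folders first, so a subfolder's hash exists before its parent needs it.
    for folder in sorted(folders, key=lambda f: f.count("/"), reverse=True):
        below = sorted(p for p in manifest if p.startswith(folder) and p != folder)
        # Folder hash = md5 of all hashes under it (files and subfolders), concatenated.
        manifest[folder] = md5_hex("".join(manifest[p] for p in below).encode())
    return manifest
```

An empty folder hashes the empty string, which gives d41d8cd98f00b204e9800998ecf8427e, and changing any file changes the hash of every folder above it, up to the first line.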

Does it make sense ?

Mike Svoboda

Oct 9, 2013, 5:21:56 PM10/9/13
to Laurent Raufaste, help-c...@googlegroups.com
> Does it make sense ?

Yup, I think we were trying to communicate the same idea.   A single md5sum hash to represent the entire state of $(sys.masterfiles)



From: Laurent Raufaste <anal...@gmail.com>
Date: Wednesday, October 9, 2013 5:16 PM
To: "help-c...@googlegroups.com" <help-c...@googlegroups.com>
Subject: Re: [help-cfengine] Re: Proposal: abandon current cf_promises_validated mechanism

Let me try with an explicit example:

/folder_1/ can be seen as the masterfiles folder
hash_X can be md5sum like d41d8cd98f00b204e9800998ecf8427e
# are comments to explain the line

Here's the file format

/folder_1/ hash_1 # = hash(hash_A + hash_B + hash_2 + hash_C + hash_D + hash_3 + hash_4 + hash_E + hash_5 + hash_F + hash_G) # Every hash below as everything is under this folder
/folder_1/file_A hash_A # = hash(file_content(file_A)
/folder_1/file_B hash_B # = hash(file_content(file_B)
/folder_1/folder_2/ hash_2 # = hash(hash_C + hash_D + hash_3 + hash_4 + hash_E) # Everything under /folder_1/folder_2/
/folder_1/folder_2/file_C hash_C # = hash(file_content(file_C)
/folder_1/folder_2/file_D hash_D # = hash(file_content(file_D)
/folder_1/folder_2/folder_3/ hash_3 # = hash() # Nothing, empty folder, but this still has a hash
/folder_1/folder_2/folder_4/ hash_4 # = hash(file_E) # file_E is the only file included in this folder
/folder_1/folder_2/folder_4/file_E hash_E #= hash(file_content(file_E)
/folder_1/folder_5/ hash_5 # hash_5 # = hash(hash_F+hash_G)
/folder_1/folder_5/file_F hash_F # = hash(file_content(file_F)
/folder_1/folder_5/file_G hash_G # = hash(file_content(file_G)

This way the same rule apply for all files/folders in the file, even the 1st one, and you can still exchange only the global hash (hash_1) to see if there is something different.
Or any has below if you are only interested in some subfolder or some file.
