Puppet RFC 23 - XPP Files


Eric Sorenson

Mar 30, 2016, 12:24:46 PM
to Puppet Developers
Hi, I've just posted a new Puppet RFC that describes pre-parsed and pre-validated Puppet files, akin to '.pyc' files for Python. It's called XPP and the doc is open for comments here:


Please comment inline on the doc, or come back to this thread if the conversation gets too involved (more than about 4-5 replies in a comment box on Google Docs becomes unwieldy).

Once the commenting tapers off we'll incorporate changes into the spec and post it as markdown in the puppet-rfc repo: https://github.com/puppetlabs/puppet-rfc

--eric0

jeremia...@seagate.com

Mar 30, 2016, 3:02:08 PM
to Puppet Developers
I write a lot of native types for our internal use at work.  (I tell people that if you are just using Exec in Puppet that ansible is one search away in Google. ) Some of the Puppet code used with these types would be very challenging to "pre-compile" in any way.

I think a lot of my questions are just because I don't consider a .pp file to usually be the unit of function in Puppet.  I only consider the resource, which is usually implemented in Ruby, and everything else is dressing.

Reading over this proposal raises questions about how it will fit into Puppet, a language and environment very different from Python, Java or other languages (Puppet 4 has long since left the domain-specific-language label by the wayside).

Since you have to build an AST on a node-by-node basis, I am having a hard time telling the value of this versus something like a marker that indicates the code has been parser-validated.  If your code doesn't need a node-by-node rebuild, then perhaps the compiler could cache the results instead?  I know the adage about the three hard problems in Computer Science still applies, but what this is doing is creating and managing a cache.

Pre-compiling is translating one language into another, because compiling is changing one programming language, usually higher level, into another, usually lower level.  Puppet does something very different in my understanding given above.  It sounds as if the goal of the C++ parser is not to produce a catalog of resources but instead a Ruby program for the agent to run.

From the statements about the C++ Puppet parser, is the target still just a collection of resources?  Or is the goal to eventually spit out something other than what the Puppet server should send to a node?

Is the scope just 'closed' .pp files?  That is, classes where all variables can be resolved without inquiry to facts?  The behavior of languages that support things like pre-compiled files is specific to how they do binding of missing data.  While this proposal punts on the 'serialization' format, the handling of binding is pretty central. That raises questions like:

How can a compiled format deal with Puppet features that call APIs?  Is this for the defined types?  How will it deal with the fact that the code could radically change based on node data not even present at the time the pre-compile is done?

What happens if a precompiled module that depends on PuppetDB information is moved to an environment without it?  For that matter, is the format intended to move between Puppet installations or just very similar ones?

My model of Puppet is that the compiled catalog is just an ordered list of resources which do not have any placeholders like variables or sub-dividable parts.  Classes in this model are just names that help you find the resources.

This format is relevant to recent discussions among people on IRC and in the mailing lists and groups about getting and storing the catalog for a node.  This has historically been hard to do as an end user.

The context of any given resource is the whole catalog and at a minimum requires the dependent and depending classes in the graph tree.  Otherwise how does this deal with the unresolved parts?

Peter Huene

Mar 30, 2016, 3:37:42 PM
to puppe...@googlegroups.com
Hi Jeremiah,

On Wed, Mar 30, 2016 at 11:56 AM, <jeremia...@seagate.com> wrote:
I write a lot of native types for our internal use at work.  (I tell people that if you are just using Exec in Puppet that ansible is one search away in Google. ) Some of the Puppet code used with these types would be very challenging to "pre-compile" in any way.

I think a lot of my questions are just because I don't consider a .pp file to usually be the unit of function in Puppet.  I only consider the resource, which is usually implemented in Ruby, and everything else is dressing.

Reading over this proposal raises questions about how it will fit into Puppet, a language and environment very different from Python, Java or other languages (Puppet 4 has long since left the domain-specific-language label by the wayside).

Since you have to build an AST on a node-by-node basis, I am having a hard time telling the value of this versus something like a marker that indicates the code has been parser-validated.  If your code doesn't need a node-by-node rebuild, then perhaps the compiler could cache the results instead?  I know the adage about the three hard problems in Computer Science still applies, but what this is doing is creating and managing a cache.

ASTs don't need to be built on a node-by-node basis (unless you meant manifest-by-manifest basis); an AST is just a representation of the manifest's source code. The XPP file format is simply an attempt to define a serialization format for the AST itself so that, say, all the manifest files in an environment could be parsed and validated all in one go, "upfront" (i.e. not during the catalog compilation for any one node), and the resulting ASTs saved in a format that is faster to read back when a catalog for a node is being compiled.

In compiler terminology, there's a "frontend" and a "backend".  The frontend is responsible for producing an AST that the backend can evaluate or generate code from (in the case of Puppet, the backend directly evaluates ASTs to generate a catalog).  Having a well-defined AST serialization format means we could potentially swap in another implementation of the compiler's frontend, which is one of the goals of this project and a first step towards a new compiler implementation.


Pre-compiling is translating one language into another, because compiling is changing one programming language, usually higher level, into another, usually lower level.  Puppet does something very different in my understanding given above.  It sounds as if the goal of the C++ parser is not to produce a catalog of resources but instead a Ruby program for the agent to run.

The goal of this particular initiative is to enable the C++ parser (i.e. the frontend) to interop with the Ruby evaluation implementation (i.e. the backend).  The Puppet code is not being pre-compiled, but pre-parsed/pre-validated; the C++ implementation will not (yet) evaluate any Puppet code or load custom types or functions defined in Ruby.

The ultimate goal of the C++ implementation is to replace both the frontend and the backend for catalog compilation, but that's a ways off still as there needs to be design and implementation around loading custom types and functions from Ruby source to maintain backwards compatibility.


 From the statements about a C++ Puppet parser is the target still just a collection of resources?  Or is the goal to eventually spit out something other than what the Puppet server should send to a node?

The target is still the same thing we send to agents now: a resource catalog.  That won't change when the C++ compiler implements the backend too.


Is the scope just 'closed' .pp files?  That is classes where all variables can be resolved without inquiry to facts?  The behavior of languages that support things like pre-compiled files is specific to how they do binding of missing data.  While this proposal punts on the 'serialization' format the handling of binding is pretty central. That raises questions like:

How can a compiled format deal with Puppet features that call APIs?  Is this for the defined types?  How will it deal with the fact that the code could radically change based on node data not even present at the time the pre-compile is done?

What happens if a precompiled module that depends on PuppetDB information is moved to an environment without it?  For that matter, is the format intended to move between Puppet installations or just very similar ones?

My model of Puppet is that the compiled catalog is just an ordered list of resources which do not have any placeholders like variables or sub-dividable parts.  Classes in this model are just names that help you find the resources.

This format is relevant to recent discussions among people on IRC and in the mailing lists and groups about getting and storing the catalog for a node.  This has historically been hard to do as an end user.

The context of any given resource is the whole catalog and at a minimum requires the dependent and depending classes in the graph tree.  Otherwise how does this deal with the unresolved parts?

The above seems to be confusing, understandably so, pre-compiling a resource catalog with pre-parsing a manifest.  In terms of a language like Python, the "pre-compiled" pyc files are simply a representation of the source that is more efficient to load and execute than having to parse the Python source again.  This is analogous to what we're calling pre-parsed XPP files: a representation of the Puppet source code that enables faster evaluation (i.e. during resource catalog "compilation") and also enables the existing Ruby implementation to load and evaluate them even if an entirely different parser implementation was used to generate them.

Hope this helps.

Peter





Jeremiah Powell

Mar 30, 2016, 5:13:52 PM
to puppe...@googlegroups.com
ASTs don't need to be built on a node-by-node basis (unless you meant manifest-by-manifest basis

Well, manifest-by-manifest, where the manifest will vary depending on the node the compile job is for.  Reviewing the code, I get the impression I just don't understand the existing parser[1] well enough today to hold a valid opinion on this.  But honestly, I'm not trying to troll.

The goal of this particular initiative is to enable the C++ parser (i.e. the frontend) to interop with the Ruby evaluation implementation (i.e. the backend).  The Puppet code is not being pre-compiled, but pre-parsed/pre-validated; the C++ implementation will not (yet) evaluate any Puppet code or load custom types or functions defined in Ruby.

How will this work with create_resources[2]? 

In compiler terminology, there's a "frontend" and a "backend".

In compiler terminology the frontend is a scanner composed of a parser and a lexer.  The front-end validates the parse of the code as a side-effect.  This is beyond the scope of the discussion of the PRFC and into a sizing competition about who's read Aho, Lam, Sethi and Ullman.

The only point from this is that this is not compiling but partial parsing. Some of my concerns cannot be raised until there is actual output to examine.

The above seems to be confusing, understandably so, pre-compiling a resource catalog with pre-parsing a manifest.  In terms of a language like Python, the "pre-compiled" pyc files are simply a representation of the source that is more efficient to load and execute than having to parse the Python source again

That is because the Java code and CPython code are completely compiled and ready to link in at runtime.  In this respect the XPP proposal does not appear similar to .pyc files or Java bytecode.

It does appear to me to be very similar to the Ecore technology[3] from Eclipse, and thus to Geppetto, as mentioned in the references on RGen in the prior-art section.  It also appears similar to how you write a CoffeeScript parser for grammars in Atom or languages in Sublime Text.  It is just that you plan to serialize the result to disk instead of displaying it to the user.

I suggest you read more about the CPython implementation of .pyc files in PEP 3147[4]. The PEP proposal is very well written, IMHO.  It covers a lot of the questions that are being raised in comments on the PRFC. Like the discussion of not using a shadow filesystem for the files.

An example from the PEP: will there be features like the ability to detect if pre-parsing is available or in use?  Can I turn it off in code as a developer or must I always use the --no-xpp command line as a user?  Would that even be a good idea?

From my understanding of the PRFC, XPP is a half-parsed file with compiler warnings mixed in.  This brings to mind using it to create a blocking step in a code deployment process.  I've already commented on that use in the document, though.

1. https://github.com/puppetlabs/puppet/blob/master/lib/puppet/parser
2. https://docs.puppetlabs.com/puppet/latest/reference/function.html#createresources
3. http://puppet-on-the-edge.blogspot.com/2014/04/puppet-internals-introduction-to.html
4. https://www.python.org/dev/peps/pep-3147/

Jeremiah Powell
Seagate Technology - U.S. Business Data Center
http://www.seagate.com
email : Jeremia...@Seagate.com
phone: +1(405) 324-3238



Henrik Lindberg

Mar 30, 2016, 6:01:16 PM
to puppe...@googlegroups.com
On 30/03/16 23:12, Jeremiah Powell wrote:
> ASTs don't need to be built on a node-by-node basis (unless you
> meant manifest-by-manifest basis
>
>
> Well, manifest-by-manifest, where the manifest will vary depending on
> the node the compile job is for. Reviewing the code, I get the
> impression I just don't understand the existing parser[1] well enough
> today to hold a valid opinion on this. But honestly, I'm not trying to
> troll.
>

In puppet it is the "compilation of a catalog" that is specific to a
node, not the parsing of the individual .pp files. The word "parsing"
does not imply "evaluation" or "catalog compilation".

The puppet parser does the following:

* reads the source text
* builds a representation of this in memory (the Abstract Syntax Tree, or AST)
* performs static validation of the produced AST (to report semantic
problems that are not covered by syntax alone)

The compiler does this:

* given a node, with its facts, settings from a node classifier, etc.,
it determines where it should start evaluating puppet code

* when it needs to evaluate something - e.g. "site.pp" - it first needs
to parse this file into an AST (the steps above).

* when it has the AST, it starts evaluating the AST (expressions like 1
+ 1, function calls, resource declarations, etc).

* The result of the evaluation is that it has built up a catalog of
resources.

* The catalog is typically sent to an agent for application, but can be
written to disk etc.

The XPP PRFC is only about the three parsing steps. There will be no
difference whatsoever in what happens once the AST is loaded into
memory. The only difference between "the Ruby parser parsing a .pp file
into an AST" and "reading an XPP file containing an AST" is that we do
not have to use the very slow Ruby runtime to do all the processing.
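
To make that concrete, here is a rough Ruby sketch of the idea
(illustrative only: Marshal stands in for whatever serialization format
XPP ends up specifying, and whether Marshal round-trips the model
cleanly is not the point here):

  require 'puppet'
  require 'puppet/pops'

  parser = Puppet::Pops::Parser::EvaluatingParser.new

  # Parse + validate once, ahead of time (with XPP this work could be
  # done by the C++ frontend instead of Ruby):
  program = parser.parse_file('site.pp')

  # Persist the resulting AST (Marshal is only a stand-in for XPP):
  File.binwrite('site.pp.cache', Marshal.dump(program))

  # Later, at catalog compilation time, load the AST instead of
  # re-lexing and re-parsing the source:
  program = Marshal.load(File.binread('site.pp.cache'))
  # ...evaluation of the AST then proceeds exactly as before.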


> The goal of this particular initiative is to enable the C++ parser
> (i.e. the frontend) to interop with the Ruby evaluation
> implementation (i.e. the backend). The Puppet code is not being
> pre-compiled, but pre-parsed/pre-validated; the C++ implementation
> will not (yet) evaluate any Puppet code or load custom types or
> functions defined in Ruby.
>
>
> How will this work with create_resources[2]?

They are completely unrelated. The call to "create_resources" will
take place in exactly the same way. The AST that is evaluated looks
exactly the same whether it came from an XPP file or was read in source
form from a .pp file and parsed by the Ruby parser.

>
> In compiler terminology, there's a "frontend" and a "backend".
>
>
> In compiler terminology the frontend is a scanner composed of a parser
> and a lexer. The front-end validates the parse of the code as a
> side-effect. This is beyond the scope of the discussion of the PRFC and
> into a sizing competition about who's read Aho, Lam, Sethi and Ullman.
>
> The only point from this is that this is not compiling but partial
> parsing. Some of my concerns cannot be raised until there is actual
> output to examine.
>

Puppet does not have a compiler in the typical computer science sense.
The puppet term "compiler" uses the word in an English/generic sense
since what it is doing is "compiling a catalog" (putting a catalog
together out of the pieces that are supposed to be in it).

Puppet is an interpreter that interprets the AST that is produced by the
puppet parser. The Puppet compiler uses that interpreter when it is
compiling the catalog.

> The above seems to be confusing, understandably so, pre-compiling a
> resource catalog with pre-parsing a manifest. In terms of a
> language like Python, the "pre-compiled" pyc files are simply a
> representation of the source that is more efficient to load and
> execute than having to parse the Python source again
>
>
> That is because the Java code and CPython code is completely compiled
> and ready to link in at runtime. In this the XPP proposal does not
> appear similar to .pyc files or Java bytecode.
>

Java is compiled to byte code. This byte code is then either interpreted
or "just in time" compiled into machine code.

The Puppet AST is to Puppet what Java byte code is to a JVM (although
Puppet is not byte code based; we simply use the AST).

Byte code is typically for a stack-based virtual machine.
As a simple example, if you have the source:

a = 2 + 3

the byte code may be something like:

Push Literal 2
Push Literal 3
Add
Store a

Whereas the AST is a tree of nodes (which is hard to draw, so here I am
using a list notation). The same in Puppet would be:

(AssignmentExpression a
  (ArithmeticExpression +
    (Literal 2)
    (Literal 3)))
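
For what it is worth, the same example spelled out in a few lines of
Ruby (plain arrays standing in for the real node classes) shows how an
evaluator walks such a tree, versus a stack machine running byte code:

  # Stack-machine byte code: a flat instruction list run against a stack.
  bytecode = [[:push, 2], [:push, 3], [:add], [:store, :a]]

  # AST: a nested tree of nodes that an evaluator walks recursively.
  ast = [:assign, :a, [:arith, :+, [:literal, 2], [:literal, 3]]]

  def eval_ast(node, scope)
    case node.first
    when :literal then node[1]
    when :arith   then eval_ast(node[2], scope).send(node[1], eval_ast(node[3], scope))
    when :assign  then scope[node[1]] = eval_ast(node[2], scope)
    end
  end

  scope = {}
  eval_ast(ast, scope)   # => 5, and scope is now {:a => 5}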


> It does appear to me to be very similar to the Ecore technology[3] from
> Eclipse, and thus to Geppetto, as mentioned in the references on RGen in
> the prior-art section. It also appears similar to how you write a
> CoffeeScript parser for grammars in Atom or languages in Sublime Text.
> It is just that you plan to serialize the result to disk instead of
> displaying it to the user.
>
All parsers do build a representation in memory using some form of tree.
Puppet is no exception. This form is useful for further processing
(validation, compilation/transformation into another form).
What we get from RGen/Ecore is simply a convenient way to define what
the building blocks of the tree are, together with tooling that helps us
process them. It would look very similar if done by hand (it would only
take more time to do so).

And yes! XPP simply gives us the ability to not have to go through all
of the steps from puppet source to AST, to evaluation, to catalog output.
The parsing step is compute intensive and it is the worst kind of task
possible for Ruby to perform.

XPP is simply the AST in serialized form (+ validation result).

> I suggest you read more about the CPython implementation of .pyc files
> in PEP 3147[4]. The PEP proposal is very well written, IMHO. It covers
> a lot of the questions that are being raised in comments on the PRFC.
> Like the discussion of not using a shadow filesystem for the files.
>

Thanks for the reference, will read more in that PEP.

> An example from the PEP: will there be features like the ability to
> detect if pre-parsing is available or in use? Can I turn it off in code
> as a developer or must I always use the --no-xpp command line as a
> user? Would that even be a good idea?
>

The --xpp / --no-xpp is a setting so it can be set in a configuration
file to avoid having to give it on the command line every time.

You cannot turn this on/off in individual puppet manifests. It is a
setting for the puppet "catalog compiler" that controls whether it
should load the AST from XPP files instead of using the much slower Ruby
parsing route. It will always be possible to fall back to the Ruby route
if there is no XPP available.
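
In other words, the decision is made per file at compile time, roughly
like the following sketch (the xpp setting name and the helpers are
hypothetical; they only illustrate the fallback being described):

  # Hypothetical sketch - Puppet[:xpp], xpp_path_for and load_xpp do
  # not exist; they stand in for the decision described above.
  def ast_for(manifest)
    xpp = xpp_path_for(manifest)
    if Puppet[:xpp] && File.exist?(xpp) && File.mtime(xpp) >= File.mtime(manifest)
      load_xpp(xpp)                               # fast path: read the AST
    else
      Puppet::Pops::Parser::EvaluatingParser.new.parse_file(manifest)  # Ruby route
    end
  end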

> From my understanding of the PRFC, XPP is a half-parsed file with
> compiler warnings mixed in. This brings to mind using it to create a
> blocking step in a code deployment process. I've already commented on
> that use in the document, though.
>

It is not "half parsed" - it is completely parsed and validated (as far
as we can do static analysis). It does the same as the command "puppet
parser validate" only that it produces a result that can be used later.

- henrik


--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

Thomas Gelf

Mar 31, 2016, 10:03:02 PM
to puppe...@googlegroups.com
Hi Eric,

your dedication in getting Puppet faster is really appreciated. My post
is absolutely not in favor of XPP, but please don't get me wrong: it is
meant to be a constructive contribution to the current design process.

In my personal opinion we have a sad history of optimizations focusing a
lot on blaming different languages and tools. Puppet often created fancy
new tools with new languages and components, but we rarely tackled the
root causes of our problems. This would be off topic, but I guess I'll
add a few examples by the end of this mail to let you understand what I
mean.


So let me start with the stated "Problems":

* Performance: I didn't do any measurements, but I guess the compiler
spends more time in resolving dependencies and traversing graphs than it
does in parsing and validating .pp files. Not to mention a lot of compat
hacks, alias-handling voodoo, insane Hiera lookups, type validation for
those lookups and legacy support hacks. So do you have any related
numbers? Where is most of the time spent when building and shipping
(real-world) catalogs? Are you really sure an AST-cache (per manifest?!)
would be worth the effort and solve the "performance problem"? I guess
the C++ parser itself is not so slow that it already needs an AST cache,
because then there would be something wrong with it.

* Cross-Language support: You wrote that the C++ parser needs to provide
the compiled AST to the Ruby runtime. Makes sense to me. But parsing .pp
files with C++, serializing them to a custom not yet designed format,
parsing that custom format with Ruby again and then re-resolve all
(most, some?) dependency graphs across the whole catalog with Ruby...
this doesn't sound like something that could help with getting things
faster. Sure, it would help the C++ parser to hand over it's AST. Or
store it to disk. But would this speed up the whole process? I have some
serious doubts in that relation.

IMHO this wouldn't help much, at least not unless "drop all Ruby
interfaces in the long run" is the final goal on your agenda. In that
case please let us know. Those who want to support that goal could unite
their forces to get it accomplished as fast as possible, the others
would at least know what to expect.

In a current Puppet ecosystem a C++ parser able to generate an AST from
a .pp file to me still seems far from anything that could completely
replace the current Ruby-based parser in a helpful way very soon. At
least not in a real-world environment with lots of modules, custom
functions and external data sources, often provided by custom lookup
functions. At least not in a way that would bring any benefit to the
average Puppet user.

So, to me the former one remains a key question to the performance
benefit we could get from all this. As long as the Ruby runtime is
supported, I do not really see how this could work out. But this is just
a blind guess, please prove me wrong on this. Obviously the C++ Puppet
will be faster as soon as you drop the Ruby runtime. But then we should
add something else to the big picture: how should we build custom
extensions and interfaces to custom data in the future? Forking plugins?
Talking with web services? Because adding a C++ compiler to a (dev)ops
deployment pipeline will not convince many people I guess.

Everything that comes to my mind has its very own performance impact.
We should know what to expect in that direction to be able to understand
what needs to be added to our (performance) calculation. As of this
writing and from what I know from mailing lists, Puppet Conf (and Camps)
to me the C++ parser is still an academic construct able to generate an
AST in a fast way. Nice for sure, but not (yet) any benefit in a
real-world Puppet scenario. Of course I might be missing some parts of
your big picture, involving strategic product-related features not yet
known to the public.

But please do not forget that the extensibility of a tool is one of the
key features of any open-source software. Ops people didn't choose good
old Nagios because of its "beautiful" frontend and its "well-designed"
plugin API. They are using it because everyone from students to
60-year-old UNIX veterans is able to write something they call a
"plugin". Mostly awful snippets of Bash or Perl, not worthy of being
called software. But doing customized crazy shit running on millions of
systems, available for nearly 20 years without breaking compatibility.
Of course there is Icinga right now ;) New Core, C++, shiny new web...
but still running those ugly old plugins. They are awful, they are
terrible, we all hate them. But lots of people invested a lot of time in
them, so breaking them is a no-go.

No one I know currently understands how existing "interfaces" (being
plain Ruby) fit into your C++ plans. There is a lot of uncertainty
amongst (skilled) Puppet users regarding that right now. Some public
clarification would definitely help to smooth the waters. If your
plans include dropping that part in favor of restricted EPP and
DSL-based "functions", please let us know. It will be faster, for sure.
But it will be a different product with different (restricted)
possibilities. In that case I would prefer to be among the first ones
leaving the ship instead of being treated like the famous slowly boiled
frog.


But let's get back to the next point in your proposal, "requirements":

* publishing modules as XPP: I guess building an AST for a module would
take less time than checking out the very same module with r10k from
your local GIT repository. Even with "slow Ruby code". So IMO there are
no real benefits for this, but lots of potential pitfalls, insecurities,
bugs. If you need this to provide obfuscated Enterprise-only modules in
the future... well, it's your choice.

* longevity of file formats: what makes you think that Puppet will
change more slowly in the near future? Today there is no way to run many
Puppet 3.x manifests with Puppet 4.x, and those are plain .pp files. An
AST would by definition be a lot more fragile. Why should we believe
that those cache files would survive longer?

* Efficient serialization is key to the success of XPP: you name it. And
please do not forget that efficient unserialization is far more
important. This will not take zero time and happens as often as a .pp
file is parsed today.


"Non-goals":

* If XPP will be plaintext it would obviously not be that fast, but
that's still fine with me

* I also have no problem with a serialized format not readable by human
beings. I will happily live with any binary format as long as you keep
YAML and similar diseases far away from me ;-)


"Proposal":

* XPP file handling in general sounds good to me

* I have some doubts when it comes to checking whether that file is "up
to date". Race conditions and issues when people are manually copying
files come to my mind.

* a safe way to solve this could be XPP files carrying source file
checksums in their name (a rough sketch follows below), but of course
that would then be more costly as it involves generating and validating
checksums all the time. Outdated XPP files must be removed.

* You know that people use r10k or custom tools to just checkout
specific tags or commit IDs again and again? Sometimes directly in their
module path. I work with customers where every 2-5 minutes the whole day
long someone pushes a new Puppetfile in an automated way. How would that
fit with your XPP model? Should Puppet (r10k, whoever) re-check/generate
all of them with every deployment? Every few minutes?

Also please do not underestimate the potential pitfalls for users when
trusting file modification times. We could run into a support nightmare.
We all know that writing a cache is not an easy task.
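
For the record, the checksum-in-the-filename idea mentioned above could
look roughly like this (all names are made up); validity then becomes a
pure existence check, but stale files still have to be swept:

  require 'digest'

  def xpp_path_for(pp_path, cache_dir)
    digest = Digest::SHA256.file(pp_path).hexdigest
    File.join(cache_dir, "#{File.basename(pp_path, '.pp')}-#{digest}.xpp")
  end

  # The cached AST is valid iff the file for the *current* digest exists;
  # mtime never enters the picture.
  def cached_ast_available?(pp_path, cache_dir)
    File.exist?(xpp_path_for(pp_path, cache_dir))
  end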


"API, switches, protocols":

* looks good to me


"Interfaces modified or extended":

* I see there is some discussion of whether XPP files should reside in
the module directories or in a mirrored structure. Well, caught between
a rock and a hard place - good luck :D


"Diagnostics of XPP"

* msgpack: well... mmmmhok

* shebang: there are enough comments, nothing to add

* pcore part, shebang line, mime type: you already define three
different kinds of version/subformat headers in a draft for a new
format. Not good.

* mime type: a container for a bunch of different formats doesn't make a
good format to me. Are you really sure that implementing AST
serialization for C++ and Ruby (and others?) with different formats for
all of those is a good idea? MsgPack AND JSON (easy) AND YAML (which
version?)

* regarding YAML: how to protect against code injection? A slow
Ruby-based parser, once again?

* you mention JAR files as an example. They are used for deployment
reasons, not for speed. XPP borrows some ideas from a JAR. A JAR is a
container for convenience; it makes it easy to ship multiple files.
However, it makes reading files slower; that's why they are being
deployed (and extracted) on the target system. The resulting class file
is what XPP should try to look like if it wants to bring any benefit. At
least as long as you do not plan to store all .pp files of a module in a
single .xpp file - but that would be even slower for many use cases. And
please note that class files are binary for a good reason: speed.

* you mention pyc files. They are binary and contain marshalled code
objects, once again native to Python. Same story as above. There IS a
reason why they are fast. That doesn't fit our current XPP scenario with
nested text formats.


Next point, "Alternatives":

* byte code oriented format: absolutely. If you want to have a fast AST
cache, this would help. Still, please add the time eventually needed for
evaluating the (already parsed) AST with Ruby to the calculation.

* wait until the C++ compiler is implemented: also a very good idea. And
not only that: wait not only until it is implemented but also until we
know where the whole ecosystem (Ruby functions, interfaces, Ruby-based
"indirections") should move. Once we know what they will look like we
will know better how to tune all this. Parsing and validating plain .pp
files probably involves a fraction of the computing resources a Puppet
master spends today. Ruby is far from being our main problem here.

* embedding the C++ parser in Ruby would be a good and efficient
approach. Good luck with Clojure and Jruby ;)

* produce the .xpp also with Ruby: IMO a must. You will definitely run
into compatibility issues between your different parsers. There is no
easy way to discover them in an automated way without this feature.

* fork the C++ parser: now it is getting scary. Sure, why not. But
(un)serialization cost in one way or the other remains, doesn't it?


"Additional Concerns":

* "Compatibility": when you really allow different shebang lines,
different serialization formats, XPP shipped with forge modules,
auto-generated in your code deployment procedure, some people using
other deployment mechanism, timestamp issues... all this together could
result in a big mess.

* "Security": you are right with "no extra impact", but I would add the
possibility for new attack vectors eventually hidden to validation tools
as soon as you add YAML (as mentioned in the draft) to the mix

* "Documentation": I do not agree that this would not be necessary. XPP
(when implemented) will be a key component of all deployments. People
WILL build custom tools around it. It's better to state clearly how
things are designed instead of letting everybody figure out by
themselves how to do black magic.

* Spin-offs: wooo... this adds a lot of new players to the mix, while
still being pretty vague. JavaScript? Then better stay with JSON and
forget about MsgPack. And how should a browser handle YAML?

* C++ parser as a linter: makes sense to me

* Encrypt XPP files: would not make them faster. While I'm an absolute
fan of signed packages, I do not see a use for this on an XPP file level


That was a lot of text, sorry :) And thank you for reading all this. My
conclusion: the XPP file draft is an early optimization of something
fitting into an ecosystem that is still very vaguely defined. If ever
implemented, it should be postponed. I'm sure the C++ parser is a lot
faster than the Ruby-based one. But hey, if I start up IRB and require
'puppet' (still 3.4 on my local Ubuntu desktop), it takes Puppet 0.03s
to load and validate a random 5KB .pp file. This is not very fast, but I
see no urgent problem with it.

And as initially mentioned, this leads me to my last point - a few
examples of similar findings and "optimizations" we enjoyed in the past:


"Databases are slow"

We had ActiveRecord hammering our databases. The conclusion wasn't
that someone with SQL knowledge should design a good schema. The
publicly stated reasoning was "well, databases are slow, so we need more
cores to hammer the database, Ruby has no threading, Clojure is cool".
It was still slow in the end, so we added a message queue, a dead letter
office and more to the mix.

Just to give you some related numbers to compare: a year or two ago I
wrote a prototype for (fast) catalog diffs. My test DB still carries
2400 random catalogs with an average of 1400 resources per catalog, 18+
million single resource parameters in total. There are of course far
fewer rows in the DB because of checksum-based "de-duplication". But
this is "real" data. The largest catalog has nearly 19,000 resources,
the smallest one 450. Once again, no fake data, real catalogs collected
over time from real environments.

Storing an average catalog (1400 resources, cached JSON is 0.5-1 MB)
takes, as far as I remember, less than half a second every time. For
most environments something similar should be perfectly doable to
persist catalogs as soon as they are compiled. Even in a blocking mode
with no queue and a directly attached database in plain Ruby.


"Facter is slow"

Reasoning: Ruby is slow, we need C++ for a faster Facter. But Facter
itself never was the real problem. When loaded from Puppet you can also
neglect its loading time. The problem was a few silly and some more
not-so-good single fact implementations. cFacter is mostly faster
because those facts were rewritten when they were implemented in C++.

Still, as long as we have custom facts cFacter still needs to fork Ruby.
And there it loses the startup time it initially saved. I guess the
Ruby-based Puppet requires 'facter' instead of forking it. I could be
wrong here. Still, the optimization was completely useless. But as a
result of all this, as of today it is harder for people to quickly fix
facts behaving wrong on their systems. Combined with a C++-based agent
cFacter could still make sense, as Puppetlabs wants to support more
platforms. And even this argument isn't really valid. I'm pretty sure
there are far more platforms with Ruby support than ones with a
supported Puppet AIO package.


"Puppet-Master is slow"

Once again, Ruby is slow, we learned. We got Puppet Server. I've met
(and helped) a lot of people that had severe issues with this stack. I'm
still telling anyone not to migrate unless there is an immediate need
for doing so. Most average admins are perfectly able to manage and scale
a Ruby-based web application. To them, Puppet Server is a black box.
Hard to manage, hard to scale. For many of them it's the only Java-based
application server they are running, so they have no clue about JVM
memory management, JMX and so on.

And I still need to see the one Puppet Server that is running faster
than a Puppet Master in the same environment. Preferably with equal
resource consumption.


Should I go on with PCP/PXP? I guess that's enough so far, I think you
understood what I mean.

With what I know so far, C++ Puppet and XPP would make perfect next
candidates for this hall of "fame". But as mentioned above, I'd love to
be proven wrong on all this. I'm neither a Ruby fanboy nor do I have
objections against C++. All I'm interested in is running my beloved
Puppet hassle-free in production, not wasting my time caring about
the platform itself. I'd prefer to dedicate it to lots of small ugly
self-written modules breaking all of the latest best practices I can
find on the web ;-)

Cheers,
Thomas

Trevor Vaughan

Apr 1, 2016, 5:58:54 AM
to puppe...@googlegroups.com
Thomas,

This is certainly a well thought out writeup, and mirrors some of the concerns that I've heard discussed elsewhere.

I must agree that a large part of the benefit of Puppet is being able to deep dive into the various components relatively easily and figure out what's breaking, patch it, and get on with life. Your Nagios analogy is spot on.

In theory, the move to C++ brings another benefit in that you can tie to more back end languages. This would potentially mean starting up more stacks, but you would be able to write plugins in your choice of back-end so long as they followed the C++ API hooks.

Something certainly needs to be done in terms of the server handling additional node load with the same, or fewer, resources and it will be interesting to see where that capability heads with the compiler rewrite. I agree on the fact that the Server and DB are often the only Java applications in an environment and that the usual woes of managing Java apps apply.

Unfortunately, the single biggest inefficiency in my environments is still catalog transfer and deserialization. In general, my puppet clients spend half of their time dealing with the catalog that is passed across the wire. This is understandable from a pure object model point of view but it would be great if the clients could do things more efficiently over time.

In theory, you could handle more nodes more efficiently by implementing a back-off procedure like the following (a rough client-side sketch follows below):

1) Client requests catalog and provides timeout (tie to some API version, old clients just don't get a response, but the catalog compile can be killed efficiently)
2) Server runs catalog compilation within the timeout window
3) If timeout reached, tell client to wait or find another master. Terminate catalog compilation (perhaps save partially compiled catalog?)
3a) If timeout reached for one client, tell any additional clients to go find another master until compile queue below some watermark
4) Client either immediately tries another master (DNS SRV style) or starts exponential backoff
5) Rinse and repeat

This doesn't help with the "Java wants all my RAM" issue, but it does help with immediate scaling and automatic client fan-out.
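
A rough client-side sketch of steps 4 and 5 (request_catalog is a made-up
helper that returns nil when the master says "try elsewhere"; nothing
here matches today's agent code):

  def fetch_catalog_with_backoff(masters, max_attempts: 6)
    max_attempts.times do |attempt|
      master  = masters[attempt % masters.size]   # DNS SRV style rotation
      catalog = request_catalog(master)
      return catalog if catalog
      sleep((2 ** attempt) + rand)                # exponential backoff + jitter
    end
    raise 'no master could compile the catalog in time'
  end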

I'm certainly interested to see where this, and other proposals, lead. Out of curiosity, did you post the DB implementation to which you referred online at any point? It would be interesting to see the implementation.

Thanks,

Trevor




--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --

Henrik Lindberg

Apr 1, 2016, 2:21:42 PM
to puppe...@googlegroups.com
On 01/04/16 04:02, Thomas Gelf wrote:
> Hi Eric,
>
> your dedication in getting Puppet faster is really appreciated. My post
> is absolutely not in favor of XPP, but please don't get me wrong: it is
> meant to be a constructive contribution to the current design process.
>
> In my personal opinion we have a sad history of optimizations focusing a
> lot on blaming different languages and tools. Puppet often created fancy
> new tools with new languages and components, but we rarely tackled the
> root causes of our problems. This would be off topic, but I guess I'll
> add a few examples by the end of this mail to let you understand what I
> mean.
>
>
> So let me start with the stated "Problems":
>
> * Performance: I didn't do any measurements, but I guess the compiler
> spends more time in resolving dependencies and traversing graphs than it
> does in parsing and validating .pp files. Not to mention a lot of compat
> hacks, alias-handling voodoo, insane Hiera lookups, type validation for
> those lookups and legacy support hacks. So do you have any related
> numbers? Where is most of the time spent when building and shipping
> (real-world) catalogs? Are you really sure an AST-cache (per manifest?!)
> would be worth the effort and solve the "performance problem"? I guess
> the C++ parser itself is not so slow that it already needs an AST cache,
> because then there would be something wrong with it.
>

The C++ implementation is several orders of magnitude faster than the
Ruby implementation, i.e. something silly like tens of thousands of
times faster.

The Ruby lexing/parsing and validation alone can take minutes on a
complex setup. We have shown earlier through benchmarks that lexing
alone is a bottleneck in any catalog compilation - every optimization
there contributes greatly to the bottom line.

> * Cross-Language support: You wrote that the C++ parser needs to provide
> the compiled AST to the Ruby runtime. Makes sense to me. But parsing .pp
> files with C++, serializing them to a custom not yet designed format,
> parsing that custom format with Ruby again and then re-resolve all
> (most, some?) dependency graphs across the whole catalog with Ruby...
> this doesn't sound like something that could help with getting things
> faster. Sure, it would help the C++ parser to hand over it's AST. Or
> store it to disk. But would this speed up the whole process? I have some
> serious doubts in that relation.
>

We have already measured the approach. The benefit on the Ruby side is
that the lexing is delegated to a native implementation that reads
binary. A spike was performed with Ruby Marshal, which was also compared
to a native MsgPack.

The main point here is that we are transitioning to a full
implementation of the puppet catalog compiler in C++. It will take some
time to get there and we want to give users the benefit of the
performance improvements and increased quality (we can check more,
record more information as it is cheaper) sooner rather than later.

The use of XPP makes this possible. We are happy if we initially only
get 5-10% out of this - for some that will be enough as it can translate
to supporting hundreds of additional agents on the same master. We are
hoping for more though. If we get a yield that is too disappointing we
will naturally not make the XPP feature a regular one, and instead try
something else.

> IMHO this wouldn't help much, at least not unless "drop all Ruby
> interfaces in the long run" is the final goal on your agenda. In that
> case please let us know. Those who want to support that goal could unite
> their forces to get it accomplished as fast as possible, the others
> would at least know what to expect.
>

That is not the intent (drop all Ruby things) - our plan is to make a
smooth transition. All "boil the ocean" strategies tend to fail, so
backwards compatibility and gradual change are important to us.

> In a current Puppet ecosystem a C++ parser able to generate an AST from
> a .pp file to me still seems far from anything that could completely
> replace the current Ruby-based parser in a helpful way very soon. At
> least not in a real-world environment with lot's of modules, custom
> functions and external data sources, often provided by custom lookup
> functions. At least not in a way that would bring any benefit to the
> average Puppet user.
>

The goal is to do this transparently.

> So, to me the former one remains a key question to the performance
> benefit we could get from all this. As long as the Ruby runtime is
> supported, I do not really see how this could work out. But this is just
> a blind guess, please prove me wrong on this. Obviously the C++ Puppet
> will be faster as soon as you drop the Ruby runtime. But then we should
> add something else to the big picture: how should we build custom
> extensions and interfaces to custom data in the future? Forking plugins?
> Talking with web services? Because adding a C++ compiler to a (dev)ops
> deployment pipeline will not convince many people I guess.
>

That topic is indeed a big topic, and one that will continue as we are
working towards a C++ based environment. The key here is
interoperability, where extensions are supported in Ruby, or in whatever
language it makes sense to implement them in.

Expect to see a lot more about this later in the game.

> Everything that comes to my mind has it's very own performance impact.
> We should know what to expect in that direction to be able to understand
> what needs to be added to our (performance) calculation. As of this
> writing and from what I know from mailing lists, Puppet Conf (and Camps)
> to me the C++ parser is still an academic construct able to generate an
> AST in a fast way. Nice for sure, but not (yet) any benefit in a
> real-world Puppet scenario. Of course I might be missing some parts of
> your big picture, involving strategic product-related features not yet
> known to the public.
>

It is anything but academic. What we could do, but are reluctant to do,
is to link the C++ parser into the Ruby runtime - C++ based parsing
would then be completely transparent to users. It would still need to
pass the native/Ruby object barrier - which XPP handles - but if linked
into memory it would just be an internal affair. If we have to go there
we may do so. XPP is, however, a step along the way, and if it proves to
give much-wanted performance benefits even when the C++ based parser and
the Ruby runtime are talking via a file/pipe, then we think that is of
value to users.

> But please do not forget that the extensibility of a tool is one of the
> key features of any open-source software. Ops people didn't choose good
> old Nagios because of its "beautiful" frontend and its "well-designed"
> plugin API. They are using it because everyone from students to
> 60-year-old UNIX veterans is able to write something they call a
> "plugin". Mostly awful snippets of Bash or Perl, not worthy of being
> called software. But doing customized crazy shit running on millions of
> systems, available for nearly 20 years without breaking compatibility.
> Of course there is Icinga right now ;) New Core, C++, shiny new web...
> but still running those ugly old plugins. They are awful, they are
> terrible, we all hate them. But lots of people invested a lot of time in
> them, so breaking them is a no-go.
>

Backwards compatibility and interop are of the utmost concern. We
believe that breaking things apart, specifying good APIs, and providing
well-performing communication between the various parts of the system is
key to moving away from the now quite complicated and slow monolithic
implementation in Ruby.

> No one I know currently understands how existing "interfaces" (being
> plain Ruby) fit in if your C++ plans. There is a lot of uncertainty
> amongst (skilled) Puppet users regarding that right now. Some public
> clarification would definitively help to smooth the waters. If your
> plans include dropping that part in favor of restricted EPP and
> DSL-based "functions" please let us know. It will be faster, for sure.
> But it will be a different product with different (restricted)
> possibilities. In that case I would prefer to be among the first ones
> leaving the ship instead of being treated like the famous slowly boiled
> frog.
>

I think my responses above cover this. We are not going to slowly boil
users... That is what makes what we are doing difficult. It would be a
lot simpler to write a completely new implementation - it would also be
an implementation that would take a very long time for users to be able
to adopt. We are not doing that.

>
> But let's get back to the next point in your proposal, "requirements":
>
> * publishing modules as XPP: I guess building an AST for a module would
> take less time than checking out the very same module with r10k from
> your local GIT repository. Even with "slow Ruby code". So IMO there are
> no real benefits for this, but lots of potential pitfalls, insecurities,
> bugs. If you need this to provide obfuscated Enterprise-only modules in
> the future... well, it's your choice.
>

This point is also raised by others. The requirements will be revised.

> * longevity of file formats: what makes you think that Puppet will
> change slower in the near future? Today there is no way to run many
> Puppet 3.x Manifests with Puppet 4.x, and those are plain .pp files. An
> AST would per definition be a lot more fragile. Why should we believe
> that those cache files would survive longer?
>

Because they are well defined, as opposed to how things were earlier,
where things just happened to be a certain way because of how it was
implemented. Knowing what something means is the foundation that allows
it to be transformed. And when something is "all data" as opposed to
"all messy code", it can be processed by tools.

> * Efficient serialization is key to the success of XPP: you name it. And
> please do not forget that efficient unserialization is far more
> important. This will not take zero time and happens as often as a .pp
> file is parsed today.
>

That goes without saying - piping data to /dev/null is not a performance
concern ;-)

As an example - what makes things expensive in Ruby is the creation of
many objects and garbage collection. (In lexing, each and every character
in the source needs to be individually processed, and the result is a
stream of very small objects (Strings in Ruby); it is a catch-22 to try
to optimize this by transforming them into symbols (internalizing), or
looking them up in a singleton pattern.) An efficient
serialization/deserialization however makes judicious use of tabulation
(the same value only appears once). When this is done with a C++
serializer all of the cost is on the serializing side - the
deserialization greatly benefits from this as there are far fewer objects
to construct, less processing is required, and less memory is used.
(These secondary effects have not been benchmarked in puppet, but they
have proven to be very beneficial in implementations we have used in the
past.)
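
A toy illustration of the tabulation idea (not the actual Pcore/XPP
encoding): each distinct value is written once and later occurrences
become small integer references, so the reader allocates far fewer
objects:

  def tabulate(values)
    table = []
    index = {}
    refs  = values.map { |v| index[v] ||= (table << v; table.size - 1) }
    [table, refs]
  end

  table, refs = tabulate(%w[file ensure present file ensure present file])
  # table => ["file", "ensure", "present"]
  # refs  => [0, 1, 2, 0, 1, 2, 0]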

> "Non-goals":
>
> * If XPP will be plaintext it would obviously be not that fast, but
> that's still fine for me
>
> * I also have no problem with a serialized format not readably by human
> beings. I will happily live with any binary format as long as you keep
> YAML and similar diseases far away from me ;-)
>
>
> "Proposal":
>
> * XPP file handling in general sounds good to me
>
> * I have some doubts when it goes to checking whether that file is "up
> to date". Race conditions and issues when people are manually copying
> files come to my mind.
>
> * a safe way to solve this could be xpp files carrying source file
> checksums in their name, but of course that would then be more costly as
> it involves generating and validating checksums all the time. Outdated
> XPP files must be removed.
>
> * You know that people use r10k or custom tools to just checkout
> specific tags or commit IDs again and again? Sometimes directly in their
> module path. I work with customers where every 2-5 minutes the whole day
> long someone pushes a new Puppetfile in an automated way. How would that
> fit with your XPP model? Should Puppet (r10k, whoever) re-check/generate
> all of them with every deployment? Every few minutes?
>
> Also please to not underestimate the potential pitfalls for users when
> trusting file modification times. We could run into a support nightmare.
> We all know, writing a cache is not an easy task.
>

These concerns are shared. It is the overall process, more than the
lower-level technical things, that I worry about getting right.
The requirements and exactly how/when/where XPPs get created and used
will require an extra round or two of thought and debate.

>
> "API, switches, protocols":
>
> * looks good to me
>
>
> "Interfaces modified or extended":
>
> * I see there is some discussion of whether XPP files should reside in
> the module directories or in a mirrored structure. Well, caught between
> a rock and a hard place - good luck :D
>
>
> "Diagnostics of XPP"
>
> * msgpack: well... mmmmhok
>
> * shebang: there are enough comments, nothing to add
>
> * pcore part, shebang line, mime type: you already define three
> different kinds of version/subformat headers in a draft for a new
> format. Not good.
>
> * mime type: a container for a bunch of different formats doesn't make a
> good format to me. Are you really sure that implementing AST
> serialization for C++ and Ruby (and others?) with different formats for
> all of those is a good idea? Msgpack AND JSON (easy) AND YAML (which
> version?
>
Others raised the same question. The intent for XPP is MsgPack. The
support for JSON is for other use cases. Whether YAML is ever done is
questionable. The spec should differentiate between what XPP dictates
vs. alternative possible formats for other purposes.

> * regarding YAML: how to protect against code injection? A slow
> Ruby-based parser, once again?
>
Very good point, and why it is unlikely that it will ever be done.
Yep, things will be dropped - the goal is to make things as simple as
they can be.

> * "Security": you are right with "no extra impact", but I would add the
> possibility for new attack vectors eventually hidden to validation tools
> as soon as you add YAML (as mentioned in the draft) to the mix
>

Consider Yaml taken out of the equation.

> * "Documentation": I do not agree that this would not be necessary. XPP
> (when implemented) will be a key component of all deployments. People
> WILL build custom tools around it. It's better to state clearly how
> things are designed instead of letting everybody figure out by
> themselves how to do black magic.
>

It is at least not an extensive documentation concern in terms of user
facing documentation. The more it is a pure implementation concern that
works transparently the less user facing documentation will be needed.

> * Spin-offs: wooo... this adds a lot of new players to the mix, while
> still being pretty vague. JavaScript? Then better stay with JSON and
> forget about MsgPack. And how should a browser handle YAML?
>
> * C++ parser as a linter: makes sense to me
>
> * Encrypt XPP files: would not make them faster. While I'm an absolute
> fan of signed packages, I do not see a use for this on an XPP file level
>

Others commented to that effect too.
Thank you Thomas for all of the valuable comments and insights.
Best Regards

- henrik

Deepak Giridharagopal

unread,
Apr 1, 2016, 2:42:32 PM4/1/16
to puppe...@googlegroups.com
On Thu, Mar 31, 2016 at 8:02 PM, Thomas Gelf <tho...@gelf.net> wrote:
your dedication in getting Puppet faster is really appreciated. My post
is absolutely not in favor of XPP, but please don't get me wrong: it is
meant to be a constructive contribution to the current design process.

Thanks for the feedback, Thomas. This is the right forum for it, and I appreciate the ways in which you're challenging these ideas. +1 for Real Talk. Some of the things you mentioned involve historical decisions, and some are around current/future decisions...I think we should focus on the problems you identified that we can deal with going forward.
 
 

* Cross-Language support: You wrote that the C++ parser needs to provide
the compiled AST to the Ruby runtime. Makes sense to me. But parsing .pp
files with C++, serializing them to a custom not yet designed format,
parsing that custom format with Ruby again and then re-resolve all
(most, some?) dependency graphs across the whole catalog with Ruby...
this doesn't sound like something that could help with getting things
faster. Sure, it would help the C++ parser to hand over its AST. Or
store it to disk. But would this speed up the whole process? I have some
serious doubts in that relation.

I actually think it'll make a big difference in parsing speed, if for no other reason than slurping in pre-parsed stuff in nearly any kind of non-insane format would easily be faster than the hand-rolled parsing that's on the ruby side right now. We've seen similar improvements in other spots, like dalen's patches that added msgpack support vs. hand-rolled serialization/deserialization. Maybe the thing to do here is to try some experiments and post back some numbers that could hopefully ground the discussion with some data?
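
As a starting point, the kind of experiment that could produce such numbers
might look roughly like this (a stand-in data structure is used in place of a
real Puppet AST, and the third column - re-parsing the equivalent .pp source
with the real parser - is left out; msgpack gem assumed):

    require 'benchmark'
    require 'msgpack'

    # Stand-in for a parsed AST: a deeply nested structure of plain data.
    ast = Array.new(5_000) do |i|
      { 'type' => 'ResourceExpression', 'title' => "file_#{i}",
        'body' => { 'ensure' => 'present', 'line' => i } }
    end

    marshal_blob = Marshal.dump(ast)
    msgpack_blob = MessagePack.pack(ast)

    Benchmark.bm(20) do |b|
      b.report('Marshal.load')       { 100.times { Marshal.load(marshal_blob) } }
      b.report('MessagePack.unpack') { 100.times { MessagePack.unpack(msgpack_blob) } }
    end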

 
[...]
But please do not forget that the extensibility of a tool is one of the key
features of any OpenSource software. Ops people didn't choose good old
Nagios because of its "beautiful" frontend and its "well-designed"
plugin API. They are using it because everyone from students to
60-year-old UNIX veterans is able to write something they call a
"plugin". Mostly awful snippets of Bash or Perl, not worth being called
software. But doing customized crazy shit running on millions of
systems, available for nearly 20 years without breaking compatibility.
Of course there is Icinga right now ;) New core, C++, shiny new web...
but still running those ugly old plugins. They are awful, they are
terrible, we all hate them. But lots of people invested a lot of time in
them, so breaking them is a no-go.

Agreed...there's no way we can break compatibility with most existing puppet modules. That would be some serious, doomsday-level awfulness. Whatever we come up with in this area has to work with the code that's out there, and that's definitely the plan.

The vibe I'm getting from this line of feedback is that we should perhaps better articulate the longer-term plan around the native compiler in general, instead of focusing on increments (like .xpp) that, absent the larger context, may seem unhelpful in their own right?
 
But let's get back to the next point in your proposal, "requirements":

* publishing modules as XPP: I guess building an AST for a module would
take less time than checking out the very same module with r10k from
your local GIT repository. Even with "slow Ruby code". So IMO there are
no real benefits for this, but lots of potential pitfalls, insecurities,
bugs. If you need this to provide obfuscated Enterprise-only modules in
the future... well, it's your choice.

I agree...I'd be inclined to make this a non-goal.

 
"Databases are slow"

We had active-records hammering our databases. The conclusion wasn't
that someone with SQL knowledge should design a good schema. The
publicly stated reasoning was "well, databases are slow, so we need more
cores to hammer the database, Ruby has no threading, Clojure is cool".
It still was slow by the end, so we added a message queue, a dead letter
office and more to the mix.

With respect, I think this is a pretty unfair retelling of history. Even in the Puppetconf talk where I introduced PDB, this was not the story. I'm comfortable letting all the public video footage of us discussing the rationale rebut this.

Queueing had nothing to do with speed of persistence, as opposed to providing a sink for backpressure from the DB. Without that, agent runs would simply fail when writes timed out even if those agents did no resource collections as part of compilation. The dead-letter-office is unrelated to performance; it's a place to put data that couldn't be processed, so we can debug it more thoroughly (something that has been directly responsible for a number of important bugfixes and robustness improvements). Without that, debugging storage problems was quite difficult.


[...]

Storing an average catalog (1400 resources, cached JSON is 0.5-1 MB)
takes, as far as I remember, less than half a second every time. For
most environments something similar should perfectly be doable to
persist catalogs as soon as compiled. Even in a blocking mode with no
queue and a directly attached database in plain Ruby.

If writes take 0.5 seconds, then you'd start failing agent runs on any site that was > 3600 nodes using a 30 minute runinterval. At that point you'd have requests coming in faster than your ability to persist data (and even this is being charitable, because in real systems the load isn't perfectly spread out). Thus the whole point of queueing and optimizing the storage pipeline. This is a more complex problem than many folks realize.
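
Spelling out the arithmetic behind that ceiling (numbers taken from the
paragraph above):

    run_interval  = 30 * 60          # seconds between runs for one node
    write_seconds = 0.5              # time to persist one catalog, serially
    max_nodes     = run_interval / write_seconds
    puts max_nodes                   # => 3600.0 - beyond this, catalogs arrive
                                     #    faster than they can be persisted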

In any case, if you've got your own well-tuned system for persisting catalogs, you can use that in place of puppetdb if you like. You could reuse the puppetdb terminus and swap your backend in (the wire formats are documented here https://docs.puppetlabs.com/puppetdb/4.0/api/wire_format/catalog_format_v6.html, and the spec for queries are documented in the same spot). Is your code in a place where you can open source it?

 
"Facter is slow"

Reasoning: Ruby is slow, we need C++ for a faster Facter. But Facter
itself never was the real problem. When loaded from Puppet you can
neglect also it's loading time. The problem were a few silly and some
more not-so-good single fact implementations. cFacter is mostly faster
because those facts been rewritten when they were implemented in C++.

I think that's partly true; rewriting some of the core facts definitely sped things up. But I wouldn't underestimate the impact of porting over the engine that surrounds those facts. Just about every possible part of facter runs much faster and uses much less memory while maintaining compatibility with custom facts. Also, we don't fork and execute Ruby or anything...it's embedded. At this point, we've got native facter, and folks can compare for themselves how fast and lean it is relative to previous versions.

The point about it being harder for folks to quickly fix facts behaving weird on their systems is one worth talking more about. Would you mind starting a separate thread about the debugging experience so we could talk through that independent of the xpp discussion?



"Puppet-Master is slow"

Once again, Ruby is slow we learned. We got Puppet Server. I've met (and
helped) a lot of people that had severe issues with this stack. I'm
still telling anyone not to migrate unless there is an immediate need
for doing so. Most average admins are perfectly able to manage and scale
a Ruby-based web application. To them, Puppet Server is a black box.
Hard to manage, hard to scale. For many of them it's the only Java-based
application server they are running, so no clue about JVM memory
management, JMX and so on.

This is also good feedback, and something that's worth its own thread around the usability/manageability/scalability problems you see. I'd love to have more of a conversation about how to improve things in those areas!

I do think it's worth keeping in mind that there are more puppet users now than ever; it's a very big tent. In my humble opinion, generalizations about what "most average admins" can do are increasingly fraught with peril the bigger and more diverse our user base has gotten.

 
 All I'm interested in is running my beloved
Puppet hassle-free in production, not wasting my time for caring about
the platform itself. I'd prefer to dedicate it to lots of small ugly
self-written modules breaking all of the latest best practices I can
find on the web ;-)

Very well said. :)

deepak

R.I.Pienaar

unread,
Apr 1, 2016, 2:59:18 PM4/1/16
to puppet-dev
yes please, it will make putting this stuff in context much easier.

> This is also good feedback, and something that's worth its own thread
> around the usability/manageability/scalability problems you see. I'd love
> to have more of a conversation about how to improve things in those areas!
>
> I do think it's worth keeping in mind that there are more puppet users now
> than ever; it's a very big tent. In my humble opinion, generalizations
> about what "most average admins" can do are increasingly fraught with peril
> the bigger and more diverse our user base has gotten.

Indeed and if you recall there was a similar outcry when passenger became
the de facto way. The java stack as delivered by PL in PuppetDB and Server is
a LOT more manageable than the passenger stack.

One just has to take the time to learn it - just like they did the passenger
stack. Unlike the passenger stack you'll then discover the thing can actually
be monitored in depth and have very mature admin tools.

Thomas Gelf

unread,
Apr 1, 2016, 7:19:11 PM4/1/16
to puppe...@googlegroups.com
Hi Henrik,

thanks a lot for your response!

Am 01.04.2016 um 20:21 schrieb Henrik Lindberg:
> The C++ implementation is several orders of magnitudes faster than the
> ruby implementation. i.e. something silly like tens of thousands of
> times faster.

No doubt on this, I believe you without any benchmark.

> The Ruby lexing/parsing and validation alone can take minutes on a
> complex set up. We have shown earlier through benchmarks that lexing
> alone is a bottleneck in any catalog compilation - every optimization
> there contributes greatly to the bottom line.

Could you share some details on this? What kind of catalogs are you
talking about? How many resources, parameters, how large are they - and
what makes them so large and slow? Still, no doubt that C++ will be able
to lex the same catalogs in a fraction of the time.

> We have already measured the approach. The benefit on the ruby side is
> that the lexing is delegated to a native implementation that reads
> binary. A spike was performed with Ruby Marshal, which also compared to
> a native MsgPack.

Ok, so basically a linked c-based lexer could give the same performance
boost? Yes, I know, JRuby. But still, could this be true?

> The main point here is that we are transitioning to a full
> implementation of the puppet catalog compiler to C++ ... The use of XPP
> makes this possible.

This is where I started to feel no longer comfortable while reading the
proposal. No caching mechanism that helped me in Puppet comes to my
mind, but I could immediately tell a lot of anecdotes involving severe
Puppet issues breaking whole environments just because of caching issues.

> We are happy if we initially only get 5-10% out of this...

And this is where I currently disagree. Very often I invest lots of time
for just 1%. But being able to run without a fragile caching layer could
be worth even 50% as long as I'm able to scale. When someone has to stop
a deployment chain because he needs to troubleshoot a caching layer,
lots of people are sitting around and cannot work. Ask them whether
they would have preferred to buy more hardware.

> We are hoping for more though.

I hope you're pretty confident on this ;)

> That is not the intent (drop all Ruby things) - our plan is to make a
> smooth transition. All "boil the ocean" strategies tends to fail, so
> backwards compatibility and gradual change is important to us.

Eeeeeeh... Sorry, this is slightly OT, but in the end it isn't. This
"transition" is the root cause for a proposal with the potential for a
lot of additional trouble. You do not want to "drop all the Ruby
things", but you want to have a smooth transition. Without knowing where
this transition should lead this sounds like a contradiction to me.

So, where will this transition phase lead to? That's IMO the question that
many would love to see answered. Will Ruby still be there? So where is
the transition? If it won't, what would its successor look like? I guess
you know what I mean, please enlighten us!

>> In a current Puppet ecosystem a C++ parser able to generate an AST from
>> a .pp file to me still seems far from anything that could completely
>> replace the current Ruby-based parser in a helpful way very soon. At
>> least not in a real-world environment with lot's of modules, custom
>> functions and external data sources, often provided by custom lookup
>> functions. At least not in a way that would bring any benefit to the
>> average Puppet user.
>
> The goal is to do this transparently.

Sorry, couldn't follow you. Referring to what?

>> So, to me the former one remains a key question to the performance
>> benefit we could get from all this. As long as the Ruby runtime is
>> supported, I do not really see how this could work out. But this is just
>> a blind guess, please prove me wrong on this. ... But then we should
>> add something else to the big picture: how should we build custom
>> extensions and interfaces to custom data in the future? Forking plugins?
>
> That topic is indeed a big topic, and one that will continue as we are
> working towards a C++ based environment. The key here is
> interoperability where extensions are supported in Ruby, or in a
> language it makes sense to implement them in.

Shouldn't those questions be answered first? Aren't external data
lookups, Hiera, Database persistence, plugin-sync, file-shipping and all
the rest still far more expensive than the lexer? I would love to
understand how I should expect to do my daily work in a world after
the "smooth transition away from Ruby".

It's hard to judge the value of a brick without knowing what the expected
building should look like.

> Expect to see a lot more about this later in the game.

I'm sure I will. But Eric asked for feedback on XPP right now ;)

> It is anything but academic. What we could do, but are reluctant to do
> is to link the C++ parser into the ruby runtime...

I would immediately support that approach!

> ...it would still need to pass the native/ruby object barrier - which
> XPP is handling - if linked into memory it would just be an internal
> affair.

Correct. I have no problem with this part of "XPP". Eric presented it as
"something like pyc", being pre-parsed and therefore behaving like a
caching layer. This week I worked for a European national bank, brought
them Puppet Enterprise, deployments are rare and well planned. It would
work for them. In ten days I work for a customer where I see Puppetfile
commits every two minutes, r10k and more, OpenSource Puppet, all
environments changing and moving all the time. Not only wouldn't they
benefit from some "intelligent" caching layer. I bet they would suffer.
Badly.

So: C++ Lexer -> fine. Linked into Ruby -> my preferred variant. Using
an XPP-like interface: also fine. ".pyc"-like precaching: no. This is
what I'm completely against right now, this is where I see no real
advantage. Please postpone this, do not even suggest storing those
files. Let the lexer grow and mature, then let's re-evaluate whether
polluting our modules (or mirrored structures) with all those files
would make any sense.

>> But please do not forget that the extensibility of a tool is one of the key
>> features of any OpenSource software. ...breaking them is a no-go.
>
> Backwards compatibility and interop is of the utmost concern. We believe
> that breaking things apart and specifying good APIs and providing well
> performing communication between the various parts of the system is key
> to moving away from the now quite complicated and slow monolithic
> implementation in Ruby.

Cool! I know I repeat myself, but could you already leak some details
on how this Ruby-less "interop" will look?

>> * longevity of file formats: ... An AST would per definition be a lot
>> more fragile. Why should we believe that those cache files would survive
>> longer?
>
> Because they are well defined as opposed to how things were earlier where
> things just happened to be a certain way because of how it was
> implemented. Knowing what something means is the foundation that allows
> it to be transformed. And when something is "all data" as opposed to
> "all messy code", it can be processed by tools.

I would mostly agree, but experience teaches me to not trust such
statements. And your problem is: an AST is not data. It cannot be
represented in a defined structure. And we are in a phase where even
data types are still subject to change, with lots of new related
features in 4.4. All this would affect an AST, wouldn't it?

This wouldn't be an issue for the "C++ is our lexer" approach, but it is
obviously essential when XPP will be used as cache files, designed to be
shipped with modules.

> As an example - what makes things expensive in Ruby is creation of many
> objects and garbage collection. (in lexing, each and every character in
> the source needs to be individually processed... When this is done with
> a C++ serializer all of the cost is on the serializing side...

Ruby didn't impress me with its deserialization speed either. So some
cost will still be there in our overall picture. I blindly believe that
the C++ lexer is way faster. But the only number I'm interested in is
the difference between "catalog built and shipped by Ruby" and
"catalog built while being lexed with C++, serialized, deserialized with
Ruby and shipped with Clojure". That's the real saving.

> (These secondary effects have not been benchmarked in puppet, but has
> proven to be very beneficial in implementations we have used in the past).

Would be interesting. Languages behaving similarly have proven to
outperform "better" ones in specific use cases even if wasting a lot
more memory. But honestly, no, it doesn't really interest me. But I'd
love to learn more about what kind of catalogs you were talking about
when you are facing minutes(!) of lexing time.

Even lots of .pp files summing up to a few thousand single resources
shouldn't require more than 10-30 MB of lexing memory (blind guess,
didn't measure) or more than 3 seconds of parsing/validation time in
Ruby. None of the large environments I'm playing with are facing such
issues.

Disclaimer: all "my" large ones are still running 3.x, so no idea
whether 4.x and/or Puppet Server is so much slower - but I don't think
so. And usually when catalogs tend to have tens of thousands of
resources the root cause is quickly identified and easily replaced with
a cheaper approach. Something like "Use a custom function, aggregate on
the master, ship a single file instead of thousands" more than once
helped to bring Puppet runs from lasting more than half an hour down to
10 seconds.
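
For illustration, such an aggregating function written against the modern
(4.x) Ruby function API could look roughly like this - module name, file
layout and the snippet directory are all invented for the example:

    # mymodule/lib/puppet/functions/mymodule/aggregate_files.rb
    #
    # Concatenates a directory of small config snippets on the master into a
    # single string, so the catalog manages one file resource instead of
    # thousands of tiny ones.
    Puppet::Functions.create_function(:'mymodule::aggregate_files') do
      dispatch :aggregate_files do
        param 'String', :snippet_dir
      end

      def aggregate_files(snippet_dir)
        Dir.glob(File.join(snippet_dir, '*.conf')).sort.map do |path|
          File.read(path)
        end.join("\n")
      end
    end

In the manifest that becomes a single file resource with
content => mymodule::aggregate_files('/etc/puppetlabs/snippets') instead of
thousands of individually managed files.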

Back to my question: could you let us know what kind of catalogs tend to
require minutes of lexing time?

> These concerns are shared. It is the overall process more than the lower
> level technical things that I worry about getting right.

:)

> The requirements and exactly how/when/where XPPs gets created and used
> will require an extra round or two of thought and debate.

Agreed. C++ Lexer, AST handed over to Ruby, linked or not: go for it.
XPPs on my disk: please not. Not yet. Not unless we have more experience
with the new lexing construct. Not unless we know how to tackle various
potential caching pitfalls in endless customized variants of Puppet
module deployments.

> Thank you Thomas for all of the valuable comments and insights.

Thank you for reading all this, Henrik - and thanks a lot for sharing
your thoughts!

Cheers,
Thomas


Thomas Gelf

unread,
Apr 1, 2016, 9:30:47 PM4/1/16
to puppe...@googlegroups.com
Hi Deepak,

great to hear from you. Didn't expect you to join the conversation, but
as I offended your baby... sorry for this ;)

Am 01.04.2016 um 20:42 schrieb Deepak Giridharagopal:

> ... Maybe the thing to do here is to try some experiments and post back
> some numbers that could hopefully ground the discussion with some data?

Absolutely, I guess this is one of the things I was indirectly asking
for. No objection against letting C++ do the lexing work. But please
let's get some numbers before introducing the next caching mechanism.

> The vibe I'm getting from this line of feedback is that we should
> perhaps better articulate the longer-term plan around the native
> compiler in general, instead of focusing on increments (like .xpp) that,
> absent the larger context, may seem unhelpful in their own right?

Yes, please - that would be awesome!

> "Databases are slow"
>
> We had active-records hammering our databases. The conclusion wasn't
> that someone with SQL knowledge should design a good schema. The
> publicly stated reasoning was "well, databases are slow, so we need more
> cores to hammer the database, Ruby has no threading, Clojure is cool".
> It still was slow by the end, so we added a message queue, a dead letter
> office and more to the mix.
>
>
> With respect, I think this is a pretty unfair retelling of history. Even
> in the Puppetconf talk where I introduced PDB, this was not the story.
> I'm comfortable letting all the public video footage of us discussing
> the rationale rebut this.

Sorry Deepak, it wasn't meant like this. But yes, I was sitting there in
2012, listening to your talk. You're absolutely right, those were not
your words, you said nothing like this. But given the context this was
the message one could have read between the lines. At least I did.

The introduction of PuppetDB was that single piece that was "a little
too much" for me at that time. Remember, with this we started to have to
run two different RDBMS in parallel (old Dashboard), two distinct
message queues (MCO was also there), Ruby, Java, Clojure... for many
people just to configure NTP and Apache "the right way".

It wasn't about PuppetDB, that's a cool product. And it's getting better
with every release. It was about the Puppet picture as a whole. I was
sitting there and listened, I "hated" you in that moment. I had the
chance to personally meet you twice. You are a very intelligent,
handsome and friendly person. And I never thought I would state this in
public. But yes, I was sitting there and I really hated you. Please
believe me, it was nothing personal. I hope you will forgive me some
far-off day ;-)

> Queueing had nothing to do with speed of persistence...

That's true. There is nothing wrong with queuing. What I wanted to state
was that at that time for most if not all "storeconfig users" a better
schema and well-thought queries would have solved all of their issues.
Even if blocking and run by plain Ruby.

> The dead-letter-office is unrelated to performance ... so we can debug
> it more thoroughly

Sure, absolutely makes sense. But how many people do you think know that
it even exists? It's a perfectly valid feature, a wonderful addition to
solve a specific problem. I would never drop it. In the context of shiny
new XPP this should just have been an example for "a far simpler
solution would eventually also work out".

> If writes take 0.5 seconds, then you'd start failing agent runs on any
> site that was > 3600 nodes using a 30 minute runinterval.

It's 1400 new, unseen resources in 0.5 seconds. So, nearly 3000 inserts a
second, single connection, index rebuilding, full ACID, no cheating, no
SSD. No problem to run lots of those transactions in parallel I guess.
Single transaction, so no excessive syncing/index update involved.

MUCH faster on the next catalog for the same node, as 99% of the
resources will already be there. One single query figures out which
resources are to be stored. Nothing complex, but not that bad either. I
mentioned those numbers with the intent to say "Hey, if we need seconds
to build a catalog plus time to serialize and ship that catalog, a
fraction of a second to persist it should be fine."

PuppetDB is a great product. It also was a big investment.

> In any case, if you've got your own well-tuned system for persisting
> catalogs, you can use that in place of puppetdb if you like. You could
> reuse the puppetdb terminus and swap your backend in (the wire formats
> are documented
> here https://docs.puppetlabs.com/puppetdb/4.0/api/wire_format/catalog_format_v6.html,
> and the spec for queries are documented in the same spot). Is your code
> in a place where you can open source it?

No objection against open sourcing it. I also have no problem with
completely dropping it while trying to bring some of its ideas to
PuppetDB. In case I'm going to make it public, I'd however need to sort
out some ideas. Just to give you an idea of what else is in the mix:

* first, I absolutely didn't want to write a new PuppetDB. It was meant
to be a quick & dirty tool to diff lots of catalogs in preparation for
a migration to Puppet 4. Got other tasks to do and it was then left
untouched for a long time. It doesn't even have a Git repo.

* of course with some cleanup it could perfectly be used as a serious
alternative puppetdb terminus. But hey, there is PuppetDB, "just another
database" makes not much sense to me

* its fact and catalog diffing capabilities are fantastic, even in its
early stage. Before making it public I'd however love to figure out
whether and how it could scale out

* I have some very simple but nice ideas for a better Puppet module
lifecycle management in combination with this db. Wouldn't it be great
to be able to restore the exact module combination used to build a
specific historic catalog?

* I first played with the puppetdb terminus but then really fast opted
for an "inversion of control". To gain more flexibility I decided to
feed the compiler by myself. Remember, it was about running multiple
Puppet and Facter versions and comparing their catalogs. So I ended up
with flags allowing me to switch Puppet and Facter version for every
single run.

* At that time I also played a lot with Puppet over SSH. I recently
mentioned this in an Ignite talk:

http://cfgmgmtcamp.eu/schedule/ignites/Thomas.pdf

It wasn't meant as a serious proposal, it was about encouraging people
to think around the corner. But hey, it works. And as pluginsync and
file-shipping are amazingly fast that way, it easily outperforms "legacy"
Master/Agent communication.

I did different attempts to let all those prototypes work together in a
meaningful way. You can imagine what it looks like right now ;) Sure,
I'd love to make this little baby become a shiny new tool loved by many
Puppet users. However, I'm currently bound in too many parallel
projects, and sadly most of them not directly related to Puppet at all.
It would require 2-3 dedicated weeks to get ready for a first useful
release with some related documentation.

> The point about it being harder for folks to quickly fix facts behaving
> weird on their systems is one worth talking more about. Would you mind
> starting a separate thread about the debugging experience so we could
> talk through that independent of the xpp discussion?

You're right, this is the wrong thread for this. However, I wouldn't
even know how to start such a new one. I have no easy solution to
propose for this issue. A fact that behaves strangely in Ruby facter can
be fixed. Even on an outdated Linux or a Unix system like AIX, Solaris,
HP-UX, whatever. cFacter IS faster. But it's an all-or-nothing choice.
Like AIO ;)

> I do think it's worth keeping in mind that there are more puppet users
> now than ever; it's a very big tent. In my humble opinion,
> generalizations about what "most average admins" can do are increasingly
> fraught with peril the bigger and more diverse our user base has gotten.

Full ack.

Thank you Deepak for your comments, and sorry again for my little rant
against PuppetDB.

Cheers,
Thomas




Thomas Gelf

unread,
Apr 1, 2016, 9:36:08 PM4/1/16
to puppe...@googlegroups.com
Am 01.04.2016 um 20:59 schrieb R.I.Pienaar:
> Indeed and if you recall there was a similar outcry when passenger became
> the de facto way. The java stack as delivered by PL in PuppetDB and Server is
> a LOT more manageable than the passenger stack.

I do not agree on this, at least I never had any issues with passenger.
Puppet was the tool I learnt Ruby with, so no former experience. I did
"apt-get install puppetmaster-passenger" (or similar) and it worked.

> One just has to take the time to learn it - just like they did the passenger
> stack. Unlike the passenger stack you'll then discover the thing can actually
> be monitored in depth and have very mature admin tools.

Absolutely!

Henrik Lindberg

unread,
Apr 3, 2016, 9:21:46 PM4/3/16
to puppe...@googlegroups.com
On 02/04/16 01:18, Thomas Gelf wrote:
> Hi Henrik,
>
> thanks a lot for your response!
>
> Am 01.04.2016 um 20:21 schrieb Henrik Lindberg:
>> The C++ implementation is several orders of magnitudes faster than the
>> ruby implementation. i.e. something silly like tens of thousands of
>> times faster.
>
> No doubt on this, I believe you without any benchmark.
>

Although, I got an order or two too carried away with superlatives here.
We will probably end up with 100s to 1000s of times faster.

>> The Ruby lexing/parsing and validation alone can take minutes on a
> complex set up. We have shown earlier through benchmarks that lexing
>> alone is a bottleneck in any catalog compilation - every optimization
>> there contributes greatly to the bottom line.
>
> Could you share some details on this? What kind of catalogs are you
> talking about? How many resources, parameters, how large are they - and
> what makes them so large and slow? Still, no doubt that C++ will be able
> to lex the same catalogs in a fraction of the time.
>

It is pretty much linear in the amount of source text. Though, depending
on which kind of performance we are talking about here, it naturally
has zero impact when an environment is cached since everything
is parsed just once.

>> We have already measured the approach. The benefit on the ruby side is
>> that the lexing is delegated to a native implementation that reads
>> binary. A spike was performed with Ruby Marshal, which also compared to
>> a native MsgPack.
>
> Ok, so basically a linked c-based lexer could give the same performance
> boost? Yes, I know, JRuby. But still, could this be true?
>

A c++ based lexer would indeed be beneficial. The parser is already
using a c++ driver, but it makes judicious call-outs to Ruby. The
construction of the AST is the second bottleneck - on par with lexing.

Thirdly, the decoupling of the lexing/parsing from the runtime makes it
possible to tabulate and compact the AST (as data is serialized). That
is less meaningful if done in the same process since it takes time
although it would reduce the number of objects managed by the ruby
runtime (less memory, lighter job for the GC).
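
To make "tabulate and compact" a bit more concrete, here is a toy sketch of
the idea (the tree shape and the encoding are invented): every string in the
tree is stored once in a table and replaced by an integer index, so the
deserializing side creates far fewer objects:

    def tabulate(node, table, index)
      case node
      when String
        index[node] ||= (table << node).size - 1
      when Array
        node.map { |n| tabulate(n, table, index) }
      when Hash
        node.each_with_object({}) do |(k, v), h|
          h[tabulate(k, table, index)] = tabulate(v, table, index)
        end
      else
        node
      end
    end

    ast    = [{ 'type' => 'LiteralString', 'value' => 'x' },
              { 'type' => 'LiteralString', 'value' => 'y' }]
    table  = []
    packed = tabulate(ast, table, {})
    # table  => ["type", "LiteralString", "value", "x", "y"]
    # packed => [{0=>1, 2=>3}, {0=>1, 2=>4}]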

We felt that although doable (we have this as an alternative: link the
lexer/parser/validator into the runtime, MRI or Puppet Server/JRuby),
that would give us a number of interesting technical challenges to solve
that we are not sure are the right problems to solve. It may be - we are
not sure - but initially we would rather spend time making the c++ based
lexer/parser/validator as good as possible (rather than solving the
technical packaging in the most complicated way).

>> The main point here is that we are transitioning to a full
>> implementation of the puppet catalog compiler to C++ ... The use of XPP
>> makes this possible.
>
> This is where I started to feel no longer comfortable while reading the
> proposal. No caching mechanism that helped my in Puppet comes to my
> mind, but I could immediately tell a lot of anecdotes involving severe
> Puppet issues breaking whole environments just because of caching issues.
>
>> We are happy if we initially only get 5-10% out of this...
>
> And this is where I currently disagree. Very often I invest lots of time
> for just 1%. But being able to run without a fragile caching layer could
> be worth even 50% as long as I'm able to scale. When someone has to stop
> a deployment chain because he needs to troubleshoot a caching layer,
> lot's of people are sitting around and cannot work. Ask them whether
> they would have preferred to buy more hardwar.
>

We have to start somewhere and when doing so we want to apply the KISS
principle. The intent is for puppet server to automatically keep the XPP
files in sync. There may be no need for "caching" - it is simply done as
a step in an atomic deploy of modified puppet code.

>> We are hoping for more though.
>
> I hope you're pretty confident on this ;)
>
Yes, but I am unsure how quickly we get there and, if we face a tradeoff
between doing more work on the Ruby side vs. spending the time on the c++
side to make more parts of compilation work (i.e. integrating the "ruby
legacy"), what we will actually decide to do. There is some fairly
low-hanging fruit on the Ruby side we could speed up, like the AST model
itself.

>> That is not the intent (drop all Ruby things) - our plan is to make a
>> smooth transition. All "boil the ocean" strategies tends to fail, so
>> backwards compatibility and gradual change is important to us.
>
> Eeeeeeh... Sorry, this is slightly OT, but by the end it isn't. This
> "transition" is the root cause for a proposal with the potential for a
> lot of additional trouble. You do not want to "drop all the Ruby
> things", but you want to have a smooth transition. Without knowing where
> this transition should lead this sounds like a contradiction to me.
>
> So, where will this transit phase lead to? That's IMO the question that
> many would love to see answered. Will ruby still be there? So where is
> the transition? If it won't, how would it's successor look like? I guess
> you know what I mean, please enlighten us!
>

It is too premature to describe this in detail. Happy to share the ideas
which we plan to pursue though.

At this point we have decided to try an approach where the c++ compiler
will use an RPC mechanism to talk to co-processors. When doing so it
will use the same serialization technology that is used in XPP
(basically based on the Puppet Type System). (Rationale: linking native
things into the same memory image is complex and creates vulnerabilities).
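
Since that RPC mechanism is explicitly not designed yet, the following is
only a sketch of the general shape being described: a Ruby co-processor that
reads length-prefixed MsgPack requests on stdin, dispatches them to a Ruby
function and writes MsgPack responses to stdout. The framing, the message
fields and the function registry are all invented for illustration:

    require 'msgpack'

    STDIN.binmode
    STDOUT.binmode

    # Hypothetical registry of Ruby functions the compiler may call out to.
    FUNCTIONS = {
      'mymodule::lookup_secret' => ->(name) { "secret-for-#{name}" }
    }

    loop do
      header = STDIN.read(4) or break   # 4-byte big-endian length prefix
      request = MessagePack.unpack(STDIN.read(header.unpack('N').first))

      fn = FUNCTIONS[request['function']]
      response = if fn
                   { 'value' => fn.call(*request.fetch('args', [])) }
                 else
                   { 'error' => "unknown function #{request['function']}" }
                 end

      payload = MessagePack.pack(response)
      STDOUT.write([payload.bytesize].pack('N') + payload)
      STDOUT.flush
    end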

The actual implementation of types and providers is not needed
at compile time - only the meta data (that a resource type exists
basically + a few other details) - today we do not even validate the
attribute values of a resource type at compile time - that takes place
when the catalog is applied. Thus, all we need at compile time is the
meta data for the resource types. For this we have just started to
explore ways to provide this (tools to extract the information from the
implemented types and providers). We will continue with this after XPP.
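
To make "only the meta data" concrete, the extracted information for a single
resource type might look something like the following - the shape is purely
illustrative, not a defined format:

    # Hypothetical metadata the compiler needs about a resource type:
    # names and a few details, no provider or implementation code.
    SERVICE_TYPE_METADATA = {
      'name'            => 'service',
      'title_attribute' => 'name',
      'attributes'      => %w[ensure enable hasrestart hasstatus provider],
      'features'        => %w[refreshable]
    }

A tool run at module build or deploy time could dump one such entry per type
to a file the compiler reads, instead of loading the Ruby implementation.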

That pretty much leaves functions written in Ruby, and hiera backends.
As a hiera backend/data provider can be thought of as a function as well,
we believe that the RPC based approach will work fine. This is also to be
continued after XPP (as we will then have the serialization/deserialization
parts in place in both the c++ and ruby implementations).

In the long run, in general, we want it to be possible to express as
much as possible using the Puppet Language itself, and where that is not
practical, that it is easy to integrate an implementation (written in
c++, ruby, or whatever the logic is best written in for the target).

>>> In a current Puppet ecosystem a C++ parser able to generate an AST from
>>> a .pp file to me still seems far from anything that could completely
>>> replace the current Ruby-based parser in a helpful way very soon. At
>>> least not in a real-world environment with lot's of modules, custom
>>> functions and external data sources, often provided by custom lookup
>>> functions. At least not in a way that would bring any benefit to the
>>> average Puppet user.
>>
>> The goal is to do this transparently.
>
> Sorry, couldn't follow you. Referring what?
>

"this" = integrating existing Ruby code that are implementations of
functions and lookup backends. (I think the answer to the question above
outlines the ideas for how we think this will work.

>>> So, to me the former one remains a key question to the performance
>>> benefit we could get from all this. As long as the Ruby runtime is
>>> supported, I do not really see how this could work out. But this is just
>>> a blind guess, please prove me wrong on this. ... But then we should
>>> add something else to the big picture: how should we build custom
>>> extensions and interfaces to custom data in the future? Forking plugins?
>>
>> That topic is indeed a big topic, and one that will continue as we are
>> working towards a C++ based environment. The key here is
>> interoperability where extensions are supported in Ruby, or in a
>> language it makes sense to implement them in.
>
> Shouldn't those questions be answered first? Aren't external data
> lookups, Hiera, Database persistence, plugin-sync, file-shipping and all
> the rest still far more expensive than the lexer? I would love to
> understand how my I should expect to do my daily work in a world unless
> the "smooth transition away from Ruby".
>

We (puppet labs) are working on many fronts here. XPP is not the only
work going on to speed things up in the overall process. Other teams
should talk about those things.

> It's hard to judge the value of a brick without knowing how the expected
> building should look like.
>
>> Expect to see a lot more about this later in the game.
>
> I'm sure I will. But Eric asked for feedback on XPP right now ;)
>

:-)
Some clues to what we are thinking are given above. Cannot promise when we
have something more concrete to talk about - would love to be able to do
so around next Puppet Conf.

>>> * longevity of file formats: ... An AST would per definition be a lot
>>> more fragile. Why should we believe that those cache files would survive
>>> longer?
>>
>> Because they are well defined as opposed to how things were earlier where
>> things just happened to be a certain way because of how it was
>> implemented. Knowing what something means is the foundation that allows
>> it to be transformed. And when something is "all data" as opposed to
>> "all messy code", it can be processed by tools.
>
> I would mostly agree, but experience teaches me to not trust such
> statements. And your problem is: an AST is not data. It cannot be
> represented in a defined structure. And we are in a phase where even
> data types are still subject to change, with lot's of new related
> features in 4.4. All this would affect an AST, wouldn't it?
>
The AST is indeed a data structure, not even a very complicated one.
The rate of change has dramatically gone down. We rarely touch the
grammar and the AST itself, and the last couple of changes have been
additions. This is the benefit of the "expression based approach" taken
in the "future parser" - the semantics are not implemented in the
grammar, and they are not implemented as methods/behavior inside the AST
objects.

The operation is described by this function call:

evaluate(validate(parse(lex(source))))

the lex function produces tokens (a data structure; array of tokens)
the parse function produces AST (a tree data structure)
the validate function walks the AST and checks for semantic errors
the evaluate function walks the AST to evaluate its result (and side
effects)
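
A toy version of that pipeline in plain Ruby, just to illustrate that every
stage in and out of the AST is ordinary data (the token and node shapes are
invented and bear no resemblance to the real implementation):

    # lex: source text -> array of tokens (plain data)
    def lex(source)
      source.scan(/\d+|[+*]/).map do |t|
        t =~ /\d/ ? [:NUMBER, Integer(t)] : [:OP, t]
      end
    end

    # parse: tokens -> AST, a tree of plain hashes (left to right, no precedence)
    def parse(tokens)
      ast = { 'type' => 'Literal', 'value' => tokens.shift[1] }
      until tokens.empty?
        op  = tokens.shift[1]
        rhs = { 'type' => 'Literal', 'value' => tokens.shift[1] }
        ast = { 'type' => 'Arithmetic', 'op' => op, 'left' => ast, 'right' => rhs }
      end
      ast
    end

    # validate: walk the AST and check for semantic errors (only the root here)
    def validate(ast)
      raise 'unsupported operator' unless ast['type'] == 'Literal' || %w[+ *].include?(ast['op'])
      ast
    end

    # evaluate: walk the AST and compute its result
    def evaluate(ast)
      return ast['value'] if ast['type'] == 'Literal'
      l, r = evaluate(ast['left']), evaluate(ast['right'])
      ast['op'] == '+' ? l + r : l * r
    end

    p evaluate(validate(parse(lex('1 + 2 * 3'))))   # => 9 (left to right)

Everything up to and including the output of parse/validate is plain data
that can be serialized; only evaluate needs the runtime.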

> This wouldn't be an issue for the "C++ is our lexer" approach, but it is
> obviously essential when XPP will be used as cache files, designed to be
> shipped with modules.
>

The "shipped with modules" is what seems to be what most have concerns
about and where it seems that a "produce all of them at deploy time" is
perceived as far less complex.

As noted in the document - the requirements were thought to be where we
needed to spend more time ensuring that we define a process that works
well. (There will be revisions there :-).

>> As an example - what makes things expensive in Ruby is creation of many
>> object and garbage collection. (in lexing, each and every character in
>> the source needs to be individually processed... When this is done with
>> a C++ serializer all of the cost is on the serializing side...
>
> Ruby didn't impress me with it's unserialization speed either. So some
> cost will still be there in our overall picture. I blindly believe that
> the C++ lexer is ways faster. But the only number I'm interested in is
> the the difference between "catalog built and shipped by Ruby" and
> "catalog built while being lexed with c++, serialized, unserialized with
> Ruby and shipped with Clojure". That's the real saving.
>

Yes, it is naturally the "time to build the catalog" that everyone sees
and measures.

>> (These secondary effects have not been benchmarked in puppet, but has
>> proven to be very beneficial in implementations we have used in the past).
>
> Would be interesting. Languages behaving similar have proven to
> outperform "better" ones in specific use cases even if wasting a lot
> more memory. But honestly, no, it doesn't really interest me. But I'd
> love to learn more about what kind of catalogs you where talking about
> when you are facing minutes! of lexing time.
>

Well, not just lexing - did I say that? That was wrong. Compilations
often take minutes though. In most cases the process
validate(parse(lex(source))) shows up at the top of any compilation
profiling. Many of the other bottlenecks are more of an algorithmic nature,
and are caused by things you have also found (managing thousands of
small files instead of a large file, etc).

> Even lot's of .pp files summing up to a few thousand single resources
> shouldn't require more than 10-30 MB of lexing memory (blind guess,
> didn't measure) and more than 3 seconds of parsing/validation time in
> Ruby. None of the large environments I'm playing with are facing such
> issues.
>
> Disclaimer: all "my" large ones are still running 3.x, so no idea
> whether 4.x and/or Puppet Server is so much slower - but I don't think
> so. And usually when catalogs tend to have tens of thousands of
> resources the root cause is quickly identified and easily replaced with
> a cheaper approach. Something like "Use a custom function, aggregate on
> the master, ship a single file instead of thousands" more than once
> helped to bring Puppet runs from lasting more than half an hour down to
> 10 seconds.
>
> Back to my question: could you let us know what kind of catalogs tend to
> require minutes of lexing time?
>
Basically extrapolated from benchmarks of small/medium catalog
compilation doing non crazy stuff. It assumes though that very long
compilation times are more of the same rather than user "design flaws"
(managing lots of small things vs. larger, poor design of data lookup,
poor algorithms used for data transformation etc.).

>> These concerns are shared. It is the overall process more than the lower
>> level technical things that I worry about getting right.
>
> :)
>
>> The requirements and exactly how/when/where XPPs gets created and used
>> will require an extra round or two of thought and debate.
>
> Agreed. C++ Lexer, AST handed over to Ruby, linked or not: go for it.
> XPPs on my disk: please not. Not yet. Not unless we have more experience
> with the new lexing construct. Not unless we know how to tackle various
> potential caching pitfalls in endless customized variants of Puppet
> module deployments.
>
>> Thanks you Thomas for all of the valuable comment and insights.
>
> Thank you for reading all this, Henrik - and thanks a lot for sharing
> your thoughts!
>

To be continued over beers somewhere...

- henrik

> Cheers,
> Thomas
>
>



John Bollinger

unread,
Apr 4, 2016, 10:36:19 AM4/4/16
to Puppet Developers


On Sunday, April 3, 2016 at 8:21:46 PM UTC-5, Henrik Lindberg wrote:

In the long run, in general, we want it to be possible to express as
much as possible using the Puppet Language itself, and where that is not
practical, that it is easy to integrate an implementation (written in
c++, ruby, or whatever the logic is best written in for the target).

I have kept my language biases to myself until now, but the implementation language(s) for extension point interfaces is a technical question.  If you want to use C++ inside then that's your call, and I won't judge.  But C++ is not well suited for external interfaces, especially if you intend to ship binaries instead of relying on users to build from source.  This is mostly because C++ has no compile-time encapsulation and no standard binary interface (ABI).  If you think you have trouble managing compatibility issues now, just wait until you have to deal with third-party plugins implemented against a C++ interface -- or better, just avoid that.

I'm inclined to agree that plugins written in the Puppet language itself are a good target, and it seems that Ruby plugins are likely to be a fact of life for a long time yet.  If you want a lower-level interface as well, then you could consider C for that interface.  C can integrate fairly easily with your C++ implementation, and it provides for a more stable interface.  If you want real-world cases, consider that both Ruby and Python choose C over C++ for their native interfaces.

Of course, since we're now talking about the long run, these comments may be premature.  Nevertheless, I hope to put this in folks' heads so that some thought goes into these choices when the time comes to make them, for it's all too easy to just roll ahead with whatever seems natural.


John




Thomas Gelf

unread,
Apr 4, 2016, 1:42:14 PM4/4/16
to puppe...@googlegroups.com
Am 04.04.2016 um 03:21 schrieb Henrik Lindberg:
>>> We are happy if we initially only get 5-10% out of this...
>>
>> And this is where I currently disagree. Very often I invest lots of time
>> for just 1%. But being able to run without a fragile caching layer could
>> be worth even 50% as long as I'm able to scale. When someone has to stop
>> a deployment chain because he needs to troubleshoot a caching layer,
>> lot's of people are sitting around and cannot work. Ask them whether
>> they would have preferred to buy more hardwar.
>>
>
> We have to start somewhere and when doing so we want to apply the KISS
> principle. The intent is for puppet server to automatically keep the XPP
> files in sync. There may be no need for "caching" - it is simply done as
> a step in atomic deploy of modified puppet code.

Details aside, it seems that what we disagree on is "caching", more on
this (and a related proposal) below. Btw, until now I have experienced "real"
atomic deploys only in non-standard environments. Everything that would
currently automagically let files pop up in module directories would
have a very good chance of causing trouble for a lot of environments with
something I'd like to call "custom-tuned" deployments.

>> So, where will this transit phase lead to? That's IMO the question that
>> many would love to see answered. Will ruby still be there? So where is
>> the transition? If it won't, how would it's successor look like? I guess
>> you know what I mean, please enlighten us!
>>
>
> It is too premature to describe this in detail. Happy to share the ideas
> which we plan to pursuit though.
>
> At this point we have decided to try an approach where the c++ compiler
> will use an RPC mechanism to talk to co-processors. When doing so it
> will use the same serialization technology that is used in XPP
> (basically based on the Puppet Type System). (Rationale: linking native
> things into the same memory image is complex and creates vulnerabilities).

Honestly, I expected a little bit more on this. An "RPC mechanism to
talk to co-processors" is pretty far from what I'd call an idea of how
it should work in the future. Sorry for insisting, but this is IMO one of
the most essential questions the "we are moving to C++" strategy should
be able to answer.

To me, faster compilation at the cost of slower data lookups might
eventually not give a very good deal, just to give one example of my
concerns. Same for "forking" a replacement for custom functions. Running
co-processors sounds good at first, but after a single catalog build
they would be as dirty as they are now. And there will still be
different environments and versions for the very same "function".
Everything but a new fork at every run would have its own drawbacks and
issues, wouldn't it?

As long as "custom functions" or their replacement will be able to
generate resources (what many of them do), they will have strong
influence on the generated catalog. They are also the main reason while
caching catalogs rarely made any sense. Many of the external factors
have an unpredictable influence on them, with Facter and Hiera of course
being the most prominent ones. But back to what this section was all
about: the "ruby successor".

It doesn't have to be immediately, but please try to figure out whether
you could tell us a little bit more on this. Currently for an outsider
it feels like this is still very, very unclear. But going forward and
hoping that this issue would silently vanish over time wouldn't work I
guess.

If no decision has yet been taken, why not share some details about the
possible variants that are still in the game? I guess quite some people
would love to help out with their ideas, influenced by their very own
completely different circumstances. Extensibility and ease of
customization to me was one of the key factors of Puppet's success
story. No DSL could ever replace this.

> That pretty much leaves functions written in Ruby, and hiera backends.
> As a hiera backend/data provider can be thought of as functions as well,
> we believe that the RPC based approach will work fine. This also to be
> continued after XPP (as we then have the serialization/deserialization
> parts in place in both the c++ and ruby implementations.

RPC like in XML-RPC? Like in forking a Plugin? Like in forking a plugin
through a preforking daemon?

> In the long run, in general, we want it to be possible to express as
> much as possible using the Puppet Language itself, and where that is not
> practical, that it is easy to integrate an implementation (written in
> c++, ruby, or whatever the logic is best written in for the target).

People tend to use custom functions for the most awful hacks you have
ever seen. But it works for them, in the end it solves their very own
problems. That's what they need Puppet for: getting work done. Sometimes
dirty work. There will hardly be SQL adapters, memcaches, message
queues, LDAP and so on in the Puppet language.

> Some clues above to what we are thinking above. Cannot promise when we
> have something more concrete to talk about - would love to be able to do
> so around next Puppet Conf.

I'll be there :)

>> I would mostly agree, but experience teaches me to not trust such
>> statements. And your problem is: an AST is not data. It cannot be
>> represented in a defined structure. And we are in a phase where even
>> data types are still subject to change, with lot's of new related
>> features in 4.4. All this would affect an AST, wouldn't it?
>>
> The AST is indeed a data structure, not even a very complicated one.
> The rate of change has dramatically gone done. We rarely touch the
> grammar and the AST itself, and the last couple of changes have been
> additions. This is the benefit of the "expression based approach" taken
> in the "future parser" - the semantics are not implemented in the
> grammar, and they are not implemented as methods/behavior inside the AST
> objects.

We are back to XPP. Sorry, my wording wasn't precise enough I guess. The
non-data "thing" I meant to talk about was the already parsed and
validated AST. So for example I didn't distinguish between lexing and
parsing. What I intended to name when I talked about "AST as data" was
more "what's written to the XPP file". And from what I understood that
will at least be lexed & parsed & validated.

Probably not evaluated, because that's where from my understanding the
"it's no longer data" starts. If I'm wrong on that: nice. If not, just
out of curiosity: is evaluation in Ruby expensive?

> The "shipped with modules" is what seems to be what most have concerns
> about and where it seems that a "produce all of them at deploy time" is
> perceived as far less complex.

Let me throw in one more idea. This "produce all of them at deploy time"
will probably only work fine if "deploy" describes a specific (atomic,
as mentioned before) process. Every possible user interference could be
troublesome. Users do not want to see those files, they do not want to
pollute their GIT workdirs.

So why not "hiding" them completely? Think more of a bytecode-cache like
opcache in PHP, rather than .pyc in Python. Doesn't even have to mirror
the module directory structure. Could be flat, structured differently,
eventually binary... Store "XPP" in a dedicated place, vardir/whatever,
with that "place" referring exactly one specific environment (or module)
in a specific version.
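
A sketch of what such a hidden, checksum-keyed store could look like - the
paths and layout are invented, and keying on the source checksum also
sidesteps the modification-time concerns raised earlier, because a changed
file can never be served a stale entry:

    require 'digest'

    VARDIR = '/opt/puppetlabs/server/data/xpp-cache'   # hypothetical location

    # Cache path for one .pp file within one environment.
    def xpp_cache_path(environment, pp_file)
      sum = Digest::SHA256.file(pp_file).hexdigest
      File.join(VARDIR, environment, sum[0, 2], "#{sum}.xpp")
    end

    def cached?(environment, pp_file)
      File.exist?(xpp_cache_path(environment, pp_file))
    end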

> Basically extrapolated from benchmarks of small/medium catalog
> compilation doing non crazy stuff. It assumes though that very long
> compilation times are more of the same rather than user "design flaws"
> (managing lots of small things vs. larger, poor design of data lookup,
> poor algorithms used for data transformation etc.).

That's what I experienced too. Catalog compilation is slow, but for me
it never turned out to be the root cause of the issues I've met. Sure,
it wouldn't hurt if it was a fraction of a second instead of "a few" or
"a little bit more than a few" seconds. But I never arrived to a point
where I would have said "OMG, we need a faster compiler, otherwise we
are lost".

So, I have absolutely no problem with any optimizations getting catalogs
compiled A LOT faster. But I do not want to pay for this with the
potential trouble "yet another caching layer" could bring. I see no
problem with "this is the bytecode cache for module X in version Y". But
I see a lot of problems with "we store related cache-files directly to
our module directories". Imagine someone going there, manually, running
"git checkout v4.0.3" for a specific module. Sure, he (or his tool) is
then doing it wrong. But that's gonna be hard to argue I guess.

> To be continued over beers somewhere...

I'd love to join you :)

Thomas


Henrik Lindberg

unread,
Apr 4, 2016, 2:40:20 PM4/4/16
to puppe...@googlegroups.com
On 04/04/16 19:42, Thomas Gelf wrote:
> Am 04.04.2016 um 03:21 schrieb Henrik Lindberg:
>>>> We are happy if we initially only get 5-10% out of this...
>>>
>>> And this is where I currently disagree. Very often I invest lots of time
>>> for just 1%. But being able to run without a fragile caching layer could
>>> be worth even 50% as long as I'm able to scale. When someone has to stop
>>> a deployment chain because he needs to troubleshoot a caching layer,
>>> lot's of people are sitting around and cannot work. Ask them whether
>>> they would have preferred to buy more hardwar.
>>>
>>
>> We have to start somewhere and when doing so we want to apply the KISS
>> principle. The intent is for puppet server to automatically keep the XPP
>> files in sync. There may be no need for "caching" - it is simply done as
>> a step in atomic deploy of modified puppet code.
>
> Details apart it seems that what we disagree on is "caching", more on
> this (and a related proposal) below. Btw, until now I experienced "real"
> atomic deploys only in non-standard environments. Everything that would
> currently automagically let files pop up in module directories would
> have a very good chance to cause trouble for a lot of environments with
> something I'd like to name "custom-tuned" deployments.
>

I am saying it may be thought of as "non-caching" if each start of an
environment produces XPP for every .pp file. Since the C++ parser is much
faster we should still come out ahead. It could perhaps do that brute
force every time. If it is faster to skip doing it when files are up to
date, then we may do that.
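
To illustrate what I mean by skipping when files are up to date - it is
nothing more than a plain freshness check, sketched here in Ruby
(compile_to_xpp stands in for whatever the real call ends up being):

  # Sketch only: regenerate the XPP output for a manifest when the
  # cached file is missing or older than its .pp source.
  def ensure_xpp(pp_path, xpp_path)
    stale = !File.exist?(xpp_path) ||
            File.mtime(xpp_path) < File.mtime(pp_path)
    compile_to_xpp(pp_path, xpp_path) if stale   # assumed helper
  end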

>>> So, where will this transition phase lead to? That's IMO the question that
>>> many would love to see answered. Will ruby still be there? So where is
>>> the transition? If it won't, what would its successor look like? I guess
>>> you know what I mean, please enlighten us!
>>>
>>
>> It is too premature to describe this in detail. Happy to share the ideas
>> which we plan to pursue, though.
>>
>> At this point we have decided to try an approach where the c++ compiler
>> will use an RPC mechanism to talk to co-processors. When doing so it
>> will use the same serialization technology that is used in XPP
>> (basically based on the Puppet Type System). (Rationale: linking native
>> things into the same memory image is complex and creates vulnerabilities).
>
> Honestly, I expected a little bit more on this. An "RPC mechanism to
> talk to co-processors" is pretty far from what I'd call an idea of how
> it should work in the future. Sorry for insisting, but this is IMO one of
> the most essential questions the "we are moving to C++" strategy should
> be able to answer.
>

It is too early to talk about, as there are experiments to carry out,
measurements to be made, etc., and things to think through and express in
words.

> To me, faster compilation for the cost of slower data lookups might
> eventually not give a very good deal, just to give one example of my
> concerns. Same for "forking" a replacement for custom functions. Running
> co-processors sounds good at first, but after a single catalog build
> they would be as dirty as they are now. And there will still be
> different environments and versions for the very same "function".
> Everything but a new fork at every run would have its own drawbacks and
> issues, wouldn't it?
>

First, we have new hiera 4 based data providers that now live inside of
puppet (for json and yaml). They will be reimplemented in C++ and live
inside of the main process.
I don't necessarily think that a lookup using an RPC will be any slower
than one that is currently doing the same work in Ruby. I think it will
come out on par.

Regarding the life cycle of co-processors: this is not yet designed. I am
inclined to keep things simple - compilation is a one-shot,
co-processors hang around until compilation is done, then everything is
torn down. That fits well with Ruby in general as it starts fast but
runs slowly and bloats quickly. Experiments and measurements to support
the ideas are naturally required.

> As long as "custom functions" or their replacement will be able to
> generate resources (what many of them do), they will have strong
> influence on the generated catalog. They are also the main reason why
> caching catalogs rarely made any sense. Many of the external factors
> have an unpredictable influence on them, with Facter and Hiera of course
> being the most prominent ones. But back to what this section was all
> about: the "ruby successor".
>
> It doesn't have to be immediately, but please try to figure out whether
> you could tell us a little bit more on this. Currently for an outsider
> it feels like this is still very, very unclear. But going forward and
> hoping that this issue would silently vanish over time wouldn't work I
> guess.
>

When it is presented, I prefer that it be coherent and backed by some facts
and experiments. I can opine, but I don't think that is particularly
valuable as I am also prepared to change my opinion as we learn what
will work.

> If no decision has yet been taken, why not share some details about the
> possible variants that are still in the game? I guess quite a few people
> would love to help out with their ideas, influenced by their very own
> completely different circumstances. Extensibility and ease of
> customization were, to me, among the key factors of Puppet's success
> story. No DSL could ever replace this.
>

At this point we are more inclined to favor smaller things talking to
each other than a new big ball of wax. We are also going to be focused
on APIs. Cannot say that we have completely ruled anything out. At the
moment we are focusing on:

* Getting the c++ based parser to be on par with the Ruby impl (nothing
much will work unless that is done)
* Trying to provide value to users sooner rather than later (XPP).

>> That pretty much leaves functions written in Ruby, and hiera backends.
>> As a hiera backend/data provider can be thought of as a function as well,
>> we believe that the RPC based approach will work fine. This also to be
>> continued after XPP (as we then have the serialization/deserialization
>> parts in place in both the c++ and ruby implementations).
>
> RPC like in XML-RPC? Like in forking a Plugin? Like in forking a plugin
> through a preforking daemon?
>

Too early to talk about. Very unlikely that it will involve XML ;-)
It needs to be something that is very fast (i.e. this is not based on
REST) - technically some kind of IPC mechanism, or possibly a socket.
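
Purely as a thought experiment, and not a design: the Ruby side of such a
co-processor could be little more than a loop over a local socket. The
socket path, the call_function helper and the use of JSON below are all
invented for the sketch - the real serialization is undecided:

  require 'socket'
  require 'json'   # stand-in; the real wire format is not decided

  # Thought experiment: a Ruby function co-processor serving calls from
  # the compiler over a Unix socket for one compilation, then exiting.
  server = UNIXServer.new('/tmp/puppet-coprocessor.sock')
  conn = server.accept
  while (line = conn.gets)
    request = JSON.parse(line)          # e.g. {"function": "...", "args": [...]}
    result  = call_function(request['function'], request['args'])
    conn.puts(JSON.generate('result' => result))
  end
  conn.close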

>> In the long run, in general, we want it to be possible to express as
>> much as possible using the Puppet Language itself, and where that is not
>> practical, that it is easy to integrate an implementation (written in
>> c++, ruby, or whatever the logic is best written in for the target).
>
> People tend to use custom functions for the most awful hacks you have
> ever seen. But it works for them; in the end it solves their very own
> problems. That's what they need Puppet for: getting work done. Sometimes
> dirty work. There will hardly be SQL adapters, memcaches, message
> queues, LDAP and so on in the Puppet language.
>

True. What people invent and share though are typically not advanced
things like that. (90% of stdlib can probably be replaced with puppet
logic today).

>> Some clues above to what we are thinking. Cannot promise when we
>> have something more concrete to talk about - would love to be able to do
>> so around next Puppet Conf.
>
> I'll be there :)
>
>>> I would mostly agree, but experience teaches me to not trust such
>>> statements. And your problem is: an AST is not data. It cannot be
>>> represented in a defined structure. And we are in a phase where even
>>> data types are still subject to change, with lots of new related
>>> features in 4.4. All this would affect an AST, wouldn't it?
>>>
>> The AST is indeed a data structure, not even a very complicated one.
>> The rate of change has dramatically gone down. We rarely touch the
>> grammar and the AST itself, and the last couple of changes have been
>> additions. This is the benefit of the "expression based approach" taken
>> in the "future parser" - the semantics are not implemented in the
>> grammar, and they are not implemented as methods/behavior inside the AST
>> objects.
>
> We are back to XPP. Sorry, my wording wasn't precise enough I guess. The
> non-data "thing" I meant to talk about was the already parsed and
> validated AST. So for example I didn't distinguish between lexing and
> parsing. What I intended to name when I talked about "AST as data" was
> more "what's written to the XPP file". And from what I understood that
> will at least be lexed & parsed & validated.
>

Yes, lexed, parsed, validated.

> Probably not evaluated, because that's where from my understanding the
> "it's no longer data" starts. If I'm wrong on that: nice. If not, just
> out of curiosity: is evaluation in Ruby expensive?
>

No, not evaluated - that is done as part of compilation.
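
In rough pseudo-Ruby (the method names here are illustrative, not the
actual API), the split looks like this:

  # Illustrative only: an XPP file carries the lexed/parsed/validated AST;
  # evaluation still happens per node, at catalog compilation time.
  ast = if File.exist?(xpp_path)
          deserialize_xpp(xpp_path)              # skips lex/parse/validate
        else
          parse_and_validate(File.read(pp_path))
        end
  catalog = evaluate(ast, node_facts, node_data)   # never cached in XPP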

>> The "shipped with modules" is what seems to be what most have concerns
>> about and where it seems that a "produce all of them at deploy time" is
>> perceived as far less complex.
>
> Let me throw in one more idea. This "produce all of them at deploy time"
> will probably only work fine if "deploy" describes a specific (atomic,
> as mentioned before) process. Every possible user interference could be
> troublesome. Users do not want to see those files, they do not want to
> pollute their GIT workdirs.
>

Yes, exactly.

> So why not "hiding" them completely? Think more of a bytecode-cache like
> opcache in PHP, rather than .pyc in Python. Doesn't even have to mirror
> the module directory structure. Could be flat, structured differently,
> eventually binary... Store "XPP" in a dedicated place, vardir/whatever,
> with that "place" referring exactly one specific environment (or module)
> in a specific version.
>

Yes, we will probably do something like that.

>> Basically extrapolated from benchmarks of small/medium catalog
>> compilation doing non crazy stuff. It assumes though that very long
>> compilation times are more of the same rather than user "design flaws"
>> (managing lots of small things vs. larger, poor design of data lookup,
>> poor algorithms used for data transformation etc.).
>
> That's what I experienced too. Catalog compilation is slow, but for me
> it never turned out to be the root cause of the issues I've met. Sure,
> it wouldn't hurt if it was a fraction of a second instead of "a few" or
> "a little bit more than a few" seconds. But I never arrived to a point
> where I would have said "OMG, we need a faster compiler, otherwise we
> are lost".
>

It is also a matter of scale. The single-threaded performance may not
be that important in itself (whether it is 5 or 10 seconds), but when that
translates to "twice the cost", or "you cannot have that many agents on
a single master", it becomes a real problem.

> So, I have absolutely no problem with any optimizations getting catalogs
> compiled A LOT faster. But I do not want to pay for this with the
> potential trouble "yet another caching layer" could bring. I see no
> problem with "this is the bytecode cache for module X in version Y". But
> I see a lot of problems with "we store related cache files directly in
> our module directories". Imagine someone going there manually, running
> "git checkout v4.0.3" for a specific module. Sure, he (or his tool) is
> then doing it wrong. But that's gonna be hard to argue I guess.
>
>> To be continued over beers somewhere...
>
> I'd love to join you :)
>
Cheers.

- henrik

Henrik Lindberg

unread,
Apr 4, 2016, 2:47:45 PM4/4/16
to puppe...@googlegroups.com
On 04/04/16 19:42, Thomas Gelf wrote:
> Probably not evaluated, because that's where from my understanding the
> "it's no longer data" starts. If I'm wrong on that: nice. If not, just
> out of curiosity: is evaluation in Ruby expensive?

I forgot - yes, evaluation in Ruby is also slow.

- henrik

Kylo Ginsberg

unread,
Apr 5, 2016, 10:25:54 AM4/5/16
to puppe...@googlegroups.com
On Mon, Apr 4, 2016 at 7:36 AM, John Bollinger <john.bo...@stjude.org> wrote:


On Sunday, April 3, 2016 at 8:21:46 PM UTC-5, Henrik Lindberg wrote:

In the long run, in general, we want it to be possible to express as
much as possible using the Puppet Language itself, and where that is not
practical, that it is easy to integrate an implementation (written in
c++, ruby, or whatever the logic is best written in for the target).

I have kept my language biases to myself until now, but the implementation language(s) for extension point interfaces is a technical question.  If you want to use C++ inside then that's your call, and I won't judge.  But C++ is not well suited for external interfaces, especially if you intend to ship binaries instead of relying on users to build from source.  This is mostly because C++ has no compile-time encapsulation, and no standard binary interface (ABI).  If you think you have trouble managing compatibility issues now, just wait until you have to deal with third-party plugins implemented against a C++ interface -- or better, just avoid that.

You are absolutely correct, on several counts:

* C++ as a generic extension point (e.g. for 3rd party plugins) doesn't work for the reasons you point out. That is simply a technical fact about C++ and in no way specific to the Puppet ecosystem. As such, C++ as an extension point is nowhere in any plans or conversations I've ever been involved with. There seems to be some FUD around this point, so it's really important to emphasize: C++ is not an external interface point.

* It's very important to distinguish language choice for core *implementation* vs language choice for *extension* points. E.g. while some folks at Puppet Labs are working on porting some of puppet's core functionality from Ruby to either C++ or Clojure, neither of those is intended (or in the case of C++, even viable) for use as an extension point language. (Though, if you want to write external facts or puppet subcommands in C++ or Clojure or Lua or Haskell, knock yourselves out: that wouldn't be an extension point language choice, that would be a language choice for an external binary that puppet runs as instructed.)


I'm inclined to agree that plugins written in the Puppet language itself are a good target, and it seems that Ruby plugins are likely to be a fact of life for a long time yet.

Absolutely. Ruby plugins will be around for the foreseeable future and that's a good thing. At the same time, we've introduced more options for Puppet plugins (e.g. EPP, functions in puppet) and would like to add more in the future (e.g. types in puppet). I tried to capture this direction, along with the core vs extensions notion, in an "at some point in the future" slide in my PuppetConf talk last year: https://speakerdeck.com/kylog/under-the-hood-c-plus-plus-at-puppet-labs?slide=27

In short, there's a tremendous amount of fabulous puppet module content out there, written in a mix of puppet and ruby, and we absolutely intend to emphasize backwards compatibility so that that content can be leveraged while the underlying platform becomes higher performing and more scalable.
 
If you want a lower-level interface as well, then you could consider C for that interface. 

I have had a few people ask about this. Honestly I think those asks were pretty idle, but yes, if there were demand we could in theory support C as an extension point language at some point in the future. That would need some serious RFC and design work, and feels quite far out to be honest.
 
C can integrate fairly easily with your C++ implementation, and it provides for a more stable interface.  If you want real-world cases, consider that both Ruby and Python chose C over C++ for their native interfaces.

Yep, totally agreed. As you stated well above, C++ isn't suitable as an external interface language. A fairly common pattern for those who want to provide an external interface to a C++-implemented library is to wrap a C interface on top of it.
 

Of course, since we're now talking about the long run, these comments may be premature.  Nevertheless, I hope to put this in folks' heads so that some thought goes into these choices when the time comes to make them, for it's all too easy to just roll ahead with whatever seems natural.

Great comments - thanks!

Kylo
 





--
Kylo Ginsberg | ky...@puppetlabs.com | irc: kylo | twitter: @kylog


Henrik Lindberg

unread,
Apr 5, 2016, 11:25:14 AM4/5/16
to puppe...@googlegroups.com
On 30/03/16 18:24, Eric Sorenson wrote:
> Hi, I've just posted a new Puppet RFC that describes pre-parsed and
> pre-validated Puppet files, akin to '.pyc' files for Python. It's called
> XPP and the doc is open for comments here:
>
> https://docs.google.com/document/d/17SFn_2PJYcO5HjgA4R65a5ynR6_bng_Ak5W53KjM4F8/edit?usp=sharing
>

Just a quick note that I am now editing that document to make a revision
where received comments have been addressed.

Will announce when there is a second draft.

- henrik