Hi Eric,
your dedication to making Puppet faster is really appreciated. My post
is decidedly not in favor of XPP, but please don't get me wrong: it is
meant as a constructive contribution to the current design process.
In my personal opinion we have a sad history of optimizations that focus
a lot on blaming different languages and tools. Puppet often created
fancy new tools with new languages and components, but we rarely tackled
the root causes of our problems. This would be off topic here, but I'll
add a few examples at the end of this mail so you can see what I mean.
So let me start with the stated "Problems":
* Performance: I didn't do any measurements, but I guess the compiler
spends more time resolving dependencies and traversing graphs than it
does parsing and validating .pp files. Not to mention a lot of compat
hacks, alias-handling voodoo, insane Hiera lookups, type validation for
those lookups and legacy support hacks. So do you have any related
numbers? Where is most of the time spent when building and shipping
(real-world) catalogs? Are you really sure an AST cache (per manifest?!)
would be worth the effort and solve the "performance problem"? I guess
the C++ parser itself is not so slow that it already needs an AST cache;
if it were, something would be wrong with it.
* Cross-Language support: You wrote that the C++ parser needs to provide
the compiled AST to the Ruby runtime. Makes sense to me. But parsing .pp
files with C++, serializing them to a custom, not-yet-designed format,
parsing that custom format with Ruby again and then re-resolving all
(most? some?) dependency graphs across the whole catalog with Ruby...
this doesn't sound like something that could make things faster. Sure,
it would help the C++ parser hand over its AST, or store it to disk. But
would this speed up the whole process? I have serious doubts about that.
IMHO this wouldn't help much, at least not unless "drop all Ruby
interfaces in the long run" is the final goal on your agenda. In that
case please let us know. Those who want to support that goal could then
join forces to get it accomplished as fast as possible, and the others
would at least know what to expect.
In the current Puppet ecosystem, a C++ parser able to generate an AST
from a .pp file still seems to me far from anything that could
completely replace the current Ruby-based parser in a helpful way any
time soon. At least not in a real-world environment with lots of
modules, custom functions and external data sources, often provided by
custom lookup functions. At least not in a way that would bring any
benefit to the average Puppet user.
So, to me this remains the key question for the performance benefit we
could get from all this. As long as the Ruby runtime is supported, I do
not really see how this could work out. But this is just a blind guess,
please prove me wrong on this. Obviously the C++ Puppet will be faster
as soon as you drop the Ruby runtime. But then we should add something
else to the big picture: how should we build custom extensions and
interfaces to custom data in the future? Forking plugins? Talking to web
services? Because adding a C++ compiler to a (dev)ops deployment
pipeline will not convince many people, I guess.
Everything that comes to my mind has its very own performance impact. We
should know what to expect in that direction to be able to understand
what needs to be added to our (performance) calculation. As of this
writing, and from what I know from mailing lists, PuppetConf (and
Camps), the C++ parser is to me still an academic construct able to
generate an AST in a fast way. Nice for sure, but not (yet) of any
benefit in a real-world Puppet scenario. Of course I might be missing
some parts of your big picture, involving strategic product-related
features not yet known to the public.
But please do not forget that extensibility is one of the key features
of any open-source software. Ops people didn't choose good old Nagios
because of its "beautiful" frontend and its "well-designed" plugin API.
They are using it because everyone from students to 60-year-old UNIX
veterans is able to write something they like to call a "plugin". Mostly
awful snippets of Bash or Perl, not worthy of being called software. But
doing customized crazy shit running on millions of systems, available
for nearly 20 years without breaking compatibility.
Of course there is Icinga now ;) New core, C++, shiny new web... but
still running those ugly old plugins. They are awful, they are terrible,
we all hate them. But lots of people invested a lot of time in them, so
breaking them is a no-go.
No one I know currently understands how existing "interfaces" (being
plain Ruby) fit into your C++ plans. There is a lot of uncertainty
amongst (skilled) Puppet users about this right now. Some public
clarification would definitely help to calm the waters. If your plans
include dropping that part in favor of restricted EPP and DSL-based
"functions", please let us know. It will be faster, for sure. But it
will be a different product with different (restricted) possibilities.
In that case I would prefer to be among the first ones leaving the ship
instead of being treated like the famous slowly boiled frog.
But let's get back to the next point in your proposal, "requirements":
* publishing modules as XPP: I guess building an AST for a module would
take less time than checking out the very same module with r10k from
your local Git repository, even with "slow Ruby code". So IMO there are
no real benefits here, but lots of potential pitfalls, security issues
and bugs. If you need this to provide obfuscated Enterprise-only modules
in the future... well, it's your choice.
* longevity of file formats: what makes you think that Puppet will
change more slowly in the near future? Today there is no way to run many
Puppet 3.x manifests with Puppet 4.x, and those are plain .pp files. An
AST would by definition be a lot more fragile. Why should we believe
that those cache files would survive longer?
* Efficient serialization is key to the success of XPP: you name it. And
please do not forget that efficient deserialization is far more
important. It will not take zero time, and it happens as often as a .pp
file is parsed today.
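Easy to illustrate with the Ruby stdlib alone. The AST-like structure
below is made up (it is not Puppet's real AST, just nested hashes of a
similar shape), and the point is simply that load cost, not dump cost,
is paid on every catalog build:

```ruby
require 'benchmark'
require 'json'

# Hypothetical AST-like structure; not Puppet's real AST.
ast = {
  'type' => 'block',
  'children' => Array.new(5_000) do |i|
    { 'type' => 'resource', 'title' => "file_#{i}",
      'params' => { 'ensure' => 'present', 'mode' => '0644' } }
  end
}

json_blob    = JSON.generate(ast)
marshal_blob = Marshal.dump(ast)

# Deserialization is what happens as often as a .pp file is parsed
# today, so it is the side that matters most.
Benchmark.bm(14) do |b|
  b.report('JSON.parse')   { 50.times { JSON.parse(json_blob) } }
  b.report('Marshal.load') { 50.times { Marshal.load(marshal_blob) } }
end
```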
"Non-goals":
* If XPP is plain text it will obviously not be that fast, but that's
still fine with me.
* I also have no problem with a serialized format not readable by human
beings. I will happily live with any binary format as long as you keep
YAML and similar diseases far away from me ;-)
"Proposal":
* XPP file handling in general sounds good to me
* I have some doubts when it comes to checking whether such a file is
"up to date". Race conditions, and issues when people copy files around
manually, come to my mind.
* a safe way to solve this could be XPP files carrying source-file
checksums in their names, but of course that would be more costly, as it
involves generating and validating checksums all the time. Outdated XPP
files must be removed.
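A minimal stdlib-only sketch of that checksum-in-the-name idea (the
naming convention and helper names here are my own invention, not part
of the proposal):

```ruby
require 'digest'

# Hypothetical naming scheme: the cache file name embeds a prefix of the
# source checksum, so a stale cache can never be mistaken for a fresh one
# and file modification times never need to be trusted.
def xpp_path_for(pp_path)
  digest = Digest::SHA256.file(pp_path).hexdigest
  "#{pp_path}.#{digest[0, 16]}.xpp"
end

# The cache is valid only if a file matching the current checksum exists;
# any change to the source yields a different expected name.
def xpp_cache_valid?(pp_path)
  File.exist?(xpp_path_for(pp_path))
end

# Outdated caches (other checksums for the same source) must be swept.
def sweep_stale_xpp(pp_path)
  keep = xpp_path_for(pp_path)
  Dir.glob("#{pp_path}.*.xpp").each { |f| File.delete(f) unless f == keep }
end
```

The obvious cost: every compile pays for hashing every source file, which
is exactly the overhead mentioned above.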
* You know that people use r10k or custom tools to check out specific
tags or commit IDs again and again? Sometimes directly into their module
path. I work with customers where, the whole day long, someone pushes a
new Puppetfile in an automated way every 2-5 minutes. How would that fit
with your XPP model? Should Puppet (r10k, whoever) re-check/regenerate
all of them with every deployment? Every few minutes?
Also, please do not underestimate the potential pitfalls for users when
trusting file modification times. We could run into a support nightmare.
We all know that writing a cache is not an easy task.
"API, switches, protocols":
* looks good to me
"Interfaces modified or extended":
* I see there is some discussion of whether XPP files should reside in
the module directories or in a mirrored structure. Well, caught between
a rock and a hard place - good luck :D
"Diagnostics of XPP":
* msgpack: well... mmmmhok
* shebang: there are enough comments, nothing to add
* pcore part, shebang line, MIME type: you already define three
different kinds of version/subformat headers in a draft for a new
format. Not good.
* MIME type: a container for a bunch of different formats doesn't make a
good format to me. Are you really sure that implementing AST
serialization for C++ and Ruby (and others?) with different formats for
all of those is a good idea? Msgpack AND JSON (easy) AND YAML (which
version?)
* regarding YAML: how to protect against code injection? A slow
Ruby-based parser, once again?
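On the Ruby side, at least, the stdlib already draws that line between
full and restricted YAML loading; a quick sketch of what "protection"
would have to look like there:

```ruby
require 'yaml'

# Plain data is fine with the restricted loader, which only
# instantiates basic types (String, Integer, Hash, Array, ...).
doc = YAML.safe_load("---\nname: demo\ncount: 3\n")
puts doc.inspect

# Tags that instantiate arbitrary Ruby objects are the classic
# injection vector; safe_load rejects them outright.
begin
  YAML.safe_load("--- !ruby/object:Gem::Requirement {}")
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.class}"
end
```

And of course every safe_load still means running the Ruby YAML parser,
which is exactly the slowness being argued against.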
* you mention JAR files as an example. They are used for deployment
reasons, not for speed. XPP borrows some ideas from a JAR. A JAR is a
container for convenience; it makes it easy to ship multiple files.
However, it makes reading files slower, which is why they are deployed
(and extracted) on the target system. The resulting class file is what
XPP should try to look like if it wants to bring any benefit. At least
as long as you do not plan to store all .pp files of a module in a
single .xpp file - but that would be even slower for many use cases. And
please note that class files are binary for a good reason: speed.
* you mention .pyc files. They contain marshalled code objects, once
again binary and native to Python. Same story as above. There IS a
reason why they are fast. That doesn't fit our current XPP scenario with
nested text formats.
Next point, "Alternatives":
* byte-code-oriented format: absolutely. If you want a fast AST cache,
this would help. Still, please add to the calculation the time
eventually needed for evaluating the (already parsed) AST with Ruby.
* wait until the C++ compiler is implemented: also a very good idea. And
not only that: wait not only until it is implemented, but also until we
know where the whole ecosystem (Ruby functions, interfaces, Ruby-based
"indirections") should move. Once we know what they will look like, we
will know better how to tune all this. Parsing and validating plain .pp
files probably accounts for a fraction of the computing resources a
Puppet master spends today. Ruby is far from being our main problem here.
* embedding the C++ parser in Ruby would be a good and efficient
approach. Good luck with Clojure and JRuby ;)
* produce the .xpp with Ruby as well: IMO a must. You will definitely
run into compatibility issues between your different parsers, and
without this feature there is no easy way to discover them in an
automated fashion.
* fork the C++ parser: now it is getting scary. Sure, why not. But the
(de)serialization cost remains one way or the other, doesn't it?
"Additional Concerns":
* "Compatibility": when you really allow different shebang lines,
different serialization formats, XPP shipped with Forge modules,
auto-generation in your code deployment procedure, some people using
other deployment mechanisms, timestamp issues... all this together could
result in a big mess.
* "Security": you are right about "no extra impact", but I would add the
possibility of new attack vectors, possibly hidden from validation
tools, as soon as you add YAML (as mentioned in the draft) to the mix.
* "Documentation": I do not agree that this would not be necessary. XPP
(when implemented) will be a key component of all deployments. People
WILL build custom tools around it. It's better to state clearly how
things are designed instead of letting everybody figure out black magic
by themselves.
* Spin-offs: wooo... this adds a lot of new players to the mix, while
still being pretty vague. JavaScript? Then better stay with JSON and
forget about msgpack. And how should a browser handle YAML?
* C++ parser as a linter: makes sense to me.
* Encrypt XPP files: this would not make them faster. While I'm an
absolute fan of signed packages, I do not see a use for this at the XPP
file level.
That was a lot of text, sorry :) And thank you for reading all of it. My
conclusion: the XPP file draft is a premature optimization of something
fitting into an ecosystem that is still very vaguely defined. If
implemented at all, it should be postponed. I'm sure the C++ parser is a
lot faster than the Ruby-based one. But hey, if I start up IRB and
require 'puppet' (still 3.4 on my local Ubuntu desktop), it takes Puppet
0.03s to load and validate a random 5KB .pp file. That is not very fast,
but I see no urgent problem with it.
And as initially mentioned, this leads me to my last point - a few
examples of similar findings and "optimizations" we enjoyed in the past:
"Databases are slow"
We had ActiveRecord hammering our databases. The conclusion wasn't that
someone with SQL knowledge should design a good schema. The publicly
stated reasoning was "well, databases are slow, so we need more cores to
hammer the database, Ruby has no threading, Clojure is cool". It was
still slow in the end, so we added a message queue, a dead letter office
and more to the mix.
Just to give you some related numbers to compare: a year or two ago I
wrote a prototype for (fast) catalog diffs. My test DB still carries
2,400 random catalogs with an average of 1,400 resources per catalog and
18+ million single resource parameters in total. Of course there are far
fewer rows in the DB because of checksum-based "de-duplication". But
this is "real" data. The largest catalog has nearly 19,000 resources,
the smallest one 450. Once again, no fake data: real catalogs collected
over time from real environments.
Storing an average catalog (1,400 resources, cached JSON is 0.5-1 MB)
takes, as far as I remember, less than half a second every time. For
most environments, something similar should perfectly well be doable to
persist catalogs as soon as they are compiled. Even in blocking mode,
with no queue and a directly attached database, in plain Ruby.
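The order of magnitude is easy to sanity-check with nothing but the Ruby
stdlib. The catalog below is synthetic (invented resource names and
sizes, roughly matching the numbers above), and disk stands in for the
database:

```ruby
require 'benchmark'
require 'json'
require 'tmpdir'

# Synthetic catalog of ~1,400 resources; the JSON lands near the
# 0.5-1 MB range mentioned above.
catalog = {
  'resources' => Array.new(1_400) do |i|
    { 'type' => 'File', 'title' => "/etc/app/conf_#{i}",
      'parameters' => { 'ensure' => 'file', 'mode' => '0644',
                        'content' => 'x' * 300 } }
  end
}

Dir.mktmpdir do |dir|
  path = File.join(dir, 'catalog.json')
  seconds = Benchmark.realtime do
    File.write(path, JSON.generate(catalog))  # blocking, no queue
  end
  puts format('serialized %.1f KB in %.4f s',
              File.size(path) / 1024.0, seconds)
end
```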
"Facter is slow"
Reasoning: Ruby is slow, we need C++ for a faster Facter. But Facter
itself never was the real problem. When loaded from Puppet you can
neglect its loading time too. The problem was a few silly and some more
not-so-good single fact implementations. cFacter is mostly faster
because those facts were rewritten when they were ported to C++. Still,
as long as we have custom facts, cFacter needs to fork Ruby. And there
it loses the startup time it initially saved. I guess the Ruby-based
Puppet requires 'facter' instead of forking it; I could be wrong here.
Still, the optimization was completely useless. But as a result of all
this, it is now harder for people to quickly fix facts behaving wrongly
on their systems. Combined with a C++-based agent, cFacter could still
make sense, as Puppetlabs wants to support more platforms. But even this
argument isn't really valid: I'm pretty sure there are far more
platforms with Ruby support than with a supported Puppet AIO package.
"Puppet-Master is slow"
Once again, Ruby is slow, we learned. We got Puppet Server. I've met
(and helped) a lot of people who had severe issues with this stack. I'm
still telling everyone not to migrate unless there is an immediate need
to do so. Most average admins are perfectly able to manage and scale a
Ruby-based web application. To them, Puppet Server is a black box. Hard
to manage, hard to scale. For many of them it's the only Java-based
application server they are running, so they have no clue about JVM
memory management, JMX and so on.
And I have yet to see a single Puppet Server running faster than a
Puppet Master in the same environment, preferably with equal resource
consumption.
Should I go on with PCP/PXP? I guess that's enough for now; I think you
understand what I mean.
With what I know so far, C++ Puppet and XPP would make perfect next
candidates for this hall of "fame". But as mentioned above, I'd love to
be proven wrong on all this. I'm neither a Ruby fanboy nor do I have
objections against C++. All I'm interested in is running my beloved
Puppet hassle-free in production, not wasting my time caring about the
platform itself. I'd prefer to dedicate that time to lots of small ugly
self-written modules breaking all of the latest best practices I can
find on the web ;-)
Cheers,
Thomas