Announce: PuppetDB 0.9.0 (first release) is available


Michael Stahnke

May 18, 2012, 10:21:26 AM
to puppet...@googlegroups.com, puppet-...@googlegroups.com, puppe...@googlegroups.com
PuppetDB, a component of the Puppet Data Library, is a centralized storage
daemon for auto-generated data. This initial release of PuppetDB targets the
storage of catalogs and facts:

* It’s a drop-in, 100% compatible replacement for storeconfigs
* It’s a drop-in, 100% compatible replacement for inventory service
* It hooks into your Puppet infrastructure using Puppet’s pre-existing
extension points (catalog/facts/resource/node terminuses)
* It’s much faster, much more space-efficient, and much more scalable than
  the current storeconfigs and inventory service.
  * We can handle a few thousand nodes, with several hundred resources
    each, with a 30m runinterval, on our laptops during development
* It stores the entire catalog, including all dependency and containment
  information
* It exposes well-defined, HTTP-based methods for accessing stored information
* Documented at http://docs.puppetlabs.com/puppetdb
* It presents a superset of the storeconfigs and inventory service APIs
  for use in scripts or by other tools
* In particular, we support arbitrarily nested boolean operators
* It decouples catalog and fact storage from the compilation process
* Goodbye puppetq...PuppetDB subsumes it
* It works Very Hard to store everything you send it; we auto-retry all
  storage requests, persist storage requests across restarts, and preserve
  full traces of all failed requests for post-mortem analysis
* It’s secured using Puppet’s built-in SSL infrastructure
* It’s heavily instrumented and easy to integrate its performance info into
your monitoring frameworks
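
For a taste of those query APIs, here is a sketch of how a script might build a query using the nested boolean operators mentioned above. The `/resources` endpoint and the JSON operator syntax follow the PuppetDB documentation, but treat the exact paths and fields as version-dependent assumptions:

```python
# Sketch: building a PuppetDB resources query with nested boolean
# operators. Endpoint path and operator syntax are assumptions based on
# the PuppetDB docs and may differ between versions.
import json
from urllib.parse import urlencode

# "All exported File resources, or File resources titled /etc/motd"
query = ["and",
         ["=", "type", "File"],
         ["or",
          ["=", "exported", True],
          ["=", "title", "/etc/motd"]]]

url = ("http://puppetdb.example.com:8080/resources?"
       + urlencode({"query": json.dumps(query)}))
print(url)
```

The query is an ordinary JSON data structure, so arbitrarily deep nesting of `and`/`or`/`not` comes for free.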

As this is the first public release, the version is 0.9.0 (a.k.a. “open beta”).
While we’ve been using PuppetDB internally at Puppet Labs for months without
incident, we encourage you to try it out, hammer it with data, and let us know
if you run into any issues! A 1.0 release will come after a few cycles of bug
squashing.

# Downloads

Available in native package format at

http://yum.puppetlabs.com

http://apt.puppetlabs.com

Source (same license as Puppet): http://github.com/puppetlabs/puppetdb

Available for use with Puppet Enterprise 2.5.1 and later at

http://yum-enterprise.puppetlabs.com/ and http://apt-enterprise.puppetlabs.com/

# Documentation (including how to install): http://docs.puppetlabs.com/puppetdb

# Issues can be filed at:
http://projects.puppetlabs.com/projects/puppetdb/issues


Michael Stahnke
Community Manager
Puppet Labs

Krzysztof Wilczynski

May 18, 2012, 10:40:31 AM
to puppet...@googlegroups.com, puppet-...@googlegroups.com, puppe...@googlegroups.com
Hi,

Awesome sauce, definitely +1 :)

KW

Philip Brown

May 18, 2012, 10:48:15 AM
to puppet...@googlegroups.com, puppet-...@googlegroups.com, puppe...@googlegroups.com


On Friday, May 18, 2012 7:21:26 AM UTC-7, Michael Stahnke wrote:
PuppetDB, a component of the Puppet Data Library, is a centralized storage
daemon for auto-generated data. This initial release of PuppetDB targets the
storage of catalogs and facts:
...

As this is the first public release, the version is 0.9.0 (a.k.a. “open beta”).
While we’ve been using PuppetDB internally at Puppet Labs for months without
incident, we encourage you to try it out, hammer it with data, and let us know
if you run into any issues! A 1.0 release will come after a few cycles of bug
squashing.


Sounds interesting. I have a question about integration:

How does this interact or integrate with Puppet Dashboard?
Is this an "either one or the other" sort of thing? Or do they play nicely together? Or do they actually do completely different things?

 

Alessandro Franceschi

May 18, 2012, 1:02:20 PM
to puppet...@googlegroups.com, puppet-...@googlegroups.com, puppe...@googlegroups.com
Wow, this is great news!

I've just installed it on Ubuntu 12.04 and it was really painless.
For whoever might be interested, I made an instant module for this:
https://github.com/example42/puppet-puppetdb
with the default Example42 NextGen layout (so it still lacks PuppetDB-specific resources and the PuppetMaster integration).

Reading the docs opens up a universe of possible uses. Looking forward to starting to integrate it in a real environment... too bad it's Friday :-D

+1 guys!
(and +1 to Brice for the huge work he did on storeconfigs, which made great things possible in Puppet ... and which we're probably going to throw away soon :-)

al

Glenn Bailey

May 18, 2012, 1:09:01 PM
to puppet...@googlegroups.com
I noticed it can't use MySQL, whereas Puppet Dashboard can only use
MySQL. Does the Dashboard team plan on moving to something else, or am
I going to have to run two separate DB providers in order to use
PuppetDB and Dashboard?

Nick Fagerlund

May 18, 2012, 1:20:48 PM
to puppet...@googlegroups.com


On Friday, May 18, 2012 10:09:01 AM UTC-7, replicant wrote:

I noticed it can't use MySQL, whereas Puppet Dashboard can only use
MySQL. Does the Dashboard team plan on moving to something else, or am
I going to have to run two separate DB providers in order to use
PuppetDB and Dashboard?

We'll be migrating Dashboard to use PostgreSQL... eventually. It's part of a much larger migration plan, and it's going to take a while, so for the time being, you'll be stuck using both DB servers. We're working on it!

Deepak Giridharagopal

May 18, 2012, 7:44:37 PM
to puppet...@googlegroups.com

Dashboard uses the inventory service API to do its queries for things like nodes and facts. As luck would have it, PuppetDB's terminuses implement that same API. If you configure your Puppetmaster so that PuppetDB handles inventory service queries (this is covered in the installation docs), and point Dashboard at your Puppetmaster, things will Just Work.
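
For concreteness, the wiring described above amounts to a couple of settings on the puppetmaster. This is a sketch based on the installation docs; exact setting names and file paths may vary by version:

```ini
# /etc/puppet/puppet.conf on the puppetmaster (sketch; consult the
# installation docs for your version)
[master]
    storeconfigs = true
    storeconfigs_backend = puppetdb
    # route inventory service (facts) queries to PuppetDB as well
    facts_terminus = puppetdb
```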

PuppetDB doesn't (yet) do report storage, so that will continue to work inside of Dashboard as it always has.

Cheers,
deepak

--
Deepak Giridharagopal / Puppet Labs

Erik Dalén

May 21, 2012, 7:11:07 AM
to puppet...@googlegroups.com
Do you plan on merging it so that the inventory service node queries
can use classes and parameters in the search queries?

I know this is already possible using foreman.

--
Erik Dalén

Marc Zampetti

May 21, 2012, 1:02:10 PM
to puppet...@googlegroups.com, Erik Dalén
Is Puppet Labs saying they are ending support of MySQL and instead will
only support PostgreSQL? That is going to be a big problem for shops
that do not support PostgreSQL, or are only allowed to run DB systems
on an approved list. Why wouldn't a DB-agnostic model be used?

Right now, I can say that due to these types of issues, I cannot even
evaluate PuppetDB, and will not be able to for the foreseeable future.

Also, does this mean that the existing inventory service and store
configs functionality goes away?

Deepak Giridharagopal

May 21, 2012, 3:33:44 PM
to puppet...@googlegroups.com
On Mon, May 21, 2012 at 11:02 AM, Marc Zampetti <marc.z...@gmail.com> wrote:
Is Puppet Labs saying they are ending support of MySQL and instead will only support PostgreSQL? That is going to be a big problem for shops that do not support PostgreSQL, or are only allowed to run DB systems on an approved list.

What is on your approved list? Oracle? SQL Server? DB2? That kind of information helps us plan things for the future.
 
Why wouldn't a DB-agnostic model be used?

The short answer is performance. To effectively implement things we've got on our roadmap, we need things that (current) MySQL doesn't support: array types are critical for efficiently supporting things like parameter values, recursive query support is critical for fast graph traversal operations, things like INTERSECT are handy for query generation, and we rely on fast joins (MySQL's nested loop joins don't always cut it). It's much easier for us to support databases with these features than those that don't. For fairly divergent database targets, it becomes really hard to get the performance we want while simultaneously keeping our codebase manageable.
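
To make one of those features concrete: "recursive query support" refers to SQL recursive common table expressions, which let you walk a dependency graph in a single query instead of issuing one query per hop. The sketch below uses hypothetical table and resource names, not PuppetDB's actual schema, and SQLite stands in for PostgreSQL only because it ships with Python:

```python
# Sketch of graph traversal via a recursive CTE. Schema and data are
# invented for illustration; this is not PuppetDB's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (source TEXT, target TEXT);  -- resource dependencies
    INSERT INTO edges VALUES
        ('Package[nginx]', 'File[nginx.conf]'),
        ('File[nginx.conf]', 'Service[nginx]'),
        ('Package[nginx]', 'Service[nginx]');
""")

# All resources transitively reachable from Package[nginx], in one query:
rows = conn.execute("""
    WITH RECURSIVE reachable(name) AS (
        SELECT target FROM edges WHERE source = 'Package[nginx]'
        UNION
        SELECT e.target FROM edges e JOIN reachable r ON e.source = r.name
    )
    SELECT name FROM reachable ORDER BY name
""").fetchall()

print([r[0] for r in rows])  # ['File[nginx.conf]', 'Service[nginx]']
```

Without recursive CTEs (as in MySQL at the time), this traversal has to be done with repeated round-trip queries or denormalized closure tables.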
 

Right now, I can say that due to these types of issues, I cannot even evaluate PuppetDB, and will not be able to for the foreseeable future.

How many hosts do you have? Would the built-in, embedded database work for you as an interim solution?
 

Also, does this mean that the existing inventory service and store configs functionality goes away?

The existing inventory service API is still supported, and in fact PuppetDB works as a backing store for that API. So tools and code that use that API currently will continue to work. Puppet 3.0 still includes the old ActiveRecord-based storeconfigs backend, which still works.

deepak
 
--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet...@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.


Marc Zampetti

May 21, 2012, 4:04:34 PM
to puppet...@googlegroups.com, Deepak Giridharagopal


On Mon May 21 15:33:44 2012, Deepak Giridharagopal wrote:
> On Mon, May 21, 2012 at 11:02 AM, Marc Zampetti <marc.z...@gmail.com> wrote:
>
> Is Puppet Labs saying they are ending support of MySQL and instead
> will only support PostgreSQL? That is going to be a big problems
> for shops that do not support PostgresSQL, or are only allowed to
> run DB systems on an approved list.
>
>
> What is on your approved list? Oracle? SQL Server? DB2? That kind of
> information helps us plan things for the future.

MySQL, Sybase, and Oracle are all supported platforms, but since MySQL
is the only one without a license cost, it is really my only choice.
Introducing yet another SQL store into the environment is just not an
option from a management and support perspective. We are also
supporting newer solutions, like MongoDB, so that would be a better
choice for us.

>
> Why wouldn't a DB-agnostic model be used?
>
>
> The short answer is performance. To effectively implement things we've
> got on our roadmap, we need things that (current) MySQL doesn't
> support: array types are critical for efficiently supporting things
> like parameter values, recursive query support is critical for fast
> graph traversal operations, things like INTERSECT are handy for query
> generation, and we rely on fast joins (MySQL's nested loop joins don't
> always cut it). It's much easier for us to support databases with
> these features than those that don't. For fairly divergent database
> targets, it becomes really hard to get the performance we want while
> simultaneously keeping our codebase manageable.

I understand the need to not support everything. Having designed a
number of systems that require some of the features you say you need, I
can say with confidence that most of those issues can be handled
without having an RDBMS that has all those advanced features. So I will
respectfully disagree that you need features you listed. Yes, you may
not be able to use something like ActiveRecord or Hibernate, and have
to hand-code your SQL more often, but there are a number of techniques
that can be used to at least achieve similar performance
characteristics. I think it is a bit dangerous to assume that your user
base can easily and quickly switch out their RDBMS systems as easy as
this announcement seems to suggest. I'm happy to be wrong if the
overall community thinks that is true, but for something that is as
core to one's infrastructure as Puppet, making such a big change seems
concerning.

>
>
> Right now, I can say that due to these types of issues, I cannot
> even evaluate PuppetDB, and will not be able to for the
> foreseeable future.
>
>
> How many hosts do you have? Would the built-in, embedded database work
> for you as an interim solution?

No, I'm already managing several hundred hosts.

>
>
> Also, does this mean that the existing inventory service and store
> configs functionality goes away?
>
>
> The existing inventory service API is still supported, and in fact
> PuppetDB works as a backing store for that API. So tools and code that
> use that API currently will continue to work. Puppet 3.0 still
> includes the old ActiveRecord-based storeconfigs backend, which still
> works.
>
But is there a commitment from Puppet Labs that storeconfigs on top of
MySQL will be supported for some time? It doesn't really do me any good
to build my infrastructure on storeconfigs (primarily for exported
resources), and then find out 6-12 months from now that it is going
away simply because the extra work to support MySQL is too hard.

> deepak

Deepak Giridharagopal

May 21, 2012, 5:39:22 PM
to Marc Zampetti, puppet...@googlegroups.com
On Mon, May 21, 2012 at 2:04 PM, Marc Zampetti <marc.z...@gmail.com> wrote:


On Mon May 21 15:33:44 2012, Deepak Giridharagopal wrote:
On Mon, May 21, 2012 at 11:02 AM, Marc Zampetti <marc.z...@gmail.com> wrote:

   Is Puppet Labs saying they are ending support of MySQL and instead
   will only support PostgreSQL? That is going to be a big problems
   for shops that do not support PostgresSQL, or are only allowed to
   run DB systems on an approved list.


What is on your approved list? Oracle? SQL Server? DB2? That kind of
information helps us plan things for the future.

MySQL, Sybase, Oracle are all supported platforms, but since only MySQL is the one without a license cost, that is really my only choice. Introducing yet another SQL store into the environment is just not an option from a management and support perspective. We are also supporting new solutions, like MongoDB, so that would be a better choice for us.

Oracle is actually a reasonable target for us, though I sympathize with your concerns about licensing costs. In terms of supportability of PuppetDB, I can point out a few things that may mitigate some concerns:

1) The data stored in PuppetDB is entirely driven by puppetmasters compiling catalogs for agents. If your entire database exploded and lost all data, everything will be 100% repopulated within around $runinterval minutes.

2) If the PuppetDB daemon is up, but it loses connectivity to the database for some reason, outstanding requests simply block until connectivity is re-established. Incoming catalogs and facts will queue up, and will be processed once connectivity is re-established. This allows for DB maintenance without impacting the overall availability of the Puppet infrastructure.

3) No other Puppet tools other than PuppetDB itself require direct access to the database; as such, the security and authentication requirements are pretty simple.

I know that doesn't directly address your immediate concern about MySQL support, but hopefully that helps add some context around the operational requirements of PuppetDB's data store. It's much less of a "single db that all puppet tools dump data into" and much more of a "data store for just this one daemon and nothing else".




   Why wouldn't a DB-agnostic model be used?


The short answer is performance. To effectively implement things we've
got on our roadmap, we need things that (current) MySQL doesn't
support: array types are critical for efficiently supporting things
like parameter values, recursive query support is critical for fast
graph traversal operations, things like INTERSECT are handy for query
generation, and we rely on fast joins (MySQL's nested loop joins don't
always cut it). It's much easier for us to support databases with
these features than those that don't. For fairly divergent database
targets, it becomes really hard to get the performance we want while
simultaneously keeping our codebase manageable.

I understand the need to not support everything. Having designed a number of systems that require some of the features you say you need, I can say with confidence that most of those issues can be handled without having an RDBMS that has all those advanced features. So I will respectfully disagree that you need features you listed. Yes, you may not be able to use something like ActiveRecord or Hibernate, and have to hand-code your SQL more often, but there are a number of techniques that can be used to at least achieve similar performance characteristics. I think it is a bit dangerous to assume that your user base can easily and quickly switch out their RDBMS systems as easy as this announcement seems to suggest. I'm happy to be wrong if the overall community thinks that is true, but for something that is as core to one's infrastructure as Puppet, making such a big change seems concerning.

We aren't using ActiveRecord or Hibernate, and we are using hand-coded SQL where necessary to wring maximum speed out of the underlying data store. I'm happy to go into much greater detail about why the features I listed are important, but I think that's better suited to puppet-dev than puppet-users. We certainly didn't make this decision cavalierly; it was made after around a month of benchmarking various solutions ranging from traditional databases like PostgreSQL to document stores like MongoDB to KV stores such as Riak to graph databases like Neo4J. For Puppet's particular type of workload, with Puppet's volume of data, with Puppet's required durability and safety requirements...I maintain this was the best choice.

While I don't doubt that given a large enough amount of time and enough engineers we could get PuppetDB working fast enough on arbitrary backing stores (MySQL included), we have limited time and resources. From a pragmatic standpoint, we felt that supporting a database that was available on all platforms Puppet supports, that costs nothing, that has plenty of modules on the Puppet Forge to help set it up, that has a great reliability record, that meets our performance needs, and that in the worst case has free/cheap hosted offerings (such as Heroku) was a reasonable compromise.

The catalog, resource and fact wire formats that the PuppetDB terminus uses are published at docs.puppetlabs.com; we've designed those interchange formats such that you could write your own backing store that persists things and answers queries. While PuppetDB does include a specific backing store for those formats, you can use the PuppetDB terminus code on the frontend and your own custom stuff on the backend if you like. I very much encourage such endeavors, and I'm happy to help in any way I can!

We certainly don't expect users to "quickly switch out their RDBMS systems" with ease, and I sincerely apologize if I gave that impression. A few points of clarification:

1) This is the first PuppetDB release
2) It's a beta release; things will be fixed as we move to RC and then to 1.0
3) This feature is entirely optional; while I think everyone should use storeconfigs, you can certainly use puppet without it
4) Current storeconfigs users aren't abandoned; the existing, ActiveRecord-based storeconfigs code is fully supported in Puppet 3.0

Again, apologies if the original announcement wasn't clear on these points!




   Right now, I can say that due to these types of issues, I cannot
   even evaluate PuppetDB, and will not be able to for the
   foreseeable future.


How many hosts do you have? Would the built-in, embedded database work
for you as an interim solution?

No, I'm already managing several hundred hosts.




   Also, does this mean that the existing inventory service and store
   configs functionality goes away?


The existing inventory service API is still supported, and in fact
PuppetDB works as a backing store for that API. So tools and code that
use that API currently will continue to work. Puppet 3.0 still
includes the old ActiveRecord-based storeconfigs backend, which still
works.

But is there a commitment from Puppet Labs that storeconfigs on top of MySQL will be supported for some time? It doesn't really do me any good to build my infrastructure on store config (primarily for external resources), and then find out 6 - 12 months from now that it is going away simply because the extra work to support MySQL is too hard.

The current, ActiveRecord-based storeconfigs support will likely be marked 'deprecated' soon, but it shouldn't go away until Puppet 4.0, which is slated for over a year from now (semantic versioning and all that).

Cheers,
deepak

Alessandro Franceschi

May 21, 2012, 6:16:32 PM
to puppet...@googlegroups.com, Marc Zampetti
If I can add my two cents to the discussion, I'd say that installing and running PostgreSQL is really easy (I had actually never used it before and was "forced" into it by PuppetDB) and, even if I don't feel quite at ease with it yet, I don't think it is going to be a real pain to maintain.
But most of all, after my first tests (one of them with about 100 test nodes and wide use of stored configs), I have to say that the performance boost is AWESOME, even with the hsqldb backend (with PostgreSQL it is obviously better).

So, really, if PostgreSQL was a necessary choice for speed, I must say it was well worth it.

My suggestion, Marc, is to try to "digest" yet another RDBMS in your infrastructure (and hey, it's one of the best open-source RDBMSes around, after all) and enjoy the power of PuppetDB.

+1 Deepak!
(please make sure its future upgrades are painless... I'm going to roll it out in production, and the $runinterval convergence time is not something we should take lightly when configuring monitoring systems or load balancers via storedconfigs)

my2c
al

Daniel Pittman

May 21, 2012, 7:20:02 PM
to puppet...@googlegroups.com
On Mon, May 21, 2012 at 12:33 PM, Deepak Giridharagopal
<dee...@puppetlabs.com> wrote:
> On Mon, May 21, 2012 at 11:02 AM, Marc Zampetti <marc.z...@gmail.com>
> wrote:
>
>> Right now, I can say that due to these types of issues, I cannot even
>> evaluate PuppetDB, and will not be able to for the foreseeable future.
>
> How many hosts do you have? Would the built-in, embedded database work for
> you as an interim solution?
>>
>> Also, does this mean that the existing inventory service and store configs
>> functionality goes away?
>
> The existing inventory service API is still supported, and in fact PuppetDB
> works as a backing store for that API. So tools and code that use that API
> currently will continue to work. Puppet 3.0 still includes the old
> ActiveRecord-based storeconfigs backend, which still works.

Speaking formally, and for the platform team who maintain that code:

We hope that PuppetDB is the answer to the current StoreConfigs problems.

Until we have real-world proof that it is stable and effective, we
are not even going to talk about when we remove the previous set of
functionality.

When we do that it will be with a long time horizon - over a major
release, a year from now - so that we don't take anything away
suddenly.


From our point of view the embedded database is a good solution for
teams that can't use the PostgreSQL store.

Given ~2MB of JVM heap per active node, the embedded database has
performance equal to or better than the current MySQL- and
SQLite-backed ActiveRecord engines.

That makes it reasonable for most deployments without an external
database, even if it is not as much of a performance win.
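
As a rough sizing sketch from the ~2MB-per-node figure above (the file path and rounding are assumptions; the exact location varies by distro):

```shell
# /etc/sysconfig/puppetdb (or /etc/default/puppetdb on Debian-family
# systems; path is an assumption)
# ~2 MB of heap per active node: 1000 nodes ≈ 2000 MB, so round up.
JAVA_ARGS="-Xmx2g"
```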

--
Daniel Pittman
⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons

Brice Figureau

May 22, 2012, 11:26:22 AM
to puppet...@googlegroups.com
I haven't had a look at the code itself, but is the PostgreSQL code
isolated in its own module?

If so, that'd definitely help if someone (not saying I'm
volunteering :) wants to port the code to MySQL.

On a side note, Deepak, it'd be terrific if you would start a thread
on puppet-dev explaining how the PostgreSQL storage has been done to
achieve this speed :)
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

Nick Lewis

May 22, 2012, 2:34:58 PM
to puppet...@googlegroups.com
I'm working on putting together an in-depth look into the technology inside PuppetDB, as well as everything we've done to make it fast. That should be coming soon.

Brian Gallew

May 22, 2012, 3:50:27 PM
to puppet...@googlegroups.com
I'm a long-term PostgreSQL fan, but MySQL has one feature that makes it a hands-down winner in our environment: trivial replication.  I have puppetmasters in two different datacenters.  Being able to have my dashboard see the status of systems in both datacenters makes it a lot more useful to the team.  The PostgreSQL alternatives just don't work nearly as well, nor as transparently.


Sean Millichamp

May 22, 2012, 8:24:43 PM
to puppet...@googlegroups.com
On Mon, 2012-05-21 at 15:39 -0600, Deepak Giridharagopal wrote:
1) The data stored in PuppetDB is entirely driven by puppetmasters compiling catalogs for agents. If your entire database exploded and lost all data, everything will be 100% repopulated within around $runinterval minutes.
I think that this is a somewhat dangerous line of thinking. Please correct me if my understanding of storedconfigs is wrong, but if I am managing a resource with resources { 'type': purge => true } (or a purged directory populated by file resources) and any subset of those resources are exported, then if my "entire database exploded", wouldn't Puppet purge resources that haven't yet been repopulated? They would obviously be replaced, but if they were critical resources (think exported Nagios configs, /etc/hosts entries, or the like) then this could be a really big problem.

To me, storedconfigs is one of the killer features in Puppet. We are using it for a handful of critical things and I plan to only expand its use. I'm glad that Puppet Labs is focusing some attention on it, but this attitude of "we can wait out a repopulation" has me worried. Again, maybe I'm misunderstanding how purging with exported resources actually works, but my experience has been that if you clear the exported resource from the database, so goes the exported record in a purge situation.
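
To make the failure mode concrete, here is the classic exported-resource pattern being described, as a generic sketch (resource names are illustrative, not from any particular site):

```puppet
# On every monitored node: export a Nagios host entry.
@@nagios_host { $::fqdn:
  ensure  => present,
  address => $::ipaddress,
}

# On the monitoring server: collect everything exported above...
Nagios_host <<| |>>

# ...and purge any nagios_host not in the catalog. If the database was
# just wiped, few exports have repopulated yet, so a purge run can
# briefly remove live monitoring configuration.
resources { 'nagios_host':
  purge => true,
}
```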

In a slightly different vein, does PuppetDB support a cluster or HA configuration? I assume at least active/passive must be okay. Any gotchas to watch for?

Thanks,
Sean

Deepak Giridharagopal

May 22, 2012, 9:02:33 PM
to puppet...@googlegroups.com
On Tue, May 22, 2012 at 6:24 PM, Sean Millichamp <se...@bruenor.org> wrote:
On Mon, 2012-05-21 at 15:39 -0600, Deepak Giridharagopal wrote:
1) The data stored in PuppetDB is entirely driven by puppetmasters compiling catalogs for agents. If your entire database exploded and lost all data, everything will be 100% repopulated within around $runinterval minutes.
I think that this is a somewhat dangerous line of thinking.  Please correct me if my understanding of storedconfigs are wrong, but if I am managing a resource with resources { 'type': purge => true } (or a purged directory populated file resources) and any subset of those resources are exported resources then, if my "entire database exploded", would I not have Puppet purging resources that haven't repopulated during this repopulation time?  They would obviously be replaced, but if those were critical resources (think exported Nagios configs, /etc/hosts entries, or the like) then this could be a really big problem.

To me storedconfigs are one of the killer features in Puppet. We are using them for a handful of critical things and I plan to only expand their use. I'm glad that Puppet Labs is focusing some attention on them, but this attitude of we can wait out a repopulation has me worried.  Again, maybe I'm misunderstanding how purging with exported resources actually works, but my experience has been that if you clear the exported resource from the database so goes the exported record in a purge situation.

I didn't mean to imply that there's no point to backing things up or caring about uptime...I apologize if I gave that impression. I only offered that piece of information to help people understand that the data PuppetDB is storing isn't "unique" per se; it's easy to recover data that was lost. But just because the data can be easily reconstituted doesn't mean that losing it is consequence-free! You are exactly right about the potential gotchas that exist if that data disappears and agents check in, particularly if you have purge set. The context for that statement was a larger discussion of operational requirements, and I was just trying to articulate the continuum of possible failure scenarios.

For any current or future features (upgrades, service restarts, whatever), we *never* assume that it's okay to trash your data.
 
In a slightly different vein, does PuppetDB support a cluster or HA configuration? I assume at least active/passive must be okay. Any gotchas to watch for?

Active/passive is perfectly fine; no gotchas.

Cheers,
deepak

Deepak Giridharagopal

May 22, 2012, 9:06:07 PM
to puppet...@googlegroups.com

Also, as the communication between Puppetmaster and PuppetDB is just HTTPS, you can use something like nginx as a reverse proxy to implement automatic failover if you like.
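
A minimal sketch of that idea (hostnames, port, and certificate paths are placeholders; adapt to your SSL setup):

```nginx
# nginx reverse proxy in front of two PuppetDB instances (sketch).
upstream puppetdb {
    server puppetdb1.example.com:8081;         # primary
    server puppetdb2.example.com:8081 backup;  # used only if primary is down
}

server {
    listen 8081 ssl;
    # ssl_certificate / ssl_certificate_key from your Puppet CA go here

    location / {
        proxy_pass https://puppetdb;
    }
}
```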

Walter Heck

May 23, 2012, 1:20:54 AM
to puppet...@googlegroups.com, MySql
On Tue, May 22, 2012 at 12:02 AM, Marc Zampetti <marc.z...@gmail.com> wrote:
> Is Puppet Labs saying they are ending support of MySQL and instead will only
> support PostgreSQL? That is going to be a big problems for shops that do not
> support PostgresSQL, or are only allowed to run DB systems on an approved
> list. Why wouldn't a DB-agnostic model be used?
>
> Right now, I can say that due to these types of issues, I cannot even
> evaluate PuppetDB, and will not be able to for the foreseeable future.

(cc'd the mysql list as I'm pretty sure the boys over there have some
interest in this)

As a provider of Puppet consulting, I can say it will be a harder sell
to clients if we need them to use Postgres instead of MySQL in order
to use PuppetDB. It's not impossible, of course, but introducing an
additional barrier to Puppet adoption will give us additional trouble
convincing our clients :)

You mentioned degraded performance; do you have any numbers on what
kind of performance degradation we are talking about? I wouldn't mind
some degraded performance if it means we can keep smaller clients on
MySQL.

Also, have you looked at MariaDB 5.5? It is a drop-in replacement for
MySQL with much better performance for anything query-optimiser
related (which I'm pretty sure nested joins are part of).

--
Walter Heck

--
Check out my startup: Puppet training and consulting @ http://www.olindata.com
Follow @olindata on Twitter and/or 'Like' our Facebook page at
http://www.facebook.com/olindata

Deepak Giridharagopal

unread,
May 23, 2012, 4:20:23 AM5/23/12
to puppet...@googlegroups.com
On Tue, May 22, 2012 at 11:20 PM, Walter Heck <walte...@gmail.com> wrote:
On Tue, May 22, 2012 at 12:02 AM, Marc Zampetti <marc.z...@gmail.com> wrote:
> Is Puppet Labs saying they are ending support of MySQL and instead will only
> support PostgreSQL? That is going to be a big problem for shops that do not
> support PostgreSQL, or are only allowed to run DB systems on an approved
> list. Why wouldn't a DB-agnostic model be used?
>
> Right now, I can say that due to these types of issues, I cannot even
> evaluate PuppetDB, and will not be able to for the foreseeable future.

(cc'd the mysql list as I'm pretty sure the boys over there have some
interest in this)

As a provider of puppet consulting I can say it will be a harder sell
to clients if we need them to use postgres instead of MySQL in order
to use PuppetDB. It's not impossible of course, but introducing an
additional barrier for puppet will give us additional trouble
convincing our clients :)

I sympathize, and that's why this is an optional component; it's not required in any way for Puppet to function. ActiveRecord-based storeconfigs isn't going away any time soon (and certainly not without more dialogue with the community); it remains an entirely supported option. For customers who feel the improved scalability and performance aren't worth the opportunity cost, they can continue to use the existing storeconfigs backend to which they're accustomed. That remains a fully-supported way to deploy Puppet.
 

You mentioned degraded performance, do you have any numbers on what
kind of performance degradation we are talking about? I wouldn't mind
some degraded performance if that means we can keep smaller clients on
MySQL.

Also, have you looked at MariaDB 5.5? It's a drop-in replacement for
MySQL with much better performance for anything query-optimiser
related (which I'm pretty sure includes these nested joins).

While we don't have immediate plans to support MySQL or MariaDB ourselves, I would very much encourage others to create such backends for PuppetDB. Our wire formats are documented and versioned, and the PuppetDB terminus code that handles translation from Puppet's data structures into the wire format is 100% reusable to that end. These layers of abstraction were put into place specifically to enable such interoperability and to prevent alternate backends from becoming second-class citizens of the puppet ecosystem. We'd be happy to assist in these endeavors however we can!
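As a concrete illustration of the query API's nested boolean operators, here's a sketch of building a query in the prefix notation and URL-encoding it for the HTTP endpoint. The hostname is made up, and the exact endpoint path and operator syntax should be checked against the docs at docs.puppetlabs.com/puppetdb rather than taken from this sketch:

```python
import json
import urllib.parse

# A nested boolean query in PuppetDB's prefix notation (syntax per the
# published docs -- verify there): File resources, excluding one node.
query = ["and",
         ["=", "type", "File"],
         ["not", ["=", ["node", "name"], "db1.example.com"]]]

# The query is serialized to JSON and passed as a URL parameter.
url = "http://puppetdb.example.com:8080/resources?" + urllib.parse.urlencode(
    {"query": json.dumps(query)})
print(url)
```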

deepak
 

--
Walter Heck

--
Check out my startup: Puppet training and consulting @ http://www.olindata.com
Follow @olindata on Twitter and/or 'Like' our Facebook page at
http://www.facebook.com/olindata

jcbollinger

unread,
May 23, 2012, 9:24:01 AM5/23/12
to Puppet Users


On May 22, 7:24 pm, Sean Millichamp <s...@bruenor.org> wrote:
> On Mon, 2012-05-21 at 15:39 -0600, Deepak Giridharagopal wrote:
> > 1) The data stored in PuppetDB is entirely driven by puppetmasters
> > compiling catalogs for agents. If your entire database exploded and
> > lost all data, everything will be 100% repopulated within around
> > $runinterval minutes.
>
> I think that this is a somewhat dangerous line of thinking.  Please
> correct me if my understanding of storedconfigs are wrong, but if I am
> managing a resource with resources { 'type': purge => true } (or a
> purged directory populated file resources) and any subset of those
> resources are exported resources then, if my "entire database exploded",
> would I not have Puppet purging resources that haven't repopulated
> during this repopulation time?  They would obviously be replaced, but if
> those were critical resources (think exported Nagios configs, /etc/hosts
> entries, or the like) then this could be a really big problem.


That understanding of storeconfigs looks right, but I think the
criticism is misplaced. It is not Deepak's line of thinking that is
dangerous, but rather the posited strategy of purging (un)collected
resources. Indeed, I rate resource purging as a bit dangerous *any*
way you do it. Moreover, the consequences of a storeconfig DB blowing
up are roughly the same regardless of the DBMS managing it or the
middleware between it and the Puppetmaster. I don't see how the
existence of that scenario makes PuppetDB any better or worse.
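For reference, the pattern Sean describes looks roughly like this (file paths, tags, and the template name are illustrative only):

```puppet
# On each node: export a Nagios config fragment for this host.
@@file { "/etc/nagios/conf.d/${::fqdn}.cfg":
  ensure  => present,
  content => template('nagios/host.cfg.erb'),
  tag     => 'nagios-host',
}

# On the Nagios server: collect everything exported above, and purge
# any files in conf.d that Puppet doesn't manage. If the storeconfigs
# database is emptied, the collector returns nothing, so the purge
# would remove every config until nodes re-export on their next runs.
File <<| tag == 'nagios-host' |>>
file { '/etc/nagios/conf.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
}
```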


> To me storedconfigs are one of the killer features in Puppet. We are
> using them for a handful of critical things and I plan to only expand
> their use. I'm glad that Puppet Labs is focusing some attention on them,
> but this attitude of we can wait out a repopulation has me worried.


If you cannot afford to wait out a repopulation of some resource, then
you probably should not risk purging its resource type. If you do not
purge, then a storeconfig implosion just leaves your resources
unmanaged. If you choose to purge anyway then you need to understand
that you thereby assume some risk in exchange for convenience;
mitigating that risk probably requires additional effort elsewhere
(e.g. DB replication and failover, backup data center, ...).


John

Sean Millichamp

unread,
May 23, 2012, 10:30:32 PM5/23/12
to puppet...@googlegroups.com
On Wed, 2012-05-23 at 06:24 -0700, jcbollinger wrote:

> That understanding of storeconfigs looks right, but I think the
> criticism is misplaced. It is not Deepak's line of thinking that is
> dangerous, but rather the posited strategy of purging (un)collected
> resources. Indeed, I rate resource purging as a bit dangerous *any*
> way you do it. Moreover, the consequences of a storeconfig DB blowing
> up are roughly the same regardless of the DBMS managing it or the
> middleware between it an the Puppetmaster. I don't see how the
> existence of that scenario makes PuppetDB any better or worse.

Indeed, it *is* dangerous, but so are many things we do as system
administrators. The key is in gauging the risk and then choosing the
right path accordingly. In my environment I am not always able to know
the complete history of resources as changes may come from unexpected
places. It is less than ideal, but it is one aspect of my reality. In
that situation, the selective use of purging becomes quite key in
keeping things that need to be "cleaned up" cleaned up.

I don't put anything in exported resources with purging that would be
capable of bringing down a production application, thankfully, but there
is quite a bit that could quite possibly cause a variety of headaches,
alerts, and tickets on a massive scale for a while during the
reconvergence.

Additionally, we are in a transition to PE, and the Compliance tool will
allow me another way of handling that with a more manual admin-review
approach (to catch resources that get added outside of Puppet's
knowledge).

What I really need is some tool by which I can mark exported resources
as absent instead of purging them from the database when they are no
longer needed (such as deleting a host). That would eliminate most, if
not all, of the intersections of purging and exported resources that I
have. Right now I use a Ruby script I found quite a while back to
delete removed nodes and all of their data. I'm sure there is a way to
mark the resources as ensure => absent instead, but I've not gone
digging into the DB structure.
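Something like a parameterized export is the kind of thing I have in mind (a sketch only; $decommission and the resource names are hypothetical, and the node would need one final Puppet run before its data is deleted):

```puppet
# $decommission would be set (via an ENC or a fact) on a node's final
# run, so collectors see ensure => absent before the node is removed.
$host_ensure = $decommission ? {
  'true'  => 'absent',
  default => 'present',
}

@@nagios_host { $::fqdn:
  ensure  => $host_ensure,
  address => $::ipaddress,
}
```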

> If you cannot afford to wait out a repopulation of some resource, then
> you probably should not risk purging its resource type. If you do not
> purge, then a storeconfig implosion just leaves your resources
> unmanaged. If you choose to purge anyway then you need to understand
> that you thereby assume some risk in exchange for convenience;
> mitigating that risk probably requires additional effort elsewhere
> (e.g. DB replication and failover, backup data center, ...).

Indeed, as I said above, it is about risk management. Deepak's statement
I had responded to wasn't the first time I had read the "oh, just wait
for it to repopulate" statement and I wanted to be certain that wasn't
actually something that was considered in the design with regards to
updates, etc. on the stability of the storeconfigs data.

At some point you have to trust tools that have earned that trust
(either via testing or real world use or both) to do the job that they
say they are going to do. Puppet has years of earning that trust with
me. Could something corrupt and destroy the database and cause me a lot
of trouble? Sure, but that could be said of many tools. That's why we
have backups, DR systems, etc. even though the "in the now" when it
fails can be painful as heck. However, as long as Puppet Labs is
designing it to be dependable and upgrade-safe (which it sounds like
they are) then I'll continue to trust it (with prudent testing, of
course) because they've earned it.

Sean


Nick Lewis

unread,
May 23, 2012, 11:05:44 PM5/23/12
to puppet...@googlegroups.com
We don't yet have such a tool for PuppetDB, but it's definitely on our
radar. The current `puppet node clean --unexport` reaches directly
into the ActiveRecord storeconfigs database to make ad hoc changes to
resources, which is inappropriate for PuppetDB, which has a strict
catalog lifecycle. We're working to figure out an appropriate way to
provide the same functionality.

>
>> If you cannot afford to wait out a repopulation of some resource, then
>> you probably should not risk purging its resource type.  If you do not
>> purge, then a storeconfig implosion just leaves your resources
>> unmanaged.  If you choose to purge anyway then you need to understand
>> that you thereby assume some risk in exchange for convenience;
>> mitigating that risk probably requires additional effort elsewhere
>> (e.g. DB replication and failover, backup data center, ...).
>
> Indeed, as I said above, it is about risk management. Deepak's statement
> I had responded to wasn't the first time I had read the "oh, just wait
> for it to repopulate" statement and I wanted to be certain that wasn't
> actually something that was considered in the design with regards to
> updates, etc. on the stability of the storeconfigs data.

We definitely didn't take safe repopulation as a given. We know many
if not most storeconfigs users will likely suffer at least some
inconvenience or at worst some outages if their data has to be
repopulated; we're not blasé about the issue. We haven't cut any
corners in PuppetDB around safeguarding your data. It's simply a
design ideal we would like to promote. When it's reasonable to design
your exports/collects thusly, it's beneficial for storeconfigs data to
be easily regenerable. After all, that's what Puppet purports to allow
you to do with your infrastructure, and it would be great not to allow
storeconfigs to disrupt that ability. On that note, if you find a
case where this just isn't possible today, let us know. I'd love for
this to be the norm.

Mostly the reason for mentioning it is because many people hear
"database" and automatically think "oh great now I have to set up
replication, backups, failover, etc". But before going off and doing
all that work, it's important to ensure this really is data you care
about replicating, backing up, making highly available, etc. Depending
on your needs (for instance, if you're not a storeconfigs user at
all), the answer *may* be no.

>
> At some point you have to trust tools that have earned that trust
> (either via testing or real world use or both) to do the job that they
> say they are going to do. Puppet has years of earning that trust with
> me. Could something corrupt and destroy the database and cause me a lot
> of trouble? Sure, but that could be said of many tools. That's why we
> have backups, DR systems, etc. even though the "in the now" when it
> fails can be painful as heck. However, as long as Puppet Labs is
> designing it to be dependable and upgrade-safe (which it sounds like
> they are) then I'll continue to trust it (with prudent testing, of
> course) because they've earned it.
>
> Sean
>
>