PuppetDB: manually import reports


Thomas Müller

Apr 19, 2018, 7:46:01 AM
to Puppet Users
Hi,

I've got a prod puppetserver/puppetdb and a dev puppetserver/puppetdb. To have a complete overview of all nodes in the prod puppetdb, I'd like to import the reports from the dev puppetserver (stored on disk via the reports=store config) into the prod puppetdb.

Is there some hidden tool to do this? I wasn't able to find anything in that direction.

Reading https://github.com/puppetlabs/puppetdb/blob/master/puppet/lib/puppet/reports/puppetdb.rb, it looks like this could be adapted to read a YAML report file and then send it to puppetdb.

- Thomas

Michael Watters

Apr 19, 2018, 10:49:41 AM
to Puppet Users
PuppetDB data is all stored in PostgreSQL, so you should be able to copy the reports table from one server to the other. For example, run this on your prod node:

pg_dump -h puppetdb-dev -U puppetdb -W -d puppetdb -t reports | psql -U puppetdb puppetdb

I'm not sure if you need more than just the reports table though.
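
If you want a sanity check after the copy, something like this compares report counts over the query API (untested sketch; the hostnames and cert paths are placeholders):

import requests

TLS = dict(cert=("client.pem", "client.key"), verify="ca.pem")

def report_count(base):
    # PQL count() over all reports, via the root query endpoint
    r = requests.get(base + "/pdb/query/v4",
                     params={"query": "reports[count()]{}"}, **TLS)
    r.raise_for_status()
    return r.json()[0]["count"]

for host in ("https://puppetdb-dev:8081", "https://puppetdb-prod:8081"):
    print(host, report_count(host))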

Christopher Wood

Apr 19, 2018, 1:18:34 PM
to puppet...@googlegroups.com
To challenge an assumption, what are you gaining from having more than one puppet infrastructure (puppetservers+puppetdb)?

Could you perhaps handle your dev stuff with another environment or set of puppetservers under the same CA with the same puppetdb?

Is there any reason for a separate puppet infrastructure to live longer than it takes to proof an upgrade for production?

Thomas Müller

Apr 19, 2018, 3:30:11 PM
to Puppet Users


On Thursday, April 19, 2018 at 19:18:34 UTC+2, Christopher Wood wrote:
To challenge an assumption, what are you gaining from having more than one puppet infrastructure (puppetservers+puppetdb)?

Could you perhaps handle your dev stuff with another environment or set of puppetservers under the same CA with the same puppetdb?

Is there any reason for a separate puppet infrastructure to live longer than it takes to proof an upgrade for production?

I can't just throw away the dev infra after preparing changes for prod, for non-technical reasons. I'm limiting the usage of the dev system as much as I can, but there will be systems connected to this dev infra. I also want the data in the prod puppetdb so that I have a single point to run queries/reports (for third-party departments) or to run octocatalog-diff against real facts from any system.

Another use case could be an async puppetdb connection from a second datacenter. If the connection between the datacenters is not stable enough to use a single puppetdb, I would need to add a puppetdb per DC. Then I would also want to sync data to the central puppetdb instance.

- Thomas

 

jcbollinger

Apr 20, 2018, 9:48:11 AM
to Puppet Users


On Thursday, April 19, 2018 at 2:30:11 PM UTC-5, Thomas Müller wrote:


On Thursday, April 19, 2018 at 19:18:34 UTC+2, Christopher Wood wrote:
To challenge an assumption, what are you gaining from having more than one puppet infrastructure (puppetservers+puppetdb)?

Could you perhaps handle your dev stuff with another environment or set of puppetservers under the same CA with the same puppetdb?

Is there any reason for a separate puppet infrastructure to live longer than it takes to proof an upgrade for production?

I can't just throw away the dev infra after preparing changes for prod, for non-technical reasons. I'm limiting the usage of the dev system as much as I can, but there will be systems connected to this dev infra. I also want the data in the prod puppetdb so that I have a single point to run queries/reports (for third-party departments) or to run octocatalog-diff against real facts from any system.


That seems to respond only to Christopher's last question.  You can have varying degrees of dev / prod separation while still maintaining a shared CA and puppetdb, and that has nothing to do with the lifetime or life cycle of the dev machines.  I strongly advise at least the shared CA if you're contemplating combining dev and prod data by any mechanism.

There are several good reasons to prefer the minimum separation of Puppet infrastructure, especially since for at least some purposes, you want to aggregate the dev and prod data.  And doing so would take care of the problem up front -- there would be no extra step needed to aggregate dev and prod data, because it would not be physically separated in the first place.
 

Another use case could be an async puppetdb connection from a second datacenter. If the connection between the datacenters is not stable enough to use a single puppetdb, I would need to add a puppetdb per DC. Then I would also want to sync data to the central puppetdb instance.


Is that an actual use case or a hypothetical one?

If hypothetical, then don't let it influence your decisions about your actual use cases: if and when you need to account for that, the details will matter, and the technological landscape will have changed, so any time, effort, and compromises made to accommodate it now will probably be wasted.  If it never materializes as an actual use case, then resources spent now to accommodate it will definitely be wasted.

If you do need to account for it now, then you should still use at least a common CA.  It might make sense to use a common aggregated puppetdb database too, perhaps supported by synchronization between the PG instances at the database level.  It's hard to make good recommendations, however, without a better handle on the requirements for this scenario.
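
Purely for illustration, PG 10 logical replication is the sort of thing I have in mind.  A sketch via psycopg2 -- all names are invented, and whether a partial subscription coexists happily with PuppetDB's schema, sequences, and garbage collection is untested:

import psycopg2  # assumes PG >= 10 on both ends and a working replication user

# Publisher (dev) side: publish the report-related tables.
pub = psycopg2.connect("dbname=puppetdb host=puppetdb-dev user=postgres")
pub.autocommit = True
with pub.cursor() as cur:
    cur.execute("CREATE PUBLICATION pdb_reports FOR TABLE reports, resource_events;")

# Subscriber (prod) side.  CREATE SUBSCRIPTION cannot run inside a
# transaction block, hence autocommit.  The target tables must already
# exist, and clashing primary keys between two writing sites would be
# a real problem -- this is where the data-structuring question bites.
sub = psycopg2.connect("dbname=puppetdb host=puppetdb-prod user=postgres")
sub.autocommit = True
with sub.cursor() as cur:
    cur.execute("CREATE SUBSCRIPTION pdb_reports_sub "
                "CONNECTION 'host=puppetdb-dev dbname=puppetdb user=replicator' "
                "PUBLICATION pdb_reports;")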


John

Wyatt Alt

Apr 20, 2018, 11:42:08 AM
to puppet...@googlegroups.com

If I'm understanding you right, you could normally use the import/export tools for this:

https://puppet.com/docs/puppetdb/5.0/anonymization.html#using-the-export-command

There's a corresponding "admin" API on PuppetDB you can search for. The process would be to do an export, extract the resulting tarball, remove everything but reports (if desired), then tar it up again and run it through the import tool. Unfortunately, this is broken for me on current PDB due to PDB-3796. If you're on an older version it may be worth a try -- it worked at some point. If you've hit the bug, it will cause your dev server to OOM and restart.
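
For reference, the admin endpoints look roughly like this (sketch only; hostnames and cert paths are placeholders):

import requests

TLS = dict(cert=("client.pem", "client.key"), verify="ca.pem")

# Export: GET /pdb/admin/v1/archive streams a tar.gz of the PDB data.
with requests.get("https://puppetdb-dev:8081/pdb/admin/v1/archive",
                  stream=True, **TLS) as r:
    r.raise_for_status()
    with open("pdb-export.tgz", "wb") as f:
        for chunk in r.iter_content(1 << 20):
            f.write(chunk)

# ...unpack, prune everything but reports/, re-tar... then import it:
with open("pdb-export.tgz", "rb") as f:
    r = requests.post("https://puppetdb-prod:8081/pdb/admin/v1/archive",
                      files={"archive": f}, **TLS)
    r.raise_for_status()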

Assuming that's broken for you too, I think the most tractable way to do what you're asking is basically what you're suggesting -- either parse the YAML reports into the report wire format (https://puppet.com/docs/puppetdb/5.1/api/wire_format/report_format_v8.html) and post them to your prod PDB's commands endpoint (https://puppet.com/docs/puppetdb/5.1/api/command/v1/commands.html), or get the JSON reports out of your dev PuppetDB (in batches, to work around the bug) and do the equivalent parsing/posting. The wire formats change from time to time, so take care to use whichever version of the docs aligns with your PDB version.
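
Roughly, the YAML-to-wire-format path could look like the following (untested sketch: hostnames and cert paths are invented, the field mapping is approximate, and resources/metrics/logs are left empty -- translate them properly for real use):

import sys
import yaml
import requests

# Puppet's stored reports are Ruby-tagged YAML
# ("--- !ruby/object:Puppet::Transaction::Report"), so teach PyYAML
# to read those tags as plain mappings/scalars.
class RubyLoader(yaml.SafeLoader):
    pass

RubyLoader.add_multi_constructor(
    "!ruby/object:", lambda l, tag, node: l.construct_mapping(node, deep=True))
RubyLoader.add_multi_constructor(
    "!ruby/sym", lambda l, tag, node: l.construct_scalar(node))

def iso(t):
    # YAML timestamps load as datetime objects; PDB wants ISO8601 strings.
    return t.isoformat() if hasattr(t, "isoformat") else str(t)

def to_wire_v8(r):
    # Approximate mapping; attribute names follow what my Puppet
    # version writes into the YAML -- verify against yours.
    return {
        "certname": r["host"],
        "environment": r["environment"],
        "status": r["status"],
        "puppet_version": r["puppet_version"],
        "report_format": r["report_format"],
        "configuration_version": str(r["configuration_version"]),
        "transaction_uuid": r.get("transaction_uuid"),
        "catalog_uuid": r.get("catalog_uuid"),
        "cached_catalog_status": r.get("cached_catalog_status", "not_used"),
        "start_time": iso(r["time"]),
        "end_time": iso(r["time"]),  # crude; compute the real end time if you can
        "producer_timestamp": iso(r["time"]),
        "producer": r["host"],
        "resources": [],             # left empty in this sketch
        "metrics": [],
        "logs": [],
    }

with open(sys.argv[1]) as f:
    payload = to_wire_v8(yaml.load(f, Loader=RubyLoader))

resp = requests.post("https://puppetdb-prod:8081/pdb/cmd/v1",
                     json={"command": "store report", "version": 8,
                           "payload": payload},
                     cert=("client.pem", "client.key"), verify="ca.pem")
resp.raise_for_status()
print(resp.json())  # {"uuid": "..."} once PDB has queued the command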

Wyatt




Thomas Müller

Apr 25, 2018, 7:22:39 AM
to Puppet Users

 

Another use case could be an async puppetdb connection from a second datacenter. If the connection between the datacenters is not stable enough to use a single puppetdb, I would need to add a puppetdb per DC. Then I would also want to sync data to the central puppetdb instance.


Is that an actual use case or a hypothetical one?

I'm just thinking about what my options are if the datacenter link is not stable enough. I'm not investing time into building a solution yet.

I've thought a bit longer about "importing reports". It's not just importing reports; it's also importing facts and catalogs into the central DB. Overall I think this would really require a lot of time to implement the tooling. Maybe it would then be easier to query two puppetdbs instead of syncing everything into one.
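
Querying two instances might be as simple as fanning the same query out and merging the results, something like (untested sketch; hostnames and cert paths made up):

import requests

PDBS = ["https://puppetdb-prod:8081", "https://puppetdb-dev:8081"]
TLS = dict(cert=("client.pem", "client.key"), verify="ca.pem")

def query_all(endpoint, query=None):
    # Run the same v4 query against every instance and concatenate.
    out = []
    for base in PDBS:
        params = {"query": query} if query else None
        r = requests.get("%s/pdb/query/v4/%s" % (base, endpoint),
                         params=params, **TLS)
        r.raise_for_status()
        out.extend(r.json())
    return out

for node in query_all("nodes"):
    print(node["certname"])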

But maybe it will all work out fine and no hacking will be necessary. :)

- Thomas

jcbollinger

Apr 26, 2018, 9:09:46 AM
to Puppet Users
There are Postgres-level tools for database federation and synchronization.  As I already suggested, something along those lines is probably worth your consideration as a mechanism for the actual data movement.  The other question to consider is how to structure the data so that it even makes sense to combine them at all, and again, at a bare minimum, your various participating masters should rely on a common CA.  To a good approximation, the CA identity is the site identity, and it does not make much sense to combine data from different sites in the same database.


John
