|
The current user experience of an anonymized export of a PuppetDB database is pretty bad. We currently export all of the data from a PDB instance (anonymizing as we go) and return it via HTTP as a tar.gz file. For PDB databases of any decent size, the export takes a very long time and the resulting archive can be very large (e.g. 20+ GB). This makes it a time-consuming and difficult process.
The process takes so long because we are exporting all reports. There is value in having reports, but there's not a lot of value in having every report for every node.
We should think through what changes we can make to limit the export/anonymization of reports and still get similar value. The result of this ticket should be a set of tickets with info on what we should change. Some suggestions:
- Export only reports that have changes
- Allow exporting only a given number of nodes' worth of data
- Change the benchmark tool to synthetically create unchanged reports from a list of only changed reports
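
To make the first two suggestions concrete, here is a rough Python sketch of the filtering logic. It is not PuppetDB code: `select_reports` is a hypothetical helper, and it assumes each report is a dict with `certname` and `status` keys, where a `status` of `"changed"` marks a report that has changes. The real export path may represent reports differently.

```python
from typing import Iterable, Iterator, Optional


def select_reports(
    reports: Iterable[dict], max_nodes: Optional[int] = None
) -> Iterator[dict]:
    """Yield only changed reports, optionally limited to a number of nodes.

    Hypothetical helper: assumes each report dict carries "certname" and
    "status" keys. Nodes are admitted in the order they are first seen.
    """
    seen_nodes = set()
    for report in reports:
        # Suggestion 1: skip reports that recorded no changes.
        if report.get("status") != "changed":
            continue
        certname = report.get("certname")
        if certname not in seen_nodes:
            # Suggestion 2: stop admitting new nodes once the cap is reached.
            if max_nodes is not None and len(seen_nodes) >= max_nodes:
                continue
            seen_nodes.add(certname)
        yield report
```

Because it is a generator, the filter could in principle be applied while streaming reports into the archive, so the full report set would never need to be held in memory.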
|