For an actual scale test we can use the existing benchmark tool to submit arbitrary numbers of commands to a PuppetDB instance. That tool, however, operates by reading a set of "example" commands and making minor modifications to them over time. The existing example data is very sparse: it was generated on an employee's laptop many years ago with very little total data, or comes from a customer but is very old. It is not representative of "real data" - though we don't yet know what is. In order to test PuppetDB at scale, we need better data.

Why do we need to write code to generate that data rather than create it once? We don't know what data structure is "representative" of customer use cases, and it likely isn't a single structure. I think we have about 1000 customers, so we probably have 1001 different usage profiles. By writing code that generates data from a set of parameters, we can change our data set as we move forward in time.

Out of scope, but we could even instrument the same metrics in PE and potentially use them to replicate an issue a customer is seeing without having to ask them to run things on our behalf to diagnose it.
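To make the parameterized-generation idea concrete, here is a minimal sketch. It is illustrative only: the `Profile` parameter names are hypothetical, Python is used just for brevity, and the factset payload shape is simplified rather than an exact copy of the PuppetDB command wire format.

```python
import random
import string
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Profile:
    """Knobs describing one hypothetical customer profile (names are illustrative)."""
    num_nodes: int = 100          # how many certnames to simulate
    facts_per_node: int = 200     # flat facts per factset
    fact_value_len: int = 20      # approximate size of each fact value
    environment: str = "production"


def random_value(length: int) -> str:
    """Produce a random string standing in for a fact value."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


def generate_factset(profile: Profile, node_index: int) -> dict:
    """Build one factset roughly shaped like a 'replace facts' payload.

    Only the parameterization matters here; the real command format has more fields.
    """
    values = {
        f"fact_{i}": random_value(profile.fact_value_len)
        for i in range(profile.facts_per_node)
    }
    return {
        "certname": f"node-{node_index}.example.com",
        "environment": profile.environment,
        "producer_timestamp": datetime.now(timezone.utc).isoformat(),
        "values": values,
    }


if __name__ == "__main__":
    # Two parameter sets standing in for two different customer profiles.
    small = Profile(num_nodes=10, facts_per_node=50)
    large = Profile(num_nodes=10_000, facts_per_node=500, fact_value_len=100)

    sample = [generate_factset(small, i) for i in range(small.num_nodes)]
    print(f"generated {len(sample)} factsets, {len(sample[0]['values'])} facts each")
```

A set of these profiles could be versioned alongside the benchmark tool, so that as our understanding of customer data improves we change the parameters rather than regenerate a static fixture by hand.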