For an actual scale test we can use the existing benchmark tool to submit arbitrary numbers of commands to a PuppetDB instance. That tool, however, operates by reading a set of "example" commands and making minor modifications to them over time. The existing example data is very sparse: it was generated on an employee's laptop many years ago with very little total data, or comes from a customer but is very old. It is not representative of "real data" - though we don't yet know what is. In order to test PuppetDB at scale, we need better data.

Why do we need to write code to generate that data rather than create it once? We don't know what data structure is "representative" of customer use cases, and it likely isn't a single structure. I think we have about 1000 customers, so we probably have 1001 different usage profiles. By writing code that generates data from a set of parameters, we can change our data set as we move forward in time.

Out of scope, but we could even instrument the same metrics in PE and potentially use them to replicate an issue a customer is seeing without having to ask them to run things on our behalf to diagnose it.
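To make the parameterized-generation idea concrete, here is a minimal sketch. It is illustrative only: the `Profile` parameter names are hypothetical, Python is used just for brevity, and the factset payload shape is simplified rather than an exact copy of the PuppetDB command wire format.

```python
import random
import string
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Profile:
    """Knobs describing one hypothetical customer profile (names are illustrative)."""
    num_nodes: int = 100          # how many certnames to simulate
    facts_per_node: int = 200     # flat facts per factset
    fact_value_len: int = 20      # approximate size of each fact value
    environment: str = "production"


def random_value(length: int) -> str:
    """Produce a random string standing in for a fact value."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


def generate_factset(profile: Profile, node_index: int) -> dict:
    """Build one factset roughly shaped like a 'replace facts' payload.

    Only the parameterization matters here; the real command format has more fields.
    """
    values = {
        f"fact_{i}": random_value(profile.fact_value_len)
        for i in range(profile.facts_per_node)
    }
    return {
        "certname": f"node-{node_index}.example.com",
        "environment": profile.environment,
        "producer_timestamp": datetime.now(timezone.utc).isoformat(),
        "values": values,
    }


if __name__ == "__main__":
    # Two parameter sets standing in for two different customer profiles.
    small = Profile(num_nodes=10, facts_per_node=50)
    large = Profile(num_nodes=10_000, facts_per_node=500, fact_value_len=100)

    sample = [generate_factset(small, i) for i in range(small.num_nodes)]
    print(f"generated {len(sample)} factsets, {len(sample[0]['values'])} facts each")
```

A set of these profiles could be versioned alongside the benchmark tool, so that as our understanding of customer data improves we change the parameters rather than regenerate a static fixture by hand.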