Jira (PDB-4771) Create command to import and timeshift datasets

Rob Browning (Jira)

Mar 29, 2021, 5:53:03 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
 
PuppetDB / New Feature PDB-4771
Create command to import and timeshift datasets
Change By: Rob Browning
Summary: Create command to import and timeshift datasets (was: Setup PDB instance on n1/n2 with test data)
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)

Rob Browning (Jira)

Mar 29, 2021, 5:57:02 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
Change By: Rob Browning
Acceptance Criteria (previous wording):
* A helper script which loads only the pe-puppetdb slv data into a pe-pdbbox sandbox
* This helper script should also have the ability to run the updatetime.sql script on the data and follow that modification with a VACUUM FULL of the database
* Basic dev docs on how to run the helper script
* A pe-pdbbox sandbox loaded with the slv data with pdb turned off

Acceptance Criteria (new wording):
* A command which can load, timeshift (as the slv commands do), and vacuum a given pgdump.
* Adjustments to documentation/CONTRIBUTING.md if they seem warranted, i.e. if the command's --help isn't sufficient (perhaps at least a pointer to the command).
* The helper script should live somewhere in the pdb repo
 

Rob Browning (Jira)

Mar 29, 2021, 6:23:03 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
We need to set up a test db to run queries against on n1/n2, with test data that will not be gc'd. For the first pass, probably use the slv load test data and bump the timestamps with the provided script: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]

After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires an existing PDBBOX (i.e. refuses to run if PDBBOX isn't set in the environment, etc.), perhaps like this:
{code:shell}
ext/bin/pdb-dataset import --timeshift linear DUMPFILE
{code}
This would:
* Refuse to run if it detects that the PDBBOX already has a puppetdb database (perhaps just if the migrations table exists, for example). In the longer run, we might add a {{--force}} option to allow clobbering.
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift linear}} argument could be optional, but it's OK for it to be required for now. The argument in favor of the {{linear}} value is that we may want to support other time shifting strategies in the future.
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

Rob Browning (Jira)

Mar 29, 2021, 6:36:03 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
We need to set up a test db to run queries against on n1/n2, with test data that will not be gc'd. For the first pass, probably use the slv load test data and bump the timestamps with the provided script: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]

After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires an existing PDBBOX (i.e. refuses to run if PDBBOX isn't set in the environment, etc.), perhaps a command like this:
{code:shell}
ext/bin/pdb-dataset import --timeshift linear DUMPFILE
{code}
which would:
* Refuse to run if it detects that the postgres associated with the PDBBOX already has a puppetdb database (perhaps just by checking via psql if the migrations table exists, for example). (In the longer run, we might add a {{--force}} option to allow clobbering.)
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. (We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...)
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift linear}} argument could be optional (i.e. to support not timeshifting), but it's fine for it to be required for now. (The argument in favor of the {{linear}} value is that we may want to support other time shifting strategies in the future.)
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

We might also consider putting the timeshift script (if it ends up being in a standalone .sql file) in resources/ somewhere, say {{resources/puppetlabs/puppetdb/timeshift.sql}} or something, which means it'll end up in the jar, which shouldn't hurt, and might be handy someday.

Rob Browning (Jira)

Mar 29, 2021, 6:37:02 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires an existing PDBBOX (i.e. refuses to run if PDBBOX isn't set in the environment, etc.), perhaps a command like this:
{code:shell}ext/bin/pdb-dataset import --timeshift linear DUMPFILE
{code}
which would:
* Refuse to run if it detects that the postgres associated with the PDBBOX already has a puppetdb database (perhaps just by checking via psql if the migrations table exists, for example). (In the longer run, we might add a {{--force}} option to allow clobbering.)
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. (We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...)
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift linear}} argument could be optional (i.e. to support not timeshifting), but it's fine for it to be required for now. (The argument in favor of the {{linear}} value is that we may want to support other time shifting strategies in the future.)
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

We might also consider putting the timeshift script (if it ends up being in a standalone .sql file) in resources/ somewhere, say {{resources/puppetlabs/puppetdb/timeshift.sql}} or something, which means it'll end up in the jar, which shouldn't hurt, and might be handy someday.

Rob Browning (Jira)

Mar 29, 2021, 6:37:03 PM
to puppe...@googlegroups.com
Rob Browning updated an issue

Rob Browning (Jira)

Mar 29, 2021, 6:38:02 PM
to puppe...@googlegroups.com
Rob Browning updated an issue

Rob Browning (Jira)

Mar 29, 2021, 6:38:03 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires an existing PDBBOX (i.e. refuses to run if PDBBOX isn't set in the environment, etc.), perhaps a command like this:
{code:shell}ext/bin/pdb-dataset import --timeshift linear DUMPFILE
{code}
which would:
* Refuse to run if it detects that the postgres associated with the PDBBOX already has a puppetdb database (perhaps just by checking via psql if the migrations table exists, for example). (In the longer run, we might add a {{--force}} option to allow clobbering.)
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. (We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...)
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift linear}} argument could be optional (i.e. to support not timeshifting), but it's fine for it to be required for now. (The argument in favor of the {{linear}} value is that we may want to support other time shifting strategies in the future.)
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

We might also consider putting the timeshift script (if it ends up being in a standalone .sql file) in resources/ somewhere, say {{resources/puppetlabs/puppetdb/timeshift.sql}} or something, which means it'll end up in the jar, which shouldn't hurt, and might be handy someday.

Rob Browning (Jira)

Mar 29, 2021, 7:25:03 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires an existing PDBBOX (i.e. refuses to run if PDBBOX isn't set in the environment, etc.), perhaps a command like this:
{code:shell}ext/bin/pdb-dataset import --timeshift-to now DUMPFILE
{code}
which would:
* Refuse to run if it detects that the postgres associated with the PDBBOX already has a puppetdb database (perhaps just by checking via psql if the migrations table exists, for example). (In the longer run, we might add a {{--force}} option to allow clobbering.)
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. (We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...)
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift-to}} argument could eventually be optional (i.e. to support not timeshifting), but it's fine for it to be required for now. (The argument in favor of {{--timeshift-to now}} instead of a boolean argument is that we may want to support other time shifting strategies in the future, and/or different offsets.)
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

We might also consider putting the timeshift script (if it ends up being in a standalone .sql file), in resources/ somewhere, say {{resources/puppetlabs/puppetdb/timeshift.sql}} or something, which means it'll end up in the jar, which shouldn't hurt, and might be handy someday.
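
One plausible reading of {{--timeshift-to now}} is a constant shift: take the newest timestamp in the imported data and add the same offset to every timestamp, so that the newest one lands on the current time. The sketch below shows only that arithmetic, on epoch seconds. The function names and the constant-offset strategy are assumptions made for illustration; the actual shift would presumably be done in SQL, along the lines of the slv-setup notes.

```shell
# Hypothetical helpers; the names and the constant-offset strategy are
# assumptions, not taken from the PuppetDB repo.

# Print the offset (in seconds) that moves the newest timestamp to "now".
# $1 = newest timestamp in the dataset, $2 = target time, both epoch seconds.
timeshift_offset() {
    echo $(( $2 - $1 ))
}

# Print a timestamp shifted by a previously computed offset.
# $1 = original timestamp (epoch seconds), $2 = offset in seconds.
shift_timestamp() {
    echo $(( $1 + $2 ))
}
```

For example, `timeshift_offset "$newest" "$(date +%s)"` would give the offset to apply to each timestamp, and supporting "different offsets" later would only change how the second argument is chosen.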

Rob Browning (Jira)

Mar 29, 2021, 7:32:02 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires an existing PDBBOX (i.e. refuses to run if PDBBOX isn't set in the environment, etc.), perhaps a command like this:
{code:shell}ext/bin/pdb-dataset import --timeshift-to now DUMPFILE
{code}
which would:
* Refuse to run if it detects that the postgres associated with the PDBBOX already has a puppetdb database (perhaps just by checking via psql if the migrations table exists, for example). (In the longer run, we might add a {{--force}} option to allow clobbering.)
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. (We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...)
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift-to}} argument could eventually be optional (i.e. to support not timeshifting), but it's fine for it to be required for now. (The argument in favor of {{--timeshift-to now}} instead of a boolean argument is that we may want to support other time shifting strategies in the future, and/or different offsets.)
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

We might also consider putting the timeshift script (if it ends up being in a standalone .sql file), in resources/ somewhere, say {{resources/puppetlabs/puppetdb/timeshift.sql}} or something, which means it'll end up in the jar, which shouldn't hurt, and might be handy someday.


If we want a "top level" test for this, we could model it after those run by {{ext/bin/run-external-tests}} and add it there.

And if the top level command ends up being a shell script, there's plenty of prior "art" in {{ext/bin}} that might or might not be helpful.
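
If the command does end up as a shell script, the argument handling for the proposed invocation might look roughly like the sketch below. It is only an illustration: the function name `parse_import_args` and its output format are made up here, and it accepts just the `import` subcommand with a required `--timeshift-to now`, as the description suggests for the first pass.

```shell
# Hypothetical sketch of argument handling for:
#   pdb-dataset import --timeshift-to now DUMPFILE
# Prints "timeshift_to=<value> dumpfile=<path>" on success and returns 2
# on usage errors. Names and output format are illustrative assumptions.
parse_import_args() {
    timeshift_to= dumpfile=
    [ "${1:-}" = import ] || { echo "usage: pdb-dataset import --timeshift-to now DUMPFILE" >&2; return 2; }
    shift
    while [ $# -gt 0 ]; do
        case $1 in
            --timeshift-to)
                [ $# -ge 2 ] || { echo "--timeshift-to requires a value" >&2; return 2; }
                timeshift_to=$2
                shift 2
                ;;
            -*)
                echo "unknown option: $1" >&2
                return 2
                ;;
            *)
                [ -z "$dumpfile" ] || { echo "unexpected argument: $1" >&2; return 2; }
                dumpfile=$1
                shift
                ;;
        esac
    done
    # The option is required for now, and "now" is the only value recognized.
    [ "$timeshift_to" = now ] || { echo "--timeshift-to now is required" >&2; return 2; }
    [ -n "$dumpfile" ] || { echo "missing DUMPFILE" >&2; return 2; }
    echo "timeshift_to=$timeshift_to dumpfile=$dumpfile"
}
```

Rejecting unknown options outright leaves room to add `--force`, `--format`, or `--no-vacuum` later without changing the meaning of existing invocations.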

Rob Browning (Jira)

Mar 29, 2021, 7:41:02 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
After contemplating some possible future enhancements, suggest creating a command in the PDB repo (not extensions) that for now requires a {{pdbbox-env}} environment (i.e. refuses to run if {{PDBBOX}} is not set in the environment, and assumes that if it is set, it's running via {{pdbbox-env}}).

Perhaps a command like this:

{code:shell}ext/bin/pdb-dataset import --timeshift-to now DUMPFILE
{code}
which would:
* Refuse to run if it detects that the postgres associated with the PDBBOX already has a puppetdb database (perhaps just by checking via psql if the migrations table exists, for example). (In the longer run, we might add a {{--force}} option to allow clobbering.)
* Run a {{pg_restore}} of the {{pg_dump}} {{DUMPFILE}}. (We might eventually support a {{--format}} option to allow other kinds of import, e.g. {{--format pgdump}}, {{--format basebackup}}, ...)
* Time shift the database as indicated here: [slv-setup|https://github.com/puppetlabs/gatling-puppet-load-test/blob/master/docs/load_save_dbs.md]. The {{--timeshift-to}} argument could eventually be optional (i.e. to support not timeshifting), but it's fine for it to be required for now. (The argument in favor of {{--timeshift-to now}} instead of a boolean argument is that we may want to support other time shifting strategies in the future, and/or different offsets.)
* Run a {{vacuum full}}. Later, we might want to support inverse {{--vacuum}} and {{--no-vacuum}} options, but for now it's fine to just unconditionally vacuum.

We might also consider putting the timeshift script (if it ends up being in a standalone .sql file), in resources/ somewhere, say {{resources/puppetlabs/puppetdb/timeshift.sql}} or something, which means it'll end up in the jar, which shouldn't hurt, and might be handy someday.

If we want a "top level" test for this, we could model it after those run by {{ext/bin/run-external-tests}} and add it there.

And if the top level command ends up being a shell script, there's plenty of prior "art" in {{ext/bin}} that might or might not be helpful.
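
The guard and the sequence of steps described above could be sketched as follows. This is a rough illustration under several assumptions: the function names, the `puppetdb` database name used in the usage note, and the `schema_migrations` table name are not confirmed by the description, and `import_steps` only prints the commands rather than running them. The psql and pg_restore options used are standard ones.

```shell
# Hypothetical sketch of the proposed import flow; function names, the
# schema_migrations table name, and the printed commands are assumptions.

# Refuse to continue unless PDBBOX is set in the environment.
require_pdbbox() {
    if [ -z "${PDBBOX:-}" ]; then
        echo "pdb-dataset: PDBBOX is not set (run via pdbbox-env)" >&2
        return 2
    fi
}

# Succeed when database $1 already has a migrations table. Assumes psql can
# reach the sandbox postgres via the usual PG* environment variables.
has_migrations_table() {
    found=$(psql -X -A -t --dbname="$1" \
        -c "select to_regclass('public.schema_migrations') is not null" \
        2>/dev/null) || return 1
    [ "$found" = t ]
}

# Print, in order, the commands an import would run: restore, timeshift,
# then an unconditional vacuum full. Printing keeps the sketch side-effect free.
# $1 = database name, $2 = dump file, $3 = timeshift SQL script.
import_steps() {
    db=$1 dumpfile=$2 timeshift_sql=$3
    echo "pg_restore --dbname=$db $dumpfile"
    echo "psql -X -v ON_ERROR_STOP=1 --dbname=$db -f $timeshift_sql"
    echo "psql -X --dbname=$db -c 'vacuum full'"
}
```

A wrapper would call `require_pdbbox`, bail out if `has_migrations_table puppetdb` succeeds (until something like `--force` exists), and then run the steps that `import_steps` prints.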

Bogdan Irimie (Jira)

Apr 7, 2021, 9:04:04 AM
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-7.04.2021, HAHA/Grooming 2
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)

Bogdan Irimie (Jira)

Apr 21, 2021, 3:27:02 AM
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-7.04.2021, ghost-21.04.2021, HAHA/Grooming 2

Bogdan Irimie (Jira)

May 5, 2021, 9:44:02 AM
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-7.04.2021, ghost-21.04.2021, Ghost-5.05.2021, HAHA/Grooming 2

Bogdan Irimie (Jira)

May 5, 2021, 10:29:01 AM
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-7.04.2021, ghost-21.04.2021, Ghost-5.05.2021, ghost-19.05.2021, HAHA/Grooming 2

Zachary Kent (Jira)

May 21, 2021, 3:51:02 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Fix Version/s: PDB n/a

Claudia Petty (Jira)

Jun 21, 2023, 10:56:05 AM
to puppe...@googlegroups.com
Claudia Petty updated an issue
Change By: Claudia Petty
Labels: new-feature tsr-pdb-backlog
This message was sent by Atlassian Jira (v8.20.21#820021-sha1:38274c8)