Jira (PDB-5053) Move sync expired resource_event filter to report ingest

0 views
Skip to first unread message

Zachary Kent (Jira)

unread,
Mar 3, 2021, 6:30:03 PM3/3/21
to puppe...@googlegroups.com
Zachary Kent created an issue
 
PuppetDB / Bug PDB-5053
Move sync expired resource_event filter to report ingest
Issue Type: Bug Bug
Assignee: Unassigned
Created: 2021/03/03 3:29 PM
Priority: Normal Normal
Reporter: Zachary Kent

pdbext sync has a filter in the :clean-up-record-fn for reports which removes any resource_events which would be expired locally. This filter was added to account for differing ttls between reports and resource_events. We wanted to avoid a situation where GC would clean up a resource_event partition and then sync would pull a report with resource_events that recreated the deleted partition. As a result if an event is pulled out of a resource in a report we could have reports which don't exactly match between two pdbs syncing with one another.

This problem could become more pronounced when we add the ability to disable resource_event storage in PDB-3653. If the resource-event-ttl is set to 0 the sync filter will strip out all resource_events in the reports it transfers. It would be better if the check for expired resource_events was moved into the report ingestion code. That way sync would keep the report identical on both sides.

To do this we would need to remove the filter in the :clean-up-record-fn for reports and adjust at least the dont-pull-events-that-would-be-expired-locally test to check that the resource-events partitions aren't created not that the events don't exist in the report body.

On the FOSS side we'll want to add a check based on the resource-event-ttl in the scf/storage.clj/add-report!* function in the section where resource_events are stored. If we add a filter there which discards any resource_events which would be expired based on their timestamp that should allow us to avoid the issue of sync creating expired partitions and allow us to keep reports the same on both sides of sync.

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Zachary Kent (Jira)

unread,
Mar 3, 2021, 6:40:02 PM3/3/21
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
pdbext sync has a filter in the [:clean-up-record-fn|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/src/puppetlabs/pe_puppetdb_extensions/sync/core.clj#L173-L179] for reports which removes any *resource_events* which would be expired locally. This filter was added to account for differing ttls between *reports* and *resource_events*. We wanted to avoid a situation where GC would clean up a *resource_event* partition and then sync would pull a report with *resource_events* that recreated the deleted partition. As a result if an *event* is pulled out of a *resource* in a *report* we could have reports which don't exactly match between two pdbs syncing with one another.

This problem could become more pronounced when we add the ability to disable *resource_event* storage in PDB-3653. If the *resource-event-ttl* is set to 0 the sync filter will strip out all *
resource_events resource * *events* in the reports it transfers. It would be better if the check for expired *resource_events* was moved into the *report* ingestion code. That way sync would keep the report identical on both sides.

To do this we would need to remove the filter in the [:clean-up-record-fn|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/src/puppetlabs/pe_puppetdb_extensions/sync/core.clj#L173-L179] for reports and adjust at least the [dont-pull-events-that-would-be-expired-locally|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/test/puppetlabs/pe_puppetdb_extensions/sync/end_to_end_test.clj#L939] test to check that the *resource-events* partitions aren't created not that the events don't exist in the report body.

On the FOSS side we'll want to add a check based on the *resource-event-ttl* in the *scf/storage.clj/add-report!\** function in the section where [resource_events|https://github.com/puppetlabs/puppetdb/blob/6.x/src/puppetlabs/puppetdb/scf/storage.clj#L1415] are stored. If we add a filter there which discards any *resource_events* which would be expired based on their *timestamp* that should allow us to avoid the issue of sync creating expired partitions and allow us to keep reports the same on both sides of sync.

Zachary Kent (Jira)

unread,
Mar 3, 2021, 6:40:03 PM3/3/21
to puppe...@googlegroups.com
Zachary Kent updated an issue
pdbext sync has a filter in the [:clean-up-record-fn|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/src/puppetlabs/pe_puppetdb_extensions/sync/core.clj#L173-L179] for reports which removes any *resource_events* which would be expired locally. This filter was added to account for differing ttls between *reports* and *resource_events*. We wanted to avoid a situation where GC would clean up a *resource_event* partition and then sync would pull a report with *resource_events* that recreated the deleted partition. As a result if an *event* is pulled out of a *resource* in a *report* we could have reports which don't exactly match between two pdbs syncing with one another.

This problem could become more pronounced when we add the ability to disable *resource_event* storage in PDB-3653. If the *resource-event-ttl* is set to 0 the sync filter will strip out all * resource resource_events * *events* in the reports it transfers. It would be better if the check for expired *resource_events* was moved into the *report* ingestion code. That way sync would keep the report identical on both sides.

To do this we would need to remove the filter in the [:clean-up-record-fn|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/src/puppetlabs/pe_puppetdb_extensions/sync/core.clj#L173-L179] for reports and adjust at least the [dont-pull-events-that-would-be-expired-locally|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/test/puppetlabs/pe_puppetdb_extensions/sync/end_to_end_test.clj#L939] test to check that the *resource-events* partitions aren't created not that the events don't exist in the report body.

On the FOSS side we'll want to add a check based on the *resource-event-ttl* in the *scf/storage.clj/add-report!\** function in the section where [resource_events|https://github.com/puppetlabs/puppetdb/blob/6.x/src/puppetlabs/puppetdb/scf/storage.clj#L1415] are stored. If we add a filter there which discards any *resource_events* which would be expired based on their *timestamp* that should allow us to avoid the issue of sync creating expired partitions and allow us to keep reports the same on both sides of sync.

Zachary Kent (Jira)

unread,
Mar 3, 2021, 6:41:03 PM3/3/21
to puppe...@googlegroups.com

Zachary Kent (Jira)

unread,
Mar 4, 2021, 5:51:01 PM3/4/21
to puppe...@googlegroups.com
Zachary Kent commented on Bug PDB-5053
 
Re: Move sync expired resource_event filter to report ingest

As part of this work we'll want to make sure that the resource->skipped-resource-events function can't create resource_events which are older than the resource-event-ttl. This might happen naturally with work described above but we'll want to make sure to double check.

Bogdan Irimie (Jira)

unread,
Mar 10, 2021, 9:53:03 AM3/10/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Mar 10, 2021, 9:54:03 AM3/10/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Mar 11, 2021, 5:09:01 AM3/11/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
pdbext sync has a filter in the [:clean-up-record-fn|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/src/puppetlabs/pe_puppetdb_extensions/sync/core.clj#L173-L179] for reports which removes any *resource_events* which would be expired locally. This filter was added to account for differing ttls between *reports* and *resource_events*. We wanted to avoid a situation where GC would clean up a *resource_event* partition and then sync would pull a report with *resource_events* that recreated the deleted partition. As a result if an *event* is pulled out of a *resource* in a *report* we could have reports which don't exactly match between two pdbs syncing with one another.


This problem could become more pronounced when we add the ability to disable *resource_event* storage in PDB-3653. If the *resource-event-ttl* is set to 0 the sync filter will strip out all *resource_events* in the reports it transfers. It would be better if the check for expired *resource_events* was moved into the *report* ingestion code. That way sync would keep the report identical on both sides.


-
To do this we would need to remove the filter in the [:clean-up-record-fn|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/src/puppetlabs/pe_puppetdb_extensions/sync/core.clj#L173-L179] for reports and adjust at least the [dont-pull-events-that-would-be-expired-locally|https://github.com/puppetlabs/pe-puppetdb-extensions/blob/6.x/test/puppetlabs/pe_puppetdb_extensions/sync/end_to_end_test.clj#L939] test to check that the *resource-events* partitions aren't created not that the events don't exist in the report body.
- (this has been moved to PDB-5058)

On the FOSS side we'll want to add a check based on the *resource-event-ttl* in the *scf/storage.clj/add-report! \ ** function in the section where [resource_events|https://github.com/puppetlabs/puppetdb/blob/6.x/src/puppetlabs/puppetdb/scf/storage.clj#L1415] are stored. If we add a filter there which discards any *resource_events* which would be expired based on their *timestamp* that should allow us to avoid the issue of sync creating expired partitions and allow us to keep reports the same on both sides of sync.

Bogdan Irimie (Jira)

unread,
Mar 11, 2021, 5:09:02 AM3/11/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Acceptance Criteria: * sync no longer alters the reports it pulls from the other side
*
report ingestion accounts for the resource-event-ttl and doesn't separately store resource-events which would be expired locally. The expired events may still live in the main report body
*
Sync and unit test which check two points checks the point above 

Bogdan Irimie (Jira)

unread,
Mar 11, 2021, 5:12:02 AM3/11/21
to puppe...@googlegroups.com
Bogdan Irimie commented on Bug PDB-5053
 
Re: Move sync expired resource_event filter to report ingest

Zachary Kent I split this ticket and moved the sync work to PDB-5058 while this ticket retain the work that has to been done when saving reports.

Sebastian Miclea (Jira)

unread,
Mar 19, 2021, 8:42:03 AM3/19/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Mar 24, 2021, 10:09:04 AM3/24/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-24.03.2021 , ready for triage 3

Sebastian Miclea (Jira)

unread,
Apr 2, 2021, 4:59:03 AM4/2/21
to puppe...@googlegroups.com
Sebastian Miclea commented on Bug PDB-5053
 
Re: Move sync expired resource_event filter to report ingest

On a more detailed look, when implementing the filter for resource_events we've noticed that there are a lot of tests that use hard coded dates (example [here|puppetdb/reports.clj at PDB-5053 · sebastian-miclea/puppetdb (github.com)]). This is quite a big problem because at the moment all resource-events are expired and we have about 600 tests failing for similar reasons. We need to find a way to have the timestamps dynamically created so they would be relevant to the tests. 

This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo

Bogdan Irimie (Jira)

unread,
Apr 7, 2021, 9:04:04 AM4/7/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-24.03.2021, ghost-7.04.2021 , HAHA/Grooming 2

Sebastian Miclea (Jira)

unread,
Apr 8, 2021, 8:43:04 AM4/8/21
to puppe...@googlegroups.com

Sebastian Miclea (Jira)

unread,
Apr 8, 2021, 8:45:05 AM4/8/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Apr 21, 2021, 3:26:04 AM4/21/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-24.03.2021, ghost-7.04.2021, ghost-21.04.2021 , HAHA/Grooming 2

Bogdan Irimie (Jira)

unread,
May 5, 2021, 9:44:03 AM5/5/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-24.03.2021, ghost-7.04.2021, ghost-21.04.2021, Ghost-5.05.2021 , HAHA/Grooming 2

Bogdan Irimie (Jira)

unread,
May 5, 2021, 10:29:01 AM5/5/21
to puppe...@googlegroups.com
Bogdan Irimie updated an issue
Change By: Bogdan Irimie
Sprint: ghost-24.03.2021, ghost-7.04.2021, ghost-21.04.2021, Ghost-5.05.2021, ghost-19.05.2021 , HAHA/Grooming 2

Zachary Kent (Jira)

unread,
May 21, 2021, 3:56:03 PM5/21/21
to puppe...@googlegroups.com

Austin Blatt (Jira)

unread,
Jun 17, 2021, 5:07:01 PM6/17/21
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages