While testing sync's memory usage with the new non-lazy approach, I noticed that factsets were being repeatedly pulled on every sync run. Austin was able to reproduce this behavior on his machine using the following steps:
- set up a local sync pair (sync-1 & sync-2) using pe-pdbbox and Austin's helper pdb script
- stop sync-2
- load benchmark data into sync-1
- restart sync-2
Once the initial sync ran, we observed that periodic sync would continue to pull factsets in both directions: sync-1 would pull ~20 factsets out of 2,000, whereas sync-2 would pull ~1,000 out of 2,000. We saw similar behavior once before at a customer's site, but in that case it was only transferring a handful of factsets, and the issue appeared to resolve itself after a while in the logs.
I'm wondering if this can happen when a factset is first ingested via sync rather than through the normal command ingestion path. That could help explain why we saw a similar issue at a customer that looked like it got resolved after a bit of time.
Example of what we saw in the debug logs for sync when it was repeatedly pulling factsets:
{code:java}
2021-02-05 16:02:29,948 DEBUG [clojure-agent-send-off-pool-0] [p.p.s.core] Identified remote factset (host-1574 2021-02-06T00:00:05.374Z a657a432359dcd750c7df412d51a67570e9190a4) to sync due to local factset (host-1574 2021-02-06T00:00:05.374Z 23d9f61d312f3b72b46bf6a7974f8698ac9f9abd)
{code}
You can see in the example above that the two timestamps are identical, but the hashes used to compare the contents of factsets in the sync summary query didn't line up, which caused the sync-2 side to repeatedly pull the factsets.
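To make the suspected failure mode concrete, here is a minimal sketch of a summary comparison that would behave the way the log line suggests. It is not the pe-puppetdb implementation: the `FactsetSummary` type, the `factsets_to_pull` function, and the exact comparison rule (pull when the remote timestamp is newer, or when the timestamps match but the hashes differ) are assumptions inferred from the log output above.

```python
from typing import List, NamedTuple


class FactsetSummary(NamedTuple):
    """Hypothetical summary record: one row per certname."""
    certname: str
    timestamp: str  # ISO-8601 UTC string in a fixed format, as in the log line
    hash: str


def factsets_to_pull(local: List[FactsetSummary],
                     remote: List[FactsetSummary]) -> List[FactsetSummary]:
    """Return the remote summaries that look different from the local ones.

    Assumed rule (not confirmed against pe-puppetdb): pull a remote factset
    if it is unknown locally, if it is newer, or if the timestamps match but
    the content hashes differ.
    """
    local_by_cert = {s.certname: s for s in local}
    to_pull = []
    for r in remote:
        l = local_by_cert.get(r.certname)
        if l is None:
            to_pull.append(r)
        # Same-format UTC ISO-8601 strings compare correctly as plain strings.
        elif r.timestamp > l.timestamp:
            to_pull.append(r)
        elif r.timestamp == l.timestamp and r.hash != l.hash:
            to_pull.append(r)
    return to_pull


# Values taken from the debug log line above.
local = [FactsetSummary("host-1574", "2021-02-06T00:00:05.374Z",
                        "23d9f61d312f3b72b46bf6a7974f8698ac9f9abd")]
remote = [FactsetSummary("host-1574", "2021-02-06T00:00:05.374Z",
                         "a657a432359dcd750c7df412d51a67570e9190a4")]
pending = factsets_to_pull(local, remote)
```

Under this assumed rule, if ingesting the pulled factset does not leave the local side with the same hash the remote reports, the same factset would be selected again on every periodic sync, which matches what we observed.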
We should investigate this issue, figure out exactly how it can happen, and see if there is a way to mitigate it in pe-puppetdb sync.
* Create an isolated reproduction case that shows how sync got mismatched hashes for the same factset
* Create a ticket for any follow-on work to address the issues found during this investigation