Hi,
I'm looking for guidance on robustly managing large Customer Match lists (millions of users) with daily updates, specifically concerning full list replacements and resilience.
We are considering two approaches and have questions about the risks of each:
Scenario 1: Full replacement using remove_all followed by create operations
Our intended process is to:
- Submit an OfflineUserDataJob that includes an OfflineUserDataJobOperation with remove_all: true.
- Immediately after it, in the same job, include create operations for every user who should be on the new list.
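For concreteness, the Scenario 1 job could be assembled roughly as below. This is a plain-Python sketch, not Google Ads client library code. The operations are dicts that mirror the OfflineUserDataJobOperation fields (remove_all, create), it assumes hashed email is the only identifier, and the helper names and batch_size are hypothetical rather than a documented limit. Each batch would then be sent in one AddOfflineUserDataJobOperations call before the job is run.

```python
import hashlib
from typing import Dict, Iterator, List


def normalize_and_hash(email: str) -> str:
    """Trim and lowercase an email, then return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_replacement_batches(emails: List[str], batch_size: int) -> Iterator[List[Dict]]:
    """Yield operation batches for a single job: remove_all first, then one create per user.

    Hypothetical helper: operations are plain dicts shaped like
    OfflineUserDataJobOperation, not client library objects.
    """
    ops: List[Dict] = [{"remove_all": True}]  # must come before any create
    ops.extend(
        {"create": {"user_identifiers": [{"hashed_email": normalize_and_hash(e)}]}}
        for e in emails
    )
    # Slice the ordered operation list into fixed-size request batches.
    for start in range(0, len(ops), batch_size):
        yield ops[start:start + batch_size]


batches = list(build_replacement_batches([" A@example.com", "b@example.com"], batch_size=2))
```

Here the first batch starts with the remove_all operation, so if a later batch fails to upload, the job would already contain the removal but only some of the creates, which is exactly the failure mode asked about below.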
My primary concern here is:
- If the create operations fail (partially or entirely) after the remove_all operation has been processed successfully, is the whole job rolled back? Or could we end up with an empty (or only partially populated) Customer Match list? We want to understand the risk of inadvertently ending up with an empty list.
Scenario 2: Using a short membership_life_span with daily create operations to avoid the risk of an empty list
As an alternative, if the remove_all approach carries too much risk of an empty list, we're considering this:
- Maintain a single UserList with its membership_life_span set to a very short duration (e.g., 3 days). We can accept eventual consistency.
- Every day, upload our entire user list using OfflineUserDataJobService with only create operations (no remove_all).
- The goal here is that if our daily upload process fails for a few days, the 3-day membership_life_span would provide a buffer, preventing the list from becoming completely empty immediately. Users would remain active for up to 3 days from their last successful upload.
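To make the buffer arithmetic explicit, here is a toy model of Scenario 2. It assumes (a simplification, not documented API behavior) that a successful upload on day d keeps every uploaded user active through day d + life_span_days - 1 and that processing is instantaneous; the function name is hypothetical.

```python
from typing import List, Optional


def active_days(upload_ok: List[bool], life_span_days: int) -> List[bool]:
    """For each day, report whether members from the last successful upload are still active.

    Toy model (assumption): membership expires exactly life_span_days after the
    last successful upload; upload processing delays are ignored.
    """
    last_success: Optional[int] = None
    result: List[bool] = []
    for day, ok in enumerate(upload_ok):
        if ok:
            last_success = day  # a successful upload refreshes every member
        result.append(last_success is not None and day - last_success < life_span_days)
    return result


# Day 0 succeeds, days 1-3 fail, day 4 succeeds.
print(active_days([True, False, False, False, True], life_span_days=3))
```

Under this model a 3-day life span covers two consecutive failed uploads; on the third consecutive failure the list is empty until the next successful upload. The exact boundary in production will depend on upload processing time, which the model does not capture.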
My questions for the group regarding Scenario 2 are:
- Is setting a short membership_life_span (e.g., 3 days) a good and recommended strategy to mitigate the risk of an empty list during consecutive upload failures for a continually updated audience?
- Are there any unforeseen negative consequences, performance issues, or significant trade-offs (beyond the conceptual mismatch of the list's definition) when relying on a short membership_life_span for this type of operational resilience?
- What happens to running campaigns that target the list while a replace operation is in progress?
Any insights into the atomicity of these operations, the effective use of membership_life_span for resilience, and production-grade best practices would be greatly appreciated.
Thanks in advance for your help!
Best regards,
S.Kethiri