Hi Daniel,
It’s been some time since you posted this question. Have you been successful in performing the initial sync?
When we add the blank secondary and initial sync starts happening, the dirty cache size on the primary rapidly rises to 20% and stays there and the write throughput drops from thousands of updates/sec to several hundred / sec.
During normal operation, WiredTiger attempts to keep the dirty cache percentage at 5%. Once this number hits 20%, WiredTiger tries harder to evict dirty data from its cache by using application threads in addition to its dedicated cache eviction threads.
In other words, this process attempts to regulate incoming data into WiredTiger so that it’s not overwhelming the physical storage subsystem.
However, evicting dirty data from the cache involves a lot of work. Since WiredTiger is an MVCC no-overwrite storage engine, different versions of data in memory must be reconciled before a consistent state of the database can be written to disk. If the machine is struggling with the load imposed on it, it may appear to “stall” while it processes this work.
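If it helps to watch for this while the initial sync runs, here is a rough Python sketch that turns the WiredTiger cache counters reported by serverStatus into a dirty percentage and compares it against the 5% and 20% figures mentioned above. The helper names and the sample numbers are made up for illustration; the two counter names are the ones serverStatus reports under wiredTiger.cache, but please double-check them against the output of your own version.

```python
# Thresholds as described above: 5% is the level WiredTiger aims for,
# 20% is where application threads start helping with eviction.
DIRTY_TARGET_PCT = 5.0
DIRTY_TRIGGER_PCT = 20.0


def dirty_cache_percent(cache_stats):
    """Return dirty bytes as a percentage of the configured cache size."""
    dirty = cache_stats["tracked dirty bytes in the cache"]
    maximum = cache_stats["maximum bytes configured"]
    return 100.0 * dirty / maximum


def classify(pct):
    """Label a dirty percentage relative to the two thresholds (hypothetical helper)."""
    if pct >= DIRTY_TRIGGER_PCT:
        return "application threads evicting"
    if pct >= DIRTY_TARGET_PCT:
        return "above target"
    return "normal"


# Illustrative values only. With a driver such as PyMongo, the real dict would
# come from client.admin.command("serverStatus")["wiredTiger"]["cache"].
sample = {
    "maximum bytes configured": 10 * 1024**3,
    "tracked dirty bytes in the cache": 2 * 1024**3,
}

pct = dirty_cache_percent(sample)
print(f"dirty cache: {pct:.1f}% ({classify(pct)})")
```

Sampling this every few seconds on the primary, alongside your update throughput, should show whether the drop in writes lines up with the dirty percentage sitting at the 20% mark.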
From your description so far, it appears that your hardware can cope with normal day-to-day operations. However, it cannot cope with those operations and an initial sync at the same time. One suggestion is to perform the initial sync during non-peak times.
Another testing note - this also appears to happen whenever a secondary is significantly lagged behind.
This situation may be described in SERVER-34938. Please comment on or upvote the ticket if you think it applies to your situation.
Best regards
Kevin