Since setting up our replicas, we've run into some additional problems: unexplained CPU spikes and a rapidly rising connection count on the secondary servers.
We can go for days without seeing the problem. The secondary's connection count stays consistent throughout the day, at around 150. Then, all of a sudden, the CPU spikes and connections rise exponentially.
I/O is little to none during this time, and only a limited number of queries are running. Yet the CPU becomes maxed out, and the number of connections spikes and doesn't drop. All the while, the secondary servers' health continues to report OK. It's as if the server hangs. When we try to capture output, queries are few and far between, and none are long running.
After additional investigation, it turns out that TCP connections aren't being dropped on the secondary servers. Is this an issue? When we run:
lsof | grep mongod | grep TCP | wc -l
The primary server has anywhere between 100 and 200 connections, while the secondary is at 7k and can easily spike into the tens of thousands.
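A raw count doesn't show whether those sockets are still open or are stuck half-closed, so breaking the number down by TCP state might help narrow things down. Below is a rough sketch, assuming lsof prints the state in parentheses as the last field of each line (as `lsof -iTCP` normally does on Linux). `count_tcp_states` is just a helper name made up for illustration, not part of lsof or MongoDB.

```shell
# Hypothetical helper: tally TCP states from lsof-style lines on stdin.
# Assumes the last field of each socket line looks like "(ESTABLISHED)".
count_tcp_states() {
  awk '
    # Keep only lines whose last field is a parenthesised state;
    # this skips the lsof header line.
    $NF ~ /^\(.*\)$/ {
      gsub(/[()]/, "", $NF)   # strip the parentheses
      n[$NF]++                # count one socket in this state
    }
    END { for (s in n) print n[s], s }
  ' | sort -rn                # largest count first
}

# Example usage on a secondary (not run here):
#   lsof -nP -a -c mongod -iTCP | count_tcp_states
```

If most of the sockets on the secondary turn out to be in CLOSE_WAIT rather than ESTABLISHED, that would suggest the remote side has closed the connection but mongod has not yet closed its end. That is only a guess at what the breakdown might show.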
What are we doing wrong here? What else can we look at to help debug the underlying issue?
Thanks in advance.