Samples are not collected at the intervals listed in the retention policy.
Stats are emitted continuously at a fixed interval and then aggregated
(summed or averaged, to oversimplify).
When aggregating samples, sample retention policies control two things:
* How long the samples are kept (e.g. if you never request values for the last 12 hours, why keep them that long?)
* How fine-grained the intervals are (e.g. 5-minute or 60-minute intervals)
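
To illustrate the two knobs, here is a minimal Python sketch. It is not how the stats database is implemented; the function name `aggregate` and the parameters `interval`, `max_age` and `now` are made up for this example, and it only sums values per bucket.

```python
from collections import defaultdict

def aggregate(samples, interval, max_age, now):
    """Sum (timestamp, value) samples into buckets of `interval` seconds,
    dropping samples older than `max_age` seconds relative to `now`.
    Hypothetical helper for illustration only."""
    buckets = defaultdict(int)
    for ts, value in samples:
        if now - ts > max_age:
            continue  # outside the retention window, not kept
        bucket_start = ts - (ts % interval)  # start of the bucket this sample falls in
        buckets[bucket_start] += value
    return dict(sorted(buckets.items()))

# Stats emitted every 5 seconds for 10 minutes, one event per emission.
raw = [(t, 1) for t in range(0, 600, 5)]

# Keep 5 minutes of history at 60-second granularity.
fine = aggregate(raw, interval=60, max_age=300, now=600)

# Keep 10 minutes of history at 5-minute granularity.
coarse = aggregate(raw, interval=300, max_age=600, now=600)
```

In both cases the emission rate is the same (every 5 seconds). The policy only changes how many aggregated buckets exist and how far back they go, which is why the aggregated data stays small.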
Aggregated samples do not take much memory. What does is the stats collector
when it cannot keep up with all the events (because up to 3.6.7 it was the responsibility of a single node
and, in 3.5.6, of a single process).
You have provided no data that proves the issue is with the stats DB, but if you are
sure about that, there are two known strategies you should use instead:
* Increase the stats collection interval to, say, 30 or 60 seconds (most monitoring systems use 60-second intervals in practice,
so emitting stats more frequently isn't important)
* You can set up a cron job that restarts the stats database, as described in the docs:
Or you can just upgrade to 3.6.7 first, then 3.6.10: