We're running into an issue where we cannot re-sync our secondaries after taking them offline for more than 10 minutes or so. When we try to bring a secondary back online, it consumes all RAM and swap and then crashes.
When we start mongod with -vv, the log files on the crashed secondary are filled with:
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171004922 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171152019 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171257666 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171402953 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171534548 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171644793 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171751196 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171896415 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5172027296 bytes
Thu Oct 11 12:38:23 [rsBackgroundSync] bgsync buffer has 5172138255 bytes
The primary is running 2.2.0, Ubuntu 12.04 with 48G ram. We have two secondaries that are running 2.2.0, Ubuntu 12.04, but only have 4G of ram. (We see the same behavior when running 2.2.1rc0 too.)
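To put the logged buffer sizes next to the 4G of RAM on the secondaries, here is a rough Python sketch that parses lines in the format shown in the log excerpt. The regular expression and the helper name `buffer_sizes` are assumptions based only on the sample lines in this post, and 4G is treated as 4 GiB.

```python
import re

# Three of the sample lines from the log excerpt in this post.
LOG = """\
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171004922 bytes
Thu Oct 11 12:38:22 [rsBackgroundSync] bgsync buffer has 5171152019 bytes
Thu Oct 11 12:38:23 [rsBackgroundSync] bgsync buffer has 5172138255 bytes
"""

# Assumed pattern, derived from the lines above rather than from mongod docs.
PATTERN = re.compile(r"bgsync buffer has (\d+) bytes")

GIB = 1024 ** 3
RAM_BYTES = 4 * GIB  # assumption: "4G of ram" means 4 GiB


def buffer_sizes(text):
    """Return every logged bgsync buffer size, in bytes, in log order."""
    return [int(match.group(1)) for match in PATTERN.finditer(text)]


sizes = buffer_sizes(LOG)
for size in sizes:
    print(f"{size / GIB:.2f} GiB buffered, {size / RAM_BYTES:.0%} of RAM")

# Growth between the first and last sampled line.
growth = sizes[-1] - sizes[0]
print(f"growth across sample: {growth} bytes")
```

Every sampled value is already larger than the physical memory assumed here, which is consistent with the secondary going into swap before it crashes.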
Is there a reason why the bgsync buffer is growing so large? Is there a way to minimize this?
The rs.config() is as follows: { "_id" : "apm", "version" : 27, "members" : [ { "_id" : 0, "host" : "apc:27017", "priority" : 2
Are you running 64-bit or 32-bit? Did you configure --oplogSize on the command line? (The default usually behaves well, so the question is mainly whether it is set to something non-default.) If you can post rs.status(), that would be helpful.
On Thursday, October 11, 2012 4:10:52 PM UTC-4, Brent Miller wrote:
All machines are 64-bit, and yes, we've manually configured the oplog to 32G (oplogSize = 32768). We did this because when we first started testing, we only partitioned out 256G of disk for Mongo and found that the oplog ended up being too small to add new secondaries. (It might be worth noting that we have a fairly write-heavy load, and we're making a fair number of updates to documents that are on the order of a megabyte.)
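As a back-of-the-envelope check on that oplog size, the sketch below converts the oplogSize value (which mongod takes in megabytes) and estimates how many hours of history it would hold at a steady write rate. The write rates are purely hypothetical, since the thread gives no measured figure, and `oplog_window_hours` is an illustrative helper, not a MongoDB API.

```python
OPLOG_SIZE_MB = 32768  # oplogSize value from the post, in megabytes


def oplog_window_hours(oplog_size_mb, write_rate_mb_per_sec):
    """Rough hours of history the oplog holds at a steady oplog write rate."""
    if write_rate_mb_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / write_rate_mb_per_sec / 3600


print(f"oplog size: {OPLOG_SIZE_MB / 1024:.0f} GB")

# Hypothetical oplog write rates in MB/s; replace with a measured value.
for rate in (1, 5, 20):
    hours = oplog_window_hours(OPLOG_SIZE_MB, rate)
    print(f"{rate} MB/s -> about {hours:.1f} hours of oplog")
```

A secondary that stays offline longer than this window can no longer catch up from the oplog alone, so the estimate is only useful once a real write rate is plugged in.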
To keep the replica set up, I've had to demote one of the secondaries to an arbiter and disable the other, so I don't know whether rs.status() would be of any help at the moment. I can start a re-sync later today if that would be helpful. However, once we get the secondaries through their initial sync, their oplog trails the primary by anywhere from 1 to 60 seconds.
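For reference, the lag figure mentioned above can be derived from rs.status() output by comparing each secondary's optimeDate with the primary's. The following is a minimal sketch that works on a plain dictionary shaped like that output; the sample document, including the host name "secondary1:27017" and the timestamps, is made up for illustration.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Map each SECONDARY member name to its lag behind the PRIMARY, in seconds."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status document; only the fields used above are included.
sample_status = {
    "set": "apm",
    "members": [
        {"name": "apc:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2012, 10, 12, 18, 30, 0)},
        {"name": "secondary1:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 10, 12, 18, 29, 18)},
    ],
}

lag = replication_lag_seconds(sample_status)
for name, seconds in lag.items():
    print(f"{name} is {seconds:.0f}s behind the primary")
```

With a driver such as pymongo, the same dictionary could presumably be obtained from the replSetGetStatus admin command, but that step is left out here so the sketch runs without a server.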
On Thursday, October 11, 2012 7:50:09 PM UTC-7, Dwight Merriman wrote:
On Friday, October 12, 2012 2:35:10 PM UTC-4, Brent Miller wrote: