Hello everyone,

While using a master/slave cluster to serve my web application, I get the error "can't create new thread, closing connection". I dug a lot but didn't find any solution. If anyone has run into the same issue, could you please shed some light on it?

Here are the details:

I'm using C# driver v1.1.0.4184 to connect to a slave instance running MongoDB 2.0.7 (the master instance runs 2.0.2), with the options maxPoolSize=300;slaveOk=true.

When the web server is online, I can see from the log that the number of connections increases very rapidly. When it reaches the max user process limit (ulimit -u = 1024), I begin to see the error:

Fri Sep 28 06:37:21 [initandlisten] connection accepted from xx.xx.xx.xx:64034 #1073 (1014 connections now open)
Fri Sep 28 06:37:21 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Fri Sep 28 06:37:21 [initandlisten] can't create new thread, closing connection

After some searching, I decided to increase "ulimit -u" to 20480. I did notice another message in the log. I can't find the exact line now, but it says something like MongoDB is releasing unused connections. When I see this message, the number of connections decreases very quickly. However, new connections are created even more quickly, so after a longer period I end up with the same error again.

Then I tried stopping master/slave replication and making both servers masters. This time everything works fine, and the number of connections always stays below 250.

Here's what I think: judging from the log, the C# driver doesn't know it has already created enough connections in the connection pool, so it keeps creating new ones. The old ones are never used again, and after a while MongoDB decides they are no longer in use and releases them. That's why I see the second message.

What I haven't figured out is why this only happens with the slave instance. Does the C# driver use MongoDB to store its current connection pool size? If so, it could never write the count to the slave instance, and I guess that's why it keeps creating new connections: it can't find out how many have already been created.

Thank you for reading my long post. I really hope someone can help me figure out a solution.
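For anyone who wants to watch the connection count climb the way it is described above, here is a minimal Python sketch that scans mongod log lines in the format quoted in the post. The regular expression is an assumption based only on those three sample lines, and `summarize` is a hypothetical helper name, not part of any MongoDB tool.

```python
import re

# Assumed pattern, derived from the sample log lines in the post, e.g.:
#   ... connection accepted from 1.2.3.4:64034 #1073 (1014 connections now open)
ACCEPTED = re.compile(
    r"connection accepted from (?P<client>\S+) #(?P<conn_id>\d+) "
    r"\((?P<open>\d+) connections? now open\)"
)
THREAD_FAILURE = "can't create new thread, closing connection"


def summarize(lines):
    """Return (peak open connections, number of connections refused)."""
    peak = 0
    refused = 0
    for line in lines:
        match = ACCEPTED.search(line)
        if match:
            peak = max(peak, int(match.group("open")))
        elif THREAD_FAILURE in line:
            refused += 1
    return peak, refused


sample = [
    "Fri Sep 28 06:37:21 [initandlisten] connection accepted from "
    "xx.xx.xx.xx:64034 #1073 (1014 connections now open)",
    "Fri Sep 28 06:37:21 [initandlisten] pthread_create failed: errno:11 "
    "Resource temporarily unavailable",
    "Fri Sep 28 06:37:21 [initandlisten] can't create new thread, closing connection",
]

peak, refused = summarize(sample)
print(f"peak open connections: {peak}, refused: {refused}")
```

Fed a whole log file (for example `summarize(open(path))`, with `path` pointing at your own mongod log), this gives a rough picture of how close the server gets to the process limit before it starts refusing connections.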
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongodb-user@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user+unsubscribe@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb
Thanks for the tip. We did consider using a newer driver, but it seems there are too many incompatible changes. We still need some time to review our code before we can use it.
Is there any other workaround? I really need to start using the slave instance soon to reduce the pressure on the master.
It's hard to suggest workarounds for a version of the driver that is over a
year old.
One thing you could try is to open a separate direct connection to the
secondaries for queries that you want to send to the secondaries.
That may or may not solve this issue though, since CSHARP-302 was more
about how connections get closed (and how they built up when they weren't
being closed fast enough) when errors occur than about whether queries are
being sent to secondaries.
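To make the idea of connections "building up when they aren't closed fast enough" concrete, here is a generic Python sketch of a bounded pool. It is not the C# driver's code and does not claim to show what CSHARP-302 changed; `BoundedPool` and `FakeConnection` are hypothetical names used only for illustration. It shows the two properties a setting like maxPoolSize is meant to give: callers wait for a free slot instead of opening extra connections, and a connection that hit an error is closed immediately rather than left for the server to time out.

```python
import threading


class FakeConnection:
    """Stand-in for a real socket to the server (illustration only)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class BoundedPool:
    """Generic pool that never has more than max_size live connections."""

    def __init__(self, max_size):
        self._slots = threading.BoundedSemaphore(max_size)
        self._idle = []
        self._lock = threading.Lock()
        self.opened = 0  # total connections ever opened

    def acquire(self, timeout=1.0):
        # Wait for a free slot instead of opening one more connection.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("pool exhausted")
        with self._lock:
            if self._idle:
                return self._idle.pop()
            self.opened += 1
        return FakeConnection()

    def release(self, conn, broken=False):
        if broken:
            # Close right away so the server side can be reclaimed promptly.
            conn.close()
        else:
            with self._lock:
                self._idle.append(conn)
        self._slots.release()


pool = BoundedPool(max_size=2)
a = pool.acquire()
b = pool.acquire()
pool.release(a, broken=True)  # error path: closed immediately
c = pool.acquire()            # replacement opened; still only 2 live
pool.release(b)
pool.release(c)
print(pool.opened, a.closed)
```

If the broken connection were neither closed nor counted against the limit, each error would leave one more open socket on the server, which is one way a count like the 1014 in the log above could be reached with a nominal pool size of 300.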
> On Saturday, September 29, 2012 at 11:15:39 AM UTC+8, Robert Stam wrote:
>> Version 1.1 of the C# driver is very old. You may be encountering this
>> issue:
Yeah, I totally understand it's hard to suggest anything. I just need to make it work until the new driver passes our tests.
And sorry, I don't quite get your suggestion. What do you mean by a "separate direct connection"? Since my site is unstable anyway, I'm willing to give it a shot.
I see. Sorry I didn't make it clear enough: I'm already using a direct connection to the secondary. The whole issue happens when I'm connecting to a secondary directly.
Then I guess my only solution now is to upgrade the driver.
Well, thanks anyway.
> When you say master/slave I assume you mean a replica set?
> A connection to a replica set lists the members of the replica set on the connection string:
> mongodb://host1,host2,host3/?safe=true
> When you connect to a replica set the driver knows about all the members and routes queries to the primary (unless slaveOk is true).
> A direct connection to just one member of the replica set would have just that one host on the connection string:
> mongodb://host2/?safe=true
> A direct connection doesn't know about the other members of the replica set so all queries (slaveOk or not) would be routed to this one member.
> You would create a new MongoServerInstance for each connection string you use.
> Keep in mind though that any one of the hosts could be the primary, so host2 could be either a primary or a secondary.
> Once again though, if the problem is CSHARP-302 (fixed over a year ago) then your only solution will be to upgrade to a newer version of the driver.
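The distinction between the two connection strings quoted above can be sketched in a few lines of Python. This only mirrors the rule of thumb from the quoted reply (several hosts listed means a replica set connection, one host means a direct connection); a real driver may also look at other options, and `parse_hosts` and `connection_kind` are hypothetical helper names, not driver APIs. The sketch assumes any credentials are percent-encoded so they contain no "/" or "@".

```python
def parse_hosts(connection_string):
    """Return the hosts named in a mongodb:// connection string."""
    prefix = "mongodb://"
    if not connection_string.startswith(prefix):
        raise ValueError("expected a mongodb:// connection string")
    rest = connection_string[len(prefix):]
    # Keep only the part before the first "/" (database name and options follow).
    authority = rest.split("/", 1)[0]
    # Drop optional "user:password@" credentials.
    authority = authority.rsplit("@", 1)[-1]
    hosts = [host for host in authority.split(",") if host]
    if not hosts:
        raise ValueError("no hosts found in connection string")
    return hosts


def connection_kind(connection_string):
    """Classify by host count only, following the rule of thumb above."""
    hosts = parse_hosts(connection_string)
    return "direct" if len(hosts) == 1 else "replica set"


for uri in ("mongodb://host1,host2,host3/?safe=true", "mongodb://host2/?safe=true"):
    print(uri, "->", connection_kind(uri))
```

In the situation described in this thread, the connection string names a single secondary, so it falls in the "direct" case and every query goes to that one member.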
One more thing: I also find a lot of this exception in our log:
*Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond*
Do you think it's also caused by the bug you mentioned above? To me it smells like the server is too busy to respond when the connection storm happens.
> When you say master/slave I assume you mean a replica set?
> A connection to a replica set lists the members of the replica set on the > connection string:
> mongodb://host1,host2,host3/?safe=true
> When you connect to a replica set the driver knows about all the members > and routes queries to the primary (unless slaveOk is true).
> A direct connection to just one member of the replica set would have just > that one host on the connection string:
> mongodb://host2/?safe=true
> A direct connection doesn't know about the other members of the replica > set so all queries (slaveOk or not) would be routed to this one member.
> You would create a new MongoServerInstance for each connection string you > use.
> Keep in mind though that any one of the hosts could be the primary, so > host2 could be either a primary or a secondary.
> Once again though, if the problem is CSHARP-302 (fixed over a year ago) > then your only solution will be to upgrade to a newer version of the driver.
> On Sat, Sep 29, 2012 at 1:15 PM, 张耀星 <yaoxin...@gmail.com <javascript:>>wrote:
>> yea I totally understand it's hard to suggest. just need to make it work >> before the new driver passes the test.
>> and sorry, I don't quite get your suggestion. what do you mean a >> "separate direct connection"? since my site is now unstable anyway I'm >> willing give it a shot.
>> 在 2012年9月30日星期日UTC+8上午12时46分46秒,Robert Stam写道:
>>> It's hard to suggest workarounds for a version of the driver that is >>> over a year old.
>>> One thing you could try is to open a separate direct connection to the >>> secondaries for queries that you want to send to the secondaries.
>>> That may or may not solve this issue though, since CSHARP-302 was more >>> about how connections get closed (and how they built up when they weren't >>> being closed fast enough) when errors occur than about whether queries are >>> being sent to secondaries.
>>> On Sat, Sep 29, 2012 at 12:36 PM, 张耀星 <yaoxin...@gmail.com> wrote:
>>>> Thanks for the tip. We did consider using a new driver, but it seems >>>> there are too many incompatible changes done. Still need some time to >>>> review our code before we can use it.
>>>> Is there any other work around? I really need to use the slave instance >>>> to reduce master presure in a short time.
On Sat, Sep 29, 2012 at 9:55 PM, 张耀星 <yaoxing.zh...@gmail.com> wrote:
> One more thing, I also find a lot of this exception in our log:
> *Unable to read data from the transport connection: A connection attempt
> failed because the connected party did not properly respond after a period
> of time, or established connection failed because connected host has failed
> to respond*
> Do you think it's also caused by the same bug you mentioned above? to me
> it smells like when the connection storm happens, the server is too busy to
> respond.
> On Sunday, September 30, 2012 at 2:38:05 AM UTC+8, Robert Stam wrote:
>> When you say master/slave I assume you mean a replica set?
>> A connection to a replica set lists the members of the replica set on the
>> connection string:
>> mongodb://host1,host2,host3/?safe=true
>> When you connect to a replica set the driver knows about all the members
>> and routes queries to the primary (unless slaveOk is true).
>> A direct connection to just one member of the replica set would have just
>> that one host on the connection string:
>> mongodb://host2/?safe=true
>> A direct connection doesn't know about the other members of the replica
>> set so all queries (slaveOk or not) would be routed to this one member.
>> You would create a new MongoServerInstance for each connection string you
>> use.
>> Keep in mind though that any one of the hosts could be the primary, so
>> host2 could be either a primary or a secondary.
>> Once again though, if the problem is CSHARP-302 (fixed over a year ago)
>> then your only solution will be to upgrade to a newer version of the driver.
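A minimal sketch of the distinction described above, assuming the simplified rule from this thread: several hosts on the connection string means a replica set connection, a single host means a direct connection. The helper name `describe_connection_string` is hypothetical and is not part of any driver; it only splits the string and does not handle credentials or other connection-mode options a real driver may support.

```python
import re

def describe_connection_string(uri):
    """Split a mongodb:// string into hosts and options (illustration only)."""
    prefix = "mongodb://"
    if not uri.startswith(prefix):
        raise ValueError("expected a mongodb:// connection string")
    rest = uri[len(prefix):]
    # Everything before the first "/" is the comma-separated host list.
    host_part, _, tail = rest.partition("/")
    # Everything after "?" is the option list.
    _, _, query = tail.partition("?")
    hosts = [h for h in host_part.split(",") if h]
    options = {}
    # Options may be separated by "&" or ";" (the original post uses ";").
    for pair in re.split(r"[;&]", query):
        if pair:
            key, _, value = pair.partition("=")
            options[key] = value
    # Assumption taken from the thread: one host => direct connection.
    mode = "replica set (seed list)" if len(hosts) > 1 else "direct"
    return {"hosts": hosts, "options": options, "mode": mode}

print(describe_connection_string("mongodb://host1,host2,host3/?safe=true")["mode"])
print(describe_connection_string("mongodb://host2/?safe=true")["mode"])
```

With a direct connection string, every query goes to that one host, whether or not slaveOk is set.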
>> On Sat, Sep 29, 2012 at 1:15 PM, 张耀星 <yaoxin...@gmail.com> wrote:
>>> yea I totally understand it's hard to suggest. just need to make it work
>>> before the new driver passes the test.
>>> and sorry, I don't quite get your suggestion. what do you mean by a
>>> "separate direct connection"? since my site is now unstable anyway I'm
>>> willing to give it a shot.
great, then it looks like upgrading the driver should resolve everything. all I have to do is get my team to upgrade the driver as soon as possible.
thanks a lot for your help.
> Yes, if the server is closing sockets the client would get this error.
we've upgraded the driver to the latest version, and it has been running in production for several hours. there seem to be almost no exceptions thrown anymore, and server load has stayed low so far.
Traffic will reach its peak in 4 hours; we'll see then whether the new driver resolves the issue completely.
there's one more thing, and I don't know whether it's related to the driver. Now I see a lot of log entries like this:
Mon Oct 1 09:17:20 [initandlisten] connection accepted from 10.xx.xx.xx:57566 #1557
Mon Oct 1 09:17:20 [conn1557] end connection 10.xx.xx.xx:57566
It seems like a connection is created and released within a very short time. I'm not sure if that's expected behavior. to me it looks more like someone isn't using the driver correctly, maybe by calling Disconnect or something similar. what do you think?
The driver pings each server once every 10 seconds to check whether it is
still up and what state it is in. It uses a new connection each time for
this.
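To picture why this shows up in the server log as an "accepted" line immediately followed by an "end connection" line, here is a rough sketch of a health check that opens a fresh TCP connection on every round and closes it right away. It is not the driver's code: the names `ping_once` and `monitor` are made up for illustration, and whatever status command the real driver sends over the connection is left out.

```python
import socket
import time

def ping_once(host, port, timeout=2.0):
    """Open a brand-new TCP connection, then close it immediately.

    Returns True if the server accepted the connection, False otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def monitor(host, port, interval_seconds=10.0, rounds=3):
    """Ping the server `rounds` times, waiting `interval_seconds` between pings."""
    results = []
    for i in range(rounds):
        results.append(ping_once(host, port))
        if i < rounds - 1:
            time.sleep(interval_seconds)
    return results
```

Each call to `ping_once` would produce one accept/end pair on the server side, so a pair roughly every 10 seconds per client machine is what this pattern would look like in the log.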
yes you're right, it's every 10s:
Mon Oct 1 14:13:36 [initandlisten] connection accepted from 10.80.xx.xx:61287 #483
Mon Oct 1 14:13:36 [conn483] end connection 10.80.xx.xx:61287
Mon Oct 1 14:13:43 [initandlisten] connection accepted from 10.4.xx.xx:55519 #484
Mon Oct 1 14:13:43 [conn484] end connection 10.4.xx.xx:55519
Mon Oct 1 14:13:47 [initandlisten] connection accepted from 10.80.xx.xx:61296 #485
Mon Oct 1 14:13:47 [conn485] end connection 10.80.xx.xx:61296
Mon Oct 1 14:13:53 [initandlisten] connection accepted from 10.4.xx.xx:56289 #486
Mon Oct 1 14:13:53 [conn486] end connection 10.4.xx.xx:56289
Mon Oct 1 14:13:56 [initandlisten] connection accepted from 10.80.xx.xx:61315 #487
Mon Oct 1 14:13:56 [conn487] end connection 10.80.xx.xx:61315
The bad news is that the connection problem happened again, though slightly differently from last time. This time we got a lot of the following error before the server went down:
*Unable to connect to server 10.51.xx.xx:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.51.xx.xx:27017*
I'm a bit confused: usually this exception means a private network issue, but after restarting the app pool everything recovered. Do you think anything else could cause this? By the way, I didn't see the "an existing connection was forcibly closed by the remote host mongodb" exception this time, and both MongoDB and IIS are working alright, with low CPU pressure too.
It has only happened once since upgrading the driver, much less frequently than before, so I guess it may be a different issue.
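For anyone who wants to check the 10-second pattern in their own mongod log, here is a minimal Python sketch. It assumes the 2.0-style log format shown in the lines above; the names `accept_intervals` and `SAMPLE_LOG` are made up for illustration, and the year is passed in as an assumption because the log lines do not carry one.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches mongod 2.0-style accept lines such as:
#   Mon Oct 1 14:13:36 [initandlisten] connection accepted from 10.80.xx.xx:61287 #483
ACCEPT_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d+) (\d\d:\d\d:\d\d) \[initandlisten\] "
    r"connection accepted from (\S+):\d+ #\d+"
)

def accept_intervals(lines, year=2012):
    """Return {client_ip: [seconds between consecutive accepted connections]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        m = ACCEPT_RE.match(line.strip())
        if not m:
            continue  # skip "end connection" and any other lines
        month, day, clock, ip = m.groups()
        # The log has no year, so the caller supplies one (assumption).
        ts = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        if ip in last_seen:
            gaps[ip].append(int((ts - last_seen[ip]).total_seconds()))
        last_seen[ip] = ts
    return dict(gaps)

# The log excerpt from this message, used as sample input.
SAMPLE_LOG = [
    "Mon Oct 1 14:13:36 [initandlisten] connection accepted from 10.80.xx.xx:61287 #483",
    "Mon Oct 1 14:13:36 [conn483] end connection 10.80.xx.xx:61287",
    "Mon Oct 1 14:13:43 [initandlisten] connection accepted from 10.4.xx.xx:55519 #484",
    "Mon Oct 1 14:13:43 [conn484] end connection 10.4.xx.xx:55519",
    "Mon Oct 1 14:13:47 [initandlisten] connection accepted from 10.80.xx.xx:61296 #485",
    "Mon Oct 1 14:13:47 [conn485] end connection 10.80.xx.xx:61296",
    "Mon Oct 1 14:13:53 [initandlisten] connection accepted from 10.4.xx.xx:56289 #486",
    "Mon Oct 1 14:13:53 [conn486] end connection 10.4.xx.xx:56289",
    "Mon Oct 1 14:13:56 [initandlisten] connection accepted from 10.80.xx.xx:61315 #487",
    "Mon Oct 1 14:13:56 [conn487] end connection 10.80.xx.xx:61315",
]

print(accept_intervals(SAMPLE_LOG))
```

On the excerpt above this gives gaps of 11 and 9 seconds for 10.80.xx.xx and 10 seconds for 10.4.xx.xx, which fits one short-lived connection per client roughly every 10 seconds.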
Can you be very specific about your server setup and what your connection strings look like? I'd like a complete picture of this instead of trying to put it together from 10 different messages.
Sure, let me try to put everything together.
Our site was recently migrated from a SQL Server/IIS application; the new site runs on MongoDB/IIS. We switched all of the traffic to the new site only a few days ago. Before that it was taking 50%, about 4 million PV per day, and it was doing fine. Mongo was running on an 8-core/12 GB RAM server, with 2 IIS servers connected to it.
Before the switch, to make sure everything would go as expected, I added 2 more IIS servers and a new MongoDB instance running on a 16-core/12 GB RAM server. I was planning to have a master/slave cluster serve the 4 web servers. The master node runs MongoDB 2.0.2 and the slave node 2.0.7. Both were installed from the official source with yum install, without any special configuration.
Since the slave has better hardware, I wanted the original server to handle writes only and to have all the IIS servers connect to the slave. The connection string was simply a basic one plus "maxPoolSize=300;slaveOk=true". However, I got the error described in my first post: the driver kept creating new connections without releasing them. When the connection count reached ulimit -u, the server went down, with a lot of error messages like these:
*Fri Sep 28 06:37:21 [initandlisten] connection accepted from xx.xx.xx.xx:64034 #1073 (1014 connections now open)*
*Fri Sep 28 06:37:21 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable*
*Fri Sep 28 06:37:21 [initandlisten] can't create new thread, closing connection*
and from IIS side there are two kinds of exceptions:
*an existing connection was forcibly closed by the remote host mongodb*
Because the server was planned to go online soon, I had no choice but to set both MongoDB instances to master mode. Data was synced using a Windows service created earlier, and each instance had 2 IIS servers connected to it. Generally it worked, but not very well. I had to restart both IIS and MongoDB every few hours, especially during peak hours; otherwise the CPU of both IIS and MongoDB went up to 100% at random. I couldn't find anything abnormal in the MongoDB logs, but a lot of exceptions were thrown on the IIS side:
*Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond*
*Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.*
I thought it was the high pressure that made the server unable to respond. So, following Robert's suggestion, we upgraded the C# driver from 1.1 to 1.6 to stop the connection storm. He was right: the main issue was resolved, and MongoDB CPU never goes up to 100% anymore (even while IIS was at 100%).
Now the 2nd exception has disappeared, and most of the time there are no exceptions at all. But last night around 3:00 AM GMT+8, the IIS servers connected to the new MongoDB both went down (the IIS servers connected to the old MongoDB were alright). I didn't find anything valuable in the MongoDB log, but in IIS we got a lot of:
*Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond*
By that time, both IIS servers had very high CPU usage (while the pressure on MongoDB was very low), so I restarted the app pool and it recovered at once. That's all. It was the first night after upgrading to the new driver, and I haven't found anything else yet. I haven't tried to make them master/slave again; maybe I'll try that in several days, since it's a holiday here now and nobody's in the office. What do you think? Is there anything else I need to provide?
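In case it helps anyone watching for the same connection build-up, below is a rough Python sketch that scans mongod log lines for the "connections now open" counter and for pthread_create failures. It assumes the 2.0-style log format quoted earlier in this post; `summarize` and `SAMPLE_LOG` are illustrative names only, not part of any MongoDB tool.

```python
import re

# Matches the counter at the end of mongod 2.0-style accept lines, e.g.
#   ... connection accepted from xx.xx.xx.xx:64034 #1073 (1014 connections now open)
OPEN_RE = re.compile(r"\((\d+) connections? now open\)")

def summarize(lines):
    """Return (peak open connections seen, number of pthread_create failures)."""
    peak = 0
    thread_failures = 0
    for line in lines:
        m = OPEN_RE.search(line)
        if m:
            peak = max(peak, int(m.group(1)))
        if "pthread_create failed" in line:
            thread_failures += 1
    return peak, thread_failures

# The three log lines quoted in this post, used as sample input.
SAMPLE_LOG = [
    "Fri Sep 28 06:37:21 [initandlisten] connection accepted from xx.xx.xx.xx:64034 #1073 (1014 connections now open)",
    "Fri Sep 28 06:37:21 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable",
    "Fri Sep 28 06:37:21 [initandlisten] can't create new thread, closing connection",
]

peak, failures = summarize(SAMPLE_LOG)
print(f"peak open connections: {peak}, thread creation failures: {failures}")
```

On the three sample lines this reports a peak of 1014 open connections and one thread creation failure, which sits just under the ulimit -u value of 1024 from the first post.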
Sorry, I made a mistake. The exception that happened last night was not the one in my last post. It was:
*Unable to connect to server 10.xx.xx.xx:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond*
They look so similar to each other.
> Can you be very specific about your server setup and what your connection > strings look like? I'd like a complete picture of this instead of trying > to put it together from 10 different messages.
> Thanks :)
> On Monday, October 1, 2012 2:32:49 PM UTC-5, 张耀星 wrote:
>> yes you're right, it's every 10s:
>> Mon Oct 1 14:13:36 [initandlisten] connection accepted from >> 10.80.xx.xx:61287 #483
>> Mon Oct 1 14:13:36 [conn483] end connection 10.80.xx.xx:61287
>> Mon Oct 1 14:13:43 [initandlisten] connection accepted from >> 10.4.xx.xx:55519 #484
>> Mon Oct 1 14:13:43 [conn484] end connection 10.4.xx.xx:55519
>> Mon Oct 1 14:13:47 [initandlisten] connection accepted from >> 10.80.xx.xx:61296 #485
>> Mon Oct 1 14:13:47 [conn485] end connection 10.80.xx.xx:61296
>> Mon Oct 1 14:13:53 [initandlisten] connection accepted from >> 10.4.xx.xx:56289 #486
>> Mon Oct 1 14:13:53 [conn486] end connection 10.4.xx.xx:56289
>> Mon Oct 1 14:13:56 [initandlisten] connection accepted from >> 10.80.xx.xx:61315 #487
>> Mon Oct 1 14:13:56 [conn487] end connection 10.80.xx.xx:61315
>> But bad news is the connection problem happened again. slightly different >> from last time though. now we got a lot of the following error before the >> server goes down.
>> *Unable to connect to server 10.51.xx.xx:27017: A connection attemp >> failed because the connected party did not properly respond after a period >> of time, or established connection failed because connected host has failed >> to respond 10.51.xx.xx:27017*
>> I'm a bit confused, usually this exception means a private network issue. >> but after restarting app pool everything recovered. Do you think there's >> anything else that can cause this issue? By the way, I didn't see the "an >> existing connection was forcibly closed by the remote host mongodb" >> exception this time. and both MongoDB and IIS are working alright, >> CPU pressure is low, too.
>> It only happens one time after upgrading the driver, much less frequently >> than before. So I guess it may be a different issue.
>> 在 2012年10月1日星期一UTC+8下午10时45分05秒,Robert Stam写道:
>>> Are you seeing these once every 10 seconds?
>>> The driver pings each server once every 10 seconds to check whether it >>> is still up and what state it is in. It uses a new connection each time for >>> this.
>>> On Mon, Oct 1, 2012 at 10:41 AM, 张耀星 <yaoxin...@gmail.com> wrote:
>>>> we've upgraded the driver to latest version, and was running online for >>>> several hours. it seems there're almost no exceptions thrown anymore. and >>>> server stress is kept in a low level until now.
>>>> It's achieving traffic peak in 4 hours, we'll see if the driver >>>> resolves the issue completely.
>>>> there's one more thing which I don't know whether it's related to the >>>> driver. Now I see a lot of log like this kind:
>>>> Mon Oct 1 09:17:20 [initandlisten] connection accepted from >>>> 10.xx.xx.xx:57566 #1557
>>>> Mon Oct 1 09:17:20 [conn1557] end connection 10.xx.xx.xx:57566
>>>> It seems like a connection is created and release in a very short time. >>>> I'm not sure if it's an expected behavior. for me it seems more like >>>> someone didn't use the driver in a correct way. maybe called the Disconnect >>>> or something else. what do you think?
>>>> 在 2012年9月30日星期日UTC+8下午7时12分27秒,张耀星写道:
>>>>> great, then it looks like upgrade the driver would resolve everything. >>>>> all I have to do is to get my team upgrade the driver as soon as possible.
>>>>> thanks a lot for you help.
>>>>>> Yes, if the server is closing sockets the client would get this error.
>>>>>> On Sat, Sep 29, 2012 at 9:55 PM, 张耀星 <yaoxin...@gmail.com> wrote:
>>>>>>> One more thing, I also find a lot of this exception in our log:
>>>>>>> *Unable to read data from the transport connection: A connection >>>>>>> attempt failed because the connected part did not properly respond after a >>>>>>> period of time, or established connection failed because connected host as >>>>>>> failed to respond*
>>>>>>> Do you think it's also caused by the same bug you mentioned above? >>>>>>> to me it smells like when the connection storm happens, the server is too >>>>>>> busy to respond.
>>>>>>>> When you connect to a replica set the driver knows about all the members and routes queries to the primary (unless slaveOk is true).
>>>>>>>> A direct connection to just one member of the replica set would have just that one host on the connection string:
>>>>>>>> mongodb://host2/?safe=true
>>>>>>>> A direct connection doesn't know about the other members of the replica set so all queries (slaveOk or not) would be routed to this one member.
>>>>>>>> You would create a new MongoServerInstance for each connection string you use.
>>>>>>>> Keep in mind though that any one of the hosts could be the primary, so host2 could be either a primary or a secondary.
>>>>>>>> Once again though, if the problem is CSHARP-302 (fixed over a year ago) then your only solution will be to upgrade to a newer version of the driver.
>>>>>>>> On Sat, Sep 29, 2012 at 1:15 PM, 张耀星 <yaoxin...@gmail.com> wrote:
>>>>>>>>> Yeah, I totally understand it's hard to suggest. I just need to make it work before the new driver passes our tests.
>>>>>>>>> And sorry, I don't quite get your suggestion. What do you mean by a "separate direct connection"? Since my site is unstable now anyway, I'm willing to give it a shot.
>>>>>>>>>> It's hard to suggest workarounds for a version of the driver that is over a year old.
>>>>>>>>>> One thing you could try is to open a separate direct connection to the secondaries for queries that you want to send to the secondaries.
>>>>>>>>>> That may or may not solve this issue though, since CSHARP-302 was more about how connections get closed (and how they built up when they weren't being closed fast enough) when errors occur than about whether queries are being sent to secondaries.
>>>>>>>>>> On Sat, Sep 29, 2012 at 12:36 PM, 张耀星 <yaoxin...@gmail.com> wrote:
>>>>>>>>>>> Thanks for the tip. We did consider using a newer driver, but it seems there are too many incompatible changes. We still need some time to review our code before we can use it.
>>>>>>>>>>> Is there any other workaround? I really need to start using the slave instance quickly to reduce pressure on the master.
>>>>>>>>>>>> Can you try a newer version of the driver?
>>>>>>>>>>>> On Fri, Sep 28, 2012 at 2:39 PM, 张耀星 <yaoxin...@gmail.com> wrote:
>>>>>>>>>>>>> hello everyone,
>>>>>>>>>>>>> while using a master/slave cluster to serving my web application, I get the error "can't create thread. closing connection". I dug a lot but didn't find any solution. anyone who ran into the same issue could you please shed me some light?
>>>>>>>>>>>>> here's the detail of my issue:
>>>>>>>>>>>>> I'm using C# driver v1.1.0.4184 to connect to a slave instance which is running mongodb 2.0.7 (the master instance runs 2.0.2), with option maxPoolSize=300;slaveOk=true.
>>>>>>>>>>>>> when the web server is online, I can see from the log that the connection qty increases very rapidly. and when it reaches max user process limit (ulimit -u = 1024), I begin to see the error:
>>>>>>>>>>>>> Fri Sep 28 06:37:21 [initandlisten] connection accepted from xx.xx.xx.xx:64034 #1073 (1014 connections now open)
>>>>>>>>>>>>> Fri Sep 28 06:37:21 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
>>>>>>>>>>>>> Fri Sep 28 06:37:21 [initandlisten] can't create new thread, closing connection
>>>>>>>>>>>>> after some internet searching, I decided to increase "ulimit -u" to 20480. I do noticed another message from the log. I can't find the exact log now, but it says something like mongodb is releasing unused
Thanks, this is very helpful. What I see is this: Everything is running well and then, at some point, you start getting TCP/IP error messages ("Unable to read data from..."). That is the .NET framework throwing the exception, not the MongoDB layer. The usual cause of this error is some type of network issue where the driver can't communicate with MongoDB anymore. It's possible we have a bug somewhere that, after some period of time, it stops behaving properly. This is the first we've heard of this and this scenario will be extremely hard to narrow down. Hence, I'd like to walk through some other possible issues as well.
1) Can you provide the connection strings you are using?
2) Can you provide some code for how you are
a) setting up your app - are you using an IoC container? How are you creating your MongoServer instances?
b) querying and writing to the database - How are you creating MongoDatabase and MongoCollection?
c) terminating your "sessions" - are you calling Disconnect?
Sorry I don't have the code at hand right now. It's in my office.
But so far it hasn't happened again, not even once. I'd like to watch for one more night to see if it recurs. Maybe it really is just a network issue; we've been getting a lot of maintenance notices from our IDC recently. Let's see how it does tonight. If it's still happening, I'll go to the office tomorrow to get the code.
Thanks a lot for your help.
Bad news. Last night the MongoDB CPU usage rose to 100% again (IIS still works fine, just slower). It recovered after a service restart. Are there any known issues with MongoDB 2.0.7?
Here's the part I don't understand. We now have 2 mongo servers, with 2 IIS servers connected to each. What confuses me is that the old MongoDB server, with weaker hardware, shows a higher CPU usage percentage but is doing fine: I see no errors in its log and never need to restart it. The new server, with better hardware that is supposed to perform much better, keeps causing problems. The only difference between them is that the old one runs mongo 2.0.2 and the new one runs 2.0.7.
Another thing I noticed: in htop, 90% of the CPU bar is red. I think that means it's occupied by Linux kernel threads, right? The disk on the new server is RAID10; could something be wrong with the RAID disks?
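On the htop question: in htop's default color scheme the red part of the CPU bar is kernel (system) time, so a mostly red bar does suggest the time is going into the kernel rather than into mongod's own code. A rough way to put a number on that without htop is to sample the aggregate `cpu` line of /proc/stat twice and compare the counters. The Python below is only a sketch: the function names `parse_cpu_line` and `kernel_share` are made up for illustration, and it assumes that system, irq and softirq time together count as kernel time.

```python
# Field order of the aggregate "cpu" line in /proc/stat (first seven counters).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def parse_cpu_line(stat_text):
    """Return the first seven counters of the aggregate 'cpu' line as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines are "cpu0", "cpu1", ...
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def kernel_share(before, after):
    """Fraction of CPU time between two snapshots that was spent in the kernel.

    Assumption for this sketch: kernel time = system + irq + softirq.
    """
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return 0.0
    return (delta["system"] + delta["irq"] + delta["softirq"]) / total
```

On a live host you would read /proc/stat, wait a second or so, read it again, and pass the two parsed snapshots to `kernel_share`. A value close to 0.9 would match a CPU bar that is about 90% red.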
OK, we finally found the reason. I'll skip the details and write down the cause in case someone runs into the same issue.
There were two causes behind our problem.
*The first is what Robert mentioned above: our driver was so old that it could trigger a connection storm.*
*The second has nothing to do with the C# driver. It's a problem with the combination of Linux and MongoDB versions.*
Our engineer installed the wrong version of CentOS, 6.3 (we had asked for 6.0). Running MongoDB 2.0.2/2.0.7 on CentOS 6.3 caused high CPU consumption. We can reproduce the issue by putting MongoDB under heavy load: CPU usage rises very quickly, but most of it is consumed by Linux kernel processes. Once CPU usage gets high, it never comes back down even after we remove all the load (or it takes a very long time to). We have now tried MongoDB 2.2.0 on CentOS 6.0 and everything works fine again. CPU consumption is much lower and almost no time is spent in kernel processes.
*In conclusion, DON'T run MongoDB 2.0.x on CentOS 6.3.*
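For anyone trying to tell the first cause apart from the second, one possible check is to scan the mongod log for connections that are accepted and ended almost immediately, like the accept/end pairs quoted earlier in this thread. The Python below is a minimal sketch, not an official tool: the regular expressions are modelled only on the log lines quoted in this thread, the function names are hypothetical, and the `year` argument is an assumption needed because these log timestamps carry no year.

```python
import re
from datetime import datetime

# Patterns modelled on the two kinds of log lines quoted in this thread.
ACCEPTED = re.compile(
    r"^(\w{3} \w{3} +\d+ [\d:]{8}) \[initandlisten\] connection accepted from \S+ #(\d+)"
)
ENDED = re.compile(r"^(\w{3} \w{3} +\d+ [\d:]{8}) \[conn(\d+)\] end connection")


def parse_ts(text, year=2012):
    """Parse a timestamp such as 'Mon Oct 1 09:17:20'; the year is supplied by the caller."""
    normalized = " ".join(text.split())  # collapse padded spaces before one-digit days
    return datetime.strptime("%d %s" % (year, normalized), "%Y %a %b %d %H:%M:%S")


def short_lived_connections(lines, max_seconds=1, year=2012):
    """Return (connection id, lifetime in seconds) for connections closed within max_seconds."""
    opened = {}
    short = []
    for line in lines:
        match = ACCEPTED.match(line)
        if match:
            opened[match.group(2)] = parse_ts(match.group(1), year)
            continue
        match = ENDED.match(line)
        if match and match.group(2) in opened:
            start = opened.pop(match.group(2))
            lifetime = (parse_ts(match.group(1), year) - start).total_seconds()
            if lifetime <= max_seconds:
                short.append((match.group(2), lifetime))
    return short
```

Passing the lines of a log file to `short_lived_connections` gives a list of connection ids that lived for a second or less. A long list during busy periods would point toward client-side churn rather than the CPU problem described above.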