"Closed explicitly" error

Evan Shaw

Mar 9, 2012, 7:43:11 PM
to mgo-...@googlegroups.com
Hi,

We've just upgraded an application to the latest weekly to get ready
for Go 1. Occasionally we get back the error "Closed explicitly",
originating from socket.go:178. I'm guessing this means we're trying
to use a session that's been closed already, but as far as I can see
that's not the case. Our code basically looks like this:

func foo() {
    session := baseSession.Copy()
    db := session.DB(dbName)
    defer db.Session.Close()

    var result Thing
    err := db.C("collection").Find(bson.M{"field": "blah"}).One(&result)
    // err contains the "Closed explicitly" error
}

The only way this makes sense is if the base session has been closed
explicitly, but I'm about 99% sure that's not the case -- especially
since the error only happens sometimes. Am I misinterpreting this
error? Any ideas?

- Evan

Gustavo Niemeyer

Mar 9, 2012, 11:18:55 PM
to mgo-...@googlegroups.com
Hi Evan,

> We've just upgraded an application to the latest weekly to get ready
> for Go 1. Occasionally we get back the error "Closed explicitly",
> originating from socket.go:178. I'm guessing this means we're trying
> to use a session that's been closed already, but as far as I can see
> that's not the case. Our code basically looks like this:

I really want to understand what's going on for you. As you can
imagine, there's a bug on one side or the other, and I'm keen on
finding out which side it is. Here are a few hints:

The logic in the latest weekly is indeed more aggressive about
closing open sockets. For example, if a machine is explicitly removed
from a replica set, mgo will proactively close any open sockets to it,
to prevent errors caused by logic that keeps running against a server
that is no longer being kept up to date. I'm keen on tweaking the
algorithm further if we find issues around that behavior.

The latest weekly also synchronizes the cluster more often, and does
so whenever it detects erratic behavior from the cluster, so if you
somehow have servers leaving the replica set, that may cause such
closed connections.

Do those ring any bells? Have you correlated the logs of the replica
set with such disconnection events?

Please let me know if you have any further info.

> The only way this makes sense is if the base session has been closed
> explicitly, but I'm about 99% sure that's not the case -- especially

Closing the base session won't close a copy of it.
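
For illustration, a minimal sketch of that behavior (the "localhost"
URL is a placeholder, and the import path shown is the later
gopkg.in/mgo.v2 one; at the time of this thread the package lived at
labix.org/v2/mgo):

package main

import (
    "log"

    "gopkg.in/mgo.v2"
)

func main() {
    base, err := mgo.Dial("localhost")
    if err != nil {
        log.Fatal(err)
    }

    copied := base.Copy() // shares the cluster, but has its own lifetime
    base.Close()          // closing the base session does not invalidate the copy

    // The copy keeps working until it is closed itself.
    if err := copied.Ping(); err != nil {
        log.Fatal(err)
    }
    copied.Close()
}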

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog

-- I'm not absolutely sure of anything.

Evan Shaw

Mar 9, 2012, 11:36:03 PM
to mgo-...@googlegroups.com
On Sat, Mar 10, 2012 at 5:18 PM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
> Do those ring any bells?  Have you correlated the logs of the replica
> set with such disconnection events?

No, nothing unusual was going on with Mongo when this was happening.
We only have one server right now anyway.

> Please let me know if you have any further info.

Another detail that might be relevant: when this was happening, we
were hitting our file descriptor limit because it was set fairly low.
I don't believe we're actually leaking sockets since the number of
open files was rising and falling. It's under control now and I'm not
seeing these "explicitly closed" errors anymore either. Maybe it's
related?
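
(For reference, a rough standalone sketch of checking the file
descriptor limit a Go process is actually running under; Linux/macOS
only, and not part of our app itself:)

package main

import (
    "fmt"
    "syscall"
)

func main() {
    var rl syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        fmt.Println("getrlimit:", err)
        return
    }
    // Cur is the soft limit the process runs under; Max is the hard ceiling.
    fmt.Printf("open file limit: soft=%d hard=%d\n", rl.Cur, rl.Max)
}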

Thanks,
Evan

Gustavo Niemeyer

Mar 11, 2012, 2:51:13 PM
to mgo-...@googlegroups.com
> Another detail that might be relevant: when this was happening, we
> were hitting our file descriptor limit because it was set fairly low.
> I don't believe we're actually leaking sockets since the number of
> open files was rising and falling. It's under control now and I'm not
> seeing these "explicitly closed" errors anymore either. Maybe it's
> related?

Ah, yes, that might cause issues indeed. This would amount to
arbitrary failures attempting to connect to the master during
synchronization of the topology, which might be seen by mgo as the
master being down.

I'll have a pass at the algorithm before the next release and try to
imagine better ways to handle such harsher circumstances.

Please let me know if you see anything else around this.

Evan Shaw

Mar 11, 2012, 3:11:50 PM
to mgo-...@googlegroups.com
On Mon, Mar 12, 2012 at 7:51 AM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
> Ah, yes, that might cause issues indeed. This would amount to
> arbitrary failures attempting to connect to the master during
> synchronization of the topology, which might be seen by mgo as the
> master being down.
>
> I'll have a pass at the algorithm before the next release and try to
> imagine better ways to handle such harsher circumstances.
>
> Please let me know if you see anything else around this.

Another update: we've now started intermittently running out of file
descriptors at the higher limit. lsof showed that over half the open
files were connections to the database.

So again, I'm not sure sockets are being leaked, but at least a huge
number of them are open. I've reviewed our code and I can't find any
place we might be leaking sessions. Every Session.Copy() is followed
by either a deferred Session.Close() or the session being closed in a
different way.

- Evan

Gustavo Niemeyer

Mar 11, 2012, 5:08:56 PM
to mgo-...@googlegroups.com
> Another update: we've now started intermittently running out of file
> descriptors at the higher limit. lsof showed that over half the open
> files were connections to the database.
>
> So again, I'm not sure sockets are being leaked, but at least a huge
> number of them are open. I've reviewed our code and I can't find any

And the other half of that huge number comes from where?

> place we might be leaking sessions. Every Session.Copy() is followed
> by either a deferred Session.Close() or the session being closed in a
> different way.

That "or the session being closed in a different way" makes that
statement somewhat open-ended.

The possibility of bugs is always open, but it's still the case that
every single test in mgo ensures that the open sockets go down to zero
at the end of the run, so there's some good verification in place if
nothing else.

Gustavo Niemeyer

Mar 12, 2012, 10:04:19 AM
to mgo-...@googlegroups.com
Evan, have you tried to reproduce your problem using the same sample
program we discussed the last time you were investigating failover
trouble?

http://paste.ubuntu.com/880377/

If you run lsof against the sample, it should show the number of
connections to the database matching "Sockets Alive", and it must not
increase over time.

If you can't find the cause of your problem, I suggest trying to port
the necessary logic to reproduce the issue in a more contained example
such as this one.
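
(Roughly, the shape of such a contained example; this is a sketch
rather than the actual paste, with a placeholder URL and collection
names:)

package main

import (
    "fmt"
    "sync"
    "time"

    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

func main() {
    mgo.SetStats(true)

    base, err := mgo.Dial("localhost")
    if err != nil {
        panic(err)
    }
    defer base.Close()

    for i := 0; ; i++ {
        var wg sync.WaitGroup
        for j := 0; j < 10; j++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                s := base.Copy() // each worker copies and closes its own session
                defer s.Close()
                err := s.DB("test").C("items").Insert(bson.M{"n": n})
                fmt.Printf("Insert %d err: %v\n", n, err)
            }(i*10 + j)
        }
        wg.Wait()

        // With all copies closed, SocketsInUse should drop back while
        // SocketsAlive (the pool) holds steady rather than growing.
        stats := mgo.GetStats()
        fmt.Printf("Sockets alive: %d, in use: %d\n", stats.SocketsAlive, stats.SocketsInUse)
        time.Sleep(time.Second)
    }
}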

Evan Shaw

Mar 12, 2012, 3:04:41 PM
to mgo-...@googlegroups.com
Yes, that's what I'm trying to do now. Thanks for your help.

- Evan

Andrew Bonventre

Oct 25, 2012, 12:40:49 AM
to mgo-...@googlegroups.com
Gustavo, I'm having trouble understanding why the "Sockets Alive" parameter should stay up (with live socket connections) even after the copied sessions have been closed.

I tried running your test program (updated to use the latest imports): http://pastie.org/5112832

And while the live socket number does not increase over time, it never goes down either.

After holding down "reload" for a few runs:

mgo.Stats{MasterConns:34, SlaveConns:0, SentOps:4543, ReceivedOps:2322, ReceivedDocs:2322, SocketsAlive:31, SocketsInUse:1, SocketRefs:1}

Insert 2224 err: <nil>
Insert 2228 err: <nil>
Insert 2225 err: <nil>
Insert 2221 err: <nil>
Insert 2226 err: <nil>
Insert 2229 err: <nil>
Insert 2227 err: <nil>
Insert 2220 err: <nil>
Insert 2223 err: <nil>
Insert 2222 err: <nil>

lsof is confirming that these sockets are still open. Is this intended behavior?

Thanks,
Andy

Gustavo Niemeyer

Oct 25, 2012, 3:01:26 AM
to mgo-...@googlegroups.com
On Thu, Oct 25, 2012 at 6:40 AM, Andrew Bonventre <andy...@gmail.com> wrote:
> Gustavo, I'm having trouble understanding why the "Sockets Alive" parameter
> should stay up (with live socket connections) even after the copied sessions
> have been closed.

Sockets stay alive because there's a connection pool shared by all
sessions. Next time a session is created, sockets will be reused if
they are available. That's the distinction between SocketsAlive and
SocketsInUse.
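
A tiny sketch of that distinction, assuming stats are enabled and
using placeholder names (the exact counter values will depend on what
else the cluster logic is doing):

package main

import (
    "fmt"

    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

func main() {
    mgo.SetStats(true)

    base, err := mgo.Dial("localhost")
    if err != nil {
        panic(err)
    }
    defer base.Close()

    s := base.Copy()
    s.DB("test").C("items").Insert(bson.M{"x": 1})
    // The socket held by s is counted in both SocketsInUse and SocketsAlive.
    fmt.Printf("after insert: %+v\n", mgo.GetStats())

    s.Close()
    // SocketsInUse drops, but SocketsAlive does not: the socket goes back
    // to the pool and is handed to the next session that needs one.
    fmt.Printf("after close:  %+v\n", mgo.GetStats())
}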


gustavo @ http://niemeyer.net

Andrew Bonventre

Oct 25, 2012, 9:38:59 AM
to mgo-...@googlegroups.com
Forgive me if I'm missing something from the documentation, but is there a method for limiting how large the connection pool can get?

Thanks again for the awesome driver :)

Gustavo Niemeyer

Oct 28, 2012, 12:02:54 PM
to mgo-...@googlegroups.com
On Thu, Oct 25, 2012 at 3:38 PM, Andrew Bonventre <andy...@gmail.com> wrote:
> Forgive me if I'm missing something from the documentation, but is there a
> method for limiting how large the connection pool can get?

There's no API to limit that yet. I've consciously avoided introducing
an interface to see how it would turn out in practice, and apparently
it's been working pretty well in all cases I've heard of. You only get
an increased number of connections if there's really parallel activity
which demands additional sockets, and in general if you want to limit
parallelism in your application, the database driver is not the right
place to do it.
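
If a ceiling is genuinely wanted, the natural place for it is the
application itself, for example with a counting semaphore around the
code that copies a session. A sketch, with an arbitrary limit and
hypothetical names:

package app

import "gopkg.in/mgo.v2"

// sem is a counting semaphore: at most 50 goroutines hold a live
// session copy at any one time. The limit lives in the application,
// not in the driver.
var sem = make(chan struct{}, 50)

// withSession runs f with a fresh copy of base, bounding how many
// copies can be in flight concurrently.
func withSession(base *mgo.Session, f func(*mgo.Session) error) error {
    sem <- struct{}{}        // acquire a slot (blocks once all 50 are taken)
    defer func() { <-sem }() // release the slot when done

    s := base.Copy()
    defer s.Close()
    return f(s)
}

A request handler would then call withSession(base, ...) rather than
copying the base session directly.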

There were also cases when the number of connections increased due to
session leaks (sessions being copied or cloned without being closed).

If you have comments on the topic, I'm certainly interested.


gustavo @ http://niemeyer.net

Andrew Bonventre

Oct 28, 2012, 12:24:12 PM
to mgo-...@googlegroups.com
> There's no API to limit that yet. I've consciously avoided introducing
> an interface to see how it would turn out in practice, and apparently
> it's been working pretty well in all cases I've heard of. You only get
> an increased number of connections if there's really parallel activity
> which demands additional sockets, and in general if you want to limit
> parallelism in your application, the database driver is not the right
> place to do it.

Agreed. I have no interest in limiting parallelism.

This came up during a load test scenario where the number of concurrent requests was unrealistically high for just one instance of the app server. It ended up being a cascading failure: the fd limit was maxed out, rpc connections were dropped, and clients could no longer connect at all.

While I wouldn't suggest a feature to mitigate this highly unlikely scenario, I'd ideally like my app to keep serving rpc requests (albeit slowly) even under extreme load, so handling it on my side is under consideration (though at very low priority).

Any approaches would be welcome, but again, this is very low on my priority list as it can be dealt with in other ways (more instances serving rpc traffic, for example).

Thanks,
A

İnanç Gümüş

Jul 31, 2017, 4:56:44 PM
to mgo-users, andy...@gmail.com
I have a simple app on my local machine (with file limits etc. maxed out) that uses Session.Copy and a deferred Session.Close. When I load test the app at 500 reqs/sec for 5 seconds, mgo starts getting timeout errors.

When I lower the pool limit from its default of 4096 to 50 with `session.SetPoolLimit(50)`, the errors fade away.

I don't know why this happens.

On Sunday, October 28, 2012 at 7:24:12 PM UTC+3, Andrew Bonventre wrote:

Gustavo Niemeyer

Aug 3, 2017, 6:19:10 AM
to mgo-...@googlegroups.com, Andrew Bonventre
Again, using SetPoolLimit that way is a bad idea. Please see the five other places where you posted the same comment to see why.
