We've just upgraded an application to the latest weekly to get ready
for Go 1. Occasionally we get back the error "Closed explicitly",
originating from socket.go:178. I'm guessing this means we're trying
to use a session that's been closed already, but as far as I can see
that's not the case. Our code basically looks like this:
func foo() {
    session := baseSession.Copy()
    db := session.DB(dbName)
    defer db.Session.Close()

    var result Thing
    err := db.C("collection").Find(bson.M{"field": "blah"}).One(&result)
    // err contains the "Closed explicitly" error
}
The only way this makes sense is if the base session has been closed
explicitly, but I'm about 99% sure that's not the case -- especially
since the error only happens sometimes. Am I misinterpreting this
error? Any ideas?
- Evan
> We've just upgraded an application to the latest weekly to get ready
> for Go 1. Occasionally we get back the error "Closed explicitly",
> originating from socket.go:178. I'm guessing this means we're trying
> to use a session that's been closed already, but as far as I can see
> that's not the case. Our code basically looks like this:
I really want to understand what is going on for you. As you can
imagine, there's a bug on one side or the other, and I'm keen on
finding out which one it is. Here are a few hints:
The logic in the latest weekly is indeed more aggressive about
closing open sockets. For example, if a machine is explicitly removed
from a replica set, mgo will proactively close any open sockets to it,
to prevent errors caused by logic that keeps running against a server
that is no longer being updated. I'm keen on tweaking the algorithm
further if we find issues around that behavior.
The latest weekly also synchronizes the cluster state more often, and
additionally whenever it detects erratic behavior from the cluster, so
if servers are somehow leaving your replica set, that may cause such
closed connections.
Do those ring any bells? Have you correlated the logs of the replica
set with such disconnection events?
Please let me know if you have any further info.
> The only way this makes sense is if the base session has been closed
> explicitly, but I'm about 99% sure that's not the case -- especially
Closing the base session won't close a copy of it.
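To illustrate the point, here's a minimal sketch (the import path, the
local test server, and the Ping probe are just assumptions for the
demo, not a description of your app):

package main

import (
    "fmt"

    "labix.org/v2/mgo" // import path varies by mgo release
)

func main() {
    base, err := mgo.Dial("localhost") // assumes a local mongod for the demo
    if err != nil {
        panic(err)
    }

    copied := base.Copy() // the copy manages its own sockets
    base.Close()          // closing the base session...

    // ...does not invalidate the copy; it keeps working until it is
    // closed itself.
    fmt.Println("ping after base.Close():", copied.Ping()) // expected: <nil>
    copied.Close()
}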
--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog
-- I'm not absolutely sure of anything.
No, nothing unusual was going on with Mongo when this was happening.
We only have one server right now anyway.
> Please let me know if you have any further info.
Another detail that might be relevant: when this was happening, we
were hitting our file descriptor limit because it was set fairly low.
I don't believe we're actually leaking sockets since the number of
open files was rising and falling. It's under control now and I'm not
seeing these "explicitly closed" errors anymore either. Maybe it's
related?
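For reference, the limit the process actually sees can be checked from
Go itself; a rough sketch using the Linux syscall package (nothing
mgo-specific):

package main

import (
    "fmt"
    "syscall"
)

func main() {
    // Report the soft/hard limits on open file descriptors (RLIMIT_NOFILE)
    // as seen by this process.
    var lim syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
        panic(err)
    }
    fmt.Printf("file descriptor limit: soft=%d hard=%d\n", lim.Cur, lim.Max)
}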
Thanks,
Evan
Ah, yes, that might indeed cause issues. It would amount to arbitrary
failures when attempting to connect to the master during
synchronization of the topology, which mgo might interpret as the
master being down.
I'll take a pass at the algorithm before the next release and try to
find better ways to handle such harsh circumstances.
Please let me know if you see anything else around this.
Another update: we've now started intermittently running out of file
descriptors at the higher limit. lsof showed that over half the open
files were connections to the database.
So again, I'm not sure sockets are being leaked, but at least a huge
number of them are open. I've reviewed our code and I can't find any
place we might be leaking sessions. Every Session.Copy() is followed
by either a deferred Session.Close() or the session being closed in a
different way.
- Evan
And the other half of that huge number comes from where?
> place we might be leaking sessions. Every Session.Copy() is followed
> by either a deferred Session.Close() or the session being closed in a
> different way.
That "or the session being closed in a different way" makes that
statement somewhat open-ended.
The possibility of bugs is always open, but it's still the case that
every single test in mgo ensures that the open sockets go down to zero
at the end of the run, so there's some good verification in place if
nothing else.
http://paste.ubuntu.com/880377/
If you run lsof against the sample, it should show the number of
connections to the database matching "Sockets Alive", and it must not
increase over time.
If you can't find the cause of your problem, I suggest trying to port
the necessary logic to reproduce the issue in a more contained example
such as this one.
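A self-contained sketch along those lines might look like this (a rough
illustration only: the database and collection names and the local
server are placeholders, and SetStats must be enabled for the counters
to be collected):

package main

import (
    "fmt"
    "time"

    "labix.org/v2/mgo" // import path varies by mgo release
)

func main() {
    mgo.SetStats(true) // enable the counters returned by mgo.GetStats()

    base, err := mgo.Dial("localhost") // assumes a local mongod
    if err != nil {
        panic(err)
    }
    defer base.Close()

    for i := 0; i < 100; i++ {
        s := base.Copy()
        err := s.DB("test").C("things").Insert(map[string]int{"n": i})
        fmt.Println("Insert", i, "err:", err)
        s.Close() // omit this and SocketsAlive keeps growing

        if i%10 == 0 {
            // Compare this against the connection count lsof reports.
            fmt.Printf("%+v\n", mgo.GetStats())
        }
        time.Sleep(100 * time.Millisecond)
    }
}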
After holding down "reload" for a few runs:

mgo.Stats{MasterConns:34, SlaveConns:0, SentOps:4543, ReceivedOps:2322, ReceivedDocs:2322, SocketsAlive:31, SocketsInUse:1, SocketRefs:1}
Insert 2224 err: <nil>
Insert 2228 err: <nil>
Insert 2225 err: <nil>
Insert 2221 err: <nil>
Insert 2226 err: <nil>
Insert 2229 err: <nil>
Insert 2227 err: <nil>
Insert 2220 err: <nil>
Insert 2223 err: <nil>
Insert 2222 err: <nil>

- Evan