New issue 205 by wickedg...@gmail.com: Having a well-defined way to reset
the connection pool would be useful
http://code.google.com/p/couchdb-python/issues/detail?id=205
What steps will reproduce the problem?
1. connect to couchdb
2. fork() the python process
3. try to use the db in the child process
What is the expected output? What do you see instead?
The child gets exceptions like:
  File "/usr/local/lib/python2.7/dist-packages/couchdb/client.py", line 1003, in rows
    self._fetch()
  File "/usr/local/lib/python2.7/dist-packages/couchdb/client.py", line 990, in _fetch
    data = self.view._exec(self.options)
  File "/usr/local/lib/python2.7/dist-packages/couchdb/client.py", line 880, in _exec
    _, _, data = self.resource.get_json(**self._encode_options(options))
  File "/usr/local/lib/python2.7/dist-packages/couchdb/http.py", line 394, in get_json
    if 'application/json' in headers.get('content-type'):
TypeError: argument of type 'NoneType' is not iterable
and
  File "/usr/local/lib/python2.7/dist-packages/couchdb/client.py", line 1003, in rows
    self._fetch()
  File "/usr/local/lib/python2.7/dist-packages/couchdb/client.py", line 990, in _fetch
    data = self.view._exec(self.options)
  File "/usr/local/lib/python2.7/dist-packages/couchdb/client.py", line 878, in _exec
        **self._encode_options(options))
  File "/usr/local/lib/python2.7/dist-packages/couchdb/http.py", line 401, in post_json
    data = json.decode(data.read())
  File "/usr/local/lib/python2.7/dist-packages/couchdb/http.py", line 94, in read
    bytes = self.resp.read(size)
  File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 586, in _read_chunked
    raise IncompleteRead(''.join(value))
httplib.IncompleteRead: IncompleteRead(1258 bytes read)
What version of the product are you using? On what operating system?
v0.8 on Ubuntu, python 2.7
Please provide any additional information below.
I'm able to work around the problem by including code like the following
under 0.8:
    with my_db.resource.session.lock:
        my_db.resource.session.conns = {}
Trunk has a connection pool object, so it looks like that would now be:
    with my_db.resource.session.connection_pool.lock:
        my_db.resource.session.connection_pool.conns = {}
Having this encapsulated in a public API function would be useful for
us. :)
Thanks!
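For illustration, the two workarounds above could be folded into one helper along these lines. This is only a sketch: reset_connection_pool is a hypothetical name, not an existing couchdb-python function, and it relies on the same lock and conns attributes that the workarounds already touch.

```python
def reset_connection_pool(db):
    """Drop cached connections so the next request opens a fresh socket.

    Hypothetical helper, not part of couchdb-python. It assumes the
    attributes used in the workarounds above: `lock` and `conns`, found
    on the session itself in 0.8 and on session.connection_pool in trunk.
    """
    session = db.resource.session
    # Fall back to the session when there is no separate pool object (0.8).
    holder = getattr(session, 'connection_pool', session)
    with holder.lock:
        holder.conns = {}
```

A child process would call reset_connection_pool(my_db) once after fork() and before its first request.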
Could we make the connection pool thread-local, or something like that?
I.e. something that does the right thing without needing the user's
cooperation.
That would be great, if something like that would work. threading.local
won't do the trick by itself, though, since its contents survive a fork:
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> import os
>>> def f():
...     l = threading.local()
...     l.foo = 'bar'
...     pid = os.fork()
...     if pid == 0:
...         print dir(l)
...
>>> f()
>>> ['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'foo']
Namespacing the connections by os.getpid() would probably work for the fork
case, though. Something along the lines of:
    # Session
    def __init__(self, ...):
        ...
        self._connectionPool_dict = {}
        ...

    @property
    def connection_pool(self):
        if os.getpid() not in self._connectionPool_dict:
            self._connectionPool_dict.clear()
            self._connectionPool_dict[os.getpid()] = ConnectionPool(...)
        return self._connectionPool_dict[os.getpid()]
Not sure what the intended behavior is WRT threading, but that could be
added to the mix too if desired.
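A self-contained version of the same idea, to make the behavior easier to try out. The ConnectionPool and Session classes here are minimal stand-ins written for this sketch, not the real couchdb-python classes, and the _pool and _pool_pid attribute names are made up. It is not thread-safe as written.

```python
import os
import threading


class ConnectionPool(object):
    """Stand-in for the real pool: just a lock and a dict of connections."""

    def __init__(self):
        self.lock = threading.Lock()
        self.conns = {}


class Session(object):
    """Stand-in session that hands out one pool per process."""

    def __init__(self):
        self._pool = None
        self._pool_pid = None

    @property
    def connection_pool(self):
        pid = os.getpid()
        # After a fork the child sees the parent's pid recorded here,
        # so it abandons the inherited pool and starts an empty one.
        if self._pool_pid != pid:
            self._pool = ConnectionPool()
            self._pool_pid = pid
        return self._pool
```

Within one process, repeated accesses return the same pool, so keep-alive connections are still reused. Only the first access after a fork pays for a new pool.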
I have no objection to extending the connection pool - that's actually one
of the reasons I extracted the code to the ConnectionPool class. However,
isn't this just the common problem of forking a process with open file
handles? The solution is generally to fork first and open files later.
So, in this particular case I think it's best to fork the process and
create a new couchdb.Server or couchdb.Database instance in the child
process for it to use.
That approach presumes a lot about the application design. For my case,
the parts doing the forking and the parts talking to couchdb are pretty
loosely coupled, and broadcasting "hey, we just forked; reset all of your
connections please" wouldn't be very clean (as attested to by the hack that
I put in place to work around this issue for the time being). I can't avoid
connecting to couchdb in the parent, because the information about what to
fork, and when, is stored there.
I was surprised by the current couchdb-python behavior because I don't
typically lump open files and http requests into the same stateful bucket -
keep-alives seem like an implementation detail, in this case. It would be
nice if users didn't have to be aware of and account for that.