Re: Why does redis-py claim to be thread-safe? Is it real?

Andy McCurdy

Jul 5, 2012, 11:05:24 AM
to redi...@googlegroups.com




On Jul 5, 2012, at 3:04 AM, Bo Wang <wang...@gmail.com> wrote:

> I have read the source code of redis-py. It just stores the pid inside the 'connection' and 'connection_pool'. If the current pid conflicts with the stored value when a method is called, redis-py re-establishes the connection or clears the connection_pool.

The technique of storing the pid is actually for processes that use Redis and are later forked, such as when using multiprocessing. A forked process inherits copies of the parent's open file descriptors, so parent and child would end up talking over the same socket, which is obviously bad.
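To make the pid technique concrete, here is a minimal sketch. It is not redis-py's actual code: `PidAwarePool` and its attributes are hypothetical names, and plain objects stand in for sockets. The pool remembers the pid that created it and throws away its inherited connections as soon as it notices it is running under a different pid.

```python
import os


class PidAwarePool:
    """Toy pool (hypothetical) that drops inherited connections after a fork."""

    def __init__(self):
        self.pid = os.getpid()   # process that created the pool
        self.connections = []    # stand-ins for idle sockets

    def _checkpid(self):
        # A different pid means we are running in a forked child,
        # so every stored connection belongs to the parent.
        if self.pid != os.getpid():
            self.connections = []
            self.pid = os.getpid()
            return True
        return False

    def get_connection(self):
        self._checkpid()
        if self.connections:
            return self.connections.pop()
        return object()  # placeholder for a freshly opened socket

    def release(self, conn):
        self.connections.append(conn)
```

A real pool would also close the inherited sockets in the child rather than only forgetting them.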

Redis-py is thread safe because the connection pool only uses atomic methods when allocating/deallocating a connection.

> I have recently been using gevent, which is based on coroutines, and I tried printing the thread ID. The IDs are different! That is not what I expected. I had imagined that all jobs ran in the same thread, according to the concept of a coroutine, and just cooperated through the Hub greenlet.
> So if redis-py is not really thread-safe, that will be a big problem for me. I don't even know how to assign a redis-py instance to each thread inside gevent!


What you need to do is subclass the ConnectionPool to add a gevent pool. This is something I'd like to bake in as a convenience to others using gevent.


> Could anyone explain the thread safety of redis-py? Is it real?

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/m19ltU-ZHjgJ.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

Matthias Felsche

Jul 5, 2012, 12:31:14 PM
to redi...@googlegroups.com, Bo Wang
Hello Bo,

First of all, with gevent you do not need the same thread safety as you
need with pre-emptive threads (the normal Python threads, which are
pthreads, to be precise).
There will never be more than one greenlet running at the same time. You
just have to make sure that a greenlet does not leave inconsistent state
behind when it is switched out (on I/O or whatever).
redis-py does not do that, at least as far as I can see.

If you use gevent and call gevent.monkey.patch_all() (or a similar
patching function), the thread-ID functions are replaced so that they
return the greenlet ID. That is why the IDs you printed differ. By
default, all greenlets still run in the same OS thread of the same
process. They are a kind of lightweight thread, and gevent patches the
built-in thread library to return the ID of the current greenlet.

By default, redis-py uses a connection pool that does not block. It
keeps creating new connections (connected client sockets) until
max_connections is reached and then raises an exception, though this
should rarely occur (as always, it depends).

redis-py is 100% gevent-safe, I think, though its default
pool implementation is not ideal for gevent.
What you would want is a blocking pool, where a greenlet that
cannot get a connection is scheduled away.
You could do it like this:

import gevent.queue
import redis


class GeventConnectionPool(redis.ConnectionPool):
    """
    Patch the standard pool to block when there is no connection available.
    This schedules the current greenlet away so the next one can take over.
    """

    def __init__(self, *args, **kwargs):
        super(GeventConnectionPool, self).__init__(*args, **kwargs)
        # Replace the list of idle connections with a gevent-aware queue.
        self._available_connections = gevent.queue.Queue(
            maxsize=self.max_connections)

    def get_connection(self, command_name, *keys, **options):
        self._checkpid()
        if self._available_connections.empty():
            if self._created_connections >= self.max_connections:
                # Cannot create any more; wait for an available connection.
                conn = self._available_connections.get()
            else:
                # Create a new connection and count it against the limit,
                # otherwise the blocking branch above is never reached.
                self._created_connections += 1
                conn = self.connection_class(**self.connection_kwargs)
        else:
            conn = self._available_connections.get()
        self._in_use_connections.add(conn)
        return conn

    def make_connection(self):
        # Unused: get_connection() creates connections itself.
        pass

    def release(self, connection):
        self._checkpid()
        if connection.pid == self.pid:
            self._in_use_connections.remove(connection)
            self._available_connections.put(connection)

    def disconnect(self):
        for conn in self._in_use_connections:
            conn.disconnect()
        while not self._available_connections.empty():
            self._available_connections.get().disconnect()

This kind of works for me. Maybe someone with more experience could
share their opinion on it.
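For comparison, the same blocking idea can be sketched with only the standard library, using threads instead of greenlets. This is an illustration under assumptions, not redis-py code: `BlockingPool`, `FakeConnection` and `demo` are hypothetical names, and no real socket is opened.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for a real connection object."""


class BlockingPool:
    def __init__(self, max_connections):
        # Pre-fill the queue with empty slots; a slot becomes a real
        # connection the first time it is handed out.
        self._slots = queue.Queue(maxsize=max_connections)
        for _ in range(max_connections):
            self._slots.put(None)

    def get_connection(self, timeout=None):
        # Blocks while every slot is checked out; raises queue.Empty
        # if a timeout is given and expires first.
        conn = self._slots.get(timeout=timeout)
        return conn if conn is not None else FakeConnection()

    def release(self, conn):
        self._slots.put(conn)


def demo():
    pool = BlockingPool(max_connections=1)
    first = pool.get_connection()
    # Release from another thread a little later; the second
    # get_connection() call waits until that happens.
    threading.Timer(0.05, pool.release, args=(first,)).start()
    second = pool.get_connection(timeout=2)
    return first is second
```

Pre-filling the queue with slots means the limit check and the wait are a single queue operation, so no separate counter of created connections is needed.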

Another thing: you mostly use gevent for massive scaling. In that case
you shouldn't assign a connection to each greenlet, but use one
redis.Redis instance for all greenlets. It will use a connection pool
and will dispatch your requests to different sockets as needed.

Did that help?


Pau Freixes

Jul 5, 2012, 4:15:07 PM
to redi...@googlegroups.com
Hi, sorry for the intrusion, but just one thing:

> First of all, with gevent, you do not need the same thread-safety as you
> need with pre-emptive threads (the normal python-threads (pthreads, to
> be correct)).
> There will never be more than 1 greenlet running at the same time. You
> just have to make sure that your greenlet is not switched (on I/O or
> whatever) and leaves behind inconsistent state.
> This is not the case for redis-py, at least as far as I can see.


That's not a gevent "drawback"; it's a consequence of the GIL [1] architecture.


Andy McCurdy

Jul 6, 2012, 3:32:41 PM
to redi...@googlegroups.com
list.pop() is in fact an atomic operation in Python given the GIL.

On Jul 5, 2012, at 11:20 PM, Bo Wang wrote:


Hi Andy,
Glad to see your reply. I agree that the pid (process ID) is used to cope with multiprocessing, but I'm still confused about the thread safety.
> Redis-py is thread safe because the connection pool only uses atomic methods when allocating/deallocating a connection.
The source code does not indicate that these methods are atomic.

class Connection(object):
    def _connect(self):
        "Create a TCP socket connection"
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(self.socket_timeout)
        sock.connect((self.host, self.port))
        return sock

class ConnectionPool(object):
    def get_connection(self, command_name, *keys, **options):
        "Get a connection from the pool"
        self._checkpid()
        try:
            connection = self._available_connections.pop()
        except IndexError:
            connection = self.make_connection()
        self._in_use_connections.add(connection)
        return connection

    def make_connection(self):
        "Create a new connection"
        if self._created_connections >= self.max_connections:
            raise ConnectionError("Too many connections")
        self._created_connections += 1
        return self.connection_class(**self.connection_kwargs)

Popping a socket from a Python list is not an atomic operation. Could you give me a more detailed explanation?

Thank you.



Josiah Carlson

Jul 6, 2012, 4:31:41 PM
to redi...@googlegroups.com
I'll add to this...

If list.pop() were not itself atomic, then two list.pop() operations
occurring simultaneously would have a chance of both returning the
same item, or would have a chance of causing Python to segfault. As
Andy mentioned, any time an operation occurs on a Python data
structure, the calling thread, by definition, has the GIL.
Semantically, it is similar to a Redis BRPOP operation, as there
exists an underlying mechanism (the single-threaded nature of Redis,
or the GIL in Python) of ensuring that no two commands execute
simultaneously.
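A small experiment along these lines, offered as an observation on CPython rather than a proof: several threads drain one shared list with list.pop(), and afterwards every item should turn up exactly once. The function names and the default sizes are made up for the illustration.

```python
import threading


def drain(shared, taken):
    # Each worker pops until the shared list is empty.
    while True:
        try:
            taken.append(shared.pop())
        except IndexError:
            return


def pop_from_threads(n_items=10000, n_threads=8):
    shared = list(range(n_items))
    # One private result list per thread, so only `shared` is contended.
    per_thread = [[] for _ in range(n_threads)]
    threads = [threading.Thread(target=drain, args=(shared, taken))
               for taken in per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [item for taken in per_thread for item in taken]
```

If pop() could hand the same element to two threads, the combined result would contain duplicates or be missing items.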

Technically speaking, there is a race condition in the code that Bo
listed, in the make_connection() call (the += operator/assignment is
not atomic, nor is the limit check that precedes it), but the likelihood
of that causing an issue in practice is low.
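One way to close that window, sketched here under assumptions and not taken from redis-py, is to do the limit check and the increment under a single lock. `ConnectionCounter`, `try_reserve` and `reserve_from_threads` are hypothetical names.

```python
import threading


class ConnectionCounter:
    """Hypothetical helper: a max_connections check that is safe under threads."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.created = 0
        self._lock = threading.Lock()

    def try_reserve(self):
        # Check and increment happen under one lock, so two threads
        # cannot both pass the limit check with the same old value.
        with self._lock:
            if self.created >= self.max_connections:
                return False
            self.created += 1
            return True


def reserve_from_threads(counter, n_threads=8, attempts=1000):
    granted = []

    def worker():
        ok = sum(1 for _ in range(attempts) if counter.try_reserve())
        granted.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(granted)
```

With a limit of 500 and 8000 attempts spread over eight threads, exactly 500 reservations should succeed, however the threads interleave.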

- Josiah

Pau Freixes

Jul 6, 2012, 5:13:53 PM
to redi...@googlegroups.com
> Technically speaking, there is a race condition in the code that Bo
> listed in the make_connection() call (the += operator/assignment is
> not atomic), but the likelihood of that causing an issue in practice
> is low.

Yes, I didn't see the += operator, but in practice any collateral damage from this race condition would probably, almost surely, go unnoticed.



--
--pau