Session file locking not working properly in multithreaded environment

25 views
Skip to first unread message

Ralph Heinkel

unread,
May 10, 2019, 8:08:47 AM5/10/19
to cherrypy-users
I have a very simple web-page which makes 20 (parallel) connections to my cherrypy server via <img> tags targetting images which are calculated programmatically on-the-fly when accessed:

<html>
<img src="/CP/getPlot?image_id=i_1"/>
<img src="/CP/getPlot?image_id=i_2"/>
[.. 18 more entries ]
<img src="/CP/getPlot?image_id=i_20"/>
</html>

In some rare, random cases some of the 'getPlot' requests fail  because cherrypy receives an incomplete session object from a session file, i.e. data that cannot be unpickled. In such a case cherrypy creates a new session object. Since authentication is bound to the session, the request then fails due to missing privileges, and hence all followup requests to 'getPlot' fail as well.

When digging in the sessions and lockfile code of cherrypy I added some log statements to see in which order things happen when a session is opened and closed. This way I found that the same lock file was locked simultaneously by different parallel threads. This of course leads to the case where a session was partially written by one thread while another thread started reading from it, causing it to receive incomplete session data, and hence unpickle failed.

The log reveilled what is happening:

UnixLockFile._lock_file called for /fs/local/run/sessions/session-3146212444067902ec35e41a4af0db64bcbacf4b.lock on fileno 32 (thread=139877983168256)
Loaded session with data: ({'csrf_token': 'c8cc04fa-6cec-4dc2-a690-d3d1398f6322', 'login': 'rh'}, datetime.datetime(2019, 5, 10, 19, 29, 26, 591346))
saving session data
: dict_items([('csrf_token', 'c8cc04fa-6cec-4dc2-a690-d3d1398f6322'), ('login', 'rh')])
UnixLockFile.release called for /fs/local/run/sessions/session-3146212444067902ec35e41a4af0db64bcbacf4b.lock (thread=139877983168256)
2019-05-10 11:29:27,141:pid=22163:127.0.0.1 - rh300070 [10/May/2019:11:29:27] "GET /CP/getPlot?image_id=i_1 HTTP/1.1" 200 14708 "http://localhost:8080/CP/static/quant.html"

UnixLockFile._lock_file called for /fs/local/run/sessions/session-3146212444067902ec35e41a4af0db64bcbacf4b.lock on fileno 36 (thread=139877479868160)
Loaded session with data: ({'csrf_token': 'c8cc04fa-6cec-4dc2-a690-d3d1398f6322', 'login': 'rh'}, datetime.datetime(2019, 5, 10, 19, 29, 27, 146440))
saving session data
: dict_items([('csrf_token', 'c8cc04fa-6cec-4dc2-a690-d3d1398f6322'), ('login', 'rh')])
UnixLockFile.release called for /fs/local/run/sessions/session-3146212444067902ec35e41a4af0db64bcbacf4b.lock (thread=139877479868160)
2019-05-10 11:29:27,153:pid=22163:127.0.0.1 - rh300070 [10/May/2019:11:29:27] "GET /CP/getPlot?image_id=i_7 HTTP/1.1" 200 14708 "http://localhost:8080/CP/static/quant.html"

UnixLockFile._lock_file called for /fs/local/run/sessions/session-3146212444067902ec35e41a4af0db64bcbacf4b.lock on fileno 32 (thread=139877454690048)
UnixLockFile._lock_file called for /fs/local/run/sessions/session-3146212444067902ec35e41a4af0db64bcbacf4b.lock on fileno 36 (thread=139877479868160)
[...]

The first two requests looked entirely ok, the lock was acquired, the session was loaded, then after the request the session got saved, and the lock released.

But then suddenly two threads obtain a lock to the same file! How is this possible? I have to mention that this does not happen every time I load the main page, sometimes I have to reload 30 or 40 times until this behavior shows up.

Looking at the code I have no idea how this can happen, the code (it uses fcntl.flock() with correct flags) looks entirely OK to me.

Any help is very much appreciated.

PS: I work under Centos7, Python 3.6.4, cherrypy ~=15.0.0 (but also the latest 18.x.x version of cherrypy shows the same behavior)

Tim Roberts

unread,
May 10, 2019, 12:33:27 PM5/10/19
to cherryp...@googlegroups.com
Ralph Heinkel wrote:

The first two requests looked entirely ok, the lock was acquired, the session was loaded, then after the request the session got saved, and the lock released.

But then suddenly two threads obtain a lock to the same file! How is this possible? I have to mention that this does not happen every time I load the main page, sometimes I have to reload 30 or 40 times until this behavior shows up.

Looking at the code I have no idea how this can happen, the code (it uses fcntl.flock() with correct flags) looks entirely OK to me.

Ah, you've stumbled across one of the classic blunders, as they say in Princess Bride.  The Unix "flock" call only blocks simultaneous access between different processes.  It doesn't protect multiple threads within a single process.  A single process can call flock repeatedly on a single file, and all of the calls will succeed.

If you want to protect a resource between threads of a single process, you need to use something other than flock, like a  Lock from the "threading" module.

-- 
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Ralph Heinkel

unread,
May 10, 2019, 12:56:43 PM5/10/19
to cherrypy-users
I'm not completely convinced that this is true. Here is an example where a second lock on the same file fails within the same thread:

Python 3.6.5 (default, Mar 31 2018, 19:45:04) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fcntl
>>> flags = fcntl.LOCK_EX | fcntl.LOCK_NB
>>> fp1 = open('x.lock', 'a+')  # first file handle and file descriptor for the lock file
>>> fp2 = open('x.lock', 'a+')  # second file handle and descriptor for the lock file
>>> lock1 = fcntl.flock(fp1.fileno(), flags)
>>> lock2 = fcntl.flock(fp2.fileno(), flags)
Traceback (most recent call last):
 
File "<stdin>", line 1, in <module>
BlockingIOError: [Errno 11] Resource temporarily unavailable

Only if lock2 was obtained from fp1.fileno() then no exception would have been raised. As soon as a separate file descriptor (for the same file) is used a second lock also fails.
But this is exaclty the point that makes me wonder why it then works in cherrypy. From the log you can see that each thread uses a different file descriptor for the lock file, so it should fail there as well...  I'm still puzzled.

Tim Roberts

unread,
May 10, 2019, 1:31:28 PM5/10/19
to cherryp...@googlegroups.com
Ralph Heinkel wrote:
> I'm not completely convinced that this is true. Here is an example
> where a second lock on the same file fails within the same thread:
> ...
> Only if lock2 was obtained from fp1.fileno() then no exception would
> have been raised. As soon as a separate file descriptor (for the same
> file) is used a second lock also fails.
> But this is exaclty the point that makes me wonder why it then works
> in cherrypy. From the log you can see that each thread uses a
> different file descriptor for the lock file, so it should fail there
> as well...  I'm still puzzled.

Yes, you're right.  The behavior I described is only true if you try to
flock the same file descriptor, or file descriptors that are "dup"s of
each other.  If the file descriptors came from two separate opens, then
the flock should work.

The zc.lockfile I'm looking at certainly opens a new fd each time, but
the source I'm looking at doesn't have a UnixLockFile class, as your log
shows.

Your log shows to calls to _lock_file, but it doesn't show that the both
succeed.  Do they?

Ralph Heinkel

unread,
May 10, 2019, 2:22:05 PM5/10/19
to cherrypy-users
On Friday, 10 May 2019 19:31:28 UTC+2, Tim Roberts wrote:

The zc.lockfile I'm looking at certainly opens a new fd each time, but
the source I'm looking at doesn't have a UnixLockFile class, as your log
shows.

Your log shows to calls to _lock_file, but it doesn't show that the both
succeed.  Do they?


yes, both succeed. The log I showed was truncated a little bit too early. Both threads then worked on the same session file simultaneously, resulting in the described behavior that one thread reads incomplete data from the session file because the other thread hasn't finished yet writing to it.


As I said I was still working on cherrypy 15. And you are right, in the most current cherrypy version zc.lockfile.LockFile() is used. However I run into the same problems with cherrypy 18. That is probably due to the fact that "The zc.lockfile package provides a basic portable implementation of interprocess locks using lock files" as its project description states.


Since in most cases people run cherrypy with multiple threads we might need to think about a locking implementation that also supports inter-thread locking of a file. Otherwise IMO the locking for file sessions only makes half sense.

 

Ralph Heinkel

unread,
May 13, 2019, 12:07:06 PM5/13/19
to cherrypy-users
I've found the reason for the session lock race condition and file a bug on github:

https://github.com/cherrypy/cherrypy/issues/1779

Reply all
Reply to author
Forward
0 new messages