Web2py locks rows in the auth_user and web2py_session_* tables and doesn't unlock them


Lisandro

Apr 5, 2018, 10:58:07 AM
to web2py-users
Hi there! This is somewhat related to this other topic [1], which I have closed because I found some new evidence and thought it would be better to open a new topic.

I'm having a problem where a simple select on the auth_user table hangs indefinitely, until timeout.
The problem occurs with one specific user account. The user logs in, but after the successful login, any request hangs.
I checked the Postgres long-running queries and I see this query hanging until it times out:

SELECT auth_user.id, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.registration_key, auth_user.reset_password_key, auth_user.registration_id, auth_user.alta, auth_user.plantel, auth_user.responsable, auth_user.nombre, auth_user.telefono, auth_user.autor, auth_user.foto, auth_user.foto_temp, auth_user.moderador, auth_user.descripcion, auth_user.facebook, auth_user.twitter, auth_user.linkedin, auth_user.gplus FROM auth_user WHERE (auth_user.id = 2) LIMIT 1 OFFSET 0;


As you can see, the query is a simple select on the auth_user table. Also, notice that it uses LIMIT, so it retrieves only one row.
Additionally, I can confirm that the table has only four rows.

I went a little further and checked pg_locks in Postgres while the problem is happening (that is, during the 60 seconds until the timeout).
This is the query I run to check the locks, and this is what I see during that time:

select t.relname,l.locktype,page,virtualtransaction,pid,mode,granted from pg_locks l, pg_stat_all_tables t where l.relation=t.relid order by relation asc;


           relname           | locktype | page | virtualtransaction | pid  |      mode       | granted
-----------------------------+----------+------+--------------------+------+-----------------+---------
 pg_class                    | relation |      | 70/908027          | 4421 | AccessShareLock | t
 pg_index                    | relation |      | 70/908027          | 4421 | AccessShareLock | t
 pg_namespace                | relation |      | 70/908027          | 4421 | AccessShareLock | t
 web2py_session_infosanpedro | relation |      | 16/1858255         | 9865 | AccessShareLock | t
 auth_user                   | relation |      | 16/1858255         | 9865 | AccessShareLock | t
(5 rows)


Notice the locks on the web2py_session_* and auth_user tables.
Those locks are there for the whole 60 seconds, until the timeout.

After the timeout, the lock isn't there anymore.
If I reload the page, the lock appears again and the request hangs until the timeout.

All the other user accounts can log in normally.
Also, there is nothing particularly different about the user account having the problem (it's an account like any other).
Of course, I already tried deleting everything from the web2py_session table, but that doesn't help; the problem remains.

Where else should I look? 
Would it help if I share with you the app code for the login? It doesn't do any tricky stuff.
What concerns me is that the problem appears in one specific account, but for several of our customers (we have the same web2py app installed for every one of our customers, each with its own database). The same problem is happening randomly in a few accounts. Database activity is really low.

I'm pretty lost here, so any help or suggestion will be appreciated.
Thanks in advance.
Regards,
Lisandro


[1] https://groups.google.com/forum/#!topic/web2py/o0FBeimrr5c

Anthony

Apr 5, 2018, 4:42:46 PM
to web...@googlegroups.com
For any select that takes place within a transaction, Postgres by default automatically acquires an access share lock on the involved tables, so this should be happening on all requests. However, access share locks do not conflict with each other, so I don't think these locks are necessarily the source of your problem.
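For what it's worth, a genuinely blocked query would show up in pg_locks with granted = false, so a quick check along these lines (nothing here is specific to your schema) tells you whether anything is actually waiting on a lock:

select l.pid, l.locktype, l.relation::regclass as relname, l.mode, l.granted
from pg_locks l
where not l.granted;

If that returns no rows while the request hangs, the locks you listed are not what is blocking it.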

Anthony

Lisandro

Apr 6, 2018, 10:58:56 AM
to web2py-users
Yes, in fact, I've been running that SQL command to check for locks, and sometimes I see that lock on other tables, but those other locks live for less than a second. However, when the problem happens, the lock on the auth_user and web2py_session tables remains there for the whole 60 seconds.

I forgot to mention that, when the problem appears in a specific account, I experience the same problem if I try to impersonate that account. It seems as if something had "broken" within that account, and deleting web2py_session_* rows doesn't help. Actually, until now, the only way I could "solve" it was by creating a new account (which of course is not technically a solution) or by moving the sessions to Redis.

In case it helps, this is the controller function that processes the login POST.
I don't expect you to check all the code, because I'm not sure how it could be the problem.

def user():
    if request.env.request_method == 'POST':
        email = request.post_vars.email.lower().strip() if request.post_vars.email else ''
        password = request.post_vars.password
        recordarme = bool(int(request.post_vars.recordarme or 0))
        errores = []
        if not email:
            errores.append(['email', 'Ingresa tu dirección de email'])
        if not password:
            errores.append(['password', 'Ingresa tu contraseña'])
        if errores:
            return response.json({'success': False, 'errores': errores})

        usuario = db(db.auth_user.email.lower() == email).select().first()
        if not usuario:
            session.failed_login_attempt = True
            return response.json({
                'success': False,
                'message': 'Login data invalid'
            })
        elif usuario.registration_key:
            return response.json({
                'success': False,
                'message': 'Registration is pending for confirmation'
            })
        else:
            usuario = auth.login_bare(usuario.email, password)
            if not usuario:
                return response.json({
                    'success': False,
                    'message': 'Login data invalid'
                })
            session.auth.expiration = auth.settings.expiration
            if recordarme:
                session.auth.expiration = auth.settings.long_expiration
                session.auth.remember_me = True
                if response.cookies.get(response.session_id_name):
                    response.cookies[response.session_id_name]["expires"] = session.auth.expiration
            try:
                db.auth_event.insert(time_stamp=request.now, client_ip=request.client, user_id=usuario.id,
                                     origin='auth', description='User %s Logged-in' % usuario.id)
            except:
                pass
            return response.json({'success': True})



Anthony

Apr 6, 2018, 3:05:13 PM
to web2py-users
On Friday, April 6, 2018 at 10:58:56 AM UTC-4, Lisandro wrote:
Yes, in fact, I've been running that SQL command to check for locks, and sometimes I see that lock on other tables, but those other locks live for less than a second. However, when the problem happens, the lock on the auth_user and web2py_session tables remains there for the whole 60 seconds.

Yes, but that doesn't mean the lock or the database has anything to do with the app hanging. The locks will be held for the duration of the database transaction, and web2py wraps HTTP requests in a transaction, so the transaction doesn't end until the request ends (unless you explicitly call db.commit()). In other words, the app is not hanging because the locks are being held, but rather the locks are being held because the app is hanging. First you have to figure out why the app is hanging (it could be the database, but could be something else).
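As an aside: the implicit transaction can be ended early with an explicit db.commit(), which also releases whatever locks it holds; web2py still commits or rolls back at the end of the request as usual. A minimal sketch, with the action and the slow step purely illustrative:

def report():
    rows = db(db.auth_user.id > 0).select()
    db.commit()                # end the implicit transaction here; its locks are released
    call_something_slow(rows)  # hypothetical long-running step that doesn't need the transaction
    return dict(rows=rows)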

Anthony

Lisandro

Apr 6, 2018, 5:59:28 PM
to web2py-users
Oh, I see, you made a good point there, I hadn't realised.

I guess I will have to take a closer look at my app code. Considering that the problem exists in specific accounts while others work ok, and also that the problem happens with any request that the specific user makes to any controller/function, I'm wondering: what does my app do differently for one user compared to another at the request level? By "request level" I mean all the code the app runs on every request, starting with models/db.py.

I'll take a closer look at that and post another message here if I find something that points to the root cause of the issue.

Thank you very much for your help!

Lisandro

Apr 16, 2018, 3:15:54 PM
to web2py-users
Hi there, sorry to bother again, I have some additional info that could help.

The problem happened again, exactly the same as the other times. 
But this time an error ticket was created with this traceback:
Traceback (most recent call last):
  File "/var/www/medios/gluon/main.py", line 463, in wsgibase
    session._try_store_in_db(request, response)
  File "/var/www/medios/gluon/globals.py", line 1152, in _try_store_in_db
    if not table._db(table.id == record_id).update(**dd):
  File "/var/www/medios/gluon/packages/dal/pydal/objects.py", line 2117, in update
    ret = db._adapter.update("%s" % table._tablename,self.query,fields)
  File "/var/www/medios/gluon/packages/dal/pydal/adapters/base.py", line 988, in update
    raise e
DatabaseError: query_wait_timeout
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

Could this indicate that for some reason web2py is failing to store the session?
Or could it still be that a deadlock in my app code is producing this error?

Richard Vézina

Apr 16, 2018, 3:53:42 PM
to web2py-users
Hello Lisandro,

I recall you asked a question about sessions in Redis... Could there be remaining stuff from the Redis experiments that hasn't been commented out completely?

Richard


Anthony

Apr 16, 2018, 4:31:47 PM
to web2py-users
Where is the database server running? Is it possible there are occasional network problems connecting to it?

Anthony

Lisandro

Apr 16, 2018, 5:57:47 PM
to web2py-users
Hi, thank you both for your time and concern.

@Richard: this particular website was not yet running with sessions stored in Redis. As we have several websites, moving sessions to Redis is something we will do progressively over the next weeks.

@Anthony: the database server is PostgreSQL, running on the same VPS, so I wouldn't say it's due to network problems. I do have pgBouncer and I limit the pool size to only 1 connection (with a reserve pool of 2) per database. The app didn't have much load (actually it almost never does), but in this situation, with that query hanging for 60 seconds, it's probable that the error ticket appeared because there were no more connections available for that db (for example, if the user with the problem tried simultaneously on a laptop, a PC and a mobile phone).
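For reference, these are the pgBouncer settings involved in that scenario (the values below are only illustrative defaults, not my actual config); query_wait_timeout is the setting that produces exactly the error shown in the ticket above:

[pgbouncer]
default_pool_size = 1       ; server connections per database/user pair
reserve_pool_size = 2       ; extra connections allowed once the pool is exhausted
reserve_pool_timeout = 5    ; seconds a client waits before the reserve pool kicks in
query_wait_timeout = 120    ; clients queued longer than this are disconnected with "query_wait_timeout"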


Some (weird) points about the problem:
  • While it presents itself in a specific account, other user accounts can log in and work perfectly with the app.
  • As an admin, I have permission to impersonate other user accounts. When the problem happens, I can impersonate any account except the one with the problem (the impersonation is successful, but the same timeout appears once I'm impersonating that account).
  • The problem doesn't go away by deleting all web2py_session table records and clearing cookies.
  • The problem doesn't go away by changing the account email or password.
  • The only solution I've applied the last few times it happened was to create a new account for the user and invalidate the old one.

Today, when the problem happened, I created the new account for the user and moved the sessions to Redis. Maybe I should have kept sessions in the db in order to debug the problem with that account, but that's not possible anymore because I already moved to Redis. Of course I could move sessions back to the db, but I don't like the idea of debugging in production on a customer's website, especially one who had a recent issue with this.

So I'll wait to see if it happens again, and I'll try to leave the account in place to do some tests.
Thank you very much for your time!

Lisandro

Apr 19, 2018, 9:27:46 AM
to web2py-users
Hi there,
I've found the issue, but I still don't know how it is produced.
Anthony was right from the beginning when he said "the app is not hanging because the locks are being held, but rather the locks are being held because the app is hanging".
Since that comment, I was waiting for the problem to happen again so I could decompile the app and add some logging to see exactly the line of code where the application hangs.

So that's what I did, and I found that my app indeed hangs on a specific line of code in models/db.py.
This is my models/db.py, summarized:


if auth.is_logged_in() and auth.user.responsable:

    # ----------- THIS IS THE LINE WHERE THE CODE HANGS ----------

    session.important_messages = cache.redis('important_messages-%s' % auth.user.id,
                                             lambda: get_important_messages(), 
                                             time_expire=180)




So I checked what the function "get_important_messages()" does, and I see that it connects to a webservice (also developed with web2py):


def get_important_messages():
    from gluon.contrib.simplejsonrpc import ServerProxy

    try:
        result = webservice.get_account_info(CONFIG.customer_id)
    except Exception as e:
        result = []
    return result
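The construction of "webservice" is omitted from that summary; presumably it is a ServerProxy pointing at the main app's endpoint, roughly like the sketch below (the token variable and the timeout value are purely illustrative; the URL pattern is the one that shows up in the nginx log further down). Without some socket timeout, a webservice that never answers keeps the whole request hanging:

import socket
from gluon.contrib.simplejsonrpc import ServerProxy

socket.setdefaulttimeout(5)  # illustrative: fail fast instead of hanging until the upstream timeout

# hypothetical construction; the real domain and token come from the app's config
webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=%s' % CONFIG.ws_token)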



Then I went to double-check my nginx error.log, this time looking for errors on the URL that the app uses to connect to the webservice. Surprisingly, I'm seeing a few timeouts every day for that URL:

2018/04/17 15:08:22 [error] 23587#23587: *93711423 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:08:22 [error] 23587#23587: *93711449 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX2 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:08:36 [error] 23582#23582: *93711928 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:04 [error] 23582#23582: *93713029 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX3 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:16 [error] 23591#23591: *93713451 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:24 [error] 23582#23582: *93713819 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX4 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:25 [error] 23582#23582: *93713839 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX5 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:10:25 [error] 23582#23582: *93716003 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:12:34 [error] 23591#23591: *93720887 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX6 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:12:36 [error] 23590#23590: *93720938 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:12:50 [error] 23589#23589: *93721468 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 10:39:39 [error] 16600#16600: *89723537 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 10:40:10 [error] 16601#16601: *89724987 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 10:40:11 [error] 16602#16602: *89725040 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 16:59:46 [error] 17874#17874: *90771814 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 17:00:56 [error] 17877#17877: *90774663 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 17:01:11 [error] 17879#17879: *90775407 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/15 13:46:46 [error] 11395#11395: *86829630 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"


So, what I know now is that the problem is these timeouts, which occur occasionally when an app tries to connect to the main webservice using the get_important_messages() code shown above.

This is the code of the ws.py controller that implements the webservice:

# -*- coding: utf-8 -*-

from gluon.tools import Service

service = Service()


def call():
    if not request.vars.token or not db(db.websites.token == request.vars.token).count():
        raise HTTP(403)
    session.forget()
    return service()
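The functions exposed by the service aren't shown in the post; in web2py they would be decorated with @service.jsonrpc in the same controller, roughly like this (the body below is just a placeholder, not the real implementation):

@service.jsonrpc
def get_account_info(customer_id):
    # placeholder body; the real implementation isn't shown in this thread
    return []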



Notice that the call receives a token, and every app that tries to connect has its own token, in order to validate the connection.
I'm not sure why some of the calls to the webservice hang, but I'm sure of this:
  • While some of these calls time out, other identical calls work properly (and they are all identical, just calls to connect to the webservice).
  • Just in case, I've checked that my nginx configuration isn't applying request limits to my server IP or anything like that; no warning or error regarding this shows up in the nginx error.log.
  • Also, just in case, I checked my pgBouncer log to see if connections to the main database are exhausted, but that's not the case either (if it were, I would see error tickets created, and any other attempt to connect to the webservice would also fail, which is not happening).

Now I'm lost; I don't see how the attempt to connect to the webservice could fail.
Maybe network problems, but those should affect other connections as well.

Any comment or suggestion will be much appreciated.
Regards,
Lisandro.

Lisandro

Apr 20, 2018, 7:28:28 AM
to web2py-users
Sorry to bother you again with this, but I think I've found the problem.
The problem is apparently in the Redis integration. It had nothing to do with connections, the database, or sessions. Here is what I've found.

Remember, the line where my app hangs is this:

session.important_messages = cache.redis('important-messages-%s' % auth.user.id,
                                         lambda: get_important_messages(),
                                         time_expire=180)


As the problem only appeared in production, on my customer's website, I asked him to allow me to play a little with the code.
So, the first thing I did was cache request.now instead of calling the function "get_important_messages()", but the problem remained.
Then I thought "maybe if I change the key..." and I changed the original code to this:

session.important_messages = cache.redis('important-messages',
                                        lambda: get_important_messages(),
                                        time_expire=180)


Notice that the only thing I changed was the key used to store in Redis. And it worked! I thought that maybe "auth.user.id" was some large number, but I checked and the user ID is 3. I tried passing it as int(auth.user.id) but had no success. The app still hangs when I try to retrieve that specific key. Only that key.

I connected with redis-cli and it tells me that the key isn't there.
So I set a "hello" value for the key, read it back, then deleted it:

$ redis-cli
127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
(nil)
127.0.0.1:6379> SET w2p:myapp:important-messages-3 "hello"
OK
127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
"\x00\x05hello\x06\x00\xf5\x9f\xb7\xf6\x90a\x1c\x99"
127.0.0.1:6379> DEL w2p:myapp:important-messages-3
(integer) 1
127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
(nil)


But even after that, web2py hangs with this simple code:

r = cache.redis('important-messages-3', lambda: request.now, time_expire=30)

This happens only with that specific key. I can set the key to "important-messages-2", "important-messages-999", "important-messages-A", anything I can think of, but with that specific key it hangs.

We have several websites (around 200), and this problem has happened about 5 or 6 times on different websites, but it was always the same problem. The only solution I had (until now) was to create a new account for the user (that explains why a new account worked: the new account had a different auth.user.id, so the key stored in Redis was different).

Could this be a bug in the redis_cache.py integration?
Maybe I should open a new thread about this, right?

Anthony

Apr 20, 2018, 9:54:20 AM
to web2py-users
When you set up the Redis cache, do you set with_lock=True? If so, I wonder if an error in that locking code could be causing the key to be locked and never released. I guess you can check for a key named "w2p:myapp:important-messages-3:__lock".
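A quick way to check that from redis-cli, following the key pattern above (the app name "myapp" is just assumed):

$ redis-cli
127.0.0.1:6379> KEYS w2p:myapp:*__lock
127.0.0.1:6379> TTL w2p:myapp:important-messages-3:__lock

TTL returns -2 if the lock key doesn't exist at all.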

Anthony

Lisandro

Apr 20, 2018, 10:47:10 AM
to web2py-users
Thank you very much for your time Anthony.

Yes, I use Redis with_lock=True.
I checked, but there is no *__lock key stored in Redis. I double-checked that.

But, given that you mentioned with_lock, I tried setting with_lock=False, and it worked.
Then I set with_lock=True again, and it worked too.
Apparently, the problem went away after executing the request once with with_lock=False; then I could set it back to True and it kept working ok.

I'm using an old version of web2py (2.10). 
I already have plans to upgrade next month.
I've checked the difference between the redis_cache.py file of my version and the current stable one: https://www.diffchecker.com/owJ67Slp
But I can't tell whether the new version would help with this. It is indeed a weird problem.


In the book I've read that with_lock=True is good for preventing two different threads from writing the same key value. In my case, I suppose it wouldn't hurt if that happened (my app uses the cache to improve performance).
Should I consider setting it to False?

Anthony

Apr 20, 2018, 11:19:11 AM
to web2py-users
On Friday, April 20, 2018 at 10:47:10 AM UTC-4, Lisandro wrote:
Thank you very much for your time Anthony.

Yes, I use Redis with_lock=True.
I checked but there is no *__lock key stored in Redis. I double checked that.

But, given that you mentioned with_lock, I tried setting with_lock=False, and it worked.
Then I set with_lock=True again, and it worked too.
Apparently, the problem went away after executing the request once with with_lock=False; then I could set it back to True and it kept working ok.

I'm using an old version of web2py (2.10).

Looking at the code under 2.10, it is not clear what the problem could be, as the locking code is in a try block and there is a "finally" clause that deletes the lock key if there is an exception.

The current code in master looks like it could result in a lock being stuck if an exception occurs while storing or retrieving a cache item.

Anthony

Lisandro

Apr 20, 2018, 11:58:47 AM
to web2py-users
I see what you mean. 
But still, if my interpretation is correct, in those cases we should see the *__lock key stored.
What is weird about my specific issue is that there was no *__lock key.

Anyway, regardless of upgrading web2py, now I'm wondering whether I should set with_lock to True or False. Do you have any suggestions? The book says:
"Redis cache subsystem allows you to prevent the infamous "thundering herd problem": this is not active by default because usually you choose redis for speed, but at a negligible cost you can make sure that only one thread/process can set a value concurrently."

I haven't found comments regarding when it is best to use with_lock=True and when to use with_lock=False. I'm guessing with_lock=True is best when the process that generates the data to be cached takes a long time or uses a lot of resources. That's not my case, so I'm tempted to change it to False, but I'm not sure about the decision. If you have any experience or suggestions about that, I'd appreciate your comments.
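To make the trade-off concrete, this is the kind of situation where with_lock=True is meant to pay off (the key name and the slow callable below are just illustrative):

def expensive_report():
    # stand-in for a slow computation; without the lock, many concurrent requests
    # hitting an expired key would all run this at the same time
    import time
    time.sleep(5)
    return db(db.auth_user.id > 0).count()

# with_lock=True means only one process recomputes the value while the others
# wait on the lock and then read the freshly cached result
value = cache.redis('daily-report', expensive_report, time_expire=3600)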

Thanks again.
Regards,
Lisandro.

Anthony

Apr 20, 2018, 2:32:15 PM
to web2py-users
On Friday, April 20, 2018 at 11:58:47 AM UTC-4, Lisandro wrote:
I see what you mean. 
But still, if my interpretation is correct, in those cases we should see the *__lock key stored.
What is weird about my specific issue is that there was no *__lock key.

Anyway, regardless of upgrading web2py, now I'm wondering whether I should set with_lock to True or False. Do you have any suggestions? The book says:
"Redis cache subsystem allows you to prevent the infamous "thundering herd problem": this is not active by default because usually you choose redis for speed, but at a negligible cost you can make sure that only one thread/process can set a value concurrently.

I haven't found comments regarding when is best to use with_lock=True and when to use with_lock=False.

Probably safest to set it to True unless that slows things down noticeably. Or go with False if you can tolerate the occasional race condition.

Anthony

Lisandro

Apr 21, 2018, 6:37:10 AM
to web2py-users
Ok, I've left it set to True.

A quick comment about a couple of tests I did regarding RedisSession (that also has a "with_lock" argument).
For the test, I updated web2py locally to version 2.16.1-stable+timestamp.2017.11.14.05.54.25
and then I ran my apps using:
  • RedisSession(..., with_lock=False)
    This is the way that I was already using it, and apps run normally (pretty fast because they do simple tasks)

  • RedisSession(..., with_lock=True)
    Using this, the apps start responding with a huge delay; the same requests that usually took less than a second to complete start taking between 6 and 10 seconds.
I was testing locally, with only one user, one session. 
After switching the with_lock argument I always did a "flushall" in redis and restarted uwsgi, just in case.
I switched several times to confirm, and the difference is significant. 

Having seen this, I'll keep using Redis this way:
RedisCache(..., with_lock=True)
RedisSession(..., with_lock=False)


Thank you very much Anthony for your help.
Regards,
Lisandro.

PS: I would like to change this thread's subject to something more appropriate, but I'm not allowed to. Sorry for having created it with that ugly title; I'll do better next time.

Anthony

Apr 21, 2018, 5:00:36 PM
to web2py-users
A quick comment about a couple of tests I did regarding RedisSession (that also has a "with_lock" argument).
To the test, I updated web2py locally to version 2.16.1-stable+timestamp.2017.11.14.05.54.25
And then I run my apps using:
  • RedisSession(..., with_lock=False)
    This is the way that I was already using it, and apps run normally (pretty fast because they do simple tasks)

  • RedisSession(..., with_lock=True)
    Using this, the apps start responding with a huge delay; the same requests that usually took less than a second to complete start taking between 6 and 10 seconds.
That seems odd. This should have minimal effect on the time to complete a single request if there are no other simultaneous requests involving the same session. How are you testing?

It does look like the Redis session code puts a 2 second lock on the session when it is first read at the beginning of a request, so if a page makes multiple simultaneous Ajax requests when it loads, each request will take up to 2 seconds before the next can be processed (if I'm reading the code right, an update to the session will result in the lock being released, but if no update is made, it looks like you have to wait for the full 2 seconds to expire).

Note, locking with Redis cache works differently -- the lock is held only during the time it takes to read from or write to the cache, not for the duration of the entire request or beyond.

Anthony

Lisandro

Apr 21, 2018, 8:10:52 PM
to web2py-users
On Saturday, April 21, 2018 at 18:00:36 (UTC-3), Anthony wrote:
A quick comment about a couple of tests I did regarding RedisSession (that also has a "with_lock" argument).
To the test, I updated web2py locally to version 2.16.1-stable+timestamp.2017.11.14.05.54.25
And then I run my apps using:
  • RedisSession(..., with_lock=False)
    This is the way that I was already using it, and apps run normally (pretty fast because they do simple tasks)

  • RedisSession(..., with_lock=True)
    Using this, the apps start responding with a huge delay; the same requests that usually took less than a second to complete start taking between 6 and 10 seconds.
That seems odd. This should have minimal effect on the time to complete a single request if there are no other simultaneous requests involving the same session. How are you testing?

It is weird indeed. 
I'm testing in my local environment, with web2py updated to the version I mentioned. I use uwsgi and nginx, but I also tried web2py's built-in server.
I'm testing on an admin app that I developed; it doesn't make ajax calls, it's just a bunch of URLs with a paginated grid and some forms to edit rows, stuff like that.

I've noticed that when I use RedisSession(..., with_lock=True) the app takes much longer to respond. 
I've measured the change using the Chrome Inspector, within the Network tab, checking for the TTFB (time to first byte).
Normally, running locally, the TTFB of any request URL is around 50ms. 
But when I set RedisSession(..., with_lock=True), flushall redis and restart uwsgi, the TTFB rises to between 6 and 10 seconds.
This happens even with requests where I run session.forget(response)

Something particularly weird is that once every 5 or 6 times, the request responds fast (with a low TTFB).
I mean the same request URL: I keep the Chrome inspector open in that tab and reload the page several times.
The TTFB is always between 6s and 10s, but every 5 or 6 times it is low (around 50ms, as it should be).
I'm running Redis 3.2.10

Here is my db.py, but I don't think it has anything out of the ordinary:

# imports needed by this snippet (omitted from the summarized db.py):
import os
from gluon.contrib.redis_utils import RConn
from gluon.contrib.redis_session import RedisSession

db = DAL(
    'postgres://%s:%s@%s/%s' % (CONFIG.db_user, CONFIG.db_user_password, CONFIG.db_host, CONFIG.db_name),
    migrate=False,
    lazy_tables=True,
    folder=os.path.join(CONFIG.path_datos, 'databases'))

_conn = RConn('localhost', 6379)

sessiondb = RedisSession(
    redis_conn=_conn, 
    session_expiry=172800,  # two days
    with_lock=True)

session.connect(request, response, db=sessiondb)

auth = Auth(globals(), 
            db, 
            hmac_key=Auth.get_or_create_key(filename=os.path.join(CONFIG.path_datos, 'auth.key')))

auth.define_tables()

# ... define all the tables
# ... configure auth settings



It does look like the Redis session code puts a 2 second lock on the session when it is first read at the beginning of a request, so if a page makes multiple simultaneous Ajax requests when it loads, each request will take up to 2 seconds before the next can be processed (if I'm reading the code right, an update to the session will result in the lock being released, but if no update is made, it looks like you have to wait for the full 2 seconds to expire).

Note, locking with Redis cache works differently -- the lock is held only during the time it takes to read from or write to the cache, not for the duration of the entire request or beyond.


I see what you mean, I took a look at the code.

I've read this from the book: "The redis backend for sessions is the only one that can prevent concurrent modifications to the same session: this is especially true for ajax-intensive applications that write to sessions often in a semi-concurrent way. To favour speed this is by default not enforced, however if you want to turn on the locking behaviour, just turn it on with with_lock=True parameter passed to the RedisSession object"

Considering that, and given that until now I stored the sessions in the db, I guess it won't hurt to use RedisSession(..., with_lock=False), as it would remain the same as before (regarding locking), right?


Anyway, let me know if there is some additional test that I can do to figure out why the app takes longer using RedisSession(..., with_lock=True).

Regards,
Lisandro

Lisandro

May 3, 2018, 5:37:53 PM
to web2py-users
I came back to reply to this particular message, because the problem happened again yesterday.

Just as a reminder: the problem was redis_cache.py failing to write a specific key name.
No *__lock key was found in the cache during the incident.
Connecting manually with redis-cli showed that the key isn't present and can be written and read.
But from the web2py application (using the redis_cache.py adapter), the application hangs with that specific key.
The only solution is to change the key name. Weird. 


Yesterday, our Redis server crashed (it reported out of memory, but the server had plenty of RAM available; this was due to an issue with memory fragmentation).
Anyway, after that crash, I did a flushall and restarted the Redis server.
Everything started to work ok for almost all of the 260 websites we run with web2py, but 4 of them kept hanging.
All the requests to those websites were hanging. I added some log statements and found out that it was the same problem, this time with another random key name.

The solution was, again, to de-compile the apps, manually edit the source code and change the key name. But of course, this is not a solution.
Again, this time there was no *__lock key in the cache, and I could successfully use redis-cli to write and read the key name.

I don't have a proper solution yet. 
redis-cli flushall doesn't help.


I'm using an old web2py version (Version 2.10.3-stable+timestamp.2015.04.02.21.42.07).
I already have plans to upgrade. 
But before that, could I manually update only the gluon/contrib/redis_*.py files?
In your opinion, is that test worthwhile? Does it make any sense?

Anthony

May 4, 2018, 9:13:52 AM
to web2py-users
I'm using an old web2py version (Version 2.10.3-stable+timestamp.2015.04.02.21.42.07).
I already have plans to upgrade. 
But before, could I manually update only gluon/contrib/redis_*.py files?
In your opinion, is that test worthwhile? Does it make any sense?

I haven't studied the code differences, so not sure. I suppose you could give it a try.

Anthony

Massimo Di Pierro

May 5, 2018, 11:13:06 PM
to web2py-users
That is a very old version of web2py. The problem may have been solved years ago. ;-)

Lisandro

Jun 4, 2018, 8:34:22 AM
to web2py-users
I just wanted to confirm that the problem was apparently solved after upgrading web2py to version 2.16.1 and also upgrading Redis to the latest stable 64-bit version (version 1.8.1).
The app has been running for a couple of weeks now and the problem hasn't come back.
I'm truly sorry for having bothered you with this, considering that the problem could be solved just by upgrading web2py and Redis.
We have also just moved our production deployment to CentOS 7 for better stability.

Once again, I deeply thank all of you for your time and your constant willingness to help!
Best regards,
Lisandro.