Performance problems when executing the sessions2trash.py script


Lisandro

Mar 20, 2015, 6:33:15 PM
to web...@googlegroups.com
I'm using web2py in production to serve about 15 websites, each of them served by its own web2py installation.

I want to clean up expired sessions at regular intervals. To do that, I create a file under /etc/cron.d/ for every website. The file has the following content:

MAILTO=root
*/60 * * * * www-data nohup python /var/www/mywebsite/web2py.py -S init   -M -R /var/www/mywebsite/scripts/sessions2trash.py -A -o
*/60 * * * * www-data nohup python /var/www/mywebsite/web2py.py -S panel  -M -R /var/www/mywebsite/scripts/sessions2trash.py -A -o

As you can see, the cleanup is executed once per hour. I'm using two lines for each website, because each website has two web2py applications running: "init" and "panel", so I clean up the sessions of both of them.

The problem is that, if I enable those lines in the cron configuration, every time they run the memory usage of my server goes through the roof (all memory is used and the server starts to swap), the CPU load increases considerably, and for about 5 to 10 minutes all the websites throw errors.

If I execute those lines manually, they run and finish almost instantly. There aren't many sessions to clean up, so the cleanup takes no more than a few seconds. So I don't understand why the same process isn't working when called from cron. Any tips on this? Thanks in advance!

Niphlod

Mar 20, 2015, 6:44:57 PM
to web...@googlegroups.com
Are sessions stored in files, db, memcache or redis?

Lisandro

Mar 20, 2015, 6:54:19 PM
to web...@googlegroups.com
Sessions are stored in the db.
I'm using PostgreSQL, and this is part of the code in my db.py model:

db = DAL('postgres://%s:%s@%s/%s' % (DB_USER, DB_USER_PASSWORD, DB_HOST, DB_NAME), lazy_tables=True)
session.connect(request, response, db=db, masterapp='init')

As you can see, I'm using the masterapp parameter of the session.connect() method, because every one of my sites has two web2py applications that share sessions.
Given that, I'm not sure if I have to run the sessions2trash.py script for both applications or just one of them. However, I **think** this isn't relevant to the performance problem.

Niphlod

Mar 20, 2015, 7:18:57 PM
to web...@googlegroups.com
You can use the verbose option to check a few things that you may be interested in (and that would help me/us understand the possible issues):
- if cleaning on "panel" right after cleaning "init" actually cleans up something --> it shouldn't if your masterapp is always "init"
- how many sessions are you inspecting
- how many sessions are you clearing
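
The last two numbers can also be read directly from the session table. What follows is only a minimal sketch: it assumes the sessions live in a table named web2py_session_init with a modified_datetime column, it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the function name session_counts and the one-hour expiration are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def session_counts(conn, table, expiration_seconds, now=None):
    """Return (total, expired) row counts for a session table.

    The table name is interpolated into the SQL, so it must come from
    trusted configuration, never from user input.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(seconds=expiration_seconds)).isoformat(sep=" ")
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    expired = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE modified_datetime < ?", (cutoff,)
    ).fetchone()[0]
    return total, expired


# Stand-in data: a simplified session table (assumed layout, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE web2py_session_init ("
    "id INTEGER PRIMARY KEY, modified_datetime TEXT, session_data BLOB)"
)
now = datetime(2015, 3, 20, 20, 0, 0)
stamps = [(now - timedelta(minutes=m)).isoformat(sep=" ") for m in (5, 30, 90, 600)]
conn.executemany(
    "INSERT INTO web2py_session_init (modified_datetime) VALUES (?)",
    [(s,) for s in stamps],
)

total, expired = session_counts(conn, "web2py_session_init", 3600, now=now)
print(total, expired)  # 4 2
```

A total that is far larger than the number of recently active users is a hint that old sessions have been accumulating.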

Massimo Di Pierro

Mar 20, 2015, 7:47:45 PM
to web...@googlegroups.com
You say "I'm using web2py in production to serve about 15 websites, each one of them is served by its own web2py installation." Why? Why not a single web2py instance? Do the different instances run on the same server on different ports?

Lisandro

Mar 20, 2015, 8:05:31 PM
to web...@googlegroups.com
I've already tried the -v option for verbose output, but nothing special is shown.

I've just run the script for one of the sites, and the output was OK:
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2015
Version 2.9.12-stable+timestamp.2015.01.17.06.11.03
Database drivers available: sqlite3, pymysql, psycopg2, pg8000, imaplib
e4b39c2b-f6c4-442a-b352-211c2fcc3474 trashed
a6166534-cc6e-49ef-9900-30f8ad299cc6 trashed
bdbc3d14-ff5e-49c6-a2d6-7f2e7ea83fd7 trashed
c169ee4b-dc48-410c-9342-a85571d41d36 OK
78182dcf-0738-4534-a394-98cdb320ecba OK
b209819e-c5d7-4ba0-a922-0546a4d34bdc OK

Then I ran the script for another site, and no session was reported as trashed or "OK". However, the execution took around 30 seconds or more, and during that time I could see swap usage and CPU load increasing.

I read a little more about sessions, and I noticed that, when they are stored in the db, there is a table called, in this case, "web2py_session_init". So I checked the record count, and I found out that one of the websites has **a lot** of records in that table, specifically, 1712710 records :P

So I guess that's the problem. I don't know how long web2py would take to clear all those sessions, but I guess that's a lot of records to handle.

Question: can I just delete all the records in that table? Would that cause any trouble for users? Or would they just be asked to log in again?

Niphlod

Mar 20, 2015, 8:15:46 PM
to web...@googlegroups.com
They'll lose their session data. If they are logged in, they will be asked to log in again.
To alleviate the issue, you could delete only the records that have a modified_datetime older than some value in the past, e.g. 2 days ago. Currently logged-in users (who are probably storing/changing something in their own session) wouldn't notice anything, and you'll get your table - hopefully - down to a manageable size.
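
A minimal sketch of that age-based delete, under stated assumptions: the table is called web2py_session_init and has a modified_datetime column, SQLite in memory stands in for PostgreSQL, and the helper name purge_old_sessions is hypothetical. On PostgreSQL the presumed equivalent is a single statement such as DELETE FROM web2py_session_init WHERE modified_datetime < now() - interval '2 days', which has not been tested against this schema.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_old_sessions(conn, table, max_age_days, now=None):
    """Delete rows not modified within max_age_days; return the number deleted.

    The table name is interpolated into the SQL, so it must be trusted.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(days=max_age_days)).isoformat(sep=" ")
    cur = conn.execute(
        f"DELETE FROM {table} WHERE modified_datetime < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


# Stand-in data: a simplified session table (assumed layout, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE web2py_session_init ("
    "id INTEGER PRIMARY KEY, modified_datetime TEXT, session_data BLOB)"
)
now = datetime(2015, 3, 20, 20, 15, 0)
ages = [timedelta(hours=1), timedelta(days=1), timedelta(days=3), timedelta(days=10)]
conn.executemany(
    "INSERT INTO web2py_session_init (modified_datetime) VALUES (?)",
    [((now - age).isoformat(sep=" "),) for age in ages],
)

deleted = purge_old_sessions(conn, "web2py_session_init", 2, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM web2py_session_init").fetchone()[0]
print(deleted, remaining)  # 2 2
```

Because the filter runs inside the database, no session data has to be loaded into the Python process.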

BTW, if you weren't purging sessions regularly (i.e. you only started recently), and your script is failing to fetch the data - because it's not printing anything - the sessions are probably accumulating.
Unfortunately, sessions2trash.py doesn't cope well with a very large table, and much depends on how much data you store per session. If that table is holding 1GB of data, all of it gets loaded into memory by the select that fetches the records to inspect them.
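
To illustrate why a single large select hurts, here is a sketch of the alternative: reading the table in fixed-size batches so that only one batch is in memory at a time. This is not how sessions2trash.py works; it is an illustration under assumptions (a simplified web2py_session_init table with an integer id, SQLite in memory as a stand-in for PostgreSQL, and a hypothetical helper name iter_sessions_in_batches).

```python
import sqlite3


def iter_sessions_in_batches(conn, table, batch_size):
    """Yield lists of (id, session_data) rows, at most batch_size rows each.

    Uses keyset pagination on id, so each query returns a bounded number of
    rows. The table name is interpolated into the SQL and must be trusted.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            f"SELECT id, session_data FROM {table} "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # continue after the last id seen


# Stand-in data: 25 small sessions (assumed layout, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE web2py_session_init (id INTEGER PRIMARY KEY, session_data BLOB)"
)
conn.executemany(
    "INSERT INTO web2py_session_init (session_data) VALUES (?)",
    [(b"x" * 10,) for _ in range(25)],
)

batch_sizes = [
    len(batch) for batch in iter_sessions_in_batches(conn, "web2py_session_init", 10)
]
print(batch_sizes)  # [10, 10, 5]
```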

Lisandro

Mar 21, 2015, 9:02:48 AM
to web...@googlegroups.com
Problem solved! The problem was indeed the large number of records in the web2py_session_init table. So I did a manual delete on the table, and then I could successfully add the sessions2trash.py script to the cron configuration. So far, everything is working OK.

Thank you very much Niphlod for your help!

Lisandro

Mar 21, 2015, 9:55:15 AM
to web...@googlegroups.com
Hello Massimo. I'm not really sure about my answer, but that's a question I've asked myself in the past.

Let me clarify: in my company we offer self-administered websites (for newspapers, magazines, organizations, bloggers). What we have is a main web2py app called "panel", which is pretty much a CMS. On the other hand, we have a bunch of other web2py apps that we call the "templates".

So, when a new client arrives, he chooses one of the templates, and then I run a script that I wrote to create the website. In summary, this is what the script does:
 - download and uncompress web2py.zip
 - install "panel" app
 - install the chosen template and symlink it to "init" app
 - create database and initialize configuration variables

So, each one of those 15 websites that I was talking about consists of:
 - the main "panel"
 - the "init" application, symlinked to the template that has been chosen by the client.

The "panel" app is accessed via domain.com/panel, and is the place where our client can manage the content of his website, that is, news, videos, polls, etc.
On the other hand, the "init" app is the public part of the website, and it controls which data is shown and how.

So, when we were starting this project, I decided to go with multiple instances of web2py. I did wonder whether it would be possible to run all the websites with a single web2py instance, but there were too many points that I wouldn't be able to resolve without professional help, for example:
 - How do I configure routes.py to serve multiple "panel" apps installed under different names but accessed via the /panel url?
 - What about multiple uwsgi applications? That is, multiple wsgihandler.py files. Where should they be placed, assuming that I need one per website?
 - What about adding and deleting applications in production without restarting?

I must say that I'm completely sure that there are a lot of things to improve in my app, and I would really like the assistance of a "web2py expert" (I would also like to hear suggestions from a server administrator and a database administrator). However, here in Argentina there isn't official web2py support (I think), and if there is, it's in Buenos Aires, and I'm far away from there.

I would really like to "open" my code to more advanced programmers/engineers, because the next step I want to cover is the creation of a website from a website. I mean, I'm working on the official website of my company, where clients would be able to create their website in seconds (this is where my scripts would run in the background, creating a new instance of web2py, installing the apps, creating the database, configuring the virtual host, etc). But "opening" the code is still something to discuss with the other members of the company, so..

Finally, I must say that I really appreciate any tip or suggestion, but I understand that I would have to be more specific to get more specific tips.

As always, thanks a lot for the help, this community is awesome, I have always been able to solve my problems with the help received here :)

Massimo Di Pierro

Mar 21, 2015, 4:15:03 PM
to web...@googlegroups.com
You can definitely handle this with one single web2py instance and you would save lots of memory (not CPU, only memory, but it may be worth it).

Since you are using this in production, my recommendation is to use nginx and use nginx.conf to map subdomains to apps. Each domain would have its own panel. When a new domain is created, your program would add an entry to nginx.conf and restart nginx. No need to restart or use multiple uwsgi instances.

This is really more of an nginx config issue than a web2py programming issue.

Massimo

Lisandro Rostagno

Mar 21, 2015, 4:50:11 PM
to web...@googlegroups.com
Thank you Massimo, I will try that, I think I'm going to "play" a
little on a testing server and then go to production.

One thing that I was concerned about was the allocation of resources.
Considering that our clients choose a "plan", I need to limit
resources for every website according to the plan selected.
Currently I'm applying some limits like requests per IP, connections
per IP, and total connections per server. Those limits are applied
through nginx configuration, so it can be done regardless of whether
there is one or multiple instances of web2py.

However, there is still another limit that I apply, which is the number
of processes dedicated to the website. For example, if I have a 4-core
CPU, the websites that selected the basic plan are assigned 1
core, others 2 cores and others 4 cores. I get this done
through the uwsgi configuration of every specific website (using the
"processes" configuration parameter), but I don't know if I can
achieve that with only one web2py instance running (and therefore only
one uwsgi application, I think).

Anyway, like you said, this is more of a nginx/uwsgi configuration issue.
When I have some time, I will make some tests about this, and I will
probably be here asking some questions :P

Niphlod

Mar 21, 2015, 5:19:23 PM
to web...@googlegroups.com
This last post is a dream come true for my boss.

Glossary:
- "developer" --> someone that can create logics in programs, coding in some programming language
- "sysop" --> someone who knows inside out every bit of the pieces the "developer" uses to make programs work
- "devop" --> someone who is a coder, and tries to put the bits together to make his program work
Bit of background:
Working in a large sysop group managing servers for every kind of things...in the range of a 6M $ worth of hardware.
The new "devops" movement, where "developers" that are not-that-expert in "sysops" try to do all by themselves, often results in all kinds of configuration nightmares. I mean, I'm a developer too, but maybe I'm too biased on the sysop side to let this go unnoticed. I'm happy that developers can set up a website from the ground up, we're in 2015 after all....
But "sysop-only" people (boss included) are scared of these "new times" because the idea that is moving everybody on is that they don't matter anymore: if "devops" can set everything from the ground up, why the need of "sysops" ?
Ok, background finished.

This kind of configuration may only matter if you didn't run multiple sites on a single server!
let's start with "cpu" ideas....
Web performance isn't related only to CPU power... but even if it were, assigning 2 wsgi processes vs 1 on a 4-core server doesn't mean that users will get 2x the performance... With some - hard - rounding (memory, I/O contention, network, nginx master processes, etc.), your theory would be valid only as long as you host EXACTLY 4 processes at most on a 4-core server.
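
A back-of-the-envelope sketch of the point above. The plan names and the mix of plans across the 15 sites are hypothetical; only the 4-core server and the 1/2/4 processes per plan come from the thread.

```python
# Hypothetical mix of plans for 15 sites sharing one 4-core server.
CORES = 4
PLAN_PROCESSES = {"basic": 1, "medium": 2, "full": 4}  # processes per plan
sites = ["basic"] * 9 + ["medium"] * 4 + ["full"] * 2

# Total worker processes competing for the same cores.
total_processes = sum(PLAN_PROCESSES[plan] for plan in sites)
oversubscription = total_processes / CORES

print(total_processes, oversubscription)  # 25 6.25
```

With many more processes than cores, a site's process count stops being a guaranteed share of CPU: every process competes with all the others for the same cores.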
limits "ideas"....
Enforcing limits per IP address: what about users sitting behind a corporate proxy, users on large MANs, Chinese folks? To your site, they all come from the same IP, but they are different users!
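
A small sketch of that effect, using a simplified fixed-window counter keyed by client IP. This is a stand-in for illustration only, not the algorithm nginx uses; the class name PerIpLimiter, the limit of 3 requests per second and the proxy address are all hypothetical.

```python
from collections import defaultdict


class PerIpLimiter:
    """Fixed-window request counter keyed by client IP (hypothetical helper)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (ip, window index) -> requests seen

    def allow(self, ip, timestamp):
        """Return True if this request fits in the current window for ip."""
        window = int(timestamp // self.window_seconds)
        key = (ip, window)
        if self.counts[key] >= self.max_requests:
            return False
        self.counts[key] += 1
        return True


limiter = PerIpLimiter(max_requests=3, window_seconds=1)
proxy_ip = "203.0.113.7"  # documentation address standing in for a shared proxy

# Five different users behind the same proxy each send ONE request
# within the same second; the limiter only ever sees the proxy's IP.
results = [limiter.allow(proxy_ip, 0.1 * i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

The fourth and fifth users are rejected even though each of them sent a single request.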

tl;dr: don't ever put here what are the sites you manage, you may get someone like me on the other end and he'd not be pleased :-P

Lisandro Rostagno

Mar 26, 2015, 10:35:31 AM
to web...@googlegroups.com
2015-03-21 18:19 GMT-03:00 Niphlod <nip...@gmail.com>:
> This last post is a dream come true for my boss.
>
> Glossary:
> - "developer" --> someone that can create logics in programs, coding in some
> programming language
> - "sysop" --> someone who knows inside out every bit of the pieces the
> "developer" uses to make programs work
> - "devop" --> someone who is a coder, and tries to put the bits together to
> make his program work
> Bit of background:
> Working in a large sysop group managing servers for every kind of
> things...in the range of a 6M $ worth of hardware.
> The new "devops" movement, where "developers" that are not-that-expert in
> "sysops" try to do all by themselves, often results in all kinds of
> configuration nightmares. I mean, I'm a developer too, but maybe I'm too
> biased on the sysop side to let this go unnoticed. I'm happy that developers
> can set up a website from the ground up, we're in 2015 after all....
> But "sysop-only" people (boss included) are scared of these "new times"
> because the idea that is moving everybody on is that they don't matter
> anymore: if "devops" can set everything from the ground up, why the need of
> "sysops" ?
> Ok, background finished.
>

It's true, though I think that this has always been happening,
historically. I mean, today, if my car is broken, I take it to the
mechanic, and there they connect the car to a computer and the
computer tells them where the problem may be. They don't need to be
experts. However, it doesn't mean that we don't need expert mechanics
anymore. On the contrary, I think those expert mechanics are able to
invest their time in higher-level work.

In the same way, I don't think that we no longer need sysops.
Maybe it's true that we don't need them as much as before. For
example, I'm just a developer, and for a while now I've been doing a
lot of stuff that is supposed to be done by a sysop. However, I think
that I wouldn't be able to achieve the same quality/efficiency
that a sysop would.

Another example is html/css design. Today there is Adobe Muse, and
there are a lot of people out there who are creating websites without
knowing anything about html or css. However, I understand that the
code produced by Adobe Muse can't be compared to code written from
scratch by someone who really knows how to code css and html.

Returning to the idea of the sysops, I'm 100% convinced that a sysop
could setup things much better than me. Trust me, I'm truly convinced
that a sysop (and a dbop if you allow me to call it that way) would
improve my systems a lot. So the big question: why haven't I called a
sysop yet? Well, this moment in Argentina isn't the best for startup
companies. We've been financing ourselves for the first two years, so
we are carefully using every buck :P

> This kind of configuration may only matter if you didn't run multiple sites
> on a single server!
> let's start with "cpu" ideas....
> web performance isn't related only to cpu power...but even if it was,
> assigning 2 wsgi processes vs 1 in a 4-core server doesn't mean that users
> will get 2x performances...with some - hard - roundings (memory, I/O
> contention, network, nginx master processes, etc) your theory would be valid
> only as long as you host EXACTLY 4 processes at the most in a 4-core server.

I understand. Actually I've been playing around a little, changing the
number of processes assigned, and I didn't notice any considerable
change in performance.
So I think I just have to adjust the uwsgi processes setting to
match the number of cores on my server.

> limits "ideas"....
> enforcing limits per ip address: what about users sitting behind a corporate
> proxy, users on large MANs, Chinese folks ? to your site, they all come
> from the same ip, but are different users!

Yes indeed. I noticed that when I discovered the limit_req module of
nginx. However, with regard to limits, I'm applying some limits based
on "$binary_remote_addr", a variable available in nginx. But in
those cases, I'm applying high limits, just to get some basic DDoS
protection.

> tl;dr: don't ever put here what are the sites you manage, you may get
> someone like me on the other end and he'd not be pleased :-P

I must say I didn't fully understand that. Do you mean I shouldn't put
here the domains I manage? Or do you mean that I should tell what are
the sites about?
Also I must admit that I can't figure out if you said that because
you're somehow angry or you feel that someway I offended you with my
last post. Or I just got it all wrong :P

Anyway, I really appreciate your help, and I would like to contact
you (or your company) in the future for a possible server tune-up.

Niphlod

Mar 26, 2015, 5:23:40 PM
to web...@googlegroups.com

> tl;dr: don't ever put here what are the sites you manage, you may get
> someone like me on the other end and he'd not be pleased :-P

> I must say I didn't fully understand that. Do you mean I shouldn't put
> here the domains I manage? Or do you mean that I should tell what are
> the sites about?
> Also I must admit that I can't figure out if you said that because
> you're somehow angry or you feel that someway I offended you with my
> last post. Or I just got it all wrong :P

Don't "disclose" the sites you're running. I (and other "sysops") may be compelled to buy the least expensive plan :-P
I'm not angry, I just see how things are going in the world, and even if I support the "devops" movement I still feel that at some point the developer should "have a chat" with a sysop about the underlying architecture. I'm not scared of "new times"... I'm a sysop, and a "dbop", and a developer :-P