Web2py freezing on live deployment!

Andrew Buchan

Nov 4, 2013, 6:15:56 AM
to web...@googlegroups.com

 Hi,

I have a serious issue. I've got a web2py install running as a service on a Windows 2003 box with SQL Server, with applications serving 100s of users in-house. We have a live and a test application running on the same install but pointing to different databases, and because some changes were never released, these have diverged quite a bit over the months. We managed to reconcile them last week (with carefully supervised migration, one table at a time), and that is when the problems started. Web2py becomes unresponsive after a period of use: it just hangs when you try to load a page. There are no errors in web2py, in the event manager, or in SQL. Rolling back those changes is tricky, as there have been changes to the database and data has already been added.
When left running over the weekend, it's fine. If I try accessing every page, it's fine (or at least, I can't break it). If it goes into proper use with lots of other people accessing it, it crashes.

I don't know how to find out what's going wrong. It's a relatively recent build of web2py, and I don't want to upgrade and throw yet another variable in there (there have also been changes to the server it uses for its mail function, though that seems to work fine, and we just removed IIS from the same server).

Any ideas why web2py would just freeze?

Willoughby

Nov 4, 2013, 8:29:23 AM
to web...@googlegroups.com
Are you running Microsoft Endpoint Security?  I have problems with the virus scanner 'locking up' things under even light usage.  One or two users can bang all day, no problem but get more than 10 and it randomly freezes.  Our fix was to exclude pretty much anything Python related.  YMMV.

Andrew Buchan

Nov 4, 2013, 10:56:32 AM
to web...@googlegroups.com
Thanks Willoughby,

We've got McAfee on that network, so I have asked the IT guy to set it to ignore the python and web2py folders. Will let you know what the upshot is once that's in place...

Regards,

Andy.

LightDot

Nov 4, 2013, 11:26:21 AM
to web...@googlegroups.com
Which version of web2py is it? Are you using gevent or rocket web server?

Can you try load testing your devel version and see if you can replicate the issue? You can use something like siege or a similar tool.

Regards

Derek

Nov 4, 2013, 11:59:11 AM
to web...@googlegroups.com
McAfee always seems to block things incorrectly, at least for me. 

Andrew Buchan

Nov 4, 2013, 2:13:03 PM
to web...@googlegroups.com
It's running on rocket. I now think it's dying during a specific ajax call, but only some of the time and/or shortly after it's called (there's usually a page load after it) and/or when it's excessively busy. Not sure if siege can automate the button clicks that trigger the ajax calls, but I might be able to do something with selenium...

Andrew Buchan

Nov 5, 2013, 6:53:13 AM
to web...@googlegroups.com
Ok, the IT guy has disabled McAfee's "on access" scan for the folders containing web2py stuff as well as python's installation directory. He tells me that parts of McAfee other than the "on access" scan may still interfere, but there's not much we can do about that. This hasn't made any difference.
As for the ajax calls, I thought it might be to do with the asynchronous calls taking too long when the server is overloaded (and it seems it is being hammered intermittently by another application), but I checked my web2py_ajax.html file and the ajax function is set to "async: false", so there shouldn't be an issue there unless there is some kind of timeout that kicks in?

But.... I was about to post the above when I did some checks on the ajax calls, and am a bit confused...
I have two javascript functions which call ajax:

#Function1:  This call displays a check list of 'previous contracts' to pick from
ajax('HubForms/Timesheets/AjaxReturnBlank', [], 'PreviousContractListingArea');

#Function2:  Once the user has checked some items, they click a button to call this, which adds all selected contracts to the timesheet, then reloads the page.
jQuery('.PreviousContractCheckbox').each(
    function(index)
    {
    if(this.checked)
        {
        jQuery('#ContractId').val(this.name);
        ajax('HubForms/Timesheets/AjaxAddContractToTimesheet', ['ContractId', 'TimesheetId', 'UserId']);
        }
    }
    );
a='nothing, just works';
window.location='HubForms/Timesheets/ViewTimesheet?Timesheet_Id=15995';
location.reload(true);

I temporarily modified the ajax function in web2py_ajax.html to display a pop-up, then wait 3 seconds before executing (last 4 lines modified):

 function ajax(u,s,t) {
  var query="";
  for(i=0; i<s.length; i++) { 
     if(i>0) query=query+"&";
     query=query+encodeURIComponent(s[i])+"="+encodeURIComponent(document.getElementById(s[i]).value);
  }
  /*this line:
       jQuery.ajax({type: "POST", url: u, data: query, async: false, success: function(msg) { if(t==':eval') eval(msg); else document.getElementById(t).innerHTML=msg; } }); 
  replaced by these 4 lines:
  */
  alert('hi');
  setTimeout(function(){
  jQuery.ajax({type: "POST", url: u, data: query, async: false, success: function(msg) { if(t==':eval') eval(msg); else document.getElementById(t).innerHTML=msg; } })
  },3000);  
}

The funny thing is that my Function1 does this (says 'hi' then pauses for 3 seconds), but Function2 says 'hi' for every item selected as you'd expect, but does NOT pause...
This is making me wonder whether web2py's ajax function behaves differently when called from inside jQuery().each() as the setTimeout() is being ignored...? 
If that is the case then perhaps the directive to use "async: false" is not being taken into account in this scenario?

Any javascript knowledgeable people able to help on this one?

Andrew Buchan

Nov 5, 2013, 3:59:43 PM
to web...@googlegroups.com
Update:

I made a copy of the web2py installation on a new server (still pointing to the old database) and eventually got it set up as a service, but it still freezes with no errors in web2py or in the event manager...

I'm pretty sure it's a programming error on my part somewhere or a migration issue, but I need to be able to find out what it is!!! (people are getting tetchy...)

I went back to the original server and tried debugging using winpdb, and it seems web2py gets stuck in a perpetual loop. When I pause, it tends to stop in the "run" or "listen" methods in rocket.py, or "accept" in socket.py (python installation dir), so the thread is running, just in a loop and not responding to new requests...

I'm thinking of putting the server onto apache to see if it behaves any differently, but if anyone has any bright suggestions on what I can do to find out what's going on, I'm all ears :-)





Derek

Nov 6, 2013, 12:28:59 PM
to web...@googlegroups.com
Check the logs, find the last line in the log - that's probably what caused the freeze. Check your logging.conf to find out where the logs are kept.

Andrew Buchan

Nov 6, 2013, 3:31:42 PM
to web...@googlegroups.com

Hi Derek,

Thanks for replying, I had checked those but the last entry was months old, so that was a dead end.

I finally got web2py running on Apache, though it took me till 5 am :-)
(ps: if anyone else plans on attempting to install Apache on Windows against MSSQL, I'm happy to help out or write a tutorial, it wasn't straightforward! Feel free to pm me)

I left both the rocket and the apache web2py installs live running on separate servers (pointing to same database) and advised the users to work on both. When I checked after a few hours, the rocket one had become unresponsive, and apache was still running fine.

I think the problem has something to do with the server being overloaded, and perhaps ajax calls timing out while holding locks on the session files, causing the whole thing to become unresponsive. There were multiple threads interacting with the same session files, judging from the debugger... If anyone would like me to do more investigation on this, I'm happy to help so long as you tell me where to look.

Apache also caused a rather funny issue... A set of ajax calls which were meant to run synchronously (because of "async: false" flag) was failing to complete before page reload, whereas it always did before, leading me to think Apache was not running it synchronously as instructed. What I think is happening is that the ajax call never ran synchronously in the first place (it's in a jQuery.each() loop) but on rocket it always finished on time before the page reload, and apache seems to reload way faster so caught me out.
Word to the wise: check any assumptions you make about ajax running synchronously or not, especially as async: false is deprecated in jQuery 1.8.
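
To illustrate that last point, here is a minimal sketch of making the follow-up step wait explicitly for every request instead of relying on timing. It is not code from the application: sendRequest is a hypothetical stand-in for an asynchronous ajax call, addAllThenReload is an invented name, and the sketch assumes a JavaScript runtime with native Promise support.

```javascript
// Hypothetical stand-in for an asynchronous request; it resolves after a short delay.
function sendRequest(contractId) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve("added " + contractId);
    }, 10);
  });
}

// Start every request, then run the follow-up step only once all of them have completed.
function addAllThenReload(contractIds, reload) {
  return Promise.all(contractIds.map(sendRequest)).then(function (results) {
    reload(results); // in a page, this is where window.location would be set
    return results;
  });
}
```

With this shape the navigation cannot overtake the inserts, however fast or slow the server happens to be.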




Derek

Nov 6, 2013, 6:23:06 PM
to web...@googlegroups.com
Ah, if you are on Windows, I would recommend you run it with gevent instead of rocket, it should be faster than even apache.

Derek

Nov 6, 2013, 6:30:51 PM
to web...@googlegroups.com
I'll also add this:
look at anyserver.py for how to get it going...

also, gevent does monkey.patch_all which means that all your tcp sockets would become async, which means if you are freezing up because a single connection is frozen, that won't be the case with gevent. Your individual connections would still be waiting but the server will process other requests while it waits for data. In other words, while one client may freeze up, the rest should process just fine.

Ricardo Pedroso

Nov 6, 2013, 6:32:54 PM
to web...@googlegroups.com
I don't have a solution for your problem. If you could make a minimal application that replicates it, it would be easier to track down.
 
setTimeout will not pause javascript execution, it will only put the given function in a "queue" to be processed later.
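
A small sketch of that behaviour, runnable outside the browser. The names events and delayedRequest are made up for illustration and stand in for the modified ajax function; the point is only that the loop finishes before any of the queued callbacks run.

```javascript
// Records the order in which things happen.
var events = [];

function delayedRequest(label) {
  events.push("scheduled " + label);
  // setTimeout only queues the callback and returns immediately.
  setTimeout(function () {
    events.push("ran " + label);
  }, 0);
}

["a", "b"].forEach(delayedRequest);
events.push("loop finished");
// The queued callbacks have not run yet at this point; they only run
// after the currently executing script has finished.
```

This is why the alert shows for every checked item without any pause in between: each call returns as soon as its callback is queued.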

Why are you doing an ajax call for each item checked?
I think you could just make one ajax call with a list of checked items, it would be more efficient.
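
Something along these lines, as a rough sketch only. buildBatchQuery is a hypothetical helper and "ContractIds" is an assumed parameter name, not one from the original application; plain objects stand in for the DOM checkboxes so the snippet runs on its own.

```javascript
// Hypothetical helper: build one POST body from all checked boxes.
// "ContractIds" is an assumed parameter name the controller would have to accept.
function buildBatchQuery(checkboxes, timesheetId, userId) {
  var ids = checkboxes
    .filter(function (box) { return box.checked; })
    .map(function (box) { return box.name; });
  return "ContractIds=" + encodeURIComponent(ids.join(",")) +
    "&TimesheetId=" + encodeURIComponent(timesheetId) +
    "&UserId=" + encodeURIComponent(userId);
}

// Plain objects stand in for the page's checkboxes here.
var boxes = [
  { name: "101", checked: true },
  { name: "102", checked: false },
  { name: "103", checked: true }
];
var query = buildBatchQuery(boxes, 15995, 7);
```

The controller would then split the list and insert all the records within a single request, so there is only one call to wait for before reloading.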

Derek

Nov 6, 2013, 6:39:56 PM
to web...@googlegroups.com
Yea, the ajax running synchronously - you provide callbacks because they do run asynchronously. 
To quote the jQuery book:

The first letter in Ajax stands for "asynchronous," meaning that the operation occurs in parallel and the order of completion is not guaranteed. The async option to $.ajax() defaults to true, indicating that code execution can continue after the request is made. Setting this option to false (and thus making the call no longer asynchronous) is strongly discouraged, as it can cause the browser to become unresponsive.

Just what kind of app do you have that needs to make synchronous requests? Why are you not batching requests and processing async?

Niphlod

Nov 6, 2013, 6:50:33 PM
to web...@googlegroups.com


On Thursday, November 7, 2013 12:23:06 AM UTC+1, Derek wrote:
> Ah, if you are on Windows, I would recommend you run it with gevent instead of rocket, it should be faster than even apache.

Unfortunately it doesn't work as happily as it should: if you're using any non-green module (he's working with mssql, hence either pyodbc or pypyodbc, neither of which is green), those libraries don't "yield", so you block all other greenlets in the meantime.
Without an upstream proxy balancing across multiple gevented web2py instances, a "blocking" query to the database basically queues up all other connections. I guess that without an apache ProxyBalancer (or a much yummier gunicorn, as it stands due in two months), web2py apps on Windows with non-green modules will run more "concurrently-friendly" on any "old" threaded webserver.

Derek

Nov 6, 2013, 7:15:57 PM
to web...@googlegroups.com
Yea, I'm working on a communication layer for pyodbc or pypyodbc which will handle requests with gevent, thus your application should yield properly, but it's not available just yet, and it won't necessarily be a drop-in replacement for pyodbc though. In the meantime, this is usually good enough, and you can set timeouts for the queries and connects, but that would be a change to the DAL.

Derek

Nov 6, 2013, 7:23:02 PM
to web...@googlegroups.com
It does look like there is a pure python implementation of TDS called pytds, which may actually work... the hard part would be getting web2py to use it...

Niphlod

Nov 7, 2013, 2:26:44 AM
to web...@googlegroups.com
hard ? with DAL it's pretty easy.... if a module exposes the dbapi just force the driver and implement the connect method and it's usually good to go.
I'll test it when I get back home.

Andrew Buchan

Nov 7, 2013, 6:47:32 AM
to web...@googlegroups.com
Thanks for the suggestions. gevent looks good, but I am indeed using pyodbc and this seems to be working with Apache, so I'd rather stick with this for the time being. (Note I had to embed the Python27.dll manifest into pyodbc.pyd to get it to work with Apache.)

The reason for using ajax in the first place was naivety. At the time I simply looked up how to call a controller function from javascript and read that ajax() was the way to do it, so proceeded with that without understanding the implications.

The feature in question is a check list, which I go over with jQuery.each(). If an item is checked: copy the value to a hidden field, and call ajax to insert a record in the database including that field and a couple of others. Once done, set window.location to the same page to reload it.
The page has a form on it elsewhere, so I didn't want to make this a form as I kept getting a "confirm resubmission" popup (this is going from memory, it was actually written a long time ago).

I've now changed it so each() collates all the values into a string (they're ints) and passes it to a url which I navigate to with window.location; that page does the insert and returns to the previous page. No ajax in there whatsoever :-)
I'm scouring the code for other places where I may have placed asynchronous calls next to synchronous ones, which will only have worked by "coincidence" up till now...
I know, I know. I might just pick up my copy of The Pragmatic Programmer and throw it at myself. Hard.









Tim Richardson

Nov 8, 2013, 7:31:37 AM
to web...@googlegroups.com
I had a puzzling problem with 2.7.4 as a rocket service on Windows 2003, but this was a problem with sqlform.grids. There was a print statement left in code which caused the service to stop working. This cause doesn't match with your insights into the problem, but I mention it anyway. You only see the problem when web2py is running as a service (I use nssm to create the service, but this is not relevant). 
 
The problem was very quickly fixed in trunk, but as far as I know the fix is not in any stable version yet.
 
I'm still using 2.6.4 for production apps in this scenario (rocket as a service on windows server). 

Andrew Buchan

Nov 8, 2013, 8:06:17 AM
to web...@googlegroups.com
Yes, I've been caught out by that quite a few times! Not a problem with Apache, I may add: the wsgi file redirects stdout to show up in the apache error log, but I suppose you could point it anywhere really.



Derek

Nov 14, 2013, 2:18:49 PM
to web...@googlegroups.com
any luck?

Andrew Buchan

Nov 15, 2013, 12:30:28 PM
to web...@googlegroups.com
Hi Derek, the install on Apache is running fine, and I was able to fix the ajax issue it threw up as described above. So my problem is solved, but thanks for checking in again.
I found that after updating to the latest version of web2py, it was easy to get apache running following the deployment recipes (I skipped the SSL part), whereas with the older version I had previously, I had to do a few hacks to get it to work.
I've not done anything since to determine what went wrong with Rocket in the first place, and to be honest I can't justify spending time investigating it seeing as I have a working alternative (and have been busy trying to make up for lost time!)

Let me know if you want a copy of my httpd.conf file (anyone).




Derek

Nov 15, 2013, 2:29:20 PM
to web...@googlegroups.com
That's cool, I was just asking about the side project of Niphlod's to look at the pytds so that we could finally have a pure python database adapter. With a pure python adapter, gevent could properly use greenlets to make database queries, thus there would be no blocking on the queries.

Andrew Buchan

Nov 20, 2013, 9:56:51 AM
to web...@googlegroups.com

Further update:

Apache did finally crash, with the following message:

"Server ran out of threads to serve requests. Consider raising the ThreadsPerChild setting"

I've bumped that up to 500, and there are fewer than that number of users on the system (although I understand there will be more than one connection fired off per user/session).
But it crashed/became unresponsive again with no error message!

I added mod_status to see what's going on, and it seems there are loads of requests being sent which never exit, and I can't tell where these are being fired from as they don't indicate which virtualhost they came from (and there is no url mapping). I'm waiting for approval to join apachelounge to ask over there, but thought I'd ask here if anyone had any information on how web2py uses/creates requests/threads in apache, and how it could be that apache becomes unresponsive after a while. I'm not even sure it is running out of worker threads (I thought servers queued requests till threads became available specifically so they wouldn't crash under heavy load).

Also, I leave sessions alone, but should I be clearing them?

Also also, I updated to the latest version of web2py and it seems someone left a print statement in SQLFORM.grid() which prints every row! (Line 2373 in sqlhtml.py) I presume that if I were running that as a windows service on rocket it would crash the moment you displayed a grid...


Niphlod

Nov 20, 2013, 11:26:48 AM
to web...@googlegroups.com


On Wednesday, November 20, 2013 3:56:51 PM UTC+1, Andrew Buchan wrote:

> Further update:
>
> Apache did finally crash, with the following message:
>
> "Server ran out of threads to serve requests. Consider raising the ThreadsPerChild setting"
>
> I've bumped that up to 500, and there are less than that number of users on the system (although I understand there will be more than one connection fired off per user/session).
> But it crashed/became unresponsive again with no error message!

Asking for real disasters!!!!! Are you sure that there are more than 150 (default) requests active?

> Also, I leave sessions alone, but should I be clearing them?

Yep, you definitely should (as should anyone in production).

> Also also, I updated to the latest version of web2py and it seems someone left a print statement in SQLFORM.grid() which prints every row! (Line 2373 in sqlhtml.py) I presume that if I were running that as a windows service on rocket it would crash the moment you displayed a grid..

Known issue. Upgrade to trunk.

Andrew Buchan

Nov 20, 2013, 7:39:00 PM
to web...@googlegroups.com
Hi Niphlod, what is asking for disasters? Bumping it up to 500? Can you elaborate on why? (I'm all for avoiding disasters given what's been happening!)
The server-status page was showing 150 worker threads active, except lots of them seem not to be from proper requests. Here's my post on apachelounge with more details: http://www.apachelounge.com/viewtopic.php?t=5655

Apache has since become unresponsive again. Strangely, I could still access appadmin via localhost, but pages in the main application weren't responding, even from localhost... CPU and memory were both fine. It made me wonder if I'm doing something stupid at a very top level of the application (like in a model file or in layout.html), but I would expect an error message in that case.

I'll look into clearing sessions, thanks for the tip.

