wsgi + trac + rogue bots = apache down?


Omry Yadan

Oct 2, 2008, 1:47:05 AM
to trac...@googlegroups.com
Hi all,

here is the deal:

my apache server reached the maximum limit for workers several times in
the last week.

every time I had to restart it, and that bought me 10-20 hours until the
next incident.

I set up mod_status with extended information, and ran a script that
fetched it every minute so I could see what the server status was before the
problem appeared.

my server didn't disappoint, and the problem appeared 3-4 hours after I
set this up.


it appears that trac is leaking apache workers.

my trac is configured with wsgi, and it looks like when there is a
certain trac error (any trac error?) the request does not terminate
properly.

with wild and stupid bots out there, this is a recipe for a disaster.


attached is the output of the extended log. you can see that 143
requests are being served at once, all from the firestats.cc vhost (let
me assure you that my site is nowhere near that popular).

looking at the trac log, I can see many errors like:


2008-10-02 06:04:38,134 Trac[__init__] ERROR: database is locked
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/web/main.py", line 423, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/web/main.py", line 222, in dispatch
    req.session.save()
  File "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/web/session.py", line 97, in save
    (self.sid,))
  File "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/db/sqlite_backend.py", line 58, in execute
    args or [])
  File "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/db/sqlite_backend.py", line 50, in _rollback_on_error
    return function(self, *args, **kwargs)
OperationalError: database is locked


just before the shit hits the fan.


before I upgraded trac, I used the RobotsTxtPlugin, but it does not seem
to be available for 0.11.


so, to summarize:

1. Bots can kill my trac site.

2. it appears that trac does not properly terminate responses on error.

3. robots.txt would help this situation (but not solve it, as some bots
don't give a flying fuck) - see the example below. the RobotsTxtPlugin is
not available for 0.11, though.
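
for reference, the kind of robots.txt I mean is something like this (the
disallowed paths are only examples of the expensive trac views):

User-agent: *
Disallow: /timeline
Disallow: /changeset
Disallow: /browser
Disallow: /log
Disallow: /report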


help would be appreciated.


Omry.

Graham Dumpleton

Oct 2, 2008, 3:19:00 AM
to Trac Development
> "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/db/sqlite_back end.py",
> line 58, in execute
>     args or [])
>   File
> "/usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/db/sqlite_back end.py",
> line 50, in _rollback_on_error
>     return function(self, *args, **kwargs)
> OperationalError: database is locked
>
> just before the shit hits the fan.
>
> before I upgraded trac, I used the RobotsTxtPlugin, but it does not seem
> to be available for 0.11.
>
> so, to summarize:
>
> 1. Bots can kill my trac site.
>
> 2. it appears that trac does not properly terminate resopnses on error.
>
> 3. robots.txt would help this situation (but not solve it as some bots
> gives a flying fuck), but it's not available for 0.11.
>
> help would be appreciated.

Please clarify your hosting configuration. What do you mean by WSGI?

Do you mean, running Trac under Apache/mod_wsgi?

Or do you mean, running tracd and proxying Apache through to tracd?

Or do you mean, running Trac under some other WSGI server and proxying
Apache through to that WSGI server instance?

The link you give for mod_status doesn't work, at least via Google
groups.

Graham

Omry Yadan

Oct 2, 2008, 3:25:00 AM
to trac...@googlegroups.com

> Please clarify your hosting configuration. What do you mean by WSGI?
>
> Do you mean, running Trac under Apache/mod_wsgi?
>
> Or do you mean, running tracd and proxying Apache through to tracd?
>
> Or do you mean, running Trac under some other WSGI server and proxying
> Apache through to that WSGI server instance?
>
> The link you give for mod_status doesn't work, at least via Google
> groups.
>

using mod_wsgi in apache2.

this is my vhost file :

WSGIDaemonProcess firestats.cc user=omry group=omry threads=25 \
    maximum-requests=100

<VirtualHost *:80>
    WSGIProcessGroup firestats.cc
    WSGIApplicationGroup %{GLOBAL}

    ServerName firestats.cc
    ServerAlias www.firestats.cc
    ServerAdmin omry
    ErrorLog /home/omry/logs/firestats.cc/error.log

    WSGIScriptAlias / /home/omry/www/trac/trac.wsgi
    SetEnv trac.env_path /home/omry/www/trac/firestats.cc
</VirtualHost>

Attached a zip with the html file, hopefully this will work better.

Thanks.


Graham Dumpleton

Oct 2, 2008, 3:49:04 AM
to Trac Development


On Oct 2, 5:25 pm, Omry Yadan <o...@yadan.net> wrote:
> > Please clarify your hosting configuration. What do you mean by WSGI?
>
> > Do you mean, running Trac under Apache/mod_wsgi?
>
> > Or do you mean, running tracd and proxying Apache through to tracd?
>
> > Or do you mean, running Trac under some other WSGI server and proxying
> > Apache through to that WSGI server instance?
>
> > The link you give for mod_status doesn't work, at least via Google
> > groups.
>
> using mod_wsgi in apache2.

But what version of mod_wsgi? Make sure you upgrade to version 2.3 if you
are using an older version.

> this is my vhost file :
>
> IWSGIDaemonProcess firestats.cc user=omry group=omry threads=25
> maximum-requests=100

I would use more than 100 requests for maximum-requests unless the
memory leaks in your application are quite bad. A value of 1000 would be
more reasonable.

If you are using mod_wsgi 2.X, there is a deadlock timeout of 300
seconds, so if the whole process is deadlocked by problem code, it
should automatically kill off the process and restart it. Although
this will not help if the Python code has released the GIL and is hung on
access to the back end database.

For this latter situation, you can add to WSGIDaemonProcess the
option:

inactivity-timeout=120

If all the threads get hung and nothing happens for 120 seconds, the
process should automatically be killed off and restarted. This at
least may get you out of situations where manual intervention is required.
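
For example, applied to the daemon process line you posted (a sketch; keep
whatever other values you settle on, with maximum-requests raised as
suggested above):

WSGIDaemonProcess firestats.cc user=omry group=omry threads=25 \
    maximum-requests=1000 inactivity-timeout=120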

Other than that, it looks like database requests may perhaps be hanging.
You possibly need to attach gdb to a stuck process and look at the stack
traces to see if it is stuck in a third-party C extension module such as a
database module. See:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB

for pointers on how to do that.

Graham

> <VirtualHost *:80>
>     WSGIProcessGroup firestats.cc
>     WSGIApplicationGroup %{GLOBAL}
>
>     ServerName firestats.cc
>     ServerAlias www.firestats.cc
>     ServerAdmin omry
>     ErrorLog /home/omry/logs/firestats.cc/error.log
>
>     WSGIScriptAlias / /home/omry/www/trac/trac.wsgi
>     SetEnv trac.env_path /home/omry/www/trac/firestats.cc
> </VirtualHost>
>
> Attached a zip with the html file, hopefully this will work better.
>
> Thanks.
>

Graham Dumpleton

Oct 2, 2008, 7:14:04 AM
to Trac Development
For reference, also see:

https://www.coderesort.com/about/blog/sporadic_hang_solved

This was mentioned on #trac. Not sure if that was by you or whether you
were part of the conversation.

Graham

On Oct 2, 5:49 pm, Graham Dumpleton <Graham.Dumple...@gmail.com>
wrote:
>  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_C...

Omry Yadan

Oct 4, 2008, 11:40:10 AM
to trac...@googlegroups.com
Graham Dumpleton wrote:

>> using mod_wsgi in apache2.
>>
>
> But what version of mod_wsgi. Make sure you are using version 2.3 if
> using older version.
>

2.3
exact debian version is:
2.3-1~bpo40+1


>> this is my vhost file :
>>
>> IWSGIDaemonProcess firestats.cc user=omry group=omry threads=25
>> maximum-requests=100
>>
>
> I would use more than 100 requests for maximum-requests unless the
> memory leaks in your application are quite bad. Value of 1000 would be
> more reasonable.
>

I have a problem with apache workers running out due to trac misbehavior.
if anything, I will reduce this number to something that will not take
down the entire apache server whenever trac is acting up.


> If you are use mod_wsgi 2.X, there is a deadlock timeout of 300
> seconds and so if the whole process was being deadlocked by problem
> code, it should automatically kill of process and restart it. Although
> this will not help if Python code had released GIL and were hung on
> access to back end database.
>
> For this latter situation, you can add to WSGIDaemonProcess the
> option:
>
> inactivity-timeout=120
>
> If all threads get hung and nothing happens for 120 seconds, the
> process should be automatically be killed off and restarted. This at
> least may get you out of situation where manual intervention required.
>

Added this, thanks.
will this incident be logged anywhere?
it would be useful to get the stuck urls when this happens.

> Other than that, looks perhaps that database requests may be hanging.
> You possibly need to attach gdb to stuck process and look at stack
> traces to see if stuck in third party C extension module such as a
> database module. See:
>
> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB
>
> for pointers on how to do that.
>

This link is about debugging apache crashes.
my apache does not crash; instead it stops responding to any requests
because all the workers are busy with trac.
so I don't think this will be helpful.


> For reference, also see:
> https://www.coderesort.com/about/blog/sporadic_hang_solved
>
> This was mention on #trac. Not sure if that was by you or you were
> part of conversation.
>

osimons mentioned this to me on the irc channel, but I am not using the
[[TOC]] macro, so it's not the cause of my problem.


Omry.

Graham Dumpleton

Oct 4, 2008, 9:11:21 PM
to Trac Development


On Oct 5, 2:40 am, Omry Yadan <o...@yadan.net> wrote:
> >> IWSGIDaemonProcess firestats.cc user=omry group=omry threads=25
> >> maximum-requests=100
>
> > I would use more than 100 requests for maximum-requests unless the
> > memory leaks in your application are quite bad. Value of 1000 would be
> > more reasonable.
>
> I have a problem apache workers running out due to trac  misbehavior.
> if anything, I will reduce this number to something that will not take
> down the entire apache server whenever trac is acting up.

What database adapter are you using? If you are using an older version of
it than the newest available, make sure you update it. Old versions of
psycopg2 have problems.

> > Other than that, looks perhaps that database requests may be hanging.
> > You possibly need to attach gdb to stuck process and look at stack
> > traces to see if stuck in third party C extension module such as a
> > database module. See:
>
> >  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_C...
>
> > for pointers on how to do that.
>
> This link is about debugging apache crashes.
> my apache does not crash, instead if stop responding to any requests
> because all the workers are busy with trac.
> so I don't think this will be helpful.

Yes, it would be helpful. When you attach the debugger to a running
process instance it will suspend all threads, and you can then get a
stack trace for each one to see where it is stuck. If one is stuck in a C
extension module such as a database adapter, it will show you where in the
C functions it is stuck. This can help in debugging the problem.

Graham

Omry Yadan

Oct 5, 2008, 3:56:59 AM
to trac...@googlegroups.com
Graham Dumpleton wrote:

>
> On Oct 5, 2:40 am, Omry Yadan <o...@yadan.net> wrote:
>
>>>> IWSGIDaemonProcess firestats.cc user=omry group=omry threads=25
>>>> maximum-requests=100
>>>>
>>> I would use more than 100 requests for maximum-requests unless the
>>> memory leaks in your application are quite bad. Value of 1000 would be
>>> more reasonable.
>>>
>> I have a problem apache workers running out due to trac misbehavior.
>> if anything, I will reduce this number to something that will not take
>> down the entire apache server whenever trac is acting up.
>>
>
> What database adapter are you using? If using an older version of it
> than newest available, make sure you update it. Old versions of
> psycopg2 have problems.
>

I am using sqlite 3.3.8 with python-pysqlite2 2.3.2

>>> Other than that, looks perhaps that database requests may be hanging.
>>> You possibly need to attach gdb to stuck process and look at stack
>>> traces to see if stuck in third party C extension module such as a
>>> database module. See:
>>>
>>> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_C...
>>>
>>> for pointers on how to do that.
>>>
>> This link is about debugging apache crashes.
>> my apache does not crash, instead if stop responding to any requests
>> because all the workers are busy with trac.
>> so I don't think this will be helpful.
>>
>
> Yes it would be helpful. When you attach the debugger to running
> process instance it will suspend all theads and you get then get a
> stack trace for all to see where each is stuck. If stuck in a C
> extension module such as a database adapter it will show you were in C
> functions it is stuck. This can help in debugging problem.
>

Can you please provide specific instructions on how to extract the stack
trace?
I know it's in the link, but next time it happens I want to extract the
info as soon as possible and restart apache (trac is not the only site
on that apache).
I don't want to start toying around with gdb while the site is down if I
don't know exactly what to do.


Graham Dumpleton

Oct 5, 2008, 5:18:11 AM
to Trac Development


On Oct 5, 6:56 pm, Omry Yadan <o...@yadan.net> wrote:
> Graham Dumpleton wrote:
>
> > On Oct 5, 2:40 am, Omry Yadan <o...@yadan.net> wrote:
>
> >>>> IWSGIDaemonProcess firestats.cc user=omry group=omry threads=25
> >>>> maximum-requests=100
>
> >>> I would use more than 100 requests for maximum-requests unless the
> >>> memory leaks in your application are quite bad. Value of 1000 would be
> >>> more reasonable.
>
> >> I have a problem apache workers running out due to trac  misbehavior.
> >> if anything, I will reduce this number to something that will not take
> >> down the entire apache server whenever trac is acting up.
>
> > What database adapter are you using? If using an older version of it
> > than newest available, make sure you update it. Old versions of
> > psycopg2 have problems.
>
> I am using sqlite 3.3.8  with python-pysqlite2 2.3.2

Which is not the latest version of either package, which means any
problem could well have been fixed long ago.

> >>> Other than that, looks perhaps that database requests may be hanging.
> >>> You possibly need to attach gdb to stuck process and look at stack
> >>> traces to see if stuck in third party C extension module such as a
> >>> database module. See:
>
> >>>  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_C...
>
> >>> for pointers on how to do that.
>
> >> This link is about debugging apache crashes.
> >> my apache does not crash, instead if stop responding to any requests
> >> because all the workers are busy with trac.
> >> so I don't think this will be helpful.
>
> > Yes it would be helpful. When you attach the debugger to running
> > process instance it will suspend all theads and you get then get a
> > stack trace for all to see where each is stuck. If stuck in a C
> > extension module such as a database adapter it will show you were in C
> > functions it is stuck. This can help in debugging problem.
>
> Can you please provide specific instructions on how to extract the stack
> trace?
> I know it's in the link, but next time it happens, I want to extract the
> info as soon as possible and restart apache (trac is not the only site
> on that apache).
> don't want to start toying around with gdb while the site is down if I
> don't know exactly what to do.

The information is in the section of the document the link referenced.
See part way through, where it starts talking about the daemon process and
attaching to the process using its pid. The only thing you need not do is
the 'cont' command in gdb, as you want to dump the stack traces as soon as
you connect the debugger.

If you are concerned about understanding what is necessary, then work
it out in advance on a development installation of Apache.

So, read the documentation; that section isn't that long. I am not
about to start spoon-feeding information on how to do it; you need to
be prepared to read the documentation and understand it by experimenting
as necessary first.

Graham

Omry Yadan

Oct 7, 2008, 2:55:39 AM
to trac...@googlegroups.com
Graham Dumpleton wrote:

>>>> I have a problem apache workers running out due to trac misbehavior.
>>>> if anything, I will reduce this number to something that will not take
>>>> down the entire apache server whenever trac is acting up.
>>>>
>>> What database adapter are you using? If using an older version of it
>>> than newest available, make sure you update it. Old versions of
>>> psycopg2 have problems.
>>>
>> I am using sqlite 3.3.8 with python-pysqlite2 2.3.2
>>
>
> Which is not the latest version of either package, which means any
> problem could well have been fixed long ago.
>

changed WSGIDaemonProcess line to:


WSGIDaemonProcess firestats.cc user=omry group=omry threads=25 \
    maximum-requests=100 inactivity-timeout=120

but it froze again (this time it took longer to happen; not sure if that's
related).
I haven't had the chance to upgrade sqlite yet.

my server froze again, with 155 apache processes.

this is the stack trace of one of them:

(gdb) thread apply all bt

Thread 1 (Thread 140422165649104 (LWP 28894)):
#0 0x00007fb6942a7182 in __read_nocancel () from /lib/libpthread.so.0
#1 0x00007fb69470258e in apr_file_read () from /usr/lib/libapr-1.so.0
#2 0x00007fb695017664 in apr_bucket_pipe_create () from /usr/lib/libaprutil-1.so.0
#3 0x0000000000437e52 in ap_open_logs ()
#4 0x00000000004381ce in ap_scan_script_header_err_core ()
#5 0x00007fb690211071 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#6 0x00007fb69020cd9e in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#7 0x0000000000432d09 in ap_run_handler ()
#8 0x0000000000435e82 in ap_invoke_handler ()
#9 0x0000000000442018 in ap_process_request ()
#10 0x000000000043f4dc in ap_register_input_filter ()
#11 0x0000000000439861 in ap_run_process_connection ()
#12 0x00000000004459b1 in ap_graceful_stop_signalled ()
#13 0x0000000000445c24 in ap_graceful_stop_signalled ()
#14 0x00000000004464c6 in ap_mpm_run ()
#15 0x0000000000420e70 in main ()
#0 0x00007fb6942a7182 in __read_nocancel () from /lib/libpthread.so.0


Here is the stacktrace of another:

#0 0x00007fb6942a7472 in __connect_nocancel () from /lib/libpthread.so.0
#1 0x00007fb690210591 in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#2 0x00007fb690210e2d in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#3 0x00007fb69020cd9e in ?? () from /usr/lib/apache2/modules/mod_wsgi.so
#4 0x0000000000432d09 in ap_run_handler ()
#5 0x0000000000435e82 in ap_invoke_handler ()
#6 0x0000000000442018 in ap_process_request ()
#7 0x000000000043f4dc in ap_register_input_filter ()
#8 0x0000000000439861 in ap_run_process_connection ()
#9 0x00000000004459b1 in ap_graceful_stop_signalled ()
#10 0x0000000000445c24 in ap_graceful_stop_signalled ()
#11 0x00000000004464c6 in ap_mpm_run ()
#12 0x0000000000420e70 in main ()


anything useful here?

Omry.

Graham Dumpleton

Oct 7, 2008, 7:05:57 AM
to Trac Development
No, for a couple of reasons.

The first is that debug information is not available in mod_wsgi, as it was
not compiled with debugging. Thus ensure that the -g option is included
when compiling mod_wsgi.

Second is that you actually want the tracebacks for the mod_wsgi daemon
process and not the Apache worker processes. Use the option:

display-name=%{GROUP}

with WSGIDaemonProcess. Use the 'ps' command to identify the mod_wsgi
daemon process and attach gdb to it, then get the tracebacks.
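
A sketch of how that might look in practice (the group name follows your
vhost configuration; the pid is illustrative):

# with display-name=%{GROUP}, the daemon process typically shows up in
# ps output as something like '(wsgi:firestats.cc)'
ps auxw | grep 'wsgi:'

# attach gdb to that pid and dump all thread stacks
gdb /usr/sbin/apache2 <pid>
(gdb) thread apply all bt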

The reason for getting tracebacks for the daemon processes is to ascertain
whether the threads there are stuck or not. If they are not, then it is an
Apache configuration issue which most likely has nothing to do with mod_wsgi.

BTW, when using mod_wsgi daemon mode, if you are not running PHP in Apache,
then you are better off running Apache's worker MPM rather than prefork.
That way the number of processes is lower and overall Apache will use
less memory.
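
On Debian that is usually just a matter of installing the worker MPM
variant of the Apache package (package names assumed for Debian of that
era):

apt-get install apache2-mpm-worker   # conflicts with, and replaces, apache2-mpm-prefork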

Graham

Omry Yadan

Dec 13, 2008, 6:28:43 AM
to trac...@googlegroups.com
Sorry for the late response.
a reminder:
I am using trac with wsgi; when trac comes under stress, my apache server
locks up once it runs out of worker threads.


You suggested that the problem may have been solved in a later version of
sqlite. I am now running 3.5.9 and have upgraded to python 2.5 -
but the problem is still there.


in fact, I can easily reproduce it by using an http stress testing tool
(siege) on trac's /timeline url.
it brings my entire apache server down in a matter of seconds.
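
roughly, the command is something like this (the exact invocation may
differ; -c is the number of concurrent clients, -t the test duration):

siege -c 15 -t 60S http://firestats.cc/timeline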


I rebuilt mod_wsgi with debug information, and this is the stack trace:


(gdb) bt
#0  0x00007fafff6fbbe1 in sem_wait () from /lib/libpthread.so.0
#1  0x00007faff6158c78 in PyThread_acquire_lock () from /usr/lib/libpython2.5.so.1.0
#2  0x00007faff615df62 in ?? () from /usr/lib/libpython2.5.so.1.0
#3  0x00007faff612ffbd in PyEval_EvalFrameEx () from /usr/lib/libpython2.5.so.1.0
#4  0x00007faff6131bdd in PyEval_EvalCodeEx () from /usr/lib/libpython2.5.so.1.0
#5  0x00007faff61302a0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.5.so.1.0
#6  0x00007faff6131bdd in PyEval_EvalCodeEx () from /usr/lib/libpython2.5.so.1.0
#7  0x00007faff61302a0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.5.so.1.0
#8  0x00007faff6131bdd in PyEval_EvalCodeEx () from /usr/lib/libpython2.5.so.1.0
#9  0x00007faff61302a0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.5.so.1.0
#10 0x00007faff6131bdd in PyEval_EvalCodeEx () from /usr/lib/libpython2.5.so.1.0
#11 0x00007faff61302a0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.5.so.1.0
#12 0x00007faff6131bdd in PyEval_EvalCodeEx () from /usr/lib/libpython2.5.so.1.0
#13 0x00007faff61302a0 in PyEval_EvalFrameEx () from /usr/lib/libpython2.5.so.1.0
#14 0x00007faff6131bdd in PyEval_EvalCodeEx () from /usr/lib/libpython2.5.so.1.0
#15 0x00007faff60c855e in ?? () from /usr/lib/libpython2.5.so.1.0
#16 0x00007faff60a52a3 in PyObject_Call () from /usr/lib/libpython2.5.so.1.0
#17 0x00007faff6129ff1 in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.5.so.1.0
#18 0x00007faff63ee365 in Adapter_run () from /usr/lib/apache2/modules/mod_wsgi.so
#19 0x00007faff63f263e in wsgi_execute_script () from /usr/lib/apache2/modules/mod_wsgi.so
#20 0x00007faff63fb588 in wsgi_hook_daemon_handler () from /usr/lib/apache2/modules/mod_wsgi.so
#21 0x00007faff63f79c9 in wsgi_process_socket () from /usr/lib/apache2/modules/mod_wsgi.so
#22 0x00007faff63f7e17 in wsgi_daemon_worker () from /usr/lib/apache2/modules/mod_wsgi.so
#23 0x00007faff63f7f16 in wsgi_daemon_thread () from /usr/lib/apache2/modules/mod_wsgi.so
#24 0x00007fafff932171 in ?? () from /usr/lib/libapr-1.so.0
#25 0x00007fafff6f5fc7 in start_thread () from /lib/libpthread.so.0
#26 0x00007fafff46b7cd in clone () from /lib/libc.so.6
#27 0x0000000000000000 in ?? ()


The last thing in the trac error log is this:


2008-12-13 08:38:02,070 Trac[main] ERROR: database is locked
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/web/main.py", line 423, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/web/main.py", line 173, in dispatch
    chosen_handler)
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/web/main.py", line 286, in _pre_process_request
    chosen_handler = filter_.pre_process_request(req, chosen_handler)
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/versioncontrol/api.py", line 86, in pre_process_request
    self.get_repository(req.authname).sync()
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/versioncontrol/cache.py", line 89, in sync
    ','.join(["'%s'" % key for key in CACHE_METADATA_KEYS]))
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/db/util.py", line 51, in execute
    return self.cursor.execute(sql)
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/db/sqlite_backend.py", line 58, in execute
    args or [])
  File "/usr/lib/python2.5/site-packages/Trac-0.11.1-py2.5.egg/trac/db/sqlite_backend.py", line 50, in _rollback_on_error
    return function(self, *args, **kwargs)
OperationalError: database is locked



Omry Yadan

Dec 13, 2008, 3:49:59 PM
to trac...@googlegroups.com
I believe I figured it out, and uncovered a serious bug with the pooling
of SQLite connections.

the problem is that there is NO POOLING for sqlite connections.


from sqlite_backend.py :


class SQLiteConnection(ConnectionWrapper):
    """Connection wrapper for SQLite."""

    __slots__ = ['_active_cursors']
    poolable = have_pysqlite and os.name == 'nt' and sqlite_version >= 30301


problem is that my linux is not an 'nt' :), so poolable is false.

this in turn causes return_cnx not to pool the connection when it's
returned.

so, effectively - there is currently no connection pooling for sqlite on
non-nt machines.

so, if you put your server under stress, it spawns many, many sqlite
connections, which eventually causes 'database is locked' and
'connection timeout' errors.
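
one way to force pooling back on for testing would be to drop the os.name
check (a local hack only; the exact edit may differ):

# in trac/db/sqlite_backend.py - allow pooling on non-Windows too
poolable = have_pysqlite and sqlite_version >= 30301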

Remy Blank

Dec 14, 2008, 5:49:07 AM
to trac...@googlegroups.com
Omry Yadan wrote:
> the problem is that there is NO POOLING for sqlite connections.
>
>
> from sqlite_backend.py :
>
>
> class SQLiteConnection(ConnectionWrapper):
> """Connection wrapper for SQLite."""
>
> __slots__ = ['_active_cursors']
> poolable = have_pysqlite and os.name == 'nt' and sqlite_version >= 30301
>
>
> problem is that my linux is not an 'nt' :), so poolable is false.

This restriction was introduced in [4493], as pooling seemed to have been
causing crashes. This appears to be due to a bug in pysqlite:

http://oss.itsystementwicklung.de/trac/pysqlite/ticket/187

Unfortunately, there has been no activity on that ticket for 23 months...

I'd be interested to know if you still get the crash if you enable
pooling on linux.

-- Remy


osimons

Dec 14, 2008, 6:12:29 AM
to Trac Development
Hi again Omry,

I've looked into this after our IRC session yesterday. I cannot find any
setting or assumption that we can use to determine whether pooling should
be enabled. The sqlite.threadsafety setting seems to be set to 1
in the source regardless of compile settings - I'd like to use it, but I
don't trust it, feeling it says more about the python side than about
the underlying actual sqlite lib...

Anyway some more links of interest:

* http://www.sqlite.org/faq.html#q6
* http://www.sqlite.org/cvstrac/wiki?p=MultiThreading
* http://trac.edgewall.org/changeset/4493
* http://trac.edgewall.org/ticket/2196
* http://oss.itsystementwicklung.de/trac/pysqlite/ticket/187
* http://oss.itsystementwicklung.de/trac/pysqlite/changeset/334%3A353e0026f76c/

The last changeset seems to indicate improved detection of the
situation in the latest pysqlite version(s).

We need to improve our detection of the poolability of pysqlite, but I'm
just not sure what the best approach is. Other devs may be better able
to decide on the best strategy - I hope :-)


:::simon

https://www.coderesort.com

Omry Yadan

Dec 14, 2008, 6:16:24 AM
to trac...@googlegroups.com
Remy Blank wrote:

Enabling pooling seems to have resolved my situation, and I have not
seen any crashes yet (in the last 12 hours).
Quoting the ticket:
"But on Linux, you very quickly run into the following crash (all it
takes are 2 concurrent requests to the timeline):"

since my stress test was 15 concurrent connections to the timeline and I
had no problems, I assume this is no longer an issue.

Omry Yadan

Dec 14, 2008, 6:25:09 AM
to trac...@googlegroups.com

osimons wrote:

Hi Simon, you can try the sqlite3_threadsafe call:
http://www.sqlite.org/c3ref/threadsafe.html
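
if pysqlite doesn't expose it, it could in principle be called via ctypes
(a sketch; the shared library name is an assumption and varies by platform):

import ctypes

# returns 1 if the sqlite library was compiled with SQLITE_THREADSAFE
libsqlite = ctypes.CDLL('libsqlite3.so.0')
print libsqlite.sqlite3_threadsafe()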


Remy mentioned that it's probably a bug in pysqlite:
http://oss.itsystementwicklung.de/trac/pysqlite/ticket/187
Maybe it was resolved since then?

Omry Yadan

Dec 14, 2008, 6:29:08 AM
to trac...@googlegroups.com
Ah, I missed that.
ignore my suggestion to use it. :)

Omry Yadan

Dec 14, 2008, 6:46:23 AM
to trac...@googlegroups.com
from sqlite3-3.5.9/src/main.c line 31:

int sqlite3_threadsafe(void){ return SQLITE_THREADSAFE; }

osimons

Dec 14, 2008, 7:20:32 AM
to Trac Development
Right, I'm sure it is in there somewhere - it is just not exposed in
any Python information structure that I can find. Having to read C
libs directly will complicate our own setup.

That is still also just one part of the puzzle, as the frontend also
matters. It works really well for you as things are compiled the right
way, and you use WSGI daemon mode with one process serving through
threads - essentially the same strategy as all Windows frontends (Apache
mpm_winnt never creates more than one process to serve requests).
However, if you use any frontend that forks to serve connections, the
pool is a very bad idea - essentially making new copies of the pool
(and connections) in new processes that have no means of coordinating
their actions. And, as the db code is essential to any Trac loading,
it will likely be populated with connections before requests start
coming in. Forking and db pooling will be unpredictable.

So, the check as it stands is the lowest common denominator of
"trusted" pooling. The question is, how can we improve the checks to also
include other safe situations? One idea that springs to mind is to
start off non-pooled, but as requests come in we can read the req.environ
wsgi settings, which on my (OSX/Apache 2.2.10) mod_wsgi daemon setup
show as:

wsgi.multiprocess: False
wsgi.multithread: True
wsgi.run_once: False

However, on my mod_python test setup it reads:

wsgi.multiprocess: True
wsgi.multithread: False
wsgi.run_once: False

For the total set of versions/platforms/settings deemed safe, we
could then check and turn on pooling for the process when the first
request is made.
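
A rough sketch of the idea (purely illustrative, not actual Trac code):

def maybe_enable_pooling(environ, connection_cls):
    # only trust pooling in a single-process, multi-threaded frontend
    multiprocess = environ.get('wsgi.multiprocess', True)
    multithread = environ.get('wsgi.multithread', False)
    if not multiprocess and multithread:
        connection_cls.poolable = True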


:::simon

https://www.coderesort.com

Jonas Borgström

Dec 14, 2008, 11:03:02 AM
to trac...@googlegroups.com

On Dec 14, 2008, at 12:16 PM, Omry Yadan wrote:

>> I'd be interested to know if you still get the crash if you enable
>> pooling on linux.
>>
> Enabling pooling seems to have resolved my situation, and I have not
> seen any crashes yet (in the last 12 hours).
> Quoting the ticket:
> "But on Linux, you very quickly run into the following crash (all it
> takes are 2 concurrent requests to the timeline):"
>
> since my stress test was 15 concurrent connections to the timeline
> and I
> had no problems, I assume this is no longer an issue.


I think it's important to remember that a connection pool's job is
to reduce the connection overhead by reusing existing connections.
This is important for most server-based databases, where this overhead
can be very noticeable. But with sqlite's in-process design this
benefit should be much less noticeable, if it's even measurable.

SQLite's Achilles' heel is concurrency. Only one active transaction is
allowed at a time, regardless of whether the connection is "pooled" or not.

So that's why I can't really understand how enabling connection
pooling could, on its own, so clearly make all your concurrency-related
problems go away.

/ Jonas

osimons

Dec 14, 2008, 11:59:17 AM
to Trac Development
Hmm. From simple testing last night Omry quoted counts of close to
1000 open files for the process, most of them for trac.db files. New
connections (i.e. opening files) seemed to open faster than old ones
got closed - I think from the same 15 clients hitting the timeline (Omry
needs to confirm this). Could it be hitting some system limit perhaps? Omry,
try running:

% limit

... and see what numbers your system gives for max open files and file
descriptors.


:::simon

https://www.coderesort.com

Omry Yadan

Dec 14, 2008, 2:12:44 PM
to trac...@googlegroups.com

I tested using siege, a simple http stress tester.
my test was simply to stress test the timeline url with 15 concurrent
clients.
the result of this - before enabling pooling - is that soon enough trac
begins to time out and/or return http 500.
osimons suggested checking the number of open files held by the wsgi process,
which showed that while under stress the number sometimes jumped to a
very high number of open trac.db file handles (almost 1000 in one case).
once I figured out that pooling was disabled, I hacked the code to re-enable
it, and noticed that the number of open trac.db file handles no longer
exceeds the number of allowed connections in the pool.
also, my stress test now works fine.

while looking at the code, I could not figure out when the connections
to sqlite are actually closed, and I suspect that the problem is that
they are never closed explicitly - only when they are garbage collected.

this is with the almost trivial load caused by 15 concurrent connections
to a strong server with plenty of RAM.

as for system limits, this is the output of ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 16308
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16308
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Graham Dumpleton

Dec 14, 2008, 8:54:08 PM
to Trac Development


On Dec 14, 11:20 pm, osimons <oddsim...@gmail.com> wrote:
> On Dec 14, 12:46 pm, Omry Yadan <o...@yadan.net> wrote:
>
> > from sqlite3-3.5.9/src/main.c line 31:
>
> > int sqlite3_threadsafe(void){ return SQLITE_THREADSAFE; }
>
> > osimons wrote:
> > > The sqlite.threadsafety read setting seems to be set to 1 in source,
> > > reagardless of compile settings
>
> Right, I'm sure it is in there somewhere - it is just not exposed in
> any Python information structure that I can find. Having to read C
> libs directly will complicate our own setup.
>
> That is still also just one part of the puzzle, as the frontend also
> matters. It works really well for you as things are compiled the right
> way, and you use WSGI daemon mode with one process serving through
> threads - essentially same strategy as all Windows frontends (Apache
> mpm_nt never creates more than one process to serve requests).
> However, if you use any frontend that forks to serve connections, the
> pool is a very bad idea - essentially making new copies of the pool
> (and connections) in new processes that have no means of coordinating
> their actions.

Note that in Apache, whether one uses mod_python or mod_wsgi, stuff
like Trac initialisation, and therefore database connection creation,
happens after the forking of the long running process. Thus, there is no
chance that an existing set of connections would be inherited/copied
across a fork of the process created to handle requests.

Graham

Omry Yadan

Dec 18, 2008, 2:12:37 AM
to trac...@googlegroups.com
I increased the file limit to 2048 and tried again without my db.pool hack:

the problem didn't go away; in fact, it may have made it worse - I didn't
receive a single response from the server when it was put under stress.

activated the db pool hack (while keeping the file limit at 2048) and
things started to work again.


another user (dpb) complained about the same problem on the irc channel.
forcing sqlite_backend.py poolable to true solved his problem.


in the meantime, I had one unexplained apache crash (after about a week
of using the connection pooling).

my gut feeling is that it's unrelated, but I'll take a closer look if
and when it happens again.