Trac 0.10 gets stuck after an hour or so

Christian

Oct 26, 2006, 12:04:38 PM

to Trac Users

Hi everyone,

I hope you guys can help me with this. We recently upgraded from Trac 0.9
to 0.10, and since then Trac has been quite unstable. After some activity,
Trac suddenly dies and can no longer be accessed: the browser times out
and the apache process sits at 100% CPU.
I cannot pinpoint a particular activity that triggers this behavior.

Has anyone experienced a similar issue, or does anyone have an idea where to look?

Our 0.9 was very stable and we had it running for many months.

We are running Trac 0.10 on Ubuntu 6.06 with Apache 2.0. We have Trac
sites on sqlite and Postgres (8.1.4).

Thank you

Christian Boos

Oct 26, 2006, 1:07:16 PM

to trac-...@googlegroups.com

Christian wrote:
> Hi everyone,
>
> I hope you guys can help me on this, we upgraded to trac 0.10 recently
> from 0.9. Since then trac has been quite unstable. What happens is
> after some activity trac suddenly dies and you can no longer access it,
> the browser times out and the apache process clocks at 100%.
> I cannot pinpoint a particular activity that triggers this behavior.
>
> Anyone experienced a similar issue or has an idea on where to look?
>

It could be the DB connection pool, for which there have been a few
fixes in trunk.
Please try to apply the following changes:

http://trac.edgewall.org/changeset?new=trunk%2Ftrac%2Fdb%2Fpool.py%403934&old=trunk%2Ftrac%2Fdb%2Fpool.py%403798
http://trac.edgewall.org/changeset?new=trunk%2Ftrac%2Fdb%2Fapi.py%403934&old=trunk%2Ftrac%2Fdb%2Fapi.py%403798

or, if you can't use the above links, paste the following into Trac's
search box:

diff:trunk/trac/db/pool.py@3798//trunk/trac/db/pool.py@4039
diff:trunk/trac/db/api.py@3798//trunk/trac/db/api.py@4039

In all cases, select the 'Diff' format at the bottom of the page to get
a usable diff.

Optionally, also apply the following changes to the sqlite backend:
http://trac.edgewall.org/changeset/3830

Those changes are likely to go on the 0.10-stable branch quite soon
(even sooner if they appear to solve your problem ;) )

There is, however, one thing that makes me think it could be something
completely different: the apache process's 100% CPU usage. That
doesn't look related to a hang on the db pool lock... Maybe
you could use gdb to attach to the offending process and get a stack
trace to see where it is stuck?

-- Christian (another one)

Christian

Oct 27, 2006, 11:10:59 AM

to Trac Users

Hi Christian, I tried to patch using the diff files you presented but I
get the following errors during the patching:

File to patch: trac/db/pool.py
patching file trac/db/pool.py
Hunk #1 FAILED at 18.
Hunk #2 FAILED at 40.
Hunk #3 FAILED at 95.
Hunk #4 FAILED at 102.
Hunk #5 succeeded at 108 with fuzz 2.
4 out of 5 hunks FAILED -- saving rejects to file trac/db/pool.py.rej

I downloaded the current Trac 0.10 and, from the root of that directory,
ran:

patch -p0 < pool.py.diff

Perhaps I'm lacking patching skills? :)

Christian Boos

Oct 27, 2006, 11:45:12 AM

to trac-...@googlegroups.com

Christian wrote:
> Hi Christian, I tried to patch using the diff files you presented but I
> get the following errors during the patching:

> ...

> Perhaps i'm lacking patching skills? :)
>

Well, I don't know; perhaps it's an end-of-line issue.
Anyway, I've collected the changes previously mentioned and attached
them to this mail as a single diff against 0.10.
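
One way to test the end-of-line theory is to count CRLF versus LF-only line
endings in both the diff and the file being patched; a mismatch between the
two is a common reason for hunks to be rejected. A minimal Python sketch
follows. The helper names are made up for illustration and are not part of
Trac or patch.

```python
def count_line_endings(data):
    """Return (crlf, lf_only) counts for a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF, so subtract them to get bare LFs.
    lf_only = data.count(b"\n") - crlf
    return crlf, lf_only


def to_unix_line_endings(data):
    """Convert CRLF line endings to LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def report(path):
    """Describe the line endings found in the file at `path`."""
    # Binary mode matters: text mode would silently translate line endings.
    with open(path, "rb") as handle:
        crlf, lf_only = count_line_endings(handle.read())
    return "%s: %d CRLF, %d LF-only" % (path, crlf, lf_only)
```

Calling report("pool.py.diff") and report("trac/db/pool.py") and comparing
the two results would show whether the endings differ. If they do, writing
the output of to_unix_line_endings() back to the diff before running patch
again may be worth a try.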

-- Christian

pool_fixes-0.10.diff

Christian

Oct 27, 2006, 12:38:51 PM

to Trac Users

Hello Christian,

I applied the patch and reinstalled, but the problem persists and
apache2 is spiking again. I also updated all the packages on Ubuntu,
but that didn't help either.

It really only started with 0.10. You mentioned a thread dump; how do
we go about doing that?

C

Christian Boos

Oct 28, 2006, 6:30:15 AM

to trac-...@googlegroups.com

Christian wrote:
> Hello Christian,
>
> I did the patch, reinstalled...but the problem persists and we have
> apache2 spiking again, I did an update of all the packages on ubuntu
> but still that didn't do it.
>

ok

> It really only started with 0.10, you mentioned a thread dump, how do
> we go about doing that?
>

With ps or top, get the pid of the apache process that spikes to 100%
(say 1234).
Then start gdb and do "attach 1234". You're now attached to that
process and you can do a 'bt' (backtrace) to see where it is stuck.
There are additional tricks to get access to the Python stacktrace from
there [1], but the C stacktrace should already be a good start to
understand what's going on.
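
The steps above can also be scripted. Here is a minimal Python sketch,
assuming gdb is installed and that you have permission to attach to the
process (usually that means root). The function names are made up for
illustration, and 1234 is just the example pid from above. It runs gdb in
batch mode with the same 'bt' command:

```python
import subprocess


def gdb_backtrace_command(pid, gdb_command="bt"):
    """Build a batch-mode gdb invocation (hypothetical helper, not part of Trac)."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to a running process, -batch exits once the commands
    # have run, and -ex executes a single gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", gdb_command]


def dump_backtrace(pid, gdb_command="bt"):
    """Attach gdb to the process and return whatever it printed."""
    result = subprocess.run(
        gdb_backtrace_command(pid, gdb_command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # gdb reports attach errors on stderr
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

For example, print(dump_backtrace(1234)) would print the C backtrace of the
current thread. Passing "thread apply all bt" as the second argument should
cover every thread of the process instead.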

-- Christian

[1] http://trac.edgewall.org/ticket/1401#comment:14

Christian Billen

Oct 28, 2006, 8:21:26 AM

to trac-...@googlegroups.com

Ok, well, there are actually two apache processes, both spiking.

The first one:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7bc456b in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x080793e4 in ap_mpm_pod_check ()
#3 0x080773b3 in ap_graceful_stop_signalled ()
#4 0x0807751b in ap_graceful_stop_signalled ()
#5 0x080775e2 in ap_graceful_stop_signalled ()
#6 0x08078094 in ap_mpm_run ()
#7 0x0807ede5 in main ()

And the other one:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7bc456b in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x080793e4 in ap_mpm_pod_check ()
#3 0x080773b3 in ap_graceful_stop_signalled ()
#4 0x0807751b in ap_graceful_stop_signalled ()
#5 0x0807816c in ap_mpm_run ()
#6 0x0807ede5 in main ()

Does this tell you anything?

Christian Boos

Oct 28, 2006, 3:50:40 PM

to trac-...@googlegroups.com

Christian Billen wrote:
> Ok, well there's actually two apache process both spiking.
>
> The first one:
> #0 0xffffe410 in __kernel_vsyscall ()
> #1 0xb7bc456b in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
> #2 0x080793e4 in ap_mpm_pod_check ()
> #3 0x080773b3 in ap_graceful_stop_signalled ()
> #4 0x0807751b in ap_graceful_stop_signalled ()
> #5 0x080775e2 in ap_graceful_stop_signalled ()
> #6 0x08078094 in ap_mpm_run ()
> #7 0x0807ede5 in main ()
>
> And the other one:
> #0 0xffffe410 in __kernel_vsyscall ()
> #1 0xb7bc456b in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
> #2 0x080793e4 in ap_mpm_pod_check ()
> #3 0x080773b3 in ap_graceful_stop_signalled ()
> #4 0x0807751b in ap_graceful_stop_signalled ()
> #5 0x0807816c in ap_mpm_run ()
> #6 0x0807ede5 in main ()
>
> Does this tell you anything?
>

No, but some googling led me to this:

http://www.forbiddenweb.org/viewtopic.php?id=25875

which seems to be quite close to what you have.

So following the hint given in that thread, you should perhaps
double-check that all the modules you are using are compiled in a
thread-safe way (esp. sqlite and clearsilver, check the PySqlite and
ClearSilver pages in Trac's wiki for guidance).
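
For the sqlite side, a rough first check can be done from Python itself.
The sketch below is only an illustration and rests on assumptions: it uses
the sqlite3 module bundled with current Python versions rather than the
PySqlite setup of this thread, it relies on "PRAGMA compile_options", which
only newer SQLite releases support, and the function name is made up. It
says nothing about ClearSilver.

```python
import sqlite3


def sqlite_thread_info():
    """Collect what Python's sqlite3 module reports about thread safety."""
    conn = sqlite3.connect(":memory:")
    try:
        # Each row holds one compile-time option, e.g. "THREADSAFE=1".
        # The list may be empty if the library was built without this
        # diagnostic or predates the pragma.
        options = [row[0] for row in conn.execute("PRAGMA compile_options")]
    finally:
        conn.close()
    return {
        "library_version": sqlite3.sqlite_version,
        # DB-API 2.0 thread-safety level declared by the Python module.
        "dbapi_threadsafety": sqlite3.threadsafety,
        "threadsafe_options": [o for o in options if o.startswith("THREADSAFE")],
    }
```

A THREADSAFE=0 entry in "threadsafe_options" would indicate a SQLite library
built without thread support, which is the kind of build to avoid under a
threaded Apache MPM.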

-- Christian
