Stopped sync in 0.11.2.1

14 views
Skip to first unread message

anatoly techtonik

unread,
Feb 7, 2009, 9:03:03 PM2/7/09
to trac...@googlegroups.com
Hello,

Any ideas why Trac may stop syncing cache with repository after
upgrading to 0.11.2.1?
The log is full of strange messages about insufficient privileges for
creating pages.

2009-02-07 17:55:11,224 Trac[__init__] DEBUG: Dispatching <Request
"GET u'/timeline'">
2009-02-07 17:55:11,402 Trac[__init__] DEBUG: Subversion bindings imported
2009-02-07 17:55:11,417 Trac[__init__] INFO: repos rev [2557] !=
cached rev [2414]
2009-02-07 17:55:11,424 Trac[__init__] DEBUG: Retrieving session for
ID '4b0a7707907d96a01f8ff7e0'
2009-02-07 17:55:11,803 Trac[__init__] DEBUG: Prepare chrome data for request
2009-02-07 17:55:11,805 Trac[__init__] DEBUG: No policy allowed
anonymous performing TRAC_ADMIN on None
2009-02-07 17:55:11,806 Trac[__init__] DEBUG: No policy allowed
anonymous performing PERMISSION_GRANT on None
2009-02-07 17:55:11,806 Trac[__init__] DEBUG: No policy allowed
anonymous performing PERMISSION_REVOKE on None
2009-02-07 17:55:11,336 Trac[__init__] DEBUG: No policy allowed
anonymous performing EMAIL_VIEW on None
2009-02-07 17:55:12,509 Trac[__init__] DEBUG: Updating wiki page index
2009-02-07 17:55:12,511 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:FreeFindData'>
2009-02-07 17:55:12,013 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:KeyBarLabels'>
2009-02-07 17:55:12,746 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:GoToFile'>
2009-02-07 17:55:12,266 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:GlobalLock'>
2009-02-07 17:55:12,308 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:FileFilter'>
2009-02-07 17:55:12,899 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:BreakKeys'>
2009-02-07 17:55:12,933 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:BoxSymbols'>
2009-02-07 17:55:12,980 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:OutDir'>
2009-02-07 17:55:13,580 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource
u'wiki:JumpToSelectedFile'>
2009-02-07 17:55:13,179 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:MacFly'>
2009-02-07 17:55:13,186 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:CodeBlocks'>
2009-02-07 17:55:13,737 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:UnicodeString'>
2009-02-07 17:55:13,952 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource
u'wiki:PasteFromClipboard'>
2009-02-07 17:55:13,952 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:DeleteBuffer'>
2009-02-07 17:55:13,486 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource
u'wiki:CtrlIns/AltShiftIns'>
2009-02-07 17:55:13,526 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:AskOverwrite'>
2009-02-07 17:55:13,570 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource
u'wiki:PreRedrawFunction'>
2009-02-07 17:55:13,576 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource
u'wiki:GetOpenPluginInfo'>
2009-02-07 17:55:13,577 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:ClosePlugin'>
2009-02-07 17:55:14,239 Trac[__init__] DEBUG: No policy allowed
anonymous performing WIKI_CREATE on <Resource u'wiki:GetPluginInfo'>
2009-02-07 17:55:14,958 Trac[__init__] DEBUG: 4361 unreachable objects found.
2009-02-07 17:55:15,705 Trac[__init__] DEBUG: Dispatching <Request
"GET u'/chrome/common/css/trac.css'">
2009-02-07 17:55:15,808 Trac[__init__] DEBUG: 1640 unreachable objects found.
2009-02-07 17:55:15,851 Trac[__init__] DEBUG: Dispatching <Request
"GET u'/chrome/common/css/timeline.css'">
2009-02-07 17:55:15,877 Trac[__init__] DEBUG: 18 unreachable objects found.
...

--
--anatoly t.

Christian Boos

unread,
Feb 8, 2009, 6:32:54 AM2/8/09
to trac...@googlegroups.com
Hello Anatoly,

anatoly techtonik wrote:
> Hello,
>
> Any ideas why Trac may stop syncing cache with repository after
> upgrading to 0.11.2.1?
> The log is full of strange messages about insufficient privileges for
> creating pages.
>
> 2009-02-07 17:55:11,224 Trac[__init__] DEBUG: Dispatching <Request
> "GET u'/timeline'">
> 2009-02-07 17:55:11,402 Trac[__init__] DEBUG: Subversion bindings imported
> 2009-02-07 17:55:11,417 Trac[__init__] INFO: repos rev [2557] !=
> cached rev [2414]
>

I've already seen that once on someone's machine (hi John ;-) ), without
having a clue about it.
So it would be very helpful if you could spend some time debugging this
in depth.
By inserting enough debug statements in trac/versioncontrol/cache.py,
you should be able to get a clue about what's going on (e.g. missing
read permission on the svn repo?).

> 2009-02-07 17:55:11,424 Trac[__init__] DEBUG: Retrieving session for
> ID '4b0a7707907d96a01f8ff7e0'
> 2009-02-07 17:55:11,803 Trac[__init__] DEBUG: Prepare chrome data for request
> 2009-02-07 17:55:11,805 Trac[__init__] DEBUG: No policy allowed
> anonymous performing TRAC_ADMIN on None
> 2009-02-07 17:55:11,806 Trac[__init__] DEBUG: No policy allowed
> anonymous performing PERMISSION_GRANT on None
> 2009-02-07 17:55:11,806 Trac[__init__] DEBUG: No policy allowed
> anonymous performing PERMISSION_REVOKE on None
> 2009-02-07 17:55:11,336 Trac[__init__] DEBUG: No policy allowed
> anonymous performing EMAIL_VIEW on None
> 2009-02-07 17:55:12,509 Trac[__init__] DEBUG: Updating wiki page index
> 2009-02-07 17:55:12,511 Trac[__init__] DEBUG: No policy allowed
> anonymous performing WIKI_CREATE on <Resource u'wiki:FreeFindData'>
> 2009-02-07 17:55:12,013 Trac[__init__] DEBUG: No policy allowed
> anonymous performing WIKI_CREATE on <Resource u'wiki:KeyBarLabels'>
> 2009-02-07 17:55:12,746 Trac[__init__] DEBUG: No policy allowed
> anonymous performing WIKI_CREATE on <Resource u'wiki:GoToFile'>
> 2009-02-07 17:55:12,266 Trac[__init__] DEBUG: No policy allowed
> anonymous performing WIKI_CREATE on <Resource u'wiki:GlobalLock'>
> 2009-02-07 17:55:12,308 Trac[__init__] DEBUG: No policy allowed

> <...>

That's puzzling as well. Do you have some random page generation plugin
installed? :-)
It might be a bot trying to visit links to non-existing Wiki pages, try
to set `[wiki] ignore_missing_pages = true` in your trac.ini file...

-- Christian

anatoly techtonik

unread,
Feb 8, 2009, 9:22:10 AM2/8/09
to trac...@googlegroups.com
On Sun, Feb 8, 2009 at 1:32 PM, Christian Boos <cb...@neuf.fr> wrote:
>>
>> Any ideas why Trac may stop syncing cache with repository after
>> upgrading to 0.11.2.1?
>> The log is full of strange messages about insufficient privileges for
>> creating pages.
>>
>> 2009-02-07 17:55:11,224 Trac[__init__] DEBUG: Dispatching <Request
>> "GET u'/timeline'">
>> 2009-02-07 17:55:11,402 Trac[__init__] DEBUG: Subversion bindings imported
>> 2009-02-07 17:55:11,417 Trac[__init__] INFO: repos rev [2557] !=
>> cached rev [2414]
>>
>
> I've already seen that once on someone's machine (hi John ;-) ), without
> having a clue about it.
> So it would be very helpful if you could spend some time debugging this
> in depth.
> By inserting enough debug statements in trac/versioncontrol/cache.py,
> you should be able to get a clue about what's going on (e.g. missing
> read permission on the svn repo?).

It turned out to be locked cache after timeout. At first I thought
about a race condition, but it was false. The cache can easily be
jammed with a long resync from repository on a busy server. It took
about 40 seconds to sync 200 revisions before timeout occurred. Take a
cache.py sync()
http://trac.edgewall.org/browser/branches/0.11-stable/trac/versioncontrol/cache.py#L158

On line 159 we check if a revision is being entered in DB by looking
if next revision number is already present in `revisions` table. That
means another thread is made an INSERT in line 184. The cache is now
locked. To unlock it, another thread should update youngest revision
at line 221.

So, the problem is when a timeout occurs during execution of 202:221 -
there is an update in `node_change` table that can potentially take a
long time to experience a timeout (esp. when server is under a heavy
load). There are also cset.get_changes() and
self.repos.next_rev(next_youngest) calls that can contribute to the
issue. I haven't found other way to recover than to manually patch DB.

DELETE FROM `node_change` WHERE `rev` > 2260;
DELETE FROM `revision` WHERE `rev` > 2260;
UPDATE `system` SET `value` = 2260 WHERE `name` = 'youngest_rev' LIMIT 1;

Seems like the only way to avoid locking is to introduce timeout on
184:221 cache operations. Like additional `sync_time_started` = None
or something else as an explicit flag.

I thought about a bot too, because page names are different every
time. These must be references from Source Browser, because Timeline
references contain <a> tags without href attributes. Perhaps these
were added from another thread while I was debugging my Timeline view.
It would be more useful to see pid or thread id displayed in log
instead of Trac[__init__] string. Does this Trac[__init__] change at
all?

--
--anatoly t.

Christian Boos

unread,
Feb 8, 2009, 2:23:04 PM2/8/09
to trac...@googlegroups.com
Correct.

> So, the problem is when a timeout occurs during execution of 202:221 -
> there is an update in `node_change` table that can potentially take a
> long time to experience a timeout (esp. when server is under a heavy
> load). There are also cset.get_changes() and
> self.repos.next_rev(next_youngest) calls that can contribute to the
> issue.

I don't really see how a time out can happen here. Do you mean a
TimeoutError exception?
That shouldn't happen, as the "db" variable already refers to a
connection, and TimeoutError exceptions are only raised when trying to
get a connection from the cache.

But you're nevertheless right, if anything wrong happens there (unicode
exception, db level timeout?), then there's no rollback done here.
That exception ends up being ignored at a later point (converted to a
warning) and if this is followed by a commit (e.g. session save), then
the cache remains locked.

> I haven't found other way to recover than to manually patch DB.
>
> DELETE FROM `node_change` WHERE `rev` > 2260;
> DELETE FROM `revision` WHERE `rev` > 2260;
> UPDATE `system` SET `value` = 2260 WHERE `name` = 'youngest_rev' LIMIT 1;
>
> Seems like the only way to avoid locking is to introduce timeout on
> 184:221 cache operations. Like additional `sync_time_started` = None
> or something else as an explicit flag.
>

Rather simply doing a rollback in all cases when something goes.
I'll work on a patch.

Have a look at http://trac.edgewall.org/wiki/TracLogging#LogFormat and the
"The format is a string which can contain any of the Python logging
Formatter variables. " part.
You can indeed add the thread id there.

Thanks for you in-depth analysis of the problem, this might solve one of
the "weird" issues we still carry on since Trac 0.10 (e.g. #5440).

-- Christian

anatoly techtonik

unread,
Feb 8, 2009, 4:51:47 PM2/8/09
to trac...@googlegroups.com
On Sun, Feb 8, 2009 at 9:23 PM, Christian Boos <cb...@neuf.fr> wrote:
>>
>> It turned out to be locked cache after timeout. At first I thought
>> about a race condition, but it was false. The cache can easily be
>> jammed with a long resync from repository on a busy server. It took
>> about 40 seconds to sync 200 revisions before timeout occurred. Take a
>> cache.py sync()
>> http://trac.edgewall.org/browser/branches/0.11-stable/trac/versioncontrol/cache.py#L158
>>
>> On line 159 we check if a revision is being entered in DB by looking
>> if next revision number is already present in `revisions` table. That
>> means another thread is made an INSERT in line 184. The cache is now
>> locked. To unlock it, another thread should update youngest revision
>> at line 221.
>>
> Correct.
>> So, the problem is when a timeout occurs during execution of 202:221 -
>> there is an update in `node_change` table that can potentially take a
>> long time to experience a timeout (esp. when server is under a heavy
>> load). There are also cset.get_changes() and
>> self.repos.next_rev(next_youngest) calls that can contribute to the
>> issue.
>
> I don't really see how a time out can happen here. Do you mean a
> TimeoutError exception?
> That shouldn't happen, as the "db" variable already refers to a
> connection, and TimeoutError exceptions are only raised when trying to
> get a connection from the cache.

I am running Trac via fastcgi on Apache. I think server kills the
process in the middle of operation. If a user dropped connection I
can't see why would Apache should allow process to run. Should there
be an exception? There must be timeouts for scripts as well. Are those
scripts notified before the kill? Log doesn't show that.

> But you're nevertheless right, if anything wrong happens there (unicode
> exception, db level timeout?), then there's no rollback done here.
> That exception ends up being ignored at a later point (converted to a
> warning) and if this is followed by a commit (e.g. session save), then
> the cache remains locked.

I completely ignored the fact that there should be a commit() for the
lock to occur. That makes things complicated. Here is the log content
for the lock time. The message "Resync from other thread in progress"
is inserted in "for" block after line 158. No warnings are displayed.

2009-02-08 03:27:48,106 Trac[__init__] DEBUG: Caching node change in
[2391]: (u'trunk/unicode_far/vbuild.m4', 'file', 'edit',
u'trunk/unicode_far/vbuild.m4',
2389)
2009-02-08 03:27:48,107 Trac[__init__] DEBUG: Caching node change in
[2391]: (u'trunk/unicode_far/wrap.cpp', 'file', 'edit',
u'trunk/unicode_far/wrap.cpp', 2
387)
2009-02-08 03:27:48,181 Trac[__init__] DEBUG: Dispatching <Request
"GET u'/chrome/site/fartrac.ico'">
2009-02-08 03:27:48,191 Trac[__init__] INFO: Trying to sync revision [2392]
2009-02-08 03:27:48,342 Trac[__init__] DEBUG: 18 unreachable objects found.
2009-02-08 03:27:50,422 Trac[__init__] DEBUG: Dispatching <Request
"GET u'/log/trunk/enc/enc_rus/styles'">
2009-02-08 03:27:50,430 Trac[__init__] INFO: repos rev [2559] !=
cached rev [2391]
2009-02-08 03:27:50,434 Trac[__init__] DEBUG: resync from other thread
in progress
2
2009-02-08 03:27:50,452 Trac[__init__] DEBUG: Prepare chrome data for request
2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed


anonymous performing TRAC_ADMIN on None

2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed


anonymous performing PERMISSION_GRANT on None

2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed


anonymous performing PERMISSION_REVOKE on None

2009-02-08 03:27:50,559 Trac[__init__] DEBUG: No policy allowed


anonymous performing EMAIL_VIEW on None

2009-02-08 03:27:50,918 Trac[__init__] DEBUG: 330 unreachable objects found.
2009-02-08 03:27:51,525 Trac[__init__] DEBUG: Dispatching <Request
"GET u'/browser/trunk/unicode_far'">
2009-02-08 03:27:51,568 Trac[__init__] INFO: repos rev [2559] !=
cached rev [2391]
2009-02-08 03:27:51,572 Trac[__init__] DEBUG: resync from other thread
in progress
2009-02-08 03:27:51,577 Trac[__init__] DEBUG: Prepare chrome data for request
2009-02-08 03:27:51,578 Trac[__init__] DEBUG: No policy allowed


anonymous performing TRAC_ADMIN on None
2

How did it happen that youngest revision was not updated in system
table if a commit that inserted rev number into revision table was
successful? Maybe transaction mechanism doesn't work for MySQL
backend?

> Rather simply doing a rollback in all cases when something goes.
> I'll work on a patch.
>

That would be great.

>> It would be more useful to see pid or thread id displayed in log
>> instead of Trac[__init__] string. Does this Trac[__init__] change at
>> all?
>
> Have a look at http://trac.edgewall.org/wiki/TracLogging#LogFormat and the
> "The format is a string which can contain any of the Python logging
> Formatter variables. " part.
> You can indeed add the thread id there.

I'll follow your advice. Thanks. I wonder why by default the setting
is so strange?

> Thanks for you in-depth analysis of the problem, this might solve one of
> the "weird" issues we still carry on since Trac 0.10 (e.g. #5440).

Hope this helps. BTW, what do these log messages about unreachable objects mean?

--
--anatoly t.

Christian Boos

unread,
Feb 24, 2009, 5:15:34 PM2/24/09
to trac...@googlegroups.com


See follow-up on http://trac.edgewall.org/ticket/8067
Still a bit mysterious to me.

>>> It would be more useful to see pid or thread id displayed in log
>>> instead of Trac[__init__] string. Does this Trac[__init__] change at
>>> all?
>>>
>> Have a look at http://trac.edgewall.org/wiki/TracLogging#LogFormat and the
>> "The format is a string which can contain any of the Python logging
>> Formatter variables. " part.
>> You can indeed add the thread id there.
>>
>
> I'll follow your advice. Thanks. I wonder why by default the setting
> is so strange?
>
>
>> Thanks for you in-depth analysis of the problem, this might solve one of
>> the "weird" issues we still carry on since Trac 0.10 (e.g. #5440).
>>
>
> Hope this helps. BTW, what do these log messages about unreachable objects mean?
>

Nothing, debugging noise ;-)

-- Christian

anatoly techtonik

unread,
Feb 25, 2009, 8:15:08 AM2/25/09
to trac...@googlegroups.com
On Wed, Feb 25, 2009 at 12:15 AM, Christian Boos <cb...@neuf.fr> wrote:
>
>> How did it happen that youngest revision was not updated in system
>> table if a commit that inserted rev number into revision table was
>> successful? Maybe transaction mechanism doesn't work for MySQL
>> backend?
>
> See follow-up on http://trac.edgewall.org/ticket/8067
> Still a bit mysterious to me.

I believe that the cause was in using MyISAM type of tables that
doesn't support multiple operation transactions. I've added a reply
with explanations to the ticket.

--
--anatoly t.

Reply all
Reply to author
Forward
0 new messages