
Regular expressions, compilation & tcl 8.6...


Georgios Petasis

Nov 11, 2009, 7:21:13 PM
Hi all,

Has anything changed noticeably in the cache Tcl uses for caching
regular expressions?

I tried to verify a large set (thousands) of small, simple regular
expressions with "regexp $one {}", and Tcl ended up occupying 800MB
of RAM. Is this expected?

(I remembered that the last N expression compilations were kept in
memory...)

George

set patterns [<return a list of regexp patterns>]
## Ensure all patterns are valid!
foreach one $patterns {
    if {[catch {regexp $one {}} error]} {
        error "Invalid pattern: $one\n$error"
    }
}

Alexandre Ferrieux

Nov 11, 2009, 8:13:05 PM
On Nov 12, 1:21 am, Georgios Petasis <peta...@iit.demokritos.gr>
wrote:

I don't know of a specific cache in the RE engine; however there's the
Tcl_Obj internal rep which plays the same role. Since the compiled
automaton is then attached to the pattern value, it "sticks" to each
element of $patterns, which survives well over the lifecycle of the
loop variable. To verify this theory, you can try two things:

(1) unset patterns (the variable itself) and see the memory consumption drop
(2) or, defeat the Tcl_Obj caching:

if {[catch {regexp [string range $one 0 end] {}} error]} {

and see no more memory consumption than the list's storage.
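
A minimal sketch of how the copy trick in (2) might be wrapped up for reuse. The proc name invalidPatterns and the throwaway variable are made up for illustration, and whether [string range $one 0 end] really yields a fresh value (rather than handing back the original) is an assumption that may depend on the Tcl version. The "--" guards against patterns that start with a dash.

```tcl
# Hypothetical helper: return {pattern message} pairs for every pattern
# that fails to compile, without raising on the first bad one.
proc invalidPatterns {patterns} {
    set bad {}
    foreach one $patterns {
        # Compile a throwaway copy, so the compiled form is (hopefully)
        # attached to the copy and not to the list element itself.
        set copy [string range $one 0 end]
        if {[catch {regexp -- $copy {}} msg]} {
            lappend bad [list $one $msg]
        }
    }
    return $bad
}
```

Called as [invalidPatterns $patterns], it returns an empty list when every pattern compiles.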

-Alex

Donal K. Fellows

Nov 12, 2009, 4:36:34 AM
On 12 Nov, 00:21, Georgios Petasis <peta...@iit.demokritos.gr> wrote:
> Has anything changed noticeably in the cache Tcl uses for caching
> regular expressions?

Not for many years.

Tcl has two caches for compiled REs. There is a per-thread cache that
is indexed by the literal string form of the RE (I believe that holds
the last 20 compiled REs, but could be wrong) and compiled REs are
also cached in the internal representation of the values. If you put
all the REs in global variables (or use literal REs) and just use them
by reference then it will be those internal representation caches
which are used.

> I tried to verify a large set (thousands) of small, simple regular
> expressions with "regexp $one {}", and Tcl ended up occupying 800MB
> of RAM. Is this expected?

Yes, that will trigger the building of all those internal
representations. Mostly that's a good strategy, but you've found the
case where it isn't. Congratulations.
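
One way to see those internal representations, sketched under the assumption of Tcl 8.6 or later: the tcl::unsupported::representation command reports the type a value currently carries. It is explicitly unsupported, so its output format is not guaranteed; the sample patterns here are made up.

```tcl
set patterns [list {a+} {b*c} {[0-9]+}]
foreach one $patterns {
    # Matching against the empty string forces the pattern to be compiled.
    regexp -- $one {}
}
# The loop is over, but the list still references each element, so the
# compiled form attached to an element is still there.
set rep [tcl::unsupported::representation [lindex $patterns 0]]
puts $rep
```

If the description names the regexp type, the compiled automaton is still attached to the list element, which is the memory that accumulates when the list holds thousands of patterns.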

Donal.

Georgios Petasis

Nov 12, 2009, 7:01:06 AM, to Alexandre Ferrieux
Alexandre Ferrieux wrote:

Dear Alex,

Indeed this solves the problem (i.e. wish stabilises at ~30MB no matter how
many times I run the loop). But I cannot understand why.
There is a small cache per thread (as Donal also remembers), but the
regular expressions are stored in the same variable. Since the string of
the variable "one" changes, shouldn't the compiled regexp also be discarded?
Maybe there is a leak somewhere?

George

Georgios Petasis

Nov 12, 2009, 7:03:39 AM, to Donal K. Fellows
Donal K. Fellows wrote:

Dear Donal,

I am a little puzzled by this. Since the string of the variable changes
with each loop to a new pattern, why is the old compilation of the
regular expression kept?

George

Georgios Petasis

Nov 12, 2009, 7:05:48 AM, to Alexandre Ferrieux
Georgios Petasis wrote:

How stupid of me :D
Of course there is no leak, the regular expression objects are also
referenced by the list object (they are the list elements!)...

George

Georgios Petasis

Nov 12, 2009, 7:07:05 AM, to Donal K. Fellows
Georgios Petasis wrote:

Dear Donal,

Forget the last e-mail. I understood that the patterns are also
reference-counted by the list object that holds all the patterns...
So, their internal representation (the compiled regexp) stays cached
for as long as the list keeps them alive...

George

Alexandre Ferrieux

Nov 12, 2009, 7:43:40 AM
On Nov 12, 1:05 pm, Georgios Petasis <peta...@iit.demokritos.gr>
wrote:

No, it's my fault. I miserably failed to convey that meaning with
"element of $patterns, which survives well over the lifecycle of the
loop variable"...

-Alex
