[perl #15797] [PATCH] Regex speedup (resubmit)

Angel Faus

Aug 6, 2002, 1:07:51 AM
to perl6-i...@perl.org
Hi,

This is a resubmit of my previous regex patch, hopefully with the
right diff options.

I sent it through RT a few days ago but it didn't get to the mailing
list, so I am sending it again.

Best,

-àngel

patch_re.diff

Brent Dax

Aug 5, 2002, 6:24:47 PM
to af...@corp.vlex.com, perl6-i...@perl.org
Angel Faus:
# This is a resubmit of my previous regex patch, hopefully with the
# right diff options.

Um. Yeah. You're gonna have to explain all this, cause I can't even
understand it and I'm the one who wrote the initial version. There
don't seem to be any changes to the documentation that help me figure it
out, either.

If my limited understanding of this is correct, it's a huge
re-architecting; as such, it needs justification and explanation.

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

He who fights and runs away wasted valuable running time with the
fighting.

Angel Faus

Aug 6, 2002, 2:40:12 AM
to Brent Dax, perl6-i...@perl.org

Brent Dax wrote:
>
> If my limited understanding of this is correct, it's a huge
> re-architecting; as such, it needs justification and explanation.
>

Yep, you are right on both points: the documentation is lacking and
an explanation is required.

It was explained a bit more in the original post, but it clearly
requires some more clarification.

What it does is:

* Breaks up the original rx_info structure, and saves _all_ state in the
registers and in the per-interpreter intstack.

This means that many more registers are used, and it also means that
these registers will have to be saved when calling a subrule.

On the other hand, this makes the access more direct, and avoids
constructing and destroying the state PMC, which accounted for a very
significant portion of the overhead relative to perl5.

* Implements a new set of ops that merge together frequently used
combinations.

For example

rx_search S0, I1, I0, "foo", FAIL

Implements the combined action of advancing over the string and
searching for the initial literal "foo". This is considerably faster
than the equivalent:

# I0 represents ->startindex, and I1 represents ->index
ADV: rx_advance S0, I0, FAIL
set I1, I0
rx_literal S0, I0, "foo", ADV

The saving comes from the increased code density, and from the ability
to avoid some repeated calculations.

* Makes a quick cheat in string.c to inline the code that computes
string_index(..) when the string is in a native ascii-compatible
encoding.
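
As a rough illustration of the fast path described in the last point, here is a small Python model (not the actual C code in string.c). The name string_index comes from the text above; the SINGLE_BYTE_ENCODINGS set, the encoding names and the function signature are assumptions made only for this sketch.

```python
# Hypothetical model of an inlined fast path for character lookup.
# Everything except the name "string_index" is an assumption for illustration.

# Encodings assumed here to store exactly one byte per character.
SINGLE_BYTE_ENCODINGS = {"ascii", "latin-1"}


def string_index(buf: bytes, idx: int, encoding: str) -> int:
    """Return the code point of the character at position idx."""
    if encoding in SINGLE_BYTE_ENCODINGS:
        # Fast path: character position equals byte offset, so the lookup
        # is a direct array access with no encoding-specific dispatch.
        return buf[idx]
    # Generic path: decode through the encoding before indexing.
    return ord(buf.decode(encoding)[idx])
```

The point of the sketch is only that the single-byte case needs no decoding step, which is what makes it cheap enough to inline.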

And that's all.

I honestly think that this patch is a good starting point for getting
into the perl5 range of speeds. It is certainly less elegant than the
previous version, and ignores some features that the previous version
did implement (look-behind, for example). Nevertheless, it is as fast
as or faster than perl5 on the regexes that have been completely
optimized.

The point about the lack of documentation remains, and I will solve it
as soon as I get an indication that the design is ok.

Best,

-angel
af...@corp.vlex.com

Dan Sugalski

Aug 5, 2002, 6:44:47 PM
to af...@corp.vlex.com, Brent Dax, perl6-i...@perl.org
At 8:40 AM +0200 8/6/02, Angel Faus wrote:
>The point about the lack of documentation remains, and i will solve it
>as soon as i get an indication that the design is ok.

The design's fine, and we can add what we need from here. Send docs
and I'll put it in.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk

Angel Faus

Aug 6, 2002, 2:46:51 AM
to Brent Dax, perl6-i...@perl.org
I wrongly wrote:

> # I0 represents ->startindex, and I1 represents ->index
> ADV: rx_advance S0, I0, FAIL
> set I1, I0
> rx_literal S0, I0, "foo", ADV
>

It should actually be:

# I0 represents ->startindex, and I1 represents ->index
ADV: rx_advance S0, I0, FAIL
set I1, I0

rx_literal S0, I1, "foo", ADV
               ^^

Since I1 represents the index on the string.
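
To make the comparison concrete, the corrected three-op loop and the fused rx_search op from the earlier message can be modelled roughly in Python. This is only a sketch: the exact semantics of rx_advance, rx_literal and rx_search are assumed here (advance moves the start index by one and fails past the end of the string; literal advances the index on a match and branches back on a mismatch), and both function names are hypothetical.

```python
# Hypothetical Python model of the two op sequences; the op semantics
# are assumptions for illustration, not taken from the Parrot sources.

def match_separate_ops(s, literal):
    """Model of the ADV loop: rx_advance, set, rx_literal."""
    start = -1  # models I0 (startindex); assumed to begin before the string
    while True:
        start += 1                        # rx_advance S0, I0, FAIL
        if start > len(s):
            return None                   # branch to FAIL
        index = start                     # set I1, I0
        if s.startswith(literal, index):  # rx_literal S0, I1, "foo", ADV
            index += len(literal)
            return start, index
        # mismatch: branch back to ADV and try the next start position


def match_fused(s, literal):
    """Model of rx_search: advance and literal search in a single step."""
    start = s.find(literal)
    if start == -1:
        return None                       # branch to FAIL
    return start, start + len(literal)
```

Both functions return the same (startindex, index) pair; the fused version simply does the scan inside one operation instead of dispatching three ops per start position.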

-àngel

Dan Sugalski

Aug 17, 2002, 5:41:09 PM
to af...@corp.vlex.com, Brent Dax, perl6-i...@perl.org
At 8:45 AM +0200 8/7/02, Angel Faus wrote:
>Dan Sugalski wrote:

>> At 8:40 AM +0200 8/6/02, Angel Faus wrote:
>> >The point about the lack of documentation remains, and i will
>> > solve it as soon as i get an indication that the design is ok.
>>
>> The design's fine, and we can add what we need from here. Send docs
>> and I'll put it in.
>
>The attached file contains the patch with tests and documentation
>included.

Has someone looked at and maybe committed this?

Simon Cozens

Aug 17, 2002, 6:38:19 PM
to perl6-i...@perl.org
d...@sidhe.org (Dan Sugalski) writes:
> Has someone looked at and maybe committed this?

The reason I asked which pieces of Parrot were prototypes was because
optimizing the hell out of something that's only a prototype is nothing
short of intellectual masturbation, and it seems nobody actually learnt
anything from my YAPC presentation after all.

--
"He was a modest, good-humored boy. It was Oxford that made him insufferable."

Simon Cozens

Aug 24, 2002, 11:37:08 AM
to perl6-i...@perl.org
ni...@unfortu.net (Nicholas Clark) writes:
> 2 (for Simon): Is there an online version of your talk? I can find
> a title "Parroting On - Lessons from the coal face." but I'm unable to
> find a summary or the contents.

http://ddtm.simon-cozens.org/~simon/coalface.html or
http://ddtm.simon-cozens.org/~simon/coalface.mgp if you speak MagicPoint

--
King's Law of Clues : Common sense is inversely proportional to the
academic intelligence of the person concerned.

Nicholas Clark

Aug 24, 2002, 11:29:42 AM
to Simon Cozens, perl6-i...@perl.org
I think I deleted a message from this thread saying that effectively all
parts of parrot were subject to re-write if they turned out to be too slow.
Whilst that is a logical and sensible position, and means that you can't
say for certain whether any given current bit of parrot will survive the
same in the 1.0 release, so,

On Mon, Aug 19, 2002 at 09:08:24PM +0200, Angel Faus wrote:


> Sunday 18 August 2002 00:38, Simon Cozens wrote:
> > d...@sidhe.org (Dan Sugalski) writes:
> > > Has someone looked at and maybe committed this?
> >
> > The reason I asked which pieces of Parrot were prototypes was
> > because optimizing the hell out of something that's only a
> > prototype is nothing short of intellectual masturbation, and it
> > seems nobody actually learnt anything from my YAPC presentation
> > after all.
>

> One possible reason would be to learn if a speed problem is due to the
> design or the implementation. Sometimes you only know whether something
> can be fast enough once you try to optimize the hell out of it.
>
> Could you share with those of us who were not at YAPC what this
> presentation was about?


1 (for Dan): What sections of parrot are definitely prototypes, and due to
be thrown out at some point?

2 (for Simon): Is there an online version of your talk? I can find
a title "Parroting On - Lessons from the coal face." but I'm unable to
find a summary or the contents.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

Dan Sugalski

Aug 24, 2002, 1:36:38 PM
to Nicholas Clark, Simon Cozens, perl6-i...@perl.org
At 4:29 PM +0100 8/24/02, Nicholas Clark wrote:
>I think I deleted a message from this thread saying that effectively all
>parts of parrot were subject to re-write if they turned out to be too slow.
>Whilst that is a logical and sensible position, and means that you can't
>say for certain whether any given current bit of parrot will survive the
>same in the 1.0 release, so,
>
>1 (for Dan): What sections of parrot are definitely prototypes, and due to
> be thrown out at some point?

Well, the GC system (though that's being redone now), the JIT (which
is also being redone now), the IO system, exceptions (which really
need a first draft, let alone a rewrite), the parser (which also needs
a first draft), the regex engine (though that's being worked on), the
compiler....

The opcode design itself is pretty much final. The JIT and
GC/resource system will likely be essentially final when the current
abuse to it's done. The rest we'll see about as we keep going.
