Newsgroups: perl.perl6.internals
From: st...@fink.com (Steve Fink)
Date: Wed, 1 Sep 2004 20:24:32 -0700
Subject: Re: Semantics for regexes
On Sep-01, Dan Sugalski wrote:

> This is a list of the semantics that I see as needed for a regex
> engine. When we have 'em, we'll map them to string ops, and may well
> add in some special-case code for faster access.

> *) extract substring
> *) exact string compare
> *) find string in string
> *) find first character of class X in string
> *) find first character not of class X in string
> *) find boundary between X and not-X
> *) Find boundary defined by arbitrary code (mainly for word breaks)

Huh? What do you mean by "semantics"? The only semantics needed are the
minimum necessary to answer the question "is the fred at offset i equal
to the fred X?" (Sorry, not sure if fred is actually character or
codepoint or whatever, and is probably all of them at different levels.)

We also almost certainly need to be able to do character class
comparisons, although if you assume that you can always transcode to
what the regex was compiled with, then you don't even need that --
instead, you need to be able to convert to something like a difference
list of numbered freds. But if we're talking about semantics, then yes,
you need the character class manipulation.
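
One possible reading of "difference list" is an inversion list: a sorted list of numbers at which class membership flips. A minimal sketch under that assumption (Python for illustration; `make_class` and `in_class` are made-up names, and the ranges are assumed to be sorted and non-overlapping):

```python
from bisect import bisect_right


def make_class(ranges):
    """Turn inclusive (lo, hi) character ranges into a boundary list."""
    bounds = []
    for lo, hi in ranges:
        bounds += [ord(lo), ord(hi) + 1]  # membership starts at lo, stops after hi
    return bounds


def in_class(bounds, ch):
    """An odd count of boundaries at or below the code point means 'inside'."""
    return bisect_right(bounds, ord(ch)) % 2 == 1


alnum = make_class([("0", "9"), ("A", "Z"), ("a", "z")])
```

With this encoding the class test needs nothing but integer comparison on the unit's number, which is why transcoding to whatever the regex was compiled with would be enough.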

Everything else in this list sounds like optimizations to me, and
probably not the right optimizations (I don't think it's possible to
predict what will be useful yet.)

For other things that Parrot will be used for, I suspect that the first
three items on the list will be needed.

I'm curious as to how you came up with that list; it seems to imply a
particular way of implementing the grammar engine. I would expect all of
that, barring certain optimizations, to be done directly with existing
PASM instructions.

There will be a need for saving a stack of former values of hypothetical
variables, which can also be done with PASM ops but might interact with
overloaded assignment or something wacky like that.
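
To make the stack of former values concrete, here is a small sketch (Python for illustration only; the class and method names are hypothetical and say nothing about how Parrot would actually do it). Each hypothetical assignment pushes the previous value, and backtracking pops back to a saved mark:

```python
class HypotheticalVars:
    def __init__(self):
        self.values = {}
        self.trail = []  # entries are (name, had_value, old_value)

    def mark(self):
        """Remember the current depth so a failed branch can undo to here."""
        return len(self.trail)

    def let(self, name, value):
        """Assign hypothetically, saving the former value first."""
        self.trail.append((name, name in self.values, self.values.get(name)))
        self.values[name] = value

    def rollback(self, mark):
        """Undo every assignment made since mark, newest first."""
        while len(self.trail) > mark:
            name, had_value, old = self.trail.pop()
            if had_value:
                self.values[name] = old
            else:
                del self.values[name]
```

The sketch restores values by plain assignment, which is exactly the spot where overloaded assignment could get in the way.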


 