On Tue, Jun 22, 2010 at 10:00 PM, andrew cooke <and...@acooke.org> wrote:
> I've just released the first (very much alpha) version of RXPY - a
> regular expression library for Python. http://www.acooke.org/rxpy/
> This will eventually be used by Lepl (and will simplify the line aware
> code, amongst other things).
> You can ignore it for now; I just wanted to show why development of
> Lepl itself has slowed a little...
> Andrew
> --
> You received this message because you are subscribed to the Google Groups "lepl" group.
> To post to this group, send email to lepl@googlegroups.com.
> To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lepl?hl=en.
Oh, I didn't know about that (and I'm sorry that Lepl seems to have discouraged further development!). From the comments there it sounds very similar. My initial motivation was exactly the same - to process arbitrary sequences. It's silly that "re" doesn't (but understandable, I guess, since it's written in C).
Incidentally, RXPY will match arbitrary objects and do backtracking. For example, these tests - http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/direct/_tes... - show it matching a list of integers (not very exciting, I know, but you need to define a new "alphabet" for the objects, and for initial testing an alphabet for 0-9 was easiest).
What made things easier was that (1) LEPL had something similar, so this is the "second time round", which is a big help, and (2) it turns out that the Python re package has an amazing set of tests, which really helped make sure it worked. The hardest part this second time round was just getting all the details right (the re package has a lot of little features I had never used before, and RXPY reproduces them all).
Anyway, I should read more about re0 - I guess there are some good ideas I haven't thought of that I should learn...
On Wed, Jun 23, 2010 at 04:56:35PM -0400, Jasper St. Pierre wrote:
> My friend had attempted to do something like this, he had some interesting
> ideas, such as applying matching to other objects:
> Unfortunately, the hardest part of this sort of thing is backtracking.
Ah, his approach and your approach are a bit different.
Instead of parsing an expression as a string with an alphabet, he took the longer route of wrapping the object and calling various methods like so: Atom(23).star()
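For anyone who hasn't seen the method-chaining style, here is a minimal sketch of what it could look like. Only the spelling Atom(23).star() comes from the message above; the Pattern, Star and Then classes, the then() and fullmatch() methods, and the matching strategy (tracking every index where a match could end, so no backtracking is needed) are assumptions made for illustration and are not re0's actual code.

```python
class Pattern:
    """Base class for object-level patterns (hypothetical, not re0's API)."""

    def ends(self, seq, start):
        """Return the set of indices where a match starting at `start` can end."""
        raise NotImplementedError

    def star(self):
        return Star(self)

    def then(self, other):
        return Then(self, other)

    def fullmatch(self, seq):
        return len(seq) in self.ends(seq, 0)


class Atom(Pattern):
    """Matches exactly one item equal to `value` (any object with ==)."""

    def __init__(self, value):
        self.value = value

    def ends(self, seq, start):
        if start < len(seq) and seq[start] == self.value:
            return {start + 1}
        return set()


class Star(Pattern):
    """Matches zero or more repetitions of `inner`."""

    def __init__(self, inner):
        self.inner = inner

    def ends(self, seq, start):
        reached = {start}          # zero repetitions always succeeds
        frontier = {start}
        while frontier:
            new = set()
            for pos in frontier:
                new |= self.inner.ends(seq, pos)
            frontier = new - reached   # only explore positions not seen yet
            reached |= frontier
        return reached


class Then(Pattern):
    """Matches `first` followed by `second`."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def ends(self, seq, start):
        result = set()
        for mid in self.first.ends(seq, start):
            result |= self.second.ends(seq, mid)
        return result


pattern = Atom(23).star().then(Atom(7))
print(pattern.fullmatch([23, 23, 7]))  # True
print(pattern.fullmatch([23, 8]))      # False
```

Because the pattern is built from objects rather than parsed from a string, no "alphabet" is needed: anything that supports == can be an Atom.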
I am the friend that wrote re0. Truthfully re0 doesn't do anything
above and beyond what is described in the (great) Russ Cox article at
http://swtch.com/~rsc/regexp/regexp1.html , so if you want to learn
about it, that's the place to go. I did try to keep the Python source
readable and idiomatic, though.
re0 is completed to the level described in his first article, but no
further; this makes it somewhat of an experiment. Regular expressions
aren't useful without submatch extraction. Cox more recently wrote
http://swtch.com/~rsc/regexp/regexp2.html , which offers a revised
algorithm (previously described in the form of C source code), and
http://swtch.com/~rsc/regexp/regexp3.html . re0 was written before
these two articles, and is somewhat lacking.
I'm not sure to what degree you want to integrate NFA search into
RXPY, although I do see that it's on the page as a desirable goal. The
mismatch in features/performance of backtracking and automaton search
might present interesting architectural challenges/goals. An engine
that defaults to a relatively full-featured automaton algorithm and
falls back to backtracking search is presented by Cox as ideal.
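To make the automaton side of that comparison concrete, here is a rough sketch of a lockstep state-set search in the spirit of the first Cox article. The opcode tuples ("char", "split", "jmp", "match") and the function names are invented for this example; they are not taken from re0 or RXPY, and there is no submatch extraction.

```python
# Hypothetical opcode format, assumed only for this sketch:
#   ("char", c)      consume one input item equal to c
#   ("split", x, y)  continue at both x and y without consuming input
#   ("jmp", x)       continue at x without consuming input
#   ("match",)       the pattern has matched


def add_thread(program, pc, threads):
    """Add pc to the state set, following split/jmp edges eagerly."""
    if pc in threads:
        return
    threads.add(pc)
    op = program[pc]
    if op[0] == "jmp":
        add_thread(program, op[1], threads)
    elif op[0] == "split":
        add_thread(program, op[1], threads)
        add_thread(program, op[2], threads)


def nfa_fullmatch(program, text):
    """Advance every live state in lockstep, one input item at a time."""
    current = set()
    add_thread(program, 0, current)
    for item in text:
        following = set()
        for pc in current:
            op = program[pc]
            if op[0] == "char" and op[1] == item:
                add_thread(program, pc + 1, following)
        current = following
    return any(program[pc][0] == "match" for pc in current)


# Hand-compiled program for a(b|c)*d
PROGRAM = [
    ("char", "a"),    # 0
    ("split", 2, 7),  # 1: enter the loop body or leave the loop
    ("split", 3, 5),  # 2: choose b or c
    ("char", "b"),    # 3
    ("jmp", 1),       # 4
    ("char", "c"),    # 5
    ("jmp", 1),       # 6
    ("char", "d"),    # 7
    ("match",),       # 8
]

print(nfa_fullmatch(PROGRAM, "abcbd"))  # True
print(nfa_fullmatch(PROGRAM, "abx"))    # False
```

Each input item is examined once and the state set can never grow beyond the program length, so the work is bounded by len(program) * len(text); the cost is that features such as backreferences do not fit this model.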
Also, I think the PyPy project may be interested in later versions of
RXPY; provided it's fast and fully compatible with re (or can be
wrapped to be compatible, e.g. with a dummy module), it could easily
replace their current re module. It might be worth chatting to them
about this.
I'll be looking at RXPY in detail tonight; I'd be very interested in
contributing.
Devin Jeanpierre
Ah! The "final" Russ Cox article is my main target :o)
I'm working on documentation at the moment (particularly code comments
for main classes), so things should be easier to understand over the
next few days. But the hope is that if you (or anyone else!) wants to
work on an engine (to implement, say, one of the Russ Cox approaches)
then you should be able to add that without needing to write a new
parser or all the extra little methods that are necessary to get full
compliance with the re API.
Basically, you should only need to write the very core - something
that takes a graph of opcodes and an input string, and returns a
match.
However, I'm not quite there yet. In particular, there's only one
engine (a simple direct implementation that does backtracking with a
stack, doesn't use a state machine, etc.), and the support for the re
API and the engine are still mixed together.
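As a rough illustration of that "very core", here is a sketch of a function that takes a program of opcodes and an input and returns a match, backtracking with an explicit stack. The opcode tuples and the name backtrack_match are hypothetical and are not RXPY's real opcodes or engine interface; the sketch also assumes the program contains no loop that can repeat without consuming input, since it has no guard against that.

```python
# Hypothetical opcode format, assumed only for this sketch:
#   ("char", c)      consume one input item equal to c
#   ("split", x, y)  try x first; fall back to y on failure
#   ("jmp", x)       continue at x
#   ("match",)       the pattern has matched


def backtrack_match(program, text):
    """Return the end index of a match anchored at index 0, or None."""
    stack = [(0, 0)]  # (program counter, input position) pairs left to try
    while stack:
        pc, pos = stack.pop()
        while True:
            op = program[pc]
            if op[0] == "char":
                if pos < len(text) and text[pos] == op[1]:
                    pc, pos = pc + 1, pos + 1
                else:
                    break  # dead end: resume the most recent alternative
            elif op[0] == "jmp":
                pc = op[1]
            elif op[0] == "split":
                stack.append((op[2], pos))  # remember the second branch
                pc = op[1]                  # and follow the first one now
            elif op[0] == "match":
                return pos
    return None


# Hand-compiled program for a*ab (greedy star, so backtracking is required)
PROGRAM = [
    ("split", 1, 3),  # 0: take another "a" in the loop, or leave the loop
    ("char", "a"),    # 1
    ("jmp", 0),       # 2
    ("char", "a"),    # 3
    ("char", "b"),    # 4
    ("match",),       # 5
]

print(backtrack_match(PROGRAM, "aaab"))  # 4
print(backtrack_match(PROGRAM, "b"))     # None
```

On "aaab" the greedy loop first swallows all three "a" items, fails, and then pops saved alternatives until one "a" is left over for the final two opcodes.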
Andrew
PS Yes, PyPy is also a target :o) There was some recent work,
implementing a simple matcher in RPython, but it was very incomplete.
This work is complete (full re API) but is not in RPython. Luckily,
again, only the engine needs to be in RPython for speed, so only a
small amount of code will need to be modified.
On Jun 23, 11:56 pm, Devin Jeanpierre <jeanpierr...@gmail.com> wrote:
> Also, I think the PyPy project may be interested in later versions of
> RXPY; provided it's fast and fully compatible with re (or can be
> wrapped to be compatible, e.g. with a dummy module), it could easily
> replace their current re module. It might be worth chatting to them
> about this.
Also, I will try to refresh the documentation periodically. I've just updated the website (Overview has more text, API is starting to be a bit more readable).
I've just released 0.0.1 of RXPY. This has been refactored, documented and generally cleaned up so that it might be possible for someone to write a new engine without too much trouble. See http://www.acooke.org/rxpy/new-engine.html