...
print p.declare("""
struct test1
{
    int a, b;
    float this, and, that;
};
struct test2
{
    int count;
    struct test1 data[80];
};
""")
...
and have it create instances I can easily assign data to
...
t2 = p.createInstance("test2")
t2.data[0].a = 42
...
and pack for C extensions by utilizing the struct module
...
data = p.pack(t2)
...
because of being fed up with that pack("qh34s>id",...) stuff. (I'll
post that to my website once it's in a state I can let somebody else
see ;)
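For what it's worth, the flat-struct half of that idea can be mechanized with the struct module in a few lines. This is only a toy sketch: the Struct class, its parsing rules, and the keyword-based pack are all invented here, and it handles neither nested structs nor arrays:

```python
import struct

class Struct:
    """Toy C-struct parser; handles only flat "int a,b; float c;" bodies."""
    TYPEMAP = {"int": "i", "float": "f"}

    def __init__(self, body):
        self.fields = []  # ordered (name, format-char) pairs
        for decl in body.split(";"):
            decl = decl.strip()
            if not decl:
                continue
            ctype, names = decl.split(None, 1)
            for name in names.split(","):
                self.fields.append((name.strip(), self.TYPEMAP[ctype]))
        self.format = "".join(code for _, code in self.fields)

    def pack(self, **values):
        # pack named fields without writing the format string by hand
        return struct.pack(self.format, *(values[n] for n, _ in self.fields))

t1 = Struct("int a,b; float x;")
data = t1.pack(a=1, b=2, x=3.0)
```

A real version would also need nested structs, arrays, and alignment control, which is where the struct format prefixes ('<', '>', '=') come in.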
Anyway, that got me thinking on why do we have to deal with regular
expressions like r"((?:a|b)*)", when in most cases the code will look
something like this:
r = re.compile("<some cryptic re-string here>")
...
r.match(this) or r.find(that)
which means the real time is not spent in the compile() function, but
in the match or find function. So basically, couldn't one come up with
a *human readable* syntax for re, and compile that instead? Python
prides itself on its clean syntax and human readability, and bang -
import re, get perl-ish code instantly!
Also, I think it would already be an improvement if the syntax
provided for clear and easy-to-understand special cases, like
re.compile("anything that starts with 'abc'")
and if you cannot find something in the special cases for you, you can
always go back to
re.compile("<some cryptic re-string here>")
After all, *everyone* starting with re thinks the syntax is cryptic
and mind-boggling, and only once you get yourself into the "re mindset"
do you understand things like r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If
we had an easier syntax, more people would be using re ;)
Is the idea utterly foolish?
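The special-case idea could be prototyped as a small phrase-to-pattern table. Everything below (the phrase wordings, the compile_phrase name, the fallback rule) is invented purely for illustration:

```python
import re

# toy "readable phrase" -> regexp compiler; the phrases are invented examples
PHRASES = [
    (re.compile(r"anything that starts with '(.*)'$"),
     lambda m: re.escape(m.group(1)) + ".*"),
    (re.compile(r"anything that ends with '(.*)'$"),
     lambda m: ".*" + re.escape(m.group(1)) + "$"),
]

def compile_phrase(text):
    for phrase_re, build in PHRASES:
        m = phrase_re.match(text)
        if m:
            return re.compile(build(m))
    # no special case matched: treat the text as an ordinary regexp
    return re.compile(text)

starts_abc = compile_phrase("anything that starts with 'abc'")
```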
If you only use the RE once, you can use the module-level functions ;-)
> which means the real time is not spent in the compile() function, but
> in the match or find function. So basically, couldn't one come up with
> a *human readable* syntax for re, and compile that instead?
That's equally powerful? Most probably not.
> Also, I think it would already be an improvement if the syntax
> provided for clear and easy-to-understand special cases, like
>
> re.compile("anything that starts with 'abc'")
s.startswith("abc")
s.lower().startswith("abc")
> and if you cannot find something in the special cases for you, you can
> always go back to
>
> re.compile("<some cryptic re-string here>")
>
> After all, *everyone* starting with re thinks the syntax is cryptic
> and mind-boggling, and only if you get yourself into the "re mindset",
> you understand things like r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If
> we had an easier syntax, more people would be using re ;)
>
> Is the idea utterly foolish?
I don't really know. IMO if you have very simple string-searching, then
you can probably get away with the string methods, and if you have very
complex stuff, then you'll probably be better off with a parser generator
(like SimpleParse, which is very readable, IMO).
I don't find regular expressions that unreadable, especially when I
consider that I'd have to write many lines of error-prone Python code
instead. Stuff like this is just too convenient:
# working around zxDateTime limitations:
if JYTHON:
    import re
    ISO_DATE_RE = re.compile(r"(\d\d\d\d)-(\d\d)-(\d\d)")

    def DateFrom(s):
        match = ISO_DATE_RE.match(s)
        if match is None:
            raise ValueError
        return DateTime(*map(int, match.groups()))
Gerhard
--
mail: gerhard <at> bigfoot <dot> de registered Linux user #64239
web: http://www.cs.fhm.edu/~ifw00065/ OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9 3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))
> So basically, couldn't one come up with a *human readable* syntax
> for re, and compile that instead? Python prides itself on its clean
> syntax, and human readability, and bang - import re, get perl-ish
> code instantly!
There is an Emacs Lisp package called "symbolic regexps" (or
sregexp.el) that lets you write regular expressions in standard lisp
syntax, like
(sregexq bol (or "abc" "def"))
instead of "^(abc|def)" (bol meaning "beginning of line"). I don't see
how that maps elegantly to Python syntax, however.
Regards
Henrik
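One conceivable mapping, sketched here with invented helper names, is to use plain Python functions in place of the s-expression forms; the result is just an ordinary re string:

```python
import re

# invented helpers mimicking sregex's bol / or / sequence forms
BOL = "^"

def Or(*alternatives):
    # non-capturing group of literal alternatives
    return "(?:%s)" % "|".join(re.escape(a) for a in alternatives)

def Seq(*parts):
    return "".join(parts)

pattern = Seq(BOL, Or("abc", "def"))  # "^(?:abc|def)"
```

This matches like the "^(abc|def)" above, except that the group is non-capturing.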
Maybe Ka-Ping Yee's rxb? http://web.lfw.org/python/rxb15.py
The problem with a new syntax is that no one else would be using it, so
you'd still need to learn the existing syntax for use with grep, vi, Perl,
&c. (It wouldn't surprise me if Perl 6's revised regexes run into this very
difficulty and don't gain much adoption.)
--amk
> There is an Emacs Lisp package called "symbolic regexps" (or sregexp.el)
> that lets you write regular expressions in standard lisp syntax, like
> (sregexq bol (or "abc" "def")) instead of "^(abc|def)" (bol meaning
> "beginning of line"). I don't see how that maps elegantly to Python syntax,
> however.
PLEX has a Python way to describe regular expressions. It is likely available
stand-alone. I found the copy I use within the Pyrex distribution, see:
http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
--
François Pinard http://www.iro.umontreal.ca/~pinard
I see you've already gotten to my suggestion - look at Apocalypse 5
and Exegesis 5 on the O'Reilly Perl page and see what Larry Wall
has done.
I tend to agree that Larry is going out on a limb. However, since he's
done so, if we are thinking about a new regex syntax, we really
should consider doing it the same way. As you point out, there's going
to be enough difficulty getting one radical revision accepted. Getting
two different ones accepted is likely to sink everyone's effort.
Also, doing it the same way will simplify the Python effort in Parrot
(assuming anyone cares about that any more...)
And as far as grep, Vi and so forth are concerned (Perl isn't
an issue - that's the way it's going,) doing something to fix them
isn't all that hard - at least for the GNU versions of the programs.
>
> --amk
You might want to look at the Plex package. It defines patterns by
constructing data structures. Something like this:
symbol = Range("A-Za-z") + Any(Range("0-9A-Za-z") | Char("_"))
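For comparison, that Plex pattern corresponds roughly to the following raw regexp (my translation, not checked against Plex itself):

```python
import re

# a letter, followed by any number of letters, digits or underscores
symbol = re.compile(r"[A-Za-z][0-9A-Za-z_]*")
```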
However, three points:
First, this will certainly be slower than regular expressions, since
there are many Python calls needed to build the structure. (Of course,
after you've compiled it, it can be as fast as regexps.)
Second, even if you use re module, it is still nowhere near Perl-ish
ugliness. You still have Python's clean syntax outside of the
pattern.
Third, readability is not a unilateral good thing; conciseness is also
important, and sometimes opposed to readability. Sacrificing a little
readability to get a lot of conciseness is usually a good thing. I
think, as long as the regexp is not too obnoxious, it is probably
better to keep it concise. (Of course, this depends a lot on what
you're doing and how flexible you need to be.)
--
CARL BANKS
http://www.aerojockey.com
> You might want to look at the Plex package. [...]
>
> First, this will certainly be slower than regular expressions, since
> there are many Python calls needed to build the structure. (Of course,
> after you've compiled it, it can be as fast as regexps.)
The Plex matching engine does not backtrack, which might be an advantage over
Python regexps at matching time, at least theoretically for some regexps.
Building the matching tables, however, consumes a lot of time. So I guess
that for most usual regexps, Python regexps are quite OK.
> Second, even if you use re module, it is still nowhere near Perl-ish
> ugliness. You still have Python's clean syntax outside of the pattern.
A clear and definite advantage! :-) :-)
> Third, readability is not a unilateral good thing; conciseness is also
> important [...]
Python regexps, given some flag at compilation time, may allow for embedded
whitespace and comments. With proper care, difficult regexps could be made
less compact and more readable, without changing at all how they behave at
run-time. Conciseness is a quality for short regexps. Horrid regexps are
more advantageously written non-compactly.
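For example, the key/value pattern quoted earlier in the thread can be spread out under re.VERBOSE without changing what it matches (the comments are mine):

```python
import re

compact = re.compile(r"\s*(\w+)\s*=\s*['\"](.*?)['\"]")

verbose = re.compile(r"""
    \s* (\w+)           # key
    \s* = \s*           # equals sign, surrounded by optional whitespace
    ['\"] (.*?) ['\"]   # quoted value, non-greedy
    """, re.VERBOSE)
```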
Why not? It won't be as fast, but it should be able to do anything a
regexp can do, and would be much more versatile.
>The problem with a new syntax is that no one else would be using it, so
>you'd still need to learn the existing syntax for use with grep, vi, Perl,
>&c. (It wouldn't surprise me if Perl 6's revised regexes run into this very
>difficulty and don't gain much adoption.)
A tongue-in-cheek answer: I don't use grep, I use python. I don't use
vi, I use python. I don't use perl, I use python :)
Seriously, my thinking was, the re.compile function is there to
compile an expression to a binary representation for optimized
searching. So maybe, a "clean syntax" -> "ugly re syntax" compiler
would be good?
Besides, you could use the output ("ugly re syntax") in grep, vi, perl
if you really so intended.
vi? Yuck. vim or die :-P
> I use python. I don't use perl, I use python :)
>
> Seriously, my thinking was, the re.compile function is there to
> compile an expression to a binary representation for optimized
> searching. So maybe, a "clean syntax" -> "ugly re syntax" compiler
> would be good?
I think it would. Though the other way round wouldn't be that bad,
either.
-- Gerhard
> [Henrik Motakef]
>
> PLEX has a Python way to describe regular expressions. It is likely available
> stand-alone.
http://www.cosc.canterbury.ac.nz/~greg/python/Plex/
You get documentation as well if you get it from there. :-)
Note that Plex's RE implementation is very special
purpose -- you couldn't use it as a direct replacement
for the re module. But a wrapper for the RE module
which uses the same syntax could easily be made.
--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
What would the equivalent of r"(.)(.)(.)\3\2\1" be?
This means "a palindrome of 6 characters".
But it is unlikely that the human readable processor would understand
that (wouldn't it?)
It would be more likely to look like this (I haven't put too much
thought into this)
"anything,anything,anything,same_as_3rd,same_as_2nd,same_as_1st"
or would you like to suggest something else?
palindrome_6 = re.compile(r"(.)(.)(.)\3\2\1")
palindrome_6 = re.compile("anything,anything,anything,same_as_3rd,same_as_2nd,same_as_1st")
Sure there are some cases where the re is loaded with meta characters...
hmmm
OK, is this about writing maintainable code or people not wanting to
learn all the ins and outs of re's?
John
I used to use OmniMark a lot when it was free. With OmniMark's
equivalent of REs, the palindrome would be
any => char1 any => char2 any => char3 char3 char2 char1
I selected a non-trivial OmniMark RE from old code at random and came up with
('<!DOCTYPE' white-space+ [any except white-space]+ white-space+ "PUBLIC" white-space+ '"'
upto-inc('"')) => a.whole (white-space+ "SYSTEM"? white-space* '"'
upto-inc('"'))?
The (untested) Python RE equivalent is something like
"""(?P<a.whole><!DOCTYPE\s+[^\n]+\s+"PUBLIC"\s+"[^"]+")(?:\s+(?:"SYSTEM")?\s+"[^"]*")?"""
or, more readably if compiled with the re.VERBOSE flag
"""
(?P<a.whole>
<!DOCTYPE
\s+[^\n]+
\s+"PUBLIC"
\s+"[^"]+")
(?:\s+(?:"SYSTEM")?
\s+"[^"]*")?
"""
Which is the easiest to understand?
I'm used to REs, so I don't find the verbose Python RE too difficult to read. When I was learning
OmniMark, however, it was nice to be able to use "digit" and "letter" rather than
"\d" and [a-zA-Z], as creating non-trivial effective search patterns is never easy.
Ridiculous. If you can map human readable code into machine language,
then you can map human readable code into regular expressions.
Cryptic as they are, regular expressions are still systematic; thus it
is possible to systematically convert the cryptic regexp syntax into
more readable and consistent syntax.
> What would the equivalent of r"(.)(.)(.)\3\2\1"
> This means a "palindrome of 6 characters"
> But it is unlikely that the human readable processor would understand
> that (isn't it??)
Nope. You don't appear to appreciate the power of computers to
translate human readable text into complicated internal data, and are
evidently forgetting that interpreters such as Python do a much
more difficult translation of readable text.
> It would be more likely to look like this (I haven't put too much
> thought into this)
No kidding.
> "anything,anything,anything,same_as_3rd,same_as_2nd,same_as_1st"
> or would you like to suggest something else?
How about:
pattern = Group(Any()) + Group(Any()) + Group(Any()) \
+ GroupRef(3) + GroupRef(2) + GroupRef(1)
There's no reason it has to be re.compile with a string.
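Those Group/GroupRef helpers don't exist anywhere as written; a minimal sketch that just assembles the ordinary re string might look like:

```python
import re

# invented helpers matching the names used above
def Any():
    return "."

def Group(expr):
    return "(%s)" % expr

def GroupRef(n):
    return "\\%d" % n

pattern = Group(Any()) + Group(Any()) + Group(Any()) \
          + GroupRef(3) + GroupRef(2) + GroupRef(1)
# pattern is now the familiar r"(.)(.)(.)\3\2\1"
```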
[snip]
>
> Sure there are some cases where the re is loaded with meta characters...
That's the idea, chief. For a simple regexp like you gave above, it
would be overkill to use a human readable syntax. And it would still
be overkill for many regexps more complicated than that.
But eventually, the regexps will become complicated enough that a more
human readable syntax is preferable. Not to mention that a human
readable syntax will be more versatile, when that is needed.
> hmmm
> OK is this about writing maintainable code or people not wanting to
> learn all the ins and outs of re's?
Nope. For me, this is about understanding that complicated regexps
could benefit from a more readable and consistent syntax, and that the
more consistent syntax could add a lot of power and versatility to
regexps.
I agree about new syntax, but I wouldn't mind having a re.help(regexp) function
for interactive use that would just explain in 'English' what the regexp expression
stands for. It would be a nice easy double check on whether I wrote what I meant,
and helpful for understanding someone else's magic. It shouldn't be that hard to do.
Regards,
Bengt Richter
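A rudimentary version of that re.help idea is easy to mock up. The token table below is tiny and invented; a real implementation would walk the compiled parse tree instead:

```python
# toy sketch of the proposed re.help(); the token table is invented
TOKENS = {
    r"\d": "a digit",
    r"\w": "a word character",
    r"\s": "a whitespace character",
    ".": "any character",
    "*": "zero or more of the preceding",
    "+": "one or more of the preceding",
    "?": "optionally the preceding",
}

def re_help(pattern):
    out, i = [], 0
    while i < len(pattern):
        two = pattern[i:i + 2]
        if two in TOKENS:
            out.append(TOKENS[two])
            i += 2
        elif pattern[i] in TOKENS:
            out.append(TOKENS[pattern[i]])
            i += 1
        else:
            out.append("literal %r" % pattern[i])
            i += 1
    return ", ".join(out)
```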
>I agree about new syntax, but I wouldn't mind having a re.help(regexp) function
>for interactive use that would just explain in 'English' what the regexp expression
>stands for. It would be a nice easy double check on whether I wrote what I meant,
>and helpful for understanding someone else's magic. It shouldn't be that hard to do.
>
Many years ago, in my PL/I & IBM Mainframe days, I wrote a gizmo to
check the result of some tricky JCL (Job Control Language) parameters.
(Required because if it did not do what the user intended, it could
fail in an overnight run and the user wouldn't know until morning)
Not only did it take an expression and show the result, but it gave a
decision-by-decision commentary on why the result was what it was, so
provided a live tutorial by way of a simulator. It was moderately
difficult to write, IIRC, but most difficult was checking that the
simulator was in accord with reality.
However I don't understand regexs well enough to attempt this yet.
--
John W Hall <wweexxss...@telusplanet.net>
Calgary, Alberta, Canada.
"Helping People Prosper in the Information Age"
No, it's not utterly foolish. You might be surprised to learn that
Larry Wall agrees with you that the Perl regex syntax is much
too obtuse, and in need of a basic, ground up redesign. Even
current Perl syntax allows you a special form where you can
insert blanks for readability.
http://www.perl.com/pub/a/2002/06/04/apo5.html
http://www.perl.com/pub/a/2002/08/22/exegesis5.html
It's an interesting redesign of basic regex functionality.
Some of the things you can do with it are very, very
interesting indeed.
John Roth
>
>
> I agree about new syntax, but I wouldn't mind having a re.help(regexp) function
> for interactive use that would just explain in 'English' what the regexp expression
> stands for. It would be a nice easy double check on whether I wrote what I meant,
> and helpful for understanding someone else's magic. It shouldn't be that hard to do.
>
> Regards,
> Bengt Richter
I think that's an excellent idea (probably has already been done,
anybody know?)
It is unfortunate that often a re that starts off as a simple idea
turns out to be a big mess. I think some sort of graphical tool
would be even better so you could see what the re does. Especially
if it could convert both ways and allow you to edit the re.
John
>>It would be more likely to look like this (I haven't put too much
>>thought into this)
>
>
> No kidding.
>
>
>
>>"anything,anything,anything,same_as_3rd,same_as_2nd,same_as_1st"
>>or would you like to suggest something else?
>
>
> How about:
>
> pattern = Group(Any()) + Group(Any()) + Group(Any()) \
> + GroupRef(3) + GroupRef(2) + GroupRef(1)
>
Err, semantically that's exactly the same as the re and my suggestion;
only the syntax is different. It's still nothing like saying
pattern = "6 character palindrome"
John
Oh. Well, methinks you don't give humans enough credit. I'm a human,
and I can read the verbose regexp. Of course, humans can read regular
expressions, too. Maybe I don't give humans enough credit, either.
:-)
The point is, I was arguing for less-cryptic, more-verbose regular
expressions as a way to make complicated patterns more transparent. I
certainly wasn't arguing for "6 character palindrome".
Sorry-for-the-confusion-ly y'rs
Do you mean something like this?
def palindrome_re(n):
    pat = ["(.)" * ((n+1)/2)]
    for i in range(n/2, 0, -1):
        pat.append("\\%d" % i)
    return "".join(pat)
With a little work, you can extend this to use named groups and named
backrefs as well, so that you can use it as a building block for larger
patterns:
def Any(): return "."
def Group(s, g): return "(?P<%s>%s)" % (g, s)
def Backref(g): return "(?P=%s)" % g
def Or(*args): return "|".join(args)

def palindrome_re(n, p):
    pat = [Group(Any(), "%s%d" % (p, i+1)) for i in range((n+1)/2)]
    for i in range(n/2, 0, -1):
        pat.append(Backref("%s%d" % (p, i)))
    return "".join(pat)
I think that building REs in functions is a great approach for more
complex REs.
>>> q = re.compile(palindrome_re(7, "a") + palindrome_re(6, "b"))
>>> q.match("abcdcbaxyzzyx")
<_sre.SRE_Match object at 0x401c4f00>
>>> _.groupdict()
{'a1': 'a', 'a2': 'b', 'a3': 'c', 'a4': 'd', 'b1': 'x', 'b2': 'y', 'b3': 'z'}
>>> q = re.compile(Or(palindrome_re(7, "a"), palindrome_re(6, "b")))
>>> q.match('abccbb')
>>> q.match("abcdcba")
<_sre.SRE_Match object at 0x401c4f00>
>>> q.match("abccba")
<_sre.SRE_Match object at 0x402e5020>
Jeff
> http://www.perl.com/pub/a/2002/06/04/apo5.html
>
> http://www.perl.com/pub/a/2002/08/22/exegesis5.html
I just had a brief look at this, and the underlying
ideas seem to be a lot like the way Snobol patterns
work.
Maybe it's time for me to resurrect the Snobol-style
pattern matching module that I started on a while
back and never got around to releasing.
Would anyone be interested in this? Its
syntax is similar to that of Plex REs, except that
the primitives are Snobol-like, and it uses a
backtracking matching algorithm that's much more
powerful than a DFA (you can write entire parsers
in it, for example).
Some interesting references, possibly? --
http://sourceforge.net/projects/pystemmer/
http://snowball.sourceforge.net
These may be more specialized -- Snowball is
specifically for algorithmically stemming words,
and PyStemmer is an interface to it. I haven't
really looked into how it works. The name is
related to SNOBOL, but I'm not sure how much
Snowball actually resembles it (if at all).
I'm using pystemmer as part of a function which
converts object titles to (hopefully) mnemonic
file names (ids) in Zope. I haven't really
looked into how it works.
But I thought it might be relevant to you.
Cheers,
Terry
--
------------------------------------------------------
Terry Hancock
han...@anansispaceworks.com
Anansi Spaceworks
http://www.anansispaceworks.com
P.O. Box 60583
Pasadena, CA 91116-6583
------------------------------------------------------
There used to be a tkinter tool like this in Python under 1.52 written
by Guido. I don't know what happened to it though.
regards Max m
>>What would the equivalent of r"(.)(.)(.)\3\2\1"
>>This means a "palindrome of 6 characters"
>>But it is unlikely that the human readable processor would understand
>>that (isn't it??)
Why not just write the parts of the regex as named strings?
like::
name = '[a-zA-Z0-9_.]+'
at = '@'
dot = r'\.'
topDomain = '(?:com|org|dk)'
email = name + at + name + dot + topDomain
instead of::
email = r'[a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.(?:com|org|dk)'
Well my syntax is most likely wrong as I suck at regex, but the meaning
should be clear enough.
It's still around as Tools/scripts/redemo.py, I think.
--amk
I don't know much about Snobol, unfortunately.
I think my biggest issue here is that we shouldn't
reinvent the wheel unless there is a good reason.
In other words, Larry is taking Perl in a specific
direction. Assuming we want to make major
changes to regex, is there any _good_ reason
for doing something conceptually different and
consequently adding to the cacophony?
John Roth
It would also be useful if patterns can be built up as structured objects:
def palindrome_re(n):
    pat = Empty()
    for i in range(n):
        pat += Group(Any())
    for i in range(n, 0, -1):
        pat += GroupRef(i)
    return pat

class Palindrom:
    def __init__(self, n):
        self.n = n
    def __call__(self):
        return palindrome_re(self.n)

pattern = Palindrom(6)
Once the tree structure is revealed it could be mapped to whatever syntax
that is convenient for the situation. The perl-like syntax would just be
one of them.
Huaiyu
I don't think anyone wants to make changes to regular expressions, per
se. I think this discussion is about adding a higher level syntax to
regular expressions, that makes them more readable and versatile, for
cases where that's important.
> is there any _good_ reason
> for doing something conceptually different and
> consequently adding to the cacophony?
I think there is a good reason to have a higher level syntax. It
serves a purpose heretofore unserved: that complex regular expressions
no longer have to be unbearably unreadable.
I could live without it, though.
One of the things Perl has right now is the ability to use spaces
within regular expressions via a suffix flag. In Perl 6, this will
become
standard, and will include the ability to include comments!
Apocalypse 5 is well worth reading. Larry sets out why he
decided to redesign regular expression syntax. Much of the
reason was exactly what you specify: it's hard to read, hard
to understand, and has picked up lots of cruft over the years.
Making regular expressions "more versatile" is a great
idea. There's lots of things I'd like to do. However,
everything you load onto the poor thing makes it that
much more complex, hard to understand, hard to read,
and error-prone. At some point, the mess has to be
redesigned. Since Python's re syntax isn't currently
as complex as Perl's (I think) it might not be there yet.
Python also supports that with the re.VERBOSE flag.
Skip
http://www.ai.mit.edu/people/shivers/sre.txt
The beauty of this kind of approach is that the expressive power of the host
language is fully available in constructing complex expressions.
-- F
> Seriously, my thinking was, the re.compile function is there to
> compile an expression to a binary representation for optimized
> searching. So maybe, a "clean syntax" -> "ugly re syntax" compiler
> would be good?
note that the SRE engine contains an "ugly syntax" to "internal
data structure" parser, and an "internal data structure" to "engine
code" compiler.
it's probably easier (and definitely more efficient) to turn a clean
syntax into an "internal data structure" than into an ugly syntax.
(the next step is to use python's own parse tree instead of SRE's
internal structure, and use an extension to python's compiler for
the final step. make it all pluggable, and you have perl6...)
</F>
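A cheap approximation of such a front end, using operator overloading instead of hooking Python's real parse tree (all names here are invented):

```python
import re

class P:
    """Wraps a regexp fragment; + concatenates, | builds an alternation."""
    def __init__(self, fragment):
        self.fragment = fragment
    def __add__(self, other):
        return P(self.fragment + other.fragment)
    def __or__(self, other):
        return P("(?:%s|%s)" % (self.fragment, other.fragment))
    def compile(self):
        return re.compile(self.fragment)

def Lit(text):
    return P(re.escape(text))

pat = (Lit("abc") | Lit("def")) + Lit("ghi")
```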
> I agree about new syntax, but I wouldn't mind having a re.help(regexp) function
> for interactive use that would just explain in 'English' what the regexp expression
> stands for.
you can ask SRE to dump the internal parse tree
to stdout:
>>> sre.compile("[a-z]\d*", sre.DEBUG)
in
  range (97, 122)
max_repeat 0 65535
  in
    category category_digit
turning this into 'English' is left as an exercise etc.
</F>
Interesting, thanks. Does the above mean that sre can't fully match
'a'+'9'*65537
?
Regards,
Bengt Richter
> >you can ask SRE to dump the internal parse tree
> >to stdout:
> >
> >>>> sre.compile("[a-z]\d*", sre.DEBUG)
> >in
> > range (97, 122)
> >max_repeat 0 65535
> > in
> > category category_digit
> >
> >turning this into 'English' is left as an exercise etc.
>
> Interesting, thanks. Does the above mean that sre can't fully match
> 'a'+'9'*65537
> ?
in this context, 65535 represents any number:
>>> import re
>>> p = re.compile("[a-z]\d*")
>>> s = "a"+"9"*65537
>>> len(s)
65538
>>> m = p.match(s)
>>> len(m.group(0))
65538
</F>
>Bengt Richter wrote:
>
>> >you can ask SRE to dump the internal parse tree
>> >to stdout:
>> >
>> >[sre.DEBUG dump snipped]
>> >
>> >turning this into 'English' is left as an exercise etc.
>>
>> Interesting, thanks. Does the above mean that sre can't fully match
>> 'a'+'9'*65537
>> ?
>
>in this context, 65535 represents any number:
Doesn't that cause problems for something like this?
>>> m=re.compile(r'\d{0,65535}a').match(('9'*1000000)+'a')
>>> len(m.group(0))
1000001
--
BTR
You're going to set me up as a kind of slovenly attached pig that
Jack Kornfeld can slice down in his violent zen compassion?
-- Larry Block
>On Thu, 05 Sep 2002 06:41:14 GMT, "Fredrik Lundh" <fre...@pythonware.com>
>wrote:
>
>>Bengt Richter wrote:
>>
>>> >you can ask SRE to dump the internal parse tree
>>> >to stdout:
>>> >
>>> >[sre.DEBUG dump snipped]
>>> >
>>> >turning this into 'English' is left as an exercise etc.
>>>
>>> Interesting, thanks. Does the above mean that sre can't fully match
>>> 'a'+'9'*65537
>>> ?
>>
>>in this context, 65535 represents any number:
>
>Doesn't that cause problems for something like this?
>
>>>> m=re.compile(r'\d{0,65535}a').match(('9'*1000000)+'a')
>>>> len(m.group(0))
>1000001
>
Looks like a bug to me if {0,65535} acts like {0,}
BTW, a search for \d{0,65534} seems to mean it, and compiles
so slowly that I lost patience waiting. Not very optimized, I guess.
>>> import re
>>> m=re.compile(r'\d{0,65535}a').search(('9'*1000000)+'a')
>>> len(m.group(0))
1000001
That went in reasonable time (though it's wrong), but this snoozed.
It must be brute forcing something.
>>> m=re.compile(r'\d{0,65534}a').search(('9'*1000000)+'a')
^C
[18:50] C:\pywk\junk>
Regards,
Bengt Richter