<-[a-z]>
and its ilk are rather hard to read because of the two hyphens
that mean different things. We can't use <![a-z]> because that's a
0-width lookahead. Given that we're trying to get rid of special
exceptions, and - in character classes is weird, and we already
use .. for ranges everywhere else, and nobody is going to put a
repeated character into a character class, I'm wondering if
<-[a..z]>
should be allowed/encouraged/required. It greatly improves the
readability in my estimation. The only problem with requiring .. is
that people *will* write <[a-z]> out of habit, and we would probably
have to outlaw the - form for many years before everyone would get
used to the .. form. So maybe we allow - but warn if not backslashed.
Larry
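To make the proposed change concrete, here is a minimal sketch of how a simple Perl 5 bracketed class could be rewritten mechanically into the .. form. The helper name p5_class_to_p6 is hypothetical, and the sketch assumes only literals, backslash escapes and x-y ranges (no POSIX classes); it also assumes that a literal '-' or '.' should come out backslashed in the new form.

```python
def p5_class_to_p6(cls: str) -> str:
    """Translate a simple Perl 5 class such as '[^a-z]' into '<-[a..z]>'.

    Hypothetical helper: handles literals, backslash escapes and x-y ranges only.
    """
    if not (cls.startswith("[") and cls.endswith("]")):
        raise ValueError("expected a bracketed character class")
    body = cls[1:-1]
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    # Split the body into items; a backslash escape counts as one item.
    items = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            items.append(body[i:i + 2])
            i += 2
        else:
            items.append(body[i])
            i += 1

    def lit(item: str) -> str:
        # Assumption: literal '-' and '.' are backslashed in the new form,
        # so they can never be mistaken for a range operator.
        return "\\" + item if item in ("-", ".") else item

    out = []
    i = 0
    while i < len(items):
        # An unescaped '-' with an item on each side is a Perl 5 range.
        if i + 2 < len(items) and items[i + 1] == "-":
            out.append(lit(items[i]) + ".." + lit(items[i + 2]))
            i += 3
        else:
            out.append(lit(items[i]))
            i += 1
    return "<" + ("-" if negated else "") + "[" + "".join(out) + "]>"


print(p5_class_to_p6("[^a-z]"))  # <-[a..z]>
```

Note that a Perl 5 class containing a literal dot, such as [a.z], has to come out with the dot escaped; otherwise two adjacent literal dots would silently turn into a range.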
I don't see why the old syntax has to be supported at all.
Lots of other regexp details are already being changed, such as the
bounding '<>' and the removal of the leading internal '^', so people
already have to edit their regexps. So they can replace the '-' too
while they're at it; not very difficult.
Moreover, I often create character classes that have a literal '-' in
them, and it would be nice to not have to make that the last character
in the class for it to parse properly.
Also, the '..' is easy to learn because it is consistent with other
parts of Perl 6. Likewise, the consistency is another plus when
demonstrating what is good about Perl to folk who don't use it.
-- Darren Duncan
So, <[a.z]> matches "a", ".", and "z",
while <[a..z]> matches characters "a" through "z" inclusive.
I think that works for me. I'll implement it that way (and yes, there
*are* updates to PGE coming very soon!).
I guess I can't complain too loudly about ".." over "-" for ranges
since I was the one who suggested replacing "," with ".." in quantifiers
(e.g., {1..3} instead of {1,3}). Not that I'd be complaining anyway. :-)
> The only problem with requiring .. is
> that people *will* write <[a-z]> out of habit, and we would probably
> have to outlaw the - form for many years before everyone would get
> used to the .. form. So maybe we allow - but warn if not backslashed.
Just to make sure I have it right, by "allow -" you mean that
<[a-z]> matches "a", "-", and "z" and produces a warning
about an unescaped '-'?
Pm
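As a rough illustration of the semantics described above (not PGE's actual implementation), the following sketch expands the inside of a <[...]> class into a set of characters. The function name expand_class is made up for this example, and it assumes that a backslash escapes exactly one literal character, that '..' between two items is an inclusive range by code point, that a lone '.' is literal, and that an unescaped '-' is literal but warned about.

```python
import warnings


def expand_class(body: str) -> set:
    """Expand the inside of a proposed <[...]> class into a set of characters.

    Hypothetical helper; escapes like \\x20 or \\n are NOT interpreted.
    """
    # Tokenise: each item is (character, was_escaped).
    items = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            items.append((body[i + 1], True))
            i += 2
        else:
            items.append((body[i], False))
            i += 1

    chars = set()
    i = 0
    while i < len(items):
        ch, escaped = items[i]
        # A range is: item, two unescaped dots, item.
        is_range = (
            i + 3 < len(items)
            and items[i + 1] == (".", False)
            and items[i + 2] == (".", False)
        )
        if is_range:
            hi = items[i + 3][0]
            chars.update(chr(c) for c in range(ord(ch), ord(hi) + 1))
            i += 4
        else:
            if ch == "-" and not escaped:
                warnings.warn("unescaped '-' in character class is a literal hyphen")
            chars.add(ch)
            i += 1
    return chars


print(sorted(expand_class("a.z")))   # ['.', 'a', 'z']
print(len(expand_class("a..z")))     # 26
```

Under these assumptions, expand_class("a-z") returns the three characters "a", "-" and "z" and emits one warning, which is the reading asked about above.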
> So, <[a.z]> matches "a", ".", and "z",
> while <[a..z]> matches characters "a" through "z" inclusive.
I was going to say that that was inconsistent, but since you never need
to repeat a letter in a character class, well, I guess it isn't. But
the first person to write <[a...]> gets what's comin' to 'em.
Regards,
David
--
David Wheeler
President, Kineticode, Inc.
Given ASCII, <[\x20...]> would then be everything except control
characters. Handy!
By the way, does ...5 mean -Inf..5? ;)
Juerd
> On Thu, 2005-04-14 at 21:32 -0700, David Wheeler wrote:
> > On Apr 14, 2005, at 7:06 PM, Patrick R. Michaud wrote:
> >
> > > So, <[a.z]> matches "a", ".", and "z",
> > > while <[a..z]> matches characters "a" through "z" inclusive.
> >
> > I was going to say that that was inconsistent, but since you never need
> > to repeat a letter in a character class, well, I guess it isn't. But
> > the first person to write <[a...]> gets what's comin' to 'em.
>
> A silly question: is there a canonical character set from which we
> extract these ranges? Are we hard-coding Unicode here, or is there some
> way for the user to specify the character set for ranges?
>
<delurk>
even sillier question:
if <[a.z]> matches "a", "." and "z"
and <[a...]> matches all characters from "a" onward, inclusive (for some
definition of 'all'),
how will the range \x21 .. \x2e be written?
<[!..\.]>? (i.e. with the "." escaped?)
</delurk>
braňo
I was assuming from Larry's mail that <[a...]> would parse as either:
1) a character class containing the range from 'a' to '.' (what that
means is a bit mind-bending for a friday afternoon)
2) a character class containing 'a' then a range from '.' to... oh, an
error
Which way might be ambiguous, but could of course be defined in the
grammar. It hadn't occurred to me that ... for the range to infinity would
be allowed or useful here. I suppose it could just mean 'up to the end of
the available codepoints'.
I do love the idea of <[a..f]> type ranges though. It's just what the
three dots mean that's got me confused.
At the moment, PGE (the part that implements the rule engine) is
deferring such questions to Parrot, and otherwise assuming Unicode.
Plus, S02 explicitly indicates that Perl is written in Unicode
and has consistent Unicode semantics, so I think that's what we should
go with. It's certainly the way the compiler will go, at least
initially.
Pm
I think, if we bear in mind, as has been stressed previously, that
many changes to regular expressions have been introduced and
require users to adapt accordingly, it doesn't seem
unreasonable to require a double-dot instead of a hyphen; it also
fits the "Principle of least surprise" nicely, in my opinion.
Nevertheless, as mentioned by David, <[a...]> would be rather
confusing, first to people and second to the compiler; although,
regardless of whether we assume dot precedes double-dot or vice versa,
I'd expect an expansion to be enforced, perhaps
accompanied by a warning.
I agree on a warning upon non-escaped hyphen.
Steven
> But the first person to write <[a...]> gets what's comin' to 'em.
Is that nothing (since '.' lt 'a'), or everything after 'a'?
-- Rod Adams
Might as well make it everything after 'a' for consistency. One could
also view the last dot as a special version of the ordinary "any" dot,
and read it "a to whatever".
Larry
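The two readings of <[a...]> can be compared by code point. This is only a sketch: closed_range and open_range are hypothetical names, and the upper limit of the open-ended range is an assumption (Unicode's maximum code point by default, or 0x7F to imitate 7-bit ASCII).

```python
import sys


def closed_range(lo: str, hi: str) -> range:
    """Code points from lo to hi inclusive; empty if hi sorts before lo."""
    return range(ord(lo), ord(hi) + 1)


def open_range(lo: str, limit: int = sys.maxunicode) -> range:
    """Code points from lo up to an assumed upper limit (default: Unicode max)."""
    return range(ord(lo), limit + 1)


# Reading 1: 'a' .. '.' is empty, because '.' (0x2E) sorts before 'a' (0x61).
print(len(closed_range("a", ".")))   # 0

# Reading 2: 'a' and everything after it, capped here at 7-bit ASCII.
print(len(open_range("a", 0x7F)))    # 31
```

With the same 7-bit cap, open_range("\x20", 0x7F) covers 96 code points and still includes DEL (0x7F), so the open-ended form is close to, but not exactly, "everything except control characters".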
> -----Original Message-----
> From: Paul Hodges [mailto:ydb...@yahoo.com]
> Sent: Sunday, April 17, 2005 1:30 PM
> To: Larry Wall; perl6-l...@perl.org
> Subject: Re: should we change [^a-z] to <-[a..z]> instead of <-[a-z]>?
>
>
> --- Larry Wall <la...@wall.org> wrote:
> . . .
> > <-[a..z]>
> >
> > should be allowed/encouraged/required. It greatly improves the
> > readability in my estimation. The only problem with requiring .. is
> > that people *will* write <[a-z]> out of habit, and we would probably
> > have to outlaw the - form for many years before everyone would get
> > used to the .. form. So maybe we allow - but warn if not
> > backslashed.
>
> In general, I think this is a great idea, but what exactly do you mean
> by "warn if not backslashed"? That I'd get a warning *any* time I use a
> dash in a character class? I guess I can live with that.
On the other hand, you can use the canonical perl 5 trick of having the
dash be the first character in the class if you want to use a literal dash.
Joe Gottman.
I think that if we're looking for consistency, the default should be to
read it as "a and everything after it". If someone wants "a to
whatever", they should write it <[a..\.]> since it's a pretty odd
fringe case.