
regexp quickie

Phil Carmody

Jun 27, 2007, 8:45:54 AM
to go...@perl.org
Say I had a string satisfying /^[A-Z_]{6}$/, but not equal to '______' and I
wish to extract from that the 1 or 2 letters which are closest to the n-th
character in the string. Is there a simple regexp to perform that task?

e.g.
if the string=A_Z_K_ then:
if n=1, then I want 'A' (or 'AA', not fussed)
if n=2, then I want 'AZ'
if n=3, then I want 'Z' (or 'ZZ', not fussed)
if n=4, then I want 'ZK'
if n=5 or 6, then I want 'K' (or 'KK', not fussed)

I can see how to do it with the concatenation of two matches from two substrs,
but that's barely simpler than a naive loop over each character forwards and
backwards.
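A minimal sketch of the naive forwards-and-backwards loop mentioned above. It assumes n is 1-based, as in the examples, and returns a doubled letter such as 'AA' when a letter sits exactly at position n, which the examples say is acceptable. The sub name closest_by_loop is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): scan backwards and then forwards
# from index $n - 1, keeping the nearest letter found in each direction.
sub closest_by_loop {
    my ($s, $n) = @_;
    my $r = $n - 1;                      # convert 1-based n to a 0-based index
    my ($before, $after) = ('', '');
    for my $i (reverse 0 .. $r) {        # backwards: last letter at or before $r
        my $c = substr($s, $i, 1);
        if ($c ne '_') { $before = $c; last }
    }
    for my $i ($r .. length($s) - 1) {   # forwards: first letter at or after $r
        my $c = substr($s, $i, 1);
        if ($c ne '_') { $after = $c; last }
    }
    return $before . $after;
}

print join(' ', map { closest_by_loop('A_Z_K_', $_) } 1 .. 6), "\n";
# prints: AA AZ ZZ ZK KK K
```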

Phil

() ASCII ribbon campaign () Hopeless ribbon campaign
/\ against HTML mail /\ against gratuitous bloodshed

[stolen with permission from Daniel B. Cristofani]

Ronald J Kimball

Jun 27, 2007, 11:01:36 AM
to Phil Carmody, go...@perl.org
On Wed, Jun 27, 2007 at 05:45:54AM -0700, Phil Carmody wrote:
> Say I had a string satisfying /^[A-Z_]{6}$/, but not equal to '______'
> and I wish to extract from that the 1 or 2 letters which are closest to
> the n-th character in the string. Is there a simple regexp to perform
> that task?
>
> e.g.
> if the string=A_Z_K_ then:
> if n=1, then I want 'A' (or 'AA', not fussed)
> if n=2, then I want 'AZ'
> if n=3, then I want 'Z' (or 'ZZ', not fussed)
> if n=4, then I want 'ZK'
> if n=5 or 6, then I want 'K' (or 'KK', not fussed)
>
> I can see how to do it with the concatenation of two matches from two
> substrs, but that's barely simpler than a naive loop over each character
> forwards and backwards.

Well, I wouldn't exactly call this regex simple... But I have come up with
one that does it:

for (qw/ A_Z_K_ A_____ _____K /) {
    print "$_\n";
    for my $n (1 .. 6) {
        my $r = $n - 1;
        print "$n: ";
        /^(?(?=.{0,$r}[A-Z]).{0,$r}|.*)([A-Z])(?(?<!^..{$r}).*?([A-Z]|$))/
            && print "$1 $2";
        print "\n";
    }
}

This has the advantage of always putting the matched characters in $1 and
$2. (Note that $1 is always set; if there is no letter at or before the
position, $1 will contain the first letter after the position and $2 will
be empty.)
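The regex above may be easier to follow when spread out with the /x modifier. The sketch below is intended to behave exactly like the one-liner; the wrapper name kimball_match is hypothetical. Its comments avoid `$` and `/` because, inside a /x pattern, comments are still scanned for interpolated variables and for the closing delimiter.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex above, laid out with /x.
# Returns (capture 1, capture 2) for a 1-based position $n.
sub kimball_match {
    my ($s, $n) = @_;
    my $r = $n - 1;
    return unless $s =~ /
        ^
        (?(?=.{0,$r}[A-Z])   # IF a letter lies at or before index r,
            .{0,$r}          #   skip greedily, backtracking to the last such letter,
          | .*               # ELSE skip ahead greedily
        )
        ([A-Z])              # capture 1: the letter landed on
        (?(?<!^..{$r})       # IF that letter is not at index r itself,
            .*?([A-Z]|$)     #   capture 2: the next letter, or empty at end of string
        )
    /x;
    return ($1, $2);
}
```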


Here are two other approaches:

(/^.{$r}([A-Z])/ || /^.{0,$r}([A-Z]).*?([A-Z]|$)/ || /^.*([A-Z])/)
    && print "$1 $2";
is simpler, but uses three separate regular expressions.

/^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*([A-Z])/
    && print $1 || $4 || "$2 $3";
uses a single regular expression, but the results will be in $1, or in $2
and $3, or in $4. (And if digits were allowed the print logic would need
to be modified.)
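If a newer perl is available (an assumption: the branch-reset group `(?|...)` arrived in Perl 5.10, after this 2007 thread), the single-regex version can keep its results in $1 and $2, because each alternative inside `(?|...)` numbers its captures from the same starting point. The caller can then test with `defined` rather than truthiness, which sidesteps the digit caveat. In this sketch the last alternative uses a non-greedy `.*?`, so it picks the first letter after position n; a greedy `.*` picks the last letter in the string, which is the same letter only when just one letter follows n. The sub name closest_branch_reset is hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

# Hypothetical helper: one regex, results always in capture 1 and capture 2.
sub closest_branch_reset {
    my ($s, $n) = @_;
    my $r = $n - 1;
    return unless $s =~ /
        ^(?|
            .{$r}([A-Z])                      # a letter exactly at index r
          | .{0,$r}([A-Z]) .*? ([A-Z]|$)      # last letter before index r, then the next letter or end
          | .*?([A-Z])                        # no letter up to index r: first letter after it
        )
    /x;
    return ($1, $2);    # capture 2 is undef for the first and third branches
}
```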


Ronald

Phil Carmody

Jun 27, 2007, 12:53:49 PM
to go...@perl.org
--- Ronald J Kimball <rjk-pe...@tamias.net> wrote:
> On Wed, Jun 27, 2007 at 05:45:54AM -0700, Phil Carmody wrote:
> > Say I had a string satisfying /^[A-Z_]{6}$/, but not equal to '______'
> > and I wish to extract from that the 1 or 2 letters which are closest to
> > the n-th character in the string. Is there a simple regexp to perform
> > that task?
>
> Well, I wouldn't exactly call this regex simple... But I have come up with
> one that does it:
...
> /^(?(?=.{0,$r}[A-Z]).{0,$r}|.*)([A-Z])(?(?<!^..{$r}).*?([A-Z]|$))/

> /^.{$r}([A-Z])/ || /^.{0,$r}([A-Z]).*?([A-Z]|$)/ || /^.*([A-Z])/


> /^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*([A-Z])/

Whoa! I'm glad that's complicated, as I don't feel so bad just doing it the
naive way. (I chose the loop, rather than joining the results of two substrs,
one using a /^ match and the other using a $/.)
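For comparison, the two-substr approach mentioned above could be sketched like this, assuming 1-based n: a ^-anchored match on the tail starting at index r finds the letter at or after the position, a $-anchored match on the head ending at index r finds the letter at or before it, and the two are joined. The helper name closest_by_substr is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join the results of two anchored matches on two substrs.
sub closest_by_substr {
    my ($s, $n) = @_;
    my $r = $n - 1;
    # Tail (index r onwards): its first letter, via a ^-anchored match.
    my ($after)  = substr($s, $r) =~ /^_*([A-Z])/;
    # Head (indices 0 .. r): its last letter, via a $-anchored match.
    my ($before) = substr($s, 0, $r + 1) =~ /([A-Z])_*$/;
    # Either side may be undef when no letter exists in that direction.
    return join '', grep { defined } $before, $after;
}

print join(' ', map { closest_by_substr('A_Z_K_', $_) } 1 .. 6), "\n";
# prints: AA AZ ZZ ZK KK K
```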

I have another idea for how to solve the problem of isolating the (1 or) 2
closest real letters:

In the A_Z_K_ case, for n=2 (r=1), an equally useful output could be
A_Z___
That is, replace every letter beyond the first letter at or after char r with
an underscore. Two passes would be necessary: one to strip what is before,
and one to strip what is after.
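The two-pass masking idea might be sketched as follows, assuming 1-based n: the first pass turns every letter after the first letter at or after index r into an underscore, and the second does the same for every letter before the last letter at or before index r. The helper name mask_to_closest is hypothetical, and using @- and @+ to locate the offsets is one choice among several.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the nearest letter(s) to 1-based position $n,
# turning every other letter into an underscore.
sub mask_to_closest {
    my ($s, $n) = @_;
    my $r = $n - 1;

    # Pass 1: blank every letter after the first letter at or after index r.
    if ($s =~ /^.{$r}.*?[A-Z]/) {
        my $end = $+[0];                       # offset just past that letter
        (my $tail = substr($s, $end)) =~ tr/A-Z/_/;
        substr($s, $end) = $tail;
    }

    # Pass 2: blank every letter before the last letter at or before index r.
    if ($s =~ /^.{0,$r}([A-Z])/) {
        my $start = $-[1];                     # offset of that letter
        (my $head = substr($s, 0, $start)) =~ tr/A-Z/_/;
        substr($s, 0, $start) = $head;
    }
    return $s;
}

print mask_to_closest('A_Z_K_', 2), "\n";    # prints A_Z___
```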

Of course, I can probably simplify things by handling the cases of when there
is and is not a letter at the point of interest separately. If there is a
letter there, the answer is already found.

It's not a big deal, I was just noodling...

Thanks for the rather heroic regexps though!
Phil

Roy Johnson

Jun 27, 2007, 3:26:35 PM
to go...@perl.org
Mine (borrowing RJK's testing code): $1 will be the last letter (non-underscore) before or at the target location; $2 will be the first letter at or after the target location, or the last letter if no such letter exists.

for (qw/ A_Z_K_ A_____ _____K /) {
    print "$_\n";
    for my $n (1 .. 6) {
        my $r = $n - 1;
        print "$n: ";

        print /^(?=.{0,$r}([^_]))?.{0,$r}.*?([^_])/
            ? "[$1] ($2)" : "no match";
        print "\n";
    }
}

Ronald J Kimball

Jun 27, 2007, 3:38:21 PM
to Roy.J...@shell.com, go...@perl.org

I wish I'd thought of doing it that way!

If you want $2 to only ever be the first letter at or after the target
location, you can just tweak the second half of the regex:

/^(?=.{0,$r}([^_]))?(?:.{$r}.*?([^_]))?/

Ronald

Roy Johnson

Jun 27, 2007, 3:48:39 PM
to go...@perl.org
> If you want $2 to only ever be the first letter at or after the target
> location, you can just tweak the second half of the regex:
>
> /^(?=.{0,$r}([^_]))?(?:.{$r}.*?([^_]))?/

Save a stroke:
/^(?=.{0,$r}([^_]))?(?:.{$r,}?([^_]))?/
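An equivalent form moves the optional `?` inside the lookahead, so the lookahead itself always succeeds and merely records a letter when one exists; this avoids quantifying a zero-width assertion, which perldiag's "Quantifier unexpected on zero-length expression" entry advises against. Below is a sketch of the regex above rearranged that way, inside a hypothetical helper nearest_pair that assumes 1-based n. Because both halves are optional and the pattern is anchored, the match always succeeds, and a missing letter on either side simply leaves that capture undef.

```perl
use strict;
use warnings;

# Hypothetical helper: the regex above with the optional quantifier moved
# inside the lookahead, which therefore always succeeds.
sub nearest_pair {
    my ($s, $n) = @_;
    my $r = $n - 1;
    $s =~ /^(?=(?:.{0,$r}([^_]))?)(?:.{$r,}?([^_]))?/;   # cannot fail
    # capture 1: last letter at or before index r (undef if none)
    # capture 2: first letter at or after index r (undef if none)
    return ($1, $2);
}
```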

Roy

Phil Carmody

Jun 27, 2007, 6:18:13 PM
to go...@perl.org

You guys amaze me! (And gals, too, in case Abigail reads this list, in which
case she might deduce that it is for RoRoRo's EinStein bot!)

Phil

Shlomi Fish

Jun 28, 2007, 5:29:43 AM
to go...@perl.org, Phil Carmody
On Thursday 28 June 2007, Phil Carmody wrote:
> --- Roy.J...@shell.com wrote:
> > Mine (borrowing RJK's testing code): $1 will be the last letter
> > (non-underscore) before or at the target location; $2 will be the first
> > letter at or after the target location, or the last letter if no such
> > letter exists.
> >
> > for (qw/ A_Z_K_ A_____ _____K /) {
> >     print "$_\n";
> >     for my $n (1 .. 6) {
> >         my $r = $n - 1;
> >         print "$n: ";
> >         print /^(?=.{0,$r}([^_]))?.{0,$r}.*?([^_])/
> >             ? "[$1] ($2)" : "no match";
> >         print "\n";
> >     }
> > }
>
> You guys amaze me! (And gals, too, in case Abigail reads this list, in
> which case she might deduce that it is for RoRoRo's EinStein bot!)
>

Sorry to disappoint you, but Abigail is a guy (at least in the context of
Perl):

{{{{{{{{{
Sep 18 22:11:57 <rindolf> fxn: Abigail is a guy, right?
Sep 18 22:12:23 <Yaakov> Abigail is a very Dutch, very male person.
Sep 18 22:12:35 <fxn> rindolf: yes
Sep 18 22:13:49 <rindolf> fxn: is Abigail his pseudonym?
Sep 18 22:14:25 <fxn> rindolf: I don't know, his name in the conference was Abigail
Sep 18 22:14:51 <fxn> rindolf: that's a picture by cog: http://www.flickr.com/photos/cogurov/42641335/in/photostream/
Sep 18 22:14:52 <shorten> fxn's url is at http://xrl.us/hnmc
}}}}}}}}}

It is indeed a feminine name, though, at least in its Hebrew origin:

http://en.wikipedia.org/wiki/Abigail

Regards,

Shlomi Fish

--

---------------------------------------------------------------------
Shlomi Fish shl...@iglu.org.il
Homepage: http://www.shlomifish.org/

If it's not in my E-mail it doesn't happen. And if my E-mail is saying
one thing, and everything else says something else - E-mail will conquer.
-- An Israeli Linuxer

Phil Carmody

Jun 28, 2007, 6:52:41 AM
to go...@perl.org
--- Shlomi Fish <shl...@iglu.org.il> wrote:
> On Thursday 28 June 2007, Phil Carmody wrote:
> > You guys amaze me! (And gals, too, in case Abigail reads this list.
>
> Sorry to disappoint you but Abigail is a guy (at least in the context of
> Perl):
...
> http://www.flickr.com/photos/cogurov/42641335/in/photostream/

Oh, that's going to get confusing... I'll have no idea what personal pronoun to
use when referring to hän in the future on littlegolem, Usenet, and here.
Cop-out #1: use Finnish (genderless) pronouns.

Phil