
grouping in regex


punit jain

Dec 23, 2012, 8:27:38 AM
to begi...@perl.org
Hi,

I am doing grouping but seeing some weird behavior :-

the strings in a file are like :-
AccessModes = (18,Mail,POP,IMAP,PWD,WebMail,WebSite,Relay,Mobile,FTP,MAPI,TLS,LDAP,WebCAL);
....
...
.
multiple lines

I am checking which lines have both POP and WebMail, as below :-

if( $line =~ /AccessModes\s*=\s*.*(WebMail)*.*(POP).*(WebMail)*.*/ ) {
if(defined $2) {
print "$2 $1"."$line"."\n";
}
}


However I get these below :-

POPUse of uninitialized value in concatenation (.) or string at
test.pl line 283, <GEN85> line 2.
POPUse of uninitialized value in concatenation (.) or string at
test.pl line 283, <GEN86> line 2.
POPUse of uninitialized value in concatenation (.) or string at
test.pl line 283, <GEN87> line 2.

Any clue why ?

Regards.

Ken Slater

Dec 23, 2012, 11:01:36 AM
to punit jain, begi...@perl.org
Hi,
Your basic mistake is making the WebMail strings you are searching for
optional by having an asterisk (*) following them. This means zero or
more instances of the strings. Thus once it matches POP, it is quite
happy to match zero instances of WebMail. I included some code below
to demonstrate:

use strict;
use warnings;
use Data::Dumper;

my @regExps = (
    # Original
    qr/AccessModes\s*=\s*.*(WebMail)*.*(POP).*(WebMail)*.*/,
    # Here we make the second WebMail non-optional (removed following *)
    # so it will match POP with a WebMail later on the line
    qr/AccessModes   # literal match
       \s*           # optional whitespace
       =             # literal equal sign
       \s*           # optional whitespace
       .*            # match as many characters as we can
       (WebMail)*    # literal WebMail, but this is optional
                     # Therefore, it is skipped if POP is found first
       .*            # match as many characters as we can
       (POP)         # match literal POP
       .*            # match as many characters as we can
                     # until we see WebMail
       (WebMail)     # match literal WebMail
      /x,
    # Make the matches non-optional and check for both strings
    # at the same time using 'or' (|).
    # This is probably the method you want to use.
    qr/AccessModes   # literal match
       \s*           # optional whitespace
       =             # literal equal sign
       \s*           # optional whitespace
       .*            # match as many characters as we can
                     # until we find WebMail or POP
       (WebMail|POP) # literal WebMail or POP
       .*            # match as many characters as we can;
                     # this .* is greedy, so the second group
                     # captures the last occurrence on the line
       (WebMail|POP) # literal WebMail or POP
      /x
);

while (my $line = <DATA>) {
    print $line;
    for my $re ( @regExps ) {
        print " RE: $re\n";
        if( $line =~ /$re/) {
            print " 1: >$1<\n" if defined($1);
            print " 2: >$2<\n" if defined($2);
            print " 3: >$3<\n" if defined($3);
        }
    }
}

__DATA__
AccessModes = (POP);
AccessModes = (18,Mail,POP,IMAP,PWD,WebMail,WebSite,Relay,Mobile,FTP,MAPI,TLS,LDAP,WebCAL);
AccessModes = (18,Mail,IMAP,PWD,WebMail,WebSite,POP,Relay,Mobile,FTP,MAPI,TLS,LDAP,WebCAL);


HTH, Ken
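
The zero-repetition behaviour described above can be checked in isolation. The following is only a minimal sketch: the sample line and the helper name describe_captures are made up for illustration, and the regex is the original one from the question.

```perl
use strict;
use warnings;

# Hypothetical sample line, modelled on the data in the question.
my $line = 'AccessModes = (18,Mail,POP,IMAP,PWD,WebMail,WebSite);';

# Returns one label per capture group: the captured text, or the
# string 'undef' when the group matched zero times.
# (describe_captures is an illustrative name, not part of the thread.)
sub describe_captures {
    my ($text) = @_;
    return unless $text =~ /AccessModes\s*=\s*.*(WebMail)*.*(POP).*(WebMail)*.*/;
    return map { defined $_ ? $_ : 'undef' } ($1, $2, $3);
}

print join(' ', describe_captures($line)), "\n";   # prints "undef POP undef"
```

Both optional groups are left undefined even though WebMail is on the line, because the greedy .* around them is free to swallow it. That undefined $1 is what triggers the warning.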

Danny Gratzer

Dec 23, 2012, 1:37:09 PM
to Ken Slater, punit jain, begi...@perl.org
Shouldn't that .* be .*? to avoid having it consume everything?

--
Danny Gratzer

John W. Krahn

Dec 23, 2012, 5:19:14 PM
to Perl Beginners
Danny Gratzer wrote:
> Shouldn't that .* be .*? to avoid having it consume everything?
>

It is not clear exactly which .* you are referring to; however, a
non-greedy match does not necessarily consume less than a greedy match.
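
As a rough illustration of that point, the sketch below compares a greedy and a non-greedy prefix capture. The sample strings and the helper name captured_prefixes are assumptions made for the example, not taken from the thread.

```perl
use strict;
use warnings;

# Returns what a greedy and a non-greedy .* capture in front of POP.
# (captured_prefixes is an illustrative name.)
sub captured_prefixes {
    my ($text) = @_;
    my ($greedy) = $text =~ /^(.*)POP/;
    my ($lazy)   = $text =~ /^(.*?)POP/;
    return ($greedy, $lazy);
}

# One POP on the line: both forms are forced to stop at the same place.
my ($g1, $l1) = captured_prefixes('18,Mail,POP,IMAP');
print "greedy >$g1< lazy >$l1<\n";   # greedy >18,Mail,< lazy >18,Mail,<

# Two POPs on the line: only now do the two forms differ.
my ($g2, $l2) = captured_prefixes('POP,IMAP,POP');
print "greedy >$g2< lazy >$l2<\n";   # greedy >POP,IMAP,< lazy ><
```

Greediness only decides which alternative is tried first. When the rest of the pattern can match in just one place, both forms end up consuming the same text.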



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein

Zachary Bornheimer

Dec 23, 2012, 5:43:13 PM
to punit jain, begi...@perl.org
For this error, try something like this:

Take note of any regex changes that you were advised in other emails

if ( $line =~ /AccessModes\s*=\s*.*(WebMail)*.*(POP).*(WebMail)*.*/ ) {
if (defined $2 && $2 && $1 ) {
print $2 . " " . $1 . $line . "\n";
}
}

Make sure that both captures hold a value before you print them. $2 is
always defined when this regex matches, because (POP) is not optional, but
$1 comes from (WebMail)* and stays undefined whenever that group matches
zero times. An undefined $1 is simply false in the test above, so no
separate defined check is needed for it. (Note, I didn't make this my
style, but I did change it a little).


## Z. Bornheimer



shawn wilson

Dec 23, 2012, 6:00:19 PM
to Zachary Bornheimer, punit jain, begi...@perl.org
It might be better to use a named capture here, since you're expecting
certain things and then you don't have to check defined.
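
A named-capture version might look roughly like the sketch below. It assumes Perl 5.10 or later, and the group names pop and webmail, the helper name modes_in_order and the sample lines are all invented for illustration. Both groups are mandatory, so they are always set when the match succeeds, but this form only finds POP followed later by WebMail.

```perl
use 5.010;    # named captures need Perl 5.10 or later
use strict;
use warnings;

# Returns ('POP', 'WebMail') when POP appears before WebMail on an
# AccessModes line, or an empty list otherwise.
# (modes_in_order is an illustrative name.)
sub modes_in_order {
    my ($line) = @_;
    return unless $line =~ /AccessModes\s*=.*\b(?<pop>POP)\b.*\b(?<webmail>WebMail)\b/;
    return ($+{pop}, $+{webmail});
}

for my $line (
    'AccessModes = (18,Mail,POP,IMAP,PWD,WebMail,WebSite);',
    'AccessModes = (18,Mail,WebMail,POP);',    # wrong order: no match
) {
    my @found = modes_in_order($line);
    print @found ? "@found\n" : "no match\n";
}
```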

Rob Dixon

Dec 24, 2012, 5:44:03 AM
to begi...@perl.org, punit jain
You would be better off using a look-ahead, which doesn't care what
order the two options appear in the list. The program below shows an
example.

Rob

use strict;
use warnings;

while (my $line = <DATA>) {
if ($line =~ /AccessModes\s*=\s*(?=.*(\bPOP\b))(?=.*(\bWebMail\b))/) {
print "$1 $2\n";
}
}


__DATA__
AccessModes = (18,Mail,POP,IMAP,PWD,WebMail,WebSite,Relay,Mobile,FTP,MAPI,TLS,LDAP,WebCAL);
AccessModes = (18,Mail,IMAP,PWD,WebMail,WebSite,Relay,Mobile,FTP,MAPI,TLS,LDAP,POP,WebCAL);

**output**

POP WebMail
POP WebMail

Paul Johnson

Dec 24, 2012, 8:08:13 AM
to punit jain, begi...@perl.org
It's unclear to me why, having specified what you are searching for, you
then ask what was found. The only reason I can see for doing that would
be to find out in which order you found the items.

In any case, the easiest way to find out whether two substrings appear
in the same string is to program the way you define the problem:

if (/WebMail/ && /POP/) { ... }
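
Spelled out as a small runnable sketch, that test could look like this. The helper name has_both and the sample lines are hypothetical, and the \b word boundaries are an added assumption so that a longer word containing POP is not counted.

```perl
use strict;
use warnings;

# Returns 1 when the line mentions both WebMail and POP as whole
# words, in either order, and 0 otherwise.
# (has_both is an illustrative name.)
sub has_both {
    my ($line) = @_;
    return $line =~ /\bWebMail\b/ && $line =~ /\bPOP\b/ ? 1 : 0;
}

for my $line (
    'AccessModes = (18,Mail,POP,IMAP,PWD,WebMail,WebSite);',
    'AccessModes = (18,Mail,WebMail,WebSite,POP);',
    'AccessModes = (POP);',
) {
    print has_both($line) ? "both: $line\n" : "not both: $line\n";
}
```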

--
Paul Johnson - pa...@pjcj.net
http://www.pjcj.net

Rob Dixon

Dec 24, 2012, 3:01:11 PM
to begi...@perl.org, Paul Johnson
Hi Paul

I think

> why, having specified what you are searching for, you then ask what was found

could be expressed better, or at least needs an explanation.

Rob




Paul Johnson

Dec 25, 2012, 2:30:01 PM
to Rob Dixon, begi...@perl.org
On Mon, Dec 24, 2012 at 08:01:11PM +0000, Rob Dixon wrote:
> On 24/12/2012 13:08, Paul Johnson wrote:
> >On Sun, Dec 23, 2012 at 06:57:38PM +0530, punit jain wrote:
> >>I am seeing which lines have both POP and Webmail as below :-
> >>
> >>if( $line =~ /AccessModes\s*=\s*.*(WebMail)*.*(POP).*(WebMail)*.*/ ) {
> >> if(defined $2) {
> >> print "$2 $1"."$line"."\n";
> >> }
> >> }

> Hi Paul
>
> I think
>
> >why, having specified what you are searching for, you then ask what was found
>
> could be expressed better, or at least needs an explanation.

Yes, you're probably correct.

My point was that the captured expressions are constants: "WebMail",
"POP" and "WebMail" again. So if $1, $2 and $3 are defined then you
already know what their values are.

So, once you know that the regular expression has matched, your only
interest in $1, $2 and $3 is to see whether $1 and $3 are defined which
will tell you the order of the items on the line. My guess is that this
is not important to know.

So the "specified what you are searching for" part is the constant
captured expressions, and the "ask what was found" bit is examining $1,
$2 and $3.
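
If the order did turn out to matter, it could be read off without capture groups at all. The sketch below uses the built-in index function. The helper name mode_order and its return strings are made up for illustration, and index does a plain substring search, so it does not respect word boundaries.

```perl
use strict;
use warnings;

# Reports which of the two words comes first on the line, or
# 'missing' when either one is absent.
# (mode_order and its return strings are illustrative.)
sub mode_order {
    my ($line) = @_;
    my $pop     = index $line, 'POP';
    my $webmail = index $line, 'WebMail';
    return 'missing' if $pop < 0 || $webmail < 0;
    return $pop < $webmail ? 'POP first' : 'WebMail first';
}

print mode_order('AccessModes = (18,Mail,POP,IMAP,PWD,WebMail);'), "\n";   # POP first
print mode_order('AccessModes = (18,Mail,WebMail,WebSite,POP);'), "\n";    # WebMail first
print mode_order('AccessModes = (POP);'), "\n";                            # missing
```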