
How to match @ and # character at the beginning


bubunia...@gmail.com

Jul 30, 2015, 10:36:45 PM
Hi all,
I tried to match the ssn with the beginning of special characters as mentioned below (@ or # etc etc). But I am not getting correct output. I tried \b as well but it did not work as expected. Could you please help me in this regard?

#!/usr/bin/perl
use strict;
use warnings;

#test.txt contents:
#I folow #ssn
#I just found 10 #ssn
#I just found 20 @ssn
#good lesson learnt ssn
# hi there

#Output : 4


sub getssncount {
my $count = 0;
open(FILE,"< test.txt") or die $!;
while (<FILE>) {
if ($_ =~ /(\S+\s+\s*ssn)||^(@?=ssn)||(?=ssn)/i) {
print "Reading line : $_ :Match found \n" ;
$count++;
}
}

print "ssn Count: $count";
close(FILE);

}

&getssncount;

gamo

Jul 31, 2015, 12:54:53 AM
On 31/07/15 at 04:36, bubunia...@gmail.com wrote:
> I tried to match the ssn with the beginning of special characters as mentioned below (@ or # etc etc).
> But I am not getting correct output.

Try putting an escape char \ before each special char, i.e. \@ and \#

--
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance

bubunia...@gmail.com

Jul 31, 2015, 1:23:37 AM
I tried putting the escape character as well. But it is giving wrong output.

if ($_ =~ /(\S+\s+\s*ssn)||^(\@\?=ssn)||^(\#\?=ssn)/i) {
$count++;
}

It is printing count=8 which is wrong. But when I tried :
if ($_ =~ /(\S+\s+\s*ssn)/i) {
$count++;
}

It prints 1 which is correct. So I suspect I am doing something wrong in @ part.

Jens Thoms Toerring

Jul 31, 2015, 1:44:29 AM
bubunia...@gmail.com wrote:

> I tried to match the ssn with the beginning of special characters as
> mentioned below (@ or # etc etc). But I am not getting correct output. I
> tried \b as well but it did not work as expected. Could you please help me
> in this regard?

> #!/usr/bin/perl
> use strict;
> use warnings;

> #test.txt contents:
> #I folow #ssn
> #I just found 10 #ssn
> #I just found 20 @ssn
> #good lesson learnt ssn
> # hi there

> #Output : 4

> sub getssncount {
> my $count = 0;
> open(FILE,"< test.txt") or die $!;

You must be reading some rather old tutorial, you better
follow the modern ways and use a "normal" variable for
the file handle and the three-argument form of open():

open my $file, '<', 'test.txt' or die $!;

> while (<FILE>) {

This then becomes

while ( <$file> ) {

> if ($_ =~ /(\S+\s+\s*ssn)||^(@?=ssn)||(?=ssn)/i) {

No idea what this is supposed to do, but I guess you want
matches for all lines that contain 'ssn' (no matter if upper-
or lower-case), if preceded by either a white-space, a '@'
or a '#'. Then it's rather simple:

if ( /[\s@#]ssn/i ) {

Or, if you want all lines where a white-space is followed by
'ssn' with, optionally, a single '#' or '@' between the white-
space and the 'ssn', use

if ( /\s[#@]?ssn/i ) {

There's also no need to refer to '$_'; it's used automatically
for the match if nothing else is given.

> print "Reading line : $_ :Match found \n" ;

Unless you want to have a line-break before the ':Match...'
you'd better call 'chomp' before this - you've read the
line into '$_' from a file and the lines in the file all end
(except perhaps the very last one) in a "\n".

> $count++;
> }
> }
>
> print "ssn Count: $count";

You most likely want a "\n" at the end here.

> close(FILE);

And that would become

close $file;

> }

> &getssncount;

Drop the '&' - Perl knows perfectly well that 'getssncount'
is a subroutine.
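
Putting the remarks above together, a minimal sketch could look like the following. It is not meant as the definitive version: as an assumption made purely for illustration, the sample lines are read from an in-memory string instead of test.txt, and the handle is passed to the subroutine as an argument.

```perl
use strict;
use warnings;

# Count the lines in which 'ssn' follows a white-space, optionally with
# a single '#' or '@' in between. Expects an already opened file handle.
sub getssncount {
    my ($file) = @_;
    my $count = 0;
    while ( my $line = <$file> ) {
        chomp $line;
        if ( $line =~ /\s[#@]?ssn/i ) {
            print "Reading line : $line :Match found\n";
            $count++;
        }
    }
    return $count;
}

# Assumption for illustration: the posted sample lines, held in a string.
my $data = <<'END';
I folow #ssn
I just found 10 #ssn
I just found 20 @ssn
good lesson learnt ssn
 hi there
END

# Three-argument open on a reference reads from the string.
open my $file, '<', \$data or die $!;
my $count = getssncount($file);
close $file;

print "ssn Count: $count\n";
```

With a real file, only the open() line changes to open my $file, '<', 'test.txt' or die $!;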
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

gamo

Jul 31, 2015, 5:00:03 AM
On 31/07/15 at 07:23, bubunia...@gmail.com wrote:
> I tried putting the escape character as well. But it is giving wrong output.
>
> if ($_ =~ /(\S+\s+\s*ssn)||^(\@\?=ssn)||^(\#\?=ssn)/i) {
> $count++;
> }
>
> It is printing count=8 which is wrong. But when I tried :
> if ($_ =~ /(\S+\s+\s*ssn)/i) {
> $count++;
> }
>
> It prints 1 which is correct. So I suspect I am doing something wrong in @ part.

I suspect it is the ? part. That is a special character that normally
should not be escaped.
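
Something else worth checking is the ||. Inside a regex it is not a logical OR but two | with an empty alternative in between, and an empty alternative matches any string, so every line gets counted. A small sketch to show the effect, with the sample lines copied from the first post plus one empty line added for comparison (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Sample lines as posted, plus an empty line for comparison.
my @lines = (
    "I folow #ssn",
    "I just found 10 #ssn",
    "I just found 20 \@ssn",
    "good lesson learnt ssn",
    " hi there",
    "",
);

# The pattern from the follow-up post: the empty alternatives between
# the doubled bars let it match every line, even the empty one.
my $with_double_bar =
    grep { /(\S+\s+\s*ssn)||^(\@\?=ssn)||^(\#\?=ssn)/i } @lines;

# Only the first alternative, as in the second test of that post.
my $first_only = grep { /(\S+\s+\s*ssn)/i } @lines;

print "with ||: $with_double_bar of ", scalar(@lines), " lines match\n";
print "first alternative only: $first_only\n";
```

That would also explain why the count equals the number of lines read, whatever they contain.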

shar...@hotmail.com

Jul 31, 2015, 9:31:40 AM
On Friday, 31 July 2015 08:06:45 UTC+5:30, bubunia...@gmail.com wrote:
> Hi all,
> I tried to match the ssn with the beginning of special characters as mentioned below (@ or # etc etc). But I am not getting correct output. I tried \b as well but it did not work as expected. Could you please help me in this regard?
>

> #test.txt contents:
> #I folow #ssn
> #I just found 10 #ssn
> #I just found 20 @ssn
> #good lesson learnt ssn
> # hi there
>
> #Output : 4
>

>
> if ($_ =~ /(\S+\s+\s*ssn)||^(@?=ssn)||(?=ssn)/i) {
> print "Reading line : $_ :Match found \n" ;
> $count++;
> }
>

ITYM, increment the match count, when you see (case insensitively):
1. a whitespace followed by "ssn", or
2. a w.s., then, an "@" followed by "ssn", or
3. a w.s., then a "#" followed by "ssn".

In perl, they are written as (the @ is escaped so that perl does not
try to interpolate an array named @ssn):
1. /\sssn/i ||
2. /\s\@ssn/i ||
3. /\s#ssn/i

Combining all into one, we arrive at:
if ( /\sssn/i || /\s\@ssn/i || /\s#ssn/i ) {
    $count++;    # increment the counter
}

They can further be condensed, as you've already been instructed before, into:
/\s[@#]?ssn/i

There's a commenting option/modifier in perl regexes, /x,
using which you can document your regexes inline, like so:

/
\s # a whitespace
[@\#]? # followed by either an @ or a hash, one time or not at all (no spaces inside the class: /x keeps them literal there)
ssn # finally followed by an ssn, case insensitively
/ix

Although your data (test.txt) didn't have it, it is nevertheless a good
idea to make your regexes match as tightly as possible. What if your data
had a line like this:
#Should the deadly Sunburn SSN-12 be matched

The regex as it exists will match the above line. If you didn't want it matched,
then make the ssn hug either the end-of-line or a whitespace:

/
\s # a whitespace
[@\#]? # followed by either an @ or a hash, one time or not at all
ssn # followed by an ssn, case insensitively,
( \s | $ ) # the ssn should see either a white sp. or an end of line
/ix
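
To compare the loose and the tight variant, here is a small sketch. The names $loose and $tight and the choice of sample lines are assumptions made for illustration, and the character class is written without spaces because /x does not ignore white space inside [...].

```perl
use strict;
use warnings;

# Loose variant: anything may follow the ssn.
my $loose = qr/\s[@\#]?ssn/i;

# Tight variant: the ssn must be followed by white space or end of line.
my $tight = qr/
    \s              # a whitespace
    [@\#]?          # optionally an at-sign or a hash
    ssn             # then ssn, case insensitively
    (?: \s | $ )    # then a whitespace or the end of the line
/ix;

my @lines = (
    "I just found 20 \@ssn",
    "good lesson learnt ssn",
    "Should the deadly Sunburn SSN-12 be matched",
);

for my $line (@lines) {
    printf "%-45s loose=%d tight=%d\n",
        $line,
        ( $line =~ $loose ? 1 : 0 ),
        ( $line =~ $tight ? 1 : 0 );
}
```

Only the tight variant rejects the SSN-12 line; both accept the two lines taken from test.txt.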

perldoc perlretut and perldoc perlre hold more info on regexes in perl.

John Black

Jul 31, 2015, 10:21:07 AM
In article <d20g9n...@mid.uni-berlin.de>, j...@toerring.de says...
> > open(FILE,"< test.txt") or die $!;
>
> You must be reading some rather old tutorial, you better
> follow the modern ways and use a "normal" variable for
> the file handle and the three-argument form of open():
>
> open my $file, '<', 'test.txt' or die $!;

I have a lot of old scripts that use the open(FILE,"< test.txt") form. Briefly, what is the
drawback to that syntax? Is it worthwhile going back and changing them if people are still
using the scripts? Thanks.

John Black

Jens Thoms Toerring

Jul 31, 2015, 11:49:08 AM
I think this is a rather nice write-up (at least better
than I could do;-) about the advantages of the way it's
recommended to do it nowadays:

http://perlmaven.com/open-files-in-the-old-way

Not that clearly mentioned is that when you use file
globs any name you use will be accepted, so you don't
get all the help 'use strict' will give you when you
make a typo. I.e. if you do

open( XXX, "<file.txt" ) or die $!;

and later

close( XXY );

you will only get a warning that 'XXY' is only used once
(and the file stays open) and not an immediate abort when
the script is started.

But I'm not convinced that the advantages are that big
that going back and changing all old scripts (and perhaps
accidentally breaking them in the process) would be worth
it (unless you have to re-work them anyway).
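
For comparison, a small sketch of what happens with the same kind of typo on a lexical file handle. The misspelled name $fyle is made up for the demonstration, and the snippet is compiled in a string eval only so that the compile error can be printed instead of ending the demo.

```perl
use strict;
use warnings;

# Compile a snippet that misspells its lexical handle ($fyle instead of
# $file). Under 'use strict' this is a compile-time error, so none of
# the snippet runs and the eval returns undef.
my $ok = eval q{
    use strict;
    open my $file, '<', $0 or die $!;
    close $fyle;    # typo
    1;
};
my $error = $@;

print "compiled: ", ( $ok ? "yes" : "no" ), "\n";
print "error: $error";
```

In a normal script the same message would appear at startup and the script would not run at all.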

Best regards, Jens

Rainer Weikusat

Jul 31, 2015, 12:23:31 PM
John Black <jbl...@nospam.com> writes:
> In article <d20g9n...@mid.uni-berlin.de>, j...@toerring.de says...
>> > open(FILE,"< test.txt") or die $!;
>>
>> You must be reading some rather old tutorial, you better
>> follow the modern ways and use a "normal" variable for
>> the file handle and the three-argument form of open():
>>
>> open my $file, '<', 'test.txt' or die $!;
>
> I have a lot of old scripts that use the open(FILE,"< test.txt") form.
> Briefly, what is the drawback to that syntax?

It conflates a data argument possibly coming from outside of the program
and one with very loose syntactical requirements (for UNIX(*), a
filename may contain anything except '/' and "\0") with a control
argument: This means the programmer has to make sure that all characters
perl could possibly interpret are quoted and in return for this, the
interpreter has to parse it in order to recover the control argument and
unquote it again.

Another point would be that it uses a bareword to refer to a
(hopefully) otherwise unused glob in the current package which is used
to store the real filehandle. This means the filehandle is potentially
accessible from anywhere in the program, at best, it can be localized to
ensure that using it in this way won't affect the calling code but it's
still going to be visible to any called code. And its lifetime has to
be managed with explicit code: Unless the glob is either localized or
closed with some close call, the filehandle will just stay open. Using a
lexical variable holding an anonymous glob means it will be closed as
soon as this variable goes out of scope[*].

[*] As opposed to a myth propagated by a popular Linux man page,
checking the return value of a close is a pointless exercise, unless
printing a "Hey, your data just got lost!" message is considered useful
for something. Code which cares about physical writes has to flush
everything manually and use fsync and deal with the consequences before
closing the filehandle.
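
A minimal sketch of the scope-based closing described above, assuming a scratch file created with File::Temp purely for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this demonstration.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
close $tmp;

{
    open my $out, '>', $path or die "Cannot open $path: $!";
    print {$out} "written inside the block\n";
    # No explicit close: $out goes out of scope at the closing brace,
    # which closes the handle and flushes the buffered line.
}

# Reading the file back shows that the line reached the file.
open my $in, '<', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;

print "read back: $line";
```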

shar...@hotmail.com

Jul 31, 2015, 5:34:16 PM
On Friday, 31 July 2015 19:51:07 UTC+5:30, John Black wrote:
[snip]
> >
> > open my $file, '<', 'test.txt' or die $!;
>
> I have a lot of old scripts that use the open(FILE,"< test.txt") form. Briefly, what is the
> drawback to that syntax? Is it worthwhile going back and changing them if people are still
> using the scripts? Thanks.
>
> John Black
>


> open(FILE,"< test.txt") form

This would make perl open the file "test.txt" for you, i.e., the leading space
is dropped.
With the 2-argument form of open you would have a hard time opening
a file actually named " test.txt", i.e., one with a leading space.
Not so with the 3-arg form of open, though,
open FILE, "<", " test.txt" ...

Say, the user supplies the name >test.txt . With the 2-arg form of open
the leading > is taken as the mode, so an existing test.txt can be
overwritten if one is not careful.
my $fn = $ARGV[0]; #user enters ">test.txt"
open(FILE, $fn) ...

Lo & behold, the file "test.txt" will be nulled when all we wanted was to read :-\

So, we see that, depending on the input filenames, the 2-arg open is not reliable.
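
A small sketch of that effect, run against a scratch directory so that nothing real gets clobbered. The directory, the file contents and the variable names are assumptions made for the demonstration, and $fn stands in for the user-supplied $ARGV[0].

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory and file for the demonstration.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/test.txt";

open my $setup, '>', $path or die "Cannot create $path: $!";
print {$setup} "important data\n";
close $setup;
my $size_before = ( stat $path )[7];

# Stand-in for a file name supplied by the user, e.g. via $ARGV[0].
my $fn = ">$path";

# 2-arg open: the leading '>' is taken as the mode, so the file is
# opened for writing and truncated, although we only meant to read it.
open( my $two_arg, $fn ) or die "Cannot open $fn: $!";
close $two_arg;
my $size_after = ( stat $path )[7];

# 3-arg open: the mode is fixed, so $fn is only ever a file name. No
# file with that literal name exists here, so the open simply fails.
my $three_arg_ok = open( my $three_arg, '<', $fn ) ? 1 : 0;

print "size before: $size_before, after 2-arg open: $size_after\n";
print "3-arg open succeeded: $three_arg_ok\n";
```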

What's more, the 3-arg form is visually very appealing. The purpose and argument
stand out clearly, whatever be the filenames.

This is one half of the problem. The other half comes from the 1st arg.
to open, the filehandle (FILE in this case). Notice that FILE is a "bareword".
A subroutine is also one. So if an unsuspecting user/maintainer were to add
a subroutine named FILE then things can get topsy turvy, & insidious, if it
happens to return a filehandle which happens to be used in your code.

Since filehandles are typically global (unless localized), they are open to
modification. A subroutine might open FILE in its scope & the coder closes it as
well in the subroutine. That action closes YOUR FILE filehandle as well.
Ofc, this can be avoided by localizing FILE within the sub.

Global filehandles of the *FILE variety can also prolong file locks.
Say we open & then lock a file, but then forget to close it: the handle, and
with it the lock, lives on until the script exits, keeping all other
potential users waiting in the meantime.
Lexical FHs of the my $FILE variety are much less prone to this, as they
automagically close when the lexical goes out of scope.

So looking at all these problems, if you get the chance for a re-work on the
code, then upgrade all 2-arg opens to 3-arg forms.

More info can be seen here:

"Two-arg open() considered dangerous":
http://www.perlmonks.org/index.pl?node_id=131085