Reading contents of file

Anand Kumar

Jan 9, 2006, 1:01:02 AM

to begi...@perl.org

Hi

I am new to PERL and have a question. The following script reads the contents of a file (word by word, or a word preceded by a number) and checks each matched pattern with the bookref() sub. If the pattern is recognised, it is prefixed with the tag <book>. When I run the script it never ends, because the match does not move on to the next string after each pass through the loop. Please correct me.

open IN, "r1.dat";
open OUT, ">r1.txt";
$/=undef;
$line=<IN>;
while ($line=~m/([123]?[\.\s]*[a-z\\=\.]+)/oi)
{
$book=$1;
print $book,"\n";
$t_book=$book;
$t_book=~s/\.//g;
$t_book=~s/\\l=([a-z]+)\\/$1/ig;
$t_book=~s/<(|\/)(B|I|SC|U)>//ig;
$quote=bibref($t_book);
if($quote)
{
$line=~s/($book)/<book>$1/ig;
}
}
print OUT $line;
close(IN);

Thanks in advance for the help.

Regards
Anand


Dhanashri Bhate

Jan 9, 2006, 2:00:46 AM

to anand kumar, begi...@perl.org

Hi Anand,
You know where it's going wrong, so change it :)

Ok, I haven't gone into the regex etc. but I see these 2 major problems:

1. $line=<IN>; should also be inside the while loop, so that the
program gets the next line in the file and eventually comes out of the
loop when it reaches EOF.
Or change the while loop to "while ( $line = <IN> )" and put the
regex check in an "if" conditional.

2. Since you want to make some changes to the line you've read and then
print it to the output file, the line "print OUT $line;" should be inside
the while loop, not outside.

See this:
while ( $line = <IN> )
{
    if ($line=~m/([123]?[\.\s]*[a-z\\=\.]+)/oi)
    {
        # ... do what you want if the regex matches ...
        print OUT $line;
    }
}
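
The line-by-line approach described above can be sketched as follows. This is only a minimal sketch, not the poster's code: process_lines and its callback are hypothetical names, in-memory filehandles stand in for r1.dat and r1.txt, and it assumes $/ is left at its default (with $/=undef, as in the original script, the first read returns the whole file at once).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy $in to $out one line at a time, passing each line through $tag_line
# (a hypothetical callback standing in for the regex work in the thread).
sub process_lines {
    my ($in, $out, $tag_line) = @_;
    while (my $line = <$in>) {              # undef at EOF ends the loop
        print {$out} $tag_line->($line);    # print inside the loop
    }
    return;
}

# Demo on in-memory filehandles. For real files one would write e.g.
#   open my $in, '<', 'r1.dat' or die "Cannot open r1.dat: $!";
my $input  = "Genesis 45\nplain text\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot read from string: $!";
open my $out, '>', \$output or die "Cannot write to string: $!";

process_lines($in, $out, sub {
    my ($l) = @_;
    $l =~ s/\b(Genesis)\b/<book>$1/g;       # toy tagging rule (assumption)
    return $l;
});

close $in;
close $out;
print $output;
```

Every line is printed whether or not it matched, so nothing is lost from the output file.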

Dhanashri

Anand Kumar

Jan 9, 2006, 2:25:31 AM

to dhanash...@persistent.co.in, begi...@perl.org

Hi

I have already tried what you suggested, but when I use 'while($line=<IN>)' with the regex in the if condition, the pattern is matched only once per line. It is not applied again for a different book name on the same line.

For example, if the input is:

Genesis 45. with 1 chron ..........

Here "Genesis" and "1 chron" are the book names, which the function can identify.

The output is:
<book>Genesis 45. with 1 chron..........
Here the loop does not get repeated for 1 chron.

Please help in this regard


Dhanashri Bhate

Jan 9, 2006, 2:42:35 AM

to anand kumar, begi...@perl.org

Ok, got it now :-) Sorry, I misunderstood your problem earlier.

If you are expecting multiple matches in a single line, then I would
suggest keeping the regex match as it is, i.e. while
($line=~m/([123]?[\.\s]*[a-z\\=\.]+)/oi).

But inside the loop, once you find a match and have done the relevant things,
modify $line in such a way that the next iteration finds the next match (maybe
replace the matched substring with an empty string?). This way the loop will
end once all matches on the line are found.

Well, I'm not a Perl expert, so this is what I could think of :-) But I'm
sure there must be a better way to do it!
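
One such way, sketched here under assumptions: the original loop never ends because a match without /g starts again from the beginning of $line on every pass, and the substitution only prepends <book>, so the same text keeps matching. With /g in scalar context Perl remembers where the last match ended and continues from there. The pattern below is a simplified stand-in for the one in the script, and find_candidates is a hypothetical name.

```perl
use strict;
use warnings;

# Return every candidate book name in $text, in order of appearance.
# The pattern (an optional 1, 2 or 3, then a word) is a simplified
# stand-in for the regex in the original script.
sub find_candidates {
    my ($text) = @_;
    my @found;
    # In scalar context /g resumes from pos($text), so each pass sees the
    # next candidate and the loop stops when none are left. $text is not
    # modified inside the loop, because changing it would reset pos().
    while ($text =~ /\b((?:[123]\s*)?[a-z]+)\b/gi) {
        push @found, $1;
    }
    return @found;
}

print "$_\n" for find_candidates("Genesis 45. with 1 chron");
```

Each candidate can then be passed to the lookup sub without removing anything from the line.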

Dhanashri


use...@davidfilmer.com

Jan 9, 2006, 3:04:39 AM

to

Anand Kumar wrote:
> I am new to PERL.

Welcome to Perl (not PERL, BTW).

> read the contents of a file (word by word (or) word preceded by a number),
> check for the existence of the matched pattern which is done by the help
> of bookref() sub, if it is found then the matched pattern is prefixed with the
> tag <book>.

Maybe you should reconsider your approach, which seems needlessly
complex. Since the namespace of Biblical books is rather small, you
can do something like this (note the hash which enumerates the books
(or a subset, in this example) and provides the desired tag,
essentially replacing your bookref subroutine, which you didn't
provide). I'm using some sample __DATA__ that I looked up at random
from a Bible, since I don't have a sample of your input file.

If this doesn't meet your needs (or get you going in the right
direction), perhaps you can be a little more specific and I can help
further...

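#!/usr/bin/perl
use strict;
use warnings;

my %book;
$book{$_} = "<book>$_" for qw/Matt Mark Luke John Acts 1Cor 2Cor/;

$/ = '';    # paragraph mode: read one blank-line-separated block at a time
while (my $line = <DATA>) {
    # try every known book name against the current paragraph
    $line =~ s/$_/$book{$_}/ig for keys %book;
    print $line;
}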

__DATA__
In John 6, Our Lord says, "Unless you eat my flesh and
drink my blood, you have no life in you, for my flesh
is true food and my blood is true drink."

In 1Cor, St. Paul says, "It is for this reason that many
of you have died, not discerning the Body of the Lord."

--
http://DavidFilmer.com
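
A possible refinement of the hash idea above, offered as a sketch rather than as the poster's method: build one alternation from the book list and anchor it with \b, so that "Mark" is not tagged inside "remarkable" and the original capitalisation of the text is kept. The names tag_books, $alternation and $book_re are assumptions introduced for this example.

```perl
use strict;
use warnings;

my @books = qw/Matt Mark Luke John Acts 1Cor 2Cor/;

# Longest names first, so a longer name wins if one name is a prefix of
# another; quotemeta guards any name containing regex metacharacters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @books;
my $book_re = qr/\b($alternation)\b/i;

# Prefix every whole-word book name in $text with <book>.
sub tag_books {
    my ($text) = @_;
    $text =~ s/$book_re/<book>$1/g;    # $1 keeps the spelling found in the text
    return $text;
}

print tag_books("In john 6 and 1Cor, a remarkable passage."), "\n";
```

This makes a single pass over the text instead of one substitution per book name.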


Anand Kumar

Jan 9, 2006, 3:39:04 AM

to dhanash...@persistent.co.in, begi...@perl.org

Hi

I have tried what you explained, thanks for that, but I now have a few new problems. The code after the modification is below:

open IN, "r1.dat";
open OUT, ">r1.txt";
$/=undef;

$line=<IN>;
while($line=~m/([123]?[\.\s]*[a-z\\=\.]+)/cig)

{
$book=$1;
print $book,"\n";
$t_book=$book;
$t_book=~s/\.//g;
$t_book=~s/\\l=([a-z]+)\\/$1/ig;
$t_book=~s/<(|\/)(B|I|SC|U)>//ig;
$quote=bibref($t_book);
if($quote)
{

print OUT "<bibl>$book";
}
else
{
print OUT $book," ";
}
$line=~s/$book//;

}
close(IN);

Input:
Genesis 45. anand 1 chron mca 2 chron
1 chron kumar

output:
<bibl>Genesis. anand <bibl>1 chron mca <bibl>2 chron<bibl>1 chron
kumar

Here the problems are:
1) The newline character '\n' is not detected.
2) I am not able to print the unmatched data, i.e. in the above example '45' doesn't match the regex in the while loop, so it is not printed to the output file.

Please advise. Is there any way to find the data that is not matched by the regex?

Thanks for the help

Regards
Anand


Dhanashri Bhate

Jan 9, 2006, 4:37:25 AM

to anand kumar, begi...@perl.org

>> Here the problems are :
>> 1) The new line character '\n' cannot be detected.

Well, I'm afraid I don't understand exactly what your problem is!

>> 2)I am not able to print the unmatched data i.e in the above example '45'
>> doesn't match the regex in the while loop so this cannot be printed in the
>> output file.

Obviously it won't :-)

Ok, do this: have two separate variables, one for regex matches (i.e. $line)
and one for printing to the output file ($lineout).

When you have a line from the file, copy it to this new variable (i.e.
$lineout = $line).

Now you'll replace the matched pattern with an empty string in $line, but
$lineout will remain intact. Whatever changes you want to make to the line
before printing it to the OUT file, make them in $lineout, and print $lineout
to the OUT file. That way the unmatched part will also remain there.
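
An alternative to keeping two copies, sketched here under assumptions: a single s///g pass with the /e modifier tags recognised names in place, so text the pattern never matches ('45', punctuation, newlines) is left untouched. The bibref below is a hypothetical stand-in, since the real sub was never posted, and the pattern is a simplified version of the one in the script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the bibref() sub, which the thread never shows:
# it recognises a name if its normalised form is in a small lookup table.
my %known = map { $_ => 1 } ('genesis', '1 chron', '2 chron');

sub bibref {
    my ($name) = @_;
    (my $key = lc $name) =~ s/\s+/ /g;    # lower-case, squeeze whitespace
    return $known{$key};
}

# Tag every recognised book name in $text and leave everything else alone.
sub tag_text {
    my ($text) = @_;
    # /g visits each candidate once, left to right; /e runs the replacement
    # as code, so unrecognised candidates are put back unchanged.
    $text =~ s{\b((?:[123]\s*)?[a-z]+)\b}{
        my $cand = $1;
        bibref($cand) ? "<book>$cand" : $cand;
    }gei;
    return $text;
}

print tag_text("Genesis 45. anand 1 chron mca 2 chron\n1 chron kumar\n");
```

Because the whole text is rewritten in one pass, this works on a slurped file as well as on a single line.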

>> Plz suggest me in the matter . Is there any way to find the data which is
>> unmatched with the regex.

Can you attach a sample input file, your expected output and the code with
it?
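
On the quoted question of finding the unmatched data, one possibility is split with a capturing group, which returns the text between matches as well as the matches themselves. This is a sketch only: segments is a hypothetical name and the pattern is a simplified stand-in for the one in the script.

```perl
use strict;
use warnings;

# Break $text into [kind, string] pairs, where kind is 'match' for text the
# pattern matched and 'other' for everything in between.
sub segments {
    my ($text) = @_;
    # With one capturing group, split returns the fields and the captured
    # separators alternately: even indexes are unmatched text, odd are matches.
    my @pieces = split /(\b(?:[123]\s*)?[a-z]+\b)/i, $text;
    my @out;
    for my $i (0 .. $#pieces) {
        next if $pieces[$i] eq '';    # skip the empty field before a leading match
        push @out, [ ($i % 2 ? 'match' : 'other'), $pieces[$i] ];
    }
    return @out;
}

for my $seg (segments("Genesis 45. with 1 chron")) {
    printf "%-5s [%s]\n", @$seg;
}
```

Joining the pieces back together in order reproduces the input, so both matched and unmatched parts can be written to the output file.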