I have no problems eating cereal...after it softens. Why is replacing a simple string so hard then?

sam...@mytrashmail.com

Oct 13, 2006, 8:12:26 PM

In other areas of my life, like eating oatmeal and getting dressed, I
have no real problems. Some might even say I am a savant.

But I am just beginning Perl, and things I think are easy turn out not
to be. Now, (said in Scarface voice) Let me introduce you to my lil'
friend!

My task is sooo deceptively simple: Just replace a simple string with
another string. How hard could that be?

My data file is here: http://home.comcast.net/~tankomail/preg.htm
And a sample is at the very bottom of this post. I just want to replace
/<form[.*]?*\/form>/ with the word "block"

Basically I just want to replace all <form> </form> fields and
everything in between with nothing, but in testing, I wanted to see my
work so I chose the word "block" as a good simple substitute which I
could then replace with nothing.

Way Below is my base code. But here, just under is the pulled line from
the base code that seems to be the issue:
$orgtext = Whey; # this one right here
$newtext = Popcorn;

The above works. I reduced it to its simplest form as a sanity check.
Then I tried:

$orgtext = /[Ww]hey/; # this one right here
$newtext = Popcorn;

But beyond the most primitive replacement, I invariably get:

Use of uninitialized value in pattern match (m//) at
C:\russ\scripts\_Master_Snippets\clean_2_input_output_file.pl line 9.

Eventually I want to try:

$orgtext = /<form[.*]?*\/form>/; # this one right here
$newtext = block;

But I can't get past the starting blocks. I know this code works in
general, but my modifications seem to break it.

I also tried some while (<$intext>) variations, even removing the undef
$/ slurp line, so that the intext would receive the data line by line -
but no luck anywhere. I have spent quite a bit of research time trying
various things - but apparently it's not a trivial task.

Any suggestions as to:

1.) Is my basic model okay, slurping the whole file into a variable? or
2.) Should I use a while <> structure?

And even when I do get the simple Whey replaced with Popcorn - it only
does the first instance, basically, I am guessing, because there is no
iterative code in this script. And the only iterative examples I've
seen are not with a whole file in one "intext" variable, but always
with a while <> structure.

Your input and examples are GREATLY appreciated because the red spot on
my banging against the cubicle wall head is growing.

L,
Sam

---------------------------

Here is my base code.

$infile = 'C:\russ\weights\preg.htm';
$outfile = 'C:\russ\weights\preg_clean.htm';

# No, I am not pregnant, but I am helping a pregnant woman out!
# No...not just helping her get her start either :)

$orgtext = Whey;
$newtext = Popcorn;

undef $/; #slurp mode, read files in a whole

open IN, $infile or die $!;
$intext = <IN>;
close IN;

$intext =~ s/$orgtext/$newtext/ms;
# the ms is for coping correctly with newlines (that can easily appear
# in a binary).

open OUT, ">$outfile" or die $!;
print OUT $intext;
close OUT;

# replaces ALL occurrences of orgtext with newtext and places the
# number of occurrences in $count

--------data sample. link to complete data above

</div></td>
</tr>
<tr>
<td><div align="center"> <form method=POST
style="margin-bottom: 0"
action="https://www.linkpointcart.net/cgi-bin/cart.cgi">
<input type=hidden name="ViewCart"
value="ThreadsCart">
<input type=submit value="View Cart">
</form></div></td>
</tr>
<tr>
<td><div align="center"><form method=POST
style="margin-bottom: 0"
action="https://www.linkpointcart.net/cgi-bin/cart.cgi">
<input type=hidden name="CheckOut" value="Online">
<input type=hidden name="CartID" value="ThreadsCart">
<input type=submit value="Check Out">
</form></div></td>
</tr>
<tr>
<td><table width="100%" border="0" cellspacing="0"
cellpadding="0">
<tr>
<td><br><div align="center"><a
href="catalog.html"><img src="2005-menu/catalog-banner.gif" width="196"
height="50" border="0"></a></div></td>
</tr>
</table>
<div align="center"><font size="2" face="Arial,
Helvetica, sans-serif"><strong><br>
We want to hear from you.<br>
Suggest a NEW PRODUCT!!<br>
<a href="suggest.htm">:: click
here::</a></strong></font></div></td>
</tr>
</table></td>
</tr>
</table></td>
<td width="76%" height="28" valign="top"><div align="right"><img
src="2005-menu/top-image.gif" width="604" height="98" border="0"
usemap="#Map"></div></td>
</tr>
<tr>
<td valign="top"><br> 
<table width="90%" border="0" align="center" cellpadding="1"
cellspacing="1">
<tr>
<td><table width="560" border="0" align="center"
cellpadding="3" cellspacing="0">
<tr>
<td rowspan="2" valign="top"><div align="center"><img
src="bottles/whey-chocolate-s.gif" width="102" height="150"
border="0"><br>
<font color="#666666" size="1" face="Arial,
Helvetica, sans-serif"></font></div></td>
<td><div align="left"><font size="2" face="Arial,
Helvetica, sans-serif"><strong><font size="3">Whey
Protein<br>
Chocolate 3.3 lbs.</font><br>
54 grams of protein per serving<br>
<br>
</strong></font><font face="Verdana, Arial,
Helvetica, sans-serif"><strong><font size="3" face="Arial, Helvetica,
sans-serif">$
39.99</font></strong></font></div></td>
<td rowspan="2" valign="top"><div align="center"><img
src="bottles/whey-vanilla-s.gif" width="102" height="150"
border="0"><br>
</div></td>
<td><div align="left"><font size="2" face="Arial,
Helvetica, sans-serif"><strong><font size="3">Whey
Protein<br>
Vanilla </font><font size="2" face="Arial,
Helvetica, sans-serif"><strong><font size="3">3.3
lbs.</font></strong></font><br>
54 grams of protein per serving.<br>
<br>
</strong></font><font face="Verdana, Arial,
Helvetica, sans-serif"><strong><font size="3" face="Arial, Helvetica,
sans-serif">$
39.99</font></strong></font><font size="2"
face="Arial, Helvetica, sans-serif"><strong>
</strong></font></div></td>
</tr>
<tr>
<td><form method="post"
action="https://www.linkpointcart.net/cgi-bin/cart.cgi">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td><font size="2" face="Arial, Helvetica,
sans-serif">Quantity:</font></td>
<td><font face="Verdana, Arial, Helvetica,
sans-serif">
<input type="text" name="VARQuantity"
value="1" size="4" />
</font></td>
</tr>
<tr>
<td colspan="2" align="center"> <font
face="Verdana, Arial, Helvetica, sans-serif">
<input type="hidden" name="VAR000" value="|"
/>
<input type="hidden" name="AddItem"
value="ThreadsCart|Lifesource Labs - Whey Protein Powder Chocolate
VAR000 $39.99|VARQuantity|||price5|||||||" />
<input name="submit" type="submit" value="Add
To Cart" />
</font></td>
</tr>
</table>
</form></td>
<td><form method="post"
action="https://www.linkpointcart.net/cgi-bin/cart.cgi">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td><font size="2" face="Arial, Helvetica,
sans-serif">Quantity:</font></td>
<td><font face="Verdana, Arial, Helvetica,
sans-serif">
<input type="text" name="VARQuantity2"
value="1" size="4" />
</font></td>
</tr>
<tr>
<td colspan="2" align="center"> <font
face="Verdana, Arial, Helvetica, sans-serif">
<input type="hidden" name="VAR000" value="|"
/>
<input type="hidden" name="AddItem"
value="ThreadsCart|Lifesource Labs - Whey Protein Powder Vanilla VAR000
$39.99|VARQuantity|||price5|||||||" />
<input name="submit" type="submit" value="Add
To Cart" />
</font></td>
</tr>
</table>
</form></td>

Paul Lalli

Oct 14, 2006, 8:17:56 AM

sam...@mytrashmail.com wrote:
> My data file is here: http://home.comcast.net/~tankomail/preg.htm
> And a sample is at the very bottom of this post. I just want to replace
> /<form[.*]?*\/form>/ with the word "block"

That pattern is not what you think it is. It is searching for "<form",
possibly followed by either a period or a star, followed by "/form>".
Is that what you meant? I'm guessing you wanted "<form" followed by
any amount of anything, but only as much as is necessary, followed by
"/form>". That's:
/<form(.*?)\/form>/

> Basically I just want to replace all <form> </form> fields and
> everything in between with nothing, but in testing, I wanted to see my
> work so I chose the word "block" as a good simple substitute which I
> could then replace with nothing.
>
> Way Below is my base code. But here, just under is the pulled line from
> the base code that seems to be the issue:
> $orgtext = Whey; # this one right here
> $newtext = Popcorn;

If you're using those pieces of code, you must not be using warnings.
Please start doing so. They will catch 90% of the errors you're
making. In particular, they will tell you you should be quoting your
strings, not using barewords. Additionally, please make sure you use
strict, which will force you to declare your variables.

my $orgtext = 'Whey';
my $newtext = 'Popcorn';
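
A minimal sketch of the advice above, with strict and warnings enabled and the strings quoted. The sample text in $intext is a hypothetical stand-in for the contents of the data file, not the real file:

```perl
use strict;
use warnings;

# Quoted strings instead of barewords; 'my' declares each variable.
my $orgtext = 'Whey';
my $newtext = 'Popcorn';

# Hypothetical sample text standing in for the contents of the data file.
my $intext = "Whey Protein\nChocolate 3.3 lbs.\n";

# Substitute the first occurrence of $orgtext with $newtext.
$intext =~ s/$orgtext/$newtext/;

print $intext;
```

With the barewords left unquoted, the same script would refuse to compile under strict, which is the point of turning it on.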

> The above works. I reduced it to its simplest form as a sanity check.
> Then I tried:
>
> $orgtext = /[Ww]hey/; # this one right here
> $newtext = Popcorn;
>
> But beyond the most primitive replacement, I invariably get:
>
> Use of uninitialized value in pattern match (m//) at
> C:\russ\scripts\_Master_Snippets\clean_2_input_output_file.pl line 9.

Which is the correct warning message for the line of code you typed.
You are using a pattern match, but not binding the pattern match to any
variable (using the =~ operator). Therefore, you are matching that
pattern against the $_ variable. The $_ variable does not currently
have a value, and so Perl warns you to that effect.

If $orgtext is supposed to be a string that will eventually be used in
a regular expression, then make it a string:
my $orgtext = '[Ww]hey';
If $orgtext is supposed to be a regular expression that you will
pattern match against, make it a pattern match:
my $orgtext = qr/[Ww]hey/;
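
A short sketch of the two forms side by side. The variable names and the sample text are assumptions made for illustration; both forms can be used in a later match:

```perl
use strict;
use warnings;

my $as_string = '[Ww]hey';      # plain string, compiled when it is used in a match
my $as_regex  = qr/[Ww]hey/;    # precompiled regex object

my $text = "whey and Whey";     # hypothetical sample text

# Both forms can be interpolated or bound with =~ later in the script.
my $hit_string = ($text =~ /$as_string/) ? 1 : 0;
my $hit_regex  = ($text =~ $as_regex)    ? 1 : 0;

print "string form matched: $hit_string\n";
print "qr// form matched:   $hit_regex\n";
```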

> Eventually I want to try:
>
> $orgtext = /<form[.*]?*\/form>/; # this one right here

Once again, that's assigning $orgtext to be the result of matching $_
to that pattern. And once again, your pattern isn't what you think it
is. See above.

> $newtext = block;
>
> But I can't get past the starting blocks. I know this code works in
> general, but my modifications seem to break it.

No, the code really doesn't work in general. The code you claimed
worked was assigning two strings (albeit in their bareword versions) to
a variable. Then you suddenly stopped doing that and started assigning
the result of a pattern match to that variable, using the //
delimiters.

> I also tried some while (<$intext>) variations, even removing the undef
> $/ slurp line, so that the intext would receive the data line by line -
> but no luck anywhere.

"Throw it at the wall and see what sticks" is rarely a good method of
programming. Look at the warning message, figure out what it's telling
you, and research that.

> I have spent quite a bit of research time trying
> various things - but apparently it's not a trivial task.

Yes, it really is. Instead of randomly typing code without knowing
what you're doing, your "research time" would have been better spent
actually doing research. Please read some basic Perl documentation.
In this case, start with:
perldoc perlretut

> Any suggestions as to:
>
> 1.) Is my basic model okay, slurping the whole file into a variable? or
> 2.) Should I use a while <> structure?

In general, you don't want to slurp unless you actually need to. In
this case, however, you do need to, because your pattern spans more
than one line of the file. So no, you cannot use the while() loop
approach in your case.

> And even when I do get the simple Whey replaced with Popcorn - it only
> does the first instance, basically, I am guessing, because there is no
> iterative code in this script.

STOP GUESSING. READ. LEARN. Read the documentation I pointed you to
above to see how to make a s/// operation replace all instances of a
pattern. Again, it's trivial, but you are not going to learn how to do
it by just guessing and typing random characters.

> And the only iterative examples I've
> seen are not with a whole file in one "intext" variable, but always
> with a while <> structure.

Massive red herring. Examples using while(<>) would have the same
problem if the pattern appeared twice on the same line. Only the first
on each line would be replaced.

> Your input and examples are GREATLY appreciated because the red spot on
> my banging against the cubicle wall head is growing.

Your cranium would suffer less damage if you took the time to read the
documentation instead of using a trial-and-error approach.

Paul Lalli

nobu...@gmail.com

Oct 14, 2006, 9:49:00 AM

On Oct 14, 1:17 pm, "Paul Lalli" <mri...@gmail.com> wrote:
> If $orgtext is supposed to be a string that will eventually be used in
> a regular expression, then make it a string:

> my $orgtext = '[Ww]hey';

> If $orgtext is supposed to be a regular expression that you will
> pattern match against, make it a pattern match:

> my $orgtext = qr/[Ww]hey/;

I think I must disagree with Paul on this one.

I advocate, on the grounds of clarity, the use of qr// to quote regular
expression fragments that are later to be combined into larger regex.
The exception is when you are in a seriously time-critical context
where the wasted RE compilation would be significant.

In the OP's case there's no problem with using quotes but as soon as we
get backslashes it gets messy:

my $match_digits_dot_digits = qr/\d+\.\d+/; # Relatively clean
my $match_digits_dot_digits = '\\d+\\.\\d+'; # Messy
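
As a sketch of combining qr// fragments into a larger regex: the names $digits, $decimal and $price are hypothetical, and the sample text is modelled on the price markup in the data sample earlier in the thread:

```perl
use strict;
use warnings;

my $digits  = qr/\d+/;
my $decimal = qr/$digits\.$digits/;      # fragments combine by interpolation
my $price   = qr/\$\s*($decimal)/;       # dollar sign, optional whitespace, captured number

my $text = 'sans-serif">$ 39.99</font>'; # fragment modelled on the sample data
my ($amount) = $text =~ $price;

print "price: $amount\n";
```

Each fragment needs only the backslashes the regex itself needs, with no extra layer of string escaping.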

sam...@mytrashmail.com

Oct 16, 2006, 10:48:30 AM

comments inline like this jjjjjjjjjjjjjjjjjjjj:

Paul Lalli wrote:
> sam...@mytrashmail.com wrote:
> > My data file is here: http://home.comcast.net/~tankomail/preg.htm
> > And a sample is at the very bottom of this post. I just want to replace
> > /<form[.*]?*\/form>/ with the word "block"
>
> That pattern is not what you think it is. It is searching for "<form",
> possibly followed by either a period or a star, followed by "/form>".
> Is that what you meant? I'm guessing you wanted "<form" followed by
> any amount of anything, but only as much as is necessary, followed by
> "/form>". That's:
> /<form(.*?)\/form>/

jjjjjjjjjjjjjjjjjjjjjjjj:
Thank you...I got my brackets confused :\

>
> > Basically I just want to replace all <form> </form> fields and
> > everything in between with nothing, but in testing, I wanted to see my
> > work so I chose the word "block" as a good simple substitute which I
> > could then replace with nothing.
> >
> > Way Below is my base code. But here, just under is the pulled line from
> > the base code that seems to be the issue:
> > $orgtext = Whey; # this one right here
> > $newtext = Popcorn;
>
> If you're using those pieces of code, you must not be using warnings.
> Please start doing so. They will catch 90% of the errors you're
> making. In particular, they will tell you you should be quoting your
> strings, not using barewords. Additionally, please make sure you use
> strict, which will force you to declare your variables.

jjjjjjjjjjjjjjjjjjjjjjj:
So many examples, most of them, don't seem to declare their variables,
even in the perldocs. What benefit is there...aren't they
auto-declared when a value is assigned to them? I know in vbscript one
must declare or dimension...but I thought Perl was free from that
constraint? Again...so many examples don't seem to??

>
> my $orgtext = 'Whey';
> my $newtext = 'Popcorn';
>
>
> > The above works. I reduced it to its simplest form as a sanity check.
> > Then I tried:
> >
> > $orgtext = /[Ww]hey/; # this one right here
> > $newtext = Popcorn;
> >
> > But beyond the most primitive replacement, I invariably get:
> >
> > Use of uninitialized value in pattern match (m//) at
> > C:\russ\scripts\_Master_Snippets\clean_2_input_output_file.pl line 9.
>
> Which is the correct warning message for the line of code you typed.
> You are using a pattern match, but not binding the pattern match to any
> variable (using the =~ operator). Therefore, you are matching that
> pattern against the $_ variable. The $_ variable does not currently
> have a value, and so Perl warns you to that effect.

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
Paul - this was said VERY well. It clicks now. It was said before, but
you were a bit more thorough and pedagogish. :)

>
> If $orgtext is supposed to be a string that will eventually be used in
> a regular expression, then make it a string:
> my $orgtext = '[Ww]hey';
> If $orgtext is supposed to be a regular expression that you will
> pattern match against, make it a pattern match:
> my $orgtext = qr/[Ww]hey/;
>
>
> > Eventually I want to try:
> >
> > $orgtext = /<form[.*]?*\/form>/; # this one right here
>
> Once again, that's assigning $orgtext to be the result of matching $_
> to that pattern. And once again, your pattern isn't what you think it
> is. See above.

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
Oops..you lost me again in that I thought a tacit comparison was
always made to $_ in the absence of an explicit =~ comparison. But
that line above seems to be the most basic assignment statement like
I've seen in myriad examples, albeit with a faulty [.*] bracket bit
needing to be corrected.

In the same way I could assign a string to $orgtext, can't I assign a
regex to it for comparison to another variable or string later in the
script?

>
>
> > $newtext = block;
> >
> > But I can't get past the starting blocks. I know this code works in
> > general, but my modifications seem to break it.
>
> No, the code really doesn't work in general. The code you claimed
> worked was assigning two strings (albeit in their bareword versions) to
> a variable. Then you suddenly stopped doing that and started assigning
> the result of a pattern match to that variable, using the //
> delimiters.

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
yes...I see your point and also possibly validation of assigning the
result of a pattern match to a variable? Or is the distinction "result
of a pattern match" vs assigning the regex pattern itself to a
variable?

>
> > I also tried some while (<$intext>) variations, even removing the undef
> > $/ slurp line, so that the intext would receive the data line by line -
> > but no luck anywhere.
>
> "Throw it at the wall and see what sticks" is rarely a good method of
> programming. Look at the warning message, figure out what it's telling
> you, and research that.

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj

Yes...it's a bit of spaghetti on the wall, but based on the 50 or so
pages I've read of the perldocs. It all makes so much sense in the
docs...and the code on the page matches the logic in my head. I feel
like I can program an auto air-traffic control collision deterrence
system based on my 50 pages. But...apparently some pilots aren't happy
with the system. Hey...there are more planes where those came from.

But yes...point taken. Read up or clam up. Understood. I have quite a
library of Perl books, but all the O'Reilly books are these talkative
friendly Alice in wonderland meets it's a small world rambles. I am
used to Linux books where a rule is stated tersely, an example given -
DONE!!! The perldocs are this way and are beloved by me. If I had 2
lifetimes...sure....teach me via anecdotal valley girl talk. But I
have so little time...is there a book that just gives a brief
explanation of code snippets, a few examples of each, and we're off?
Kind of like the original Bjorn C books.

>
> > I have spent quite a bit of research time trying
> > various things - but apparently it's not a trivial task.
>
> Yes, it really is. Instead of randomly typing code without knowing
> what you're doing, your "research time" would have been better spent
> actually doing research. Please read some basic Perl documentation.
> In this case, start with:
> perldoc perlretut

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
Your "randomly typing code" line makes me warm all over. I love a funny
good critique!!!

I am a little guilty of benefitting from random code. It often works?
I once got a BMW to add 50 horsepower by randomly programming his chip,
albeit the horsepower only lasted until a piston was thrown 10 minutes
later. :( Random code is a roller coaster ride of fun!

>
> > Any suggestions as to:
> >
> > 1.) Is my basic model okay, slurping the whole file into a variable? or
> > 2.) Should I use a while <> structure?
>
> In general, you don't want to slurp unless you actually need to. In
> this case, however, you do need to, because your pattern spans more
> than one line of the file. So no, you cannot use the while() loop
> approach in your case.

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
why can't while span lines? can't it be told to process until /n or
something? I know there are all sorts of catchall whitespace /t /n /r
characters to account for this? Again...something in my reading...but
it's all a blur...walls closing in....usb drive OVERHEATING!!!!

>
> > And even when I do get the simple Whey replaced with Popcorn - it only
> > does the first instance, basically, I am guessing, because there is no
> > iterative code in this script.
>
> STOP GUESSING. READ. LEARN. Read the documentation I pointed you to
> above to see how to make a s/// operation replace all instances of a
> pattern. Again, it's trivial, but you are not going to learn how to do
> it by just guessing and typing random characters.

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
Yes...I am guessing a bit, but truly...honestly...based on lots of
reading and scouring examples. My code was VERY CLOSE to working wasn't
it? Wasn't it just a minor detail or two wrong?

But I am heeding your advice and will tattoo the perldocs to my arms. I
really am reading a lot...as time permits...it's just that I'm trying
to also be productive along the way...WITH A LITTLE HELP FROM MY
FRIENDS.

Hey...when I get "My First Air Collision Deterrent" system
coded...purely in Regex...would you like to be my test passenger? We
lost our first beta batch. Shhhhh.

>
> > And the only iterative examples I've
> > seen are not with a whole file in one "intext" variable, but always
> > with a while <> structure.
>
> Massive red herring. Examples using while(<>) would have the same
> problem if the pattern appeared twice on the same line. Only the first
> on each line would be replaced.

jjjjjjjjjjjjjjjjj: Now this seems true, but entirely addressable? Isn't
there a way to match 1st and 2nd and x occurrences per line or spanning
lines?

>
> > Your input and examples are GREATLY appreciated because the red spot on
> > my banging against the cubicle wall head is growing.
>
> Your cranium would suffer less damage if you took the time to read the
> documentation instead of using a trial-and-error approach.
>
> Paul Lalli

jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
Thank you Paul

L,
Samiam

Paul Lalli

Oct 16, 2006, 11:13:19 AM

sam...@mytrashmail.com wrote:
> comments inline like this jjjjjjjjjjjjjjjjjjjj:

There's really no need for that marker. That's what the > things are
for at the beginning of each quoted line.

> Paul Lalli wrote:
> > sam...@mytrashmail.com wrote:

> > If you're using those pieces of code, you must not be using warnings.
> > Please start doing so. They will catch 90% of the errors you're
> > making. In particular, they will tell you you should be quoting your
> > strings, not using barewords. Additionally, please make sure you use
> > strict, which will force you to declare your variables.
>

> So many examples, most of them, don't seem to declare their variables,
> even in the perldocs. What benefit is there...aren't they
> auto-declared when a value is assigned to them? I know in vbscript one
> must declare or dimension...but I thought Perl was free from that
> constraint? Again...so many examples don't seem to??

The examples in the docs are just that - examples. They are intended
for brevity, not thoroughness. As for the value of declaring your
variables, do you really trust yourself to NEVER typo a variable name?
Ever? Why would you want to run the risk of introducing an absurdly
hard to find bug, just to avoid typing the two characters 'my' in front
of your variable names? You understand that without strict, this
code:
$var1 = "Hello World\n";
#900 lines of code later....
print "I said: $varl\n";

will not give any errors at all? Instead, it will merrily print out "I
said: " followed by a newline, forcing you to try to figure out where
in the 900 previous lines of code you accidentally changed $var1 to an
empty string (never realizing that you accidentally typed a lowercase L
instead of a number one).
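
A sketch of what strict does with that typo. The snippet is compiled inside a string eval purely so this demonstration can keep running and report the error; in a real script the program would simply fail to compile:

```perl
use strict;
use warnings;

# The typo example is compiled inside a string eval so this script keeps
# running and can report what strict said about it.
my $code = <<'END_CODE';
use strict;
my $var1 = "Hello World\n";
print "I said: $varl\n";    # typo: lowercase L instead of the digit one
END_CODE

my $ok    = eval "$code; 1";
my $error = $ok ? '' : $@;

print "strict rejected the code: $error" if $error;
```

The error names the undeclared variable and the line it appears on, so the typo is found at compile time rather than 900 lines later.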

Additionally, declaring your variables explicitly allows you to declare
them in the shortest scope possible, meaning that your variables are
only visible in the block in which they are declared. This has many
benefits: the memory they use is returned to Perl when they go out of
scope; you have far less code to look through to find where something
went wrong with your variable; external modules that you use can't
accidentally change your variables.

> > > Eventually I want to try:
> > >
> > > $orgtext = /<form[.*]?*\/form>/; # this one right here
> >
> > Once again, that's assigning $orgtext to be the result of matching $_
> > to that pattern. And once again, your pattern isn't what you think it
> > is. See above.
>
> jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
> Oops..you lost me again in that I thought a tacit comparison was
> always made to $_ in the absence of an explicit =~ comparison.

Yes, exactly.

> But
> that line above seems to be the most basic assignment statement like
> I've seen in myriad examples, albeit with a faulty [.*] bracket bit
> needing to be corrected.

No. You're still not understanding. Without a =~ operator, the bare
/match/ syntax ALWAYS makes a comparison to $_. You then assign the
results of that pattern match to another variable. You're using a
shortcut without realizing it. Expanding that line for completeness
would be:
$origtext = ( $_ =~ m/<form[.*]?*\/form>/);

> In the same way I could assign a string to $orgtext, can't I assign a
> regex to it for comparison to another variable or string later in the
> script?

Yes you can, but that is not what your syntax above does. If you want
to assign a regular expression to a variable, you have to use the qr//
operator, not the m// operator (which is what empty slashes implicitly
are):

my $origtext = qr/<form[.*]?*\/form>/;

> > > $newtext = block;
> > >
> > > But I can't get past the starting blocks. I know this code works in
> > > general, but my modifications seem to break it.
> >
> > No, the code really doesn't work in general. The code you claimed
> > worked was assigning two strings (albeit in their bareword versions) to
> > a variable. Then you suddenly stopped doing that and started assigning
> > the result of a pattern match to that variable, using the //
> > delimiters.
>
> jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
> yes...I see your point and also possibly validation of assigning the
> result of a pattern match to a variable? Or is the distinction "result
> of a pattern match" vs assigning the regex pattern itself to a
> variable?

Yes. A pattern match is this:
$x =~ /match/;
That is, look for 'match' inside $x. That expression *returns* a
value. If 'match' was found in $x, the expression returns 1. If it
was not, it returns the empty string. When you do:
$result = ($x =~ m/match/);
you are assigning $result to be either 1 or the empty string.
Likewise, if you do:
$origtext = ($_ =~ m/<form>.*<\/form>/);
you are assigning $origtext to be either 1 or the empty string,
depending on whether or not the pattern was found in $_. And as
already noted, the above can be reduced through some shortcuts to:
$origtext = /<form>.*<\/form>/;

To assign the regexp pattern itself, you have to use the qr// syntax,
not the m// syntax.
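
The distinction can be sketched as follows. The sample string and the variable names are assumptions made for illustration:

```perl
use strict;
use warnings;

my $x = 'a match here';                  # hypothetical sample string

my $found    = ($x =~ /match/);          # result of a successful match: 1
my $notfound = ($x =~ /absent/);         # result of a failed match: empty string
my $pattern  = qr/match/;                # the pattern itself, kept for later use

print "found:    '$found'\n";
print "notfound: '$notfound'\n";
print "later match succeeded\n" if $x =~ $pattern;
```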

> But yes...point taken. Read up or clam up. Understood. I have quite a
> library of Perl books, but all the O'Reilly books are these talkative
> friendly Alice in wonderland meets it's a small world rambles. I am
> used to Linux books where a rule is stated tersely, an example given -
> DONE!!! The perldocs are this way and are beloved by me. If I had 2
> lifetimes...sure....teach me via anecdotal valley girl talk. But I
> have so little time...is there a book that just gives a brief
> explanation of code snippets, a few examples of each, and we're off?
> Kind of like the original Bjorn C books.

I don't know what books you have, so it's difficult to recommend
alternatives. Learning Perl is probably not what you want, as that's a
tutorial. Programming Perl is the canonical reference book. The Perl
Cookbook is a companion to Programming Perl, in that it offers a
multitude of "here's how you do <foo>" recipes.

> > > Any suggestions as to:
> > >
> > > 1.) Is my basic model okay, slurping the whole file into a variable? or
> > > 2.) Should I use a while <> structure?
> >
> > In general, you don't want to slurp unless you actually need to. In
> > this case, however, you do need to, because your pattern spans more
> > than one line of the file. So no, you cannot use the while() loop
> > approach in your case.
>
> jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
> why can't while span lines? can't it be told to process until /n or
> something?

Assuming you meant \n, that's exactly what it does. Except it's not
while, it's the readline operator, <>. That reads until the first
newline that it finds, and then returns the current string. You then
operate on that string. Then it reads the next line. So let's say
your file is:
My form: <form><input type ="text" name="foo">
<input type="submit"></form>

And you process like so:
while (my $line = <$fh>) {
if ($line =~ /<form>.*<\/form>/) { do_something() }
}

The first time through the while loop, $line contains: 'My form:
<form><input type ="text" name="foo">'; Obviously, that line does not
match the pattern, so do_something() does not happen. Then the second
time through the while loop, $line contains: '<input
type="submit"></form>'. Clearly, that line doesn't match the pattern
either. So do_something() does not happen.

You need to have your entire file in one big string, so that you can
pattern-match against the entire thing at once. You can't look for a
pattern that spans multiple lines by looking only in one line at a
time.
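
A runnable sketch of the comparison, using the two-line sample file from this post. The file is held in a string and read through an in-memory filehandle so that nothing on disk is needed; that setup is an assumption made for the demonstration. Note that the slurped match also needs the /s modifier, since without it . does not match a newline:

```perl
use strict;
use warnings;

# The two-line sample file, held in a string and read through an
# in-memory filehandle so the sketch needs no file on disk.
my $file = qq{My form: <form><input type ="text" name="foo">\n}
         . qq{<input type="submit"></form>\n};

# Line by line: no single line contains both <form> and </form>.
open my $fh, '<', \$file or die "Cannot open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ /<form>.*<\/form>/;
}
close $fh;

# Slurped: local $/ undefines the record separator inside the do block only.
open $fh, '<', \$file or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;

# /s lets . match newlines, so the pattern can span lines.
my $slurp_hit = ($whole =~ /<form>.*?<\/form>/s) ? 1 : 0;

print "line-by-line matches: $line_hits\n";
print "slurped match: $slurp_hit\n";
```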

> I know there are all sorts of catchall whitespace /t /n /r
> characters to account for this? Again...something in my reading...but
> it's all a blur...walls closing in....usb drive OVERHEATING!!!!

Yes, you can change the $/ variable to make Perl read until some other
character instead of the newline. That's what you did. You changed it
to the undefined value, which forces a single <> operation to read the
entire file into one gigantic string. That's exactly what you need to
do. But if you have only one string, which contains the entire file,
it makes no sense of any kind to have a while loop around it. The loop
would only execute once.
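
Putting those pieces together, a hedged sketch of the whole task: slurp once, then run a single substitution with no loop. The HTML below is a shortened, hypothetical stand-in for the real data file, and the modifier combination /gis is one reasonable choice rather than the only one. A regex is adequate for a simple case like this; an HTML parser would be more robust on messier markup:

```perl
use strict;
use warnings;

# Shortened, hypothetical stand-in for the data file, read through an
# in-memory filehandle.
my $html = <<'END_HTML';
<td><form method=POST action="cart.cgi">
<input type=submit value="View Cart">
</form></td>
<td><form method="post" action="cart.cgi">
<input name="submit" type="submit" value="Add To Cart" />
</form></td>
END_HTML

open my $in, '<', \$html or die "Cannot open sample: $!";
my $intext = do { local $/; <$in> };   # $/ undefined: one read returns everything
close $in;

# One substitution, no loop.
# /g replaces every form, /i accepts <FORM>, /s lets . cross newlines.
# .*? is non-greedy, so each match stops at the nearest </form>.
my $count = ($intext =~ s/<form\b.*?<\/form>/block/gis);

print "replaced $count form(s)\n", $intext;
```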

> > > And even when I do get the simple Whey replaced with Popcorn - it only
> > > does the first instance, basically, I am guessing, because there is no
> > > iterative code in this script.
> >
> > STOP GUESSING. READ. LEARN. Read the documentation I pointed you to
> > above to see how to make a s/// operation replace all instances of a
> > pattern. Again, it's trivial, but you are not going to learn how to do
> > it by just guessing and typing random characters.
>
> jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
> Yes...I am guessing a bit, but truly...honestly...based on lots of
> reading and scouring examples. My code was VERY CLOSE to working wasn't
> it? Wasn't it just a minor detail or two wrong?

No, you had severe syntax and logic errors. You have syntax errors in
assigning to variables. You have regexp syntax errors in looking for
the correct pattern. You have a complete logic error in trying to look
for one line-spanning pattern in a single line. You have a regexp bug
in wanting to find all instances, but not telling the regexp to look
for all instances. And those are just the ones I remember without
reviewing this entire thread.

> > > And the only iterative examples I've
> > > seen are not with a whole file in one "intext" variable, but always
> > > with a while <> structure.
> >
> > Massive red herring. Examples using while(<>) would have the same
> > problem if the pattern appeard twice on the same line. Only the first
> > on each line would be replaced.
>
> jjjjjjjjjjjjjjjjj: Now this seems true, but entirely addressable? Isn't
> there a way to match 1st and 2nd and x occurances per line or spanning
> lines?

You're asking two distinct questions here. The first answer is "Yes,
read perldoc perlretut to find out how to find all occurrences". The
second answer is "Yes, by changing $/ as you already did.". The two
answers have nothing to do with each other because the two questions
have nothing to do with each other.

$x = "This # string # has # many # signs\n";
$y = "So # does # this \n # one, \n # plus \n # newlines\n";
$x =~ s/#/&/;
$y =~ s/#/&/;

The above code would only replace the FIRST instance of # with & in
each of the two strings. The fact that $y contains multiple lines is
100% irrelevant.
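
For reference, the modifier that perlretut describes for replacing every occurrence is /g. A short sketch reusing the two strings above (the copy variables are illustrative names only):

```perl
use strict;
use warnings;

my $x = "This # string # has # many # signs\n";
my $y = "So # does # this \n # one, \n # plus \n # newlines\n";

(my $first_only = $x) =~ s/#/&/;     # no /g: only the first # changes
(my $all_x      = $x) =~ s/#/&/g;    # /g: every # changes
(my $all_y      = $y) =~ s/#/&/g;    # the newlines in $y make no difference

print $first_only, $all_x, $all_y;
```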

Paul Lalli

sam...@mytrashmail.com

Oct 16, 2006, 1:38:27 PM

Aren't my comments much easier to locate if identified with a new
string?

inline comments found near 888888888888

888888888888888888
Oh...I was just thinking that if I do an assign like:

$origtxt= '/foo(.*)bar/'
then I could do a match =~ preceding or subsequent in the code
but you are saying the match occurs at the exact time the assign line is
processed: it does not consider previous matches in the code above or
subsequent matches in the code below, because if one is not explicit in
this assign statement, a match will be made against $_

So, I have learned that one must identify and consider what $_ is
holding and realize it is being compared with any /regex/ at the exact
time the /regex/ is evaluated. right?

>
> > In the same way I could assign a string to $orgtext, can't I assign a
> > regex to it for comparison to another variable or string later in the
> > script?
>
> Yes you can, but that is not what your syntax above does. If you want
> to assign a regular expression to a variable, you have to use the qr//
> operator, not the m// operator (which is what empty slashes implicitly
> are):

8888888888

ooooohhhhhh....anytime one uses /foobar/ an m/foobar/ is always
implied. Hmmm, there is such inconsistent use, I inferred that each
was different, especially since I read that the m/foo/ means one can
use alternate delimiters. But again, I misunderstood what I read?
Perhaps the m/foo/ or s/foo/ have to be explicit to use alternate
delimiters such as m!foo! or s!foo! ?

8888888888

you wrote:
If you want
> to assign a regular expression to a variable, you have to use the qr//

Now that's something I have not come across in any of my tutorials or
perldocs. Thanks Paul! It would have taken me a much longer time to
find this bit...although...I bet it doesn't work the way I think it
does :)

>
> my $origtext = qr/<form[.*]?*\/form>/;
>
> > > > $newtext = block;
> > > >
> > > > But I can't get past the staring blocks. I know this code works in
> > > > general, but my modifications seem to break it.
> > >
> > > No, the code really doesn't work in general. The code you claimed
> > > worked was assigning two strings (albeit in their bareword versions) to
> > > a variable. Then you suddenly stopped doing that and started assigning
> > > the result of a pattern match to that variable, using the //
> > > delimiters.
> >
> > jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
> > yes...I see your point and also possibly validation of assigning the
> > result of a pattern match to a variable? Or is the distinction "result
> > of a pattern match" vs assigning the regex pattern itself to a
> > variable?
>
> Yes. A pattern match is this:
> $x =~ /match/;
> That is, look for 'match' inside $x. That expression *returns* a
> value. If 'match' was found in $x, the expression returns 1. If it
> was not, it returns the empty string. When you do:
> $result = ($x =~ m/match/);
> you are assigning $result to be either 1 or the empty string.

888888888888

I see. I am going to print this out and keep it with my Perldocs. Very
insightful Paul - again...thank you.

> Likewise, if you do:
> $origtext = ($_ =~ m/<form>.*<\/form>/);
> you are assigning $origtext to be either 1 or the empty string,
> depending on whether or not the pattern was found in $_. And as
> already noted, the above can be reduced through some shortcuts to:
> $origtext = /<form>.*<\/form>/;

8888888888888888888888
$origtext = ($_ =~ m/<form>.*<\/form>/); THIS equals
$origtext = /<form>.*<\/form>/; THIS

Very explanatory.

Oh...it just occurred to me that you are using a fancy newsreader like
Agent, which parses the >>>> maybe? And that's why my otherwise
conspicuously helpful unique idents (8888) have no value for you?

>
> To assign the regexp pattern itself, you have to use the qr// syntax,
> not the m// syntax.

8888888888
got it!

>
> > But yes...point taken. Read up or clam up. Understood. I have quite a
> > library of Perl books, but all the O'Reilley books are these talkative
> > friendly Alice in wonderland meets it's a small world rambles. I am
> > used to Linux book where a rule is stated tersely, an example given -
> > DONE!!! The perldocs are this way and are beloved by me. If I had 2
> > lifetimes...sure....teach me via anecdotal valley girl talk. But I
> > have so little time...is there a book that just gives a brief
> > explanation of code snippets, a few examples of each, and we're off?
> > Kind of like the orignal Bjorn C books.
>
> I don't know what books you have, so it's difficult to recommend
> alternatives. Learning Perl is probably not what you want, as that's a
> tutorial. Programming Perl is the canonical reference book. The Perl
> Cookbook is a companion to Programming Perl, in that it offers a
> multitude of "here's how you do <foo>" recipes.

888888888888888
A 20 year Perl veteran at work recommended I read these Perl books in
this order:

1.) Beginning Perl
2.) Perl Best Practices

Then use Programming Perl and Perl Cookbook as needed.
I thought I might also make Perl, a Problem Solution approach a 1.5 on
the list?

88888888888888
I had a long talk with Mr. Twenty Year Perl Veteran (TYPV) and he said
that everyone goes through the stage of "Why can't regex be used to
process multi-line html files?"

And he said I just needed to go through it, take my lumps, and
eventually learn that cpan and modules are my best friends. He said,
ALWAYS use a module to parse any html and any multiple line data. He
said he scours CPAN and uses modules for most everything.

I told him I tried using HMLT parser, since deprecated but now a
wrapper for the Treebuilder module. I mentioned the documents were a
bit cryptic if not sparse.

He agreed and said that's just the way it was...but I would have to
learn how to use the modules in spite of this. He made a big
impression enough for me to think:

Just don't:
1.) Use regex for anything more than a simple replace
2.) Never attempt to parse html without a module
3.) Never try to parse multi-line code without a module.

But modules are going to be the steepest learning curve I think. I
think one has to pretty much know Perl very well to even begin to
understand the Module docs?

>
> > I know there are all sorts of catchall whitespace /t /n /r
> > characters to account for this? Again...something in my reading...but
> > it's all a blur...walls closing in....usb drive OVERHEATING!!!!
>
> Yes, you can change the $/ variable to make Perl read until some other
> character instead of the newline. That's what you did. You changed it
> to the undefined value, which forces a single <> operation to read the
> entire file into one gigantic string. That's exactly what you need to
> do. But if you have only one string, which contains the entire file,
> it makes no sense of any kind to have a while loop around it. The loop
> would only execute once.

88888888888888888
Yes...I see that now.

>
>
>
> > > > And even when I do get the simple Whey replaced with Popcorn - it only
> > > > does the first instance, basically, I am guessing, because there is no
> > > > iterative code in this script.
> > >
> > > STOP GUESSING. READ. LEARN. Read the documentation I pointed you to
> > > above to see how to make a s/// operation replace all instances of a
> > > pattern. Again, it's trivial, but you are not going to learn how to do
> > > it by just guessing and typing random characters.
> >
> > jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj
> > Yes...I am guessing a bit, but truly...honestly...based on lots of
> > reading and scouring examples. My code was VERY CLOSE to working wasn't
> > it? Wasn't it just a minor detail or two wrong?
>
> No, you had severe syntax and logic errors. You have syntax errors in
> assigning to variables. You have regexp syntax errors in looking for
> the correct pattern. You have a complete logic error in trying to look
> for one line-spanning pattern in a single line. You have a regexp bug
> in wanting to find all instances, but not telling the regexp to look
> for all instances. And those are just the ones I remember without
> reviewing this entire thread.

888888888888888
Yes...but my pet hogs need some slop or they starve. In time, I will
put them on a diet as my code comes up to par with the help of yourself
and other Perl guardians.

Uri Guttman

Oct 16, 2006, 2:12:15 PM

>>>>> "s" == samiam <sam...@mytrashmail.com> writes:

s> Aren't my comments much easier to locate if identified with a new
s> string?

s> inline comments found near 888888888888

a very useless thing. your comments are already demarked by lacking >
prefixes. notice how no one else needs to do that?

s> 88888888888888
s> I had a long talk with Mr. Twenty Year Perl Veteran (TYPV) and he said
s> that everyone goes through the stage of "Why can't regex be used to
s> process multi-line html files?"

s> Just don't:
s> 1.) Use regex for anything more than a simple replace

bullshit. why replace only? matching is fine. and you can do very
complex and powerful things in regexes. no need to restrict them to
simple stuff. you have to learn what they can do and how to use them
properly. you have a long way to go there.

s> 2.) Never attempt to parse html without a module

bullshit. in most cases that is correct but there are some where using
PROPERLY written regexes on FIXED (you know they won't change) html is
ok. but again, you have to know where and when this can work.

s> 3.) Never try to parse multi-line code without a module.

major bullshit. it all depends on the format of the file. even multiline
things in a file can be handled by regexes without modules. and you will
find many file formats without any modules to parse them. one more
time, you have to learn when and where to use a module to parse vs using
regexes and other stuff.

looks like your 20 year perl vet (and that makes little sense as perl is
barely that old if it is 20) isn't giving you good advice. and if he is
so good, why do you keep coming here for help?

s> But modules are going to be the steepest learning curve I think. I
s> think one has to pretty much know Perl very well to even begin to
s> understand the Module docs?

huh???

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Paul Lalli

Oct 16, 2006, 2:15:33 PM

sam...@mytrashmail.com wrote:

> Paul Lalli wrote:
> > sam...@mytrashmail.com wrote:
> > > Paul Lalli wrote:
> > > > sam...@mytrashmail.com wrote:

> > No. You're still not understanding. Without a =~ operator, the bare
> > /match/ syntax ALWAYS makes a comparison to $_. You then assign the
> > results of that pattern match to another variable. You're using a
> > shortcut without realizing it. Expanding that line for completeness
> > would be:
> > $origtext = ( $_ =~ m/<form[.*]?*\/form>/);
>
>
> 888888888888888888
> Oh...I was just thinking that if I do an assign like:
>
> $origtxt= '/foo(.*)bar/'
> then I could do a match =~ preceding or subsequent in the code

No, you can't, but for a different reason. What you've typed here is
COMPLETELY DIFFERENT than your original. Those quotes make ALL the
difference. Yet another reason to use strict, and have it tell you
when you're using a bareword. When you say:
$origtext = /foo(.*)bar/;
that's assigning $origtext to the result of that pattern match applied
to $_
When you say:
$origtext = qr/foo(.*)bar/;
that's assigning $origtext to be a regular expression you will later
apply to some string.
Now this new one you've introduced is:
$origtext = '/foo(.*)bar/';
This one is a string that contains your original pattern plus two
slashes. Those slashes are no longer pattern match delimiters. Now
they're part of the string. If you were to later use this string
within a regular expression, you would be looking for the slashes as
well as foo and bar.
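
Putting the three forms side by side, as a rough sketch (the sample strings and the names $result, $pattern, $string are made up for illustration):

```perl
use strict;
use warnings;

$_ = "xx foo123bar yy";           # what a bare /.../ is matched against

my $result  = /foo(.*)bar/;       # runs the match on $_ right now; stores 1 or ''
my $pattern = qr/foo(.*)bar/;     # stores a compiled regexp for later use
my $string  = '/foo(.*)bar/';     # plain string; the slashes are ordinary characters

my $other = "a foo-and-bar b";
my ($captured) = $other =~ $pattern;    # apply the stored regexp later

# Interpolating the quoted string means the slashes must appear in the target.
my $no_slashes   = ($other =~ /$string/)          ? 1 : 0;
my $with_slashes = ("a /fooXbar/ b" =~ /$string/) ? 1 : 0;

print "result=$result captured=$captured ",
      "no_slashes=$no_slashes with_slashes=$with_slashes\n";
```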

> but you are saying the match occurs the exact time the assign line is
> processed: there is not consider previous matches in code above or
> subsequent matches in the code below, because if one is not explicit in
> this assign statement, a match will be made against $_

It's not the assignment that's implicit, it's the binding.
$foo = 'bar'; #assigning 'bar' to $foo
$foo =~ /bar/; #binding the pattern /bar/ to the string $foo.

"Assignment" means to make that variable have that value. "Binding"
means to apply a pattern match to an existing string.

> So, I have learned that one must identify and consider what $_ is
> holding and realize it is being compared with any /regex/ at the exact
> time the /regex/ is evaluated. right?

If and only if you type a pattern match operator without binding it to
another string.
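
A small sketch of the difference, assuming throwaway sample values (nothing here comes from the original script):

```perl
use strict;
use warnings;

my $foo = 'bar';              # assignment: $foo now holds 'bar'
$_ = 'something else';        # the default target for unbound matches

my $bound   = ($foo =~ /bar/) ? 'yes' : 'no';   # bound with =~ : tests $foo
my $unbound = (/bar/)         ? 'yes' : 'no';   # not bound: tests $_

my @seen;
for ('barn', 'cat', 'rebar') {    # for aliases each element to $_
    push @seen, $_ if /bar/;      # so the bare match tests the current element
}

print "bound=$bound unbound=$unbound seen=@seen\n";
```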

> > > In the same way I could assign a string to $orgtext, can't I assign a
> > > regex to it for comparison to another variable or string later in the
> > > script?
> >
> > Yes you can, but that is not what your syntax above does. If you want
> > to assign a regular expression to a variable, you have to use the qr//
> > operator, not the m// operator (which is what empty slashes implicitly
> > are):
>
> 8888888888
>
> ooooohhhhhh....anytime one uses /foobar/ an m/foobar/ is always
> implied. Hmmm, there is such inconsistent use, I inferred that each
> was different, especially since I read that the m/foo/ means one can
> use alternate delimiters.

Yes it does. You can use any non-alphanumeric characters as delimiters.
The slashes are the default delimiter. If you choose to use these
default delimiters, you are permitted to drop the 'm'. Both of these
facts (the implicit binding to $_ and the dropping of 'm') are spelled
out fairly clearly in perldoc perlop:
======================================
m/PATTERN/cgimosx
/PATTERN/cgimosx
Searches a string for a pattern match, and in scalar
context returns true if it succeeds, false if it
fails. If no string is specified via the "=~" or
"!~" operator, the $_ string is searched.
<snip>
If "/" is the delimiter then the initial "m" is
optional. With the "m" you can use any pair of
non-alphanumeric, non-whitespace characters as
delimiters.
======================================

> But again, I misunderstood what I read?
> Perhaps the m/foo/ or s/foo/ have to be explicit to use alternate
> delimiters such as m!foo! or s!foo! ?

Correct. If you want to choose alternate delimiters, you MUST use the
m. You must ALWAYS use the s for a s/// operation. Dropping that is
never permitted, even if you use the / as the delimiter.
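
For illustration, a sketch of those rules on a made-up path string (the path and variable names are assumptions for the example only):

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

my $slashes = ($path =~ /\/local\//) ? 1 : 0;   # default delimiters, m omitted
my $bang    = ($path =~ m!/local/!)  ? 1 : 0;   # alternate delimiters need the m
my $braces  = ($path =~ m{/local/})  ? 1 : 0;   # bracketing pairs also work

(my $moved = $path) =~ s#/usr/local#/opt#;      # the s is always required

print "slashes=$slashes bang=$bang braces=$braces moved=$moved\n";
```

Alternate delimiters are mostly useful when the pattern itself contains slashes, as it does with paths and closing HTML tags.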

> > to assign a regular expression to a variable, you have to use the qr//
>
> Now that's something I have not come across in any of my tutorials or
> perldocs.

perldoc perlop
================================================
qr/STRING/imosx
This operator quotes (and possibly compiles) its
STRING as a regular expression. STRING is
interpolated the same way as PATTERN in
"m/PATTERN/". If "'" is used as the delimiter, no
interpolation is done. Returns a Perl value which
may be used instead of the corresponding
"/STRING/imosx" expression.
================================================

> Thanks Paul! It would have taken me a much longer time to
> find this bit...although...I bet it doesn't work the way I think it
> does :)

Well, I can't speak to that as I don't know how you think it works. It
works exactly as specified in the above documentation, however....

> > Likewise, if you do:
> > $origtext = ($_ =~ m/<form>.*<\/form>/);
> > you are assigning $origtext to be either 1 or the empty string,
> > depending on whether or not the pattern was found in $_. And as
> > already noted, the above can be reduced through some shortcuts to:
> > $origtext = /<form>.*<\/form>/;
>

> $origtext = ($_ =~ m/<form>.*<\/form>/); THIS equals
> $origtext = /<form>.*<\/form>/; THIS

Correct, though I'd probably say that in the opposite order. The
second one expands internally to the first....

> Oh...it just occurred to me that you are using a fancy newsreader like
> Agent, which parses the >>>> maybe? And that's why my otherwise
> conspicuously helpful unique idents (8888) have no value for you?

There is nothing fancy about Google Groups, I assure you. >>> is a
standard visual indicator for pretty much all Usenet and email mediums.

> A 20 year Perl veteran at work recommended I read these Perl books in
> this order:
>
> 1.) Beginning Perl
> 2.) Perl Best Practices
>
> Then use Programming Perl and Perl Cookbook as needed.
> I thought I might also make Perl, a Problem Solution approach a 1.5 on
> the list?

PBP is great for learning how to make robust well written Perl
programs. It's not a reference. You can't "look up" any feature of
the language in it. I can't speak to the other two as I've never read
them.

> I had a long talk with Mr. Twenty Year Perl Veteran (TYPV) and he said
> that everyone goes through the stage of "Why can't regex be used to
> process multi-line html files?"
>
> And he said I just needed to go through it, take my lumps, and
> eventually learn that cpan and modules are my best friends. He said,
> ALWAYS use a module to parse any html and any multiple line data. He
> said he scours CPAN and uses modules for most everything.

Mr TYPV is a very knowledgable and intelligent man. Follow his advice.
:-)

> I told him I tried using HMLT parser, since deprecated but now a
> wrapper for the Treebuilder module. I mentioned the documents were a
> bit cryptic if not sparse.

I don't know what you mean by "HMLT parser". If you meant
HTML::Parser, then I agree with you. I prefer HTML::TokeParser myself.
Also available on CPAN. In fact, if you downloaded and installed
HTML::Parser, you probably got HTML::TokeParser in the bundle.

> He agreed and said that's just the way it was...but I would have to
> learn how to use the modules in spite of this. He made a big
> impression enough for me to think:
>
> Just don't:
> 1.) Use regex for anything more than a simple replace

Disagree. Regexps can be arbitrarily complicated. They just take
practice.

> 2.) Never attempt to parse html without a module

Agree. HTML is not a regular-enough format for a RegExp to be able to
parse it.

> 3.) Never try to parse multi-line code without a module.

Disagree. A RegExp can parse multi-line data just fine, so long as you
make sure the string you're matching against actually *is* multiline,
and you're not just trying to match one line at a time.

> But modules are going to be the steepest learning curve I think. I
> think one has to pretty much know Perl very well to even begin to
> understand the Module docs?

"has to", no. But it helps.

Paul Lalli

sam...@mytrashmail.com

Oct 16, 2006, 3:50:10 PM

Well...I don't know him very well at all and I tend to keep a stiff
upper lip at work. I prefer to forge my way outside of work, bring
value to the workplace...and if someone is amenable at work to helping
me...fine....but work is the last place where I want to voice my
experimental opinions on code. hope you can relate to that
sensitivity.

In other words - don't you guys go away!!!

Believe it or not, just THIS ONE THREAD has taken me light years from
where I was a few days ago.

I agree with your words Uri - I undoubtedly misrepresented the TYPV.
He is extremely well paid, and handles major Perl projects with
apparent ease. So if he doesn't look good...it's only because I was his
representative. In fact Uri, you and he would probably get along great
:)

thanks for the response!

L,
Samiam

sam...@mytrashmail.com

Oct 16, 2006, 4:01:32 PM

comments inline with bbbbbbbbbbbb

bbbbbbbbbbbbbbbbbb
Ahhh....I recognized the logic but didn't know it was called "binding."
Thanks.

> means to apply a pattern match to an existing string.
>
> > So, I have learned that one must identify and consider what $_ is
> > holding and realize it is being compared with any /regex/ at the exact
> > time the /regex/ is evaluated. right?
>
> If and only if you type a pattern match operator without binding it to
> another string.

>
> > > > In the same way I could assign a string to $orgtext, can't I assign a
> > > > regex to it for comparison to another variable or string later in the
> > > > script?
> > >
> > > Yes you can, but that is not what your syntax above does. If you want
> > > to assign a regular expression to a variable, you have to use the qr//
> > > operator, not the m// operator (which is what empty slashes implicitly
> > > are):

bbbbbbbbbbbbbbbb
Yes...I believe I get it now. I am looking up each salient point you
raise in the perldocs.
You have spared me a lot of confused wandering!

bbbbbbbbbbbbbbbbb
I actually beat you to this doc, though I hadn't yet come to this
section :-)

bbbbbbbbbbbbbbbbbbbb
Please....God....don't let me be accidentally talking with a
co-worker.....Please....God....

Whenever I hear compliments like that flowing, makes me wonder....uh
oh...it's the man himself!!

Thanks for your help Paul!!!

L,
Samiam

Glenn Jackman

Oct 16, 2006, 7:51:34 PM

At 2006-10-16 03:50PM, "sam...@mytrashmail.com" wrote:
> Well...I don't know him very well at all and I tend to keep a stiff
> upper lip at work. I prefer to forge my way outside of work, bring
> value to the workplace...and if someone is amenable at work to helping
> me...fine....but work is the last place where I want to voice my
> experimental opinions on code. hope you can relate to that
> sensitivity.

I've always believed the only stupid questions are the ones you don't
ask. Don't you think you could bring more value if you were to use the
resources available to you and ask questions right away so you could
become productive right away?

--
Glenn Jackman
Ulterior Designer

Brian Wilkins

Oct 17, 2006, 3:03:44 PM

I suggest you take a look at the Perl module HTML::Parser here :
http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm

I have example code of what exactly you want to do at home, but I am at
work now. If I remember, I will post it here later.

Brian Wilkins

Oct 17, 2006, 5:09:33 PM

Brian Wilkins wrote:
> I suggest you take a look at the Perl module HTML::Parser here :
> http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm
>
> I have example code of what exactly you want to do at home, but I am at
> work now. If I remember, I will post it here later.
>

Here is some code for you to examine. It strips the <span> tags and
leaves everything between <span> and </span>
# This function connects to Extern via the dispatch script
# and returns the CDRs for a specified period (based on BillingCycle)

sub connect_via_dispatch {

my $curl = Curl::easy::init();

if(!$curl) {
die "curl init failed!\n";
}

my ($temp_dc) = @_;
# Add the string DC0 to the front since the database stores
# DC numbers differently
my ($DC) = "DC0".$temp_dc;

$DC_NUM = $DC;

my $url = "https://www.mundotel.cc/cgi-bin/dispatch.cgi";
# Stores the HTML returned from dispatch.cgi?Service=CDR
my $rawHTML = "";

$::errbuf = "";
Curl::easy::setopt($curl, CURLOPT_ERRORBUFFER, "::errbuf");

Curl::easy::setopt($curl, CURLOPT_URL, $url);
Curl::easy::setopt($curl, CURLOPT_NOPROGRESS, 1);
Curl::easy::setopt($curl, CURLOPT_TIMEOUT, 30);
Curl::easy::setopt($curl, CURLOPT_HEADERFUNCTION, \&header_callb);
Curl::easy::setopt($curl, CURLOPT_WRITEFUNCTION, \&body_callb);
Curl::easy::setopt($curl, CURLOPT_POST, 1);
Curl::easy::setopt($curl,
CURLOPT_POSTFIELDS,"Service=CDR&PIN=$DC&StartDate=$prev_m-$prev_d-$prev_y&EndDate=$endmonth-$endday-$endyear&PageItems=99999999&Offset=-6");

# Curl::easy::setopt($curl,
#   CURLOPT_POSTFIELDS,"Service=CDR&PIN=$DC&StartDate=$prev_m-$prev_d-$prev_y&EndDate=$month-$day-$year&PageItems=99999999");
# USE THE LINE BELOW IF YOU WANT TO MANUALLY RUN INVOICES FOR A DATE RANGE
# MONTH MUST BE TWO DIGITS IN LENGTH

# Curl::easy::setopt($curl,
#   CURLOPT_POSTFIELDS,"Service=CDR&PIN=$DC&StartDate=08-1-2004&EndDate=08-31-2004&PageItems=99999999");
Curl::easy::setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
Curl::easy::setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
Curl::easy::setopt($curl, CURLOPT_SSL_VERIFYHOST, 1);
Curl::easy::setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0");
Curl::easy::perform($curl);
Curl::easy::cleanup($curl);
}

# Used with cURL; stores the raw HTML retrieved from Extern

sub body_callb {
my($chunk,$handle)=@_;
${handle} .= $chunk;
$rawHTML .= $chunk;
return length($chunk);
}

# Used with cURL; gets header for debugging purposes.

sub header_callb {
return length($_[0]);
}

# When we reach a </tr> tag, that means
# that we have reached the EOL, so append
# a newline char to the end of $result

sub parse_html {
my $tp = HTML::TokeParser->new(\$rawHTML) or die "Can't open $!";

while (my $tag = $tp->get_tag) {
if($tag->[0] eq 'span') {
$result .= $tp->get_text("/span").",";
}
else {
if ($tag->[0] eq '/tr') {
$result .= "\n";
}
}
}
}

Brian Wilkins

Oct 17, 2006, 7:48:18 PM

sam...@mytrashmail.com wrote:
[snipped]

Why don't you read the web page into a scalar variable and chomp all
the \n characters? Then you have a large scalar variable all on one
line. Like so (caution not tested):

====

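use strict;
use warnings;
use HTML::TokeParser;

# Under "use strict" every variable has to be declared with my
my $data_file = "webpage.htm";

open(my $dat, '<', $data_file) || die("Could not open $data_file: $!");

my @raw_data = <$dat>;

close($dat);

my $rawHTML = '';

foreach my $line (@raw_data)
{
chomp($line);
$rawHTML .= $line;
}

# \b also matches a bare <form> with no attributes; /i ignores case,
# /s lets . match newlines, /g replaces every form, not just the first
$rawHTML =~ s#<form\b.*?</form>#BLOCK#sgi;

open(my $outfile, '>', "parsed_file.htm") || die("Could not open parsed_file.htm: $!");

print $outfile $rawHTML;

close($outfile);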

You can also use HTML::TokeParser. It may work better in this case.
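
A rough sketch of what the HTML::TokeParser route could look like, assuming the module is installed from CPAN (it ships in the HTML-Parser distribution) and using a made-up inline sample instead of the real page. The variable names and the skip-everything-inside-a-form approach are illustrative assumptions, not tested against the actual data file:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample standing in for the real page.
my $html = <<'HTML';
<p>Keep me</p>
<form action="/x">
<input type="submit">
</form>
<p>And me</p>
HTML

# Position of the original source text inside each token type,
# per the get_token description in the HTML::TokeParser docs.
my %text_index = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);

my $parser = HTML::TokeParser->new(\$html) or die "Cannot parse: $!";

my $out     = '';
my $in_form = 0;
while (my $token = $parser->get_token) {
    my $type = $token->[0];
    if ($type eq 'S' && $token->[1] eq 'form') { $in_form++; next; }
    if ($type eq 'E' && $token->[1] eq 'form') { $in_form-- if $in_form; next; }
    next if $in_form;                        # drop everything inside a form
    $out .= $token->[ $text_index{$type} ];  # copy everything else unchanged
}

print $out;
```

Unlike the regexp, this does not depend on how the tags are spaced or broken across lines, since the parser hands back one token at a time.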

sam...@mytrashmail.com

Oct 18, 2006, 2:30:30 PM

Just wanted to say that I received some very nice offline emails and
wanted to say thank you in particular to Brian Wilkins. Thanks for your
code and your gracious nature Brian.

Also to Mr. Ritty aka Paul Lalli - sorry my dissension rankled you so.
I am still grateful for your input.

Thanks to all who contribute, but especially to those whose gifts are
vitriol free :-)

L,
S
