Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

No Question - Just Posting some Code I'm Proud of

2 views
Skip to first unread message

Mike Flannigan

unread,
Aug 1, 2003, 11:39:46 AM8/1/03
to

I'm just posting some code I wrote and asking for any advice
you might have. It may not work perfectly, but it's got me
fooled into thinking it does. Keep in mind I hadn't even
heard of Perl until about 50 days ago.

Believe it or not, I'm kinda proud of this code. That's only
because I know how much improved it is over my first
draft. If anybody has any suggestions, I want them.

_______________________________________

# This file takes a CSV file,
# removes offending characters from the name fields
# and gives the 6 digit names field unique names.
# It also shortens the long name field a little.

use strict;
use warnings;

my $rep = 1;
my $file = "bran2.csv";
my (@temp, $temp);
open WPIN,"<$file" or die "Can't create $file: $!";

while (<WPIN>) {
my ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2) = split
/,/, $_, 8;
push @temp, ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2);

}


# Replace bad characters
for (my $turn = 10; $turn <= $#temp; $turn+=8) {
$temp[$turn] =~ tr/()#.:\/?'&!//d;
$temp[$turn+5] =~ s/\(historical\)/\(hist\)/g;
$temp[$turn+5] =~ s/\(abandoned\)/\(aban\)/g;
$temp[$turn+5] =~ s/National Recreation Area/NRA/g;
$temp[$turn+5] =~ s/Cemetery/Cem/g;
$temp[$turn+5] =~ s/ North / N /g;
$temp[$turn+5] =~ s/ North$/ N/g;
$temp[$turn+5] =~ s/^North /N /g;
$temp[$turn+5] =~ s/ South / S /g;
$temp[$turn+5] =~ s/ South$/ S/g;
$temp[$turn+5] =~ s/^South /S /g;
$temp[$turn+5] =~ s/ East / E /g;
$temp[$turn+5] =~ s/ East$/ E/g;
$temp[$turn+5] =~ s/^East /E /g;
$temp[$turn+5] =~ s/ West / W /g;
$temp[$turn+5] =~ s/ West$/ W/g;
$temp[$turn+5] =~ s/^West /W /g;
$temp[$turn+5] =~ s/High School/HS/g;
$temp[$turn+5] =~ s/Community College/College/g;
$temp[$turn+5] =~ s/Campground/Campgr/g;
$temp[$turn+5] =~ s/Post Office/PO/g;
$temp[$turn+5] =~ s/Junior/Jr/g;
$temp[$turn+5] =~ s/Mountain/Mtn/g;
}


# Rename duplicate name1's
for (my $turn = 10; $turn <= $#temp; $turn+=8) {
next if ($temp[$turn] ne $temp[$turn-8]);
$rep++;
if (length $temp[$turn-8] > 3){
substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);
$rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
}
else {
$temp[$turn-8] =~ s/^(.{0,4})$/${1}0$rep/ if ($rep < 10);
$temp[$turn-8] =~ s/^(.{0,4})$/${1}$rep/ if ($rep > 9);
$rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
}
}

# print to the wpout.txt file
open WPOUT,">WPout.txt" or die "Can't create WPout.txt: $!";

foreach my $printout ( @temp ) {
if ($printout =~ /\n$/) {print WPOUT ("$printout")}
else {print WPOUT ("$printout,")}
}

close (WPOUT);

__END__

_______________________________________


If anybody feels like messing with this, here is some
good data to put in 'bran2.csv' to test it:

Datum,WGS84,WGS84,0,0,0,0,0
WP,DMS,Adkins,35.5143,-93.4256,12/31/1989,0:00:00,Adkins Cemetery
WP,DMS,Adkins,35.2233,-93.4800,12/31/1989,0:00:00,Aetna Gas Field
(historical)
WP,DMS,Adkins,36.0800,-93.4042,12/31/1989,0:00:00,Alabam School hist
WP,DMS,Adkins,36.3807,-93.1326,12/31/1989,0:00:00,Alexander Park
WP,DMS,Adkins,35.2546,-94.0115,12/31/1989,0:00:00,Anice hist
WP,DMS,Bakhom,33.4855,-94.3659,12/31/1989,0:00:00,Bakhoma Recreation
West
WP,DMS,Ba/ld ,35.1131,-94.4803,12/31/1989,0:00:00,Bald Knob
WP,DMS,B/??i,35.2700,-94.4352,12/31/1989,0:00:00,East Baldridge Mountain

WP,DMS,Bali,38.4356,-90.4107,12/31/1989,0:00:00,Barbs Northhampton House

WP,DMS,Basin ,36.1758,-93.2119,12/31/1989,0:00:00,Mountain Spring
WP,DMS,B&&t)e,35.3608,-93.4713,12/31/1989,0:00:00,Eastmore Springs
WP,DMS,Be!r(C,36.1751,-93.1059,12/31/1989,0:00:00,Bear West Springs


I thought I could replace:
$temp[$turn+5] =~ s/ East / E /g;
$temp[$turn+5] =~ s/ East$/ E/g;
$temp[$turn+5] =~ s/^East /E /g;
with:
$temp[$turn+5] =~ s/([^|\s])East(\s)/$1E$2/g;

but it did not work for me :-(

$temp[$turn+5] =~ s/(\s)East(\s)/$1E$2/g;
almost works, but not quite since it doesn't work if
"East" is the first thing in the string.


My next step is to push $lat and $long together and
print a file of records that have the same $lat . $long
Probably pretty easy, but I may have to do brute force
way if I don't get smart on hashes real soon.


Mike


Brian McCauley

unread,
Aug 1, 2003, 12:48:01 PM8/1/03
to
Mike Flannigan <mike...@earthlink.net> writes:

> I'm just posting some code I wrote and asking for any advice
> you might have. It may not work perfectly, but it's got me
> fooled into thinking it does. Keep in mind I hadn't even
> heard of Perl until about 50 days ago.

With that in mind it is very good. You are however not thinking in
Perl. You are thinking in some other language and translating.

>
> Believe it or not, I'm kinda proud of this code. That's only
> because I know how much improved it is over my first
> draft. If anybody has any suggestions, I want them.
>
> _______________________________________
>
> # This file takes a CSV file,
> # removes offending characters from the name fields
> # and gives the 6 digit names field unique names.
> # It also shortens the long name field a little.
>
> use strict;
> use warnings;
>
> my $rep = 1;
> my $file = "bran2.csv";
> my (@temp, $temp);

The variable $temp is never used as far as I can see.

Having two variables of the same name (@temp and $temp) can be
confusing. I do it myself sometimes but I usually feel guilty.

@temp is not a good name for a variable - especially not a good name
for the longest lived variable in your whole program.

> open WPIN,"<$file" or die "Can't create $file: $!";

The word "create" should be "read".

Bareword filehandles and 2-argument open are no last-centuary :-).
(IMHO they should only be used where bacward compatability is an issue).

open my $wpin,'<',$file or die "Can't read $file: $!";


> while (<WPIN>) {
> my ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2) = split
> /,/, $_, 8;
> push @temp, ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2);
>
> }

Using 8 scalars to hold a list is a bad idea, use an array.

Abusing a 8*n element 1D array as a n x 8 element 2D array is a bad
idea, use a 2D array.

You also don't need to worry about the 8 if you do this.

You probably should chomp your incomming data.

while (<$wpin>) {
chmop;
push @temp, [ split /,/ ];
}


> # Replace bad characters
> for (my $turn = 10; $turn <= $#temp; $turn+=8) {
> $temp[$turn] =~ tr/()#.:\/?'&!//d;
> $temp[$turn+5] =~ s/\(historical\)/\(hist\)/g;
> $temp[$turn+5] =~ s/\(abandoned\)/\(aban\)/g;
> $temp[$turn+5] =~ s/National Recreation Area/NRA/g;
> $temp[$turn+5] =~ s/Cemetery/Cem/g;
> $temp[$turn+5] =~ s/ North / N /g;
> $temp[$turn+5] =~ s/ North$/ N/g;
> $temp[$turn+5] =~ s/^North /N /g;
> $temp[$turn+5] =~ s/ South / S /g;
> $temp[$turn+5] =~ s/ South$/ S/g;
> $temp[$turn+5] =~ s/^South /S /g;
> }

If you'd used a 2D array could iterate of the _elements_ of a list
directly - would not need to faff about with array subscripts.

Why do you start at 10 not 2? If you want to discard the top row (a
legend?) then I would take it out of @temp and put it in a diffent
variable.

You can relpace lots of =~ with for()

$wibble =~ s/this/that/;
$wibble =~ s/cat/dog/;

..becomes..

for ( $wibble ) {
s/this/that/;
s/cat/dog/;
}

You can use ORs in regular expressions;

s/ North / N /g;

s/ North$/ N/g;
s/^North /N /g;

.. is nearly the same as ..
s/(^| )North( |$)/$1N$2/g;

But you may want to consider using \b instead

s/\bNorth\b/N/g;


So putting it together...

my $legend = shift @temp;
for ( @temp ) {
$_->[2] =~ tr/()#.:\/?'&!//d;
for ( $_->[7] ) {


s/\(historical\)/\(hist\)/g;

s/\bNorth\b/N/g;
# etc....


}
}

> # Rename duplicate name1's
> for (my $turn = 10; $turn <= $#temp; $turn+=8) {
> next if ($temp[$turn] ne $temp[$turn-8]);
> $rep++;
> if (length $temp[$turn-8] > 3){
> substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
> substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);
> $rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
> }
> else {
> $temp[$turn-8] =~ s/^(.{0,4})$/${1}0$rep/ if ($rep < 10);
> $temp[$turn-8] =~ s/^(.{0,4})$/${1}$rep/ if ($rep > 9);
> $rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
> }
> }

I'm not going to try to understand that.

>
> # print to the wpout.txt file
> open WPOUT,">WPout.txt" or die "Can't create WPout.txt: $!";
>
> foreach my $printout ( @temp ) {
> if ($printout =~ /\n$/) {print WPOUT ("$printout")}
> else {print WPOUT ("$printout,")}
> }
>
> close (WPOUT);

open my $wpout,'>',"WPout.txt" or die "Can't create WPout.txt: $!";

for( $legend, @temp ) {
print $wpout join(',', @$_),"\n";
}

> I thought I could replace:
> $temp[$turn+5] =~ s/ East / E /g;
> $temp[$turn+5] =~ s/ East$/ E/g;
> $temp[$turn+5] =~ s/^East /E /g;
> with:
> $temp[$turn+5] =~ s/([^|\s])East(\s)/$1E$2/g;
>
> but it did not work for me :-(

Look up what [] mean in regular expressions - then remove them when
you discover that [] is in no way related to what you were tring to
do.

> $temp[$turn+5] =~ s/(\s)East(\s)/$1E$2/g;
> almost works, but not quite since it doesn't work if
> "East" is the first thing in the string.

I've addressed this above.

> My next step is to push $lat and $long together and
> print a file of records that have the same $lat . $long
> Probably pretty easy, but I may have to do brute force
> way if I don't get smart on hashes real soon.

The time taken to figure ot hashes would be well spent.

Actually there was no reason to slurp the whole file - you could have
processed the file with only one line at a time in memory. For a
small file this is not important.

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\

Greg Bacon

unread,
Aug 1, 2003, 1:27:39 PM8/1/03
to
In article <3F2A8A4E...@earthlink.net>,
Mike Flannigan <mike...@earthlink.net> wrote:

: [...]
:
: I thought I could replace:


: $temp[$turn+5] =~ s/ East / E /g;
: $temp[$turn+5] =~ s/ East$/ E/g;
: $temp[$turn+5] =~ s/^East /E /g;
: with:
: $temp[$turn+5] =~ s/([^|\s])East(\s)/$1E$2/g;
:
: but it did not work for me :-(

You were close. See below for that and other improvements. Note that
mine inverts the order of the repeated locations in the output.

#! /usr/local/bin/perl

# This file takes a CSV file,
# removes offending characters from the name fields
# and gives the 6 digit names field unique names.
# It also shortens the long name field a little.

use strict;
use warnings;

my $file = "bran2.csv";
my (@temp, $temp);

open WPIN, "<", $file or die "$0: open $file: $!";

while (<WPIN>) {
my $fields = 8;
my $count = tr/,//;

if ($count != $fields-1) {
my $commas = $count == 1 ? "comma" : "commas";
warn "$0: $file:$.: $count $commas -- bad line?\n";
}

chomp;
push @temp => [ split /,/, $_, $fields ];
}

my %saw;
for (@temp) {
# Replace bad characters


$_->[2] =~ tr/()#.:\/?'&!//d;

for ($_->[-1]) {
s/\(historical\)/(hist)/g;
s/\(abandoned\)/(aban)/g;
s/National Recreation Area/NRA/g;
s/Cemetery/Cem/g;

s/(\s*) \b North \b (\s*)/${1}N${2}/gx;
s/(\s*) \b South \b (\s*)/${1}S${2}/gx;
s/(\s*) \b East \b (\s*)/${1}E${2}/gx;
s/(\s*) \b West \b (\s*)/${1}W${2}/gx;

s/High School/HS/g;
s/Community College/College/g;
s/Campground/Campgr/g;
s/Post Office/PO/g;
s/Junior/Jr/g;
s/Mountain/Mtn/g;
}

if ($saw{ $_->[2] }++) {
$_->[2] =~ s{^ (.{0,4}) (..)? (.*) $} {
$1 . sprintf("%02d", $saw{ $_->[2] }) . $3
}ex;
}
}

# print to the wpout.txt file

open WPOUT, ">", "WPout.txt" or die "$0: open WPout.txt: $!";

for (@temp) {
print WPOUT join("," => @$_), "\n";;
}

close WPOUT or warn "$0: close WPout.txt: $!";

__END__

: My next step is to push $lat and $long together and


: print a file of records that have the same $lat . $long
: Probably pretty easy, but I may have to do brute force
: way if I don't get smart on hashes real soon.

The trick there will be defining what you mean by the same latitude
and longitude. If it's as simple as the text in their respective
records being identical, then you could do the following in your
processing loop:

push @{ $latlon{ $_->[3], $_->[4] } }, [ @$_ ];

Then print matches with

for (keys %latlon) {
next unless @{ $latlon{$_} } > 1;

my($lat,$lon) = split /$;/o;
print "At $lat, $lon:\n";

for (@{ $latlon{$_} }) {
print " - $_->[2]\n";
}
}

Hope this helps,
Greg
--
Amazon.com status as a dot-com questioned:
http://www.dailyreckoning.com/jokes.cfm?jokeid=348

David K. Wall

unread,
Aug 1, 2003, 1:44:46 PM8/1/03
to
Greg Bacon <gba...@hiwaay.net> wrote:

> s/(\s*) \b North \b (\s*)/${1}N${2}/gx;
> s/(\s*) \b South \b (\s*)/${1}S${2}/gx;
> s/(\s*) \b East \b (\s*)/${1}E${2}/gx;
> s/(\s*) \b West \b (\s*)/${1}W${2}/gx;

How about

s/\b(North|South|East|West)\b/substr($1,0,1)/eg;

?

I don't see a need to remember the spaces since you're not really doing
anything with them.

David K. Wall

unread,
Aug 1, 2003, 1:46:29 PM8/1/03
to
Brian McCauley <nob...@mail.com> wrote:

> Mike Flannigan <mike...@earthlink.net> writes:
>
>> # Rename duplicate name1's
>> for (my $turn = 10; $turn <= $#temp; $turn+=8) {
>> next if ($temp[$turn] ne $temp[$turn-8]);
>> $rep++;
>> if (length $temp[$turn-8] > 3){
>> substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
>> substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);
>> $rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
>> }
>> else {
>> $temp[$turn-8] =~ s/^(.{0,4})$/${1}0$rep/ if ($rep < 10);
>> $temp[$turn-8] =~ s/^(.{0,4})$/${1}$rep/ if ($rep > 9);
>> $rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
>> }
>> }
>
> I'm not going to try to understand that.

I "cheated" and looked at the output. Once you have the data in an
array of arrays (call it @AoA), something like this

my %count;
for my $element (@AoA) {
if (++$count{$element->[2]} > 1) {
$element->[2] = substr($element->[2], 0, 4) .
sprintf "%02d", $count{$element->[2]};
}
}

will produce a similar output. Not exactly, but close. I suspect it
*may* be close enough for the OP.


> Actually there was no reason to slurp the whole file - you could
> have processed the file with only one line at a time in memory.
> For a small file this is not important.

Since you've already done the hard part....

use strict;
use warnings;

# I still use the 2-argument open() because I'm used to it :-)
open WPIN, "bran2.csv" or die "Can't create $file: $!";

# print to the wpout.txt file
open WPOUT, ">WPout.txt" or die "Can't create WPout.txt: $!";

my %count;
while (<WPIN>) {
# no need to chomp() because we want the newline when we
# print out the data
my @record = split/,/;
for ($record[7]) {
tr/()#.:\/?'&!//d;


s/\b(North|South|East|West)\b/substr($1,0,1)/eg;

s/\(historical\)/\(hist\)/g;

s/\(abandoned\)/\(aban\)/g;
s/National Recreation Area/NRA/g;
s/Cemetery/Cem/g;


s/High School/HS/g;
s/Community College/College/g;
s/Campground/Campgr/g;
s/Post Office/PO/g;
s/Junior/Jr/g;
s/Mountain/Mtn/g;
}

if (++$count{$record[2]} > 1) {
$record[2] = substr($record[2], 0, 4) .
sprintf "%02d", $count{$record[2]};
}
print WPOUT join ',', @record;
}

close (WPOUT);

Brian McCauley

unread,
Aug 1, 2003, 2:00:40 PM8/1/03
to
"David K. Wall" <use...@dwall.fastmail.fm> writes:

> # I still use the 2-argument open() because I'm used to it :-)
> open WPIN, "bran2.csv" or die "Can't create $file: $!";

I still think the error message could be improved :-(

open WPIN, "<bran2.csv" or die "Can't read bran2.csv: $!";
^^^^

The reason I dislike the 2-argument open is because I think it's
illogical. The file access mode and the file name are logically two
orthogonal arguments. Merging them never made much sense. I used
Perl for a long time before we got the 3-arg open and I always felt
vaguely uncomforatble without it.

I also dislike bareword filehandles because.... er... they're just not
nice. They are, in fact, symrefs! (Waves hands in air and runs off
screeming).

Brian McCauley

unread,
Aug 1, 2003, 2:04:09 PM8/1/03
to
gba...@hiwaay.net (Greg Bacon) writes:

> for ($_->[-1]) {
> s/\(historical\)/(hist)/g;
> s/\(abandoned\)/(aban)/g;

[snip]
> s/Junior/Jr/g;
> s/Mountain/Mtn/g;
> }

Greg, I'm curious. Is this use of for() in place of =~ something you
usually recommend? This is the first time I can ever recall seeing
anyone else suggest it. I'd concluded it was just my own personal
affectation. :-)

Steven Kuo

unread,
Aug 1, 2003, 2:34:24 PM8/1/03
to
On 1 Aug 2003, Brian McCauley wrote:

> gba...@hiwaay.net (Greg Bacon) writes:
>
> > for ($_->[-1]) {
> > s/\(historical\)/(hist)/g;
> > s/\(abandoned\)/(aban)/g;
> [snip]
> > s/Junior/Jr/g;
> > s/Mountain/Mtn/g;
> > }
>
> Greg, I'm curious. Is this use of for() in place of =~ something you
> usually recommend? This is the first time I can ever recall seeing
> anyone else suggest it. I'd concluded it was just my own personal
> affectation. :-)
>

Hope I'm not butting in on a personal conversation here...

I've seen this style advocated by Nathan Torkington (aka gnat). See
http://prometheus.frii.com/~gnat/yapc/2001-idioms/slide026.html

The last few slide of the presentation (pg 24+) contain similarly
interesting idioms.

--
Cheers,
Steven

David K. Wall

unread,
Aug 1, 2003, 2:38:59 PM8/1/03
to
Brian McCauley <nob...@mail.com> wrote:

> "David K. Wall" <use...@dwall.fastmail.fm> writes:
>
>> # I still use the 2-argument open() because I'm used to it :-)
>> open WPIN, "bran2.csv" or die "Can't create $file: $!";
>
> I still think the error message could be improved :-(
>
> open WPIN, "<bran2.csv" or die "Can't read bran2.csv: $!";

Oops. I didn't even read the error message. I plead guilty.

> The reason I dislike the 2-argument open is because I think it's

I meant to ask about that...

> illogical. The file access mode and the file name are logically
> two orthogonal arguments. Merging them never made much sense. I
> used Perl for a long time before we got the 3-arg open and I
> always felt vaguely uncomforatble without it.

Reasonable. Makes much more sense to me than something like scalar
and list context did at first. :-)



> I also dislike bareword filehandles because.... er... they're just
> not nice. They are, in fact, symrefs! (Waves hands in air and
> runs off screeming).

Heh. I hadn't thought of it that way. I bet you REALLY hate the
single-argument open.

$FILE = "/etc/motd";
open FILE or die "can't open $FILE: $!";
while (<FILE>) {
# whatever
}

At the very least you prompted me to reread perlopentut and the
filehandle section of perldata.

Mike Flannigan

unread,
Aug 1, 2003, 2:52:15 PM8/1/03
to

Greg Bacon wrote:

> You were close. See below for that and other improvements. Note that
> mine inverts the order of the repeated locations in the output.

Hey, I appreciate all the work. Your script certainly works, so
now I need to spend a few hours figuring it out. To help me get
started, I'm going to ask just a few questions here, even before
I study it a whole lot.

I sure hope you did that array-of-arrays the other guys were
talking about, cause I want to learn that. On a quick scan
I'm guessing you did.

If I don't attend a good Perl class, it's going to be years before
I start thinking Perl :-)


> #! /usr/local/bin/perl
>
> # This file takes a CSV file,
> # removes offending characters from the name fields
> # and gives the 6 digit names field unique names.
> # It also shortens the long name field a little.
>
> use strict;
> use warnings;
>
> my $file = "bran2.csv";
> my (@temp, $temp);
>
> open WPIN, "<", $file or die "$0: open $file: $!";
>
> while (<WPIN>) {
> my $fields = 8;
> my $count = tr/,//;
>
> if ($count != $fields-1) {
> my $commas = $count == 1 ? "comma" : "commas";
> warn "$0: $file:$.: $count $commas -- bad line?\n";
> }

Nice feature that I didn't have. Checks that only separators
have commas. I'll figure out that


my $commas = $count == 1 ? "comma" : "commas";

line eventually. It's just the ==1 that confuses me.


>
> chomp;
> push @temp => [ split /,/, $_, $fields ];
> }
>
> my %saw;
> for (@temp) {
> # Replace bad characters
> $_->[2] =~ tr/()#.:\/?'&!//d;

What is "$_->[2]"?
I know $_ is the default array, but what does ->[2] do?
Oh wait - that just references the 2nd item in the array,
which is the 3rd column in the A of A, right? Surely
it's something like that.


>
> for ($_->[-1]) {
> s/\(historical\)/(hist)/g;
> s/\(abandoned\)/(aban)/g;
> s/National Recreation Area/NRA/g;
> s/Cemetery/Cem/g;
>
> s/(\s*) \b North \b (\s*)/${1}N${2}/gx;
> s/(\s*) \b South \b (\s*)/${1}S${2}/gx;
> s/(\s*) \b East \b (\s*)/${1}E${2}/gx;
> s/(\s*) \b West \b (\s*)/${1}W${2}/gx;
>
> s/High School/HS/g;
> s/Community College/College/g;
> s/Campground/Campgr/g;
> s/Post Office/PO/g;
> s/Junior/Jr/g;
> s/Mountain/Mtn/g;
> }
>
> if ($saw{ $_->[2] }++) {
> $_->[2] =~ s{^ (.{0,4}) (..)? (.*) $} {
> $1 . sprintf("%02d", $saw{ $_->[2] }) . $3
> }ex;
> }
> }

Ouch, no good questions here. I just need to dive into
hash stuff. This is going to take a few days.


>
> # print to the wpout.txt file
> open WPOUT, ">", "WPout.txt" or die "$0: open WPout.txt: $!";
>
> for (@temp) {
> print WPOUT join("," => @$_), "\n";;
> }
>
> close WPOUT or warn "$0: close WPout.txt: $!";
>
> __END__
>
> : My next step is to push $lat and $long together and
> : print a file of records that have the same $lat . $long
> : Probably pretty easy, but I may have to do brute force
> : way if I don't get smart on hashes real soon.
>
> The trick there will be defining what you mean by the same latitude
> and longitude. If it's as simple as the text in their respective
> records being identical, then you could do the following in your
> processing loop:
>
> push @{ $latlon{ $_->[3], $_->[4] } }, [ @$_ ];
>
> Then print matches with
>
> for (keys %latlon) {
> next unless @{ $latlon{$_} } > 1;
>
> my($lat,$lon) = split /$;/o;
> print "At $lat, $lon:\n";
>
> for (@{ $latlon{$_} }) {
> print " - $_->[2]\n";
> }
> }
>

Yep, identical text is what I want. I'm sure what you have will
work, I just haven't tried it yet.


Thanks so much.


David Wall's suggestion of the regex appears to be good.


>

Tad McClellan

unread,
Aug 1, 2003, 2:54:11 PM8/1/03
to
Mike Flannigan <mike...@earthlink.net> wrote:
>
> I'm just posting some code I wrote and asking for any advice
> you might have.

> Keep in mind I hadn't even


> heard of Perl until about 50 days ago.


Your program is very nice, taking that into consideration.

Congratulations.


> Believe it or not, I'm kinda proud of this code.


Hubris can be virtuous. :-)

(have you read the very last bit of "perldoc perl" yet?)


> If anybody has any suggestions, I want them.


Stand by...


> use strict;
> use warnings;


I like you a lot already.


> my $file = "bran2.csv";


I prefer to use single quotes except when I _need_ one of the
two extra things that double quotes give you (interpolation
and backslash escapes).

It helps me ignore scanning long strings for interpolated variables
when I do debugging.

my $file = 'bran2.csv';


> my (@temp, $temp);


1) what is $temp there for? I don't see you using it anywhere...

2) it is a red flag to me when I find myself choosing those
particular names. Sometimes I really want an intermediate
"temp" thingie, but most of the time I can rearrange things
so as to not need it at all. And I can _always_ at least
think up a more-descriptive name.

3) I almost never use the same name from multiple namespaces
like that. It confuses me, and I'm already easily enough
confused thank you very much.


> while (<WPIN>) {
> my ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2) = split
> /,/, $_, 8;
> push @temp, ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2);
>
> }


There is a whole bunch of temp vars that you do not need:

push @fields, split /,/, $_, 8; # "fields" is better than "temp"

you might even like to replace the entire while block above with:

push @fields, split( /,/, $_, 8 ) while <WPIN>;


> $temp[$turn+5] =~ s/\(historical\)/\(hist\)/g;
> $temp[$turn+5] =~ s/\(abandoned\)/\(aban\)/g;
> $temp[$turn+5] =~ s/National Recreation Area/NRA/g;


Typing essentially the same thing over and over is yet another
red flag that often indicates that there might be some better way.

Using the aliasing feature of foreach() is handy for that,
even when used with a one-element list:

foreach ( $temp[$turn+5] ) {
s/\(historical\)/\(hist\)/g; # changes are made back in $temp[$turn+5]


s/\(abandoned\)/\(aban\)/g;

s/National Recreation Area/NRA/g;
# ...
}

> substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
> substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);

Treating strings as if they were numbers is a path to madness.

Do you _want_

$rep = -1;

to become

$rep = '0-1';

??

A negative number of repetitions isn't going to happen of course,
but if you get used to using such false cleverness, it will
bite you someday.

Use sprintf() to zero-pad numbers:

$rep = sprintf "%02d", $rep;

> if ($printout =~ /\n$/) {print WPOUT ("$printout")}

^ ^ useless quotes

See this Perl FAQ:

What's wrong with always quoting "$vars"?


--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas

Greg Bacon

unread,
Aug 1, 2003, 3:11:51 PM8/1/03
to
In article <u94r11d...@wcl-l.bham.ac.uk>,
Brian McCauley <nob...@mail.com> wrote:

: gba...@hiwaay.net (Greg Bacon) writes:
:
: > for ($_->[-1]) {
: > s/\(historical\)/(hist)/g;
: > s/\(abandoned\)/(aban)/g;
: [snip]
: > s/Junior/Jr/g;
: > s/Mountain/Mtn/g;
: > }
:
: Greg, I'm curious. Is this use of for() in place of =~ something you
: usually recommend? This is the first time I can ever recall seeing
: anyone else suggest it. I'd concluded it was just my own personal
: affectation. :-)

It's not just in place of =~ (although that is a nice benefit with
operators that default to $_) but for aliasing in general. Further down
the thread, someone cited gnat, and I picked up this idiom (or trick or
whatever you want to call it) from tchrist. When I write code, I like
to eliminate repetition as much as possible. (XPers call this Once And
Only Once.)

Perl 6 will have a general aliasing mechanism that I could have used
to alias the fields of interest rather than big, ugly dereferencing
of array indices.

Greg
--
The glory and mystery of global commerce has been observed for thousands of
years, but it is no less wondrous to see in our everyday lives how it is
that people pursuing their self-interest in peace can only promote the
interest of society. -- Lew Rockwell

Greg Bacon

unread,
Aug 1, 2003, 3:14:49 PM8/1/03
to
In article <Xns93CA8BD544...@216.168.3.30>,
David K. Wall <use...@dwall.fastmail.fm> wrote:

: [...]
:
: I don't see a need to remember the spaces since you're not really
: doing anything with them.

Yes, it looks nicer your way.

Greg
--
The price of empire is terror. The price of occupation is terror. The
price of interventionism is terror. As Barry Goldwater used to say, it
is as simple as that.
-- Pat Buchanan

John W. Krahn

unread,
Aug 1, 2003, 3:19:56 PM8/1/03
to
Mike Flannigan wrote:
>
> I'm just posting some code I wrote and asking for any advice
> you might have.

Ok, you asked for it. :-)


> It may not work perfectly, but it's got me
> fooled into thinking it does. Keep in mind I hadn't even
> heard of Perl until about 50 days ago.
>
> Believe it or not, I'm kinda proud of this code. That's only
> because I know how much improved it is over my first
> draft. If anybody has any suggestions, I want them.

It looks pretty good for someone with only 50 days experience.


> # This file takes a CSV file,
> # removes offending characters from the name fields
> # and gives the 6 digit names field unique names.
> # It also shortens the long name field a little.
>
> use strict;
> use warnings;
>
> my $rep = 1;
> my $file = "bran2.csv";
> my (@temp, $temp);
> open WPIN,"<$file" or die "Can't create $file: $!";

^^^^^^
Opening a file for reading does not "create" the file, it has to already exist.


> while (<WPIN>) {
> my ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2) = split
> /,/, $_, 8;
> push @temp, ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2);
> }

You usually don't need to load the entire file into memory unless you are
modifying the original file which you are not doing. You are separating
the records and fields and putting all the fields into one array while it
might make more sense to separate the fields in the next for loop or use
an array of arrays to hold the records and fields.


> # Replace bad characters
> for (my $turn = 10; $turn <= $#temp; $turn+=8) {
> $temp[$turn] =~ tr/()#.:\/?'&!//d;
> $temp[$turn+5] =~ s/\(historical\)/\(hist\)/g;
> $temp[$turn+5] =~ s/\(abandoned\)/\(aban\)/g;
> $temp[$turn+5] =~ s/National Recreation Area/NRA/g;
> $temp[$turn+5] =~ s/Cemetery/Cem/g;
> $temp[$turn+5] =~ s/ North / N /g;
> $temp[$turn+5] =~ s/ North$/ N/g;
> $temp[$turn+5] =~ s/^North /N /g;

You can use the \b word boundary zero width assertion to simplify that.

$temp[$turn+5] =~ s/\bNorth\b/N/g;


> $temp[$turn+5] =~ s/ South / S /g;
> $temp[$turn+5] =~ s/ South$/ S/g;
> $temp[$turn+5] =~ s/^South /S /g;
> $temp[$turn+5] =~ s/ East / E /g;
> $temp[$turn+5] =~ s/ East$/ E/g;
> $temp[$turn+5] =~ s/^East /E /g;
> $temp[$turn+5] =~ s/ West / W /g;
> $temp[$turn+5] =~ s/ West$/ W/g;
> $temp[$turn+5] =~ s/^West /W /g;
> $temp[$turn+5] =~ s/High School/HS/g;
> $temp[$turn+5] =~ s/Community College/College/g;
> $temp[$turn+5] =~ s/Campground/Campgr/g;
> $temp[$turn+5] =~ s/Post Office/PO/g;
> $temp[$turn+5] =~ s/Junior/Jr/g;
> $temp[$turn+5] =~ s/Mountain/Mtn/g;

If you have to modify a single scalar multiple times you can for loop to simplify it.

for ( $scalar ) {
s/1//;
s/2//;
s/etc.//;
}


> }
>
> # Rename duplicate name1's

You should declare $rep here instead of way up there at the beginning.

my $rep = 1;

> for (my $turn = 10; $turn <= $#temp; $turn+=8) {
> next if ($temp[$turn] ne $temp[$turn-8]);
> $rep++;
> if (length $temp[$turn-8] > 3){
> substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
> substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);

You can use sprintf to add leading zeros.

substr( $temp[ $turn - 8 ], -2 ) = sprintf '%02d', $rep;


> $rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
> }
> else {
> $temp[$turn-8] =~ s/^(.{0,4})$/${1}0$rep/ if ($rep < 10);
> $temp[$turn-8] =~ s/^(.{0,4})$/${1}$rep/ if ($rep > 9);

You could use concatenation to simplify that.

$temp[ $turn - 8 ] .= sprintf '%02d', $rep;


> $rep = 1 if ($temp[$turn] ne $temp[$turn+8]);
> }
> }
>
> # print to the wpout.txt file
> open WPOUT,">WPout.txt" or die "Can't create WPout.txt: $!";
>
> foreach my $printout ( @temp ) {
> if ($printout =~ /\n$/) {print WPOUT ("$printout")}
> else {print WPOUT ("$printout,")}
> }
>
> close (WPOUT);
>
> __END__
>

> If anybody feels like messing with this, here is some
> good data to put in 'bran2.csv' to test it:
>

> [snip data]


Here is one way to do it using an array of arrays:

#!/usr/bin/perl

# This file takes a CSV file,
# removes offending characters from the name fields
# and gives the 6 digit names field unique names.
# It also shortens the long name field a little.

use strict;
use warnings;

# use names instead of numbers to access array fields
use constant NAME1 => 2;
use constant NAME2 => 7;

my $old_file = 'test11.txt';
my $new_file = 'test13.txt';

open WPIN, $old_file or die "Cannot open $old_file: $!";
my @data = map [ split /,/, $_, 8 ], <WPIN>;
close WPIN;

# Replace bad characters
for my $line ( @data ) {
$line->[ NAME1 ] =~ tr|()#.:/?'&!||d;

for ( $line->[ NAME2 ] ) {


s/\(historical\)/\(hist\)/g;

s/\(abandoned\)/\(aban\)/g;

s/National Recreation Area/NRA/g;
s/Cemetery/Cem/g;
s/\bNorth\b/N/g;
s/\bSouth\b/S/g;
s/\bEast\b/E/g;
s/\bWest\b/W/g;


s/High School/HS/g;
s/Community College/College/g;
s/Campground/Campgr/g;
s/Post Office/PO/g;
s/Junior/Jr/g;

s/Mountain/Mtn/g;
}
}

# Rename duplicate name1's

my $rep = 1;
for my $line ( 1 .. $#data ) {
next if $data[ $line ][ NAME1 ] ne $data[ $line - 1 ][ NAME1 ];
$rep++;
if ( length $data[ $line - 1 ][ NAME1 ] > 3 ) {
substr( $data[ $line - 1 ][ NAME1 ], -2 ) = sprintf '%02d', $rep;
}
else {
$data[ $line - 1 ][ NAME1 ] .= sprintf '%02d', $rep;
}
$rep = 1 if $data[ $line ][ NAME1 ] ne $data[ $line + 1 ][ NAME1 ];
}

# print to the wpout.txt file

open WPOUT, ">$new_file" or die "Cannot open $new_file: $!";
print WPOUT join ',', @$_ for @data;
close WPOUT;

__END__

John
--
use Perl;
program
fulfillment

Juha Laiho

unread,
Aug 1, 2003, 3:22:01 PM8/1/03
to
Mike Flannigan <mike...@earthlink.net> said:

>Greg Bacon wrote:
>> if ($count != $fields-1) {
>> my $commas = $count == 1 ? "comma" : "commas";
>> warn "$0: $file:$.: $count $commas -- bad line?\n";
>> }
>
>Nice feature that I didn't have. Checks that only separators
>have commas. I'll figure out that
>my $commas = $count == 1 ? "comma" : "commas";
>line eventually. It's just the ==1 that confuses me.

With this I can help you; the line

my $commas = $count == 1 ? "comma" : "commas";

is just a shorthand for writing

my $commas;
if ($count == 1) {
$commas="comma";
} else {
$commas="commas";
}

So, assign $commas to singular or plural form depending on the amount of
commas seen. Even though I often advocate against use of the ?: -construct,
here I admit it is the more readable form.
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)

Mike Flannigan

unread,
Aug 1, 2003, 3:36:51 PM8/1/03
to

Tad McClellan wrote:

>
> Hubris can be virtuous. :-)
>
> (have you read the very last bit of "perldoc perl" yet?)
>

I have now.

> I prefer to use single quotes except when I _need_ one of the
> two extra things that double quotes give you (interpolation
> and backslash escapes).
>
> It helps me ignore scanning long strings for interpolated variables
> when I do debugging.
>
> my $file = 'bran2.csv';

Done. Thanks alot.


>
> > my (@temp, $temp);
>
> 1) what is $temp there for? I don't see you using it anywhere...
>
> 2) it is a red flag to me when I find myself choosing those
> particular names. Sometimes I really want an intermediate
> "temp" thingie, but most of the time I can rearrange things
> so as to not need it at all. And I can _always_ at least
> think up a more-descriptive name.
>
> 3) I almost never use the same name from multiple namespaces
> like that. It confuses me, and I'm already easily enough
> confused thank you very much.

I got rid of $temp and changed temp to working everywhere else.
To answer your question, I was following the lead of somebody
else. I saw them do it, so I did too.

Good thing they didn't jump off a cliff :-)


> > while (<WPIN>) {
> > my ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2) = split
> > /,/, $_, 8;
> > push @temp, ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2);
> >
> > }
>
> There is a whole bunch of temp vars that you do not need:
>
> push @fields, split /,/, $_, 8; # "fields" is better than "temp"
>
> you might even like to replace the entire while block above with:
>
> push @fields, split( /,/, $_, 8 ) while <WPIN>;

Pretty cool. I like it.


>
>
> > $temp[$turn+5] =~ s/\(historical\)/\(hist\)/g;
> > $temp[$turn+5] =~ s/\(abandoned\)/\(aban\)/g;
> > $temp[$turn+5] =~ s/National Recreation Area/NRA/g;
>
> Typing essentially the same thing over and over is yet another
> red flag that often indicates that there might be some better way.
>
> Using the aliasing feature of foreach() is handy for that,
> even when used with a one-element list:
>
> foreach ( $temp[$turn+5] ) {
> s/\(historical\)/\(hist\)/g; # changes are made back in $temp[$turn+5]
> s/\(abandoned\)/\(aban\)/g;
> s/National Recreation Area/NRA/g;
> # ...
> }

Thanks. Great idea.


>
> > substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
> > substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);
>
> Treating strings as if they were numbers is a path to madness.
>
> Do you _want_
>
> $rep = -1;
>
> to become
>
> $rep = '0-1';
>
> ??
>
> A negative number of repetitions isn't going to happen of course,
> but if you get used to using such false cleverness, it will
> bite you someday.
>

I guess you are saying hard code it with functions. If not,
then I'm not up to you yet in that area.


>
> Use sprintf() to zero-pad numbers:
>
> $rep = sprintf "%02d", $rep;
>
> > if ($printout =~ /\n$/) {print WPOUT ("$printout")}
> ^ ^ useless quotes
>
> See this Perl FAQ:
>
> What's wrong with always quoting "$vars"?
>

I see. Thanks for the warning.

Mike Flannigan

unread,
Aug 1, 2003, 3:57:59 PM8/1/03
to

Juha Laiho wrote:

> With this I can help you; the line
>
> my $commas = $count == 1 ? "comma" : "commas";
>
> is just a shorthand for writing
>
> my $commas;
> if ($count == 1) {
> $commas="comma";
> } else {
> $commas="commas";
> }

OK. I was confused partly because I was thinking
$commas could basically never be less than 7. That was
faulty thinking on my part. It certainly could be.

This is the 2nd time this expansion of script has helped
me understand an expression. The first time was with
the union, intersection, difference example in the perlfaq.
The line was:
push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;

Wouldn't it be nice if somebody wrote a perl module
that would break one liners down to easily readable
script :-)


Mike


Mike Flannigan

unread,
Aug 1, 2003, 4:32:45 PM8/1/03
to

"John W. Krahn" wrote:

> > $temp[$turn-8] =~ s/^(.{0,4})$/${1}0$rep/ if ($rep < 10);
> > $temp[$turn-8] =~ s/^(.{0,4})$/${1}$rep/ if ($rep > 9);
>
> You could use concatenation to simplify that.
>
> $temp[ $turn - 8 ] .= sprintf '%02d', $rep;

I'm kinda surprised that worked, but it did. I knew the sprintf
part would work, but the .= I was wary of.

Thanks. That is going to help me figure of the A of A routine.

Rich

unread,
Aug 1, 2003, 5:58:33 PM8/1/03
to
Mike Flannigan wrote:

> # This file takes a CSV file,
> # removes offending characters from the name fields
> # and gives the 6 digit names field unique names.
> # It also shortens the long name field a little.
>
> use strict;
> use warnings;
>
> my $rep = 1;
> my $file = "bran2.csv";
> my (@temp, $temp);
> open WPIN,"<$file" or die "Can't create $file: $!";
>
> while (<WPIN>) {
> my ($wp1, $deg1, $name1, $lat, $long, $date, $time, $name2) = split
> /,/, $_, 8;

Hi Mike

You've already got a lot of useful info from other folks, but one thing
stands out - if you're really parsing a CSV file, you've got to assume you
may get entries such as:

value1,"val,ue2",value3,""val,ue4""

If you use split /,/ you would get this unlovely list of strings:

value1
"val
ue2"
value3
""val
ue4""

When you would have expected:

value1
val,ue2
value3
"val,ue4"

So, always use modules such as Text::CSV for parsing CSV files, since they
handle a number of subtle issues that might not be immediately obvious, but
will bite you at some point.

I do appreciate that this probably won't happen in the case you're outlining
here though.

Cheers,
--
Rich
scrip...@yahoo.co.uk

Greg Bacon

unread,
Aug 1, 2003, 5:15:44 PM8/1/03
to
In article <3F2AB76B...@earthlink.net>,
Mike Flannigan <mike...@earthlink.net> wrote:

: Greg Bacon wrote:
:
: [...]
:
: I sure hope you did that array-of-arrays the other guys were


: talking about, cause I want to learn that. On a quick scan
: I'm guessing you did.

Yes. I'm building an array of arrays with

push @temp => [ split /,/, $_, $fields ];

This makes each slot in @temp be a reference to an array whose thingy
(Perl parlance for the referent or, to be imprecise, what a reference
"points to") is an array containing the fields from some record in your
input.

As you gain more experience with references, the underlying mechanics
become transparent in your mental model, so you'll begin thinking in
terms of records instead of "let's see, I have a reference to an array
containing the fields from some record". This is what psychologists
call chunking.

: [...]
:
: Greg Bacon wrote:
:
: > while (<WPIN>) {


: > my $fields = 8;
: > my $count = tr/,//;
: >
: > if ($count != $fields-1) {
: > my $commas = $count == 1 ? "comma" : "commas";
: > warn "$0: $file:$.: $count $commas -- bad line?\n";
: > }
:
: Nice feature that I didn't have. Checks that only separators
: have commas.

No, I'm making sure that each input record is well-formed, i.e., that
each has seven commas and, thus, eight fields.

: I'll figure out that


: my $commas = $count == 1 ? "comma" : "commas";
: line eventually. It's just the ==1 that confuses me.

There I'm being, as the Aussies say, a pedantic git. I'm making the
noun agree with the number: 1 comma but 2 commas.

: > for (@temp) {


: > # Replace bad characters
: > $_->[2] =~ tr/()#.:\/?'&!//d;
:
: What is "$_->[2]"?
: I know $_ is the default array, but what does ->[2] do?
: Oh wait - that just references the 2nd item in the array,
: which is the 3rd column in the A of A, right? Surely
: it's something like that.

Right. At the first iteration, we've effectively done this

$_ = [ 'Datum', 'WGS84', 'WGS84', 0, 0, 0, 0, 0 ];

which means $_ is a reference to an array with those values. An
equivalent way to write this (you'll see it written the following way
much more often) is

$_ = [ qw/ Datum WGS84 WGS84 0 0 0 0 0 / ];

Naked square brackets are the anonymous array reference constructor
(see the perlref manpage), and qw// splits on whitespace (see the
perlop manpage).

Given that value of $_, $_->[2] means the value at index 2 (i.e., the
third value assuming $[ is 0 -- which it almost always is) of $_'s
thingy: 'WGS84' in this case. The perlref manpage explains all this.

: > if ($saw{ $_->[2] }++) {


: > $_->[2] =~ s{^ (.{0,4}) (..)? (.*) $} {
: > $1 . sprintf("%02d", $saw{ $_->[2] }) . $3
: > }ex;
: > }
:
: Ouch, no good questions here. I just need to dive into
: hash stuff. This is going to take a few days.

That code keeps up with duplicates. You weren't explicit about how
duplicates are supposed to be renamed, so that's my best guess at a
general approach (i.e., one that works for names of any length).

$saw{ $_->[2] }++ will only be true when the current record's name
is one we've already seen. $_->[2] is the third column, remember,
so at any time, $saw{$name} tells us the number of times we've seen
$name.

In the substitution, I'm inserting the number. The sprintf "%02d"
saves you from doing the zero padding by hand as you did in your
code. As I said earlier, doing it this way produces different
output, so you may need to play with the code.

: [...]
:
: Yep, identical text is what I want. I'm sure what you have will


: work, I just haven't tried it yet.

I tried it quickly by doubling the last line of the input, and I
saw the result I expected. Read the perlvar manpage to learn why
I'm splitting on the $; magic variable's value.

: David Wall's suggestion of the regex appears to be good.

I agree.

Hope this helps,
Greg
--

Between a balanced republic and a democracy, the difference is like that
between order and chaos.
-- Chief Justice John Marshall

Tad McClellan

unread,
Aug 1, 2003, 5:33:49 PM8/1/03
to
Mike Flannigan <mike...@earthlink.net> wrote:
> Tad McClellan wrote:


>> > substr ($temp[$turn-8], 4, 2, "0$rep") if ($rep < 10);
>> > substr ($temp[$turn-8], 4, 2, "$rep")if ($rep > 9);
>>
>> Treating strings as if they were numbers is a path to madness.


Errr, I said that backwards.

You are treating a number as if it was a string.


>> Do you _want_
>>
>> $rep = -1;
>>
>> to become
>>
>> $rep = '0-1';
>>
>> ??


(because that is what your code would do for the 4th substr() arg...)


>> A negative number of repetitions isn't going to happen of course,
>> but if you get used to using such false cleverness, it will
>> bite you someday.
>>
>
> I guess you are saying hard code it with functions. If not,
> then I'm not up to you yet in that area.


I'm saying if you mean for $rep to be a number, then treat is
as a number rather than as a string.


>> Use sprintf() to zero-pad numbers:
>>
>> $rep = sprintf "%02d", $rep;


sprintf() is treating $rep as a number, so it won't become
confused when $rep < 0 .

JS Bangs

unread,
Aug 1, 2003, 6:00:13 PM8/1/03
to
Brian McCauley sikyal:

> gba...@hiwaay.net (Greg Bacon) writes:
>
> > for ($_->[-1]) {
> > s/\(historical\)/(hist)/g;
> > s/\(abandoned\)/(aban)/g;
> [snip]
> > s/Junior/Jr/g;
> > s/Mountain/Mtn/g;
> > }
>
> Greg, I'm curious. Is this use of for() in place of =~ something you
> usually recommend? This is the first time I can ever recall seeing
> anyone else suggest it. I'd concluded it was just my own personal
> affectation. :-)

Whenever I need something like this, I'm in the habit of doing an
explicit assignment to $_

$_ = $foo;

Rather than:

for ($foo) {}

If I'm worried about scope, I'll put the whole thing in a bare block. Is
there any tremendous reason not to do this?

--
Jesse S. Bangs jas...@u.washington.edu
http://students.washington.edu/jaspax/
http://students.washington.edu/jaspax/blog

Jesus asked them, "Who do you say that I am?"

And they answered, "You are the eschatological manifestation of the ground
of our being, the kerygma in which we find the ultimate meaning of our
interpersonal relationship."

And Jesus said, "What?"

Jeff 'japhy' Pinyan

unread,
Aug 1, 2003, 6:52:10 PM8/1/03
to JS Bangs
[posted & mailed]

On Fri, 1 Aug 2003, JS Bangs wrote:

>Whenever I need something like this, I'm in the habit of doing an
>explicit assignment to $_
>
>$_ = $foo;
>
>Rather than:
>
>for ($foo) {}
>
>If I'm worried about scope, I'll put the whole thing in a bare block. Is
>there any tremendous reason not to do this?

Assigning to $_ doesn't automagically localize $_. You'd need to say
'local' yourself. Using a for loop DOES automagically localize $_.

--
Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)

Steve Grazzini

unread,
Aug 1, 2003, 7:10:20 PM8/1/03
to
JS Bangs <jas...@u.washington.edu> wrote:
> Brian McCauley sikyal:

>> this use of for() in place of =~
>
> Whenever I need something like this, I'm in the habit of
> doing an explicit assignment to $_
>
> $_ = $foo;
>
> Rather than:
>
> for ($foo) {}
>
> If I'm worried about scope, I'll put the whole thing in a
> bare block. Is there any tremendous reason not to do this?

The fact that foreach() makes $_ an alias to $foo rather than
a copy of it is crucial in some cases (including this one).

--
Steve

Mike Flannigan

unread,
Aug 2, 2003, 11:24:24 AM8/2/03
to

Brian McCauley wrote:

> Why do you start at 10 not 2? If you want to discard the top row (a
> legend?) then I would take it out of @temp and put it in a diffent
> variable.

Just realized I never answered this one. I'm not interested in
changing the first line of the file, only the following lines.


> Look up what [] mean in regular expressions - then remove them when
> you discover that [] is in no way related to what you were tring to
> do.

"Matches any one of the class of characters contained within
the brackets". That leads me to believe I could remove my
or (pipe) and just use:


$temp[$turn+5] =~ s/([^\s])East(\s)/$1E$2/g;

but that does not work either.

Your word boundary (/b) works great.

Thanks for all the suggestions. I wish I understood this script
100%, but not quite. I'm still working on understanding all
parts of it, but here is what I finally wound up with:

# This file takes a text file in CSV format,


# removes offending characters from the name fields
# and gives the 6 digit names field unique names.
# It also shortens the long name field a little.

use strict;
use warnings;

use constant COL1 => 0;
use constant COL2 => 1;
use constant COL3 => 2;
use constant COL4 => 3;
use constant COL5 => 4;
use constant COL6 => 5;
use constant COL7 => 6;
use constant COL8 => 7;

my $old_file = 'bran2.csv';
my $new_file = 'WPout.txt';

open WPIN, $old_file or die "Cannot open $old_file: $!";
my @data = map [ split /,/, $_, 8 ], <WPIN>;
close WPIN;

# Replace bad characters


for my $line ( @data ) {

$line->[ COL3 ] =~ tr|()#.:/?'&!||d;

for ( $line->[ COL8 ] ) {


s/\(historical\)/\(hist\)/g;

s/\(abandoned\)/\(aban\)/g;
s/National Recreation Area/NRA/g;
s/Cemetery/Cem/g;

s/\b(North|South|East|West)\b/substr($1,0,1)/eg;

s/High School/HS/g;
s/Community College/College/g;
s/Campground/Campgr/g;
s/Post Office/PO/g;
s/Junior/Jr/g;
s/Mountain/Mtn/g;
}
}

# Rename duplicate name1's

my %saw;
for my $line (@data) {
if ($saw{ $line->[ COL3 ] }++) {
# $line->[ COL3 ] =~ s|{^(.{0,4}) (..)? (.*) $}|
# {$1 . sprintf("%02d", $saw{ $line->[ COL3 ] }) . $3
# |ex;
$line->[ COL3 ] =~ s{^(.{0,4}) (..)? (.*) $} {
$1 . sprintf("%02d", $saw{ $line->[ COL3 ] }) . $3
}ex;
}
}

# print to the wpout.txt file

open WPOUT, ">", $new_file or die "$0: open $new_file: $!";

for (@data) {
print WPOUT join("," => @$_);
}

close WPOUT or warn "$0: close $new_file: $!";

__END__

Jeff 'japhy' Pinyan

unread,
Aug 2, 2003, 11:59:53 AM8/2/03
to Mike Flannigan
[posted & mailed]

On Sat, 2 Aug 2003, Mike Flannigan wrote:

>> Look up what [] mean in regular expressions - then remove them when
>> you discover that [] is in no way related to what you were tring to
>> do.
>
>"Matches any one of the class of characters contained within
>the brackets". That leads me to believe I could remove my
>or (pipe) and just use:
>$temp[$turn+5] =~ s/([^\s])East(\s)/$1E$2/g;
>but that does not work either.

You need to read more -- the '^' symbol, when the first character of a
char-class, means "match any character except those following...". So
that means [abc] matches a, b, or c, while [^abc] matches any character
except a, b, or c. Even if that wasn't the case, a character class is
used for matching CHARACTERS, and in a regex, ^ doesn't match a character,
it matches a location.

>use constant COL1 => 0;
>use constant COL2 => 1;
>use constant COL3 => 2;
>use constant COL4 => 3;
>use constant COL5 => 4;
>use constant COL6 => 5;
>use constant COL7 => 6;
>use constant COL8 => 7;

Eww. Eww. I would remove that from my code. Either that, or change the
names of the constants to something meaningful. I'd rather write '5' than
'COL6'.

Greg Bacon

unread,
Aug 2, 2003, 12:12:12 PM8/2/03
to
In article <Pine.SGI.3.96.1030802...@vcmr-64.server.rpi.edu>,
Jeff 'japhy' Pinyan <pin...@rpi.edu> wrote:

: [...]
: >use constant COL8 => 7;


:
: Eww. Eww. I would remove that from my code. Either that, or change the
: names of the constants to something meaningful. I'd rather write '5' than
: 'COL6'.

Agreed, those constants should be the names of the columns, not more
magic numbers that require more typing.

Greg
--
Generally speaking, bugs don't go away just because you document them.
-- Abigail

Brian McCauley

unread,
Aug 4, 2003, 12:45:57 PM8/4/03
to
JS Bangs <jas...@u.washington.edu> writes:

> Whenever I need something like this, I'm in the habit of doing an
> explicit assignment to $_
>
> $_ = $foo;

As others have pointed out, that makes a copy of $foo not an alias to
$foo.

> Rather than:
>
> for ($foo) {}
>
> If I'm worried about scope, I'll put the whole thing in a bare block. Is
> there any tremendous reason not to do this?

Yes and no.

Personally I think

{
local *_ = \$foo;
# Do stuff
}

Is more ugly than

for ( $foo ) {
# Do stuff
}

It also has the disadvantage of localising %_ and @_ which you may not
want localised.

Of course, if you don't want aliasing, you might think you could
safely use:

{
local $_ = $foo;
# Do stuff
}

But you can't! There is a bug in Perl's tied variables that will bite
you sooner or later:

use Tie::Array;

tie my @q1, 'Tie::StdArray';

sub foo1 {
local $_ = 'BAZ';
print "@q1 $_\n";
}

@q1 = ('foo','bar');

for ( @q1 ) {
foo1;

JS Bangs

unread,
Aug 4, 2003, 3:30:20 PM8/4/03
to
Brian McCauley sikyal:

> JS Bangs <jas...@u.washington.edu> writes:
>
> > Whenever I need something like this, I'm in the habit of doing an
> > explicit assignment to $_
> >
> > $_ = $foo;
>
> As others have pointed out, that makes a copy of $foo not an alias to
> $foo.

This has never mattered to me in the past, but I can definitely see where
it might. Thanks to all for pointing this out.

> > Rather than:
> >
> > for ($foo) {}
> >
> > If I'm worried about scope, I'll put the whole thing in a bare block. Is
> > there any tremendous reason not to do this?
>
> Yes and no.
>
> Personally I think
>
> {
> local *_ = \$foo;
> # Do stuff
> }
>
> Is more ugly than
>
> for ( $foo ) {
> # Do stuff
> }

I agree. Now that I know this trick, I intend to start using it
immediately :).

Tina Mueller

unread,
Aug 4, 2003, 7:39:49 PM8/4/03
to
JS Bangs wrote:

> Whenever I need something like this, I'm in the habit of doing an
> explicit assignment to $_

> $_ = $foo;

> Rather than:

> for ($foo) {}

another point: that would leave $foo unchanged.
you'd have to do:
$foo = $_;
at the end of the block, anyway.

regards, tina
--
http://www.tinita.de/ \ enter__| |__the___ _ _ ___
http://Movies.tinita.de/ \ / _` / _ \/ _ \ '_(_-< of
http://www.perlquotes.de/ \ \ _,_\ __/\ __/_| /__/ perception
- my mail address expires end of august 2003 -

0 new messages