faster execution

18 views
Skip to first unread message

venus

unread,
Sep 24, 2001, 4:16:27 AM9/24/01
to
Hi all,

I wrote a perl script but I'm not sure if there's a better way of
doing it.
The script runs very slow because the input file (txt file) I'm
reading from is very big. The script reads the input text file and if
it finds any location that equals to the list in array, it will
replace it with location code.
Here's a part of the script:

...
....

%LOCATION = (
'Gaborone,,BC', "GBE",
'Cairo,,EG', "CAI",
'Nairobi,,KN', "NBO",
'Dar es Salaam,,TN', "DAR",
'Kampala,,UG', "KLA",
'Cape Town,,ZA', "CPT",
'Johannesburg,,ZA', "JNB",
'Pretoria,,ZA', "PRY",
'Harare,,ZW', "HRE",
'Windhoek,,NM', "ERS",

'London,,UK', "LON",
'Vienna,,OS', "VIE",
'Brussels,,BX', "BRU",
'Copenhagen,,DN', "CPH",
'Berlin,,DL', "BER",
'Frankfurt,,DL', "FRA",
'Athens,,GR', "ATH",
'Tallinn,,EO', "TLL",
'Dublin,,IE', "DUB",
'Madrid,,SP', "MAD",
);


while (<INPUT_FILE>) {
chomp;
( $city, $state, $country, @data ) = split(/,/);
$location = "$city,$state,$country";
if ( defined @data[0] && ! defined $StartDD ) {
$StartDD = @data[0];
}
$day3 = $data[8]; $max3 = $data[9]; $min3 = $data[10];
$wx3 = $CONDITION{$data[11]};

foreach $location_name (%LOCATION ) {
$location_code = $LOCATION{$location_name};

if ($location_name eq $location ) {

printf FILE "$location_code\,";
printf FILE "%4s", "$min3\,";
printf FILE "%4s", "$max3\,";
printf FILE "%1s", "$wx3\n";

}

}
close(INPUT_FILE);
close(FILE);
}

Thanks in advance.
Regards,
Venus

Thomas Bätzler

unread,
Sep 24, 2001, 5:00:57 AM9/24/01
to
On 24 Sep 2001 01:16:27 -0700, train...@hotmail.com (venus) wrote:
>I wrote a perl script but I'm not sure if there's a better way of
>doing it.
[...]

> %LOCATION = (
> 'Gaborone,,BC', "GBE",
[...]
> );
[...]

> foreach $location_name (%LOCATION ) {
> $location_code = $LOCATION{$location_name};
>
> if ($location_name eq $location ) {
[...]

You don't need that loop - all you have to do is to see wether you
have an element with key $location in %LOCATION:

if( exists $LOCATION{$location} ){
[...]

HTH,
--
use strict;my($i,$t,@r)=(0,'5 -.@BHJPT4acd6e2hk2lmn2o4r2s3tuz',map{ord}
split//,unpack('u*','L#`T&)QD5#0`#!!`#%1D)#08`#P05!!(3``$$"``#"0L&``('.
'"`P<!`````0$`'));$t=~s/(\d)(.)/$2x$1/eg;map{$t.=substr$t,$i,1,''while
$_--;$i++}@r;print"$t\n";# Tho...@Baetzler.de - http://baetzler.de/perl

Rainer Klier

unread,
Sep 24, 2001, 7:24:09 AM9/24/01
to
Hi,

venus wrote:
>
[...]


> reading from is very big. The script reads the input text file and if
> it finds any location that equals to the list in array, it will
> replace it with location code.

[...]
> $location = "$city,$state,$country";
[...]



> foreach $location_name (%LOCATION ) {
> $location_code = $LOCATION{$location_name};

[...]
> }

Shouldn't this do it instead of the foreach loop:

$location_code = $LOCATION{$location};
if(defined($location_code)) {


printf FILE "$location_code\,";
printf FILE "%4s", "$min3\,";
printf FILE "%4s", "$max3\,";
printf FILE "%1s", "$wx3\n";
}

Regards, Rainer

nob...@mail.com

unread,
Sep 24, 2001, 7:12:58 AM9/24/01
to
train...@hotmail.com (venus) writes:

> foreach $location_name (%LOCATION ) {
> $location_code = $LOCATION{$location_name};
>
> if ($location_name eq $location ) {
>


This is like going up to a group of people and trying to find John by
asking each person in turn to tell you their name rather than simply
asking John to step forward.

if (my $location_code = $LOCATION{$location}) {

Also your use of printf is also pathological:

> printf FILE "$location_code\,";
> printf FILE "%4s", "$min3\,";
> printf FILE "%4s", "$max3\,";
> printf FILE "%1s", "$wx3\n";

Is location_code really a printf() template?

Why are you putting fixed punctuation in the data rather than the
template?

Why are you using 4 printf()s?

What't the point of specifying a minimum field width of one for the
last field when the data you are putting therein is certain to be
non-null anyhow? ("$wx3\n" must at least contain "\n"!).

printf FILE "%s,%3s,%3s,%s\n", $location_code, $min3, $max3, $wx3;

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\

Bart Lateur

unread,
Sep 24, 2001, 8:26:58 AM9/24/01
to
Rainer Klier wrote:

>$location_code = $LOCATION{$location};
>if(defined($location_code)) {
> printf FILE "$location_code\,";
> printf FILE "%4s", "$min3\,";
> printf FILE "%4s", "$max3\,";
> printf FILE "%1s", "$wx3\n";
>}

Don't use printf where you don't have to.

print FILE "$location_code,";

And why not put it all in one statement?

if(defined(my $code = $LOCATION{$location})) {
printf FILE "%s,%3s,%3s,%1s\n", $code, $min3, $max3, $wx3;
}

--
Bart.

Rainer Klier

unread,
Sep 24, 2001, 10:55:44 AM9/24/01
to
Bart Lateur wrote:
>
> Rainer Klier wrote:
>
> >$location_code = $LOCATION{$location};
> >if(defined($location_code)) {
> > printf FILE "$location_code\,";
> > printf FILE "%4s", "$min3\,";
> > printf FILE "%4s", "$max3\,";
> > printf FILE "%1s", "$wx3\n";
> >}
>
> Don't use printf where you don't have to.
>
> print FILE "$location_code,";

Just copied the inner loop of the original author.
Did not think longer about this.

> And why not put it all in one statement?

> if(defined(my $code = $LOCATION{$location})) {
> printf FILE "%s,%3s,%3s,%1s\n", $code, $min3, $max3, $wx3;
> }

Agreed for the printfs. For my point of view
on readability, $a = $b; if(defined($a)) { ..
is ok too.

Regards, Rainer

venus

unread,
Sep 25, 2001, 4:53:07 AM9/25/01
to
Hi...

What if I have an array that has different keys pointing to the same
value?

%LOCATION = (
"GBE", 'Gaborone,,BC',
"GBC", 'Gaborone,,BC',
"ADL", 'Amman,,JD', );

instead of :


%LOCATION = (
'Gaborone,,BC', "GBE",

'Amman,,JD', "ADL", );


I tried this:

%LOCATIONS = reverse %LOCATION;
while (<INPUT_FILE>) {
($city, $statecode, $countrycode, $date1, $max1, $min1,
$cond1, $date2, $max2, $min2, $cond2,
$date3, $max3, $min3, $cond3, $date4, $max4, $min4, $cond4,
$date5, $max5, $min5, $cond5, $date6, $max6, $min6, $cond6 )
= split (/,/);

$location = "$city,$statecode,$countrycode";

if( exists $LOCATIONS{$location} ){
print OUTPUT_FILE "$LOCATIONS{$location},$max1,$min1,$cond1\n";

but this will eliminate those repeated values and only prints one of
the associated key. Any other way to do it?

Appreciate if you could help.
regards,
venus

Bart Lateur

unread,
Sep 25, 2001, 9:17:14 AM9/25/01
to
venus wrote:

> What if I have an array that has different keys pointing to the same
>value?
>
> %LOCATION = (
> "GBE", 'Gaborone,,BC',
> "GBC", 'Gaborone,,BC',
> "ADL", 'Amman,,JD', );

> %LOCATIONS = reverse %LOCATION;

...

>but this will eliminate those repeated values and only prints one of
>the associated key. Any other way to do it?
>
>Appreciate if you could help.

Indeed. That's the effect you get if you do:

%hash = ( a => 1, a => 2, a => 3 );

all that is left is $hash{a} = 3 (the rightmost entry for "a" on the
list).

Instead, try something like:

while(my($k, $v) = each %LOCATION) {
push @{$LOCATIONS{$v}}, $k;
}


This will make an anonymous array entry for each old value, with all
associated keys pushed onto each.

You'll have to loop through them:

foreach (@{$LOCATIONS{$location}} {
print OUTPUT_FILE "$_,$max1,$min1,$cond1\n";
}

--
Bart.

Benjamin Goldberg

unread,
Sep 25, 2001, 3:49:38 PM9/25/01
to
venus wrote:
>
> Hi all,
>
> I wrote a perl script but I'm not sure if there's a better way of
> doing it.
> The script runs very slow because the input file (txt file) I'm
> reading from is very big. The script reads the input text file and if
> it finds any location that equals to the list in array, it will
> replace it with location code.
> Here's a part of the script:

my %LOCATION = ....;

my $StartDD;
while( <INPUT_FILE> ) {
chomp;
my @data = split /,/;
my $loc_code = $LOCATION{join ",", splice(@data, 0, 3)};
my ($min3, $max3, $wx3) = (@data[10,9], $CONDITION{$data[11]});
$StartDD = $data[0] if !defined $StartDD;
printf FILE "%s,%4s,%4s,%1s\n", $loc_code, $min2, $max3, $wx3;
}
close INPUT_FILE;
close FILE;


--
"I think not," said Descartes, and promptly disappeared.

venus

unread,
Sep 25, 2001, 11:16:34 PM9/25/01
to
Hi...

The script runs halfway and stopped with the message:

Segmentation fault (core dumped)

while (<INPUT_FILE>) {


s/^\s+//;
s/\r?\n$//;


($city, $statecode, $countrycode, $date1, $max1, $min1, $cond1,
$date2, $max2, $min2, $cond2,
$date3, $max3, $min3, $cond3, $date4, $max4, $min4, $cond4,
$date5, $max5, $min5, $cond5, $date6, $max6, $min6, $cond6 )
= split (/,/);

$location = "$city,$statecode,$countrycode";

while(my($k, $v) = each %LOCATION) {


push @{$LOCATIONS{$v}}, $k;
}

foreach (@{$LOCATIONS{$location}}) {
if(exists $LOCATIONS{$location}){
print OUTPUT_FILE"$_,$date1,$max1,$min1,$cond1\n";
}
}


Am I missing something here?

Thank you and regards...

Martien Verbruggen

unread,
Sep 25, 2001, 11:43:55 PM9/25/01
to
[you're responding to some other post, but my server doesn't have it,
and you don't leave any context in your post, so I'll have to treat it
as a new post]

On 25 Sep 2001 20:16:34 -0700,


venus <train...@hotmail.com> wrote:
> Hi...
>
> The script runs halfway and stopped with the message:
>
> Segmentation fault (core dumped)

This means that your Perl is broken. Perl should never bomb out like
that, and if it does, it is either a bug in Perl, a bug in your OS
libraries, a bug in a module that uses external libraries, or internal
compiled machine code [1]. Since I can't see any use of external modules
in your code, I can only assume that your Perl is broken (or some
system library it relies upon).

Very hard to say anything sensible about it. I would advise you to
recompile Perl.

> while (<INPUT_FILE>) {
>
>
> s/^\s+//;
> s/\r?\n$//;
> ($city, $statecode, $countrycode, $date1, $max1, $min1, $cond1,
> $date2, $max2, $min2, $cond2,
> $date3, $max3, $min3, $cond3, $date4, $max4, $min4, $cond4,
> $date5, $max5, $min5, $cond5, $date6, $max6, $min6, $cond6 )
>= split (/,/);

yuck. Why don't you use an array and indexes?

> $location = "$city,$statecode,$countrycode";
>
> while(my($k, $v) = each %LOCATION) {
> push @{$LOCATIONS{$v}}, $k;
> }

What is this supposed to do? It more or less inverts what is in
%LOCATION, and puts it in %LOCATIONS, but why is that done for each
line of input? The least that you should do is move this out of the
loop.

> foreach (@{$LOCATIONS{$location}}) {
> if(exists $LOCATIONS{$location}){
> print OUTPUT_FILE"$_,$date1,$max1,$min1,$cond1\n";
> }
> }
>
> Am I missing something here?

Nothing that has anything to do with the core dump. As I said, a core
dump in Perl most likely signifies that there is something wrong with
your perl binary.

Martien

[1] I suspect you can probably get this stuff as well if you're not
careful in signal handlers, and there may be one or more other, more
obscure, possibilities.
--
Martien Verbruggen |
Interactive Media Division | Hi, Dave here, what's the root
Commercial Dynamics Pty. Ltd. | password?
NSW, Australia |

Dave Tweed

unread,
Sep 26, 2001, 8:16:05 AM9/26/01
to
venus wrote:
> The script runs halfway and stopped with the message:
> Segmentation fault (core dumped)

Not too surprising, given that %LOCATIONS must be pretty huge by
then. You're stuffing new values into it on every iteration of the
outer loop. Two changes:

1. The inner while loop should only be done once, before you enter
the "while (<INPUT_FILE>)" loop.

2. You should invert the if and the foreach. There's no need to do
the test on every iteration if the thing you're testing doesn't
change:

while(my($k, $v) = each %LOCATION) {
push @{$LOCATIONS{$v}}, $k;
}

while (<INPUT_FILE>) {
s/^\s+//;
s/\r?\n$//;
($city, $statecode, $countrycode, $date1, $max1, $min1, $cond1,
$date2, $max2, $min2, $cond2,
$date3, $max3, $min3, $cond3, $date4, $max4, $min4, $cond4,
$date5, $max5, $min5, $cond5, $date6, $max6, $min6, $cond6 )
= split (/,/);

$location = "$city,$statecode,$countrycode";

if (exists $LOCATIONS{$location}) {


foreach (@{$LOCATIONS{$location}}) {

print OUTPUT_FILE "$_,$date1,$max1,$min1,$cond1\n";
}
}
}

-- Dave Tweed

Martien Verbruggen

unread,
Sep 26, 2001, 8:42:03 AM9/26/01
to
On Wed, 26 Sep 2001 12:16:05 GMT,
Dave Tweed <dtw...@acm.org> wrote:
> venus wrote:
>> The script runs halfway and stopped with the message:
>> Segmentation fault (core dumped)
>
> Not too surprising, given that %LOCATIONS must be pretty huge by
> then. You're stuffing new values into it on every iteration of the
> outer loop. Two changes:

Still surprising. Perl should not dump core simply because it can't get
memory anymore. In fact, it should exit with the message

Out of memory!

or one of the related messages documented in perldiag. If it dumps core
because malloc returned NULL, it is a bug.

Martien
--
Martien Verbruggen |
Interactive Media Division | Little girls, like butterflies, need
Commercial Dynamics Pty. Ltd. | no excuse - Lazarus Long
NSW, Australia |

venus

unread,
Sep 27, 2001, 11:15:31 PM9/27/01
to
Hi...

Yes, you are right. I've tried your suggestion and it works.
Thanks to you and others as well for giving me many options...

Reply all
Reply to author
Forward
0 new messages