In article <16b26474-e6b9-4ed3...@googlegroups.com>,
Hein RMS van den Heuvel <heinvand...@gmail.com> wrote:
> because I knew the records were in order, and sql did not know this, I
> could get the answer in a few minutes with a perl hack, including the time to
> write that program :-)
>
> use VMS::Stdio qw( &vmsopen );
> my $temp = shift or die "please provide a file";
> # perl IO fails on fixed length records... too easy? Use system calls
> my $fh = vmsopen($temp, "ctx=rec", "mbc=120", "mbf=4") or die "Could not open
> $temp for input. $!\n";
> while (sysread ($fh,$_,34)) { # unfortunately need to know the record size
> $a = substr($_,0,8); # key size
> next if $a eq $o; # new = old/prior?
> $i++; # count it
> $o = $a; # what was new is now old.
> }
> print "uniq=$i\n"
Nice to see Perl in action. Here are two alternative implementations
that will give you the same answer. The first one uses the input record
separator variable $/ to limit record reads to the size of the record.
Normally when set to a character string, this variable replaces \n with
something else that is considered the division between "records" in a
text file. But when set to a reference to an integer, it indicates the
maximum number of characters to read in a single read operation, which
on true record-oriented files gives you one record per read. This
method doesn't get you the buffer control you have from vmsopen, but
it's simpler:
$ type readhsh.pl
use strict;
$/ = \34;                      # read fixed-length 34-byte records
my $keycount = 0;              # so an empty file reports uniq=0
binmode STDIN; # disable encodings
my $oldkey = '';
while (<>) {
    my $key = substr($_,0,8);  # the key is the first 8 bytes
    $keycount++ if $key ne $oldkey;
    $oldkey = $key;
}
print "uniq=$keycount\n";
Run it like:
$ perl readhsh.pl < foo.tmp
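If you want to try the $/ trick somewhere other than VMS, here is a minimal sketch that wraps the same loop in a subroutine and feeds it an in-memory file. The subroutine name count_unique_keys, the 10-byte record length, the 3-byte key, and the sample data are all made up for illustration; only the $/ behavior comes from the script above. Like the original, it assumes records with equal keys are adjacent.

```perl
use strict;
use warnings;

# Count distinct leading keys in a stream of fixed-length records.
# Assumes records with equal keys are adjacent (data sorted by key).
# For a real file, the caller should binmode the handle first.
sub count_unique_keys {
    my ($fh, $reclen, $keylen) = @_;
    local $/ = \$reclen;    # each <$fh> returns at most $reclen characters
    my $count = 0;
    my $oldkey;
    while (my $rec = <$fh>) {
        my $key = substr($rec, 0, $keylen);
        $count++ if !defined $oldkey || $key ne $oldkey;
        $oldkey = $key;
    }
    return $count;
}

# Hypothetical sample data: four 10-byte records with 3-byte keys.
my $data = join '', 'AAA0000001', 'AAA0000002', 'BBB0000003', 'CCC0000004';
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "uniq=", count_unique_keys($fh, 10, 3), "\n";
close $fh;
```

Using local on $/ keeps the record-size setting from leaking into any other reads the program does later.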
The second alternative goes the other direction. It's more complicated
and is overkill for this example, but could also be scaled up to more
complicated problems. It uses the VMS::IndexedFile extension, which
creates a hash tied to the indexes of an indexed file. Referencing a
key in the hash means doing an indexed read of the file. So, for
example, if the file were not in the same order as the key of interest,
you would still get the right answer because the loop reads down the
index you specify.
$ type readhsh2.pl
use strict;
use VMS::IndexedFile;
my $fdl = "FILE; ORGANIZATION indexed; RECORD; CARRIAGE_CONTROL none; "
        . "FORMAT fixed; SIZE 34; KEY 0; CHANGES no; DUPLICATES yes; "
        . "SEG0_POSITION 0; SEG0_LENGTH 8; TYPE string;";
my %h;
tie (%h, 'VMS::IndexedFile', $ARGV[0], 0, O_RDONLY, $fdl)
    or die "failed to tie $ARGV[0]\n";
my $keycount = 0;
my $oldkey = '';
for my $key (keys %h) {
    $keycount++ if $key ne $oldkey;
    $oldkey = $key;
}
untie %h;
print "uniq=$keycount\n";
Run it like:
$ perl readhsh2.pl foo.tmp
If you have Perl 5.10.0 or later, write access to your Perl
installation, and an Internet connection, you can get VMS::IndexedFile
by doing:
$ cpanp install "VMS::IndexedFile"