I want to scanf, dammit!

Dmitry Epstein

Mar 25, 2004, 2:41:13 AM

I know this has been asked before, mostly by novice percolytes (is
that what you call Perl adherents?), but the sages just replied:
scanf sucks, we don't need it. Well, here is a real-life situation
that I have to deal with quite frequently. See if you can suggest
an acceptable solution using the all-mighty pattern search.

I have a file with floats separated by spaces and newlines. There
can be several numbers per line, but I don't know and don't care
about exactly how they are written because that has no relation to
the logic of the problem. Suppose the file contains 2500 numbers
and I need to read the first 500, or better yet, read just the
number #500. This number can be in the middle of a line, for all I
know.

Using something like scanf or the C++ stream input operator, the
solution is ludicrously simple: just read the requisite number of
times in a loop using a single statement. But with Perl I just
don't see a simple way to do it that does not involve slurping the
entire file into memory first (and then possibly duplicating it
while applying split). Oh sure, I could read line by line, split
each line, and count the number of words read. But that's not
nearly as simple, and besides, if I have to continue working with
the file, then the issue of having read more numbers than was
needed may come up (remember, that target number may have been in
the middle of a line).

So there.

--
Dmitry Epstein
Northwestern University, Evanston, IL. USA
mitia(at)northwestern(dot)edu
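For comparison, here is roughly what the "ludicrously simple" C loop described above looks like. This is a minimal sketch, not code from the thread. The function name read_nth_double is hypothetical, and the input is assumed to be plain whitespace-separated decimal floats. Because fscanf's %lf conversion skips spaces and newlines alike, the line layout of the file does not matter, and the stream is left just after the target number.

```c
#include <stdio.h>

/* Read whitespace-separated numbers from f up to and including the n-th
   one (counting from 1). On success the n-th number is stored in *out,
   1 is returned, and f is left just after that number so the caller can
   continue reading. Returns 0 if n < 1, or if the input ends (or holds
   something that is not a number) before the n-th value. */
int read_nth_double(FILE *f, long n, double *out)
{
    double x = 0.0;

    if (n < 1)
        return 0;
    for (long i = 1; i <= n; i++) {
        /* "%lf" skips leading spaces and newlines alike, so how the
           numbers are spread over lines is irrelevant. */
        if (fscanf(f, "%lf", &x) != 1)
            return 0;
    }
    *out = x;
    return 1;
}
```

A caller would open the file with fopen, call read_nth_double(f, 500, &x), and then keep reading from the same f.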

Malte Ubl

Mar 25, 2004, 3:11:56 AM

Dmitry Epstein wrote:
> Using something like scanf or the C++ stream input operator, the
> solution is ludicrously simple:

> So there.

http://search.cpan.org/search?query=scanf&mode=all

Does that help?

If it doesn't, just use Inline::C.

malte

Anno Siegel

Mar 25, 2004, 3:33:27 AM

Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote in comp.lang.perl.misc:

my $n = 500;
my @line;
while ( <DATA> ) { last if ( $n -= @line = split) < 0 }
print "$line[ $n]\n";

So there :)

Anno

Dmitry Epstein

Mar 25, 2004, 1:56:20 PM

anno...@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in
news:c3u5gn$elj$1...@mamenchi.zrz.TU-Berlin.DE:

> Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote
> in comp.lang.perl.misc:

[snip]

>> possibly duplicating it while applying split). Oh sure, I
>> could read line by line, split each line, and count the
>> number of words read. But that's not nearly as simple, and
>> besides, if I have to continue working with the file, then
>> the issue of having read more numbers than was needed may
>> come up (remember, that target number may have been in the
>> middle of a line).
>>
>> So there.
>
> my $n = 500;
> my @line;
> while ( <DATA> ) { last if ( $n -= @line = split) < 0 }
> print "$line[ $n]\n";
>

I think your $n is negative by the time you get to the last
statement, no?

Dmitry Epstein

Mar 25, 2004, 2:15:49 PM

Malte Ubl <m...@lteubl.de> wrote in news:c3u793$76k$1...@news.dtag.de:

> Dmitry Epstein wrote:
>> Using something like scanf or the C++ stream input operator,
>> the solution is ludicrously simple:
>
>> So there.
>
> http://search.cpan.org/search?query=scanf&mode=all
>
> Does that help?

String::scanf is indeed nearly useless, as the sages say. It does
not extract from the input stream directly, so I still need to read
a line or the entire file first. In that case I may as well use
split or regex.

> If it doesn't, just use Inline::C.

Er.. what's that?

Anno Siegel

Mar 25, 2004, 2:30:22 PM

Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote in comp.lang.perl.misc:
> anno...@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in
> news:c3u5gn$elj$1...@mamenchi.zrz.TU-Berlin.DE:
> > Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote
> > in comp.lang.perl.misc:

[find the nth number in a file]

> > my $n = 500;
> > my @line;
> > while ( <DATA> ) { last if ( $n -= @line = split) < 0 }
> > print "$line[ $n]\n";
> >
>
> I think your $n is negative by the time you get to the last
> statement, no?

Yes, it is, in exactly the right way :) Look for "negative" in perldata.

Anno

Paul Lalli

Mar 25, 2004, 2:52:05 PM

On Thu, 25 Mar 2004, Dmitry Epstein wrote:

> I know this has been asked before, mostly by novice percolytes (is
> that what you call Perl adherents?), but the sages just replied:
> scanf sucks, we don't need it. Well, here is a real-life situation
> that I have to deal with quite frequently. See if you can suggest
> an acceptable solution using the all-mighty pattern search.
>
> I have a file with floats separated by spaces and newlines. There
> can be several numbers per line, but I don't know and don't care
> about exactly how they are written because that has no relation to
> the logic of the problem. Suppose the file contains 2500 numbers
> and I need to read the first 500, or better yet, read just the
> number #500. This number can be in the middle of a line, for all I
> know.
>

use strict;
use warnings;
use File::Stream;

open my $fh, $ARGV[0] or die "Cannot open $ARGV[0]: $!";
my $fs = File::Stream->new($fh);
local $/ = qr/\s+/;

my $i;
while (my $word = <$fs>){
    next if ++$i != 500;
    $word =~ s|$/||g;
    print "$word\n";
}
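File::Stream with a regex $/ turns a filehandle into a stream of whitespace-delimited records. The same idea in C is a character-level reader like the hypothetical read_record below, sketched under the assumption that a record is simply a run of non-whitespace characters. Unlike the Perl code, which strips the separator afterwards with s|$/||g, this version never stores it. It consumes exactly one separator character after the record, so the stream stays positioned mid-line when the record is.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited record from f into
   buf (at most size-1 chars, NUL-terminated). Leading whitespace is
   skipped; reading stops at the first whitespace character after the
   record, which is consumed. Characters that do not fit in buf are read
   and discarded. Returns the full record length, or -1 at end of input. */
long read_record(FILE *f, char *buf, size_t size)
{
    int c;
    size_t len = 0;
    long total = 0;

    /* Skip the separator run in front of the record. */
    while ((c = getc(f)) != EOF && isspace(c))
        ;
    if (c == EOF)
        return -1;

    /* Copy characters until whitespace or end of input. */
    do {
        if (len + 1 < size)
            buf[len++] = (char)c;
        total++;
    } while ((c = getc(f)) != EOF && !isspace(c));

    if (size > 0)
        buf[len] = '\0';
    return total;
}
```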

Paul Lalli

Mar 25, 2004, 2:55:31 PM

Bah. Sent the wrong version. Switch that while loop to:

my ($i, $word);
while ($word = <$fs>){
    last if ++$i == 500;
}
$word =~ s|$/||g;
print "$word\n";

Tassilo v. Parseval

Mar 25, 2004, 4:33:08 PM

Also sprach Dmitry Epstein:

> Malte Ubl <m...@lteubl.de> wrote in news:c3u793$76k$1...@news.dtag.de:
>> Dmitry Epstein wrote:
>>> Using something like scanf or the C++ stream input operator,
>>> the solution is ludicrously simple:
>>
>>> So there.
>>
>> http://search.cpan.org/search?query=scanf&mode=all
>>
>> Does that help?
>
> String::scanf is indeed nearly useless, as the sages say. It does
> not extract from the input stream directly, so I still need to read
> a line or the entire file first. In that case I may as well use
> split or regex.
>
>> If it doesn't, just use Inline::C.
>
> Er.. what's that?

Inline::C allows you to embed C code into your Perl script. Here's a
super-simplistic scanner on top of fscanf() available from Perl:

#! /usr/bin/perl -w

use strict;
use Inline Config => BUILD_NOISY => 1;
use Inline 'C';

my ($a, $b) = my_scanf(\*STDIN, "%f %i");
print "float: $a\n";
print "int: $b\n";

__END__
__C__
void my_scanf (PerlIO *io, char *fmt) {
    FILE *f = PerlIO_findFILE(io);
    int n = 0;
    double d;
    int i;
    Inline_Stack_Vars;
    Inline_Stack_Reset;
    while (*fmt) {
        if (*fmt++ == '%') {
            switch (*fmt) {
            case 'f':
                /* stop at the first conversion that fails */
                if (fscanf(f, "%lf", &d) != 1)
                    goto done;
                Inline_Stack_Push(sv_2mortal(newSVnv(d)));
                n++;
                break;
            case 'i':
                if (fscanf(f, "%i", &i) != 1)
                    goto done;
                Inline_Stack_Push(sv_2mortal(newSViv(i)));
                n++;
                break;
            }
        }
    }
done:
    Inline_Stack_Done;
    Inline_Stack_Return(n);
}

The problem with scanf-like functions is that they take variadic
arguments, which means you have to write your own little parser for the
format string. The above only recognizes '%f' for doubles and '%i' for
integers. It shouldn't be too hard to implement a more generous subset
of the scanf format.

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
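A more general version of the "little parser for the format string" could collect typed values into an array instead of pushing them onto the Perl stack. The sketch below is plain C with no Inline or Perl API involved; struct scanned and scan_format are hypothetical names, and only %f and %i are handled, as in Tassilo's example. Note that %i, unlike %d, also accepts 0x-prefixed hex and 0-prefixed octal input.

```c
#include <stdio.h>

/* One scanned value, tagged with the conversion that produced it. */
struct scanned {
    char type;                    /* 'f' for double, 'i' for int */
    union {
        double d;
        int i;
    } v;
};

/* Walk fmt; for every %f or %i, read one value from f into out[].
   Everything else in fmt is skipped, as in the Inline::C version.
   Stops at the first failed conversion or after max values, and
   returns the number of values stored. */
int scan_format(FILE *f, const char *fmt, struct scanned *out, int max)
{
    int n = 0;

    while (*fmt != '\0' && n < max) {
        if (*fmt++ != '%')
            continue;             /* not a conversion: ignore */
        switch (*fmt) {
        case 'f':
            if (fscanf(f, "%lf", &out[n].v.d) != 1)
                return n;
            out[n++].type = 'f';
            break;
        case 'i':
            if (fscanf(f, "%i", &out[n].v.i) != 1)
                return n;
            out[n++].type = 'i';
            break;
        default:
            break;                /* unsupported conversion: skip it */
        }
        if (*fmt != '\0')
            fmt++;                /* step past the conversion letter */
    }
    return n;
}
```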

Uri Guttman

Mar 25, 2004, 3:11:23 PM

>>>>> "PL" == Paul Lalli <itty...@yahoo.com> writes:

PL> my ($i, $word);
PL> while ($word = <$fs>){
PL> last if ++$i == 500;
PL> }

couldn't that be just

<$fs> for 1 .. 500 ;

for provides void context so it will read one record at a time as
defined by $/.

it could be off by 1 so that might be 1 .. 499.

uri

Dmitry Epstein

Mar 26, 2004, 3:41:24 AM

anno...@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in
news:c3vc0e$a0d$1...@mamenchi.zrz.TU-Berlin.DE:

Well, you live and learn:

"Normal arrays are ordered lists of scalars indexed by number,
starting with 0 and with negative subscripts counting from the
end."
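Anno's loop relies on exactly this: once $n drops below zero, $line[$n] counts from the end of the current line, which is the same element as $line[@line + $n]. The arithmetic can be traced with a small C sketch over per-line word counts; locate is a hypothetical name, and counts stands in for the sizes of the lists that split returns. Tracing it also shows that $n is a zero-based position, so starting from 500 selects the 501st number, while starting from 499 gives number #500 counted from 1.

```c
/* Mirror of Anno's loop. counts[k] is the number of words on line k
   (what split returns), and n is a zero-based position in the whole
   file. Subtract each line's count until n goes negative; the target
   is then on that line at index counts[k] + n, which is what Perl's
   $line[$n] means for negative $n. Returns the line index and stores
   the in-line index in *col, or returns -1 if the file has too few
   words. */
long locate(const long *counts, long nlines, long n, long *col)
{
    for (long k = 0; k < nlines; k++) {
        n -= counts[k];           /* $n -= @line = split */
        if (n < 0) {              /* last if ... < 0 */
            *col = counts[k] + n; /* $line[$n] == $line[@line + $n] */
            return k;
        }
    }
    return -1;
}
```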

Dmitry Epstein

Mar 26, 2004, 3:44:59 AM

"Tassilo v. Parseval" <tassilo....@rwth-aachen.de> wrote in
news:c3vj6k$5c$1...@nets3.rz.RWTH-Aachen.DE:

>>> If it doesn't, just use Inline::C.
>>
>> Er.. what's that?
>
> Inline::C allows you to embed C code into your Perl script.
> Here's a super-simplistic scanner on top of fscanf() available
> from Perl:
>
> #! /usr/bin/perl -w
>
> use strict;
> use Inline Config => BUILD_NOISY => 1;
> use Inline 'C';

[snip]

I am sorry, I can't find any mention of Inline in my Perl docs...

Dmitry Epstein

Mar 26, 2004, 3:56:57 AM

Paul Lalli <itty...@yahoo.com> wrote in
news:2004032514...@dishwasher.cs.rpi.edu:

> On Thu, 25 Mar 2004, Paul Lalli wrote:
>
>> On Thu, 25 Mar 2004, Dmitry Epstein wrote:
>>
>> > I have a file with floats separated by spaces and newlines.
>> > There can be several numbers per line, but I don't know
>> > and don't care about exactly how they are written because
>> > that has no relation to the logic of the problem. Suppose
>> > the file contains 2500 numbers and I need to read the first
>> > 500, or better yet, read just the number #500. This number
>> > can be in the middle of a line, for all I know.
>> >
>>
>> use strict;
>> use warnings;
>> use File::Stream;
>>
>> open my $fh, $ARGV[0] or die "Cannot open $ARGV[0]: $!";
>> my $fs = File::Stream->new($fh);
>> local $/ = qr/\s+/;

[snip]

Thanks! Didn't know about that Stream module. Will download.

Anno Siegel

Mar 26, 2004, 4:22:30 AM

Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote in comp.lang.perl.misc:

It's a standard module, "perldoc Inline" should show it. If not,
http://cpan.uwinnipeg.ca/htdocs/Inline/, but you should check your
installation.

Anno

Vetle Roeim

Mar 26, 2004, 4:29:29 AM

* Dmitry Epstein
(...)

> I am sorry, I can't find any mention of Inline in my Perl docs...

<URL:http://search.cpan.org>. Search for Inline and Inline::C.

--
#!/usr/bin/vr

Joe Smith

Mar 26, 2004, 4:42:20 AM

Dmitry Epstein wrote:

>>>>If it doesn't, just use Inline::C.
>>>
>>>Er.. what's that?

> I am sorry, I can't find any mention of Inline in my Perl docs...

The double colons indicate it is a perl module.
If you download the perl sources, you will get a bunch of core modules
with docs. If you download a distribution (like ActiveState's), you
will get more modules with docs. But that doesn't cover all possible
modules.

If it's not in your docs, and not on your disk, the thing to do is
go fetch it from CPAN. http://search.cpan.org/
-Joe

Tassilo v. Parseval

Mar 26, 2004, 7:24:41 AM

Also sprach Anno Siegel:

Not quite yet. It's one of the reasons why I prefer plain XS over
Inline::C.

There are rumors that Inline is going to be standard in perl5.10,
though.

Paul Lalli

Mar 26, 2004, 7:40:35 AM

Well, that would certainly read the first 500, but it wouldn't be storing
them anywhere. < > doesn't automatically assign the 'line' read into $_
unless it's used as the conditional for a while(). When used in the body
of the for loop, it reads a line and throws it away. So it would still
have to be something like:

<$fs> for 1 .. 499;
$line = <$fs>;
print $line;

Paul Lalli
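In C, the read-and-discard step has a direct counterpart in scanf's assignment suppression: a conversion written as %*f reads a number and throws it away, much as <$fs> does in void context. A minimal sketch, where skip_then_read is a hypothetical name:

```c
#include <stdio.h>

/* Discard `skip` numbers with the suppressed conversion "%*f", then
   read the next number into *out. Returns 1 on success, 0 otherwise. */
int skip_then_read(FILE *f, long skip, double *out)
{
    for (long i = 0; i < skip; i++) {
        /* A suppressed conversion is not counted, so fscanf returns 0
           when it succeeds; EOF means the input ran out first. A
           non-number also yields 0, but it is never consumed, so the
           final read below then fails as well. */
        if (fscanf(f, "%*f") == EOF)
            return 0;
    }
    return fscanf(f, "%lf", out) == 1;
}
```

The equivalent of Paul's version would be skip_then_read(f, 499, &x).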

Anno Siegel

Mar 26, 2004, 7:52:54 AM

Tassilo v. Parseval <tassilo....@post.rwth-aachen.de> wrote in comp.lang.perl.misc:

> Also sprach Anno Siegel:
>
> > Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote in
> comp.lang.perl.misc:
> >> "Tassilo v. Parseval" <tassilo....@rwth-aachen.de> wrote in
> >> news:c3vj6k$5c$1...@nets3.rz.RWTH-Aachen.DE:
> >> >>> If it doesn't, just use Inline::C.
> >> >>
> >> >> Er.. what's that?
> >> >
> >> > Inline::C allows you to embed C code into your Perl script.
> >> > Here's a super-simplistic scanner on top of fscanf() available
> >> > from Perl:
> >> >
> >> > #! /usr/bin/perl -w
> >> >
> >> > use strict;
> >> > use Inline Config => BUILD_NOISY => 1;
> >> > use Inline 'C';
> >> [snip]
> >>
> >> I am sorry, I can't find any mention of Inline in my Perl docs...
> >
> > It's a standard module, "perldoc Inline" should show it. If not,
> > http://cpan.uwinnipeg.ca/htdocs/Inline/, but you should check your
> > installation.
>
> Not quite yet. It's one of the reasons why I prefer plain XS over
> Inline::C.
>
> There are rumors that Inline is going to be standard in perl5.10,
> though.

Sorry. I could have sworn...

Anno

Uri Guttman

Mar 26, 2004, 9:58:32 AM

>>>>> "PL" == Paul Lalli <itty...@yahoo.com> writes:

PL> On Thu, 25 Mar 2004, Uri Guttman wrote:
>> couldn't that be just
>>
>> <$fs> for 1 .. 500 ;

PL> Well, that would certainly read the first 500, but it wouldn't be storing
PL> them anywhere. < > doesn't automatically assign the 'line' read into $_
PL> unless it's used as the conditional for a while(). When used in the body
PL> of the for loop, it reads a line and throws it away. So it would still
PL> have to be something like:

PL> <$fs> for 1 .. 499;
PL> $line = <$fs>;
PL> print $line;

the OP said this so i was answering the 'better yet' part. :)

> > the logic of the problem. Suppose the file contains 2500 numbers
> > and I need to read the first 500, or better yet, read just the
> > number #500. This number can be in the middle of a line, for all I
> > know.

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

ctc...@hotmail.com

Mar 26, 2004, 11:46:56 AM

Dmitry Epstein <mitia....@northwestern.edu.invalid> wrote:
> I know this has been asked before, mostly by novice percolytes (is
> that what you call Perl adherents?), but the sages just replied:
> scanf sucks, we don't need it. Well, here is a real-life situation
> that I have to deal with quite frequently. See if you can suggest
> an acceptable solution using the all-mighty pattern search.
>
> I have a file with floats separated by spaces and newlines. There
> can be several numbers per line, but I don't know and don't care
> about exactly how they are written because that has no relation to
> the logic of the problem.

Well, there's your problem. Why store your data in some random,
unproductive fashion?

perl -lne'print foreach split' < randomly_formatted_file > one_per_line

(Alas, this does assume each line is short enough to comfortably fit into
memory.)

> Suppose the file contains 2500 numbers
> and I need to read the first 500, or better yet, read just the
> number #500. This number can be in the middle of a line, for all I
> know.
>
> Using something like scanf or the C++ stream input operator, the
> solution is ludicrously simple: just read the requisite number of
> times in a loop using a single statement. But with Perl I just
> don't see a simple way to do it that does not involve slurping the
> entire file into memory first (and then possibly duplicating it
> while applying split). Oh sure, I could read line by line, split
> each line, and count the number of words read. But that's not
> nearly as simple, and besides, if I have to continue working with
> the file, then the issue of having read more numbers than was
> needed may come up (remember, that target number may have been in
> the middle of a line).

If you don't want to simply store the file in a sane format, you
can convert it on the fly from within your main program:

open FH, "perl -lne'print foreach split' $filename |" or die $!;

Xho
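Xho's one-liner can also be expressed as a small C filter that copies tokens one per line. The hypothetical one_per_line function below works a character at a time, so unlike the Perl one-liner it never needs to hold a whole input line in memory. It is a sketch, not a drop-in replacement for the pipe.

```c
#include <ctype.h>
#include <stdio.h>

/* Copy every whitespace-separated token from in to out, one per line,
   like  perl -lne'print foreach split'. Returns the token count. */
long one_per_line(FILE *in, FILE *out)
{
    int c;
    int in_token = 0;
    long tokens = 0;

    while ((c = getc(in)) != EOF) {
        if (isspace(c)) {
            if (in_token) {       /* a token just ended */
                putc('\n', out);
                in_token = 0;
            }
        } else {
            if (!in_token) {      /* a new token starts */
                in_token = 1;
                tokens++;
            }
            putc(c, out);
        }
    }
    if (in_token)                 /* last token had no trailing space */
        putc('\n', out);
    return tokens;
}
```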