
Stop Reading Input Without EOF/nextfile/getline


Ed Morton

Sep 4, 2012, 4:53:43 PM

I was recently trying to answer a question and came across a problem I
don't know if there is an answer to: how can you stop reading from an
input stream without waiting for EOF or invoking getline or (in gawk)
nextfile?

For example, let's say you have a script that prompts a user for some
numeric value and then uses that value to multiply a set of other values
stored in a file. e.g.:

$ cat file
4
5
2

$ awk -f mult.awk file
Enter multiplier: 3
3 * 4 = 12
3 * 5 = 15
3 * 2 = 6

$ awk -f mult.awk file
Enter multiplier: 4
4 * 4 = 16
4 * 5 = 20
4 * 2 = 8

That could be implemented as either of these:

-------
$ cat mult_getline.awk
BEGIN {
    printf "Enter multiplier: "
    getline mult < "-"
}

{ printf "%d * %d = %d\n", mult, $0, mult * $0 }
-------
$ cat mult_nextfile.awk
BEGIN {
    ARGV[ARGC++] = ARGV[1]; ARGV[1] = "-"
    printf "Enter multiplier: "
}

NR==FNR {
    mult = $0
    nextfile
}

{ printf "%d * %d = %d\n", mult, $0, mult * $0 }
-------

Is there any way to implement it without using getline or nextfile and
without the user explicitly having to enter a control-D or anything else
after their multiplier number, and without reading the whole data file
into an array? The main point is to be able to stop reading the first
input stream before an EOF is encountered.

Ed.


Posted using www.webuse.net

Janis Papanagnou

Sep 4, 2012, 6:57:35 PM

Well, how should awk know about end of file? (Using close() doesn't seem
to work.)

[OT] You can force EOF at shell level...

awk '
BEGIN { printf "Enter multiplier: " }
NR==FNR { mult = $0 ; next }
{ printf "%d * %d = %d\n", mult, $0, mult * $0 }
' <(head -1) file

No pure awk solution, but a concise and simple pattern at least.

Janis

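The `<(head -1)` process substitution above needs bash, ksh or zsh. A minimal sketch of the same idea for a plain POSIX `sh`, assuming an awk that reads standard input for a `-` operand (as POSIX awk does). `mult_head` is a hypothetical helper name, and the prompt goes to stderr here only so it doesn't mix with the results:

```shell
# Hypothetical helper: multiply every line of the file named by $1 by one
# number read from stdin. head passes on exactly one line and exits, so
# awk sees end of file on "-" without a Control-D, getline or nextfile.
mult_head() {
    printf 'Enter multiplier: ' >&2
    head -n 1 | awk '
        NR == FNR { mult = $0; next }   # first operand "-": the multiplier
        { printf "%d * %d = %d\n", mult, $0, mult * $0 }
    ' - "$1"
}
```

Typed at a terminal, `mult_head file` behaves like Janis's version: the user enters one number, `head` exits, and awk gets end of file on `-`. Like her version, it only works when the number of lines to read is known in advance.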

Ed Morton

Sep 5, 2012, 8:58:48 AM

Awk wouldn't know at run-time but the programmer knows when writing the script
when they have read the desired N lines of input.

> (Using close() doesn't seem to work.)

Right, that was my first thought too. The man page explains that by saying
close() is only for redirected input. I also tried manipulating ARGV[] but no
luck there either.

> [OT] You can force EOF at shell level...
>
> awk '
> BEGIN { printf "Enter multiplier: " }
> NR==FNR { mult = $0 ; next }
> { printf "%d * %d = %d\n", mult, $0, mult * $0 }
> ' <(head -1) file
>
> No pure awk solution, but a concise and simple pattern at least.

Thanks for that. It's hard to believe there's no way to implement "nextfile"
using "standard" awk constructs but I guess that is the truth.

Ed.

Janis Papanagnou

Sep 5, 2012, 9:33:15 AM

On 05.09.2012 14:58, Ed Morton wrote:
> On 9/4/2012 5:57 PM, Janis Papanagnou wrote:
>> On 04.09.2012 22:53, Ed Morton wrote:
>>> I was recently trying to answer a question and came across a problem I
>>> don't know if there is an answer to: how can you stop reading from an
>>> input stream without waiting for EOF or invoking getline or (in gawk)
>>> nextfile?
>>>
[...]
>>>
>>> Is there any way to implement it without using getline or nextfile and
>>> without the user explicitly having to enter a control-D or anything else
>>> after their multiplier number, and without reading the whole data file
>>> into an array? The main point is to be able to stop reading the first
>>> input stream before an EOF is encountered.
>>
>> Well, how should awk know about end of file?
>
> Awk wouldn't know at run-time but the programmer knows when writing the
> script when they have read the desired N lines of input.
>
>> (Using close() doesn't seem to work.)
>
> Right, that was my first thought too. The man page explains that by
> saying close() is only for redirected input. I also tried manipulating
> ARGV[] but no luck there either.

I think that it would be nice if "close(FILENAME)" would make the
(non-standard) 'nextfile' obsolete. (Can't tell, though, whether
such a change would break any existing programs.)

>
>>
[...]
>> No pure awk solution, but a concise and simple pattern at least.
>
> Thanks for that. It's hard to believe there's no way to implement
> "nextfile" using "standard" awk constructs but I guess that is the truth.

The process substitution pattern that I offered as a possible answer to
your question (to avoid Ctrl-D, nextfile, getline) is inappropriate
for the question in the other, original thread; the solution requires
knowing in advance how many lines of input are needed, and with user
input you never know how many mistypes will occur that you have to
parse and skip with a retry. My opinion here is that getline is the
option to choose (in case we want an awk-only solution[*]).

Janis

[*] Typically I do the parameter input at the shell level and pass
the input to awk via option -v.
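Janis's footnote can be sketched as follows, assuming a POSIX shell; `mult_v` is a hypothetical helper name. The multiplier is read by the shell's `read`, so awk never touches standard input and needs neither getline nor nextfile:

```shell
# Hypothetical helper: the shell reads the multiplier and hands it to
# awk with -v, so awk only ever reads the data file named by $1.
mult_v() {
    printf 'Enter multiplier: ' >&2
    IFS= read -r mult || return 1   # no line on stdin: give up
    awk -v mult="$mult" '{ printf "%d * %d = %d\n", mult, $0, mult * $0 }' "$1"
}
```

One design caveat: `-v` processes backslash escape sequences in the assigned value, which is harmless for a number but worth knowing for arbitrary strings.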


Kenny McCormack

Sep 5, 2012, 9:37:12 AM

In article <k27kap$p0m$1...@speranza.aioe.org>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
...
>I think that it would be nice if "close(FILENAME)" would make the
>(non-standard) 'nextfile' obsolete. (Can't tell, though, whether
>such a change would break any existing programs.)

FWIW, TAWK documents "close(FILENAME)" as the "skip the rest of this file"
method (equivalent of "nextfile" in GAWK).

--
Faced with the choice between changing one's mind and proving that there is
no need to do so, almost everyone gets busy on the proof.

- John Kenneth Galbraith -

Manuel Collado

Sep 5, 2012, 5:12:58 PM

On 05/09/2012 14:58, Ed Morton wrote:
> ...
> Thanks for that. It's hard to believe there's no way to implement
> "nextfile" using "standard" awk constructs but I guess that is the truth.

Please look at the Gawk manual:

"12.2.1 Implementing nextfile as a Function"

Regards,
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado



Janis Papanagnou

Sep 5, 2012, 6:57:21 PM

On 05.09.2012 23:12, Manuel Collado wrote:
> On 05/09/2012 14:58, Ed Morton wrote:
>> ...
>> Thanks for that. It's hard to believe there's no way to implement
>> "nextfile" using "standard" awk constructs but I guess that is the truth.
>
> Please look at the Gawk manual:
>
> "12.2.1 Implementing nextfile as a Function"

As far as I can see, it is neither efficient (it doesn't skip the file,
rather it reads every line) nor usable with a non-terminated stdin,
as had been requested in the other thread.

Janis

Ed Morton

Sep 5, 2012, 8:51:58 PM

On 9/5/2012 4:12 PM, Manuel Collado wrote:
> On 05/09/2012 14:58, Ed Morton wrote:
>> ...
>> Thanks for that. It's hard to believe there's no way to implement
>> "nextfile" using "standard" awk constructs but I guess that is the truth.
>
> Please look at the Gawk manual:
>
> "12.2.1 Implementing nextfile as a Function"

Yes, I saw that but it doesn't implement the nextfile functionality I'm looking
for in this context, i.e. open the next file before hitting EOF in the current file.

Ed.


Manuel Collado

Sep 6, 2012, 6:19:02 AM

On 06/09/2012 2:51, Ed Morton wrote:
> [...]

I see.

Perhaps the conclusion is that the implicit data input loop of awk is
not intended for interactive use.

So there is a need for an alternative interactive input mechanism based
on getline or on the non-portable system() or pipe command.
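A minimal sketch of such a getline-based prompt, which also covers the mistype-and-retry case Janis raised earlier in the thread. It assumes `getline < "-"` reads standard input, which holds for gawk, mawk and BWK awk but is an assumption for other implementations; `mult_retry` is a hypothetical helper name:

```shell
# Hypothetical helper: prompt until a whole number is entered on stdin,
# re-prompting after each mistype, then multiply every line of file $1.
mult_retry() {
    awk '
    BEGIN {
        while (1) {
            printf "Enter multiplier: "
            # getline from "-" reads standard input in gawk, mawk and
            # BWK awk (assumed for other implementations)
            if ((getline line < "-") <= 0)
                exit 1              # EOF or read error: give up
            if (line ~ /^-?[0-9]+$/) {
                mult = line
                break
            }
            print "Not a whole number, try again."
        }
    }
    { printf "%d * %d = %d\n", mult, $0, mult * $0 }
    ' "$1"
}
```

Depending on the awk and on where stdout goes, the prompt may be buffered and need an `fflush()` call to appear before the user types.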

Ed Morton

Sep 6, 2012, 9:47:45 AM

Manuel Collado <m.co...@domain.invalid> wrote:

> On 06/09/2012 2:51, Ed Morton wrote:
> > On 9/5/2012 4:12 PM, Manuel Collado wrote:
> >> On 05/09/2012 14:58, Ed Morton wrote:
> >>> ...
> >>> Thanks for that. It's hard to believe there's no way to implement
> >>> "nextfile" using "standard" awk constructs but I guess that is the
> >>> truth.
> >>
> >> Please look at the Gawk manual:
> >>
> >> "12.2.1 Implementing nextfile as a Function"
> >
> > Yes, I saw that but it doesn't implement the nextfile functionality I'm
> > looking for in this context, i.e. open the next file before hitting EOF
> > in the current file.
>
> I see.
>
> Perhaps the conclusion is that the implicit data input loop of awk is
> not intended for interactive use.

It's more than an interactive-use problem, though; the same problem arises if you
have a 10 gig file and only want to read up until you find a keyword or only the
first N lines or something in the initial part of the file. Without nextfile or
getline you're stuck reading (and ignoring) the remaining bulk of the input file
after you got what you wanted out of it.
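For the single-large-file case, standard awk can at least stop early with `exit`, which ends the main input loop immediately (any END rules still run). It does not help when more input has to be processed after the big file, which is the gap nextfile fills. A minimal sketch, assuming n >= 1 and using a hypothetical helper name `first_n`:

```shell
# Hypothetical helper: print the first $1 lines (assumed >= 1) of the
# file named by $2, then stop. exit ends the main input loop right away,
# so awk reads no further records from the file.
first_n() {
    awk -v n="$1" '{ print } NR >= n { exit }' "$2"
}
```

Stopping at a keyword has the same shape, e.g. `{ print } /KEYWORD/ { exit }` with a placeholder pattern.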

> So there is a need for an alternative interactive input mechanism based
> on getline or on the non-portable system() or pipe command.

Intuitively "close(FILENAME)" should just behave as nextfile does, as Janis had
mentioned, so we wouldn't need nextfile or to write getline loops as a workaround.

The getline workaround would get particularly ugly if the file in question was
the middle one of several input files. Since you'd need to deal with it before
awk opens it naturally, you'd need to look for that file in ARGV[] in the BEGIN
section or while parsing a previous file before you get to the target one, parse
it there, store its contents in an array for accessing later (or otherwise
handle it), move all the other files up a slot in ARGV[] so that file doesn't
get parsed again, etc.
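A minimal sketch of that getline workaround, assuming the interesting part of the middle file is just its first line; `use_header` is a hypothetical helper name. Instead of moving the later files up a slot, it blanks the file's ARGV entry and relies on the main loop skipping empty ARGV elements, which gawk and BWK awk do; treat that as an assumption for other awks:

```shell
# Hypothetical helper: $1 names one of the following file operands whose
# first line every other file's records need. That line is read in BEGIN
# with getline, the file is closed, and its ARGV slot is blanked so the
# main loop skips it instead of reading it again.
use_header() {
    target=$1; shift
    awk -v target="$target" '
    BEGIN {
        for (i = 1; i < ARGC; i++)
            if (ARGV[i] == target) {
                if ((getline hdr < target) <= 0)
                    hdr = "?"       # missing or empty file: placeholder
                close(target)
                ARGV[i] = ""        # empty ARGV element: main loop skips it
                break
            }
    }
    { print $0, hdr }
    ' "$@"
}
```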

Ed.



Ed Morton

Sep 6, 2012, 1:35:32 PM

Kenny McCormack <gaz...@shell.xmission.com> wrote:

> In article <k27kap$p0m$1...@speranza.aioe.org>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> ....
> >I think that it would be nice if "close(FILENAME)" would make the
> >(non-standard) 'nextfile' obsolete. (Can't tell, though, whether
> >such a change would break any existing programs.)
>
> FWIW, TAWK documents "close(FILENAME)" as the "skip the rest of this file"
> method (equivalent of "nextfile" in GAWK).
>

I'm curious - what does tawk do with this code:

$ cat file
line 1
line 2
line 3

$ awk '{ getline var < FILENAME; close(FILENAME); print $0, var }' file
line 1 line 1
line 2 line 1
line 3 line 1

It's ambiguous whether the close(FILENAME) would mean "close the file named
FILENAME opened by getline" or "close the file named FILENAME opened by the main
awk work loop". The above is what a POSIX awk should do.

Ed.





Aharon Robbins

Sep 7, 2012, 6:30:35 AM

Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>I think that it would be nice if "close(FILENAME)" would make the
>>(non-standard) 'nextfile' obsolete. (Can't tell, though, whether
>>such a change would break any existing programs.)

It undoubtedly would, and it would not be portable amongst awks,
even though it doesn't use any non-standard keywords.

Kenny McCormack <gaz...@shell.xmission.com> wrote:
>FWIW, TAWK documents "close(FILENAME)" as the "skip the rest of this file"
>method (equivalent of "nextfile" in GAWK).

Again - only in tawk. This caused some confusion many years ago, since
in awk/gawk it doesn't return an error (closing a file that isn't open
is a no-op), at which point I documented explicitly that close()
is just for redirections.

But as Ed surmised, there is no way in standard awk to emulate
nextfile without reading the records. The gawk manual has a nextfile
function to do this. However, it relies upon being able to call next
from a function, and not all awks support that either. :-(
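A minimal sketch of the portable workaround without nextfile, written with a flag rule rather than a function so it doesn't depend on calling `next` from inside a function. It still reads every remaining record of the file, which is exactly the inefficiency discussed above; `head_each` is a hypothetical helper name:

```shell
# Hypothetical helper: print at most $1 leading lines of each remaining
# file operand without using nextfile. Once enough lines have been seen,
# a flag makes every further record of that file hit "next", so those
# records are still read from disk, just ignored.
head_each() {
    n=$1; shift
    awk -v n="$n" '
    FNR == 1 { skipping = 0 }   # a new file has started: stop skipping
    skipping { next }           # discard the rest of the current file
    { print }
    FNR >= n { skipping = 1 }   # got enough from this file
    ' "$@"
}
```

Resetting the flag on `FNR == 1` rather than comparing FILENAME also copes with the same file being named twice in a row.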
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

Janis Papanagnou

Sep 7, 2012, 4:58:10 PM

On 07.09.2012 12:30, Aharon Robbins wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>> I think that it would be nice if "close(FILENAME)" would make the
>>> (non-standard) 'nextfile' obsolete. (Can't tell, though, whether
>>> such a change would break any existing programs.)
>
> It undoubtedly would,

While I can't tell, I still have my doubts. I am trying to think of a
situation where one would have programmed close(FILENAME) or close(var)
where var corresponds to FILENAME, a file name passed as an argument.
And function close() operates on "a file or pipe opened for output".
So to get some conflicting situation you'd have to close a redirected
file that's also a parameter file that you want to skip. A potentially
useful redirection into a file that's also a parameter might be, e.g.,

awk 'NR==FNR { ... } NR!=FNR { print > "param1" }' param1 param2

Now where would an added close(FILENAME) break such programs? Hmm...
Or is there maybe some simpler example that would obviously break?
Can anyone suggest some sensible construct which would break?

> and it would not be portable amongst awks,
> even though it doesn't use any non-standard keywords.

Well, in the context of non-standard 'nextfile', portability is not my
primary concern. Generally, whenever we add new functionality, either
as an extension of close() or as a new keyword nextfile, it's not
portable. Any new non-standard feature that has some effect is not
portable, is it?

> [...]
> Again - only in tawk. This caused some confusion many years ago, since
> in awk/gawk it doesn't return an error (closing a file that isn't open
> is a no-op) at which point I documented it explicitly that close()
> is just for redirections.

I don't recognise it, but does that fact (the error return code) break
an added close(FILENAME) "skip file" functionality?

Reading about close() in A., W. and K.'s original book also doesn't indicate
any problem for such a close(FILENAME) extension.

But since Aharon seems to have a strong opinion on that, I assume that
I still missed something, something that's maybe even obvious after some
more coffee. Any enlightenments appreciated.

Janis

> [...]

Kenny McCormack

Sep 8, 2012, 4:44:14 AM

In article <201209061...@webuse.net>,
Ed Morton <morto...@gmail.com> wrote:
>I'm curious - what does tawk do with this code:
>[...]

The results are pretty much what you'd expect - and, yes, I do think this is
less than optimal behavior (by TAWK). Having a dedicated "nextfile"
command in the language is much better.

--
"We should always be disposed to believe that which appears to us to be
white is really black, if the hierarchy of the church so decides."

- Saint Ignatius Loyola (1491-1556) Founder of the Jesuit Order -

Manuel Collado

Sep 8, 2012, 1:25:46 PM

On 08/09/2012 10:44, Kenny McCormack wrote:
> In article <201209061...@webuse.net>,
> Ed Morton <morto...@gmail.com> wrote:
>> Kenny McCormack <gaz...@shell.xmission.com> wrote:
>> ...
>> I'm curious - what does tawk do with this code:
>>
>> $ cat file
>> line 1
>> line 2
>> line 3
>>
>> $ awk '{ getline var < FILENAME; close(FILENAME); print $0, var }' file
>> line 1 line 1
>> line 2 line 1
>> line 3 line 1
>>
>> It's ambiguous whether the close(FILENAME) would mean "close the file named
>> FILENAME opened by getline" or "close the file named FILENAME opened by the
>> main awk work loop". The above is what a POSIX awk should do.
>
> The results are pretty much what you'd expect - and, yes, I do think this is
> less than optimal behavior (by TAWK). Having a dedicated "nextfile"
> command in the language is much better.

I've always seen redirection as follows:

1.- The exact string used to designate the file is used as the name of the
file descriptor used for i/o.

2.- The same file can be opened several times at a given time if it is
named by several distinct equivalent file designators.

3.- close("file") just closes the file descriptor whose name is exactly "file".

Example:

--- data1234.txt
one
two
three
four

--- redirect.awk
{
    getline s < "data1234.txt"
    getline t < "./data1234.txt"
    print
    print s
    print t
    print ""
    if (NR==2) close("data1234.txt")
}

--- invocation
awk -f redirect.awk data1234.txt
one
one
one

two
two
two

three
one
three

four
two
four


The data file is opened three times simultaneously. The implicit input loop
uses an unnamed descriptor that cannot be closed, because there is no
descriptor name to be given as argument to close().

With this mental model, the natural way to close the current input file
should be just close() and not close(FILENAME). Of course, close() without
arguments is not valid awk code.

Janis Papanagnou

Sep 8, 2012, 1:56:58 PM

On 08.09.2012 19:25, Manuel Collado wrote:
>> [...]
> I've always seen redirection as follows:
>
> 1.- The exact string used to designate the file is used as the name of the
> file descriptor used for i/o.
>
> 2.- The same file can be opened several times at a given time if it is named
> by several distinct equivalent file designators.
>
> 3.- close("file") just closes the file descriptor whose name is exactly "file".
>
> Example:
[...]
>
>
> The data file is opened three times simultaneously. The implicit input loop
> uses an unnamed descriptor that cannot be closed, because there is no
> descriptor name to be given as argument to close().
>
> With this mental model, the natural way to close the current input file should
> be just close() and not close(FILENAME). Of course, close() without arguments
> is not valid awk code.

Good points!

Allowing a simple close() for that purpose as an awk extension doesn't seem
to break existing programs either.

Janis

Aharon Robbins

Sep 8, 2012, 2:26:34 PM

>On 08.09.2012 19:25, Manuel Collado wrote:
>> With this mental model, the natural way to close the current input file should
>> be just close() and not close(FILENAME). Of course, close() without arguments
>> is not valid awk code.

This is a nice way to think about it. Unfortunately, we're about 25
years too late.

In article <k2g0s4$m0a$1...@news.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>Good points!
>
>Allowing a simple close() for that purpose as an awk extension doesn't seem
>to break existing programs either.

It will break any new program that relies on these semantics when
run on an implementation that doesn't support it.

At this point, since all of gawk, Brian Kernighan's awk, and current
mawk support `nextfile', the right thing to do would be to try to
get POSIX to add nextfile to the standard, since there is now lots
of existing practice to justify it.

In addition, nextfile is much clearer as to what's going on than
`close()' or `close(FILENAME)' is.

Janis Papanagnou

Sep 8, 2012, 2:58:12 PM

On 08.09.2012 20:26, Aharon Robbins wrote:
>> On 08.09.2012 19:25, Manuel Collado wrote:
>>> With this mental model, the natural way to close the current input file should
>>> be just close() and not close(FILENAME). Of course, close() without arguments
>>> is not valid awk code.
>
> This is a nice way to think about it. Unfortunately, we're about 25
> years too late.
>
> In article <k2g0s4$m0a$1...@news.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>> Good points!
>>
>> Allowing a simple close() for that purpose as an awk extension doesn't seem
>> to break existing programs either.
>
> It will break any new program that relies on these semantics when
> run on an implementation that doesn't support it.

This is true for every new feature, running on any non-standard awk.

When I am saying "breaking existing programs" I am thinking of changed
behaviour WRT the operational semantics of the discussed construct.
In this respect, I think, extending "close(var)" could break existing
programs, 'nextfile' in the general case would break existing programs,
close() would not break programs.

>
> At this point, since all of gawk, Brian Kernighan's awk, and current
> mawk support `nextfile',

Haven't we read here in c.l.a just recently that 'nextfile' was not
available for some poster?

So...

> the right thing to do would be to try to
> get POSIX to add nextfile to the standard, since there is now lots
> of existing practice to justify it.

...but WRT extensions through POSIX, I think adding nextfile alone is
not enough; in that case all those related extensions from gawk, like
BEGINFILE etc., should be added as well.

>
> In addition, nextfile is much clearer as to what's going on than
> `close()' or `close(FILENAME)' is.

Probably. But close() is not bad either, IMO.

Janis
