As I understood that article, the OP wanted to jump to a specific
_record_ ("by telling it a starting point setting the NR"), not to
a byte offset (as usually used in seek functions).
> I do not know, if this really is a problem even for very large files,
> gawk should be able to skip to a given position quite quickly. For
> instance if we would like to start at line number 20 million:
>
> NR>=20000000 { do something }
>
> should be fast? Anyway, it could still be interesting to test this.
>
> Here is how I would like it to work: In the BEGIN block you write
>
> @load "fseek"
> BEGIN {
> fseek(20000000)
> }
>
> then everything else works as usual. So in effect, the only difference
> is that gawk believes that the input file is much smaller than it
> really is, and starts at byte position 20000000 instead of at the
> beginning of the file.. (So NR now counts from that position)
You seem to be mixing two concepts: byte positions and record numbers.
The point of seek is that you can directly address a byte position
and jump to it, while with a record number you have no chance to
do that, even in the simple non-regexp RS case, because the record
boundaries are only known after the preceding data has been read.
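To illustrate the difference, here is a minimal sketch, assuming a
POSIX tail and a regular file; the file name data.txt and its
contents are made up for the demonstration. The byte offset can be
addressed directly, while the record number is only reached by
reading and counting everything before it.

```shell
# Hypothetical sample file: 5 records of 6 bytes each ("line1\n" ... "line5\n").
printf 'line%d\n' 1 2 3 4 5 > data.txt

# Byte offset: tail -c +N starts output at byte N (1-based),
# so +13 skips the first 12 bytes, i.e. the first two records.
by_offset=$(tail -c +13 data.txt | awk '{ print NR ": " $0 }')

# Record number: awk has to read and split every record before the third.
by_record=$(awk 'NR >= 3 { print NR ": " $0 }' data.txt)

printf '%s\n' "$by_offset"
printf '%s\n' "$by_record"
```

Note that in the byte-offset variant NR starts counting at 1 again,
since awk never sees the skipped part of the file.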
It's an open question how to interrogate the (byte-)position in case
the awk program wants to interrupt processing of the current
slice, and how to pass it to the environment (to be able to pass it
back again for a later invocation). If you used stdout for that you
could hardly use stdout for regular output; you'd be forced to write
files then. Not very nice.
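One possible way around that, as a rough sketch only: keep stdout
for regular output and write the resume position to a separate state
file. Everything here is an assumption for illustration: the names
process_slice, data.txt and offset.txt are hypothetical, RS is the
default newline, every record ends in a newline, and LC_ALL=C is set
so that length() counts bytes. The positioning itself is done by
tail, not by awk.

```shell
# Hypothetical sample file: 5 records of 6 bytes each.
printf 'line%d\n' 1 2 3 4 5 > data.txt

process_slice() {
    # $1 = data file, $2 = state file holding the byte offset to resume from
    start=$(cat "$2" 2>/dev/null || echo 0)
    tail -c +"$((start + 1))" "$1" |
    LC_ALL=C awk -v pos="$start" -v state="$2" '
        {
            print "processing: " $0      # regular output stays on stdout
            pos += length($0) + 1        # +1 for the newline terminator
        }
        NR == 2 { exit }                 # interrupt the slice after two records
        END { print pos > state }        # remember where to resume next time
    '
}

rm -f offset.txt
first=$(process_slice data.txt offset.txt)    # handles line1 and line2
second=$(process_slice data.txt offset.txt)   # resumes at byte offset 12
printf '%s\n' "$first" "$second"
```

This is exactly the "forced to write files" variant; it works, but
it is still not very nice.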
BTW, the OP at stackoverflow may also not have a good approach if
he loops in shell instead of letting awk do the looping. The OP
also has the option to do the processing in shell alone. The newer
versions of ksh93, for instance, support seek operations with their
redirection operators.
Yet again it's worth asking whether there's really any performance
issue. I checked it on my system with a test file of size 860332414
bytes that contains 2951832 lines of text. The positioning simply
done with
NR<2000000 { next }
{ exit(0) }
at about 2/3 of the file, required less than a second.
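For anyone who wants to repeat that kind of check, here is a small
sketch under stated assumptions: the file big.txt is generated on
the spot and is far smaller than my test file, and the record
numbers are scaled down accordingly. It only shows that the skip
lands on the intended record.

```shell
# Generate a hypothetical test file of 300000 lines.
awk 'BEGIN { for (i = 1; i <= 300000; i++) print "record", i, "some filler text" }' > big.txt

# Skip to record 200000 (about 2/3 of the file), report NR, and stop reading.
reached=$(awk 'NR < 200000 { next } { print NR; exit(0) }' big.txt)
echo "$reached"
```

Prefixing the second awk command with time (where the shell or the
system provides it) shows how long the skipping takes on a given
machine.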
Janis