
Per-stream RS (and RT)


Kenny McCormack

Aug 10, 2017, 2:01:30 PM
Note: This discussion is all GAWK-specific.

Who hasn't been in this situation? You develop a nice algorithm that
involves changing RS and everything is fine until you need to read a file
the normal way (i.e., line-by-line). Now you have to write code to save
and restore RS - and your formerly nice clean algorithm starts getting
ugly. The same comments apply to RT, which no longer holds a clean value
for the main input once some other stream has been read.
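To make the save-and-restore problem concrete, here is a minimal sketch. The file names and contents are made up for the demo. The main input is read in paragraph mode (RS = ""), and a second file has to be read line by line inside the same rule, so RS must be saved, switched and restored around the getline loop. The script uses only POSIX awk features, so gawk or any other awk should run it. Under gawk, the getline reads would also overwrite RT.

```shell
#!/bin/sh
# Hypothetical demo files: paras.txt holds blank-line-separated
# paragraphs, lines.txt holds ordinary newline-terminated lines.
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n\nc\n' > paras.txt
printf 'x\ny\n' > lines.txt

save_restore_demo() {
    awk '
    BEGIN { RS = "" }               # paragraph mode for the main input
    {
        paras++
        saveRS = RS                 # save the main-input RS ...
        RS = "\n"                   # ... switch to line mode ...
        while ((getline line < "lines.txt") > 0)
            lines++
        close("lines.txt")
        RS = saveRS                 # ... and restore it before the next record
    }
    END { print paras, lines }      # 2 paragraphs, 2 lines read per paragraph
    ' paras.txt
}

save_restore_demo
```

Every place that reads another stream needs the same bookkeeping around it, and that is the clutter being complained about.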

Well, where I am going with this is that it would be nice to have RS/RT
(and some others - see later on) be local to each stream. Note that GAWK
already has some stream-specific stuff in PROCINFO: e.g.,

PROCINFO["input_name", "READ_TIMEOUT"] and others like that.

There's room in the spec for having a per-stream RS implemented the same
way.

Note: It is also common to want a per-stream FS. This could be implemented
similarly.

--
He must be a Muslim. He's got three wives and he doesn't drink.

Joe User

Aug 10, 2017, 4:18:57 PM
Kenny McCormack wrote:

> Well, where I am going with this is that it would be nice to have RS/RT
> (and some others - see later on) be local to each stream.

It's hard to get something simpler than this untested thing:

BEGIN     { stdRS = RS }              # remember the default RS
BEGINFILE { if (FILENAME ~ /\.txt$/)
                RS = "^$"             # slurp each .txt file as one record
            else
                RS = stdRS            # everything else: line by line
          }

It seems like no matter what you do, you'll still have to specify RS, using
ARGIND or FILENAME, or something similar, before file reading starts.


Janis Papanagnou

Aug 10, 2017, 8:45:28 PM
This is the straightforward way to "reset" those variables in gawk.

The simple requirement to have local definitions for variables can
be achieved even with old-fashioned command-line definitions...

awk ' ... ' RS=A file_1 RS=$'\n' file_2 file_3 RS=C file_4
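This suggestion relies on the fact that a command-line assignment such as RS=';' takes effect only when awk reaches it in the argument list. As a result, each file is read with whatever RS was assigned just before it. Here is a small self-contained check, with file names and contents invented for the demo:

```shell
#!/bin/sh
# Hypothetical demo files: semi.txt uses ";" as a record separator
# (no trailing newline), plain.txt is ordinary line-oriented text.
cd "$(mktemp -d)" || exit 1
printf 'a;b;c' > semi.txt
printf 'x\ny\n' > plain.txt

cmdline_rs_demo() {
    # RS=';' applies to semi.txt only; RS='\n' switches back before plain.txt.
    awk '{ count[FILENAME]++ }
         END { print count["semi.txt"], count["plain.txt"] }' \
        RS=';' semi.txt RS='\n' plain.txt
}

cmdline_rs_demo
```

Awk processes escape sequences in the value of a command-line assignment, so RS='\n' works here without the bash-specific $'\n' quoting.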


Janis

Joe User

Aug 10, 2017, 9:18:31 PM
Janis Papanagnou wrote:

> awk ' ... ' RS=A file_1 RS=$'\n' file_2 file_3 RS=C file_4
>

That's the simplest, if you know what files you are processing ahead of time.


Kenny McCormack

Aug 11, 2017, 3:10:32 AM
In article <97b35$598d05a0$adf2c163$12...@API-DIGITAL.COM>,
But totally not the point, of course.

--
"Insisting on perfect safety is for people who don't have the balls to live
in the real world."

- Mary Shafer, NASA Ames Dryden -

Kenny McCormack

Jan 5, 2018, 12:46:25 PM
In article <omi71o$1f0$1...@news.xmission.com>,
Kenny McCormack <gaz...@shell.xmission.com> wrote:

Well, it's been 5 months since I posted this, and the responses have been
typically useless. All of the responses (workarounds) have both:
1) Utterly missed the point.
and
2) Been so trivially obvious that they were surely already in the
knowledge base of the OP (me). Or, to put it more bluntly, "No
sh*t, Sherlock!".

That all said, let me note that the OP *was* a little shy on examples (OP
naively assumed that people would be able to work out the examples on their
own), so let me add one now. When using the GAWK networking functionality
to interact with HTTP sites, it is often convenient to change RS and/or
ORS, due to the rather Unix-unfriendly way that the HTTP protocol was
specified (read: it is defined to use CR LF ("\r\n") line endings, what has
come to be known as DOS/Windows line endings, rather than Unix-style LF).

The effect of this is that once you've changed RS and ORS, you find that
ordinary things stop working - and you scratch your head - and eventually
you work out that (OMG!) the reason things aren't working is the changed
RS/ORS. And then you have to retrofit the code to make the normal stuff
(which expects the normal RS/ORS) work as expected without breaking the
networking stuff (which needs the modified RS/ORS). This is a situation
where it would be really nice to be able to specify that the HTTPService
stream use the modified RS/ORS, but that the rest of the program should
continue to behave as expected.

I ran into this today, when I was working on a simple HTTP GET script,
using the "geturl.awk" script from the distro as the basis. I got it all
working (of course), but it certainly would have been nice to have the
functionality as described in this thread.

--
"Women should not be enlightened or educated in any way. They should be
segregated because they are the cause of unholy erections in holy men."

-- Saint Augustine (354-430) --