The use of $1 = $1

Marc de Bourget

May 6, 2017, 3:55:53 PM

BEGIN {
    FS = ";"
    OFS = ","
}

{
    $1 = $1
    print $0
}

I have noticed that without the $1 = $1 line OFS isn't used.
Is this the recommended way to use OFS or are there caveats?
Why doesn't AWK reevaluate $0 without this ugly AWK idiom?
Generally, I don't like code specific to only one language.

Ed Morton

May 6, 2017, 7:11:41 PM

On 5/6/2017 2:55 PM, Marc de Bourget wrote:
> BEGIN {
> FS = ";"
> OFS = ","
> }
>
> {
> $1 = $1
> print $0
> }
>
> I have noticed that without the $1 = $1 line OFS isn't used.

In that script you mean? Right. If you removed `$1 = $1` and added `print "foo",
"bar"` then OFS would also be used.
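A minimal sketch of the three cases, assuming a POSIX awk on the PATH. The sample line `a;b;c` and the shell variable names are made up for illustration:

```shell
# Hypothetical sample input; any ';'-separated line behaves the same way.
line='a;b;c'

# 1. No field assignment: $0 is printed exactly as it was read.
unchanged=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = ";"; OFS = "," } { print $0 }')

# 2. Comma-separated print arguments are joined with OFS, no $1 = $1 needed.
joined=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = ";"; OFS = "," } { print "foo", "bar" }')

# 3. Assigning to any field makes awk rebuild $0 from the fields using OFS.
rebuilt=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = ";"; OFS = "," } { $1 = $1; print $0 }')

printf '%s\n' "$unchanged" "$joined" "$rebuilt"
```

This should print `a;b;c`, then `foo,bar`, then `a,b,c`.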

> Is this the recommended way to use OFS or are there caveats?

It's a perfectly common way in which OFS is used.

> Why doesn't AWK reevaluate $0 without this ugly AWK idiom?

How would you write code to just print the input line if it did?

> Generally, I don't like code specific to only one language.

If all languages shared the same syntax and semantics then guess how many
languages there'd be :-).

Ed.

Marc de Bourget

May 7, 2017, 5:35:25 PM

Yes, you are completely right but I still think
$1 = $1
looks like nonsense code written by an idiot :-)
Of course it is not - but only a few AWK gurus will understand the sense.

I'm beginning to understand why Perl, Ruby etc. renounce automatic input
parsing and use split and join instead. This makes the code much clearer.
Since a while I do everything in the BEGIN section without the main loop.
This makes programs more readable, better portable and not too specific.

Janis Papanagnou

May 8, 2017, 1:53:15 AM

On 07.05.2017 23:35, Marc de Bourget wrote:
>> [...]

>
> Yes, you are completely right but I still think
> $1 = $1
> looks like nonsense code written by an idiot :-)

Consider it as an awk idiom, established as a _consequence_ of how awk is
defined to work. Don't look at it with the eyes of a <name some favourite
programming language here> programmer. (There's really a lot that looks
like "code written by an idiot" not only in awk, also in other languages,
whether based on C or not.)

> Of course it is not - but only a few AWK gurus will understand the sense.

Awk is a very small and conceptually quite simple language, so it's quite
easy to learn and understand. (There are certainly pitfalls and dark corners,
but that's true for almost any language, and the larger the worse usually.)
This specific construct is comparatively easy to understand. Certainly people
who are coming from another programming language will be surprised at first,
but that doesn't make its sense per se difficult to grasp. (YMMV.) I think
it's necessary in any programming language to understand its paradigms and
concepts. (And in awk that is very easy.)

>
> I'm beginning to understand why Perl, Ruby etc. renounce automatic input
> parsing and use split and join instead. This makes the code much clearer.

I'm sure you're basically right. (Although I think that perl's syntax is
quite a cryptic mess; there's much more than in awk that you will have to
get used to.) With the given construct ($1 = $1) you actually make use of
a technical side-effect; and side effects are in most cases just bad.

> Since a while I do everything in the BEGIN section without the main loop.

(I think this is a bad idea, see below, but anyway.)

> This makes programs more readable, better portable and not too specific.

Doing everything in the BEGIN section means that you abandon some of the
pros that you get with the awk language; including readability (given the
code bloat you get by this habit) or safety (e.g. see Ed's getline post).

(Certainly I understand the motivation to do what seems to suit you.)

You obviously assume here only a specific kind of "portability"; the one
that exists - but only to a very restricted degree! -, in the same family
of [procedural (maybe even C based?)] languages. Between awk versions the
above idiom should be fairly portable. And WRT other language paradigms,
like Prolog's, Lisp's, any OO's, there's anyway no real comparison or a
simple "portability" possible.

The point of having (and using) different languages and various language
families is that you can often do specific tasks in one language(-family)
better than in another. From the final system's view it should anyway be
irrelevant in what language the components have been written.

Janis

Ed Morton

May 8, 2017, 10:36:28 AM

On 5/7/2017 4:35 PM, Marc de Bourget wrote:
> Le dimanche 7 mai 2017 01:11:41 UTC+2, Ed Morton a écrit :
>> On 5/6/2017 2:55 PM, Marc de Bourget wrote:
>>> BEGIN {
>>> FS = ";"
>>> OFS = ","
>>> }
>>>
>>> {
>>> $1 = $1
>>> print $0
>>> }
>>>
>>> I have noticed that without the $1 = $1 line OFS isn't used.
>>
>> In that script you mean? Right. If you removed `$1 = $1` and added `print "foo",
>> "bar"` then OFS would also be used.
>>
>>> Is this the recommended way to use OFS or are there caveats?
>>
>> It's a perfectly common way in which OFS is used.
>>
>>> Why doesn't AWK reevaluate $0 without this ugly AWK idiom?
>>
>> How would you write code to just print the input line if it did?
>>
>>> Generally, I don't like code specific to only one language.
>>
>> If all languages shared the same syntax and semantics then guess how many
>> languages there'd be :-).
>>
>> Ed.
>
> Yes, you are completely right but I still think
> $1 = $1
> looks like nonsense code written by an idiot :-)

No more than:

x = (foo > bar ? 3 : 27)

or many other programming constructs.

> Of course it is not - but only a few AWK gurus will understand the sense.

It's one of the first things I learned when I started using awk and I think
almost anyone who uses awk will understand it, not just a few gurus.

> I'm beginning to understand why Perl, Ruby etc. renounce automatic input
> parsing and use split and join instead. This makes the code much clearer.

Awk is a tiny language compared to those other two (I've recently been battling
the nightmare of legacy Ruby code so I know...), and, unlike them, it is designed
precisely and solely for text manipulation. For awk to have brief,
idiomatic ways to do incredibly common text manipulation operations (like
automatically splitting input into fields and recompiling records using an
output separator) makes perfect sense, while that doesn't make sense in a large,
general purpose language like Perl or Ruby.

> Since a while I do everything in the BEGIN section without the main loop.

You must have a very niche usage for awk then as otherwise that doesn't make sense.

> This makes programs more readable, better portable and not too specific.

I assume by portable you mean to other languages, not to awk running on other
platforms, as the latter isn't true. That's like saying "I program in C++ but I
avoid all object oriented language constructs and write strictly procedural code
as this makes programs more readable, better portable and not too specific".
Well, sure, that's probably true, but then you're completely missing the point of
using the language that you have available.

Ed.

Marc de Bourget

May 8, 2017, 12:20:43 PM

Thank you Janis and Ed for your input.

I don't have any problems with doing everything in the BEGIN section.
It works the same as for Ruby, Python, Perl etc. so there aren't any issues.
I have everything available except (F)NR which is easy to simulate:

BEGIN {
    fnr = 0
    while ((getline < ARGV[1]) > 0) {
        ++fnr
        print fnr, $0, $1
    }
    close(ARGV[1])
}

Whilst I used to think for a long time the main input stream without getline
was the best of AWK, I now think it was a horrible design decision - error
prone and ugly without having exact control of the currently used input file.

I have written a 2000 lines AWK script with sole use of the BEGIN section.
Of course I do know this is a matter of taste.

BTW, I can't see anything strange with your code example:

x = (foo > bar ? 3 : 27)

What's wrong with it? Assigning $1 to $1 is much stranger, isn't it?

Ed Morton

May 8, 2017, 12:57:22 PM

There aren't any issues with just letting awk work as designed either.

> I have everything available except (F)NR which is easy to simulate:
>
> BEGIN {
> fnr = 0
> while ((getline < ARGV[1]) > 0) {
> ++fnr
> print fnr, $0, $1
> }
> close(ARGV[1])
> }
>
> Whilst I used to think for a long time the main input stream without getline
> was the best of AWK, I now think it was a horrible design decision - error
> prone and ugly without having exact control of the currently used input file.

That's fine, everyone is welcome to have an opinion. I completely and utterly
disagree of course since I'd MUCH rather write:

{print FNR, $0, $1}

than the above 8 lines.
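For a single input file the two forms do produce the same output. A rough check, assuming a POSIX awk and `mktemp` are available (the two-line sample file is made up):

```shell
# Hypothetical two-line sample file.
tmp=$(mktemp)
printf 'a b\nc d\n' > "$tmp"

# Main-loop version: awk opens the file, reads records and maintains FNR.
main_loop=$(awk '{ print FNR, $0, $1 }' "$tmp")

# BEGIN-only version: the script opens, counts and closes the file itself.
begin_loop=$(awk 'BEGIN {
    fnr = 0
    while ((getline < ARGV[1]) > 0) {
        ++fnr
        print fnr, $0, $1
    }
    close(ARGV[1])
}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$main_loop"
```

Both variables should hold `1 a b a` and `2 c d c`. The equivalence only covers this one-file case, which is where the next point comes in.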

What you wrote doesn't emulate what awk provides for free, though, and you said
you don't like the automatic splitting into fields that awk does. So instead of this:

{print FILENAME, NR, FNR, NF, $0, $1}

I assume what you write in your scripts is something like this:

BEGIN {
    nr = 0
    for ( i=1; i<ARGC; i++ ) {
        fnr = 0
        filename = ARGV[i]
        if ( (filename !~ /=/) || (filename ~ /^\.\//) ) {
            # a real file name: read it line by line, splitting by hand
            while ((getline line < filename) > 0) {
                ++fnr
                ++nr
                nf = split(line,f,FS)
                print filename, nr, fnr, nf, line, f[1]
            }
            close(filename)
        }
        else {
            # somehow do magic to set variable "FS" to "," if this arg is "FS=,"
            # and add similar tests for all other variables that might be set
            # between files being read.
            split(filename,a,/=/)
            if (a[1] == "FS") {
                FS = a[2]
            }
        }
    }
}

I'm not even sure THAT logic is complete as I'm probably missing some other
cases that awk just handles for us by default.
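For comparison, a small sketch of what the main loop handles with no extra code, including a `var=value` operand between two files. It assumes a POSIX awk and `mktemp -d`; the file names and contents are invented:

```shell
# Hypothetical sample files: one blank-separated, one comma-separated.
dir=$(mktemp -d)
printf 'a b\n'   > "$dir/one.txt"
printf 'c,d,e\n' > "$dir/two.txt"

# "FS=," is an assignment operand: awk applies it after one.txt has been
# read and before two.txt is opened. NR keeps counting, FNR restarts.
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF, $1 }' one.txt FS=, two.txt)

rm -rf "$dir"
printf '%s\n' "$out"
```

This should print `one.txt 1 1 2 a` followed by `two.txt 2 1 3 c`.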

> I have written a 2000 lines AWK script with sole use of the BEGIN section.
> Of course I do know this is a matter of taste.

Exactly.

> BTW, I can't see anything strange with your code example:
> x = (foo > bar ? 3 : 27)
> What's wrong with it? Assigning $1 to $1 is much stranger, isn't it?

No, once you know the syntax and understand the semantics none of it is strange,
it's just what the language defines it to be.

Ed.

Marc de Bourget

May 8, 2017, 2:32:17 PM

Ok. And yes, of course I do use FS and OFS since it is more efficient than
manually splitting/unsplitting but I understand why Perl etc. don't use it.
Further, I don't deal with various numbers of input files (most times only
one or two) so my code doesn't get as complicated as you have illustrated.

It is ok. For me, speed matters most. If there are no speed issues and the
code is clear to the person who writes it, different coding styles are ok.

Janis Papanagnou

May 9, 2017, 12:16:31 AM

On 08.05.2017 16:36, Ed Morton wrote:
> On 5/7/2017 4:35 PM, Marc de Bourget wrote:
>>
>> Yes, you are completely right but I still think
>> $1 = $1
>> looks like nonsense code written by an idiot :-)
>
> No more than:
>
> x = (foo > bar ? 3 : 27)
>
> or many other programming constructs.

I think there is a difference in the "quality of obscureness" here.
Both constructs can be understood on a primitive semantic level;
* you assign the value on the right to the object on the left, and
* you assign a conditional expression on the right to the object on
the left.
The difference is that the latter - even if it appears to be more
complex - evaluates in a straightforward way while the former has
implicit side-effects (reorganisation of $0, etc.) and in addition
the assignment of the same value looks strange and is (or at least
can be considered) a hack. Even as a friend of awk and its concepts
I cannot whitewash that fact.

Personally I'd have liked to see expressions like $i = j behave
differently (i.e. without side effects), to just replace $i without
changing the field separators. For a reorganisation of $0 and the $i
there could indeed be a dedicated (and then even more flexible) function.

That all said, we should not forget, though, that the behaviour of
implicitly changing $0 as a consequence of changing any $i is a well
described behaviour of awk since the beginning of its existence, so
there's no choice but to accept it as it is.
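A small sketch of that side effect with the default FS and OFS, assuming a POSIX awk (the input line is made up): assigning to one field rewrites every separator in $0, not just the text of that field.

```shell
# Hypothetical input with runs of three blanks between the fields.
line='a   b   c'

# Untouched record: the original spacing is preserved.
plain=$(printf '%s\n' "$line" | awk '{ print }')

# After assigning to $2, $0 is rebuilt with a single OFS (one blank)
# between all fields, so the original spacing is lost everywhere.
touched=$(printf '%s\n' "$line" | awk '{ $2 = "X"; print }')

printf '%s\n' "$plain" "$touched"
```

This should print the line unchanged and then `a X c`.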

Janis

PS (with respect to the conditional expression '? :', maybe somewhat
[OT] but anyway): Other languages don't introduce different syntaxes
for conditional expressions and conditional statements (as C did and
awk "inherited"); in, e.g., Simula or Algol you can write code like
a := IF foo > bar THEN 3 ELSE 27 FI and in Algol there's the
additional abbreviated form b := ( foo > bar | 3 | 27 ) which is
also possible to use as a statement ( foo > bar | b:=3 | b:=27 ) .
But given the eminence of C-based languages the '?:' conditional is
obviously accepted as an additional specific construct.

Marc de Bourget

May 9, 2017, 11:48:07 AM

Exactly. I totally agree with everything you have written.