gawk 4.1.3 gensub() warning?

Ed Morton

Sep 2, 2015, 11:51:32 PM
Anyone else seeing this warning with gawk 4.1.3:

$ echo 'a' | gawk '{print gensub(/a/,"b","")}'
gawk: cmd. line:1: (FILENAME=- FNR=1) warning: gensub: third argument `' treated as 1
b

Any workaround other than changing the 3rd arg to 1 in all of my scripts?

Ed.

Josef Frank

Sep 3, 2015, 3:56:45 AM
On 2015-09-03 05:51 Ed Morton wrote:

> Anyone else seeing this warning with gawk 4.1.3:
>
> $ echo 'a' | gawk '{print gensub(/a/,"b","")}'
> gawk: cmd. line:1: (FILENAME=- FNR=1) warning: gensub: third argument `' treated as 1
> b
>

Same here.

This is documented behavior (but not quite, to be honest):

"If how is a string beginning with ‘g’ or ‘G’, then it replaces all
matches of regexp with replacement. Otherwise, how is treated as a
number that indicates which match of regexp to replace. [...] If the how
argument is a string that does not begin with ‘g’ or ‘G’, or if it is a
number that is less than or equal to zero, only one substitution is
performed. If how is zero, gawk issues a warning message."

in addition: "the empty string [...] is zero if converted to a number"


The only departure from the documentation is that the warning is also
issued (in addition to only one substitution being made) when the "how"
argument is a string whose value is other than 0 when converted to a
number, e.g. "3aif" (i.e. a string starting with a digit). So maybe the
docs should be changed to "... only one substitution is performed, and
gawk issues a warning message."

This behavior is different from older versions (e.g. 3.1.6) where the
warning was only given when the how argument was explicitly 0 or "0".
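
For comparison, the documented forms of the "how" argument can be exercised
as below. This is only a minimal sketch: it assumes gawk is installed as
`gawk`, and the sample string "aaa" and the shell variable `out` are made up
for illustration.

```shell
# Exercise the documented forms of gensub()'s third ("how") argument.
# Assumes gawk is on PATH; the input string "aaa" is illustrative only.
out=$(gawk 'BEGIN {
    s = "aaa"
    print gensub(/a/, "b", 1, s)     # replace the first match only
    print gensub(/a/, "b", 2, s)     # replace the second match only
    print gensub(/a/, "b", "g", s)   # replace every match
}')
printf '%s\n' "$out"
```

This should print baa, aba and bbb, one per line, without any warning.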

> Any workaround other than changing the 3rd arg to 1 in all of my scripts?
>
> Ed.

Using an older version? (Just joking ;-)


jf

Ed Morton

Sep 3, 2015, 7:49:43 AM
Thanks for the info. This new warning is like if sed suddenly started issuing
warnings if you wrote `s/foo/bar/` instead of `s/foo/bar/1`. Since others are
seeing it too I'll follow up with bug...@gnu.org.

Ed.

Kenny McCormack

Sep 3, 2015, 8:01:08 AM
In article <ms9c19$hus$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...
>Thanks for the info. This new warning is like if sed suddenly started issuing
>warnings if you wrote `s/foo/bar/` instead of `s/foo/bar/1`. Since others are
>seeing it too I'll follow up with bug...@gnu.org.
>
> Ed.

It's not a bug. The documentation was always clear that the acceptable
values were either 1 or G/g. You were always in violation of that. Your
sed analogy is completely bogus (as expected).

BTW, I lobbied for (and got) this better error reporting into gensub().

--
Mike Huckabee has yet to consciously uncouple from Josh Duggar.

Kenny McCormack

Sep 3, 2015, 11:13:58 AM
In article <ms9cq3$6td$1...@news.xmission.com>,
Kenny McCormack <gaz...@shell.xmission.com> wrote:
...
>It's not a bug. The documentation was always clear that the acceptable
>values were either 1 or G/g.

Correction: This should, of course, be "a positive integer" not (only) "1".
Also, I think any string that begins with either "G" or "g" is OK as well.

--
"There's no chance that the iPhone is going to get any significant market share. No chance." - Steve Ballmer

Kaz Kylheku

Sep 3, 2015, 1:55:43 PM
On 2015-09-03, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <ms9cq3$6td$1...@news.xmission.com>,
> Kenny McCormack <gaz...@shell.xmission.com> wrote:
> ...
>>It's not a bug. The documentation was always clear that the acceptable
>>values were either 1 or G/g.
>
> Correction: This should, of course, be "a positive integer" not (only) "1".
> Also, I think any string that begins with either "G" or "g" is OK as well.

Indeed, the documentation says that "if how is a string beginning with ‘g’ or
‘G’ (short for “global”), then replace all matches of regexp with replacement".

That doesn't mean it's a good idea to put something after that 'g' or 'G',
because that could be an area of future extension.

(If there is no such extension now, the documentation should just say, 'the
string "g" or "G"'. It's a mistake to give a seeming assurance that trailing
characters are ignored, if that assurance might be taken away.)

Ed Morton

Sep 3, 2015, 10:16:01 PM
On 9/3/2015 7:01 AM, Kenny McCormack wrote:
> In article <ms9c19$hus$1...@dont-email.me>,
> Ed Morton <morto...@gmail.com> wrote:
> ...
>> Thanks for the info. This new warning is like if sed suddenly started issuing
>> warnings if you wrote `s/foo/bar/` instead of `s/foo/bar/1`. Since others are
>> seeing it too I'll follow up with bug...@gnu.org.
>>
>> Ed.
>
> It's not a bug. The documentation was always clear that the acceptable
> values were either 1 or G/g. You were always in violation of that. Your
> sed analogy is completely bogus (as expected).
>
> BTW, I lobbied for (and got) this better error reporting into gensub().
>

Great work Kenny, a valuable contribution to the community. Maybe you now could
lobby for a warning message when the 2nd arg to substr() is out of range:

$ awk 'BEGIN{print substr("foobar",-127,3)}'
foo

$ awk 'BEGIN{print substr("foobar","",3)}'
foo

since that would create even more unnecessary work for even more people (though
in the substr() case it might actually be finding a problem, unlike the gensub()
case).

Ed.

Janis Papanagnou

Sep 4, 2015, 4:28:26 AM
Curious: what was your reason for using an empty string in gensub()
rather than one of the documented/defined parameter values?

Your substr statement made me curious, and I noticed that it's
not directly possible to use negative indices to implement a
sliding substring window across a string:

$ awk 'BEGIN{s="ABCD"; for (i=-2;i<=length(s);i++) print substr(s,i,3)}'
ABC
ABC
ABC
ABC
BCD
CD
D

At the end of the string the data is trimmed, but not at the front
of the string. So, yes, a warning - assuming unchanged behaviour -
might even be appropriate.

Janis

Ed Morton

Sep 4, 2015, 8:42:08 AM
I don't know where I got the idea but I suspect I assumed gensub() was the same
as sed in that that arg should be g/G for global, or a number for a specific
occurrence or empty for the first occurrence. I may even have googled examples
and seen it used that way, e.g. http://pbraun.nethence.com/unix/lang/awk.html or
https://www.bignerdranch.com/blog/a-crash-course-in-awk/.

Whatever got me started, I've been doing it since I first started using gawk and
since printing a warning for the "how" being empty is new functionality, I
suspect the gawk man page didn't contain the statement:

If how is zero, gawk issues a warning message.

until recently and without that I would have had no reason to think I HAD to put
a 1 in there since there's no example given in the manual of altering just the
first match and the rest of the info says:

how is treated as a number indicating which match of regexp to replace

and

If the how argument ... is a number that is less than or equal to zero,
only one substitution is performed.

so while glancing through the manual none of that would have led me to think I
was wrong in just using "".

Even if the statement "If how is zero, gawk issues a warning message." had
always been present but that functionality not implemented, I can easily see
myself thinking it made sense to warn about gensub(/o/,"x",0,"foobar"), since
that would also be an error in sed:

$ echo 'foobar' | sed 's/o/x/0'
sed: -e expression #1, char 7: number option to `s' command may not be zero

but not connecting that to gensub(/o/,"x","","foobar") being a problem, since
in sed it's not a problem to leave that qualifier out.

It may all come back to me thinking that fundamentally gensub() is giving us
sed-like functionality in awk.

> Your substr statement made me curious, and I noticed that it's
> not directly possible to use negative indices to implement a
> sliding substring window across a string:

Not entirely sure what that means, but I threw a negative value in there just to
show it's possible. The buggy code I typically come across is:

$ awk 'BEGIN{s="ABCD"; for (i=length(s);i>=0;i--) print substr(s,i,1)}'
D
C
B
A
A

It's easy to see the bug with a small string when printing the values on each
iteration, but of course that's not always what's happening in the loop.
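
The extra "A" comes from the loop running down to i=0: as the output above
shows, substr(s,0,1) returns the first character again. A minimal corrected
sketch, stopping at index 1 (the shell variable `out` is only there for
illustration):

```shell
# Iterate over the characters of a string from last to first.
# awk string indices start at 1, so the loop must stop at i >= 1, not i >= 0.
out=$(awk 'BEGIN {
    s = "ABCD"
    for (i = length(s); i >= 1; i--)
        print substr(s, i, 1)
}')
printf '%s\n' "$out"
```

This prints D, C, B, A with no repeated first character.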

Regards,

Ed.

Janis Papanagnou

Sep 4, 2015, 9:19:01 AM
On 04.09.2015 at 15:42, Ed Morton wrote:
> On 9/4/2015 3:28 AM, Janis Papanagnou wrote:
>>
>> Curious; what was your primary intention to use an empty string
>> in gensub and not one of the documented/defined parameter values?
>
> I don't know where I got the idea but I suspect I assumed gensub() was
> the same as sed in that that arg should be g/G for global, or a number
> for a specific occurrence or empty for the first occurrence. [...]
[...]
> It may all come back to me thinking that fundamentally gensub() is
> giving us sed-like functionality in awk.

I don't think that comparisons of awk with sed are helpful here. But
would anything speak against allowing "" as third parameter in gawk's
gensub() with the meaning of "g" (which IMO makes more sense than
being interpreted as 1)? Well, okay; it would then behave differently
than before, but if that value was undefined anyway... - I'm not sure.

It would also be useful to have an extension to support replacements
counted from the rear of the string, say, gensub(/a/,"b",-1) . It's
cumbersome to work around it.
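
One possible workaround can be sketched in portable awk, without gensub().
The helper name sub_last() is hypothetical, and the sketch makes simplifying
assumptions: the replacement is inserted literally (no "&" or "\\1"
handling), and anchored regexps such as "^a" are not supported because the
search is repeated on suffixes of the string.

```shell
# Replace the LAST match of a regexp in a string, in portable awk.
# sub_last() is a hypothetical helper, not part of any awk.
out=$(awk '
function sub_last(re, repl, s,    rest, off, start, len) {
    rest = s; off = 0; start = 0
    # Walk through the non-overlapping matches, remembering the latest one.
    while (match(rest, re) && RLENGTH > 0) {
        start = off + RSTART             # match position within the full string
        len = RLENGTH
        off += RSTART + RLENGTH - 1      # characters consumed so far
        rest = substr(rest, RSTART + RLENGTH)
    }
    if (!start)
        return s                         # no match: return the input unchanged
    return substr(s, 1, start - 1) repl substr(s, start + len)
}
BEGIN { print sub_last("a", "b", "banana") }')
printf '%s\n' "$out"
```

For the sample input this prints bananb, i.e. only the final "a" is replaced.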

>
>> Your substr statement made me curious, and I noticed that it's
>> not directly possible to use negative indices to implement a
>> sliding substring window across a string:
>
> Not entirely sure what that means [...]

I meant to replace (or iterate over) - instead of the output quoted
below - a sequence: A, AB, ABC, BCD, CD, D.

Graphically:   "SOME STRING"
Substring:      |---|
                 |---|
                  ...
                     |---|
                      |---|

The "|---|" is a "window" (substring) sliding across the data.

A call like substr(var,-2,3) always starts from position 1 when a negative
value is given for the start index, so a loop that starts in the negative
range produces strange-looking results. This is unlike reading past the
end of a string, where the result is trimmed.
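
The window described above can be sketched with a small wrapper around
substr(). The name window() is hypothetical; it clamps the start index to 1
and shortens the length by whatever was cut off at the front, so it should
not depend on how a particular awk treats start values below 1.

```shell
# Sliding window over a string, clipped at both ends.
# window() is a hypothetical helper, not a built-in.
out=$(awk '
function window(s, start, len) {
    if (start < 1) {
        len += start - 1   # drop the part that lies before the string
        start = 1
    }
    return (len < 1) ? "" : substr(s, start, len)
}
BEGIN {
    s = "ABCD"
    for (i = -1; i <= length(s); i++)
        print window(s, i, 3)
}')
printf '%s\n' "$out"
```

For "ABCD" with a window of 3 this prints A, AB, ABC, BCD, CD, D.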

I presume that the long existing substr() behaviour will make that and
the corresponding gensub() behaviour immutable.

Janis

Kenny McCormack

Sep 5, 2015, 1:23:35 PM
In article <msaupi$jas$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
>On 9/3/2015 7:01 AM, Kenny McCormack wrote:
>> In article <ms9c19$hus$1...@dont-email.me>,
>> Ed Morton <morto...@gmail.com> wrote:
>> ...
>>> Thanks for the info. This new warning is like if sed suddenly started issuing
>>> warnings if you wrote `s/foo/bar/` instead of `s/foo/bar/1`. Since others are
>>> seeing it too I'll follow up with bug...@gnu.org.
>>>
>>> Ed.
>>
>> It's not a bug. The documentation was always clear that the acceptable
>> values were either 1 or G/g. You were always in violation of that. Your
>> sed analogy is completely bogus (as expected).
>>
>> BTW, I lobbied for (and got) this better error reporting into gensub().
>>
>
>Great work Kenny, a valuable contribution to the community. Maybe you now could
>lobby for a warning message when the 2nd arg to substr() is out of range.

Incidentally, just for the benefit of those whose "Google-fu" may not be
quite up-to-snuff, here's the post where I noted that this bug had been
fixed. Note that the script in question, the very useful "comma" function,
was, in fact, written by Ed. So, we have Ed to thank for the finding and
fixing of this little bug. As George Carlin says in one of his routines,
"Thanks, Ed!"

--- Cut Here ---
In article <m51vh4$bcd$2...@news.xmission.com>,
Kenny McCormack <gaz...@shell.xmission.com> wrote:
>Some things I noticed about gensub()'s "how" arg:
>
>1) In 3.1.4, it seems like the "number other than 1" functionality is not
>implemented. I.e., it looks like if the value is anything other than a
>string matching /[Gg].*/, it is taken as being "1". In 3.1.8, it works
>correctly, so I assume it got fixed between those two versions. Can anyone
>confirm this for me? Note that although I have binaries (executables) for
>these old versions, I don't have man pages for them, so I can't check for
>myself. Actually, what I'm really asking is "Does the man page for 3.1.4
>document it as doing what it should be doing or does it document what it
>actually was doing (in that version) ?"
>
>2) In the current version, if you pass "0" for "how", you get a warning,
>but passing a negative value (or any other "garbage" string) generates no
>warning (but all such values are treated as if they were 1):
>
>$ enhance gawk4 '{ print gensub($1,$2,$3,$4) }'
>abc DEF 1 abcabcdefghi
>DEFabcdefghi
>abc DEF 2 abcabcdefghi
>abcDEFdefghi
>abc DEF 0 abcabcdefghi
>gawk4: cmd. line:1: (FILENAME=- FNR=3) warning: gensub: third argument of 0 treated as 1
>DEFabcdefghi
>abc DEF -1 abcabcdefghi
>DEFabcdefghi
>abc DEF garbage abcabcdefghi
>DEFDEFdefghi
>abc DEF rubbish abcabcdefghi
>DEFabcdefghi
>$

An update:

This seems to be fixed in 4.1.3. It complains about anything unexpected
(i.e., other than 1 or "G" or "g") in the third arg. See below:

('gawk4' is version 4.1.3, 'gawk41' is version 4.1.1)
% gawk4 'BEGIN { print gensub("abc","DEF",-1,"abcdefghikl")}'
gawk4: cmd. line:1: warning: gensub: third argument -1 treated as 1
DEFdefghikl
% gawk41 'BEGIN { print gensub("abc","DEF",-1,"abcdefghikl")}'
DEFdefghikl
%

Yay! It seems complaints here do get noticed (sometimes...)

Also note, I have the following script installed on various of my systems
to "comma-ize" numbers. I copied it from this board some time back:

--- Cut Here ---
function comma(num) {
    if (num < 0)
        return "-" comma(-num)
    while (num != (num=gensub(/([0-9])([0-9][0-9][0-9])($|[,.])/,"\\1,\\2\\3","",num)));
    return num
}
--- Cut Here ---

Note the use of "" as the 3rd arg to gensub(). I'm sure that was in the
original (i.e., whoever originally wrote and posted the script had that).
The funny thing is that this script worked just fine right up until I
compiled and started using 4.1.3, which as we've seen now generates a
warning for that usage. So, I had to change it to use 1 instead of "".
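
With that change applied, the function might look like the sketch below. It
assumes gawk (gensub() is a gawk extension) and spells the loop out with a
temporary variable, `prev`, which is not in the original script; the sample
inputs are made up.

```shell
# comma(): insert thousands separators; the third gensub() argument is 1.
out=$(gawk '
function comma(num,    prev) {
    if (num < 0)
        return "-" comma(-num)
    # Insert one comma per pass until the string stops changing.
    do {
        prev = num
        num = gensub(/([0-9])([0-9][0-9][0-9])($|[,.])/, "\\1,\\2\\3", 1, num)
    } while (num != prev)
    return num
}
BEGIN { print comma(1234567); print comma(-1234.5) }')
printf '%s\n' "$out"
```

This should print 1,234,567 and -1,234.5 without the 4.1.3 warning.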

--
The scent of awk programmers is a lot more attractive to women than
the scent of perl programmers.

(Mike Brennan, quoted in the "GAWK" manual)

--- Cut Here ---
