Should there be a FIELDWIDTHS wild-card?

Ed Morton

May 12, 2017, 1:34:41 PM
I often have to deal with fixed width fields like:

$ cat file
this is the winter ofour discontent
wee sleekit cowerin'  timrous beastie

$ awk -v FIELDWIDTHS='11 10 999' -v OFS=, '{$1=$1}1' file
this is the, winter of,our discontent
wee sleekit, cowerin' , timrous beastie

Notice the "999" at the end of FIELDWIDTHS that I'm using to mean "whatever else
is on the line". Coming up with some arbitrarily large number that you hope will
be big enough for your data feels really kludgy. I could write code to sum up
the numbers in FIELDWIDTHS and then use substr($0, that value + 1) but
that's getting kinda ridiculous for such a conceptually trivial problem.
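
For illustration, the "sum the widths, then substr()" idea can be sketched in
portable awk without FIELDWIDTHS at all. This is only a sketch under
assumptions: split_fixed_rest is a made-up helper name, and the widths "11 10"
are the ones from the example above. The trailing field simply starts one
character past the sum of the fixed widths.

```shell
# Hypothetical helper: split each input line into the given fixed-width
# fields plus one trailing "rest of the line" field, joined by a separator.
# $1 = space-separated widths (e.g. "11 10"), $2 = output separator (default ",").
split_fixed_rest() {
    awk -v widths="$1" -v OFS="${2:-,}" '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""; pos = 1
        for (i = 1; i <= n; i++) {
            out = out substr($0, pos, w[i]) OFS
            pos += w[i]
        }
        # Whatever is left after the fixed part becomes the last field.
        print out substr($0, pos)
    }'
}

printf '%s\n' 'this is the winter ofour discontent' | split_fixed_rest '11 10'
# prints: this is the, winter of,our discontent
```

It works, but it is a lot of typing compared with one extra token in
FIELDWIDTHS.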

Is there a concise way to do what I want?

If not, would a "*" or "0" or something to mean "whatever else is left" when
used at the end of FIELDWIDTHS make sense?

Ed.

Janis Papanagnou

May 12, 2017, 2:36:27 PM
On 12.05.2017 19:34, Ed Morton wrote:
> I often have to deal with fixed width fields like:
>
> $ cat file
> this is the winter ofour discontent
> wee sleekit cowerin'  timrous beastie
>
> $ awk -v FIELDWIDTHS='11 10 999' -v OFS=, '{$1=$1}1' file
> this is the, winter of,our discontent
> wee sleekit, cowerin' , timrous beastie
>
> Notice the "999" at the end of FIELDWIDTHS that I'm using to mean "whatever
> else is on the line". Coming up with some arbitrarily large number that you
> hope will be big enough for your data feels really kludgy. I could write code
> to sum up the numbers in FIELDWIDTHS and then use substr($0, that value + 1)
> but that's getting kinda ridiculous for such a conceptually trivial
> problem.
>
> Is there a concise way to do what I want?

Not that I know of.

>
> If not, would a "*" or "0" or something to mean "whatever else is left" when
> used at the end of FIELDWIDTHS make sense?

IMO, yes. I myself had the same issue in the past (and implemented the same
kludgy solution). From other programming languages' idioms my first thought
would have been to use "-1" (for "unlimited") but I also wouldn't mind if 0
or * would make that possible. Any wildcard will require defining how it
should be handled if placed somewhere in the middle of the width spec, but
that should not be a problem; an error could be reported.

I also thought about whether it would make sense to allow 0-values in the
middle of the width spec to not assign to that specific field; frankly I'm not
quite sure it may be useful in practice (but it certainly wouldn't hurt to
support it - folks are creative with uses for features). In this case a
special value of * or -1 would probably be better for "the rest" feature.

Janis

Janis Papanagnou

May 12, 2017, 2:47:17 PM
Just after pressing the send-button it occurred to me that it might also be a
rare but possibly useful extension to define fields "assigning" from the rear

FIELDWIDTHS = { "1 0 1 *" }
"ABCDE" -> $1="A", $3="B", $4="CDE"

FIELDWIDTHS = { "* 1 0 1" }
"ABCDE" -> $1="ABC", $2="D", $4="E"

or even a wildcard in the middle

FIELDWIDTHS = { "1 * 0 1" }
"ABCDE" -> $1="A", $2="BCD", $4="E"

but then we're maybe getting (with the first case) close to something like

scanf("%1s%1s%s", $1, $3, $4)
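
To make the three cases above concrete, here is a rough sketch in portable awk
of how such a spec could be interpreted. Everything in it is an assumption for
illustration, not an existing awk feature: fw_split is a made-up helper, it
allows at most one "*" anywhere in the spec, the "*" absorbs whatever the
numeric widths leave over, and a 0 width yields an empty field.

```shell
# Hypothetical helper: apply a width spec ($1) to a record ($2) and print
# the non-empty fields as "$index=value" pairs.
fw_split() {
    awk -v spec="$1" -v rec="$2" '
    BEGIN {
        n = split(spec, w, " ")
        fixed = 0; stars = 0
        for (i = 1; i <= n; i++) {
            if (w[i] == "*") stars++
            else fixed += w[i]
        }
        if (stars > 1) {
            print "error: more than one * in width spec" > "/dev/stderr"
            exit 1
        }
        # The wildcard gets whatever the fixed widths do not consume.
        rest = length(rec) - fixed
        if (rest < 0) rest = 0

        pos = 1; out = ""
        for (i = 1; i <= n; i++) {
            len = (w[i] == "*") ? rest : w[i]
            val = substr(rec, pos, len)
            pos += len
            if (val != "") out = out (out == "" ? "" : " ") "$" i "=" val
        }
        print out
    }'
}

fw_split '1 0 1 *' ABCDE    # prints: $1=A $3=B $4=CDE
fw_split '* 1 0 1' ABCDE    # prints: $1=ABC $2=D $4=E
fw_split '1 * 0 1' ABCDE    # prints: $1=A $2=BCD $4=E
```

With a single wildcard the split stays unambiguous; two or more would need
some further rule, which is why the sketch just reports an error.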


Janis

Kaz Kylheku

May 12, 2017, 3:11:41 PM
I saw this flaw when I was looking at the specification for FW, so
I implemented fw slightly differently in the awk macro. Any left over
material after the last fixed-width field is assigned to an extra field.

If the last field spans exactly to the end of the input, then no extra
field is generated.

$ txr -e "(awk (:set fw '(11 10) ofs \",\") (t (set f f) (prn)))"
this is the winter ofour discontent
this is the, winter of,our discontent
wee sleekit cowerin'  timrous beastie
wee sleekit, cowerin' , timrous beastie

However, a subtlety/quirk is that the "nf" variable doesn't count this
field:

$ txr -e "(awk (:set fw '(5 5)) (t (prn nf (length f))))"

0 0
aaa
1 1
aaaaaa
2 2
asasfasdf
2 2
asdfasdfasdf
2 3

The length of the field list f is 3, of course, but nf tops out at 2.
Not sure if that's a good thing or a bad thing.

OTOH, in GNU Awk, NF appears to be "tone deaf" under FIELDWIDTHS. It reports as
zero if the record is empty; otherwise its value corresponds to the number of
fields expected by the fixed-width extraction, rather than the number that are
actually in the data:

$ gawk -v 'FIELDWIDTHS=5 5' '{ print NF }'

0
aa
2
aaaaaa
2
aaaaaaaaaaa
2
aaaaaaaaaaaaaaaaaaaa
2

This is suboptimal because if the script doesn't care about those short
cases and just always wants two fields, it can just perform NF = 2.
It's also inconsistent with the ordinary field extraction semantics.
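
As a point of comparison, a count of the fields that are actually present can
be sketched in portable awk. This is only an illustration of the semantics
argued for above, not what any awk does internally: count_present is a made-up
helper, and "present" is assumed to mean that the field starts inside the
record.

```shell
# Hypothetical helper: for each input line, print how many of the
# fixed-width fields ($1 = space-separated widths) start within the record.
count_present() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
        pos = 1; nf = 0
        # A field counts only if its first character exists in the record.
        for (i = 1; i <= n && pos <= length($0); i++) {
            nf++
            pos += w[i]
        }
        print nf
    }'
}

printf '%s\n' '' 'aa' 'aaaaaa' 'aaaaaaaaaaa' | count_present '5 5'
# prints 0, 1, 2, 2 (one per line)
```

A script that always wants two fields can still force NF = 2 itself, so
nothing is lost by reporting the real count.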

Markus Gnam

May 12, 2017, 7:03:32 PM
This function should suit your needs:

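# Split "line" into cols[1..n] using the global field[] widths.
# A width of 0 means "the rest of the line". c and i are locals.
function fieldwidths(line, n,    c, i) {
    delete cols
    c = 1

    for (i=1; i<=n; i++) {
        cols[i] = field[i] ? substr(line, c, field[i]) : substr(line, c)
        c += field[i]
    }
}

BEGIN {
    OFS = ","

    field[1] = 11
    field[2] = 10
    field[3] = 0    # 0 = take whatever is left

    # length(array) is a common extension (gawk and most current awks)
    n = length(field)
}

{
    fieldwidths($0, n)
    for (i=1; i<=n; i++)
        printf("%s%s", cols[i], (i<n ? OFS : "\n"))
}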

HTH,
Markus.

Ed Morton

May 12, 2017, 10:28:38 PM
Thanks Markus but I'm looking for a concise solution to the problem and best I
can come up with so far is something like:

$ awk -v fw="11 10" -v OFS=, '{FIELDWIDTHS=fw" "length(); $0=$0; $1=$1}1' file
this is the, winter of,our discontent
wee sleekit, cowerin' , timrous beastie

Though that works and is brief, it's hardly clear, and I wouldn't bother
writing it; I'd just stick with FIELDWIDTHS="11 10 999" or similar.

Ed.

Kenny McCormack

May 13, 2017, 2:06:29 PM
In article <of4rh8$1dq$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
>I often have to deal with fixed width fields like:
>
>$ cat file
>this is the winter ofour discontent
>wee sleekit cowerin'  timrous beastie
>
>$ awk -v FIELDWIDTHS='11 10 999' -v OFS=, '{$1=$1}1' file
>this is the, winter of,our discontent
>wee sleekit, cowerin' , timrous beastie
>
>Notice the "999" at the end of FIELDWIDTHS that I'm using to mean "whatever else
>is on the line".

I take it that your primary "reason for posting" is that you don't like
sticking in that "999". I agree that it looks weird and there's always the
nagging fear that it isn't high enough (what if I hit an input line greater
than 1000-ish characters long?). I also agree that your idea of being able
to put in "*" is a good one (see below).

For what it is worth, the largest value you can put in there is the C
constant "INT_MAX", which on most modern/normal systems is 2**31-1
(2147483647). So, if you wanted, you could just stick that value in.
Note that I verified this value by trial-and-error, before looking it up in
the code (field.c).

With that all said, we come to the main points of this posting:

1) I went ahead and implemented your suggestion. The following 5 line
patch to field.c brings it home (this is in "diff -c" format, taken from the
patched field.c to field.orig, so the lines marked "-" are the ones I added):

(This is in the "set_FIELDWIDTHS" function)
--- Cut Here ---
*** field.c 2017-05-13 09:53:39.000000000 -0400
--- field.orig 2016-08-24 15:31:55.000000000 -0400
***************
*** 1159,1169 ****
/* Detect an invalid base-10 integer, a valid value that
is followed by something other than a blank or '\0',
or a value that is not in the range [1..INT_MAX]. */
- if (*scan == '*') {
- FIELDWIDTHS[i] = INT_MAX;
- scan++;
- goto skip2MyLoo;
- }
errno = 0;
tmp = strtoul(scan, &end, 10);
if (errno != 0
--- 1159,1164 ----
***************
*** 1175,1181 ****
}
FIELDWIDTHS[i] = tmp;
scan = end;
- skip2MyLoo:
/* Skip past any trailing blanks. */
while (is_blank(*scan)) {
++scan;
--- 1170,1175 ----
--- Cut Here ---

2) I understand the reluctance by you (and at least one other frequent
comp.lang.awk poster) to admit the existence of GAWK source code patches.
I do realize that it can be a maintenance problem - I experience it myself
given that I have several needed source code patches and several
machines/versions of the GAWK executable to maintain. Therefore, it
occurred to me that it would be kinda cool if GAWK could be re-engineered
to have a more "micro-kernel" type of architecture. The goal of this is
that we could replace any "core GAWK" function (e.g., "set_FIELDWIDTHS") at
runtime, without needing to recompile the core executable. This would make
it easier to supply an alternate version of the function without enduring
the maintenance nightmare involved in recompiling GAWK itself. All the
user would have to do is recompile "set_FIELDWIDTHS".

This would all be done through the magic of shared libraries. What I'm
imagining is something like:

1) Very small core program (basically, just a main() to call the rest).
2) libgawklib.so which contains everything else (including the existing
main(), which is in main.c).
3) A mechanism to "interpose" a user.so in-between the above two listed
pieces. Note that I put "interpose" in quotes for a reason; I do not want
that to imply that I am necessarily talking about the technical sense in
which that word is used, although that may be one possible way of achieving
what is being discussed here.

Anyway, that's my idea. I may do some experimentation on this at some
point.

--
It's possible that leasing office space to a Starbucks is a greater liability
in today's GOP than is hitting your mother on the head with a hammer.

Bruce Horrocks

May 15, 2017, 5:37:02 PM
For me it's a non-problem. I've used GAWK a number of times to help
migrate fixed width data from legacy systems into more modern systems
and the only value that matters is 1, i.e. FIELDWIDTHS="11 10 1".

Why? Because if the last field is anything other than an empty string
you have a corrupt file. In which case I would just print NR, $0 to a
logfile, so 1, 9 or 999 makes no difference.

--
Bruce Horrocks
Surrey
England
(bruce at scorecrow dot com)

Kaz Kylheku

May 15, 2017, 5:42:27 PM
On 2017-05-15, Bruce Horrocks <07....@scorecrow.com> wrote:
> On 12/05/2017 18:34, Ed Morton wrote:
>> If not, would a "*" or "0" or something to mean "whatever else is left"
>> when used at the end of FIELDWIDTHS make sense?
>
> For me it's a non-problem. I've used GAWK a number of times to help
> migrate fixed width data from legacy systems into more modern systems
> and the only value that matters is 1, i.e. FIELDWIDTHS="11 10 1".
>
> Why? Because if the last field is anything other than an empty string
> you have a corrupt file.

Says who? No file with fixed-width fields can correctly have a "ragged
right" column that is variable up to some unspecified length?

Ed Morton

May 15, 2017, 6:40:11 PM
On 5/15/2017 4:37 PM, Bruce Horrocks wrote:
> On 12/05/2017 18:34, Ed Morton wrote:
>> I often have to deal with fixed width fields like:
>>
>> $ cat file
>> this is the winter ofour discontent
>> wee sleekit cowerin'  timrous beastie
>>
>> $ awk -v FIELDWIDTHS='11 10 999' -v OFS=, '{$1=$1}1' file
>> this is the, winter of,our discontent
>> wee sleekit, cowerin' , timrous beastie
>>
>> Notice the "999" at the end of FIELDWIDTHS that I'm using to mean
>> "whatever else is on the line". Coming up with some arbitrarily large
>> number that you hope will be big enough for your data feels really
>> kludgy. I could write code to sum up the numbers in FIELDWIDTHS and then
>> use substr($0, that value + 1) but that's getting kinda ridiculous
>> for such a conceptually trivial problem.
>>
>> Is there a concise way to do what I want?
>>
>> If not, would a "*" or "0" or something to mean "whatever else is left"
>> when used at the end of FIELDWIDTHS make sense?
>
> For me it's a non-problem. I've used GAWK a number of times to help migrate
> fixed width data from legacy systems into more modern systems and the only value
> that matters is 1, i.e. FIELDWIDTHS="11 10 1".
>
> Why? Because if the last field is anything other than an empty string you have a
> corrupt file.

What makes you say that? If I have an input file with 1 line containing
"a b foo" how does the last field being "foo" make it a corrupt file?

Ed.

Bruce Horrocks

May 16, 2017, 6:45:09 PM
In my book, if the last field is variable length (i.e. terminated by the
end of line or record separator) rather than space padded, then it's not
a fixed width file.

Bruce Horrocks

May 16, 2017, 7:02:15 PM
On 15/05/2017 23:39, Ed Morton wrote:
>>
>> For me it's a non-problem. I've used GAWK a number of times to help migrate
>>
>> fixed width data from legacy systems into more modern systems and the only value
>>
>> that matters is 1, i.e. FIELDWIDTHS="11 10 1".
>>
>> Why? Because if the last field is anything other than an empty string you have a
>>
>> corrupt file.
>
> What makes you say that? If I have an input file with 1 line containing
> "a b foo" how does the last field being "foo" make it a corrupt file?

I don't understand the point you are making with your example. If
FIELDWIDTHS = "11 10 1" then "a b foo" is corrupt because the first
field is truncated and the second is missing.

Hopefully I can make myself clearer. Imagine a fixed field width file
containing

aabb
ccdd
eeff

which can be processed using FIELDWIDTHS="2 2" and all is well.

Now if this file becomes corrupted to

aabb
cfoocdd
eeff

perhaps by someone manually editing it to 'fix' a record prior to
importing into a new system, then FIELDWIDTHS="2 2" won't help because
the second record will just give $1="cf", $2="oo" and the "cdd" is
silently ignored.

If, instead, you use FIELDWIDTHS="2 2 1" then the "c" of "cdd" goes into
$3 and you know that there was a problem. For rows 1 and 3, $3 is blank
which tells you that there wasn't a problem.
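
The same check can also be sketched without the extra width-1 field, by
comparing each record's length with the sum of the declared widths. This is an
alternative formulation for illustration, not the FIELDWIDTHS="2 2 1" technique
itself; flag_overlong is a made-up helper name.

```shell
# Hypothetical helper: report records that are longer than the declared
# fixed widths ($1 = space-separated widths), as "record-number: record".
flag_overlong() {
    awk -v widths="$1" '
    BEGIN {
        n = split(widths, w, " ")
        total = 0
        for (i = 1; i <= n; i++) total += w[i]
    }
    # Anything past the last fixed field means the record has overflowed.
    length($0) > total { print NR ": " $0 }'
}

printf '%s\n' aabb cfoocdd eeff | flag_overlong '2 2'
# prints: 2: cfoocdd
```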

Kenny McCormack

May 16, 2017, 7:08:42 PM
In article <0d7cd596-47e6-995c...@scorecrow.com>,
Bruce Horrocks <07....@scorecrow.com> wrote:
...
>> Says who? No file with fixed-width fields can correctly have a "ragged
>> right" column that is variable up to some unspecified length?
>
>In my book, if the last field is variable length (i.e. terminated by the
>end of line or record separator) rather than space padded, then it's not
>a fixed width file.

An example of what the other posters are talking about would be if you have
a record with a bunch of fixed width fields ("normal" data), followed by an
arbitrary length comment field at the end. Then you would use something
like:

FIELDWIDTHS="10 20 30 25 25 30 99999"

to read all the fields, assuming that the last field is free form text and
could be any length.

It's not that far-fetched. In fact, it is quite reasonable.

You can say that (to you) it is not a "fixed width file", but it is still
reasonable to parse the file with FIELDWIDTHS.

--
"Insisting on perfect safety is for people who don't have the balls to live
in the real world."

- Mary Shafer, NASA Ames Dryden -

Kaz Kylheku

May 16, 2017, 7:27:21 PM
So what?

Data structures based on this pattern have fixed parts and variable parts:

struct log_msg {
    time_t stamp;
    char data[1]; /* dynamic */
};

All sorts of stuff fits this. For instance, network datagrams (IP and
other). You have header with some fixed fields and then a variable
"payload".

Data of this sort can sometimes be found rendered into textual
representations that have fixed-width columns, and then a variable part
on the right.

If the fixed part is badly behaved, such as featuring fields that have
a variable number of spaces, or fields that can be blank entirely,
then you can use fixed width extraction.

Kaz Kylheku

May 16, 2017, 7:31:16 PM
On 2017-05-16, Bruce Horrocks <07....@scorecrow.com> wrote:
> On 15/05/2017 23:39, Ed Morton wrote:
>>>
>>> For me it's a non-problem. I've used GAWK a number of times to help migrate
>>>
>>> fixed width data from legacy systems into more modern systems and the only value
>>>
>>> that matters is 1, i.e. FIELDWIDTHS="11 10 1".
>>>
>>> Why? Because if the last field is anything other than an empty string you have a
>>>
>>> corrupt file.
>>
>> What makes you say that? If I have an input file with 1 line containing
>> "a b foo" how does the last field being "foo" make it a corrupt file?
>
> I don't understand the point you are making with your example. If
> FIELDWIDTHS = "11 10 1" then "a b foo" is corrupt because the first
> field is truncated and the second is missing.

People who do this sort of shit can't just say "the file is corrupt;
I'm going home; I will bill you less today (just for issuing the
preceding valuable opinion)."


>
> Hopefully I can make myself clearer. Imagine a fixed field width file
> containing
>
> aabb
> ccdd
> eeff

Imagine a file containing

aa b
ccdddextra
e ff
g hhhh

Random holes in the leading four-character part from which four
fields must be extracted, and then a variable part.

This is where FIELDWIDTHS="1 1 1 1 99999" would be useful.

It's not a file of *fixed records*, clearly, but the records have fixed
fields.

Ed Morton

May 16, 2017, 11:19:10 PM
OK, I understand where you're coming from now, your point is that if the last
field on each line is actually variable width then the line does not consist
entirely of fixed-width fields and so you don't have fixed-width input.

While I appreciate the theory and I'm sure that's true in some specific cases as
you've shown above, for the rest of us we daily have to deal with input files like:

John Shriely Smith Carpenter 1215 Surrey Ln
Bill Bottomwiggle Dancer 19 5th Avenue Gardens By The Sea

where you can tell from that sample what the width of the first 2 fields are but
you cannot predict what the max width of the 3rd field is from that or any other
sample. Today we deal with that by setting the width of the 3rd field to some
number (e.g. 999) that we hope will be large enough to accommodate whatever
input we get next, but of course that is just a kludge which is why some kind of
wild card character to represent "whatever is left" would be useful.

Ed.

Janis Papanagnou

May 17, 2017, 2:57:39 AM
Yes, that is not a fixed width field with fixed record lengths as we are used
to (or rather have been used to) from historic/legacy mainframe data records.
Those mainframe records may well have been the reason to introduce
FIELDWIDTHS in the first place in awk (I can't tell), and this would explain
why (at that time) there was no need (and obviously no prospect) to foresee
that there are other useful applications for that feature in text processing.

Currently we find all sorts of data where the fieldwidths feature is useful:
date(10) key(12) specifier(2) filename-with-spaces(N)
FIELDWIDTHS = "10 12 2 *"
or
description(N) date-from(10) date-to(10) key(12) class(4) specifier(2)
FIELDWIDTHS = "* 10 10 12 4 2"

These are arbitrary (made up) examples (and we can discuss whether the data
structures shouldn't be defined in a better way), but I've seen a lot of such
or similar data where using FIELDWIDTHS is (or would be) a straightforward
and appropriate way.

Janis

Kaz Kylheku

May 17, 2017, 8:47:00 AM
On 2017-05-17, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> On 17.05.2017 00:45, Bruce Horrocks wrote:
>> On 15/05/2017 22:42, Kaz Kylheku wrote:
>>> On 2017-05-15, Bruce Horrocks <07....@scorecrow.com> wrote:
>>>> On 12/05/2017 18:34, Ed Morton wrote:
>>>>> If not, would a "*" or "0" or something to mean "whatever else is left"
>>>>> when used at the end of FIELDWIDTHS make sense?
>>>>
>>>> For me it's a non-problem. I've used GAWK a number of times to help
>>>> migrate fixed width data from legacy systems into more modern systems
>>>> and the only value that matters is 1, i.e. FIELDWIDTHS="11 10 1".
>>>>
>>>> Why? Because if the last field is anything other than an empty string
>>>> you have a corrupt file.
>>>
>>> Says who? No file with fixed-width fields can correctly have a "ragged
>>> right" column that is variable up to some unspecified length?
>>
>> In my book, if the last field is variable length (i.e. terminated by the end
>> of line or record separator) rather than space padded, then it's not a fixed
>> width file.
>
> Yes, that is not a fixed width field with fixed record lengths as we are used
> to (or rather have been used to) from historic/legacy mainframe data records.

Like, oh, utmp and wtmp. :)

Ben Bacarisse

May 17, 2017, 9:16:25 AM
Ed Morton <morto...@gmail.com> writes:

> On 5/16/2017 6:02 PM, Bruce Horrocks wrote:
<snip>
>> Hopefully I can make myself clearer. Imagine a fixed field width file containing
>>
>> aabb
>> ccdd
>> eeff
>>
>> which can be processed using FIELDWIDTHS="2 2" and all is well.
>>
>> Now if this file becomes corrupted to
>>
>> aabb
>> cfoocdd
>> eeff
>>
>> perhaps by someone manually editing it to 'fix' a record prior to importing into
>> a new system, then FIELDWIDTHS="2 2" won't help because the second record will
>> just give $1="cf", $2="oo" and the "cdd" is silently ignored.
>>
>> If, instead, you use FIELDWIDTHS="2 2 1" then the "c" of "cdd" goes into $3 and
>> you know that there was a problem. For rows 1 and 3, $3 is blank which tells you
>> that there wasn't a problem.
>
> OK, I understand where you're coming from now, your point is that if
> the last field on each line is actually variable width then the line
> does not consist entirely of fixed-width fields and so you don't have
> fixed-width input.
>
> While I appreciate the theory and I'm sure that's true in some
> specific cases as you've shown above, for the rest of us we daily have
> to deal with input files like:
>
> John Shriely Smith Carpenter 1215 Surrey Ln
> Bill Bottomwiggle Dancer 19 5th Avenue Gardens By The Sea

Another point is that when the file is corrupted in the sense being used
by Bruce you might want to write an awk script to fix it. So even in
the world of fixed-width fields in fixed-width records you might need what
you are proposing. (I know there are file systems where these sorts of
corrupted files make no sense but there must be some where they do).

<snip>
--
Ben.

Janis Papanagnou

May 17, 2017, 9:32:24 AM
On 17.05.2017 14:46, Kaz Kylheku wrote:
[...]
>>
>> Yes, that is not a fixed width field with fixed record lengths as we are used
>> to (or rather have been used to) from historic/legacy mainframe data records.
>
> Like, oh, utmp and wtmp. :)

Yes, but those are binary data. You can also find such binary data in (e.g.
IETF) protocol specifications. The historic mainframe structures were very
often text structures, similarly to what we traditionally process with awk.
Remember (e.g.) Fortran program records on 80 column Hollerith punch cards?
(https://upload.wikimedia.org/wikipedia/commons/8/84/Hollerith_card.jpg)

Janis

Bruce Horrocks

May 18, 2017, 7:54:29 PM
On 17/05/2017 04:18, Ed Morton wrote:
> OK, I understand where you're coming from now, your point is that if the
> last field on each line is actually variable width then the line does
> not consist
> entirely of fixed-width fields and so you don't have fixed-width input.
>
> While I appreciate the theory and I'm sure that's true in some specific
> cases as
> you've shown above, for the rest of us we daily have to deal with input files like:
>
>
> John Shriely Smith Carpenter 1215 Surrey Ln
> Bill Bottomwiggle Dancer 19 5th Avenue Gardens By The Sea
>
> where you can tell from that sample what the width of the first 2 fields
> are but you cannot predict what the max width of the 3rd field is from
> that or any other sample. Today we deal with that by setting the width
> of the 3rd field to some number (e.g. 999) that we hope will be large
> enough to accommodate whatever input we get next, but of course that is
> just a kludge which is why some kind of
> wild card character to represent "whatever is left" would be useful.

And likewise, I see what you mean now. I don't often come across
trailing variable fields like this because it forces programs that read
the file to employ a sequential scan, something which was avoided as
much as possible because it was so slow.

I'm not sure that FIELDWIDTHS="10 11 *" versus FIELDWIDTHS="10 11 9999"
makes much difference - the latter is idiomatic enough to alert any
programmer picking up the code later.

Incidentally, what do you propose should happen if FIELDWIDTHS="*" is used?

$0 set to the whole record but $1, $2 etc left empty? Or $1 also set to
the whole record?

If there is to be a change to FIELDWIDTHS then it might be worth
incorporating another change as well to cope with multiple record types
within a file. I'll start another thread.

Ed Morton

May 18, 2017, 10:49:49 PM
And what if in your next file you get a final field that's 10,000 characters
long? We could keep throwing 9s at it but in reality a "*" would be clearer and
simpler.

> Incidentally, what do you propose should happen if FIELDWIDTHS="*" is used?
>
> $0 set to the whole record but $1, $2 etc left empty? Or $1 also set to the
> whole record?

$1 and $0 would both contain the whole record and NF would be 1.

Ed.

Ed Morton

May 22, 2017, 11:07:20 PM
I just heard from Arnold and a "*" will be allowed as a wild-card for the final
field width (only) in the next gawk release.

Regards,

Ed.

Kenny McCormack

May 23, 2017, 4:25:40 AM
In article <og08qq$p6e$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...
>I just heard from Arnold and a "*" will be allowed as a wild-card for the
>final field width (only) in the next gawk release.

Great! I assume he will be using my patch (posted here about a week ago).

Well done!

P.S. (JIC it isn't obvious) Yes, if I were the type to use "smileys",
there'd be "smileys" all over this post.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/ModernXtian

Bruce Horrocks

May 23, 2017, 3:18:38 PM
On 23/05/2017 09:25, Kenny McCormack wrote:
> In article <og08qq$p6e$1...@dont-email.me>,
> Ed Morton <morto...@gmail.com> wrote:
> ...
>> I just heard from Arnold and a "*" will be allowed as a wild-card for the
>> final field width (only) in the next gawk release.
>
> Great! I assume he will be using my patch (posted here about a week ago).
>
> Well done!

And your reward (both of you) is to write a test case and a
documentation 'patch'. :-)