In article <of4rh8$1dq$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
>I often have to deal with fixed width fields like:
>
>$ cat file
>this is the winter ofour discontent
>wee sleekit cowerin' timrous beastie
>
>$ awk -v FIELDWIDTHS='11 10 999' -v OFS=, '{$1=$1}1' file
>this is the, winter of,our discontent
>wee sleekit, cowerin' , timrous beastie
>
>Notice the "999" at the end of FIELDWIDTHS that I'm using to mean "whatever else
>is on the line".

I take it that your primary "reason for posting" is that you don't like
sticking in that "999". I agree that it looks weird and there's always the
nagging fear that it isn't high enough (what if I hit an input line more
than 1000-ish characters long?). I also agree that your idea of being able
to put in "*" is a good one (see below).
For what it is worth, the largest value you can put in there is the C
constant "INT_MAX", which on most modern/normal systems is 2**31-1
(2147483647). So, if you wanted, you could just stick that value in.
Note that I verified this value by trial-and-error, before looking it up in
the code (field.c).
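
To see why an oversized last width is harmless, here is a minimal sketch in
C of fixed-width splitting. It is not gawk's code: split_fixed() and its
flat output buffer are hypothetical names made up for illustration. The
point it shows is that any width larger than what remains of the line
(999, INT_MAX, whatever) simply gets clamped to the rest of the line.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not gawk code: copy successive fixed-width fields
 * of `line` into `out`, one per entry in `widths`. `out` is a flat buffer
 * of nwidths * cap bytes; field k starts at out + k * cap. `cap` must be
 * at least 1 and every width is assumed to be positive.
 * Returns the number of fields that were filled. */
size_t split_fixed(const char *line, const int *widths, size_t nwidths,
                   char *out, size_t cap)
{
    size_t remaining = strlen(line);
    size_t nfields = 0;

    for (size_t i = 0; i < nwidths && remaining > 0; i++) {
        size_t width = (size_t)widths[i];
        size_t copy;

        if (width > remaining)     /* a huge width such as INT_MAX is */
            width = remaining;     /* clamped to what is actually left */

        copy = width;
        if (copy > cap - 1)        /* store at most cap - 1 characters */
            copy = cap - 1;

        memcpy(out + nfields * cap, line, copy);
        out[nfields * cap + copy] = '\0';

        line += width;             /* consume the whole field, even if */
        remaining -= width;        /* the stored copy was truncated    */
        nfields++;
    }
    return nfields;
}
```

Called with the widths 11, 10 and INT_MAX on the first line of the sample
file, this yields the same three fields as the awk command quoted above.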

With that all said, we come to the main points of this posting:

1) I went ahead and implemented your suggestion. The following 6 line
patch to the "set_FIELDWIDTHS" function in field.c brings it home. This is
in "diff -c" format, run as "diff -c field.c field.orig", so the lines
marked "-" are the ones I added:
--- Cut Here ---
*** field.c 2017-05-13 09:53:39.000000000 -0400
--- field.orig 2016-08-24 15:31:55.000000000 -0400
***************
*** 1159,1169 ****
      /* Detect an invalid base-10 integer, a valid value that
         is followed by something other than a blank or '\0',
         or a value that is not in the range [1..INT_MAX]. */
-     if (*scan == '*') {
-         FIELDWIDTHS[i] = INT_MAX;
-         scan++;
-         goto skip2MyLoo;
-     }
      errno = 0;
      tmp = strtoul(scan, &end, 10);
      if (errno != 0
--- 1159,1164 ----
***************
*** 1175,1181 ****
      }
      FIELDWIDTHS[i] = tmp;
      scan = end;
- skip2MyLoo:
      /* Skip past any trailing blanks. */
      while (is_blank(*scan)) {
          ++scan;
--- 1170,1175 ----
--- Cut Here ---
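
For anyone who wants to play with the idea outside of the gawk tree, the
same logic can be sketched as a stand-alone C function. This is an
illustration under my own assumptions, not the gawk code: parse_widths()
and its fixed-size output array are hypothetical, and gawk's real function
grows its array dynamically. Like the patch, it stores "*" as INT_MAX and
otherwise accepts only base-10 integers in the range [1..INT_MAX].

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-alone parser, not the gawk function: read a
 * blank-separated list of widths from `spec` into `widths` (at most
 * `max` entries). A "*" is stored as INT_MAX, meaning "rest of the line".
 * Returns the number of widths parsed, or -1 on a malformed list. */
int parse_widths(const char *spec, int *widths, int max)
{
    int n = 0;

    while (isblank((unsigned char)*spec))
        spec++;

    while (*spec != '\0') {
        if (n == max)
            return -1;                  /* more widths than we can hold */

        if (*spec == '*') {
            widths[n++] = INT_MAX;      /* "whatever else is on the line" */
            spec++;
        } else {
            char *end;
            unsigned long tmp;

            if (!isdigit((unsigned char)*spec))
                return -1;              /* rejects "-5", "+5", "abc" */
            errno = 0;
            tmp = strtoul(spec, &end, 10);
            if (errno != 0 || tmp == 0 || tmp > INT_MAX)
                return -1;              /* outside [1..INT_MAX] */
            widths[n++] = (int)tmp;
            spec = end;
        }

        /* A width must be followed by a blank or the end of the string. */
        if (*spec != '\0' && !isblank((unsigned char)*spec))
            return -1;
        while (isblank((unsigned char)*spec))
            spec++;
    }
    return n;
}
```

With this sketch, "11 10 *" parses to three widths with INT_MAX as the
last one, while "11 x", "0" and "5*" are all rejected. Whether "*" should
be allowed anywhere other than the last position is a design choice that
neither the patch nor this sketch settles.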
2) I understand the reluctance by you (and at least one other frequent
comp.lang.awk poster) to admit the existence of GAWK source code patches.
I do realize that it can be a maintenance problem - I experience it myself
given that I have several needed source code patches and several
machines/versions of the GAWK executable to maintain. Therefore, it
occurred to me that it would be kinda cool if GAWK could be re-engineered
to have a more "micro-kernel" type of architecture. The goal of this is
that we could replace any "core GAWK" function (e.g., "set_FIELDWIDTHS") at
runtime, without needing to recompile the core executable. This would make
it easier to supply an alternate version of the function without enduring
the maintenance nightmare involved in recompiling GAWK itself. All the
user would have to do is recompile "set_FIELDWIDTHS".
This would all be done through the magic of shared libraries. What I'm
imagining is something like:
    1) Very small core program (basically, just a main() to call the rest).
    2) libgawklib.so which contains everything else (including the existing
       main(), which is in main.c).
    3) A mechanism to "interpose" a user.so between the two pieces listed
       above. Note that I put "interpose" in quotes for a reason; I do not
       want that to imply that I am necessarily talking about the technical
       sense in which that word is used, although that may be one possible
       way of achieving what is being discussed here.
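
One possible shape for that mechanism, sketched in C, is plain
function-pointer indirection: the core never calls the routine directly,
only through a pointer that a loader may repoint at startup. Every name
below is hypothetical and nothing here exists in gawk. The sketch also
leaves out the shared-library loading itself; in a real design the
replacement pointer would presumably be obtained by looking the symbol up
in user.so (for example with dlsym()), and the two routines are stubs that
only report whether they would accept a "*".

```c
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; nothing below exists in gawk. */

/* Signature shared by the built-in routine and any replacement.
 * Returns 0 if the width list is accepted, -1 otherwise. */
typedef int (*set_widths_fn)(const char *spec);

/* Stub for the built-in behaviour in the library part: reject "*". */
int builtin_set_widths(const char *spec)
{
    return strchr(spec, '*') == NULL ? 0 : -1;
}

/* Stub for a replacement that a user library might supply: accept "*". */
int user_set_widths(const char *spec)
{
    (void)spec;
    return 0;
}

/* The core only ever calls through this pointer. */
static set_widths_fn set_widths_hook = builtin_set_widths;

/* Called once at startup. Passing NULL (no user library found, or the
 * symbol is missing from it) keeps the built-in routine. */
void install_set_widths(set_widths_fn replacement)
{
    set_widths_hook = replacement != NULL ? replacement : builtin_set_widths;
}

/* What the rest of the core would call instead of the routine itself. */
int core_set_widths(const char *spec)
{
    return set_widths_hook(spec);
}
```

Before any replacement is installed, core_set_widths("11 10 *") is
rejected by the built-in stub; after install_set_widths(user_set_widths)
the same call is accepted, and nothing in the core was recompiled.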
Anyway, that's my idea. I may do some experimentation on this at some
point.
--
It's possible that leasing office space to a Starbucks is a greater liability
in today's GOP than is hitting your mother on the head with a hammer.