
Compiling AWK


Dwain

Mar 31, 1999, 3:00:00 AM
I have a copy of AWK for DOS. It comes with AWKC, which I think compiles
AWK script files.

How do I use this? Is there any other way to compile AWK?

Thanks
Dwain

Jim Monty

Apr 1, 1999, 3:00:00 AM

You're using MKS awk from the MKS Toolkit. I've appended the first page
of the man page awkc(1) to this message.

--
Jim Monty
mo...@primenet.com
http://www.primenet.com/~monty/
Tempe, Arizona USA


awkc(1) MKS Toolkit awkc(1)
-------------------------------------------------------------------------------

NAME
awkc -- compile AWK programs into executables

SYNOPSIS
awkc [-F ere] [-f prog] -o awkpgm [-O] [-P proto] [-v var=value ...]
[program]

DESCRIPTION
awkc is a compiler/linker for awk programs. awkc links the compiled awk
code to a prototype module which makes the program executable. You specify
the name of the executable program. For a description of awk, see
awk(1). awkc should only be used after the given program has been
successfully tested with awk.

Options

-F ere
compiles to use the given extended regular expression ere to delimit
input fields.

-f prog
reads the awk program from the file prog.

-O optimizes the awk program's code by merging concatenated string literals
(constants), and reducing constant integer expressions to their
values.

-o awkpgm
writes the awk program to the file awkpgm. This option is always
required. On DOS, OS/2, NT, and Windows 95 systems, awkpgm must have
the extension .exe.

-P proto
replaces the prototype module with the specified prototype file,
proto. The default prototype file is $ROOTDIR/etc/awkrun.sys, where
sys is dos, os2, nt, or nta (as appropriate).

-v var=value
compiles the initial value of var to the given value.

EXAMPLES
The commands

awkc -o cmd.exe '{print NR ":" $0}'
cmd.exe input1

are equivalent to the single command:

awk '{print NR ":" $0}' input1

This example prints the contents of the file input1 with line numbers
prepended to each line.

Copyright (c) 1985,1998 Mortice Kern Systems Inc.
Page 1
-----------------------------------------------------------------------------

Kenny McCormack

Apr 1, 1999, 3:00:00 AM
In article <7dvunu$ah9$1...@nnrp02.primenet.com>,

Jim Monty <mo...@primenet.com> wrote:
>Dwain <Dw...@netset.com> wrote:
>> I have a copy of AWK for DOS. It comes with AWKC, which I think
>> compiles AWK script files.
>>
>> How do I use this? Is there any other way to compile AWK?
>
>You're using MKS awk from the MKS Toolkit. I've appended the first page
>of the man page awkc(1) to this message.

FWIW, the awk compiler in Thompson AWK is also called AWKC.EXE (*)

(*) That is, the real mode DOS version is. There is also AWKCP for OS/2 and
AWKCW for Windoze.

BTW, has anyone done a feature comparison between MKS AWK and Thompson AWK?
I'm a long time T-AWK user and am quite happy with its feature set, but I
got to wondering how the MKS offering compares.

Hrl...@aol.com

Apr 1, 1999, 3:00:00 AM
In article <7e08b2$5ma$1...@yin.interaccess.com>,

MKS awk includes a built in ord() function that gives the ASCII value of the
first character of its argument. Its symbol table is visible as the array
SYMTAB. It includes the ability to use \1 . . \9 in substitution patterns as
aliases for parenthesized subexpressions in the search pattern. In other words

echo "The quick brown fox jumped over the lazy dog." | awk '{ sub(/(fox)(.*)(dog)/, "\3\2\1"); print }'

gives output: The quick brown dog jumped over the lazy fox.
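
The same back-reference idea can be tried without MKS awk. What follows is only a rough Python equivalent using the standard re module, not MKS code; count=1 is there because awk's sub() replaces only the first match.

```python
import re

line = "The quick brown fox jumped over the lazy dog."
# \1, \2 and \3 refer to the three parenthesized groups of the pattern.
swapped = re.sub(r"(fox)(.*)(dog)", r"\3\2\1", line, count=1)
print(swapped)
```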

On the negative side, there's a bug in MKS awk: a non-empty string constant
isn't evaluated as TRUE when it is the only expression used as the pattern in
a pattern-action pair.

echo "should display this" | awk '"test it"'

writes nothing to standard output, but both of the following

echo "should display this" | awk '"test it"&&1'
echo "should display this" | awk '{ if ("test it") print }'

give output: should display this


pcanagno...@gmail.com

Apr 9, 2014, 8:41:42 PM
On Thursday, April 1, 1999 4:00:00 AM UTC-4, Kenny McCormack wrote:
> In article <7dvunu$ah9$1...@nnrp02.primenet.com>,
> Jim Monty <mo...@primenet.com> wrote:
> >Dwain <Dw...@netset.com> wrote:
> >> I have a copy of AWK for DOS. It comes with AWKC, which I think
> >> compiles AWK script files.
> >>
> >> How do I use this? Is there any other way to compile AWK?
> >
> >You're using MKS awk from the MKS Toolkit. I've appended the first page
> >of the man page awkc(1) to this message.
>
> FWIW, the awk compiler in Thompson AWK is also called AWKC.EXE (*)
>
> (*) That is, the real mode DOS version is. There is also AWKCP for OS/2 and
> AWKCW for Windoze.
>
> BTW, has anyone done a feature comparison between MKS AWK and Thompson AWK?
> I'm a long time T-AWK user and am quite happy with its feature set, but I
> got to wondering how the MKS offering compares.

A friend of mine and I have produced version 6 of TAWK, enhanced from the code base of version 5. Send me an email if you would like a copy.

~~ Paul

Janis Papanagnou

Apr 9, 2014, 9:06:15 PM
On 10.04.2014 02:41, pcanagno...@gmail.com wrote:
> On Thursday, April 1, 1999 4:00:00 AM UTC-4, Kenny McCormack wrote:
               ^^^^^^^^^^^^^
>> [...]
>
> A friend of mine and I have produced version 6 of TAWK, enhanced from the code base of version 5. Send me an email if you would like a copy.

Finally, after 15 years, we have a solution! :-)

Janis

>
> ~~ Paul
>

Kenny McCormack

Apr 9, 2014, 9:55:53 PM
In article <li4qq7$7m2$1...@news.m-online.net>,
Yes. I am flabbergasted to read this.

I'd like to know more about/from how/why/when/who this new version comes.

Paul, please clarify!

For example, how did you get hold of the code? Is Pat back from the wilds?

--
Religion is regarded by the common people as true,
by the wise as foolish,
and by the rulers as useful.

(Seneca the Younger, 65 AD)

Anton Treuenfels

Apr 9, 2014, 11:15:24 PM

"Kenny McCormack" <gaz...@shell.xmission.com> wrote in message
news:li4tn9$a86$1...@news.xmission.com...
>>> A friend of mine and I have produced version 6 of TAWK, enhanced from
>>> the code base of version 5. Send me an email if you would like a copy.
>>
> I'd like to know more about/from how/why/when/who this new version comes.

Ditto!

- Anton Treuenfels

Aharon Robbins

Apr 10, 2014, 7:00:58 AM
In article <f7e7e77b-ea48-4b0b...@googlegroups.com>,
<pcanagno...@gmail.com> wrote:
>A friend of mine and I have produced version 6 of TAWK, enhanced from
>the code base of version 5. Send me an email if you would like a copy.

If this is for real, please post more details.

Otherwise, it sounds like an April 1 posting that hit the net a
week and a half late.

Thanks,

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon
D.N. Shimshon 9978500 ISRAEL

pcanagno...@gmail.com

Apr 10, 2014, 8:46:19 PM
Pat Thompson was kind enough to give me the source code for TAWK. My plan was to design a programming language that compiled into TAWK, but to do so I needed some changes and speed improvements. My friend and I have been working on TAWK V6 for a few years. I now write all my business and personal code in Hearsay.

Here is an overview of the enhancements in TAWK v6.4, which runs only on Windows.

A new -XPATH command option.

The path buffer is enlarged to 1024 bytes.

Two new expression operators, unary * and binary @.

The filesize(), ftell(), and fseek() functions are enhanced to support files larger than 2 GB.

The timetab() function is enhanced to return the timezone and daylight savings time flag.

There are about nine new built-in constants, including PI.

There are about 40 new built-in functions (requires -V6FUNCTIONS option).

There are five semantic changes (requires -ALTSEMANTICS option) pertaining to fetching nonexistent keys, arithmetic on nil, comparison to nil, promotion of arguments to tables, and typeof() external data.

Various optimizations, plus the "free" optimization due to compiling TAWK with a modern C compiler.

~~ Paul

Ed Morton

Apr 10, 2014, 8:54:54 PM
Please keep this to yourself. The last thing we need is yet another awk version
with its own extensions and caveats to have to consider. If you want to
contribute to the awk user community, just contribute some enhancements to gawk.

One awk to rule them all... Oh wait, that doesn't end well... :-)

Ed.

Kenny McCormack

Apr 10, 2014, 8:58:55 PM
In article <li7eh3$mt9$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...
>Please DON'T keep this to yourself. The last thing we need is yet another
>awk version with its own extensions and caveats to have to consider. If
>you want to contribute to the awk user community, just contribute some
>enhancements to gawk. NOT!

-1

Paul, I'm very interested in what you've got. I think you should just post
it somewhere and let us take a look at it. It should be open-source. I
assume it is...

--
For instance, Standard C says that nearly all extensions to C are prohibited. How
silly! GCC implements many extensions, some of which were later adopted as part of
the standard. If you want these constructs to give an error message as
“required” by the standard, you must specify ‘--pedantic’, which was
implemented only so that we can say “GCC is a 100% implementation of the
standard”, not because there is any reason to actually use it.

Anton Treuenfels

Apr 10, 2014, 11:53:09 PM

<pcanagno...@gmail.com> wrote in message
news:4809dd8f-1332-40d0...@googlegroups.com...
Pat Thompson was kind enough to give me the source code for TAWK. My plan
was to design a programming language that compiled into TAWK, but to do so I
needed some changes and speed improvements. My friend and I have been
working on TAWK V6 for a few years. I now write all my business and personal
code in Hearsay.

Here is an overview of the enhancements in TAWK v6.4, which runs only on
Windows.

[...]

~~ Paul
--------------------------

Did you pay any attention to V5 bugs? There aren't many, but I know of a
few.

One has to do with arrays being sorted in the wrong order (there may be a
message thread archived here about that - years and years old by now).
Actually it affects both V4 and V5.

V5 has a habit of trying to guess function argument types and then promoting
them to that type no matter what they actually are. This conflicts with a
tendency of mine to exploit V4's laxity on this point by passing different
argument types to the same function, then figuring out inside the function
what to do based on those types. Perhaps not so much a bug as a "feature".

There is also V5's int() function, which is unable to convert the minimum
signed integer value (0x80000000) from a double to an integer (V4 can).

Not a bug, but it would be nice if the regex() function returned a null of
some sort if it's given something that can't be converted to a regular
expression, rather than aborting the whole program with an error message.

- Anton Treuenfels



pcanagno...@gmail.com

Apr 11, 2014, 10:35:04 AM
On Thursday, April 10, 2014 8:58:55 PM UTC-4, Kenny McCormack wrote:
> Paul, I'm very interested in what you've got. I think you should just post
> it somewhere and let us take a look at it. It should be open-source. I
> assume it is...

No, it's still copyrighted to Thompson Automation. I didn't feel it was my right to change the copyright. Let's see if I can include the V6 release notes in the next post.

You can email me for a copy of the V6 files.

~~ Paul

pcanagno...@gmail.com

Apr 11, 2014, 10:36:46 AM
Release Notes for Thompson AWK Version 6.4 [10 March 2014]
------------------------------------------


This document contains the release notes for Thompson AWK version 6.4. This
new version runs only on Microsoft Windows XP, Vista, 7, and 8. It includes the
following enhancements:

o New command options

o New expression operators

o Enhanced run-time functions

o New constants

o New run-time functions

o Alternate semantics for certain AWK basic features

TAWK version 6 was developed by Heathcliffe Software of Cranston, RI, and
Windfall Software of Carlisle, MA. Please direct questions and problems to
Paul Anagnostopoulos. We recommend that you save your current version of
TAWK before installing this new version, in case you run into any
incompatibilities that we missed.

In the following text, the term "nil" is used to mean an uninitialized
value. You can define nil as follows:

local nil;


New Command Options
-------------------

The following new options are available on the awkw and awkcw commands:

-ALTSEMANTICS   This option enables the alternate semantics of certain
                AWK features (see below).

-V6FUNCTIONS    This option enables the new functions (see below).

-XPATH          This option specifies a list of paths to search for library
                functions. When specified, it replaces TAWK's usual search
                using the AWKPATH environment variable and the -p option.


In addition, the buffer for the PATH environment variable has been increased
from 260 to 1024 bytes.


New Expression Operators
------------------------

TAWK version 6 includes two new expression operators.

unary * The unary * operator may be applied to a string or a table. It
results in the length of the string or the number of entries in the table.
It is faster than using the length() function.

binary @ The binary @ operator takes a string as its left operand and an
integer index as its right operand. It results in the character of the
string at the specified position. If the index is out of range, the result
is the null string. It is much faster than using the substr() function. The
@ operator has precedence just above exponentiation (^).
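
The behavior of the two operators can be modeled outside TAWK. The Python sketch below is only an illustration: the helper names star and at are made up, and it assumes that @ counts positions from 1, as substr() does in AWK. The release notes do not state the index base.

```python
# Illustrative model of the two TAWK v6 operators; this is not TAWK code.
# Assumption: positions are 1-based, as with AWK's substr().

def star(value):
    """Model of unary *: length of a string or number of table entries."""
    return len(value)

def at(string, index):
    """Model of binary @: the character at `index`, or "" if out of range."""
    if 1 <= index <= len(string):
        return string[index - 1]
    return ""

print(star("hello"))            # length of a string
print(star({"a": 1, "b": 2}))   # number of entries in a table (dict here)
print(repr(at("hello", 2)))     # character at position 2
print(repr(at("hello", 9)))     # out of range gives the null string
```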


Enhanced Functions
------------------

filesize()

The filesize() function has been enhanced to return a float if the specified
file's size is too large to represent as an integer. This allows you to use
this function with files larger than 2 GB.


ftell(), fseek()

The ftell() function has been enhanced to return a float if the current file
position is too large to represent as an integer. Likewise, fseek() has been
enhanced to accept either an integer or a float for the seek position. This
allows you to use these functions with files larger than 2 GB.
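
Why a float helps here: a 32-bit signed integer tops out at 2^31 - 1 bytes (just under 2 GB), while a double-precision float holds every whole number exactly up to 2^53. The Python sketch below assumes that TAWK integers are 32-bit signed and that its floats are IEEE 754 doubles; neither is stated outright in these notes, and tawk_number is a hypothetical name.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1   # assumed TAWK integer range

def tawk_number(n_bytes):
    """Return an int when the value fits in 32 bits, otherwise a float."""
    if INT32_MIN <= n_bytes <= INT32_MAX:
        return n_bytes
    return float(n_bytes)

size = 5 * 1024**3                  # a 5 GB file
value = tawk_number(size)
print(type(value).__name__, value)
print(int(value) == size)           # the byte count survives the conversion
# Past 2**53 a double can no longer tell neighbouring integers apart.
print(float(2**53) == float(2**53 + 1))
```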


timetab()

In addition to the entries described in the TAWK manual, the table returned
by this function also includes the following entries. This feature requires
the -ALTSEMANTICS option (see "Alternate Semantics" below).

"TIMEZONE"   Time zone relative to GMT, in seconds
"DST"        1 if daylight savings time, 0 if not


New Constants
-------------

TAWK now provides the following constants as built-in "variables."

INTMAX               Maximum positive integer
INTMIN               Minimum negative integer

FLOATMAX             Maximum positive floating-point number
FLOATMIN             Minimum negative float
FLOATPRECISION       Number of binary digits in the float mantissa
FLOATEPSILON         Difference between two consecutive floats

PI                   pi

CLOCK_TICKS_PER_SEC  Number of clock ticks per second in the value
                     returned by the processor_time() function; see below

MAXARGS              Maximum arguments that can be passed to a function


New Functions
-------------

TAWK version 6 provides the following new built-in functions. To prevent
name conflicts in existing programs, these functions are made available only
when the -V6FUNCTIONS option is specified on the awkw or awkcw commands.


abs(x)

The absolute value of the numeric argument x is returned.


bitclear(int, mask) [v6.4]

The integer is ANDed with the complement of the mask, so that any 1 bits in
the mask cause the corresponding bits in the integer to be cleared. The
resulting integer is returned. This function is simply a more efficient
method of performing:

and(int, not(mask))


bitextract(int, lsb, size) [v6.4]

A bit field in the integer is extracted and returned as an integer. The lsb
and size arguments specify the least significant bit and size of the field.


bitmerge(int, lsb, size, value) [v6.4]

A bit field in the integer is replaced by the value and the resulting
integer is returned. The lsb and size arguments specify the least
significant bit and size of the field.
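
A possible reading of bitextract() and bitmerge(), sketched in Python. Several things here are assumptions rather than documented TAWK behavior: bits are numbered from 0 at the least significant end, the extracted field is shifted down to the low-order bits, integers are 32 bits wide, and the inputs are non-negative.

```python
MASK32 = 0xFFFFFFFF   # assumed 32-bit integers

def bitextract(value, lsb, size):
    """Extract `size` bits starting at bit `lsb` (bit 0 = least significant).
    Assumption: the field is shifted down to the low-order bits."""
    field_mask = (1 << size) - 1
    return (value >> lsb) & field_mask

def bitmerge(value, lsb, size, field):
    """Replace the `size`-bit field at `lsb` with `field`."""
    field_mask = ((1 << size) - 1) << lsb
    cleared = value & ~field_mask
    return (cleared | ((field << lsb) & field_mask)) & MASK32

print(hex(bitextract(0xABCD, 4, 8)))       # bits 4..11 of 0xABCD
print(hex(bitmerge(0xABCD, 4, 8, 0x12)))   # the same field replaced by 0x12
```

Under these assumptions the two calls are inverses: extracting a field that was just merged gives back the merged value.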


bittest(int, mask)

The integer is ANDed with the mask. If the result is nonzero, true (1) is
returned; otherwise, false (0) is returned. This function is simply a more
efficient method of performing:

and(int, mask) != 0
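
The two mask helpers, bitclear() and bittest(), follow directly from the and()/not() expressions given for them. A small Python model, assuming non-negative integers (this mirrors the formulas in the notes, not TAWK's actual implementation):

```python
def bitclear(value, mask):
    """AND with the complement of the mask, i.e. and(int, not(mask))."""
    return value & ~mask

def bittest(value, mask):
    """1 if any masked bit is set, else 0, i.e. and(int, mask) != 0."""
    return 1 if value & mask else 0

flags = 0b1011
print(bin(bitclear(flags, 0b0010)))                     # bit 1 cleared
print(bittest(flags, 0b0100), bittest(flags, 0b1000))   # bit 2 unset, bit 3 set
```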


ceiling(x)
floor(x)
round(x)
truncate(x)

If x is an integer, it is returned unchanged. If it is a float, then it is
rounded in the following manner and the integer result is returned.

ceiling    toward positive infinity
floor      toward negative infinity
round      to the nearest integer; rounding of x.5 depends on C library
truncate   toward zero
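
The four directions differ only for non-integers, and most visibly for negative values. A Python illustration using the standard math module follows. The helper round_half_away is a hypothetical stand-in for round(): it rounds halves away from zero, which is what C's round() does, but the notes leave the x.5 case to the C library, so treat that column as an assumption.

```python
import math

def round_half_away(x):
    """One possible round(): halves go away from zero (an assumption)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

# Columns: x, ceiling, floor, round, truncate
for x in (2.5, -2.5, 2.4, -2.6):
    print(x, math.ceil(x), math.floor(x), round_half_away(x), math.trunc(x))
```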


global_function(name)

The name argument must be a string. If there is a global function by the
specified name, the function returns true (1); otherwise it returns false
(0).


inverse_lookup(tbl, val [, default])

The table tbl is scanned to find an entry whose value (not key) is equal to
val. If such an entry is found, the corresponding key is returned. If no
such value is found, the default is returned (or nil if no default is
specified). If the table has more than one entry whose value is val, it is
undefined which key is returned.
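
A value-to-key search like this is a linear scan. A Python sketch of the described behavior, with a dict standing in for a TAWK table and None standing in for nil (both are modeling choices, not TAWK internals):

```python
def inverse_lookup(table, value, default=None):
    """Return a key whose value equals `value`, else `default`."""
    for key, val in table.items():
        if val == value:
            return key
    return default

colors = {"red": 1, "green": 2, "blue": 3}
print(inverse_lookup(colors, 2))
print(inverse_lookup(colors, 9, "none"))
```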


irand(low, high) [v6.2]

A random integer between low and high (inclusive) is calculated and
returned. If low is greater than high, then the two arguments are swapped.
This function updates the same random seed as the rand() function.


is_array(x) [v6.3]
is_float(x)
is_integer(x)
is_regex(x)
is_string(x)

is_array() returns true (1) if its argument is an array, false (0)
otherwise. The remaining functions do the equivalent thing for their
datatypes. These functions are significantly faster than using typeof().


key_table(k1, k2, ...)

A new table is created whose keys are the values k1, k2, etc., and whose
corresponding values are all nil. The table is returned. This function
provides an efficient way of creating a set represented by a table; it is
faster than doing the equivalent thing with table().


log10(x)

The log base 10 of x is returned.


max(x, y, ...)

The maximum value of the specified numbers is returned. The numbers can be
integers or floats.


min(x, y, ...)

The minimum value of the specified numbers is returned. The numbers can be
integers or floats.


processor_time()

The amount of processor time consumed by the current process is returned as
an integer. This must be interpreted using the constant CLOCK_TICKS_PER_SEC.


scr_make_attr(foreground, background) [v6.2]

This function constructs a screen package attribute byte from the specified
foreground and background colors and returns it as a single-character
string. The byte is constructed as follows:

or(shiftl(background, 4), foreground)


select_arg(i, value1, value2, value3, ..., valuen) [v6.2]

The value argument selected by the i argument is returned. If i is outside the
range 1..n, then valuen is returned.


set_equal(s1, s2)

The two table arguments are treated as sets: The keys represent the strings
in the set and the values are ignored. If the sets contain exactly the same
keys, true (1) is returned. Otherwise, false (0) is returned.


set_subset(s1, s2)

The two table arguments are treated as sets. If the keys in s1 are a subset
of the keys in s2, true is returned. Otherwise, false is returned. The keys
in s1 are a subset of those in s2 if every key in s1 also appears in s2.


set_proper_subset(s1, s2)

The two table arguments are treated as sets. If the keys in s1 are a proper subset
of the keys in s2, true is returned. Otherwise, false is returned. The keys
in s1 are a proper subset of those in s2 if every key in s1 also appears in
s2, but the sets are not equal.
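
The three set predicates map directly onto ordinary set comparisons. A sketch in Python, where a dict stands in for a TAWK table and only its keys matter (an illustration of the definitions above, not TAWK's implementation):

```python
def set_equal(s1, s2):
    """1 if both tables have exactly the same keys; values are ignored."""
    return 1 if s1.keys() == s2.keys() else 0

def set_subset(s1, s2):
    """1 if every key of s1 also appears in s2."""
    return 1 if s1.keys() <= s2.keys() else 0

def set_proper_subset(s1, s2):
    """1 if s1 is a subset of s2 and the two key sets differ."""
    return 1 if s1.keys() < s2.keys() else 0

a = {"x": None, "y": None}
b = {"x": 1, "y": 2, "z": 3}
print(set_equal(a, a), set_subset(a, b),
      set_proper_subset(a, b), set_proper_subset(a, a))
```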


shiftextend(x, size)

The integer value x is shifted left logically and then right arithmetically
in order to extend the sign of the low-order field of the specified size.
The size must be in the range 0 .. 32. The resulting integer is returned.
This function is equivalent to:

shiftr(shiftl(x, 32 - size), 32 - size)
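
Sign extension is easier to see with numbers. The Python model below produces the same results as the shift pair above for sizes 1 through 32, assuming 32-bit integers. Size 0 is deliberately left out because the notes do not say what it yields.

```python
def shiftextend(x, size):
    """Sign-extend the low-order `size` bits of x (sizes 1..32 only)."""
    if not 1 <= size <= 32:
        raise ValueError("this sketch models sizes 1..32 only")
    field = x & ((1 << size) - 1)    # keep only the low-order field
    sign_bit = 1 << (size - 1)       # top bit of that field
    return field - (1 << size) if field & sign_bit else field

print(shiftextend(0xFF, 8))     # low byte 0xFF read as a signed 8-bit value
print(shiftextend(0x7F, 8))     # sign bit clear, value unchanged
print(shiftextend(0x1FF, 8))    # bits above the field are discarded
```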


shiftrl(x, n)

The integer value x is shifted right logically the number of bit positions
specified by n. Vacated high-order bits are replaced with 0s. The resulting
integer is returned. Compare to shiftr().
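
The difference from an arithmetic shift only shows up for negative values. A Python comparison, assuming 32-bit integers; shiftr_arith is a made-up name for the arithmetic variant, which the shiftextend() description attributes to shiftr().

```python
MASK32 = 0xFFFFFFFF   # assumed 32-bit integers

def shiftrl(x, n):
    """Logical right shift: vacated high-order bits become 0."""
    return (x & MASK32) >> n

def shiftr_arith(x, n):
    """Arithmetic right shift for comparison: the sign bit is copied in."""
    return x >> n     # Python's >> is arithmetic on negative ints

print(hex(shiftrl(-16, 2)))    # -16 is 0xFFFFFFF0 as a 32-bit pattern
print(shiftr_arith(-16, 2))
```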


signum(x)

The numeric argument x is tested and an integer is returned based on its
sign:

x < 0 return -1
x = 0 return 0
x > 0 return +1


split_float(x [, n_var])

The float x is split into its fraction f and exponent n. The fraction is
either 0.0 or 0.5 <= f < 1.0. The fraction is returned. The exponent is such
that f * 2^n is equal to x. If the n_var parameter is specified in the
call, then it must be a variable and is set to n.
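
This is the same decomposition that C's frexp() performs. Python exposes it as math.frexp, which makes a convenient way to see the fraction and exponent ranges described above (shown as an analogy; whether TAWK calls frexp() internally is not stated):

```python
import math

fraction, exponent = math.frexp(48.0)
print(fraction, exponent)                # 48.0 == 0.75 * 2**6
print(math.ldexp(fraction, exponent))    # f * 2**n gives back the input
print(math.frexp(0.0))                   # zero has fraction 0.0
```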


substre(string, start [, end])

This function is like substr() except that the third argument is the end
index of the substring, rather than the length. The end index is inclusive,
specifying the index of the last character in the desired substring. If the
end index is negative, it specifies the index counting from the end of the
string instead of the beginning (last character has index -1, next-to-last
character has index -2, etc.).
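
A Python model of the index arithmetic may help. It assumes positions are 1-based, as with substr(), and that omitting the end index means "through the last character"; neither detail is spelled out in the notes.

```python
def substre(string, start, end=None):
    """Substring from `start` through `end` inclusive (1-based positions).
    A negative `end` counts from the end of the string (-1 = last character).
    Assumption: omitting `end` means through the last character."""
    length = len(string)
    if end is None:
        end = length
    elif end < 0:
        end = length + 1 + end      # -1 -> length, -2 -> length - 1, ...
    start = max(start, 1)
    return string[start - 1:end] if end >= start else ""

print(substre("abcdefgh", 3, 5))    # cde
print(substre("abcdefgh", 3, -2))   # cdefg
print(substre("abcdefgh", 6))       # fgh
```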


tan(x)

The tangent of x is returned.


trim(str)

Leading and trailing whitespace is removed from the specified string and the
trimmed string is returned. Whitespace includes carriage return, linefeed,
formfeed, and space.
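
Note that the list above does not mention tab. Taking the list literally, a Python model of trim() would look like the sketch below; whether TAWK also strips tabs is not stated here, so that part is an assumption.

```python
TRIM_CHARS = "\r\n\f "   # carriage return, linefeed, formfeed, space

def trim(text):
    """Strip leading and trailing characters from the listed set only."""
    return text.strip(TRIM_CHARS)

print(repr(trim("  \r\n hello world \f\n")))
print(repr(trim("\tkept tab ")))    # tab is not in the listed set
```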


tuple_assign(tbl, var1, var2, ...)

The specified table is assumed to be a tuple (short vector) with keys 1, 2,
etc. Variable var1 is assigned the value of key 1, var2 is assigned the
value of key 2, etc. If there are more entries in the table than there are
variables, the extra entries are ignored. If there are not enough entries
for all the variables, the extra variables are assigned nil.

This function makes it reasonably efficient to return multiple values from a
function, as a tuple, and then assign those values to variables:

tuple_assign(func(x, y), sum, diff)

function func (x, y) {
...
return vector(x+y, x-y);
}


vector(v1, v2, ...)

A new table is created whose keys are the integers 1, 2, etc., and whose
corresponding values are specified by the arguments v1, v2, etc. The table
is returned. This is faster than doing the equivalent thing with table().


New Window Functions
~~~~~~~~~~~~~~~~~~~~

win_get_title() [v6.2]

This function returns the title of the DOS box window in which TAWK is
running.


win_set_title(title) [v6.2]

The title of the current DOS box window is set from the specified string.


win_get_show_state() [v6.2]

This function returns the "show state" of the current DOS box window as an
integer:

0 window is minimized
1 window is normalized
2 window is maximized
-1 an error occurred


win_set_show_state(state) [v6.2]

The show state of the current DOS box window is set from the specified
integer. The valid values are given above.


Alternate Semantics
-------------------

TAWK version 6 includes the capability to alter the semantics of certain
fundamental AWK features. For compatibility reasons, these alternate
semantics must be enabled with the -ALTSEMANTICS option on the awkw or awkcw
commands.

Fetching nonexistent keys. When a table fetch is performed using a key that
does not exist in the table, the key is added to the table with a value of
nil and the nil is returned. Under alternate semantics, nil is returned but
the key is not added to the table. Fetches of nonexistent keys do not alter
the table.
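
Python happens to offer both behaviors, which makes the difference easy to demonstrate. In this analogy (not TAWK code) a defaultdict plays the classic AWK table, dict.get() plays the alternate semantics, and None stands in for nil.

```python
from collections import defaultdict

# Classic AWK behavior: merely reading a missing key creates it.
classic = defaultdict(lambda: None)
_ = classic["missing"]
print("missing" in classic)      # the fetch added the key

# Alternate semantics: the fetch returns nil and leaves the table alone.
alternate = {}
_ = alternate.get("missing")
print("missing" in alternate)    # the table is still empty
```

The practical consequence is that membership tests and table sizes stay trustworthy after a read under the alternate semantics.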

Arithmetic on nil. Nil is converted to 0 when used as an operand of a
numeric operator. Under alternate semantics, a warning message is issued when
this conversion is performed.

Comparison to nil. The equal (==) and not equal (!=) operators can be used to
compare integers, floats, strings, and nil. Comparing these scalars to a
table results in a run-time error. Under alternate semantics, these
operators may be used to compare scalars to tables. In other words, any data
types may be compared. A table is never equal to another value except for
the same table. The primary point of this change is to allow any value to be
checked for nil without causing run-time errors.

Promotion of arguments to tables. AWK scans defined functions to determine
whether any parameters are treated as tables (e.g., indexed with a
subscript). If so, it arranges for the corresponding arguments to be
promoted to empty tables before the function is called. This "feature" was
included in the original specification of AWK because it supported only call
by value, not call by reference. Because TAWK supports call by reference,
this feature is unnecessary. Under alternate semantics, arguments are never
promoted to tables regardless of how the corresponding parameters are used
in the functions. Of course, this does not prevent you from assigning a
table to a parameter passed by reference, thus replacing the argument value
with the table.

typeof() external data. When typeof() is called on a command line argument
(ARGV) or on data read from an external source such as a file or socket, the
string is checked to see if it matches the syntax of an integer or float. If
so, that numeric type is returned instead of "string". Under alternate
semantics, this feature is disabled and "string" is always returned.

pcanagno...@gmail.com

Apr 11, 2014, 10:42:10 AM
On Thursday, April 10, 2014 11:53:09 PM UTC-4, Anton Treuenfels wrote:
> Did you pay any attention to V5 bugs? There aren't many, but I know of a
> few.

If you post a list, we can take a look at them.

> One has to do with arrays being sorted in the wrong order (there may be a
> message thread archived here about that - years and years old by now).
> Actually it affects both V4 and V5.

I don't know about this bug.

> V5 has a habit of trying to guess function argument types and then promoting
> them to that type no matter what they actually are. This conflicts with a
> tendency of mine to exploit V4's laxity on this point by passing different
> argument types to the same function, then figuring out inside the function
> what to do based on those types. Perhaps not so much a bug as a "feature".

The auto-promotion hack is disabled in V6 with the -ALTSEMANTICS option. That also changes other semantics, however.

> There is also V5's int() function, which is unable to convert the minimum
> signed integer value (0x80000000) from a double to an integer (V4 can).

I'll check to see if we fixed this. We added ceiling(), floor(), round(), and truncate(), which I'll also check.

> Not a bug, but it would be nice if the regex() function returned a null of
> some sort if it's given something that can't be converted to a regular
> expression, rather than aborting the whole program with an error message.

This is not backward compatible, so we'll have to figure out some clever solution.

~~ Paul

Kaz Kylheku

Apr 11, 2014, 11:07:04 AM
On 2014-04-11, pcanagno...@gmail.com <pcanagno...@gmail.com> wrote:
> You can email me for a copy of the V6 files.

This is not how source code is hosted nowadays.

Kaz Kylheku

Apr 11, 2014, 11:08:16 AM
>> Not a bug, but it would be nice if the regex() function returned a null of
>> some sort if it's given something that can't be converted to a regular
>> expression, rather than aborting the whole program with an error message.
>
> This is not backward compatible, so we'll have to figure out some clever solution.

How is it not backward compatible to provide a meaningful behavior instead of
crashing?

Anton Treuenfels

Apr 11, 2014, 1:08:48 PM

<pcanagno...@gmail.com> wrote in message
news:4c699778-2526-4995...@googlegroups.com...
> Release Notes for Thompson AWK Version 6.4 [10 March
> 2014]
> ------------------------------------------
> bitextract(int, lsb, size) [v6.4]

> A bit field in the integer is extracted and return as an integer. The lsb
> and size arguments specify the least significant bit and size of the
> field.

If you haven't already, you might want to add some examples here showing a
few results. It's not clear from this whether or not the extracted bits are
shifted to the least significant bit positions before being converted to
integers (or it might not even occur to someone who's never needed
extract-and-shift to wonder, i.e., someone not me).

- Anton Treuenfels

PS. BTW, many of the new functions seem to be concerned with faster ways to
do things that are already possible. Was there some specific
application-required need for that? Or was it just easy to do?

pcanagno...@gmail.com

Apr 11, 2014, 5:07:12 PM
I have no interest in being more formal than this.

~~ Paul

pcanagno...@gmail.com

Apr 11, 2014, 5:08:24 PM
Someone could have written a program that relies on the crash, since there was no other way to check for errors except to write a parser.

~~ Paul

pcanagno...@gmail.com

Apr 11, 2014, 5:12:26 PM
On Friday, April 11, 2014 1:08:48 PM UTC-4, Anton Treuenfels wrote:
> > bitextract(int, lsb, size) [v6.4]
>
> > A bit field in the integer is extracted and return as an integer. The lsb
> > and size arguments specify the least significant bit and size of the
> > field.
>
> If you haven't already, you might want to add some examples here showing a
> few results. It's not clear from this whether or not the extracted bits are
> shifted to the least significant bit positions before being converted to
> integers (or it might not even occur to someone who's never needed
> extract-and-shift to wonder, i.e., someone not me).

Good point. I've improved the description of the function.
>
> PS. BTW, many of the new functions seem to be concerned with faster ways to
> do things that are already possible. Was there some specific
> application-required need for that? Or was it just easy to do?

I used Hearsay (which compiles into TAWK) to write machine emulators, so speed is important. And, as you suggest, the bit-manipulation instructions were easy to implement.

~~ Paul

Kenny McCormack

Apr 11, 2014, 5:31:48 PM
In article <7719bcc8-e722-41a0...@googlegroups.com>,
I get what you are saying, but I will only point out that very few software
packages (I.e., I can't think of any) guarantee this level of backwards
compatibility.

Essentially, if you make the guarantee that anything that ever caused a
crash will always cause a crash, then you are preventing yourself from ever
fixing any bug that resulted in a crash.

--
Rich people pay Fox people to convince middle class people to blame poor people.

(John Fugelsang)

pcanagno...@gmail.com

Apr 11, 2014, 7:33:40 PM
On Friday, April 11, 2014 5:31:48 PM UTC-4, Kenny McCormack wrote:
> >Someone could have written a program that relies on the crash, since there was no
> >other way to check for errors except to write a parser.
>
> I get what you are saying, but I will only point out that very few software
> packages (I.e., I can't think of any) guarantee this level of backwards
> compatibility.

That's because no one cares about backward compatibility anymore. The problem here is that the change means your program forges ahead thinking that regex() did its job. I hope the next thing isn't deleting files based on the pattern or something dangerous like that.

One possibility is that we could add a variable REGEX_RETURNS_NIL that defaults to false but can be set to true to change the behavior. What do people think about this idea?

> Essentially, if you make the guarantee that anything that ever caused a
> crash will always cause a crash, then you are preventing yourself from ever
> fixing any bug that resulted in a crash.

I would certainly agree if the crash was a bug, but in this case it's really a feature: It's an error message, not a crash.

~~ Paul

Kaz Kylheku

Apr 11, 2014, 7:52:40 PM
> One possibility is that we could add a variable REGEX_RETURNS_NIL that
> defaults to false but can be set to true to change the behavior. What do
> people think about this idea?

With regard to the proliferation of global variables, anyone in their right
mind adopts the views of mainstream computer science.

One way out of the conundrum is to provide a new, similar function which has
the new behavior, rather than to shoehorn two functions into one, distinguished
on the current value of a global variable.

>> Essentially, if you make the guarantee that anything that ever caused a
>> crash will always cause a crash, then you are preventing yourself from ever
>> fixing any bug that resulted in a crash.
>
> I would certainly agree if the crash was a bug, but in this case it's really
> a feature: It's an error message, not a crash.

But "Segmentation fault (core dumped)" is also an error message.

Kenny McCormack

Apr 11, 2014, 8:00:01 PM
In article <44b13444-8424-459e...@googlegroups.com>,
<pcanagno...@gmail.com> wrote:
>On Friday, April 11, 2014 5:31:48 PM UTC-4, Kenny McCormack wrote:
>> >Someone could have written a program that relies on the crash, since there was no
>> >other way to check for errors except to write a parser.
>>
>> I get what you are saying, but I will only point out that very few software
>> packages (I.e., I can't think of any) guarantee this level of backwards
>> compatibility.
>
>That's because no one cares about backward compatibility anymore. The problem
>here is that the change means your program forges ahead thinking that regex() did
>its job. I hope the next thing isn't deleting files based on the pattern or
>something dangerous like that.

OK - I get it now. I didn't see where you were going before.

The point is that existing programs aren't set up to check the return value
of regex(), so they assume that it always succeeds (if the program is still
running). So, existing programs would have to be modified to check.

I agree with Janis - a new function is in order.

One common method used in Windows API functions is to append "Ex" to the
name of a function to create a new, enhanced version of the function. So,
we could have regexEx() (heh heh).

pcanagno...@gmail.com

Apr 11, 2014, 8:00:53 PM
On Friday, April 11, 2014 7:52:40 PM UTC-4, Kaz Kylheku wrote:
> With regard to the proliferation of global variables, anyone in their right
> mind adopts the views of mainstream computer science.

Are you this rude all the time?

> One way out of the conundrum is to provide a new, similar function which has
> the new behavior, rather than to shoehorn two functions into one, distinguished
> on the current value of a global variable.

Yes, I agree that is a better solution. What would you suggest as the name of the new function?

> > I would certainly agree if the crash was a bug, but in this case it's really
> > a feature: It's an error message, not a crash.
>
> But "Segmentation fault (core dumped)" is also an error message.

Indeed, but you know what I meant. In the case of regex(), it is an error message that can occur in normal processing, unless the caller first checks the syntax of the regular expression, which is difficult. For example, if the user of the program enters the RE, then it is an expected error, not a crash.
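
Where the awk in use can only die on a bad expression, one workaround is to try the user's expression in a throwaway child process first and look at the exit status. This is only a sketch: the function name regex_ok and the environment variable REGEX_OK_RE are made up for illustration, and it assumes the awk on the PATH exits non-zero when a dynamic regular expression fails to compile (gawk and mawk behave that way).

```sh
# regex_ok RE
# Succeeds (exit status 0) if awk can compile RE as a dynamic regex and
# fails otherwise. The name and the REGEX_OK_RE variable are hypothetical.
regex_ok() {
    # Pass RE through the environment so that awk does not apply its
    # -v escape-sequence processing to any backslashes in it.
    REGEX_OK_RE=$1 awk '
        BEGIN {
            # Matching forces the regex to be compiled; with a bad
            # expression awk stops here with a fatal error.
            if ("" ~ ENVIRON["REGEX_OK_RE"]) exit 0
            exit 0
        }' 2>/dev/null
}
```

A caller would write something like "if regex_ok "$pattern"; then ...; else complain; fi" before handing the pattern to the real program. The cost is one extra process per check, which is fine for an expression typed by a user but not for a tight loop.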

~~ Paul

pcanagno...@gmail.com

Apr 11, 2014, 8:10:00 PM
On Friday, April 11, 2014 8:00:01 PM UTC-4, Kenny McCormack wrote:
> The point is that existing programs aren't set up to check the return value
> of regex(), so they assume that it always succeeds (if the program is still
> running). So, existing programs would have to be modified to check.

Yes.

> One common method used in Windows API functions is to append "Ex" to the
> name of a function to create a new, enhanced version of the function. So,
> we could have regexEx() (heh heh).

Or how about regEx()? I love it when two names that differ only in case do different things! It's so Unix-file-systemy.

Maybe something as dumb as regex2()?

~~ Paul

Kaz Kylheku

Apr 11, 2014, 10:46:56 PM
On 2014-04-12, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> One common method used in Windows API functions is to append "Ex" to the
> name of a function to create a new, enhanced version of the function. So,
> we could have regexEx() (heh heh).

Or, just take advantage of case sensitivity: regEx().

:)

pcanagno...@gmail.com

Apr 12, 2014, 8:36:13 AM
Okay, the TAWK v6 team aims to please, so your request has been answered.

If you supply the "c" option in the second argument to regex(), the function returns nil if the expression is invalid, rather than terminating with an error.

This feature already exists in v5. How cool is that?

~~ Paul

Kenny McCormack

Apr 12, 2014, 8:57:44 AM
In article <5f8f786f-487e-4531...@googlegroups.com>,
Interesting! Indeed:

C:>awkw '{ print typeof(regex($0,"c")) }'
tawk input? foo
regular_expression
tawk input? (
uninitialized
tawk input?
C:>

Without the "c", the last operation generates a crash.

This is undocumented: I have a TAWK V5 manual in hand at the moment, and
the section on the regex() function only mentions the "i" and "s" flags.

As an aside, including a "flags" parameter to a function is a good design
choice. It makes it possible/easier to extend the function in future
(avoiding the need, as we were headed towards in this thread, for creating
a new "Ex" version of the function).
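
The same idea carries over to user-defined functions in any awk. As a rough sketch (the helper contains() and its "i" flag are invented for this example and have nothing to do with TAWK's regex() flags), an optional trailing flags string can be tested with index(), and old callers that omit it keep the old behaviour:

```sh
flag_demo() {
    awk '
    # contains(str, needle, flags): hypothetical helper.
    # Returns 1 if needle occurs in str, else 0.
    # An "i" anywhere in flags makes the test ignore case.
    function contains(str, needle, flags) {
        if (index(flags, "i")) {
            str = tolower(str)
            needle = tolower(needle)
        }
        return index(str, needle) > 0
    }
    BEGIN {
        print contains("Hello", "hello")        # flags omitted: exact case
        print contains("Hello", "hello", "i")   # new flag, same function
    }'
}
```

Adding another option later means teaching the function one more letter, not adding one more function name.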

As another aside, I think it is good that we are discussing TAWK again.
For a long time, some of the regs here disdained any such talk, and
insisted that only GAWK could be discussed (Yes, I'm looking at you, Ed).

Unfortunately, the bad/sad side of this is that I don't think anyone is
quite ready to take Paul up on his offer, because of the risk of
virus/malware. Yes, I know we live in a very trusting world, where people
routinely install God-knows-what on their smartphones/tablets which, having
access to their local networks, compromises their entire security scheme
(and so on - you guys know the drill by now), but still, I think we need
someone to figure out what this really is that Paul is offering us and to
try to verify that it really is what it claims to be.

I'd like that person to be me, but I'm not sure it can be (if you see what
I mean...)

--
(The Republican mind, in a nutshell)
You believe things that are incomprehensible, inconsistent, impossible
because we have commanded you to believe them; go then and do what is
unjust because we command it. Such people show admirable reasoning. Truly,
whoever is able to make you absurd is able to make you unjust. If the
God-given understanding of your mind does not resist a demand to believe
what is impossible, then you will not resist a demand to do wrong to that
God-given sense of justice in your heart. As soon as one faculty of your
soul has been dominated, other faculties will follow as well. And from this
derives all those crimes of religion which have overrun the world.

(Alternative condensed translation)
"Those who can make you believe absurdities, can make you commit atrocities".

pcanagno...@gmail.com

Apr 12, 2014, 9:10:48 AM
On Saturday, April 12, 2014 8:57:44 AM UTC-4, Kenny McCormack wrote:
> This is undocumented: I have a TAWK V5 manual in hand at the moment, and
> the section on the regex() function only mentions the "i" and "s" flags.

Indeed.

> As an aside, including a "flags" parameter to a function is a good design
> choice. It makes it possible/easier to extend the function in future
> (avoiding the need, as we were headed towards in this thread, for creating
> a new "Ex" version of the function).

I always thought the flags were silly and should simply be specified in the expression itself. That's how it works in Hearsay's expressions. I may have to eat those words.

> As another aside, I think it is good that we are discussing TAWK again.
> For a long time, some of the regs here disdained any such talk, and
> insisted that only GAWK could be discussed (Yes, I'm looking at you, Ed).

How odd.

> Unfortunately, the bad/sad side of this is that I don't think anyone is
> quite ready to take Paul up on his offer, because of the risk of
> virus/malware. Yes, I know we live in a very trusting world, where people
> routinely install God-knows-what on their smartphones/tablets which, having
> access to their local networks, compromises their entire security scheme
> (and so on - you guys know the drill by now), but still, I think we need
> someone to figure out what this really is that Paul is offering us and to
> try to verify that it really is what it claims to be.

We have given TAWK v6 to three other folks so far with no issues. Of course, I could be making that up, too. I can see if one of them wants to accept emails to vouch for it.

~~ Paul

Janis Papanagnou

Apr 12, 2014, 9:36:24 AM
On 12.04.2014 02:00, Kenny McCormack wrote:
>
> I agree with Janis - a new function is in order.

I didn't mention anything like that in this thread,
it was someone else.

(Usually I'd prefer unambiguous optional arguments,
or polymorphism by argument-type sort of functions,
but that's also depending on the actual language.)

Janis

Ed Morton

Apr 12, 2014, 11:18:15 AM
On 4/12/2014 8:10 AM, pcanagno...@gmail.com wrote:
> On Saturday, April 12, 2014 8:57:44 AM UTC-4, Kenny McCormack wrote:
<snip>
>> As another aside, I think it is good that we are discussing TAWK again.
>> For a long time, some of the regs here disdained any such talk, and
>> insisted that only GAWK could be discussed (Yes, I'm looking at you, Ed).
>
> How odd.

And how incorrect. We get people asking "how do i do X in awk" and Kenny used to
frequently reply "in tawk you do it by invoking superDoXfunc()" to which people
respond "great, where do i get tawk?" which was then followed by an interlude of
crickets chirping and tumbleweed blowing across the plains before we get back to
the discussion of how to do X in an awk that's actually available and supported.

In that context, yes, discussion of tawk has been disdained as a complete waste
of time, much like if you asked how to record video on a DVR and someone kept
telling you how they do it on their Beta-max and how much better that was than a
DVR because it did blah-de-blah-de-blah...

Discussion of the other available, supported awks (POSIX, nawk,
/usr/xpg4/bin/awk, and mawk to name a few) in addition to gawk has generally
been considered just fine.

Ed.

Kenny McCormack

Apr 12, 2014, 12:03:36 PM
In article <liblgi$68t$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...
>>> insisted that only GAWK could be discussed (Yes, I'm looking at you, Ed).
...
>And how incorrect.

Not really.

>Discussion of the other available, supported awks (POSIX, nawk,
>/usr/xpg4/bin/awk, and mawk to name a few) in addition to gawk has generally
>been considered just fine.

Yeah, but since all those other AWKs suck, the effect is essentially "GAWK
only".

I.e., there really are only two AWKs that work well enough to be
interesting - namely, GAWK & TAWK.

P.S. I think that for people who meet all of these criteria, Paul's new
version of TAWK would be a good get:

1) Don't already have TAWK
2) Use Windows primarily
3) Would like to try out a really fine AWK implementation, one that
leaves all the others in the dust.

I think, based on your posts, that you qualify.

Note that the first two do not apply to me. What I'd really like to see is
an open-sourced version that I could compile for Linux and Mac OSX. That
would get me interested.

--
"The God of the Old Testament is arguably the most unpleasant character
in all fiction: jealous and proud of it; a petty, unjust, unforgiving
control-freak; a vindictive, bloodthirsty ethnic cleanser; a misogynistic,
homophobic, racist, infanticidal, genocidal, filicidal, pestilential,
megalomaniacal, sadomasochistic, capriciously malevolent bully."

- Richard Dawkins, The God Delusion -

Anton Treuenfels

Apr 12, 2014, 12:39:59 PM

"Anton Treuenfels" <teamt...@yahoo.com> wrote in message
news:vuidnSEtkIGx-trO...@earthlink.com...
>
> Did you pay any attention to V5 bugs? There aren't many, but I know of a
> few.
>
> One has to do with arrays being sorted in the wrong order (there may be a
> message thread archived here about that - years and years old by now).
> Actually it affects both V4 and V5.

I dug out the original post I made about this using Google Groups:

=====================================

9/20/07
I tried the following in both TAWK 4 and 5:

BEGIN {

    local seg, pc
    local ndx
    local array

    for ( seg = 0; seg < 1024; seg += 16 )
        for ( pc = 0; pc < 1024; pc += 16 )
            array[ sprintf("%03X:%03X", seg, pc) ] = ".T."

    for ( ndx in array )
        print ndx > "index.txt"
}

Given TAWK's default sorting of arrays I expected a sequence like this:

000:000
000:010
000:020
...

But for both I got instead:

000:000
000:0A0
000:0B0
...
000:3F0
000:010
000:020

which doesn't seem right to me.

When I add the lines:

if ( "000:010" < "000:3F0" )
    print "TRUE"
else
    print "FALSE"

I get "TRUE".

I'm missing something but I don't know what it is. Any help?

==================================

BTW, Google noted a couple of posts in that thread written by Kenny and Ed
but did not recover them. I can't help but wonder if that is more an
unwillingness rather than an inability :-).

- Anton Treuenfels



pcanagno...@gmail.com

Apr 12, 2014, 3:36:47 PM
On Saturday, April 12, 2014 12:39:59 PM UTC-4, Anton Treuenfels wrote:
> Given TAWK's default sorting of arrays I expected a sequence like this:

Consider the portion of the key following the colon.

TAWK finds the longest integer sequences, sorts the integer portion, then the following alphabetic portion within each integer portion. So "000" sorts first. Then the "0xx" keys are sorted by xx. Then the "1xx" keys are sorted by xx. Then the "2xx" and "3xx" keys. Then the "nnn" keys.

It appears to be completely nuts. I think it has something to do with trying to sort floating-point numbers in some reasonable order. Of course the fundamental problem is that keys should be able to be any datatype, not just strings.

~~ Paul

pcanagno...@gmail.com

Apr 12, 2014, 4:54:57 PM
If you want to try TAWK v6 and would feel better with an existing user vouching for it, send me an email and I will give you his email address. Please don't make it public.

~~ Paul

Anton Treuenfels

Apr 12, 2014, 5:38:35 PM

<pcanagno...@gmail.com> wrote in message
news:f2b48505-4b38-4acd...@googlegroups.com...
=========================

Oh, I dunno. It's hard to imagine any reason regular expression keys would
be useful. Even if it was possible, what would be the sort order of such
things?

The problem as I see it is that my "mental model" of how keys work in AWK is
that they are all strings, with numeric values converted to strings in order
to make this work. Behind the scenes I imagine some sort of hashing going on
to make searches efficient, but that's an implementation detail.

But if keys are all strings, then it is disconcerting to have "000:010" be
less than "000:3F0" when compared straight up as strings but greater than
when compared as array indices. Part of the reason I used a colon in them in
the first place was to make sure they didn't look like numbers.
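
That mental model can be checked directly in any POSIX awk. A small sketch (the wrapper name subscript_demo is made up; this shows standard awk behaviour only and says nothing about TAWK's sorting):

```sh
subscript_demo() {
    awk 'BEGIN {
        a[1] = "x"                  # numeric subscript, stored as the string "1"
        if ("1" in a)
            print "a[1] and a[\"1\"] are the same element"
        a["01"] = "y"               # a different string, so a different element
        n = 0
        for (k in a)
            n++
        print n " elements"
    }'
}
```

The number 1 and the string "1" land on the same element, while "01" is a separate key even though it has the same numeric value.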

- Anton Treuenfels

Aharon Robbins

Apr 13, 2014, 2:09:41 AM
In article <6fc9384e-21cc-4979...@googlegroups.com>,
<pcanagno...@gmail.com> wrote:
>On Thursday, April 10, 2014 8:58:55 PM UTC-4, Kenny McCormack wrote:
>> Paul, I'm very interested in what you've got. I think you should just post
>> it somewhere and let us take a look at it. It should be open-source. I
>> assume it is...
>
>No, it's still copyrighted to Thompson Automation. I didn't feel it was
>my right to change the copyright. Let's see if I can include the V6
>release notes in the next post.

If you're able to reach Pat Thompson, why not just *ask* him for permission
to release it?

Thanks,

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon
D.N. Shimshon 9978500 ISRAEL

pcanagno...@gmail.com

Apr 13, 2014, 8:11:01 AM
On Saturday, April 12, 2014 5:38:35 PM UTC-4, Anton Treuenfels wrote:
> Oh, I dunno. It's hard to imagine any reason regular expression keys would
> be useful. Even if it was possible, what would be the sort order of such
> things?

I don't know about regular expressions, but certainly integers and floats.

> The problem as I see it is that my "mental model" of how keys work in AWK is
> that they are all strings, with numeric values converted to strings in order
> to make this work. Behind the scenes I imagine some sort of hashing going on
> to make searches efficient, but that's an implementation detail.

The problem is that the keys actually are only strings, with weird conversion and sorting rules for integers and floats. It would be better if the keys remained their original datatype. You would have to specify the sorting rules for regular expressions, tables, etc., but I think that's better than not allowing them at all. Who says I'm going to sort them anyway?

> But if keys are all strings, then it is disconcerting to have "000:010" be
> less than "000:3F0" when compared straight up as strings but greater than
> when compared as array indices. Part of the reason I used a colon in them in
> the first place was to make sure they didn't look like numbers.

If you want them treated as ASCII strings, without the bizarre alphanumeric sorting rules, then use SORTTYPE = 2. You'll get straightforward ASCII sorting.

Also, don't forget SORTTYPE = 0 when you don't care what order you visit the entries. That's the default sort type in Hearsay, my language that compiles into TAWK.
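
For anyone following along without TAWK: SORTTYPE does not exist in other awks, and POSIX leaves the order of "for (ndx in array)" unspecified, so the usual way to get plain byte-order output is to sort the keys explicitly. A minimal sketch, assuming a POSIX awk and sort(1) are available (the wrapper name sorted_keys and the smaller loop bounds are mine):

```sh
sorted_keys() {
    awk 'BEGIN {
        # Same key shape as the example upthread, with fewer keys.
        for (seg = 0; seg < 32; seg += 16)
            for (pc = 0; pc < 192; pc += 16)
                array[sprintf("%03X:%03X", seg, pc)] = ".T."

        # for-in order is unspecified, so let sort impose byte order.
        cmd = "LC_ALL=C sort"
        for (ndx in array)
            print ndx | cmd
        close(cmd)
    }'
}
```

With the C locale the digits sort before the upper-case letters, so 000:090 is followed by 000:0A0, which is the sequence the original post expected.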

~~ Paul

pcanagno...@gmail.com

Apr 13, 2014, 8:12:17 AM
> If you're able to reach Pat Thompson, why not just *ask* him for permission
> to release it?

I've not gotten any response to emails in a while.

~~ Paul

Aharon Robbins

Apr 13, 2014, 1:40:27 PM
In article <bc701ffc-817a-44a1...@googlegroups.com>,
Well, it'd be worth a try anyway. Maybe you have a snail mail address
for him?

pcanagno...@gmail.com

Apr 13, 2014, 1:45:33 PM
On Sunday, April 13, 2014 1:40:27 PM UTC-4, Aharon Robbins wrote:
> Well, it'd be worth a try anyway. Maybe you have a snail mail address
> for him?

I have his email address, but he doesn't respond.

~~ Paul

Kenny McCormack

Apr 13, 2014, 2:14:11 PM
In article <acec30f7-b139-46e4...@googlegroups.com>,
Maybe he is back in the wilds of Thailand...

This (what I am about to say) is certainly off-topic (in an AWK newsgroup),
and it is certainly in the "Just out of curiosity" category, but I'm
curious how and in what manner you got in touch with him at all? How was
the code transfer done?

Was it all electronic or was there a FTF meeting?

(I ask, incidentally, as someone with a bit more standing than "just
another random Usenetter"...)

--
People who say they'll vote for someone else because Obama couldn't solve
all of Bush's messes are like people complaining that he couldn't cure cancer,
so they'll go and vote for cancer.

pcanagno...@gmail.com

Apr 13, 2014, 4:44:17 PM
On Sunday, April 13, 2014 2:14:11 PM UTC-4, Kenny McCormack wrote:
> This (what I am about to say) is certainly off-topic (in an AWK newsgroup),
> and it is certainly in the "Just out of curiosity" category, but I'm
> curious how and in what manner you got in touch with him at all? How was
> the code transfer done?

I asked for the code a few times. At last, in July 2010, he kindly sent me a tar file.

~~ Paul

Kenny McCormack

Apr 13, 2014, 4:48:01 PM
In article <95fe7737-bcc9-44d6...@googlegroups.com>,
I see. Very interesting. Thanks.

--
Just for a change of pace, this sig is *not* an obscure reference to
comp.lang.c...

pcanagno...@gmail.com

Apr 14, 2014, 6:16:55 PM
I'm sorry that people don't feel comfortable trying TAWK v6 because of concerns about viruses and such. I can understand the concerns, but it's too bad.

If you want to send an email to the fellow who will vouch for it, send me an email and I'll give you his address.


~~ Paul

Kenny McCormack

Apr 14, 2014, 10:43:42 PM
In article <3f5f5a2e-5f7d-4f7e...@googlegroups.com>,
Well, I may have done a bad thing by mentioning it, but I'd imagine that at
least one other person out there was thinking the same thing. But,
anyway...

I hope you don't give up on it - but, speaking only for myself, I don't
really see any great benefit in it - since I already have a copy for
Windows - there's no real incentive for me. As I indicated, the main
benefit I see at this point is for people who don't currently have access
to TAWK to be able to get it. Yes, I have high hopes for you, Ed.

But, if we can get an open-source version for Linux, then I'd definitely be
back in the game. Note, BTW, that TAWK does also have a Solaris version,
so I'd imagine that a Linux port won't be that difficult.


--
Modern Conservative: Someone who can take time out from flashing her
wedding ring around and bragging about her honeymoon to complain that a
fellow secretary who keeps a picture of her girlfriend on her desk is
"flaunting her sexuality" and "forcing her lifestyle down our throats".

Anton Treuenfels

Apr 14, 2014, 10:45:47 PM

"Kenny McCormack" <gaz...@shell.xmission.com> wrote in message
news:libd88$u6t$1...@news.xmission.com...
> In article <5f8f786f-487e-4531...@googlegroups.com>,
> <pcanagno...@gmail.com> wrote:
>>Okay, the TAWK v6 team aims to please, so your request has been answered.
>>
>>If you supply the "c" option in the second argument to regex(), the
>>function
>>returns nil if the expression is invalid, rather than terminating with an
>>error.
>>
>>This feature already exists in v5. How cool is that?
>
> Interesting! Indeed:
>
> C:>awkw '{ print typeof(regex($0,"c")) }'
> tawk input? foo
> regular_expression
> tawk input? (
> uninitialized
> tawk input?
> C:>
>
> Without the "c", the last operation generates a crash.
>
> This is undocumented

I have also verified that the "c" option exists in V4 for MS-DOS:

BEGIN {

    print typeof( regex("[a-z]", "c") )
    print typeof( regex("[a-", "c") )
}

regular_expression
uninitialized

Similarly undocumented.

- Anton Treuenfels

pcanagno...@gmail.com

Apr 15, 2014, 7:33:01 AM
On Monday, April 14, 2014 10:43:42 PM UTC-4, Kenny McCormack wrote:
> Well, I may have done a bad thing by mentioning it, but I'd imagine that at
> least one other person out there was thinking the same thing. But,
> anyway...

No worries. It's a legitimate concern.

> I hope you don't give up on it - but, speaking only for myself, I don't
> really see any great benefit in it - since I already have a copy for
> Windows - there's no real incentive for me. As I indicated, the main
> benefit I see at this point is for people who don't currently have access
> to TAWK to be able to get it. Yes, I have high hopes for you, Ed.

I can't give a copy to someone who doesn't already have it. It's still Pat's product. The advantages for me are the new features and the performance improvement. I use it every day.


~~ Paul

Joep van Delft

Apr 15, 2014, 7:59:28 AM
On Tue, 15 Apr 2014 04:33:01 -0700 (PDT)
pcanagno...@gmail.com wrote:

> I can't give a copy to someone who doesn't already have it.

What is this? A new manifestation of the Knights Templar?

What we know:
- Some mythical and magical software supposedly exists under the
name `tawk`
- Random Guy claims he has The Sources, and improved the non-existing
software
- If you are in the exclusive circle of the initiated, you can
get it by private mail

I am not at all surprised that `heresay` compiles into TAWK. Publicly
available software is the cult I'll stick with.

Joep

Ed Morton

Apr 15, 2014, 8:43:19 AM
On 4/14/2014 9:43 PM, Kenny McCormack wrote:
> In article <3f5f5a2e-5f7d-4f7e...@googlegroups.com>,
> <pcanagno...@gmail.com> wrote:
>> I'm sorry that people don't feel comfortable trying TAWK v6 because of concerns
>> about viruses and such. I can understand the concerns, but it's too bad.
>>
>> If you want to send an email to the fellow who will vouch for it, send me an
>> email and I'll give you his address.
>
> Well, I may have done a bad thing by mentioning it, but I'd imagine that at
> least one other person out there was thinking the same thing. But,
> anyway...
>
> I hope you don't give up on it - but, speaking only for myself, I don't
> really see any great benefit in it - since I already have a copy for
> Windows - there's no real incentive for me. As I indicated, the main
> benefit I see at this point is for people who don't currently have access
> to TAWK to be able to get it. Yes, I have high hopes for you, Ed.

Not trying to be obstinate or negative but I honestly just can't imagine what I
or anyone else could possibly get out of trying tawk. gawk provides everything I
currently need, is well documented, supported, and generally available.

Sure there's a few additional things I'd like to have in it but no big deal if I
have to work around them and the only "missing" functionality I've needed
frequently enough to actively put some effort into providing for myself was the
ability to specify here documents in print statements so with my home-spun hack
you can simply do:

awk '/some regexp/ {
print <<!
/* Here is some random function
* being added to some code.
*/
int foo()
{
int c = bar();

printf("c is now: %d\n", c);

return 0;
}
!
}' file

to add multi-line text to files.

I wouldn't even bother to download a new gawk release - I just use whatever
release is on whatever box I happen to be using at the time and if there's a
feature that's only available on newer releases, I'd either ask the IT guy
du-jour to install a newer one or just code around it. Again, no big deal.

Janis Papanagnou

Apr 15, 2014, 8:54:32 AM
On 15.04.2014 14:43, Ed Morton wrote:
[...]
>
> Sure there's a few additional things I'd like to have in it but no big deal if
> I have to work around them and the only "missing" functionality I've needed
> frequently enough to actively put some effort into providing for myself was
> the ability to specify here documents in print statements so with my home-spun
> hack you can simply do:
>
> awk '/some regexp/ {
> print <<!
> /* Here is some random function
> * being added to some code.
> */
> int foo()
> {
> int c = bar();
>
> printf("c is now: %d\n", c);
>
> return 0;
> }
> !
> }' file
>
> to add multi-line text to files.

Are you doing some preprocessing or did you modify the gawk source to obtain
that feature? In the latter case I'd be interested whether you consider that
code to be clean enough to ask Arnold to incorporate it in gawk.

In the past I occasionally needed such a feature as well, and worked around
it by either writing multiple print/printf lines, or by solving it in shell.

(Seeing your code, and until such feature is available, I suppose next time
I'll use such a shell syntax and pre-process awk programs that are using it.)

Janis

>
> [...]

Kaz Kylheku

Apr 15, 2014, 9:31:05 AM
On 2014-04-15, pcanagno...@gmail.com <pcanagno...@gmail.com> wrote:
> On Monday, April 14, 2014 10:43:42 PM UTC-4, Kenny McCormack wrote:
>> Well, I may have done a bad thing by mentioning it, but I'd imagine that at
>> least one other person out there was thinking the same thing. But,
>> anyway...
>
> No worries. It's a legitimate concern.
>
>> I hope you don't give up on it - but, speaking only for myself, I don't
>> really see any great benefit in it - since I already have a copy for
>> Windows - there's no real incentive for me. As I indicated, the main
>> benefit I see at this point is for people who don't currently have access
>> to TAWK to be able to get it. Yes, I have high hopes for you, Ed.
>
> I can't give a copy to someone who doesn't already have it. It's still Pat's
> product.

W00T.

First it was, "I have a release ... anyone who wants it send me an e-mail (I
don't want to go to the 'formality' of putting it in a public repo, like normal
open-source people in the year 2014)'."

Then it was, "anyone who wants it, send me an e-mail---but please don't share with others".

Now the story is, "only existing users can actually have the update".

You know, if it's Pat's product, and Pat doesn't answer e-mails, do you even
have a right to be making all these announcements ...

Ed Morton

Apr 15, 2014, 10:00:31 AM
On 4/15/2014 7:54 AM, Janis Papanagnou wrote:
> On 15.04.2014 14:43, Ed Morton wrote:
> [...]
>>
>> Sure there's a few additional things I'd like to have in it but no big deal if
>> I have to work around them and the only "missing" functionality I've needed
>> frequently enough to actively put some effort into providing for myself was
>> the ability to specify here documents in print statements so with my home-spun
>> hack you can simply do:
>>
>> awk '/some regexp/ {
>> print <<!
>> /* Here is some random function
>> * being added to some code.
>> */
>> int foo()
>> {
>> int c = bar();
>>
>> printf("c is now: %d\n", c);
>>
>> return 0;
>> }
>> !
>> }' file
>>
>> to add multi-line text to files.
>
> Are you doing some preprocessing or did you modify the gawk source to obtain
> that feature?

Sadly it's a pre-processor. It parses the input with awk to create a new awk
script, essentially replacing:

print <<!
line 1
line 2
!

with:

print "line 1"
print "line 2"

and then executes that generated script on the original input file.

Beyond that, it just does some interesting stuff with the indenting and allowing
awk variables and quotes within the print statements.
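
Stripped of the indenting and quoting features, the core rewrite can be sketched in a few lines. This is a simplified illustration and not the full script: the function name expand_heredoc_prints is made up, it ignores the -, + and = variants, drops anything after the terminator on the "print <<" line, and assumes the block text already has its quotes and backslashes escaped.

```sh
# expand_heredoc_prints: filter an awk script on stdin, rewriting each
#   print <<TERM ... TERM
# block as one print statement per line. Hypothetical, simplified helper.
expand_heredoc_prints() {
    awk '
    !inBlock && /^[ \t]*print[ \t]*<</ {
        # Whatever follows "<<" (minus surrounding blanks) is the terminator.
        term = $0
        sub(/^[ \t]*print[ \t]*<<[ \t]*/, "", term)
        sub(/[ \t]+$/, "", term)
        inBlock = 1
        next
    }
    inBlock && ($0 "") == term {    # terminator line ends the block
        inBlock = 0
        next
    }
    inBlock {                       # body line becomes a print statement
        print "print \"" $0 "\""
        next
    }
    { print }                       # everything else passes through
    '
}
```

Something like "expand_heredoc_prints < prog.epawk > prog.awk" followed by "awk -f prog.awk file" would then run the generated script (both file names are placeholders).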

> In the latter case I'd be interested whether you consider that
> code to be clean enough to ask Arnold to incorporate it in gawk.
>
> In the past I occasionally needed such a feature as well, and worked around
> it by either writing multiple print/printf lines, or by solving it in shell.
>
> (Seeing your code, and until such feature is available, I suppose next time
> I'll use such a shell syntax and pre-process awk programs that are using it.)

Mine's attached below in case it's useful, with usage examples at the end. I
call it "epawk" for "Extended Print AWK".

Ed.

#!/usr/bin/bash
# The above must be the first line of this script as bash or zsh is
# required for the shell array reference syntax used in this script.

##########################################################
# Extended Print AWK
#
# Allows printing of pre-formatted blocks of multi-line text in awk scripts.
#
# Before invoking the tool, do the following IN ORDER:
#
# 1) Start each block of pre-formatted text in your script with
# print << TERMINATOR
# on its own line and end it with
# TERMINATOR
# on its own line. TERMINATOR can be any sequence of non-blank characters
# you like. Spaces are allowed around the symbols but are not required.
# If << is followed by -, e.g.:
# print <<- TERMINATOR
# then all leading tabs are removed from the block of pre-formatted
# text (just like shell here documents), if it's followed by + instead, e.g.:
# print <<+ TERMINATOR
# then however many leading tabs are common across all non-blank lines
# in the current pre-formatted block are removed.
# If << is followed by =, e.g.
# print <<= TERMINATOR
# then whatever leading white space (tabs or blanks) occurs before the
# "print" command will be removed from all non-blank lines in
# the current pre-formatted block.
# By default no leading spaces are removed. Anything you place after
# the TERMINATOR will be reproduced as-is after every line in the
# post-processed script, so this for example:
# print << HERE |"cat>&2"
# foo
# HERE
# would cause "foo" to be printed to stderr.
#
# 2) Within each block of pre-formatted text only:
# a) Put a backslash character before every backslash (\ -> \\).
# b) Put a backslash character before every double quote (" -> \").
# c) Enclose awk variables in double quotes without leading
# backslashes (awkVar -> "awkVar").
# d) Enclose awk record and field references ($0, $1, $2, etc.)
# in double quotes without leading backslashes ($1 -> "$1").
#
# 3) If the script is specified on the command line instead of via
# "-f script" then replace all single quote characters (') in or out
# of the pre-formatted blocks with their ANSI octal escape sequence (\047)
# or the sequence '\'' (tick backslash tick tick). This is normal and is
# required because command-line awk scripts cannot contain single quote
# characters as those delimit the script. Do not use hex \x27, see
# http://awk.freeshell.org/PrintASingleQuote.
#
# Then just use it like you would gawk with the small caveat that only
# "-W <option>", not "--<option>", is supported for long options so you
# can use "-W re-interval" but not "--re-interval" for example.
#
# To just see the post-processed script and not execute it, call this
# script with the "-X" option.
#
# See the bottom of this file for usage examples.
##########################################################

toolName="$(basename "$0")"

expand_prints() {

gawk '

!inBlock {
if ( match($0,/^[[:blank:]]*print[[:blank:]]*<</) ) {

# save any blanks before the print in case
# skipType "=" is used.
leadBlanks = $0
sub(/[^[:blank:]].*$/,"",leadBlanks)

$0 = substr($0,RSTART+RLENGTH)

if ( sub(/^[-]/,"") ) { skipType = "-" }
else if ( sub(/^[+]/,"") ) { skipType = "+" }
else if ( sub(/^[=]/,"") ) { skipType = "=" }
else { skipType = "" }

gsub(/(^[[:blank:]]+|[[:blank:]]+$)/,"")

if (/[[:blank:]]/) {
terminator = $0
sub(/[[:blank:]].*/,"",terminator)

postprint = $0
sub(/[^[:blank:]]+[[:blank:]]+/,"",postprint)
}
else {
terminator = $0
postprint = ""
}

startBlock()

next
}
}

inBlock {

stripped=$0
gsub(/(^[[:blank:]]+|[[:blank:]]+$)/,"",stripped)

if ( stripped"" == terminator"" ) {
endBlock()
}
else {
updBlock()
}

next
}

{ print }

function startBlock() { inBlock=1; numLines=0 }

function updBlock() { block[++numLines] = $0 }

function endBlock( i,numSkip,indent) {

if (skipType == "") {
# do not skip any leading tabs
indent = ""
}
else if (skipType == "-") {
# skip all leading tabs
indent = "[\t]+"
}
else if (skipType == "+") {

# skip however many leading tabs are common across
# all non-blank lines in the current pre-formatted block

for (i=1;i<=numLines;i++) {

if (block[i] ~ /[^[:blank:]]/) {

match(block[i],/^[\t]+/)

if ( (numSkip == "") || (numSkip > RLENGTH) ) {
numSkip = RLENGTH
}
}
}

for (i=1;i<=numSkip;i++) {
indent = indent "\t"
}
}
else if (skipType == "=") {
# skip whatever pattern of blanks existed
# before the "print" statement
indent = leadBlanks
}


for (i=1;i<=numLines;i++) {
# anchor the pattern so that only leading white space is removed
sub("^" indent,"",block[i])
print "print \"" block[i] "\"\t" postprint
}

inBlock=0
}

' "$@"

}

unset awkArgs
unset scriptFiles
expandOnly=0
while getopts "v:F:W:f:X" arg
do
case $arg in
f ) scriptFiles+=( "$OPTARG" ) ;;
[vFW] ) awkArgs+=( "-$arg" "$OPTARG" ) ;;
X ) expandOnly=1 ;;
* ) exit 1 ;;
esac
done
shift $(( OPTIND - 1 ))

if [ -z "${scriptFiles[*]}" -a "$#" -gt "0" ]
then
# The script cannot contain literal 's because in cases like this:
# 'BEGIN{ ...abc'def... }'
# the args parsed here (and later again by gawk) would be:
# $1 = BEGIN{ ...abc
# $2 = def... }
# Replace 's with \047 or '\'' if you need them:
# 'BEGIN{ ...abc\047def... }'
# 'BEGIN{ ...abc'\''def... }'
scriptText="$1"
shift
fi

# Remaining symbols in "$@" must be data file names and/or variable
# assignments that do not use the "-v name=value" syntax.

if [ -n "${scriptFiles[*]}" ]
then
if (( expandOnly == 1 ))
then
expand_prints "${scriptFiles[@]}"
else
gawk "${awkArgs[@]}" "$(expand_prints "${scriptFiles[@]}")" "$@"
fi

elif [ -n "$scriptText" ]
then
if (( expandOnly == 1 ))
then
printf '%s\n' "$scriptText" | expand_prints
else
gawk "${awkArgs[@]}" "$(printf '%s\n' "$scriptText" | expand_prints)" "$@"
fi
else
printf '%s: ERROR: no awk script specified.\n' "$toolName" >&2
exit 1
fi

exit

##########################################################
USAGE EXAMPLES:

$ cat data.txt
abc def"ghi
$
#######
$ cat script.awk
{
awkVar="bar"

print "----------------"

print << HERE
backslash: \\

quoted text: \"text\"

single quote as ANSI sequence: \047

literal single quote (ONLY works when script is in a file): '

awk variable: "awkVar"

awk field: "$2"
HERE

print "----------------"

print <<-!
backslash: \\

quoted text: \"text\"

single quote as ANSI sequence: \047

literal single quote (ONLY works when script is in a file): '

awk variable: "awkVar"

awk field: "$2"
!

print "----------------"

print <<+ whatever
backslash: \\

quoted text: \"text\"

single quote as ANSI sequence: \047

literal single quote (ONLY works when script is in a file): '

awk variable: "awkVar"

awk field: "$2"
whatever

print "----------------"
}

$ epawk -f script.awk data.txt
----------------
backslash: \

quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------
backslash: \

quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------
backslash: \

quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------

#######
$ epawk -F\" '{
print <<!
ANSI-tick-surrounded quote-separated field 2 (will work): \047"$2"\047
!
}' data.txt
ANSI-tick-surrounded quote-separated field 2 (will work): 'ghi'
$
#######
$ epawk -F\" '{
print <<!
Shell-escaped-tick-surrounded quote-separated field 2 (will work): '\''"$2"'\''
!
}' data.txt
Shell-escaped-tick-surrounded quote-separated field 2 (will work): 'ghi'
$
#######
$ epawk -F\" '{
print <<!
Literal-tick-surrounded quote-separated field 2 (will not work): '"$2"'
!
}' data.txt
Literal-tick-surrounded quote-separated field 2 (will not work):
$
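
The difference between the three cases above is plain shell quoting and can be reproduced without epawk. A minimal sketch, assuming any POSIX awk is on the PATH:

```shell
# Both commands print: it's
awk 'BEGIN{ print "it\047s" }'   # \047 is the octal escape for a single quote
awk 'BEGIN{ print "it'\''s" }'   # close the quoting, add an escaped tick, reopen
```

In the first form awk itself decodes the escape; in the second the shell assembles the tick into the script before awk ever sees it. A bare tick in the script would instead end the shell's quoted string at that point.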
#######
$ epawk -X 'BEGIN{
print <<!
foo
bar
!
}'
BEGIN{
print " foo"
print " bar"
}
$
#######
$ cat file
a
b
c
$ epawk '{
print <<+! |"cat>o2"
numLines="NR"
numFields="NF", $0="$0", $1="$1"
!
}' file
$ cat o2
numLines=1
numFields=1, $0=a, $1=a
numLines=2
numFields=1, $0=b, $1=b
numLines=3
numFields=1, $0=c, $1=c
$
#######
$ epawk 'BEGIN{

cmd = "sort"
print <<+! |& cmd
d
b
a
c
!
close(cmd, "to")

while ( (cmd |& getline line) > 0 ) {
print "got:", line
}
close(cmd)

}' file
got: a
got: b
got: c
got: d
$
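
For anyone curious how the "<<+" mode decides what to strip, the following is a stand-alone sketch of the same idea: find the smallest number of leading tabs among the non-blank lines, then remove at most that many from every line. It is an illustration only; it counts tabs with substr() rather than with match() and RLENGTH as the script does, and the function name "strip_common_tabs" is made up.

```shell
# Sketch only: remove the leading tabs common to all non-blank input lines.
strip_common_tabs() {
    awk '
        { line[NR] = $0 }                  # keep every line for the END pass
        /[^ \t]/ {                         # only non-blank lines set the minimum
            n = 0
            while (substr($0, n + 1, 1) == "\t") n++
            if (min == "" || n < min) min = n
        }
        END {
            for (i = 1; i <= NR; i++) {
                s = line[i]
                # strip at most min leading tabs, never other characters
                for (k = 0; k < min && substr(s, 1, 1) == "\t"; k++)
                    s = substr(s, 2)
                print s
            }
        }
    '
}

printf '\t\tfoo\n\t\t\tbar\n\n\t\tbaz\n' | strip_common_tabs
```

With that input the two shared tabs are removed from every line, so "bar" keeps one tab of relative indentation and the empty line is left alone.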


Kenny McCormack

unread,
Apr 15, 2014, 11:04:04 AM4/15/14
to
In article <53065b75-ebd6-4163...@googlegroups.com>,
<pcanagno...@gmail.com> wrote:
...
>I can't give a copy to someone who doesn't already have it. It's still Pat's
>product. The advantage for me are the new features and the performance
>improvement. I use it every day.

Oh, I see. That *is* a new wrinkle.

Just out of curiosity, how are you verifying?

FWIW, I don't doubt that it is better. Whether or not that is enough to
swing the needle for me is unclear ATM.

Also, to Kazzie, please do note that Paul has stated several times that it
*isn't* open-source, as much as you and I (and no doubt others) wish it
were. We (you & I & others) are just going to have to live with this for
the time being.

--
About that whole "sent His Son to die for us thing", I've never been able
to understand that one. It's not like Jesus isn't going back to Heaven
after his Earthly self dies, right? So, having him be executed, and
resurrect a few days later strikes me as being more akin to spending the
weekend at the non-custodial parent's house than "dying", doesn't it?

pcanagno...@gmail.com

unread,
Apr 15, 2014, 3:21:31 PM4/15/14
to
On Tuesday, April 15, 2014 8:43:19 AM UTC-4, Ed Morton wrote:
> Not trying to be obstinate or negative but I honestly just can't imagine what I
> or anyone else could possibly get out of trying tawk. gawk provides everything I
> currently need, is well documented, supported, and generally available.

I don't know why you'd bother with TAWK if you weren't already using it.

~~ Paul

pcanagno...@gmail.com

unread,
Apr 15, 2014, 3:26:29 PM4/15/14
to
On Tuesday, April 15, 2014 9:31:05 AM UTC-4, Kaz Kylheku wrote:
> First it was, "I have a release ... anyone who wants it send me an e-mail (I
> don't want to go to the 'formality' of putting it in a public repo, like normal
> open-source people in the year 2014)'."

I meant anyone who wants it that already has TAWK. It's not my right to put it in a public place.

> Then it was, "anyone who wants it, send me an e-mail---but please don't share with others".

I asked that people not give out the email of the user who will vouch for the software.

> Now the story is, "only existing users can actually have the update".

I thought that was pretty clear from the beginning, but if not, my apologies.

> You know, if it's Pat's product, and Pat doesn't answer e-mails, do you even
> have a right to be making all these announcements ...

He said it was okay to give it to existing users.

You seem awfully worked up about this.

~~ Paul

pcanagno...@gmail.com

unread,
Apr 15, 2014, 3:30:44 PM4/15/14
to
On Tuesday, April 15, 2014 11:04:04 AM UTC-4, Kenny McCormack wrote:
> Just out of curiosity, how are you verifying?

You mean testing the changes? I'm not doing it formally, since I have no test suite. Just writing small test programs and running many large TAWK and Hearsay (compiles to TAWK) applications.

> FWIW, I don't doubt that it is better. Whether or not that is enough to
> swing the needle for me is unclear ATM.

Fair enough.

> Also, to Kazzie, please do note that Paul has stated several times that it
> *isn't* open-source, as much as you and I (and no doubt others) wish it
> were. We (you & I & others) are just going to have to live with this for
> the time being.

I'm going to have a bumper sticker made: Live Open or Die.

~~ Paul

Kaz Kylheku

unread,
Apr 15, 2014, 4:13:29 PM4/15/14
to
> You seem awfully worked up about this.

I'm not worked up, but I could just have "plonked" this whole thread if it had
been clear from the beginning that it's a private affair, of sorts.

pcanagno...@gmail.com

unread,
Apr 15, 2014, 6:31:41 PM4/15/14
to
On Tuesday, April 15, 2014 4:13:29 PM UTC-4, Kaz Kylheku wrote:

> I'm not worked up, but I could just have "plonked" this whole thread if it had
> been clear from the beginning that it's a private affair, of sorts.

It's a family affair, man! ;-)

~~ Paul

Anton Treuenfels

unread,
Apr 15, 2014, 7:40:56 PM4/15/14
to

"Joep van Delft" <joepva...@xs4all.nl> wrote in message
news:20140415135...@xs4all.nl...
I'm inclined to believe him, if for no other reason than learning about the
existence of the undocumented regex() "c" flag.

Granted, it could have been discovered by an accidental incorrect flag
specification plus an incorrect regex specification plus observing a failure
to fail plus making the connection between the two arguments (a case of two
wrongs making a right, perhaps).

Or it could have been discovered by rigorously trying every conceivable
combination of arguments to regex(), just to see what would happen.

Seems easiest to imagine someone with access to the source discovering it by
reading that source.

- Anton Treuenfels

pcanagno...@gmail.com

unread,
Apr 16, 2014, 7:34:42 PM4/16/14
to
Pat has responded to my latest email. We are talking about how he can host TAWK v6 at his website.

~~ Paul

pcanagno...@gmail.com

unread,
Apr 16, 2014, 1:36:20 PM4/16/14
to
Good news! I have contacted Pat and he is willing to host TAWK v6 at his site. I'll post again when it is available.

~~ Paul

Kenny McCormack

unread,
Apr 16, 2014, 5:10:13 PM4/16/14
to
In article <93134b56-4e31-43ff...@googlegroups.com>,
Yey!

--
A Catholic woman tells her husband to buy Viagra.

A Jewish woman tells her husband to buy Pfizer.

hossam....@gmail.com

unread,
Jun 29, 2017, 6:38:06 AM6/29/17
to
On Saturday, April 12, 2014 at 10:54:57 PM UTC+2, pcanagno...@gmail.com wrote:
> If you want to try TAWK v6 and would feel better with an existing user vouching for it, send me an email and I will give you his email address. Please don't make it public.
>
> ~~ Paul

Hi Paul, I want to try TAWK v6, can you kindly send it to me?

Markus Gnam

unread,
Jun 30, 2017, 5:03:02 PM6/30/17
to
It's probably best to email Paul privately, using the private reply option in
Google Groups (if you haven't done so already).