
one-liner for auto-detection of awk variant


Kpop 2GM

Feb 22, 2022, 2:42:46 PM
hey Janis,

I managed to come up with this intentionally crafted line. It produces 7 different responses across awk variants, 4 of them from gawk alone, so it can help auto-detect which awk variant or invocation flags were used and let your program or function tailor its behavior if necessary (e.g. in gawk -P POSIX mode, length(array) cannot be computed directly, so a workaround is needed to get a count). It's not perfect, since several gawk invocation flags still yield "+nan", but it shouldn't be too hard to run a secondary test from that point on.
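On the gawk -P limitation mentioned in that parenthetical: the usual portable workaround is to count elements with a for-in loop instead of length(array). A minimal sketch, assuming a POSIX sh and an awk on PATH; count_unique_words and alen are hypothetical names, not part of the detector:

```sh
# Hypothetical example: count distinct words without length(array),
# which gawk rejects in --posix (-P) mode.
# Usage: count_unique_words [awk command and flags...]
count_unique_words() {
    [ $# -gt 0 ] || set -- awk        # default to whatever awk is on PATH
    "$@" '
    function alen(arr,    k, n) {
        n = 0                         # extra params are locals; start at zero
        for (k in arr) n++            # one increment per element
        return n
    }
    { for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END { print alen(seen) }'
}

printf 'a b c\nb c d\n' | count_unique_words    # prints 4
```

Passing `gawk -P` as the arguments (e.g. `count_unique_words gawk -P`) runs the same program in POSIX mode, where the for-in loop still works.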

% ${awk_variant} 'BEGIN { print \
\
-log((log(-0)*log(-0))^-log(0^length("\0")))/(-"0xABCD")^-(1-("<"<"\x3C")) }'

gawk511 -e : +nan
gawk -c : +nan
gawk -t : +nan
--------------------------------------
gawk -M : -nan
--------------------------------------
gawk -n : +inf
gawk -P : -inf
--------------------------------------
mawk1.3.4 : inf
mawk1.9.9.6 : nan
--------------------------------------
nawk20200816 : 0
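To reproduce a list like this on your own machine, one option is to store the one-liner once and run it under each installed variant. A rough sketch, assuming POSIX sh; run_probe and probe are hypothetical names, the flag list is only a subset of the invocations above, and actual output will depend on the versions you have installed:

```sh
# The one-liner from the post (continuation backslashes collapsed),
# stored once so every variant runs identical program text.
probe='BEGIN { print -log((log(-0)*log(-0))^-log(0^length("\0")))/(-"0xABCD")^-(1-("<"<"\x3C")) }'

run_probe() {
    for cmd in "gawk" "gawk -c" "gawk -M" "gawk -n" "gawk -P" "mawk" "nawk" "awk"; do
        set -- $cmd                                   # split "gawk -c" into command + flag
        command -v "$1" >/dev/null 2>&1 || continue   # skip binaries that are not installed
        printf '%-8s : %s\n' "$cmd" "$("$@" "$probe" 2>&1)"
    done
}

run_probe
```

Errors and warnings (for example gawk built without MPFR support for -M) are captured into the output line rather than aborting the loop.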

Hope someone finds this useful.
The 4Chan Teller

Kpop 2GM

Feb 23, 2022, 5:22:35 AM
Finally succeeded in creating a grand unified detector function.

It can now detect 14 unique combinations of awk variants and invocation flags, including 11 different ways of calling gawk, and returns a 2-digit code that maps to the data dictionary listed beneath it.

The detection relies entirely on the inherent behavior of each awk/flag combination, rather than on external variables or system settings that could be spoofed via manual override. It also has built-in cleansing of its local variables, in case values were passed for them during the function call.
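Background on why that cleansing matters: awk has no local-variable declaration, so locals are written as extra function parameters, and a caller can still pass values into them. A minimal sketch of the general idiom (not the detector's own code), with a hypothetical function sum_to that assigns every local before reading it:

```sh
demo_locals() {
    awk '
    function sum_to(n,    i, total) {
        total = 0                  # cleanse: discard any caller-supplied value
        for (i = 1; i <= n; i++)   # i is reassigned by the loop itself
            total += i
        return total
    }
    BEGIN {
        print sum_to(4)            # normal call: 10
        print sum_to(4, 7, 100)    # junk passed for the locals: still 10
    }'
}

demo_locals
```

Without the `total = 0` line, the second call would start from 100 and print 110.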

Unicode detection is based on the UTF-8 encoding of U+06D2, a code point that has been part of the Unicode standard since version 1.1 (1993). Besides the obvious locales such as ASCII, C, and POSIX, it works under a UTF-8 locale for any language.
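For reference, U+06D2 encodes in UTF-8 as the two bytes 0xDB 0x92 (octal \333\222). A simplified probe, which is not the detector's actual logic: an awk that counts characters under a UTF-8 locale should report length 1 for that string, while a byte-oriented awk, gawk -b, or a C locale reports 2. utf8_probe is a hypothetical helper, and trying it with a UTF-8 locale assumes one (e.g. en_US.UTF-8) is installed:

```sh
# Usage: utf8_probe LOCALE awk-command [flags...]
utf8_probe() {
    loc=$1; shift                  # first argument: locale; the rest: awk command
    LC_ALL=$loc "$@" 'BEGIN {
        n = length("\333\222")     # U+06D2 as raw UTF-8 bytes
        print (n == 1 ? "utf8-chars" : "bytes")
    }'
}

utf8_probe C awk                   # typically prints bytes
```

With gawk, `utf8_probe en_US.UTF-8 gawk` would be expected to report utf8-chars, while `utf8_probe en_US.UTF-8 gawk -b` forces byte semantics, which fits the -b invocations getting their own codes in the data dictionary.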

Other encodings, such as UCS-2, UTF-7, UTF-16/32 (either endianness), EBCDIC, ARMSCII, ISCII, ISO-8859-anything, CP-anything, or legacy multi-byte encodings like Big5, GB-anything, EUC-JP/KR/CN, and Shift-JIS, would most likely lead to erroneous detection, since the function was designed only with UTF-8 in mind.
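Since non-UTF-8 encodings can throw off the result, a calling script might check the locale's character map before trusting the Unicode part of the code. A minimal sketch, assuming the POSIX `locale` utility is installed (its `charmap` keyword prints the codeset name); charmap_is_utf8 is a hypothetical helper, and the accepted spellings are an assumption because implementations report the name differently:

```sh
# Usage: charmap_is_utf8 [LOCALE]   (no argument: check the current locale)
charmap_is_utf8() {
    if [ $# -gt 0 ]; then
        cm=$(LC_ALL=$1 locale charmap 2>/dev/null)   # query a specific locale
    else
        cm=$(locale charmap 2>/dev/null)             # query the current locale
    fi
    case $cm in
        UTF-8 | utf-8 | UTF8 | utf8) return 0 ;;
        *) return 1 ;;                               # other codeset, or no locale(1)
    esac
}

if ! charmap_is_utf8; then
    echo 'warning: non-UTF-8 locale, Unicode detection may be wrong' >&2
fi
```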

Feel free to use it as is, or modify it in any shape or form to suit your needs. Although UTF-8 capability detection is included and designed to comply with the published specs, the code does not contain any copyrighted material or intellectual property belonging to the Unicode Consortium.

The 4Chan Teller

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

command is ::: $awk0 'BEGIN {
print awk_var_tester()

} function awk_var_tester(_, __, ___, ____, _____, ______) {
return \
(substr(______ = "\\%" (___ = _ < "") index((__ *= __ ^= __ = ___ += ++___) ___, ___) sprintf("%c_%%c", __ * (___ ^ ___++ - +-___--) - _ ^ (! __)), __ * __, ! __) substr((______ = ((((_ = (substr(___, __, (sub("..$", (______) "&", ______)) - (sub("_", "", ______)))) substr(______ = sprintf(______, (____ = (_____ = ____ = substr(__, ___) ^ ++___ * __ ^ --___) + (___ + --__) * (__++ - ___) - ___) % (__ ^ ___), (_____ += (__ - ___ ^ ___) ^ ___ + ___) % (__ ^ ___), ____, _____), index(______, "_") + (___ < __)) substr(sub("_.+$", "", ______), __, ! __)) ~ substr(______, __ ~ __, ___ ^ ___)) + (_ ~ ("[^" (______) "]")) * ___ + (_ ~ ("[^" (_) "]")) * (___ ^ ___) + (sprintf("%i", __ ^ __ ^ ___ ^ (-! ! __)) % ___) * (__ + __ + __ + __) + (sprintf("%c", -___) == sprintf("%c", ! ___)) * (__ + __) + ("<" < "\x3C") * (__ * __ * ___) + (length(_) % ___) * __ + ((__ ^ __ - ! ! __) % ___) * (__ / ___) + ("0x" (___) (! _ ! _)) + 0x101) % (int((_ = (! ! _ ! _) ^ ++___) / --___) -+-++__)) % (_ / ___)) + --__ * (______ < -___ + __) + (______ + ___ == __ ? -___ : ______ ~ ((__ - ___) "$") ? __ * (___ + ___) - ___ : ! ___) + _, ++___)) } '

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gawk -e     06    gawk -ne    01    gawk -nMbe  93
gawk -be    90    gawk -nbe   85    mawk1       29
gawk -ce    49    gawk -Me    76    mawk2       21
gawk -cbe   33    gawk -Mbe   98    nawk        12
gawk -Pe    39    gawk -nMe   09
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
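The data dictionary above can be turned into a lookup so a script can branch on the detector's result. A minimal sketch with a hypothetical helper, awk_variant_name; the codes and labels are copied from the table, and the single-digit alternatives (6, 1, 9) are an assumption, added in case a code is ever printed without its leading zero:

```sh
# Translate a 2-digit code from awk_var_tester() into the matching entry.
awk_variant_name() {
    case $1 in
        06|6) echo 'gawk -e' ;;
        90)   echo 'gawk -be' ;;
        49)   echo 'gawk -ce' ;;
        33)   echo 'gawk -cbe' ;;
        39)   echo 'gawk -Pe' ;;
        01|1) echo 'gawk -ne' ;;
        85)   echo 'gawk -nbe' ;;
        76)   echo 'gawk -Me' ;;
        98)   echo 'gawk -Mbe' ;;
        09|9) echo 'gawk -nMe' ;;
        93)   echo 'gawk -nMbe' ;;
        29)   echo 'mawk1' ;;
        21)   echo 'mawk2' ;;
        12)   echo 'nawk' ;;
        *)    echo 'unknown'; return 1 ;;   # not in the data dictionary
    esac
}

awk_variant_name 76    # prints gawk -Me
```

In a real script the argument would be the detector's output, e.g. `awk_variant_name "$($awk0 "$prog")"`, where prog is a hypothetical shell variable holding the BEGIN program shown above.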

Janis Papanagnou

Feb 23, 2022, 8:07:17 AM
Amazing! :-)

On 23.02.2022 11:22, Kpop 2GM wrote:
> finally succeeded in creating a grand unified detector function.
>
> Now it could detect 14 unique combinations of awk-variants and
> invocation flags, including 11 different ways of calling gawk, and
> return a 2-digit value that maps to the data dictionary listed
> beneath it.
>
> The detection methodology is entirely based on inherent behavior of
> each awk combo, instead of relying on external variables or system
> settings that could be tricked via manual override. It also has
> built-in cleansing for the local variables in case values were passed
> during the function call.
>
> [...]