
asort function


Laurent MANCHON

Aug 11, 2021, 10:20:11 AM
Hi all,

Does the asort() function exist in TAWK, or only in gawk?

thx

Ed Morton

Aug 11, 2021, 10:30:24 AM
I don't know the answer to that but I'm curious about why you're asking.
If you have tawk you can just try calling asort() and don't need to ask
so I assume that's not the case. Given that - are you considering trying
to get a copy of tawk if it has asort() instead of just using gawk? If
so - why?

Ed.

Kenny McCormack

Aug 11, 2021, 10:32:09 AM
In article <b0a1d496-2fb2-4e79...@googlegroups.com>,
Laurent MANCHON <manch...@gmail.com> wrote:
>Hi all,
>
>Does the asort() function exist in TAWK, or only in gawk?

Not in TAWK, but not really needed (see below).

However, I'm wondering what the point of the question is. Either you have
access to TAWK - in which case you would know and/or could quickly
determine the answer to the question - or you don't - in which case, the
question is moot. So, which is it?

Anyway, I don't really see the point of asort(), once you have regular
array sorting (see footnote: *) - which TAWK has always had, and GAWK now
has. GAWK's implementation of array sorting is actually quite nice - in
fact, more elaborate and powerful than TAWK's. My sense of the GAWK
development effort is that they put asort()/asorti() in at a point in the
development when they realized that some sort of array sorting was needed,
but they weren't quite ready to do full/regular array sorting.

So, I think asort()/asorti() is now retained mostly for historical reasons.

(*) By "regular" (or "full") array sorting, I mean via the "for (i in A) ..."
syntax.

--
Never, ever, ever forget that "Both sides do it" is strictly a Republican meme.

It is always the side that sucks that insists on saying "Well, you suck, too".

Laurent MANCHON

Aug 11, 2021, 12:46:06 PM
I expressed myself badly. I wanted to know whether TAWK has a function similar to gawk's asort().
Since I have to calculate medians, I have to sort the values of the array in increasing order and then compute the medians from them.
Hence the need for the asort function. I know I can write my own sort function, such as:

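# Count the elements of A (n and val are local variables).
function alength(A,    n, val) {
    n = 0
    for (val in A) n++
    return n
}

# Insertion sort of A[1..n] in increasing order; returns n.
function masort(A,    hold, i, j, n) {
    n = alength(A)
    for (i = 2; i <= n; i++) {
        hold = A[j = i]
        # Stop at j == 1 so that A[0] is never referenced.
        while (j > 1 && A[j-1] > hold) {
            j--
            A[j+1] = A[j]
        }
        A[j] = hold
    }
    return n
}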

But I think a built-in function is faster than a user-defined one, don't you?
For example, in TAWK, if I want to compute the length of an array, I think *_arr is faster than alength(_arr).
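
A minimal sketch of the "sort, then take the middle" median described above, assuming the values arrive one per line and are all numeric. The names median_sorted and isort are made up for this illustration; only POSIX awk features are used, so it should not depend on gawk or TAWK.

```sh
# Print the median of the numbers given as arguments.
# Hypothetical helper for illustration; uses only POSIX awk features.
median_sorted() {
    printf '%s\n' "$@" | awk '
    # Insertion sort of A[1..n] in increasing order, with a bounds check on j.
    function isort(A, n,    i, j, hold) {
        for (i = 2; i <= n; i++) {
            hold = A[i]
            for (j = i; j > 1 && A[j-1] > hold; j--)
                A[j] = A[j-1]
            A[j] = hold
        }
    }
    { A[++n] = $1 + 0 }              # force a numeric value
    END {
        if (n == 0) exit 1           # no input, no median
        isort(A, n)
        if (n % 2) m = A[(n + 1) / 2]                  # odd count: middle element
        else       m = (A[n / 2] + A[n / 2 + 1]) / 2   # even count: mean of the two middle elements
        print m
    }'
}
```

For example, `median_sorted 7 1 5 3 9` prints 5 and `median_sorted 4 1 3 2` prints 2.5.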

Ed Morton

Aug 11, 2021, 1:02:23 PM
Sure but why bother trying to find the equivalent of a gawk function in
tawk (available by word of mouth from individuals with a copy of it,
with people mailing photocopies of documentation to each other and a
small user base) instead of just using gawk (widely/easily available,
thoroughly documented online and in books, with a massive user base)?

Ed.

Kenny McCormack

Aug 11, 2021, 1:31:26 PM
In article <6a38ebc2-b590-4630...@googlegroups.com>,
Laurent MANCHON <manch...@gmail.com> wrote:
>I expressed myself badly. I wanted to know whether TAWK has a function similar to
>gawk's asort().

OK, now I get it. The keyword is "similar". Without that word, it sounded
like you wanted to know if there was literally a function called "asort" in
TAWK. We have, correctly, asserted that it would have been easier to just
test it than to post to Usenet.

But you want to know how to sort arrays in TAWK. That's the real point
that you are getting at.

The first answer I can give is: No, there is no library function to do it,
such as asort() in GAWK. But as I argue, there doesn't need to be, and
asort() in GAWK is basically an anachronism at this point in time.

To sort arrays in TAWK (and in current/modern versions of GAWK as well),
you build up your array with the keys (indices) being in the order you want
them, then you use: for (i in A) ...
to iterate through the array in the desired order.

I hope this answers your question.

The details are a little different between TAWK and GAWK, but the
underlying idea is pretty much the same.
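
A sketch of the gawk side of this, assuming gawk 4.0 or later: there the order in which "for (i in A)" visits the array can be chosen by setting PROCINFO["sorted_in"], and "@val_num_asc" visits elements in ascending numeric order of their values. The function name median_sorted_in is made up for the illustration, and TAWK's own mechanism is not shown because the thread does not spell it out.

```sh
# Print the median of the numbers given as arguments.
# Hypothetical helper; requires gawk 4.0+ for PROCINFO["sorted_in"].
median_sorted_in() {
    printf '%s\n' "$@" | gawk '
    { val[NR] = $1 + 0 }
    END {
        if (NR == 0) exit 1
        # Visit the elements in ascending numeric order of value.
        PROCINFO["sorted_in"] = "@val_num_asc"
        for (k in val) sorted[++n] = val[k]
        if (n % 2) m = sorted[(n + 1) / 2]
        else       m = (sorted[n / 2] + sorted[n / 2 + 1]) / 2
        print m
    }'
}
```

The loop copies the values into a second array indexed 1..n only so that the middle element can be picked by position afterwards.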

--
The difference between communism and capitalism?
In capitalism, man exploits man. In communism, it's the other way around.

- Daniel Bell, The End of Ideology (1960) -

Laurent MANCHON

Aug 11, 2021, 3:16:06 PM
That doesn't really answer it.
Try to calculate the median of the elements of an array and you will understand what I am asking.
You need to sort not the indices of the array but its elements.


Ben Bacarisse

Aug 11, 2021, 5:47:21 PM
Laurent MANCHON <manch...@gmail.com> writes:

> Since I have to calculate medians, I have to sort the values of the array
> in increasing order and then compute the medians from them.

Technically no. There is an O(N), non-sorting median algorithm, but
it's a bit messy and sorting is so well-understood you are probably
better off doing what you are doing.
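
The post does not name the algorithm, so the following is only a sketch of one well-known non-sorting approach, quickselect, which finds the k-th smallest element in linear time on average (quadratic in the worst case). The names median_select and select_kth are made up for the illustration, the pivot is simply the middle element, and only POSIX awk features are used.

```sh
# Print the median of the numbers given as arguments without fully sorting them.
# Hypothetical helper for illustration; uses only POSIX awk features.
median_select() {
    printf '%s\n' "$@" | awk '
    # Return the k-th smallest (1-based) element of A[1..n].
    # Reorders A in place; only the part that contains index k is narrowed down.
    function select_kth(A, n, k,    lo, hi, i, j, pivot, tmp) {
        lo = 1; hi = n
        while (lo < hi) {
            pivot = A[int((lo + hi) / 2)]
            i = lo; j = hi
            while (i <= j) {                     # Hoare-style partition around pivot
                while (A[i] < pivot) i++
                while (A[j] > pivot) j--
                if (i <= j) {
                    tmp = A[i]; A[i] = A[j]; A[j] = tmp
                    i++; j--
                }
            }
            if (k <= j)      hi = j              # k-th element is in the left part
            else if (k >= i) lo = i              # k-th element is in the right part
            else return A[k]                     # k sits between the parts: in place
        }
        return A[k]
    }
    { A[++n] = $1 + 0 }                          # force a numeric value
    END {
        if (n == 0) exit 1
        if (n % 2) m = select_kth(A, n, (n + 1) / 2)
        else {
            a = select_kth(A, n, n / 2)
            b = select_kth(A, n, n / 2 + 1)
            m = (a + b) / 2
        }
        print m
    }'
}
```

For example, `median_select 7 1 5 3 9` prints 5. A guaranteed worst-case linear method (median of medians) exists as well but is considerably longer.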

--
Ben.

J Naman

Aug 11, 2021, 11:19:04 PM
>On Wednesday, 11 August 2021 at 13:02:23 UTC-4, Ed Morton wrote:
> Sure but why bother trying to find the equivalent of a gawk function in
> tawk (available by word of mouth from individuals with a copy of it,
> with people mailing photocopies of documentation to each other and a
> small user base) instead of just using gawk (widely/easily available,
> thoroughly documented online and in books, with a massive user base)?
>
> Ed.

If anyone who REGULARLY CONTRIBUTES to the Gawk community (lang, help) would like my original, NOT A COPY, TAWK Compiler Ver 5.01c, I'll be happy to donate it. I have the original manual, spiral bound, and four 3.5" diskettes for Win 3.1, NT/95, DOS 32-bit, and, drum roll, OS/2. In the original box ... I also have Ver 4 and its bound manual; who would want that? I loved it 20+ years ago, but am firmly GNU awk now.

Laurent MANCHON

Aug 12, 2021, 3:02:30 AM
On Unix machines I don't like gawk; I prefer mawk, which is faster than gawk.
And on Windows, programs compiled with TAWK v6.7 are faster than gawk.

Janis Papanagnou

Aug 12, 2021, 3:25:36 AM
On 12.08.2021 09:02, Laurent MANCHON wrote:
> On Unix machines I don't like gawk; I prefer mawk, which is faster than gawk.

Is that true? - I know of some performance tests (done by Andrew Sumner
20+ years ago) where that was actually not the case - some test cases
were faster, some slower -, and since then a lot of optimizations have
been done in GNU Awk (including byte code support).

If you have some test cases I'd be interested to see actual numbers.

> And on Windows, programs compiled with TAWK v6.7 are faster than gawk.

If speed is a critical issue you may also try awka, an Awk compiler.

Janis

Laurent MANCHON

Aug 12, 2021, 4:10:30 AM
I think Awka was discontinued a long time ago (http://awka.sourceforge.net/download.html),
and I'm not sure whether it works with the latest versions of gcc.

Kenny McCormack

Aug 12, 2021, 6:13:34 AM
In article <sf2ide$ab7$1...@news-1.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>On 12.08.2021 09:02, Laurent MANCHON wrote:
>> On Unix machines I don't like gawk; I prefer mawk, which is faster than gawk.
>
>Is that true? - I know of some performance tests (done by Andrew Sumner
>20+ years ago) where that was actually not the case - some test cases
>were faster, some slower -, and since then a lot of optimizations have
>been done in GNU Awk (including byte code support).

Historically, it is (has been) definitely true. Historically, mawk was
always considered very fast, and GAWK was originally designed to be
feature-rich and not have limits (which are common attributes/goals of GNU
software) at the expense of being big and not particularly efficient.
Note, incidentally, that bash also fits this profile. I like bash for its
many nice features, but its own man page says that it is too big and too
slow.

However, this situation may have changed over the years. As you say,
effort has gone into making GAWK more runtime efficient.

>> And on Windows, programs compiled with TAWK v6.7 are faster than gawk.

1) It is unlikely that speed really is an issue. Most people who think it
is (in pretty much all contexts), turn out to be misguided. If you want
efficiency, writing in AWK is probably not what you should be doing in the
first place.

But, that said, it is true (and yes, I am sort of contradicting myself),
TAWK is very very efficient and fast. This is a good reason to use TAWK,
if you can. I think it is indisputable that TAWK is the best/fastest
significant AWK implementation.

>If speed is a critical issue you may also try awka, an Awk compiler.

I don't think awka - or any other so-called "awk compiler" - makes any
claims to making your program run faster. Aren't they all just for
encryption (aka, code security) purposes?

BTW, all this talk by you and your c.l.a friend, of the strain
"Why don't you just use GAWK like we do?", is misguided. If the OP has and
is using TAWK, he should continue to do so.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/DanaC

Kenny McCormack

Aug 12, 2021, 6:16:41 AM
In article <261371b2-4ffd-4cf6...@googlegroups.com>,
Laurent MANCHON <manch...@gmail.com> wrote:
>On Unix machines I don't like gawk; I prefer mawk, which is faster than gawk.
>And on Windows, programs compiled with TAWK v6.7 are faster than gawk.

This. I certainly think that if you have TAWK and are using it, you should
continue to use it. It is clearly the best and the fastest AWK
implementation.

Ignore all the "But you should be using GAWK, because we say so" nonsense
that you are seeing on this forum.

--
"We should always be disposed to believe that which appears to us to be
white is really black, if the hierarchy of the church so decides."

- Saint Ignatius Loyola (1491-1556) Founder of the Jesuit Order -

Janis Papanagnou

Aug 12, 2021, 6:43:20 AM
[ please quote context if posting in Usenet ]

On 12.08.2021 10:10, Laurent MANCHON wrote:
> I think Awka was discontinued a long time ago (http://awka.sourceforge.net/download.html),
> and I'm not sure whether it works with the latest versions of gcc.

It's discontinued, yes. (And I haven't tried to compile it with
the latest gcc.)

But isn't Tawk - that you use on Windows - also discontinued?
(So I've heard, at least, for many years now.)

Janis

Janis Papanagnou

Aug 12, 2021, 6:56:38 AM
On 12.08.2021 12:13, Kenny McCormack wrote:
> In article <sf2ide$ab7$1...@news-1.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>> On 12.08.2021 09:02, Laurent MANCHON wrote:
>
>>> And on Windows, programs compiled with TAWK v6.7 are faster than gawk.
>
> 1) It is unlikely that speed really is an issue. Most people who think it
> is (in pretty much all contexts), turn out to be misguided. If you want
> efficiency, writing in AWK is probably not what you should be doing in the
> first place.

Or applying algorithms with better complexity. (C.f. for example Ben's
hint on an O(N) algorithm, as opposed to an O(N log N) or even an O(N^2)
algorithm like the one the OP posted as workaround.)


>> If speed is a critical issue you may also try awka, an Awk compiler.
>
> I don't think awka - or any other so-called "awk compiler" - makes any
> claims to making your program run faster. Aren't they all just for
> encryption (aka, code security) purposes?

Don't think so. The performance reference hint I gave was from A. Sumner
(the author of awka) and you can inspect that all at awka's Sourceforge
page.

>
> BTW, all this talk by you and your c.l.a friend, of the strain
> "Why don't you just use GAWK like we do?", is misguided.

You have some misconception here; the two persons who suggested GNU Awk
in this thread were Ed and you.

I mentioned the performance results and pointed out the optimizations
that happened in GNU Awk during the past 20+ years since the performance
tests. (Even those old tests had a comment that it might be outdated by
the actual awk releases tested.)

But the OP's argument is anyway strange, WRT speed, and also WRT using
discontinued software, and with his assumption that Ed and you are not
really aware what median-calculation would require, so it's not really
worth engaging more here in this thread.

Janis

Kenny McCormack

Aug 12, 2021, 7:09:57 AM
In article <sf2up4$do5$1...@news-1.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
...
>You have some misconception here; the two persons who suggested GNU Awk
>in this thread were Ed and you.

Really? I have repeatedly said "If you are using TAWK and are happy with
it, you should stick with it." I don't recall ever recommending he switch
to GAWK.

Eddie has, of course, certainly done so.

--

"This ain't my first time at the rodeo"

is a line from the movie, Mommie Dearest, said by Joan Crawford at a board meeting.

Kenny McCormack

Aug 12, 2021, 7:26:00 AM
In article <sf2u06$dj4$1...@news-1.m-online.net>,
Yes, but it doesn't matter (in the case of TAWK).

Yes, I know that one of the first commandments of using software is that
you can't use software that isn't being maintained. Your PHB will can your
ass!

And it looks like awka fits the mold. Since awka is basically a shim
between GAWK and GCC, you'd have to verify that it works with the current
versions of both of those pieces of software. Since it is not being
maintained, it almost certainly isn't compatible with one or both of them.

TAWK is different, though. Since it is:
1) (almost) Perfect
and
2) Entirely standalone
the fact that it is not being maintained is irrelevant.

BTW, I said (almost) above because there is one area where I prefer GAWK.
That is when dealing with files with very long lines - in my work, this
involves lines of several hundred thousands of bytes. TAWK fails badly if
your input lines are too long - and I say "fails badly" because it doesn't
generate error messages; it just generates incorrect results.

There are workarounds, but it is a PIA - and, of course, you have to notice
the incorrect results (and convince yourself that the bug is not in *your*
code) in order to know to deploy the workarounds.

BTW, I don't use TAWK much anymore, because I don't use Windows much
anymore, but when I do use Windows, I tend to use (Cygwin) GAWK, because:
1) The line length problem mentioned above.
2) Compatibility. I can develop on Linux and deploy on Windows.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/IceCream

Laurent MANCHON

Aug 12, 2021, 8:13:19 AM
--
I don't work with very long lines, but I do work with very big text files with millions of rows.
In my opinion the main drawback of the whole awk family is string concatenation: it takes too much time,
and this is what I have observed.
Maybe it's common to all languages; I don't know how C handles it.


Janis Papanagnou

Aug 12, 2021, 8:24:41 AM
On 12.08.2021 14:13, Laurent MANCHON wrote:
> --
> I don't work with very long lines, but I do work with very big text files with millions of rows.
> In my opinion the main drawback of the whole awk family is string concatenation: it takes too much time,
> and this is what I have observed.

Do you mean arbitrary string value concatenations, or adding strings
to an existing string? The latter, i.e. x = x a b c ..., has in GNU
Awk an optimization that makes it very fast.

Janis

Laurent MANCHON

Aug 12, 2021, 8:31:43 AM
Typically this kind of concatenation:
...
if (!(list[i])) { list[i] = array[i,j] }
else { list[i] = list[i] SUBSEP array[i,j] }
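
One thing to watch in this pattern: testing !(list[i]) treats a stored value that evaluates as false (an empty string, or a 0 read from input) the same as "no entry yet", so such a first value gets overwritten instead of extended. A minimal sketch of the same accumulation using the "in" operator instead, assuming input lines of the form "key value" and ";" as the separator; the name join_by_key and the input layout are made up for the illustration.

```sh
# Read "key value" lines on stdin and print each key with its values joined by ";".
# Hypothetical helper for illustration; uses only POSIX awk features.
join_by_key() {
    awk '
    {
        key = $1; value = $2
        # "key in list" tests for existence, so a stored 0 or "" still counts as an entry.
        if (key in list) list[key] = list[key] ";" value
        else {
            list[key] = value
            order[++n] = key        # remember first-seen order for the output
        }
    }
    END {
        for (i = 1; i <= n; i++) print order[i] ": " list[order[i]]
    }'
}
```

With the input lines "a 0", "a 5" and "b 7" this prints "a: 0;5" and "b: 7".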

Kenny McCormack

Aug 12, 2021, 10:54:17 AM
In article <a7bfa7f2-27c8-4d67...@googlegroups.com>,
Well, there's your problem, right there.

If you are using either TAWK or GAWK (which you clearly are), then you
should not be using the old-fashioned SUBSEP-based pseudo-multi-dimensional
arrays. Use real, true multi-dimensional arrays - like a big boy!

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/DanQuayle

Laurent MANCHON

Aug 12, 2021, 11:04:05 AM
--
Are you sure TAWK can do that?
--

Kenny McCormack

Aug 12, 2021, 12:23:18 PM
In article <2ecbd17e-24b0-427a...@googlegroups.com>,
yes.

--
To most Christians, the Bible is like a software license. Nobody
actually reads it. They just scroll to the bottom and click "I agree."

- author unknown -

Laurent MANCHON

Aug 12, 2021, 12:29:24 PM
--
I confirm it works. I changed array[i,j] to array[i][j]
and the concatenation step to:

list[i] = list[i] SUBSEP array[i][j];
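
A sketch of what the array[i][j] form looks like in gawk, assuming gawk 4.0 or later (arrays of arrays) and input lines of the form "row column value". The name group_rows, the input layout and the ";" separator are made up for the illustration; how closely TAWK matches this syntax beyond array[i][j] itself is not confirmed here.

```sh
# Read "row column value" lines on stdin and print each row with its values joined by ";".
# Hypothetical helper; requires gawk 4.0+ for arrays of arrays.
group_rows() {
    gawk '
    { cell[$1][$2] = $3 }                        # cell[row][column] = value
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # make the traversal order deterministic
        for (r in cell) {
            line = ""; sep = ""
            for (c in cell[r]) {                 # the inner loop walks one row only
                line = line sep cell[r][c]
                sep = ";"
            }
            print r ": " line
        }
    }'
}
```

Because each row is its own sub-array, the inner loop visits only that row's columns, and no combined "i SUBSEP j" key has to be split apart again.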

Kenny McCormack

Aug 12, 2021, 1:10:01 PM
In article <24b17405-ad5d-46e9...@googlegroups.com>,
And then what?

I don't know if you've ever really explained what it is you're doing, but
let me just say that whenever I see "SUBSEP" in user code, I assume that the
programmer is trying to create a subscript in a pseudo-multi-dimensional
array. If that is the case for you and your code, then I'd argue that you
shouldn't be doing that, since you could be using real-multi-dimensional
arrays instead.

However, sometimes programmers use SUBSEP as a quick-and-dirty separator
for data, arguing that it is more or less guaranteed to never occur in
data, having nothing to do with pseudo-multi-dimensional arrays. This
usually works but is confusing to the reader of the code, since it looks
like array stuff.

Personally, I usually end up using ^A when I need a fake data separator.

Is there any particular reason why you are using SUBSEP in the code above?

--
Mike Huckabee has yet to consciously uncouple from Josh Duggar.

Laurent MANCHON

Aug 12, 2021, 3:01:51 PM
--
Absolutely, I use SUBSEP as a concatenation separator.
I could have set SUBSEP=";" in the BEGIN block to be on the safe side.

Ed Morton

Aug 13, 2021, 12:06:52 PM
On 8/12/2021 7:13 AM, Laurent MANCHON wrote:

[When posting please include enough context that your post makes sense
stand-alone - this is usenet, not a forum]
C handles it by requiring you to allocate enough memory up front when
you declare your variables for the maximum that might be required for
that variable. So in C you do something like (pseudo-code):

    char x[50]     # allocate 50 chars space in memory for x.
    x = "foo"      # the first 3 chars of x are populated with "foo".
    x = x "bar"    # the first 6 chars of x are populated with "foobar".

since x already has enough memory to add "bar" to the end but you don't
declare variables in awk so the equivalent is:

    x = "foo"      # allocate 3 chars space in memory for x and
                   # populate it with "foo".
    x = x "bar"    # allocate 6 chars space in memory for x, populate it
                   # with "foobar", and change the reference for "x" to
                   # point to the new memory location if required.

It's probably not exactly that simple and I'm sure gawk at least has
some optimizations for it but that gives you an idea of why awk has more
work to do than C when concatenating a string to an existing variable -
static memory allocation for it in C vs dynamic memory allocation in awk.

Ed.

Ed Morton

Aug 13, 2021, 12:09:03 PM
Please include enough context in your posts so those of us trying to
follow them on usenet know what they're about. See the posts from
everyone else in this thread for examples of that.

Ed.

Laurent MANCHON

Aug 13, 2021, 12:24:57 PM
I think I understand.
In C you have to anticipate how much memory you will need, but you don't know in advance how many concatenations you will have to do.