GAWK API Question #3

Kenny McCormack

Jan 18, 2015, 10:32:30 AM
Here's an example of a specific question that could/should be better
covered in the available documentation. It seems to me that one couldn't
possibly know (or deduce) the answer to this question without being
intimately familiar with the GAWK source code. It further seems to me that
it *should* be possible to develop extension libs for GAWK w/o being
intimately familiar with the GAWK source code.

The question is: What exactly does "make_const_string" do? In particular,
does it do "garbage collection" - i.e., does it take care of free'ing
strings when they are no longer in use? Otherwise, it seems to me that if
you use this function repeatedly (as one will do), you will create a memory
leak.

--
Debating creationists on the topic of evolution is rather like trying to
play chess with a pigeon --- it knocks the pieces over, craps on the
board, and flies back to its flock to claim victory.

Andrew Schorr

Jan 20, 2015, 8:14:52 PM
On Sunday, January 18, 2015 at 10:32:30 AM UTC-5, Kenny McCormack wrote:
> The question is: What exactly does "make_const_string" do? In particular,
> does it do "garbage collection" - i.e., does it take care of free'ing
> strings when they are no longer in use? Otherwise, it seems to me that if
> you use this function repeatedly (as one will do), you will create a memory
> leak.

The make_const_string function is no more complicated than its description in section 16.4.5:

`static inline awk_value_t *'
`make_const_string(const char *string, size_t length, awk_value_t *result)'
This function creates a string value in the `awk_value_t' variable
pointed to by `result'. It expects `string' to be a C string
constant (or other string data), and automatically creates a
_copy_ of the data for storage in `result'. It returns `result'.

It has no magical garbage collection powers.
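A minimal sketch of the copy behaviour described above, written as a plain C model rather than against the real gawkapi.h. The names model_value and model_make_const_string are hypothetical stand-ins, and plain malloc() stands in for the API allocator; only the "copy the data, record the length, return result" behaviour is taken from the quoted documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the string part of awk_value_t (not the real type). */
typedef struct {
    char  *str;   /* heap copy of the caller's data */
    size_t len;   /* explicit byte count */
} model_value;

/* Models the documented behaviour: copy `len` bytes of `string` into fresh
 * storage held by `result`, then return `result`. */
model_value *
model_make_const_string(const char *string, size_t len, model_value *result)
{
    assert(string != NULL && result != NULL);
    result->str = malloc(len + 1);
    assert(result->str != NULL);
    memcpy(result->str, string, len);
    result->str[len] = '\0';   /* convenience terminator, not counted in len */
    result->len = len;
    return result;
}
```

Because the value holds its own copy, the caller's buffer can be changed or go out of scope afterwards without affecting it. The copy itself is ordinary heap memory, so something has to release it eventually.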

I agree that the memory management scheme may be a bit confusing. Please take a look at section 16.4.1 (the API introduction) where it says:

* All pointers filled in by `gawk' are to memory managed by `gawk'
and should be treated by the extension as read-only. Memory for
_all_ strings passed into `gawk' from the extension _must_ come
from calling the API-provided function pointers `api_malloc()',
`api_calloc()' or `api_realloc()', and is managed by `gawk' from
then on.

So there's a clear paradigm specified here for how memory will be managed. If you pass allocated memory into gawk, gawk will take control of it from then on. Perhaps this is not sufficiently emphasized in the documentation, but it is stated. Actually, now that I look, the newest (unreleased) version of the documentation has an "Extension summary" section that also includes this language:

* _All_ memory passed from `gawk' to an extension must be treated as
read-only by the extension.

* _All_ memory passed from an extension to `gawk' must come from the
API's memory allocation functions. `gawk' takes responsibility for
the memory and releases it when appropriate.

So help is on the way. Arnold has made a number of recent improvements to the documentation.

If you're curious about whether there are any memory leaks and happen to be using Linux, valgrind can be very helpful for catching them. I sometimes use that to confirm that I haven't made any mistakes.

Regards,
Andy

Kenny McCormack

Jan 20, 2015, 8:35:22 PM
In article <40ad3638-3a50-4210...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
>On Sunday, January 18, 2015 at 10:32:30 AM UTC-5, Kenny McCormack wrote:
>> The question is: What exactly does "make_const_string" do? In particular,
>> does it do "garbage collection" - i.e., does it take care of free'ing
>> strings when they are no longer in use? Otherwise, it seems to me that if
>> you use this function repeatedly (as one will do), you will create a memory
>> leak.
>
>The make_const_string function is no more complicated than its description in
>section 16.4.5:
>
>`static inline awk_value_t *'
>`make_const_string(const char *string, size_t length, awk_value_t *result)'
> This function creates a string value in the `awk_value_t' variable
> pointed to by `result'. It expects `string' to be a C string
> constant (or other string data), and automatically creates a
> _copy_ of the data for storage in `result'. It returns `result'.
>
>It has no magical garbage collection powers.

OK, so these questions come to mind:

1) How do I free it? Shouldn't I, as a good citizen, be concerned with this?
(Obviously, the amounts of memory I'm dealing with are so small, that
one could easily take the view that it doesn't matter, but ...)

2) What about if I need to create an "AWK string" - that is, one that
contains a null character. It is legal (in GAWK) to have an
array index which is a string containing an embedded null. How can
I access that element in an extension?

...
>So help is on the way. Arnold has made a number of recent improvements to the
>documentation.

That's good news!

>If you're curious about whether there are any memory leaks and happen to be using
>Linux, valgrind can be very helpful for catching them. I sometimes use that to
>confirm that I haven't made any mistakes.

I'll try that. It seems to me that if there is no garbage collection, then
there will be leaks. Am I missing something?

--
> No, I haven't, that's why I'm asking questions. If you won't help me,
> why don't you just go find your lost manhood elsewhere.

CLC in a nutshell.

Andrew Schorr

Jan 21, 2015, 9:46:33 PM
On Tuesday, January 20, 2015 at 8:35:22 PM UTC-5, Kenny McCormack wrote:
> 1) How do I free it? Shouldn't I, as a good citizen, be concerned with this?
> (Obviously, the amounts of memory I'm dealing with are so small, that
> one could easily take the view that it doesn't matter, but ...)

The idea is that the only reason you would call this function is in order
to create an awk_value_t object that will subsequently be passed into gawk
via the API. From that point on, it is gawk's job to manage the memory.
So you should not need to free it.

> 2) What about if I need to create an "AWK string" - that is, one that
> contains a null character. It is legal (in GAWK) to have an
> array index which is a string containing an embedded null. How can
> I access that element in an extension?

Please note that the 2nd argument to the function is the length of the string.
So what prevents you from passing in a string with an embedded NUL character?
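To illustrate the point about the length argument, here is a small plain C sketch. The type counted_string and the helper copy_counted are hypothetical, standing in for a length-counted string such as the one make_const_string fills in. It shows that an explicit byte count carries an embedded NUL through intact, whereas strlen() would stop at it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string (an illustration, not a gawk type). */
typedef struct {
    char  *str;
    size_t len;
} counted_string;

/* Copy exactly `len` bytes; an embedded NUL is ordinary data here. */
counted_string
copy_counted(const char *data, size_t len)
{
    counted_string s;
    assert(data != NULL);
    s.str = malloc(len + 1);
    assert(s.str != NULL);
    memcpy(s.str, data, len);
    s.str[len] = '\0';
    s.len = len;
    return s;
}

/* An index such as "a\0b": sizeof counts the embedded NUL, strlen() stops at it. */
const char nul_key[] = "a\0b";
#define NUL_KEY_LEN (sizeof(nul_key) - 1)   /* 3 bytes: 'a', NUL, 'b' */
```

Going by the signature quoted earlier, the corresponding call in an extension would presumably pass the same pair, the data and sizeof(nul_key) - 1, as the first two arguments of make_const_string.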

> I'll try that. It seems to me that if there is no garbage collection, then
> there will be leaks. Am I missing something?

There should not be leaks if you use the API properly. If you allocate memory
and then pass it to gawk, gawk takes ownership of the memory. If you do not
pass it to gawk, then you must free it.

Regards,
Andy

Kenny McCormack

Jan 22, 2015, 4:59:19 AM
In article <4534c3b4-493b-4f08...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
>On Tuesday, January 20, 2015 at 8:35:22 PM UTC-5, Kenny McCormack wrote:
>> 1) How do I free it? Shouldn't I, as a good citizen, be concerned with this?
>> (Obviously, the amounts of memory I'm dealing with are so small, that
>> one could easily take the view that it doesn't matter, but ...)
>
>The idea is that the only reason you would call this function is in order
>to create an awk_value_t object that will subsequently be passed into gawk
>via the API. From that point on, it is gawk's job to manage the memory.
>So you should not need to free it.

OK. This sounds to me like a roundabout way of saying "Yes, gawk will
garbage collect it", which is what I assumed all along.

>> 2) What about if I need to create an "AWK string" - that is, one that
>> contains a null character. It is legal (in GAWK) to have an
>> array index which is a string containing an embedded null. How can
>> I access that element in an extension?
>
>Please note that the 2nd argument to the function is the length of the string.
>So what prevents you from passing in a string with an embedded NUL character?

True. Again, as I say (in thread #2), it's all there, but in Unix man
page-ese. Here's a documentation nitpick: The doc says that it takes a "C
string", and, as the good folks in comp.lang.c will tell you, a "C string",
by definition, cannot contain a null character. This is actually what
threw me; I assumed it really meant "a C string".

Specifically, the doc says:

This function creates a string value in the awk_value_t variable pointed to
by result. It expects string to be a C string constant (or other string
data), and automatically creates a copy of the data for storage in result.
It returns result.

>There should not be leaks if you use the API properly. If you allocate memory
>and then pass it to gawk, gawk takes ownership of the memory. If you do not
>pass it to gawk, then you must free it.

What would be an example of using the API improperly? How can one go astray?

--

"This ain't my first time at the rodeo"

is a line from the movie, Mommie Dearest, said by Joan Crawford at a board meeting.

Andrew Schorr

Jan 22, 2015, 8:20:31 PM
On Thursday, January 22, 2015 at 4:59:19 AM UTC-5, Kenny McCormack wrote:
> This function creates a string value in the awk_value_t variable pointed to
> by result. It expects string to be a C string constant (or other string
> data), and automatically creates a copy of the data for storage in result.
> It returns result.

I think most people are not likely to use strings with embedded NUL characters, so I'm not sure this is a big shortcoming in the docs. What's your usage case for strings with NULs in them?

> What would be an example of using the API improperly? How can one go astray?

I can't really answer that one. I spend my time trying to write good code, not bad code.

Regards,
Andy

Kenny McCormack

Jan 23, 2015, 7:22:02 AM
In article <ace5691e-5002-4af9...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
>On Thursday, January 22, 2015 at 4:59:19 AM UTC-5, Kenny McCormack wrote:
>> This function creates a string value in the awk_value_t variable pointed to
>> by result. It expects string to be a C string constant (or other string
>> data), and automatically creates a copy of the data for storage in result.
>> It returns result.
>
>I think most people are not likely to use strings with embedded NUL
>characters, so I'm not sure this is a big shortcoming in the docs. What's
>your usage case for strings with NULs in them?

It's not. This is theoretical.
And it is obviously not that big of a deal; just pointing out one of the
pitfalls for the unwary.

A couple of other notes/observations:
1) The ability to have nulls - i.e., any character - in strings (and
thus to support using them to carry "binary" data) is one of the
great strengths of gawk (as compared to traditional/vendor-supplied
AWKs)
2) It is not at all unreasonable (according to the Kenny Gold Standard
test) to read the text literally when it says "It expects the
strings to be a C string".

>> What would be an example of using the API improperly? How can one go
>> astray?
>
>I can't really answer that one. I spend my time trying to write good
>code, not bad code.

Kind of a snarky response.

Anyway, it is always useful in programming to know what traps to avoid.
And your text (shown in previous posts, but now clipped) implies that there
*are* ways to get in trouble. It is not unreasonable to be curious about
what those ways are.

--
Watching ConservaLoons playing with statistics and facts is like watching a
newborn play with a computer. Endlessly amusing, but totally unproductive.

Kaz Kylheku

Jan 23, 2015, 6:38:28 PM
On 2015-01-23, Andrew Schorr <asc...@telemetry-investments.com> wrote:
> On Thursday, January 22, 2015 at 4:59:19 AM UTC-5, Kenny McCormack wrote:
>> This function creates a string value in the awk_value_t variable pointed to
>> by result. It expects string to be a C string constant (or other string
>> data), and automatically creates a copy of the data for storage in result.
>> It returns result.
>
> I think most people are not likely to use strings with embedded NUL
> characters, so I'm not sure this is a big shortcoming in the docs. What's
> your usage case for strings with NULs in them?

The mere mention of "C string" in a non-C newsgroup is not automatically a push
button for the embedded NUL debate. Sometimes it's just a statement of fact.
The function is a C function; it works with C strings.

Andrew Schorr

Jan 24, 2015, 4:12:30 PM
On Friday, January 23, 2015 at 7:22:02 AM UTC-5, Kenny McCormack wrote:
> >I think most people are not likely to use strings with embedded NUL
> >characters, so I'm not sure this is a big shortcoming in the docs. What's
> >your usage case for strings with NULs in them?
>
> It's not. This is theoretical.
> And it is obviously not that big of a deal; just pointing out one of the
> pitfalls for the unwary.

I think it should be fine to use a string with an embedded NUL. The function description says "a C string constant (or other string data)", and I think the "other string data" case is meant to cover the situation where a NUL character might be in there.

> A couple of other notes/observations:
> 1) The ability to have nulls - i.e., any character - in strings (and
> thus to support using them to carry "binary" data) is one of the
> great strengths of gawk (as compared to traditional/vendor-supplied
> AWKs)

Agreed.

> 2) It is not at all unreasonable (according to the Kenny Gold Standard
> test) to read the text literally when it says "It expects the
> strings to be a C string".

Yes, but please note the "(or other string data)" case that I mentioned above. So it is not restricted to "C string" data.

> >> What would be an example of using the API improperly? How can one go
> >> astray?
> >
> >I can't really answer that one. I spend my time trying to write good
> >code, not bad code.
>
> Kind of a snarky response.

Sorry. That was not my intention.

> Anyway, it is always useful in programming to know what traps to avoid.
> And your text (shown in previous posts, but now clipped) implies that there
> *are* ways to get in trouble. It is not unreasonable to be curious about
> what those ways are.

If the concern is primarily memory leaks, then here's a construct that would leak memory:

awk_value_t x;

make_const_string("apple", 5, &x);
sym_update("fruit", make_const_string("pear", 4, &x));

In that case, the malloced string pointer to "apple" would be lost. But I cannot imagine why anybody would do this.
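The leak can be made visible with a plain C model that counts outstanding copies. Everything here is a hypothetical stand-in (model_value, model_make_const_string, model_sym_update, live_strings); none of it is the real API. In particular, the model releases the string as soon as it is handed over, whereas real gawk would presumably keep it for as long as the variable needs it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *str; size_t len; } model_value;   /* hypothetical */

static int live_strings = 0;   /* copies made and not yet released */

model_value *
model_make_const_string(const char *s, size_t len, model_value *result)
{
    assert(s != NULL && result != NULL);
    result->str = malloc(len + 1);
    assert(result->str != NULL);
    memcpy(result->str, s, len);
    result->str[len] = '\0';
    result->len = len;
    live_strings++;
    return result;
}

/* Stand-in for an API call such as sym_update(): it takes ownership of the
 * string. The model simply frees it at once to keep the bookkeeping short. */
void
model_sym_update(const char *name, model_value *value)
{
    (void) name;
    free(value->str);
    value->str = NULL;
    live_strings--;
}

/* The construct from the post: the "apple" copy is overwritten and lost. */
int
leaky(void)
{
    model_value x;
    live_strings = 0;
    model_make_const_string("apple", 5, &x);   /* never handed over */
    model_sym_update("fruit", model_make_const_string("pear", 4, &x));
    return live_strings;
}

/* Same sequence, but the unused copy is released by the caller first. */
int
fixed(void)
{
    model_value x;
    live_strings = 0;
    model_make_const_string("apple", 5, &x);
    free(x.str);                               /* not passed on, so we free it */
    live_strings--;
    model_sym_update("fruit", model_make_const_string("pear", 4, &x));
    return live_strings;
}
```

In this model leaky() leaves one copy outstanding and fixed() leaves none, which matches the rule stated earlier in the thread: memory that is handed over becomes gawk's responsibility, and memory that is not handed over stays yours.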

Regards,
Andy

Kenny McCormack

Jan 25, 2015, 4:26:53 AM
In article <b71d077e-a36f-4a93...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
...
>If the concern is primarily memory leaks, then here's a construct that
>would leak memory:
>
>awk_value_t x;
>
>make_const_string("apple", 5, &x);
>sym_update("fruit", make_const_string("pear", 4, &x));
>
>In that case, the malloced string pointer to "apple" would be lost. But I
>cannot imagine why anybody would do this.

Interesting. I'm not sure if the use of "sym_update" is part of the
example or not. But here's the code I use to traverse an array:

for (i=1; i<=count; i++) {
    make_const_string( /* Convert i from num to string */, &index);
    if (!get_array_element(array, &index, AWK_STRING, &value)) {
        printf("dump_array: get_array_element failed\n");
        goto the_end;
    }
    printf("Array Element: %zu is '%s'\n",i,value.STR);
}

So, this makes a string for each value of i in the loop. Does that cause a
memory leak?

--
Those on the right constantly remind us that America is not a
democracy; now they claim that Obama is a threat to democracy.

Andrew Schorr

Jan 25, 2015, 9:44:56 AM
On Sunday, January 25, 2015 at 4:26:53 AM UTC-5, Kenny McCormack wrote:
> for (i=1; i<=count; i++) {
>     make_const_string( /* Convert i from num to string */, &index);
>     if (!get_array_element(array, &index, AWK_STRING, &value)) {
>         printf("dump_array: get_array_element failed\n");
>         goto the_end;
>     }
>     printf("Array Element: %zu is '%s'\n",i,value.STR);
> }
>
> So, this makes a string for each value of i in the loop. Does that cause a
> memory leak?

No. There is no leak because you pass "index" into gawk. As the documentation states, gawk takes ownership of all memory passed to it. In this case, the gawk API will free the memory before the function returns.
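A rough model of that loop in plain C, assuming the behaviour just described (the lookup releases the index string before it returns). The names model_value, model_make_const_string, model_get_array_element and live_strings are hypothetical, and the "array" is just a C array of strings indexed from 1; none of this is the real API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *str; size_t len; } model_value;   /* hypothetical */

static int live_strings = 0;   /* index copies not yet released */

model_value *
model_make_const_string(const char *s, size_t len, model_value *result)
{
    assert(s != NULL && result != NULL);
    result->str = malloc(len + 1);
    assert(result->str != NULL);
    memcpy(result->str, s, len);
    result->str[len] = '\0';
    result->len = len;
    live_strings++;
    return result;
}

/* Stand-in for get_array_element(): uses the index, then releases its string. */
int
model_get_array_element(const char *const *array, size_t count,
                        model_value *index, const char **value)
{
    long n = strtol(index->str, NULL, 10);
    free(index->str);          /* the callee owns the index string now */
    index->str = NULL;
    live_strings--;
    if (n < 1 || (size_t) n > count)
        return 0;
    *value = array[n - 1];
    return 1;
}

/* Builds a fresh index for every element; returns the copies still outstanding. */
int
dump_array(const char *const *array, size_t count)
{
    size_t i;
    live_strings = 0;
    for (i = 1; i <= count; i++) {
        char buf[32];
        model_value index;
        const char *value;
        int n = snprintf(buf, sizeof buf, "%zu", i);   /* convert i to a string */
        model_make_const_string(buf, (size_t) n, &index);
        if (!model_get_array_element(array, count, &index, &value)) {
            printf("dump_array: lookup failed\n");
            break;
        }
        printf("Array Element: %zu is '%s'\n", i, value);
    }
    return live_strings;
}
```

Each iteration creates one copy and hands it over, so nothing is left outstanding when the loop ends.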

Regards,
Andy

Kenny McCormack

Jan 25, 2015, 12:43:23 PM
In article <3d1a1ce0-0dc2-4a69...@googlegroups.com>,
Ah, I see. The difference is that in your example, the "apple" version of
the awk_value_t object 'x' is never used, it is just created, then x is
re-purposed. That's the key - if the "apple" version had been used in some
way, then it would have been garbage collected as needed.

I think I finally understand what is meant by "passed into gawk".

Again, the Kenny Gold Standard test applies here.

But, yeah, I think we finally got to the point.

--
To most Christians, the Bible is like a software license. Nobody
actually reads it. They just scroll to the bottom and click "I agree."

- author unknown -

Kenny McCormack

Jan 25, 2015, 2:45:26 PM
In article <3d1a1ce0-0dc2-4a69...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
...
>> So, this makes a string for each value of i in the loop. Does that cause a
>> memory leak?
>
>No. There is no leak because you pass "index" into gawk. As the documentation
>states, gawk takes ownership of all memory passed to it. In this case, the gawk
>API will free the memory before the function returns.

Actually, I have another question: What is meant by "the" function?
(As in "before *the* function returns") Which function?

One way of reading this is that when I call, say, get_array_element(),
that function (get_array_element()) actually frees the memory that I have
allocated for my index variable before my program continues. This implies
that as soon as I call get_array_element(), my string is no longer valid.

I.e., I cannot re-use the awk_value_t object after that. These are
essentially single-use objects; they must be set up again before each use.
Note: This (assuming this is the correct reading) is not (necessarily) a
bad thing; it is just somewhat counter-intuitive.

Is that correct?

--
A liberal, a moderate, and a conservative walk into a bar...

Bartender says, "Hi, Mitt!"

Andrew Schorr

Jan 25, 2015, 7:22:22 PM
On Sunday, January 25, 2015 at 2:45:26 PM UTC-5, Kenny McCormack wrote:
> Actually, I have another question: What is meant by "the" function?
> (As in "before *the* function returns") Which function?

The API function, which, in this case, is get_array_element.

> One way of reading this is that when I call, say, get_array_element(),
> that function (get_array_element()) actually frees the memory that I have
> allocated for my index variable before my program continues. This implies
> that as soon as I call get_array_element(), my string is no longer valid.

Correct.

> I.e., I cannot re-use the awk_value_t object after that. These are
> essentially single-use objects; they must be set up again before each use.
> Note: This (assuming this is the correct reading) is not (necessarily) a
> bad thing; it is just somewhat counter-intuitive.
>
> Is that correct?

Yes.
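The single-use behaviour can be sketched in plain C as follows. The names model_value, model_make_const_string and model_lookup are hypothetical stand-ins, not the real API. One deliberate difference from reality: the model sets the pointer to NULL once the string has been consumed so that the "spent" state can be checked, whereas with the real API the pointer would simply be left dangling and must not be touched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *str; size_t len; } model_value;   /* hypothetical */

void
model_make_const_string(const char *s, size_t len, model_value *result)
{
    assert(s != NULL && result != NULL);
    result->str = malloc(len + 1);
    assert(result->str != NULL);
    memcpy(result->str, s, len);
    result->str[len] = '\0';
    result->len = len;
}

/* Stand-in for an API call that consumes its index argument.
 * Returns 1 if it was given a usable value, 0 if the value was already spent. */
int
model_lookup(model_value *index)
{
    if (index->str == NULL)
        return 0;              /* spent: the string was released earlier */
    free(index->str);
    index->str = NULL;         /* model-only marker for "no longer valid" */
    return 1;
}

/* Re-using the same value without setting it up again: second call fails. */
int
lookup_twice_without_rebuild(void)
{
    model_value index;
    model_make_const_string("1", 1, &index);
    model_lookup(&index);
    return model_lookup(&index);
}

/* Setting the value up again before the second call: both calls succeed. */
int
lookup_twice_with_rebuild(void)
{
    model_value index;
    int ok;
    model_make_const_string("1", 1, &index);
    ok = model_lookup(&index);
    model_make_const_string("1", 1, &index);   /* fresh copy for the second use */
    return ok && model_lookup(&index);
}
```

The practical upshot is the pattern already used in the array loop: build the value immediately before each API call that takes it, and treat it as gone afterwards.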

Regards,
Andy

Kenny McCormack

Jan 25, 2015, 7:49:20 PM
In article <85d46f58-6ba1-45fa...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
...
>> I.e., I cannot re-use the awk_value_t object after that. These are
>> essentially single-use objects; they must be set up again before each use.
>> Note: This (assuming this is the correct reading) is not (necessarily) a
>> bad thing; it is just somewhat counter-intuitive.
>>
>> Is that correct?
>
>Yes.

Wow. I am impressed.

I guess this thread actually served a purpose after all.

--
"Although written many years ago, Lady Chatterley's Lover has just
been reissued by the Grove Press, and this fictional account of the
day-to-day life of an English gamekeeper is still of considerable
interest to outdoor minded readers, as it contains many passages on
pheasant raising, the apprehending of poachers, ways to control vermin,
and other chores and duties of the professional gamekeeper.

"Unfortunately, one is obliged to wade through many pages of extraneous
material in order to discover and savor these sidelights on the
management of a Midlands shooting estate, and in this reviewer's opinion
this book cannot take the place of J.R. Miller's Practical Gamekeeping"
(Ed Zern, Field and Stream, November 1959, p. 142).

Andrew Schorr

Jan 25, 2015, 8:48:56 PM
On Sunday, January 25, 2015 at 7:49:20 PM UTC-5, Kenny McCormack wrote:
> I guess this thread actually served a purpose after all.

Excellent.