Understanding Tcl memory management of list created in C

Gertjan

unread,

Oct 24, 2022, 3:34:52 PM10/24/22

to

Hello,

I spent a painful weekend trying to make more sensible Tcl list from data we generate in a C extension and to understand whether or not I am creating a memory leak. The return to the user is a List of List - pairs of time and ADC data in fact.

I could replicate the actual code but I think the pseudo version will do. Its typical I think.

INT32 GetMyData (ClientData clientdata, Tcl_Interp *interp, INT32 argc, const CHAR *argv[])

Loop over N data elements:
Tcl_Obj *pSubList = Tcl_NewListObj(0,NULL);

Now put Time and a Float into pSubList using two calls to
TclListObjAppendElement(interp, pSubList,....);

//copy the sublist as a single element to the output list:
Tcl_ListObjAppendElement(interp, pOutList, pSubList);
EndLoop
Tcl_SetObjResult(interp, pOutList);
return TCL_OK;

Works like a charm, I get { {time1}{data1} {time2}{data2}...

Any C-programmer not terribly familiar with the Tcl API should immediately be concerned about the repeated Tcl_NewListObj(). Who cleans up ?

I made mistaken attempts to decrement the ref counter on pSubList at the end of the loop - resulting in various seg faults.
Having gained more understanding, perhaps some one can judge the following statements:

1. An AppendElement API call doesn't extend the size of the pOutList - it merely make a reference to the pSubList in pOutList. For that reason, the AppendElement call increments the ref counter on pSubList - it is now 'shared' and needs to stay alive.

2. The AppendElement API does *not* increment pOutList ref counter because that list is not shared anywhere. In fact the call will fail it is ever sees pOutList with a non-zero reference counter

3. The Tcl_SetObjResult(interp, pOutList); increments the Ref counter of pOutlist because the interpreter needs these data and it should not be cleaned up.

Now for the final statement. Lets say I call my C function in a loop:

While { 1 } {
set MyData [GetMyData ]
}

4. The Tcl interpreters will recursively Free the memory for ALL the lists objects created in a previous call to GetMyData upon executing the next one when $MyData is over-written. Right ? Otherwise we have a memory leak.

Let me know if (1)..(4) are correct. The bottom line is that I dont really need to worry about the ref counts of "temporary" lists (they aren't temporary) or what happens with my pOutList.

Much appreciated

Gertjan

Rich

unread,

Oct 24, 2022, 3:53:57 PM10/24/22

to

Gertjan <gho...@gmail.com> wrote:
> I could replicate the actual code but I think the pseudo version will
> do. Its typical I think.
>
> INT32 GetMyData (ClientData clientdata, Tcl_Interp *interp, INT32 argc, const CHAR *argv[])
>
> Loop over N data elements:
> Tcl_Obj *pSubList = Tcl_NewListObj(0,NULL);
>
> Now put Time and a Float into pSubList using two calls to
> TclListObjAppendElement(interp, pSubList,....);
>
> //copy the sublist as a single element to the output list:
> Tcl_ListObjAppendElement(interp, pOutList, pSubList);
> EndLoop
> Tcl_SetObjResult(interp, pOutList);
> return TCL_OK;
>
> Works like a charm, I get { {time1}{data1} {time2}{data2}...
>
> Any C-programmer not terribly familiar with the Tcl API should
> immediately be concerned about the repeated Tcl_NewListObj(). Who
> cleans up ?

Provided the extension writer does not do something funky with the ref
counter, the Tcl runtime cleans up when the last reference to the list
disappears.

> Having gained more understanding, perhaps some one can judge the
> following statements:
>
> 1. An AppendElement API call doesn't extend the size of the pOutList
> - it merely make a reference to the pSubList in pOutList. For that
> reason, the AppendElement call increments the ref counter on
> pSubList - it is now 'shared' and needs to stay alive.

Well, it does extend the size of pOutList - by one more element.

But yes, the sub-list will have it's refcount incremented as part of
the 'append' process.

> 2. The AppendElement API does *not* increment pOutList ref counter
> because that list is not shared anywhere. In fact the call will fail
> it is ever sees pOutList with a non-zero reference counter

It should not do so, because it is not creating a new reference to the
outer list container, so no reason to increment the ref counter of the
outer list.

> 3. The Tcl_SetObjResult(interp, pOutList); increments the Ref
> counter of pOutlist because the interpreter needs these data and it
> should not be cleaned up.

Yes, and at the same time, the Tcl runtime takes over 'ownership' and
tracking to determine when to free() the allocated memory.

> Now for the final statement. Lets say I call my C function in a
> loop:
>
> While { 1 } {
> set MyData [GetMyData ]
> }
>
> 4. The Tcl interpreters will recursively Free the memory for ALL the
> lists objects created in a previous call to GetMyData upon executing
> the next one when $MyData is over-written. Right ? Otherwise we
> have a memory leak.

Yes, each time MyData is overwritten, the old data in MyData will first
be freed.

> Let me know if (1)..(4) are correct. The bottom line is that I dont
> really need to worry about the ref counts of "temporary" lists (they
> aren't temporary) or what happens with my pOutList.

From your psudeo code, it looks like you likely don't need to do
anything with the ref counters for things to work out properly and for
Tcl to take over the lifetime management of the resulting list.

Robert Heller

unread,

Oct 24, 2022, 4:30:01 PM10/24/22

to

At Mon, 24 Oct 2022 12:34:50 -0700 (PDT) Gertjan <gho...@gmail.com> wrote:

>
> Hello,
>
> I spent a painful weekend trying to make more sensible Tcl list from data we generate in a C extension and to understand whether or not I am creating a memory leak. The return to the user is a List of List - pairs of time and ADC data in fact.
>
> I could replicate the actual code but I think the pseudo version will do. Its typical I think.
>
> INT32 GetMyData (ClientData clientdata, Tcl_Interp *interp, INT32 argc, const CHAR *argv[])
>
> Loop over N data elements:
> Tcl_Obj *pSubList = Tcl_NewListObj(0,NULL);
>
> Now put Time and a Float into pSubList using two calls to
> TclListObjAppendElement(interp, pSubList,....);
>
> //copy the sublist as a single element to the output list:
> Tcl_ListObjAppendElement(interp, pOutList, pSubList);
> EndLoop
> Tcl_SetObjResult(interp, pOutList);
> return TCL_OK;
>
> Works like a charm, I get { {time1}{data1} {time2}{data2}...
>
> Any C-programmer not terribly familiar with the Tcl API should immediately be concerned about the repeated Tcl_NewListObj(). Who cleans up ?
>
> I made mistaken attempts to decrement the ref counter on pSubList at the end of the loop - resulting in various seg faults.
> Having gained more understanding, perhaps some one can judge the following statements:
>
> 1. An AppendElement API call doesn't extend the size of the pOutList - it
> merely make a reference to the pSubList in pOutList. For that reason, the
> AppendElement call increments the ref counter on pSubList - it is now
> 'shared' and needs to stay alive.

Yes.

>
> 2. The AppendElement API does *not* increment pOutList ref counter because
> that list is not shared anywhere. In fact the call will fail it is ever sees
> pOutList with a non-zero reference counter

Yes.

>
> 3. The Tcl_SetObjResult(interp, pOutList); increments the Ref counter of
> pOutlist because the interpreter needs these data and it should not be
> cleaned up.

Yes.

>
> Now for the final statement. Lets say I call my C function in a loop:
>
> While { 1 } {
> set MyData [GetMyData ]
> }
>
> 4. The Tcl interpreters will recursively Free the memory for ALL the lists
> objects created in a previous call to GetMyData upon executing the next one
> when $MyData is over-written. Right ? Otherwise we have a memory leak.

Yes.

>
> Let me know if (1)..(4) are correct. The bottom line is that I dont really
> need to worry about the ref counts of "temporary" lists (they aren't
> temporary) or what happens with my pOutList.
>
> Much appreciated
>
> Gertjan

I believe you are correct.

Note: if you have C/C++ code that calls Tcl_Alloc for "internal" use, that
code needs to call Tcl_Free (eventually). Tcl_NewObj() (which is called by all
of the Tcl_New<mumble> functions) will take of the ref-count and the various
functions that set, append, etc. TclObjs to variables and such will also
update ref-counts as needed.

>
>
>
>
>
>
>
>
>
>

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

heinrichmartin

unread,

Oct 24, 2022, 4:39:35 PM10/24/22

to

On Monday, October 24, 2022 at 9:34:52 PM UTC+2, Gertjan wrote:
> Hello,
>
> I spent a painful weekend trying to make more sensible Tcl list from data we generate in a C extension and to understand whether or not I am creating a memory leak. The return to the user is a List of List - pairs of time and ADC data in fact.
>
> I could replicate the actual code but I think the pseudo version will do. Its typical I think.
>
> INT32 GetMyData (ClientData clientdata, Tcl_Interp *interp, INT32 argc, const CHAR *argv[])
>
> Loop over N data elements:
> Tcl_Obj *pSubList = Tcl_NewListObj(0,NULL);

"The new list value returned by Tcl_NewListObj has reference count zero."; it will be incremented only in Tcl_ListObjAppendElement.
This also means, you must free it explicitly if you return prematurely.

About the same idea applies to the inner elements and the outer list.

Side-note: you could initialize the size with two, i.e. (2, NULL). "If objv is NULL, the resulting list contains 0 elements, with reserved space in an internal representation for objc more elements (to avoid its reallocation later)."

> //copy the sublist as a single element to the output list:

You have mentioned shared objects (copy on edit), i.e. "copy" is not exactly correct here.

> Any C-programmer not terribly familiar with the Tcl API should immediately be concerned about the repeated Tcl_NewListObj(). Who cleans up ?

Tcl, when the ref count drops to zero.

> 1. An AppendElement API call doesn't extend the size of the pOutList - it merely make a reference to the pSubList in pOutList. For that reason, the AppendElement call increments the ref counter on pSubList - it is now 'shared' and needs to stay alive.

It extends the size (by the definition of its semantics), but it does not create a string rep for the extended list. The consequence is correct anyway.

> 2. The AppendElement API does *not* increment pOutList ref counter because that list is not shared anywhere. In fact the call will fail it is ever sees pOutList with a non-zero reference counter

Yes, it does not touch the ref count, but I assume (don't know exactly) it is the caller's responsibility to not call it on shared lists (i.e. I assume no error on C level).

> 3. The Tcl_SetObjResult(interp, pOutList); increments the Ref counter of pOutlist because the interpreter needs these data and it should not be cleaned up.

Yes, increment from 0 to 1 in this case.

> Now for the final statement. Lets say I call my C function in a loop:
>
> While { 1 } {
> set MyData [GetMyData ]
> }
>
> 4. The Tcl interpreters will recursively Free the memory for ALL the lists objects created in a previous call to GetMyData upon executing the next one when $MyData is over-written. Right ? Otherwise we have a memory leak.

Not just _list_ objects. And replace free with decrement the ref count, i.e. you are not leaking unless the inner elements are leaking (in the part that is not part of your pseudo code).

Gertjan

unread,

Oct 24, 2022, 5:20:29 PM10/24/22

to

Rich,Robert, Martin,

Slight inconsistencies in my language aside, I think I am on the right track and thank you all for confirming. I have a feeling a bit of basic knowledge like this should find it self at the top of the API documentation. The Wiki has some pages on dec/inc ref counters but it doesn't clarify the fundamentals: objects become shared, appends do not "expand/copy" and this is why if you do any operations like 'appending' list A to list B, then to modify list A lafter you need to duplicate it first.
i
There is a further consequence here. If in a script you have a very large list A and append/concat to list B, there is almost no CPU overhead. But then if you modify list A later, there must be significant overhead as list A now must be duplicated. i.e.

5. In the following code:

set A { some very large list}
lappend B {*}$A
lappend A {"hello world"}

there is almost no CPU overhead for the first lappend, which adds a massive list A to B. But a single item addition on the 2nd lappend is quite costly. Correct ?

Again, I appreciate the fast response I got here.

Cheers

G

heinrichmartin

unread,

Oct 24, 2022, 6:37:36 PM10/24/22

to

On Monday, October 24, 2022 at 11:20:29 PM UTC+2, Gertjan wrote:
> Slight inconsistencies in my language aside, I think I am on the right track and thank you all for confirming. I have a feeling a bit of basic knowledge like this should find it self at the top of the API documentation. The Wiki has some pages on dec/inc ref counters but it doesn't clarify the fundamentals: objects become shared, appends do not "expand/copy" and this is why if you do any operations like 'appending' list A to list B, then to modify list A lafter you need to duplicate it first.
> i
> There is a further consequence here. If in a script you have a very large list A and append/concat to list B, there is almost no CPU overhead. But then if you modify list A later, there must be significant overhead as list A now must be duplicated. i.e.

Indeed, knowing the internals can speed up scripts (or the other way round, you might encounter unexpectedly slow sections if you were not aware of some internals). Shimmering (i.e. switching between representations) is one example - not the one that you are after, but a simple one:
Creating a long list by appending items in a (tight) loop is fast in Tcl until you log the list in every iteration. File/console operation is the expected factor, but actually creating the string representation is an unexpected one (in a language that emphasizes "everything is a string"). On the other hand, this makes Tcl great imho - be flexible while coding, be fast if possible.

> set A { some very large list}
> lappend B {*}$A

This requires at least:
- realloc in B
- memcopy pointers from A to end of B
- ref incr on all elements of A

> lappend A {"hello world"}

This might not even require a realloc, because Tcl allocates internal list presentation progressively.
It duplicates the object in A only if it is shared, which is not the case here.

Side-note: [lindex $A end] will be a string including the quotation marks; this might or might not be expected.

> there is almost no CPU overhead for the first lappend, which adds a massive list A to B. But a single item addition on the 2nd lappend is quite costly. Correct ?

I'd say the first operation is more expensive than the second one.

Btw, ::tcl::unsupported::representation is quite nice to learn about internals, but an interactive session might thwart your efforts in two ways:
1. it creates string representations to display results
2. it might hold copies of parameters in the command history

Ralf Fassel

unread,

Oct 25, 2022, 4:52:49 AM10/25/22

to

* Robert Heller <hel...@deepsoft.com>

| > 2. The AppendElement API does *not* increment pOutList ref counter because
| > that list is not shared anywhere. In fact the call will fail it is ever sees
| > pOutList with a non-zero reference counter
>
| Yes.

Nitpick: [...] In fact the call will fail it is ever sees pOutList with
a reference counter *greater than 1* (i.e. a shared list obj).

R'

Harald Oehlmann

unread,

Oct 25, 2022, 6:07:27 AM10/25/22

to

Dear Folks,

I appreciate this thread, as it is a common issue.
I personally got the basics from the outdated book by Jeff Hobbs.

A good wiki page is:

https://wiki.tcl-lang.org/page/Tcl%5FObj+refCount+HOWTO

As a fist documentation step, additionaly information may be added there.

Thank you all,
Harald

clt.to...@dfgh.net

unread,

Oct 25, 2022, 12:58:56 PM10/25/22

to

>From: Harald Oehlmann <wort...@yahoo.com>
>Date: Tue Oct 25 10:07:16 GMT 2022
>Subject: Understanding Tcl memory management of list created in C

Another interesting nitpick

set B [list {*}$lA]

is about twice as fast as:

lappend B {*}$A

for long lists on my machine, and does not assume B starts empty, or does not yet exist.

In any case I can think of, a fast and safe way to copy a long list (saved in A), then append "zzz" to the copy (while saving the original list in A) would be:

set B $A
lappend B zzz

In this case the first step is very fast (copying a pointer to the original list), and the second is slow (because the original list is duplicated by Tcl's internal copy on write logic before it is modified). Which is very close to Gertjan's thoughts on relative timings.

Dave B

Gertjan

unread,

Oct 25, 2022, 6:03:31 PM10/25/22

to

Martin,

Since this is a conceptual learning exercise I need to ask you to clarify this response of your or it ruins what I think I learned :)

1. You said that in my example:

set A { some very large list}
lappend B {*}$A

lappend A {"hello world"}

That the last line "does not even require a realloc....it duplicates only if A is shared". But A IS shared - it becomes shared once lappend B{*}$A is executed. Right ?

2. Perhaps related: "I'd say the first operation is more expensive than the second one.". Again - no - the 1st lappend is cheap because B is merely linked to A. But the 2nd is very expensive because that is when A is necessarily duplicated and may not be modified since it's Ref count >= 1.

Appreciate your thoughts.

Thanks - G

Ralf Fassel

unread,

Oct 26, 2022, 9:31:35 AM10/26/22

to

* Gertjan <gho...@gmail.com>

| 1. You said that in my example:
>
| set A { some very large list}
| lappend B {*}$A
| lappend A {"hello world"}
>
| That the last line "does not even require a realloc....it duplicates
| only if A is shared". But A IS shared - it becomes shared once
| lappend B{*}$A is executed. Right ?

Wrong, since

lappend B {*}$A

does *not* lappend A, it lappends every element of A.
The refcount of every element of A is incr'd after that, but A's refcount itself is not:

% set A [list some very large list]
some very large list
% ::tcl::unsupported::representation $A
value is a list with a refcount of 3, object pointer at 0x1afeeb0, internal representation 0x1b4e360:(nil), string representation "some very lar..."

% lappend B {*}$A
some very large list
% ::tcl::unsupported::representation $A
value is a list with a refcount of 3, object pointer at 0x1b030b0, internal representation 0x1b2b3a0:(nil), string representation "some very lar..."

Note how the refcount did not change (don't ask me why it is *3* and not
1 or 2).

Now
lappend B $A
is a different beast, here indeed A is shared after that.

% lappend B $A
{some very large list}
% ::tcl::unsupported::representation $A
value is a list with a refcount of 4, object pointer at 0x1b030b0, internal representation 0x1b2b3a0:(nil), string representation "some very lar..."

Note how the refcount *did* change.

HTH
R'

Harald Oehlmann

unread,

Oct 26, 2022, 10:04:06 AM10/26/22

to

Am 26.10.2022 um 15:31 schrieb Ralf Fassel:> Note how the refcount did

not change (don't ask me why it is *3* and not
> 1 or 2).

That is due to the history package, active in interactive mode.
The history package makes copies of each command, which increments the
ref-count of all objects.

% set A { some very large list};
some very large list

% lappend B {*}$A
some very large list
% ::tcl::unsupported::representation $A

value is a list with a refcount of 4, object pointer at 008767C0,
internal representation 03EB5D78:00000000, string representation " some
very la..."
% history clear ; ::tcl::unsupported::representation $A
value is a list with a refcount of 2, object pointer at 008767C0,
internal representation 03EB5D78:00000000, string representation " some
very la..."

Another surprising way of refcount increment is, if an object is the
result of a command and thus copied to the interpreter result field.

Harald

Gertjan

unread,

Oct 26, 2022, 5:15:54 PM10/26/22

to

Harald, Ralf - Thanks. My example was sloppy, I should have used lappend B $A. Well, perhaps ignorance, not sloppy: I did not consider the ref count of individual list objects versus the list itself. And indeed, in my C function, where I first run into this, I am appending $A as a single item to B, so observe the refcount increment on A.

Again, much appreciated.

Gertjan