Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Tcl Performance hotspot in TclFreeObj() (while migrating from 8.4 to 8.6)

385 views
Skip to first unread message

Trey Jackson

unread,
Sep 5, 2018, 11:04:58 PM9/5/18
to
All,

I'm tracking down performance issues during a porting project and have a question as to how to do things differently to avoid a performance hotspot.
Tcl 8.6.8 (and Tcl 8.4.19)

My code is spending a non-trivial amount of time (6% of time) inside of CreateHashEntry in Tcl 8.6, where it spends basically 0% of time. The majority of the CreateHashEntry is coming through the TclFreeObj() calls.

Looking at the source (it's not compiled with TCL_MEM_DEBUG defined), it appears that this chunk of code in TclFreeObj() is the culprit:

/*
* We cannot use TclGetContinuationTable() here, because that may
* re-initialize the thread-data for calls coming after the finalization.
* We have to access it using the low-level call and then check for
* validity. This function can be called after TclFinalizeThreadData() has
* already killed the thread-global data structures. Performing
* TCL_TSD_INIT will leave us with an un-initialized memory block upon
* which we crash (if we where to access the uninitialized hashtable).
*/

{
ThreadSpecificData *tsdPtr = TCL_TSD_INIT(&dataKey);
Tcl_HashEntry *hPtr;

if (tsdPtr->lineCLPtr) {
hPtr = Tcl_FindHashEntry(tsdPtr->lineCLPtr, objPtr);
if (hPtr) {
ckfree(Tcl_GetHashValue(hPtr));
Tcl_DeleteHashEntry(hPtr);
}
}
}

So, my question is, why might this be a hot spot?

I don't have threaded code. These tests are not constantly sourcing files - so I don't really understand why that line continuation pointers would be of much use. A solid third to half of the 6% runtime is actually spent deleting Tcl_Obj which are dynamically created commands with no continuations at all.

I'm in the process of building an instrumented version of the Tcl source and will be using that to dig into this. But I'd appreciate any tips on what I could do to avoid this problem, as well as how to better characterize this (what kind of information to provide) that might help explain the hotspot.


thanks,

TJ

Donal K. Fellows

unread,
Sep 7, 2018, 3:39:06 AM9/7/18
to
On 06/09/2018 04:04, Trey Jackson wrote:
> I don't have threaded code. These tests are not constantly sourcing files - so I don't really understand why that line continuation pointers would be of much use. A solid third to half of the 6% runtime is actually spent deleting Tcl_Obj which are dynamically created commands with no continuations at all.

There are two uses of hash tables in TclFreeObj itself.

1. For tracking what objects are owned by what thread. (Bad Things
Happen when objects cross threads, so we take special steps to prevent
it and give just a conventional *and reliable* panic.)
2. For tracking the provenance of strings derived (by backslash
substitution) from script literals. This lets you get accurate line
numbers of code out of [info frame], but the machinery for doing this is
expensive (and quite confusing for most Tcl core developers). This is
because we don't store a record of whether we care about this in the
values where we care, and that's because the Tcl_Obj structure is a
major memory stress point.

BTW, neither use actually creates in CreateHashEntry; that's also the
lookup function (because :IMPLEMENTATION-DETAIL: the OUT parameter to
indicate whether the value was allocated is NULL[*]).

Donal.
[* IMPLEMENTATION-DETAILs should not be relied on by user code.
Especially this one. ]
--
Donal Fellows — Tcl user, Tcl maintainer, TIP editor.

Trey Jackson

unread,
Sep 14, 2018, 6:34:40 PM9/14/18
to
I disabled the use of the hash in the second case (tracking the provenance of strings) - and the runtime associated with CreateHashEntry disappeared completely.

I'm seeing this cost associated with maintaining the line numbers consistently being a hot spot in most of my tests (hot spot meaning, it's one of the leaf nodes in the performance tree that's taking the most time, i.e. one of the first things you would try to fix).

From how I interpreted your response, it sounds like this functionality (maintaining line numbers) is expensive and confusing, but probably not an easy thing to disable/remove.

Is that correct? Would you have any other suggestions/pointers for me to look at?


thanks.

Andreas Kupries

unread,
Sep 18, 2018, 2:38:06 AM9/18/18
to
Trey Jackson <bigfa...@gmail.com> writes:

> I disabled the use of the hash in the second case (tracking the
> provenance of strings) - and the runtime associated with
> CreateHashEntry disappeared completely.

For context. This was all done quite a few years ago (2004) for
... some customer of ActiveState. They wanted this provenance tracking
for the easier debugging of problematic scripts, i.e. easier finding
of which file the troublesome code came from, even if the code was in
some deeper eval/uplevel for the body-script of a custom-coded control
command, etc.

See TIP #280, and #378

https://core.tcl-lang.org/tips/doc/trunk/tip/280.md
https://core.tcl-lang.org/tips/doc/trunk/tip/378.md

The really big constraint was the inability to change public data
structures like Tcl_Obj in any way. No additional fields were possible

The only solution seen at the time (and seen still (to me)) was the
introduction of a hashtable keyed by the Tcl_Obj address/pointer to
hold the additional information.

The code might be more confusing than it has to be because of that.

It also pays a price in more effort required to manage all the new
information, what with having to do (Find|Create|Delete)HashEntry
instead of a simple struct field access/dereference, etc.

All the relevant places (which directly access lineCLPtr) can be found
with simple grep

hephaistos:(504) ~/Play/CoreTcl/tcl-8.7 > grep -rn lineCLPtr .

./ChangeLog:7158: [Bug 2871908]: Enforce separation of concerns between the lineCLPtr
./ChangeLog:7186: and lineCLPtr hashtables. Also make the names of the continuation

./tclInt.h:1259: * file "tclBasic.c", and stored in the thread-global hashtable "lineCLPtr" in

./tclCompile.c:809: * found in this file. The "lineCLPtr" hashtable is managed in the file
./tclBasic.c:6197: * function, after the evaluator is done. The relevant "lineCLPtr"

./tclObj.c:79: Tcl_HashTable *lineCLPtr; /* This table remembers for each Tcl_Obj

./tclObj.c:541: if (!tsdPtr->lineCLPtr) {
./tclObj.c:542: tsdPtr->lineCLPtr = ckalloc(sizeof(Tcl_HashTable));
./tclObj.c:543: Tcl_InitHashTable(tsdPtr->lineCLPtr, TCL_ONE_WORD_KEYS);
./tclObj.c:576: Tcl_CreateHashEntry(tsdPtr->lineCLPtr, objPtr, &newEntry);

./tclObj.c:731: Tcl_FindHashEntry(tsdPtr->lineCLPtr, originObjPtr);
./tclObj.c:765: Tcl_FindHashEntry(tsdPtr->lineCLPtr, objPtr);
./tclObj.c:803: for (hPtr = Tcl_FirstHashEntry(tsdPtr->lineCLPtr, &hSearch);
./tclObj.c:808: Tcl_DeleteHashTable(tsdPtr->lineCLPtr);
./tclObj.c:809: ckfree(tsdPtr->lineCLPtr);
./tclObj.c:810: tsdPtr->lineCLPtr = NULL;

./tclObj.c:1404: if (tsdPtr->lineCLPtr) {
./tclObj.c:1405: hPtr = Tcl_FindHashEntry(tsdPtr->lineCLPtr, objPtr);
./tclObj.c:1495: if (tsdPtr->lineCLPtr) {
./tclObj.c:1496: hPtr = Tcl_FindHashEntry(tsdPtr->lineCLPtr, objPtr);

These are a number of `TclContinuations...` functions which
encapsulate the access and are called by other parts of the core.

> I'm seeing this cost associated with maintaining the line numbers
> consistently being a hot spot in most of my tests (hot spot meaning,
> it's one of the leaf nodes in the performance tree that's taking the
> most time, i.e. one of the first things you would try to fix).

> From how I interpreted your response, it sounds like this
> functionality (maintaining line numbers) is expensive and confusing,
> but probably not an easy thing to disable/remove.

> Is that correct? Would you have any other suggestions/pointers for
> me to look at?

Thinking about things, with the freeing of the information in
TclFreeObj disabled I have two main thoughts

(1) Freeing the CL data of a Tcl_Obj is mostly defered to the end of
the thread, when the thread exit handler
`TclThreadFinalizeContLines` gets invoked.

So, better perf, price is a larger heap.

(2) When a Tcl_Obj is released and its memory is reused, and the new
Tcl_Obj also needs lineCLPtr data.

Then the freeing of the old data happens in
`TclContinuationsEnter` as the system enters the information for
the new Tcl_Obj.

See the comments in that piece of code. This code replaces the
existing data with anything new, releasing the old allocation. And
while this was originally done for the case of shared literals as
the comments exhaustively explain; this looks to be the right
thing to do for a recycled Tcl_Obj* as well.


When you have the code in TclFreeObj disabled, does the core still
pass the test suite ? If you run it udner valgrind, and -DPURIFY, does
it leak memory it was not before you disabled the code ?

Both tests passing and nothing new leaking would be indicators that
the change, the disabling, could be benign.


For a longer-term solution Don's TIP #445 is a start of hiding the
Tcl_Obj details behind a set of functions. The moment that is
completed, i.e. Tcl_Obj not public anymore, then the data currently
held in these adjunct hash tables can move into new fields of the
Tcl_Obj structure, for much faster direct access.

https://core.tcl-lang.org/tips/doc/trunk/tip/445.md


> thanks.

> On Friday, September 7, 2018 at 12:39:06 AM UTC-7, Donal K. Fellows wrote:
>> On 06/09/2018 04:04, Trey Jackson wrote:

>> > I don't have threaded code. These tests are not constantly
>> > sourcing files - so I don't really understand why that line
>> > continuation pointers would be of much use. A solid third to
>> > half of the 6% runtime is actually spent deleting Tcl_Obj which
>> > are dynamically created commands with no continuations at all.

>> 2. For tracking the provenance of strings derived (by backslash
>> substitution) from script literals. This lets you get accurate line
>> numbers of code out of [info frame], but the machinery for doing
>> this is expensive (and quite confusing for most Tcl core
>> developers). This is because we don't store a record of whether we
>> care about this in the values where we care, and that's because the
>> Tcl_Obj structure is a major memory stress point.

>> BTW, neither use actually creates in CreateHashEntry; that's also the
>> lookup function (because :IMPLEMENTATION-DETAIL: the OUT parameter to
>> indicate whether the value was allocated is NULL[*]).
>>
>> Donal.
>> [* IMPLEMENTATION-DETAILs should not be relied on by user code.
>> Especially this one. ]
>> --
>> Donal Fellows — Tcl user, Tcl maintainer, TIP editor.
>

--
See you,
Andreas Kupries <akup...@shaw.ca>
<http://core.tcl.tk/akupries/>
Developer @ SUSE (MicroFocus Canada LLC)
<andreas...@suse.com>

EuroTcl 2018, Jul 7-8, Munich/DE, http://eurotcl.eu/
Tcl'2018, Oct 15-19, Houston, TX, USA. https://www.tcl.tk/community/tcl2018/
-------------------------------------------------------------------------------

Trey Jackson

unread,
Sep 19, 2018, 1:23:10 AM9/19/18
to
On Monday, September 17, 2018 at 11:38:06 PM UTC-7, Andreas Kupries wrote:
> Trey Jackson <...> writes:
>
> > I disabled the use of the hash in the second case (tracking the
> > provenance of strings) - and the runtime associated with
> > CreateHashEntry disappeared completely.
>
> For context. This was all done quite a few years ago (2004) for
> ... some customer of ActiveState. They wanted this provenance tracking
> for the easier debugging of problematic scripts, i.e. easier finding
> of which file the troublesome code came from, even if the code was in
> some deeper eval/uplevel for the body-script of a custom-coded control
> command, etc.
>
> See TIP #280, and #378
>
> https://core.tcl-lang.org/tips/doc/trunk/tip/280.md
> https://core.tcl-lang.org/tips/doc/trunk/tip/378.md

The context helps, thanks. I agree with the desired functionality - it's very useful.

Reading tip 378 - from reading the text, it seemed like this behavior would be disabled w/out the compile/runtime flag, but that flag doesn't appear to affect the lineCLPtr code. I enabled the flag for one of my performance runs and saw no real difference in runtime.

> The really big constraint was the inability to change public data
> structures like Tcl_Obj in any way. No additional fields were possible
>
> The only solution seen at the time (and seen still (to me)) was the
> introduction of a hashtable keyed by the Tcl_Obj address/pointer to
> hold the additional information.
>
> The code might be more confusing than it has to be because of that.
>
> It also pays a price in more effort required to manage all the new
> information, what with having to do (Find|Create|Delete)HashEntry
> instead of a simple struct field access/dereference, etc.
>
> All the relevant places (which directly access lineCLPtr) can be found
> with simple grep
>
> hephaistos:(504) ~/Play/CoreTcl/tcl-8.7 > grep -rn lineCLPtr .
>
[snip]
> Thinking about things, with the freeing of the information in
> TclFreeObj disabled I have two main thoughts
>
> (1) Freeing the CL data of a Tcl_Obj is mostly defered to the end of
> the thread, when the thread exit handler
> `TclThreadFinalizeContLines` gets invoked.
>
> So, better perf, price is a larger heap.
>
> (2) When a Tcl_Obj is released and its memory is reused, and the new
> Tcl_Obj also needs lineCLPtr data.
>
> Then the freeing of the old data happens in
> `TclContinuationsEnter` as the system enters the information for
> the new Tcl_Obj.
>
> See the comments in that piece of code. This code replaces the
> existing data with anything new, releasing the old allocation. And
> while this was originally done for the case of shared literals as
> the comments exhaustively explain; this looks to be the right
> thing to do for a recycled Tcl_Obj* as well.
>
>
> When you have the code in TclFreeObj disabled, does the core still
> pass the test suite ? If you run it udner valgrind, and -DPURIFY, does
> it leak memory it was not before you disabled the code ?
>
> Both tests passing and nothing new leaking would be indicators that
> the change, the disabling, could be benign.

I'll run the tests, but I imagine there would be some leakage due to hash values never being freed. I can extend my disabling to include all the other uses of lineCLPtr (prevent it from being initialized at all) - that would seem to be the way to go to avoid memory leaks, but I would think it would cripple the functionality.

Andreas Kupries

unread,
Sep 19, 2018, 4:08:07 AM9/19/18
to
Trey Jackson <bigfa...@gmail.com> writes:

> On Monday, September 17, 2018 at 11:38:06 PM UTC-7, Andreas Kupries wrote:
>> Trey Jackson <...> writes:
>>
>> > I disabled the use of the hash in the second case (tracking the
>> > provenance of strings) - and the runtime associated with
>> > CreateHashEntry disappeared completely.
>>
>> For context. This was all done quite a few years ago (2004) for
>> ... some customer of ActiveState. They wanted this provenance tracking
>> for the easier debugging of problematic scripts, i.e. easier finding
>> of which file the troublesome code came from, even if the code was in
>> some deeper eval/uplevel for the body-script of a custom-coded control
>> command, etc.
>>
>> See TIP #280, and #378
>>
>> https://core.tcl-lang.org/tips/doc/trunk/tip/280.md
>> https://core.tcl-lang.org/tips/doc/trunk/tip/378.md
>
> The context helps, thanks. I agree with the desired functionality - it's very useful.

> Reading tip 378 - from reading the text, it seemed like this
> behavior would be disabled w/out the compile/runtime flag, but that
> flag doesn't appear to affect the lineCLPtr code. I enabled the
> flag for one of my performance runs and saw no real difference in
> runtime.

I mentioned 378 for completeness. It disables the most expensive parts
of the functionality. The lineCLPtr stuff is not affected by the flag
(At the time it was not seen as expensive, I believe). Note also that
the flag is off by default, and can only be switched on, activating
the really expensive things.

>> Thinking about things, with the freeing of the information in
>> TclFreeObj disabled I have two main thoughts
>>
>> (1) Freeing the CL data of a Tcl_Obj is mostly defered to the end of
>> the thread, when the thread exit handler
>> `TclThreadFinalizeContLines` gets invoked.
>>
>> So, better perf, price is a larger heap.
>>
>> (2) When a Tcl_Obj is released and its memory is reused, and the new
>> Tcl_Obj also needs lineCLPtr data.
>>
>> Then the freeing of the old data happens in
>> `TclContinuationsEnter` as the system enters the information for
>> the new Tcl_Obj.
>>
>> See the comments in that piece of code. This code replaces the
>> existing data with anything new, releasing the old allocation. And
>> while this was originally done for the case of shared literals as
>> the comments exhaustively explain; this looks to be the right
>> thing to do for a recycled Tcl_Obj* as well.
>>
>>
>> When you have the code in TclFreeObj disabled, does the core still
>> pass the test suite ? If you run it udner valgrind, and -DPURIFY, does
>> it leak memory it was not before you disabled the code ?
>>
>> Both tests passing and nothing new leaking would be indicators that
>> the change, the disabling, could be benign.

> I'll run the tests, but I imagine there would be some leakage due to
> hash values never being freed.

Actually, that is why I asked to check. Because of (1) and (2) I have
com to believe that leakage actually might not happen. If the Tcl_Obj
slot is re-used after a deletion then a `TclContinuationsEnter` may
happen to free the old information just before the new comes in. And
everything not caught by that should be caught and freed by the thread
exit handler `TclThreadFinalizeContLines`. Ditto anything still alive
at the time.

Apologies that this apparently did not come across clearly.

> I can extend my disabling to include all the other uses of lineCLPtr
> (prevent it from being initialized at all) - that would seem to be
> the way to go to avoid memory leaks, but I would think it would
> cripple the functionality.

Definitely cripple.

Trey Jackson

unread,
Oct 8, 2018, 5:41:17 PM10/8/18
to
I finally ran the test suite with the disabling the last condition of TclFreeObj().

Without my change, the test results were:

all.tcl: Total 31340 Passed 30232 Skipped 1108 Failed 0

With my change, the test results were:

all.tcl: Total 31340 Passed 30220 Skipped 1108 Failed 12
Sourced 148 Test Files.
Files with failing tests: assemble.test dict.test info.test


Not sure what to do with this information.



The specific test failures were:

==== dict-23.6 CompileWord OBOE FAILED
==== Contents of test case:

apply {n {
dict get {a b} {*}{
} [return [incr n -[linenumber]]]
}} [linenumber]

---- Result was:
2
---- Result should have been (exact matching):
1
==== dict-23.6 FAILED


The info.test failures all dealt with the line numbers being different, like (just one of the failed tests):

==== info-30.40 TIP 280 for compiled [subst] FAILED
==== Contents of test case:

set empty {}
subst {$empty[reduce [info frame 0]]} ; # 1782

---- Result was:
type source line 1784 file info.test cmd {info frame 0} proc ::tcltest::RunTest
---- Result should have been (exact matching):
type source line 1782 file info.test cmd {info frame 0} proc ::tcltest::RunTest
==== info-30.40 FAILED


The assemble tests seem a little more serious:

==== assemble-30.4 throw in wrong context FAILED
==== Contents of test case:
proc x {} {
list [catch {
assemble {
beginCatch @bad
push error
eval { list [catch {error whatever} result] $result }
invokeStk 2
push 0
reverse 2
jump @done

label @bad
load x
pushResult

label @done
endCatch
list 2
}
} result] $result $::errorCode [split $::errorInfo \n]
}
x
---- Result was:
1 {"loadScalar1" instruction may not appear in a context where an exception has been caught and not disposed of.} {TCL ASSEM BADTHROW} {{"loadScalar1" instruction may not appear in a context where an exception has been caught and not disposed of.} { in assembly code between lines 11 and 16} { ("assemble" body, line 1)} { invoked from within} \"assemble\ \{ { beginCatch @bad} { push error} { eval { list [catch {error whatever} result] $result }} { invokeStk 2} { push 0} { rev..."}}
---- Result should have been (glob matching):
1 {"loadScalar1" instruction may not appear in a context where an exception has been caught and not disposed of.} {TCL ASSEM BADTHROW} {{"loadScalar1" instruction may not appear in a context where an exception has been caught and not disposed of.} { in assembly code between lines 10 and 15}*}
==== assemble-30.4 FAILED



==== assemble-41.1 Inconsistent stack usage FAILED
==== Contents of test case:

proc x {y} {
assemble {
load y
jumpFalse else
push 0
jump then
label else
push 1
push 2
label then
pop
}
}
catch {x 1}
set errorInfo

---- Result was:
inconsistent stack depths on two execution paths
("assemble" body, line 12)
invoked from within
"assemble {
load y
jumpFalse else
push 0
jump then
label else
push 1
push 2
label then
pop
}"
(procedure "x" line 2)
invoked from within
"x 1"
---- Result should have been (glob matching):
inconsistent stack depths on two execution paths
("assemble" body, line 10)*
==== assemble-41.1 FAILED



==== assemble-41.2 Inconsistent stack, jumptable and default FAILED
==== Contents of test case:
proc x {y} {
assemble {
load y
jumpTable {0 else}
push 0
label else
pop
}
}
catch {x 1}
set errorInfo
---- Result was:
inconsistent stack depths on two execution paths
("assemble" body, line 7)
invoked from within
"assemble {
load y
jumpTable {0 else}
push 0
label else
pop
}"
(procedure "x" line 2)
invoked from within
"x 1"
---- Result should have been (glob matching):
inconsistent stack depths on two execution paths
("assemble" body, line 6)*
==== assemble-41.2 FAILED

Donal K. Fellows

unread,
Oct 9, 2018, 6:38:53 AM10/9/18
to
On 08/10/2018 22:41, Trey Jackson wrote:
> Not sure what to do with this information.

It looks like all the failures are failures of line number tracking.

Donal.

Trey Jackson

unread,
Oct 9, 2018, 2:30:37 PM10/9/18
to
Right, that was kind of expected, yes? It'd be pretty magical if disabling a portion of a feature didn't cause problems.

While I appreciate the benefit of more accurate line numbering in the error messages/stack traces, I think that a 5-6% degradation in performance is a pretty high penalty to pay. Especially since the line numbers only matter when you're debugging, and the penalty happens all the time (deletion of Tcl objects happens through the lifetime of the application).

What would the cost be to provide an option (compile time or runtime) to disable this? Or would this kind of a change need to wait until Tip #445 is accomplished?

leon.m...@gmail.com

unread,
Dec 2, 2018, 2:42:42 PM12/2/18
to
Now there is a ticket filed (https://core.tcl.tk/tcl/tkttimeline/a09031e288) related to this. I am committed to implementing at least the quick fix for it.

leon.m...@gmail.com

unread,
Dec 10, 2018, 4:02:45 PM12/10/18
to
Hi Andreas & Donal,

I have a quick fix for this issue at https://core.tcl.tk/tcl/tkttimeline/a09031e288. Since I am a fresh user of the Tcl core fossil repository, with every action requiring approval by moderators, I am curious as to what would be the most efficient way for me to contribute this fix to Tcl core while bothering the Tcl core team as little as possible.

Best regard,
Leon

Harald Oehlmann

unread,
Dec 11, 2018, 2:27:34 AM12/11/18
to
Leon,

thank you for the message.

The moderator aüpproval is a miss-feature, making people like you not
feeling welcomed.
It was told to be a spam protection. I have never seen any spam...

My personal comment would be to write in the TIP what the patch is
about. For me, a pointer may be used to give rationale, but the main
lines should be in the ticket. We have no relyable way to refer to a clt
thread.

Not bothering the core people is IMHO not the right strategy. At the
end, everybody wants to help. The goal is to kame Tcl/Tk better.

So thank you for your patch,
Harald

leon.m...@gmail.com

unread,
Dec 13, 2018, 8:45:57 AM12/13/18
to
Harald,

Thank you for the info.

I signed up at the TIPs repository and created a new TIP in my local clone however I am still missing permissions to push it. I didn't find a way of requesting a commit privilege through the fossil web interface. How do I proceed from here?

Harald Oehlmann

unread,
Dec 14, 2018, 4:26:30 AM12/14/18
to
Leon,

it is a manual process.
I normally ask Andreas Kupries, but he is already on holiday.
I invite you to send a message on tcl-core mailing list to get the
privileges.

Thank you and best regards,
Harald
0 new messages