Better approach to extension-specific objects...

Mikhail Teterin

Jun 30, 2006, 5:30:27 PM

Consider the usual Tcl extension's use:

set foo [openfoo]
bar $foo
augment $foo
...
closefoo $foo

For many years extension writers have used the same inferior approach to
handling their own extension-specific objects. It went like this:

. keep a private array or a hash-table with pointers to the extension
specific structure type;
. treat object's string representation as an index into the array;
or the key into the hash-table;
. EVERY TIME, an object is needed, look it up in either the
array by index or in the hash-table by key.

This seems astoundingly wasteful in both time and memory... Why not,
instead, declare your own Tcl_ObjType? The Tcl_UpdateStringProc will simply
use sprintf(... "%p" ...) to create string representation (obj->bytes).

The Tcl_SetFromAnyProc will try to

sscanf(obj->bytes, "%p", &obj->internalRep.otherValuePtr);

This would avoid wasting memory on private arrays and hash-tables, as well
as wasting time performing lookups, almost every time an object is
returned.

In fact, the sprintf-ing and sscanf-ing would not even be necessary most of
the time in a regular script, as the string representations are rarely
used.

Should the Tcl core offer simple helper functions in its API to ease the
transition from the traditional "myobject%d" approach? I mean standard
functions that could be used directly as updateStringProc and
setFromAnyProc.

Something like below:

void
Tcl_PointerUpdateString(struct Tcl_Obj *O)
{
    /* Assumes "%p" yields at most "0x" plus two hex digits per byte. */
    O->bytes = ckalloc(sizeof(void *)*2 + 3);
    O->length = sprintf(O->bytes, "%p",
        O->internalRep.otherValuePtr);
}

and

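int
Tcl_PointerUpdatePointer(Tcl_Interp *I, struct Tcl_Obj *O)
{
    void *ptr;

    /* Parse first, so a failed conversion leaves the object untouched. */
    if (1 != sscanf(Tcl_GetString(O), "%p", &ptr)) {
        if (I != NULL)
            Tcl_AppendResult(I, Tcl_GetString(O),
                ": not a pointer", NULL);
        return TCL_ERROR;
    }
    /* Release the old internal representation before replacing it. */
    if (O->typePtr != NULL && O->typePtr->freeIntRepProc != NULL)
        O->typePtr->freeIntRepProc(O);
    O->internalRep.otherValuePtr = ptr;
    /* The extension's own setFromAnyProc must still set O->typePtr. */
    return TCL_OK;
}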

With these generics in place, extension-writers will have only the other two
Tcl_ObjType methods to worry about...

Opinions?

-mi

Bruce Hartweg

Jun 30, 2006, 6:14:45 PM

Inherently unsafe?
If I have an opaque id that maps internally to my object in a hash, and the
user mucks it up (or frees/closes it but keeps the handle around) and calls my
func with a bogus handle, it won't be in my hash and I can throw a Tcl-level
error, "no such ???" or "invalid ??". With your implementation, an invalid
handle would be turned into a random pointer, and then all kinds of nasty,
untraceable, strange and horrible things can happen.
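
A minimal sketch of the lookup-based validation described above, in plain C with no Tcl headers. Everything here is hypothetical and for illustration only: the names Foo, foo_open, foo_lookup and foo_close, the "foo%d" handle format, and the fixed-size array (a real extension would more likely use a Tcl_HashTable). The point is only that a handle string which does not name a live entry is rejected before any pointer is dereferenced.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FOO_MAX 16

/* Hypothetical extension-specific structure. */
typedef struct Foo { int payload; } Foo;

/* Slot i holds the object whose handle string is "foo<i>". */
static Foo *fooTable[FOO_MAX];

/* Create an object and write its handle name ("foo0", "foo1", ...) into
 * name. Returns 0 on success, -1 if the table is full or malloc fails. */
int foo_open(char *name, size_t nameSize)
{
    assert(name != NULL && nameSize > 0);
    for (int i = 0; i < FOO_MAX; i++) {
        if (fooTable[i] == NULL) {
            fooTable[i] = calloc(1, sizeof(Foo));
            if (fooTable[i] == NULL)
                return -1;
            snprintf(name, nameSize, "foo%d", i);
            return 0;
        }
    }
    return -1;
}

/* Map a handle string back to its object. Only the exact name of a live
 * slot matches; anything else yields NULL, so the caller can raise a
 * script-level "no such foo" error instead of following a bogus pointer. */
Foo *foo_lookup(const char *name)
{
    char expected[32];
    for (int i = 0; i < FOO_MAX; i++) {
        if (fooTable[i] == NULL)
            continue;
        snprintf(expected, sizeof expected, "foo%d", i);
        if (strcmp(name, expected) == 0)
            return fooTable[i];
    }
    return NULL;
}

/* Destroy the object behind a handle. Returns 0 on success, -1 for an
 * unknown or already closed handle. */
int foo_close(const char *name)
{
    Foo *f = foo_lookup(name);
    if (f == NULL)
        return -1;
    for (int i = 0; i < FOO_MAX; i++)
        if (fooTable[i] == f)
            fooTable[i] = NULL;
    free(f);
    return 0;
}
```

With this shape, calling foo_close twice on the same handle, or passing a string that merely looks like a pointer, ends in an error return rather than undefined behaviour.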

Bruce

Andrew Mangogna

Jun 30, 2006, 8:59:31 PM

Mikhail Teterin wrote:

> Consider the usual Tcl extension's use:
>
> set foo [openfoo]
> bar $foo
> augment $foo
> ...
> closefoo $foo
>
> For many years extension writers have used the same inferior approach to
> handling their own extension-specific objects. It went like this:
>
> . keep a private array or a hash-table with pointers to the extension
> specific structure type;
> . treat object's string representation as an index into the array;
> or the key into the hash-table;
> . EVERY TIME, an object is needed, look it up in either the
> array by index or in the hash-table by key.
>
> This seems astoundingly wasteful in both time and memory... Why not,
> instead, declare your own Tcl_ObjType? The Tcl_UpdateStringProc will
> simply use sprintf(... "%p" ...) to create string representation
> (obj->bytes).
>

[snip]
> Opinions?
>
> -mi

I'm of the opinion that extensions that introduce a new Tcl_ObjType should
provide a true string representation of that type that can serve as both a
literal value of the type and is the serializable representation. In that
case the "updateString" and "setFromAny" functions have real work to do to
reconstruct the external string representation and parse any string
representation into internal form. I find string values such
as "myEx00B708C" to be less than useful for entities that represent a true
value. Such "handles" are fine for entities like channels that don't
represent a set of values (and in the case of channels do not define a
Tcl_ObjType).

--
Andrew Mangogna

George Peter Staplin

Jul 2, 2006, 7:10:33 PM

Mikhail Teterin wrote:
> Consider the usual Tcl extension's use:
>
> set foo [openfoo]
> bar $foo
> augment $foo
> ...
> closefoo $foo
>
> For many years extension writers have used the same inferior approach to
> handling their own extension-specific objects. It went like this:
>
> . keep a private array or a hash-table with pointers to the extension
> specific structure type;
> . treat object's string representation as an index into the array;
> or the key into the hash-table;
> . EVERY TIME, an object is needed, look it up in either the
> array by index or in the hash-table by key.
>
> This seems astoundingly wasteful in both time and memory... Why not,
> instead, declare your own Tcl_ObjType? The Tcl_UpdateStringProc will simply
> use sprintf(... "%p" ...) to create string representation (obj->bytes).
>
> The Tcl_SetFromAnyProc will try to
>
> sscanf(obj->bytes, "%p", &obj->internalRep.otherValuePtr);
>
> This would avoid wasting memory on private arrays and hash-tables, as well
> as wasting time performing lookups, almost every time an object is
> returned.

I had the same thoughts before, until Kevin B. Kenny of the Tcl core
team explained the problems with that approach. I implemented a binary
tree algorithm using such an approach, but I wouldn't recommend it.

Regarding safety and your method: if an object is retained or generated
that has the string representation of a previous object, then bad things
can happen.

The principle behind Tcl handles is that you can avoid lookup in a hash
table most of the time. You need only do hash lookup if the internal
representation is lost, due to shimmering, or an epoch change. You also
can verify with the hash table if the structure associated with the
object is no longer valid, or shimmer the Tcl_Obj internal
representation back to the proper value via the hash table.
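
The caching scheme in the paragraph above can be sketched roughly as follows, in plain C with stand-in types rather than the real Tcl_Obj. All names (Res, Value, res_from_value, tableEpoch, lookups) are hypothetical, and the linear scan stands in for a hash lookup. The cached pointer is trusted only while the epoch recorded with it matches the table's current epoch; otherwise the string is looked up again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8

/* Hypothetical resource managed by the extension. */
typedef struct Res { int inUse; char name[16]; } Res;

/* Stand-in for a Tcl_Obj: a string plus a cached internal representation. */
typedef struct Value {
    const char *bytes;   /* string representation, always authoritative */
    Res *cached;         /* cached pointer, may be stale or NULL */
    unsigned epoch;      /* table epoch at the time of caching */
} Value;

static Res table[SLOTS];
static unsigned tableEpoch = 1;  /* bumped whenever a resource is closed */
int lookups = 0;                 /* counts slow-path lookups, for illustration */

/* Register a resource under the given handle name. */
Res *res_open(const char *name)
{
    assert(strlen(name) < sizeof table[0].name);
    for (int i = 0; i < SLOTS; i++) {
        if (!table[i].inUse) {
            table[i].inUse = 1;
            strcpy(table[i].name, name);
            return &table[i];
        }
    }
    return NULL;
}

/* Close a resource; every cached pointer becomes suspect. */
void res_close(Res *r)
{
    r->inUse = 0;
    tableEpoch++;
}

/* Slow path: find a live resource by name (a hash lookup in real code). */
static Res *res_find(const char *name)
{
    lookups++;
    for (int i = 0; i < SLOTS; i++)
        if (table[i].inUse && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Resolve a value to its resource, or NULL if the handle is dead. */
Res *res_from_value(Value *v)
{
    if (v->cached != NULL && v->epoch == tableEpoch)
        return v->cached;              /* fast path, no lookup */
    v->cached = res_find(v->bytes);    /* revalidate through the table */
    v->epoch = tableEpoch;
    return v->cached;
}
```

Repeated use of the same value costs one lookup until something is closed; after a close, each value pays for one more lookup and a dead handle resolves to NULL instead of a dangling pointer.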

You may find this useful and relevant:
http://wiki.tcl.tk/13881

-George

Donal K. Fellows

Jul 3, 2006, 4:58:46 AM

Andrew Mangogna wrote:
> I'm of the opinion that extensions that introduce a new Tcl_ObjType should
> provide a true string representation of that type that can serve as both a
> literal value of the type and is the serializable representation.

Portable, robust, safe extensions should do that. By portable, I mean
that the values must be meaningful when sent to another Tcl interpreter
in another thread or process. By robust, I mean that the values must
survive loss of the internal representation and subsequent regeneration
of it. By safe, I mean that it must not be possible to synthesize an
illegal member of the type; it must be possible to make a decision for
any value whether it is definitely a member of the type or definitely
not, and this act is particularly important to get right if there are
raw pointers stored within the value. All of Tcl's core object types
are safe and robust, and most are portable; those that are not are
basically representations of things that are not meaningful outside a
single interpreter/process, such as array iterators, but even with
those you could write them to disk, throw the values away, read them
back as strings from the disk and use them again, since Tcl doesn't
couple the lifetime of the underlying entity to the lifetime of the
value.

If you're not concerned so much about portability, robustness and
safety (e.g. because it is an application-specific extension and you
can enforce correct object management at the app level) then you can
ignore those restrictions and put raw pointers in Tcl_Objs quite
handily. It works well for almost everything. :-) But if you do it, you
*must* not allow user input to be interpreted as one of these special
Tcl_Obj instances. And in fact, you can do a bit better when it comes
to closing out problems.

This is easy to do by requiring the object have a magic blessing in the
form of a custom Tcl_ObjType whose setFromAny function always fails.
This produces a type of object that is fragile (i.e. it can easily lose
its blessing) but safe (untrusted code *cannot* synthesize handles,
with a down-side that the values are only interpretable within one
process, i.e. no portability). Often that's good enough, especially if
the number of objects is very large; you don't usually want the code to
manage some master hashtable of handle names to become expensive to run
in itself (a problem I've hit in the past).
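
A rough sketch of the "blessed type whose setFromAny always fails" idea, using reduced stand-ins for Tcl_ObjType and Tcl_Obj so that it compiles without Tcl. The names ObjType, Obj, handleType, handle_new and handle_get, and the H_OK/H_ERROR codes, are assumptions made for illustration; a real extension would fill in a genuine Tcl_ObjType.

```c
#include <assert.h>
#include <stddef.h>

enum { H_OK = 0, H_ERROR = 1 };

/* Stand-ins for Tcl_ObjType and Tcl_Obj, reduced to the fields used here. */
struct Obj;
typedef struct ObjType {
    const char *name;
    int (*setFromAny)(struct Obj *obj);  /* string -> internal rep */
} ObjType;

typedef struct Obj {
    const char *bytes;       /* string representation */
    const ObjType *typePtr;  /* NULL means "pure string" */
    void *ptr;               /* internal representation */
} Obj;

/* Conversion from a string is refused unconditionally, so no string,
 * however well crafted, can be turned into a handle. */
int handle_set_from_any(Obj *obj)
{
    (void)obj;
    return H_ERROR;
}

const ObjType handleType = { "handle", handle_set_from_any };

/* The only way to obtain a blessed value: the extension creates it. */
Obj handle_new(void *resource)
{
    assert(resource != NULL);
    Obj o = { "<handle>", &handleType, resource };
    return o;
}

/* Fetch the pointer, accepting only values that still carry the blessing. */
int handle_get(Obj *obj, void **out)
{
    if (obj->typePtr != &handleType
            && handleType.setFromAny(obj) != H_OK)
        return H_ERROR;      /* forged, or blessing lost to shimmering */
    *out = obj->ptr;
    return H_OK;
}
```

The trade-off shows up directly: a value that has lost its type pointer is rejected even though it once was valid (fragile), and a value built from arbitrary text never passes (safe).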

Donal.

Mikhail Teterin

Jul 6, 2006, 6:10:23 PM

Donal K. Fellows wrote:

> If you're not concerned so much about portability, robustness and
> safety (e.g. because it is an application-specific extension and you
> can enforce correct object management at the app level) then you can
> ignore those restrictions and put raw pointers in Tcl_Objs quite
> handily. It works well for almost everything. :-)

Well, this is how channels ought to be implemented, for one. Other things,
like database-connection handlers (sybtcl, oratcl) should be done this way
too, IMO... Whenever the object being passed around is just a pointer
anyway, my method, indeed, satisfies your robustness requirement (and the
portability one too, as long as another interpreter is within the same
process).

> But if you do it, you must not allow user input to be interpreted as one
> of these special Tcl_Obj instances.

Why not? A regular user would not do that. Others will find a way to blast
their foot off anyway. I'm all for safety, but when it impedes the
performance of the debugged and working program I object to it...

> And in fact, you can do a bit better when it comes to closing out
> problems.

I wish "updateString" and "setFromAny" were also passed a pointer to the
Tcl_ObjType structure. This would have allowed writing generic methods
that are safe(r) and readable, because they could use type->name as
the prefix...
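
What such generic helpers might look like if the type were passed in, sketched in plain C. PtrType is a stand-in for Tcl_ObjType reduced to its name field, and ptr_format and ptr_parse are hypothetical names, not part of any Tcl API. Note that the prefix only catches mix-ups between types; it does nothing to prove that the pointer behind a well-formed string is still alive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for Tcl_ObjType, reduced to the name field. */
typedef struct PtrType { const char *name; } PtrType;

/* Format ptr as the type name followed by the "%p" text, e.g. "foo0x...".
 * Returns the string length, or -1 if buf is too small. */
int ptr_format(const PtrType *type, void *ptr, char *buf, size_t size)
{
    assert(type != NULL && buf != NULL);
    int n = snprintf(buf, size, "%s%p", type->name, ptr);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parse a string produced by ptr_format for the same type. Returns 0 and
 * stores the pointer on success, -1 if the prefix or pointer text is wrong. */
int ptr_parse(const PtrType *type, const char *str, void **out)
{
    size_t plen = strlen(type->name);
    void *p;
    int used = 0;

    if (strncmp(str, type->name, plen) != 0)
        return -1;                           /* wrong type prefix */
    if (sscanf(str + plen, "%p%n", &p, &used) != 1)
        return -1;                           /* no pointer text */
    if (str[plen + used] != '\0')
        return -1;                           /* trailing garbage */
    *out = p;
    return 0;
}
```

A string formatted for one type is refused when parsed as another, which gives readable handles and a basic type check, but a stale or hand-typed address with the right prefix is still accepted.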

-mi