
Memory usage of Tcl_Obj


j...@mrc-lmb.cam.ac.uk

Jun 7, 2002, 7:30:54 AM
I have an application which creates lists of lists. The internal list consists
of 6 objects (3 1-byte strings and 3 doubles). The outer list may contain many
copies of the 6-tuples. On a test with 250,000 tuples it appeared to take over
100Mb to store this data. This seems very excessive!

However I have lots of objects. 1 + 250,000 * 7 (6 elements plus the list
itself) == 1750001.

Looking at the Tcl_Obj structure I can see one way of quickly saving 17% of
memory (I use a 64-bit alpha).


With comments stripped out (for brevity) we have:

typedef struct Tcl_Obj {
    int refCount;
    char *bytes;
    int length;
    Tcl_ObjType *typePtr;
    union {
        long longValue;
        double doubleValue;
        VOID *otherValuePtr;
        struct {
            VOID *ptr1;
            VOID *ptr2;
        } twoPtrValue;
    } internalRep;
} Tcl_Obj;

Now, on the Alpha, pointers are 64-bit and need to be 64-bit aligned. Ints
are 32-bit and longs are 64-bit.

So starting with int, pointer, int, pointer uses two lots of 32-bit padding
between the ints and the pointers. Simply rearranging this to int, int,
pointer, pointer will save 8 bytes.

Saving more memory gets tricky. The largest component of the union is the
twoPtrValue (16 bytes on this system). I guess it's fast to have some commonly
used items directly in the internalRep while user-additions get to use
otherValuePtr, however for my particular nested list half of internalRep is
redundant. Replacing the twoPtrValue by a single pointer to an allocated block
will further reduce memory for simple lists, but it'll increase memory where
twoPtrValue is needed and will slow down code using it (which looks to be
quite bad).

My calculations show that 84Mb of my memory usage is in the Tcl_Obj structure
itself, while the remainder (not entirely known as I only had gross estimates,
but probably of the order of 30 to 40Mb) is presumably in the memory that
'bytes' points to. I assume that the only way I can really save more is by
repacking my data into a different data structure (e.g. strings and arrays).

Any more thoughts?

James
--
James Bonfield (j...@mrc-lmb.cam.ac.uk) Fax: (+44) 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

Bob Techentin

Jun 7, 2002, 8:49:01 AM
<j...@mrc-lmb.cam.ac.uk> wrote:
> I have an application which creates lists of lists. The internal list
> consists of 6 objects (3 1-byte strings and 3 doubles). The outer list
> may contain many copies of the 6-tuples. On a test with 250,000 tuples
> it appeared to take over 100Mb to store this data. This seems very
> excessive!
>
> However I have lots of objects. 1 + 250,000 * 7 (6 elements plus the
> list itself) == 1750001.
>
> Looking at the Tcl_Obj structure I can see one way of quickly saving
> 17% of memory (I use a 64-bit alpha).
>
[snip]

Good analysis of the size of Tcl_Obj. Yes, there is overhead with
every Tcl_Obj, and when you have millions of objects, you're going to
consume a lot of memory for those objects.

You _could_ hack the configuration of Tcl_Obj to reduce memory
footprint specifically for your alpha, but then you would probably
have a lot of trouble with extensions. Even Tk might be very
difficult to adapt. (I don't know - I've never tried.)

Perhaps an alternative would be to change your application
architecture to reduce the number of Tcl_Objs in your system. Your
main data structure is a list of 250,000 elements, each of which is a
list of 6 elements. If you instead constructed this as a list of
250,000 strings, you would decrease your object count by a factor of
seven.

Instead of:

lappend myList [list $c1 $c2 $c3 $n1 $n2 $n3]

Build the data structure like this:

lappend myList [format "%s %s %s %g %g %g" $c1 $c2 $c3 $n1 $n2 $n3]

Then use [scan] instead of [lindex] to parse the data.

Bob
--
Bob Techentin techenti...@mayo.edu
Mayo Foundation (507) 538-5495
200 First St. SW FAX (507) 284-9171
Rochester MN, 55901 USA http://www.mayo.edu/sppdg/

Jean-Claude Wippler

Jun 7, 2002, 10:06:22 AM
In article <adq5he$kgk$1...@pegasus.csx.cam.ac.uk>, j...@mrc-lmb.cam.ac.uk
wrote:

> I have an application which creates lists of lists. The internal list
> consists of 6 objects (3 1-byte strings and 3 doubles). The outer list
> may contain many copies of the 6-tuples. On a test with 250,000 tuples
> it appeared to take over 100Mb to store this data. This seems very
> excessive!
>
> However I have lots of objects. 1 + 250,000 * 7 (6 elements plus the list
> itself) == 1750001.

[...]


> My calculations show that 84Mb of my memory usage is in the Tcl_Obj
> structure itself, while the remainder (not entirely known as I only had
> gross estimates, but probably of the order of 30 to 40Mb) is presumably
> in the memory that 'bytes' points to. I assume that the only way I can
> really save more is by repacking my data into a different data structure
> (e.g. strings and arrays).
>
> Any more thoughts?

You could represent your data as a MetaKit in-memory dataset:
http://www.equi4.com/metakit/

Ints are stored adaptively, using 1..32 bits/int. All data is managed
in column-wise (vectorized) order.

package require Mk4tcl
mk::file open db datafile
mk::view layout db.vec {a:I b:I c:I}
for {set i 0} {$i < 100000} {incr i} {
    mk::row append db.vec a $i b $i c $i
}
mk::file commit db
mk::file close db
puts [file size datafile]

Output: 1200067, i.e. 300,000 32-bit ints (plus a little file overhead).

(With a loop to only 10000, the output is 60066, i.e. 30000 16-bit ints)

-jcw

j...@mrc-lmb.cam.ac.uk

Jun 7, 2002, 1:01:59 PM
In <adqa3s$vqn$1...@tribune.mayo.edu> "Bob Techentin" <techenti...@mayo.edu> writes:

> You _could_ hack the configuration of Tcl_Obj to reduce memory
> footprint specifically for your alpha, but then you would probably
> have a lot of trouble with extensions. Even Tk might be very
> difficult to adapt. (I don't know - I've never tried.)

A different layout of the Tcl_Obj data structure would be completely
backwards compatible at the source level (assuming no one is ever warped
enough to assume the ordering of the memory addresses used for particular
variables). However, it does mean that compiled code will break.

What's the Tcl/Tk stance on binary compatibility? Should compiled extensions
for Tcl8.3 work when loaded into Tcl8.4? My gut feeling is that attempting to
ensure this puts too many restrictions on things, and on MS Windows it's
virtually impossible anyway, since dynamic libraries there can use function
'numbers' (ordinals) instead of symbolic names.

> Perhaps an alternative would be to change your application
> architecture to reduce the number of Tcl_Objs in your system. Your
> main data structure is a list of 250,000 elements, each of which is a
> list of 6 elements. If you instead constructed this as a list of
> 250,000 strings, you would decrease your object count by a factor of
> seven.

Indeed this is how I decided to go. I simply thought that using objects
appeared to be the "correct" way to go. It makes the tcl code nice and simple,
but clearly it's better to sacrifice a bit of speed and encode the data by
other means. Before Tcl_Obj I used to simply pack binary data into strings and
use binary format / binary scan to parse them. Maybe I should go back to this
technique :)

Georgios Petasis

Jun 7, 2002, 1:45:55 PM
<j...@mrc-lmb.cam.ac.uk> wrote in message
news:adq5he$kgk$1...@pegasus.csx.cam.ac.uk...

> I have an application which creates lists of lists. The internal list
> consists of 6 objects (3 1-byte strings and 3 doubles). The outer list
> may contain many copies of the 6-tuples. On a test with 250,000 tuples
> it appeared to take over 100Mb to store this data. This seems very
> excessive!
>
Actually I had a problem very similar to yours. I have written an app that
uses tcl objects as the internal representation for storing information.

In my situation the structure was:
{ {int obj} {string obj} {list obj 1} {list obj 2} }
list obj 1: { {int obj} {int obj} ...}
list obj 2: { {string obj { {string obj} {list obj} }} ...}

The solution I originally gave was to detect identical information and try to
reduce the number of unique objects. This is very easy with a Tcl hash table.
I used a customised one that uses objects as keys but compares list objects
without generating a string rep :-) This saved me about 25% memory!

Now, since I always want to reduce memory, I have created many new Tcl object
types. They all store the same representation but in a compact way, i.e. not
being a tuple of 4 elements but using an internal structure. If "bad" code
wants to use them as a Tcl list object, this is done by Tcl. When the objects
are given back to me for storage, I enforce a conversion back to the
customised and compact object types, and I invalidate the string rep. This has
given me a further reduction of 34%, resulting in a total reduction of 64%.
Not bad, don't you think?

Now, storing about 350,000 such tuples (I really don't know how many Tcl
objects there are, due to re-use of objects) requires about 68 MB of memory on
a 32-bit architecture (Linux). These objects usually hold strings a few chars
long, and long integers.

George


Joe English

Jun 7, 2002, 2:48:34 PM
James Bonfield wrote:

>Bob Techentin > writes:
>> You _could_ hack the configuration of Tcl_Obj to reduce memory
>> footprint specifically for your alpha, but then you would probably
>> have a lot of trouble with extensions. Even Tk might be very
>> difficult to adapt. (I don't know - I've never tried.)
>
>A different layout of the Tcl_Obj data structure would be completely
>backwards compatible at the source level (assuming no one is ever warped
>enough to assume the ordering of the memory addresses used for particular
>variables). However it will mean compiled code will break.
>
>What's the Tcl/Tk stance of binary compatibility? Should compiled extensions
>for Tcl8.3 work when loaded into Tcl8.4?

AFAIK, the policy is that extensions compiled with -DUSE_TCL_STUBS
against version 8.x headers can be dynamically loaded into Tcl
interpreters using Tcl 8.y iff y >= x.

If y < x, this is _not_ guaranteed, even if the extension only uses
features that are present in version y. To date (8.1 through 8.4)
this level of backwards compatibility has also worked, but it is not
guaranteed.

More details here: <URL: http://mini.net/cgi-bin/wikit/stubs >

At any rate, changing the layout of a Tcl_Obj is probably a
non-starter until Tcl 9, since it would break stubs compatibility.


--Joe English

jeng...@flightlab.com

Jeff Hobbs

Jun 7, 2002, 3:11:30 PM
j...@mrc-lmb.cam.ac.uk wrote:
>
> In <adqa3s$vqn$1...@tribune.mayo.edu> "Bob Techentin" <techenti...@mayo.edu> writes:
>
> > You _could_ hack the configuration of Tcl_Obj to reduce memory
> > footprint specifically for your alpha, but then you would probably
> > have a lot of trouble with extensions. Even Tk might be very
> > difficult to adapt. (I don't know - I've never tried.)
>
> A different layout of the Tcl_Obj data structure would be completely
> backwards compatible at the source level (assuming no one is ever warped
> enough to assume the ordering of the memory addresses used for particular
> variables). However it will mean compiled code will break.
>
> What's the Tcl/Tk stance of binary compatibility? Should compiled extensions
> for Tcl8.3 work when loaded into Tcl8.4? My gut feeling is that attempting to
> ensure this puts too many restrictions on things, and on MS Windows it's
> virtually impossible to do such things as the dynamic libraries use function
> 'numbers' instead of symbolic names.

Binary compatibility is a MUST. It is the whole basis of stubs, and
thus is something that CANNOT change in 8.x. It is unfortunate that
the alignment in Tcl_Obj on 64-bit architectures isn't great, but it
is something that we will have to live with in 8.x.

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions
Join us in Sept. for Tcl'2002: http://www.tcl.tk/community/tcl2002/

Ralf Fassel

Jun 7, 2002, 4:16:03 PM
* j...@mrc-lmb.cam.ac.uk

| I have an application which creates lists of lists. The internal list consists
| of 6 objects (3 1-byte strings and 3 doubles). The outer list may contain many
| copies of the 6-tuples. On a test with 250,000 tuples it appeared to take over
| 100Mb to store this data. This seems very excessive!

Couldn't you, at least for the 1-byte strings, use an array instead of
creating new instances of those?

I.e. instead of

    lappend list [list $byte1 $byte2 $byte3 ...]

go

    foreach elt [list $byte1 $byte2 $byte3] {
        if {![info exists array($elt)]} {
            set array($elt) $elt
        }
    }
    lappend list [list $array($byte1) $array($byte2) $array($byte3) ...]

There are at most 256 different 1-byte Objects, so this should save a
lot. For the doubles, I don't know... If they're all different, it
would not make much sense.

    set max 250000
    set list [list]
    for {set i 0; set o 0} {$i < $max} {incr i} {
        set elt [list]
        for {set j 0} {$j < 3} {incr j; incr o} {
            set new [expr {$o % 256}]
            # use array...
            # if {![info exists array($new)]} {
            #     set array($new) $new
            # }
            # lappend elt $array($new)
            # or direct:
            lappend elt $new
        }
        lappend list $elt
    }

If I use the direct lappend as shown, I get 39MB; if I use the
commented block instead, I get 25MB (a 35% reduction) (32-bit machine).

R'

Chang Li

Jun 7, 2002, 6:30:11 PM
j...@mrc-lmb.cam.ac.uk wrote in message news:<adq5he$kgk$1...@pegasus.csx.cam.ac.uk>...

That is where structs and vectors are needed.

Chang

lvi...@yahoo.com

Jun 8, 2002, 7:18:29 AM

According to Bob Techentin <techenti...@mayo.edu>:
:Perhaps an alternative would be ...

followed by some good advice for the short term.

Seems to me that, in the long term, patches so that all structures
are evaluated and rearranged in a memory-efficient manner would be
a useful exercise for Tcl 9.0 (which, given the length of time for Tcl 8.4,
is likely to release in the next century <smile>)

--
Support Internet Radio <URL: http://saveinternetradio.org/ >
Join Tcl'2002 in Vancouver http://www.tcl.tk/community/tcl2002/
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.

Richard.Suchenwirth

Jun 10, 2002, 4:38:07 AM
Come to think, the twoPtrValue might also be a natural storage place for
Lisp-like CAR/CDR constructs...
--
Schoene Gruesse/best regards, Richard Suchenwirth - +49-7531-86 2703
Siemens Dematic AG, PA RC D2, Buecklestr.1-5, 78467 Konstanz,Germany
Personal opinions expressed only unless explicitly stated otherwise.

David N. Welton

Jun 10, 2002, 4:46:05 AM
"Richard.Suchenwirth" <Richard.Suchenw...@siemens.com> writes:

> Come to think, the twoPtrValue might also be a natural storage place
> for Lisp-like CAR/CDR constructs...

Don't they risk getting overwritten/transformed into something
unusable, like everything else in there?

--
David N. Welton
Consulting: http://www.dedasys.com/
Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
Apache Tcl: http://tcl.apache.org/

Donal K. Fellows

Jun 10, 2002, 6:12:41 AM
"David N. Welton" wrote:
> "Richard.Suchenwirth" <Richard.Suchenw...@siemens.com> writes:
>> Come to think, the twoPtrValue might also be a natural storage place
>> for Lisp-like CAR/CDR constructs...
>
> Don't they risk getting overwritten/transformed into something
> unusable, like everything else in there?

It's no worse than for any other object type. According to current object
conversion semantics, everything in the internalRep is at risk. I don't expect
any change to that prior to Tcl9...

Donal.
--
Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ fell...@cs.man.ac.uk
-- "I'm going to open a new xterm. This one's pissing me off" Anon. (overheard)

Donal K. Fellows

Jun 10, 2002, 6:08:21 AM
j...@mrc-lmb.cam.ac.uk wrote:
> "Bob Techentin" <techenti...@mayo.edu> writes:
>> You _could_ hack the configuration of Tcl_Obj to reduce memory
>> footprint specifically for your alpha, but then you would probably
>> have a lot of trouble with extensions. Even Tk might be very
>> difficult to adapt. (I don't know - I've never tried.)

At the source level, there shouldn't be any problem. Unless someone is warped
enough to try the tricks mentioned just below...

> A different layout of the Tcl_Obj data structure would be completely
> backwards compatible at the source level (assuming no one is ever warped
> enough to assume the ordering of the memory addresses used for particular
> variables). However it will mean compiled code will break.

IOW, there's no chance we'll take it on without a major version change.

> What's the Tcl/Tk stance of binary compatibility? Should compiled extensions
> for Tcl8.3 work when loaded into Tcl8.4? My gut feeling is that attempting
> to ensure this puts too many restrictions on things, and on MS Windows it's
> virtually impossible to do such things as the dynamic libraries use function
> 'numbers' instead of symbolic names.

James, may I introduce you to the "Stubs" mechanism? High degrees of backward
compatibility are possible on all systems (yes, that does include Windows) when
you do just a little bit of extra work yourself (and we've got scripts to manage
most of the work; in practice the extra work is a #define and a function call
for extension authors, and a bit of care for core maintainers[*]). The main
problem with this is structure versioning; we're not about to add a version
field to Tcl_Objs (there is just so much binary code that depends on the present
layout, and it would make the memory consumption problem worse) so Tcl_Objs are
probably stuck until we start on Tcl9...

The big benefit of Stubs is that old extensions can keep on working with new
versions of Tcl without users doing anything (except perhaps reinstalling.) No
recompilation. No source hacking. It Just Works (and does so on loads of
platforms too.) You can't get much better than that, can you?

Donal.
[* Don't modify structures at all unless you control their allocation or
*everyone* knows about a versioning field, and don't modify function
signatures unless you move the name to a new place in the stub table and
leave a compatibility function in the hole. All quite manageable. ]

Donal K. Fellows

Jun 10, 2002, 6:32:22 AM
j...@mrc-lmb.cam.ac.uk wrote:
> I have an application which creates lists of lists. The internal list
> consists of 6 objects (3 1-byte strings and 3 doubles). The outer list
> may contain many copies of the 6-tuples. On a test with 250,000 tuples
> it appeared to take over 100Mb to store this data. This seems very
> excessive!

Are you replicating the data anywhere? Can you use some kind of sharing to cut
the amount of memory allocated? (When I added such a thing to [split] for the
split-into-chars case, the resulting code was far faster on long strings despite
the overhead of cache management.)

> However I have lots of objects. 1 + 250,000 * 7 (6 elements plus the list
> itself) == 1750001.
>
> Looking at the Tcl_Obj structure I can see one way of quickly saving 17%
> of memory (I use a 64-bit alpha).

Alphas (like all other 64-bit platforms) gobble memory with unseemly haste. Add
another GB or two and stop moaning! ;^)

> So starting with int, pointer, int, pointer uses two lots of 32-bit
> padding between the int and pointers. Simply rearranging this to int,
> int, pointer, pointer will save 8 bytes.

And forces you to Recompile The World. The Tcl_Obj structure definition is
public and rather widely used.

> Saving more memory gets tricky. The largest component of the union is the
> twoPtrValue (16 bytes on this system). [...]

It's used in quite a lot of places in the core IIRC.

> My calculations show that 84Mb of my memory usage is in the Tcl_Obj
> structure itself, while the remainder (not entirely known as I only had
> gross estimates, but probably of the order 30 to 40Mb) is presumably in
> the memory that 'bytes' points to.

The memory might also be occupied in things like UNICODE representations if
you're using strings. Plus, each list has a structure and an array associated
with it so as to provide constant time access and amortized-linear time append.

> I assume that the only way I can really save more is by repacking
> my data into a different data structure (eg strings and arrays).

Custom Tcl_Obj types can save a lot of memory by letting you store a whole
structure directly; I've used this in my own applications to cut memory usage.
And if you can increase the degree of sharing, that is a definite big win (and
it should be trivial to do for single character strings - have a look at the
source to Tcl_SplitObjCmd for an example - though sharing the floats could be
trickier. It depends on the details of the application.)

Donal.

Bob Techentin

Jun 10, 2002, 9:43:31 AM
<j...@mrc-lmb.cam.ac.uk> wrote:
> I have an application which creates lists of lists. The internal
> list consists of 6 objects (3 1-byte strings and 3 doubles).
> The outer list may contain many copies of the 6-tuples. On
> a test with 250,000 tuples it appeared to take over
> 100Mb to store this data. This seems very excessive!
>
> However I have lots of objects. 1 + 250,000 * 7 (6 elements
> plus the list itself) == 1750001.
>
> Looking at the Tcl_Obj structure I can see one way of
> quickly saving 17% of memory (I use a 64-bit alpha).
>
[snip]

Now that I've mulled it over a bit, I have a completely different
suggestion, James.

If you are really considering munging up Tcl_Obj and recompiling a
custom Tcl library and extensions, then surely you would not be
intimidated by a relatively simple compiled extension. A compiled
extension would be efficient, robust, and portable. Overall, a much
better solution than hacking the Tcl core.

You could write a relatively simple extension to provide a container
for all this data. Start with a simple structure or class to hold the
single-byte strings and doubles, create a large list, vector, or other
container. Then add a few accessor methods, wrap them in Tcl
commands, and load this compiled extension into your interpreter.
Then you have a very memory efficient way to store all your data, and
access is as easy (or even easier) than a list-of-lists Tcl structure.

Take a look at SWIG (http://www.swig.org/). There is an excellent
chapter on Tcl in the users manual, which includes examples of
wrapping data structures with script code.

Good luck,


Bob
--
Bob Techentin techenti...@mayo.edu
Mayo Foundation (507) 538-5495
200 First St. SW FAX (507) 284-9171
Rochester MN, 55901 USA http://www.mayo.edu/sppdg/


lvi...@yahoo.com

Jun 10, 2002, 1:35:06 PM

According to Bob Techentin <techenti...@mayo.edu>:
:If you are really considering munging up Tcl_Obj

I thought that James' suggestion was a mere rearrangement of the order
of members in the structure. If so, that's a LONG ways from writing
a custom extension.

However, James, if the idea of a custom extension DOESN'T turn you off,
take a look at what George H does in BLT with vector data - perhaps
there are concepts which you could model after ...

j...@mrc-lmb.cam.ac.uk

Jun 10, 2002, 2:51:25 PM
In <ae2o0a$e0s$2...@srv38.cas.org> lvi...@yahoo.com writes:


> However, James, if the idea of a custom extension DOESN'T turn you off,
> take a look at what George H does in BLT with vector data - perhaps
> there are concepts which you could model after ...

Actually this already is in a compiled extension which creates the Tcl list as
its result, simply because it seemed like the "right thing to do". I'm
perfectly happy about changing the way I create the data structures, but my
observation still holds true - the Tcl_Obj structure is needlessly
inefficient.

David Gravereaux

Jun 10, 2002, 3:35:45 PM
j...@mrc-lmb.cam.ac.uk wrote:

>on MS Windows it's
>virtually impossible to do such things as the dynamic libraries use function
>'numbers' instead of symbolic names.

It's not often, at least in Tcl, that one expects the slot index (aka
ordinal) to be the same across versions anyway. This feature of
specifying ordinals during the link of the DLL isn't used. It could be
used, but I don't know what improvement it would bring.
--
David Gravereaux <davy...@pobox.com>
[species: human; planet: earth,milkyway,alpha sector]
Please be aware of the 7.5 year ping times when placing a call from alpha centari

David Gravereaux

Jun 10, 2002, 3:45:13 PM
j...@mrc-lmb.cam.ac.uk wrote:

>What's the Tcl/Tk stance of binary compatibility? Should compiled extensions
>for Tcl8.3 work when loaded into Tcl8.4?

As Donal points out, "Stubs" is the way.

http://www.tcl.tk/doc/howto/stubs.html

I'd swear that page is truncated. I remember it being three times longer.

Chang Li

Jun 10, 2002, 8:14:54 PM
j...@mrc-lmb.cam.ac.uk wrote in message news:<ae2sfd$gji$1...@pegasus.csx.cam.ac.uk>...

> > However, James, if the idea of a custom extension DOESN'T turn you off,
> > take a look at what George H does in BLT with vector data - perhaps
> > there are concepts which you could model after ...
>
> Actually this already is in a compiled extension which creates the Tcl list as
> it's result simply because it seemed like the "right thing to do". I'm
> perfectly happy about changing the way I create the data structures, but my
> observation still holds true - the Tcl_Obj structure is needlessly
> inefficient.
>

Reordering the two int fields in Tcl_Obj is a good optimization.
It would also benefit some 32-bit compilers, because we often
select 8-byte alignment for structures. It should not cause
any compatibility problems (?).

Chang



lvi...@yahoo.com

Jun 11, 2002, 8:24:17 AM

According to Chang Li <CHA...@neatware.com>:
:j...@mrc-lmb.cam.ac.uk wrote in message
:news:<ae2sfd$gji$1...@pegasus.csx.cam.ac.uk>...
:> but my observation still holds true - the Tcl_Obj structure is
:> needlessly inefficient.
:>
:
:Reordering the two int fields in Tcl_Obj is a good optimization.
:It would also benefit some 32-bit compilers, because we often
:select 8-byte alignment for structures. It should not cause
:any compatibility problems (?).

James, I agree with your observation. And, as people have mentioned, this
is something that can be addressed during Tcl 9.

Chang, it certainly _will_ cause compatibility problems, since it would
cause all binary extensions to cease to work with the Tcl interpreter...

Chang Li

Jun 11, 2002, 3:05:37 PM
lvi...@yahoo.com wrote in message news:<ae4q5h$cip$5...@srv38.cas.org>...

>
> James, I agree with your observation. And, as people have mentioned, this
> is something that can be addressed during Tcl 9.
>
> Chang, it certainly _will_ cause compatibility problems, since it would
> cause all binary extensions to cease to work with with the tcl interpretered...

Bad luck! The amount of code that depends on the order of struct elements
must be great. However, the 64-bit extensions all need to be recompiled
anyway, so there are no binary compatibility problems there. We may add a
#ifdef to branch the header. I agree not to change the 32-bit core, because
there are too many wonderful binaries. But for 64-bit there exists an
opportunity.

To wait for Tcl 9 may be 10 years :-) You can count 1 year of TIPs and
1 year of beta for 8.5, then 8.6, ..., it is 10 years :-) And I still doubt,
even for Tcl 9, whether it is worth breaking backward compatibility.

Chang

Donal K. Fellows

Jun 12, 2002, 11:36:10 AM
Chang Li wrote:
> To wait for Tcl 9, that may be 10 years later :-) You can count 1 year TIP
> and 1 year beta for 8.5, then 8.6, ..., it is 10 years :-) And I still doubt
> even for Tcl 9, is it worth to break the backward compatibility.

Speaking for myself, it has taken *far* too long for 8.4 to get out the door.
And it has not yet been settled whether the next release after 8.4 will be 8.5
or 9.0; it really depends on what features get put in (if we decide to cull out
old functions from the stub tables[*] or change crucial structures, like Tcl_Obj
or Tcl_ObjType, that'll force a major version change for sure.)

Donal.
[* There's quite a few places where this could be done with. ]

-- Actually, come to think of it, I don't think your opponent, your audience,
or the metropolitan Tokyo area would be in much better shape.
-- Jeff Huo <je...@starfall.com.nospam>

lvi...@yahoo.com

Jun 12, 2002, 11:46:57 AM

According to Chang Li <CHA...@neatware.com>:
:lvi...@yahoo.com wrote in message news:<ae4q5h$cip$5...@srv38.cas.org>...

:>
:> James, I agree with your observation. And, as people have mentioned, this
:> is something that can be addressed during Tcl 9.
:>
:> Chang, it certainly _will_ cause compatibility problems, since it would
:> cause all binary extensions to cease to work with the Tcl interpreter...
:
:Bad luck! The code related to the order of struct emlements must be great.
:However, for the 64-bit extensions, you need to recompile them all,
:so there are no binary compatible problems. We may add a #ifdef to branch
:the header. I agree not to change on 32-bit kernel because there are
:too many wonderful binarys. But for 64-bit there exists an opportunity.

There seems to me to be a language communication problem going on here.

When I talk about binary extensions above, I am not specifically referring
to extensions which are compiled and then installed.

I am talking about the actual .a and .so files. The STUBs mechanism is such
that, as long as structures do not change, one can use a new version of Tcl
with a .a or .so file that was compiled 2, 3, or more years ago.

Changing the order of elements in a structure breaks that arrangement, and
forces every use of the new Tcl to have to recompile every extension.

While that may very well work for you, it does not work for everyone.

George A. Howlett

Jun 13, 2002, 6:01:52 AM
j...@mrc-lmb.cam.ac.uk wrote:

> A different layout of the Tcl_Obj data structure would be completely
> backwards compatible at the source level (assuming no one is ever warped
> enough to assume the ordering of the memory addresses used for particular
> variables). However it will mean compiled code will break.

You're absolutely right. It will break binary compatibility to make
Tcl/Tk 64-bit ready. Data alignments will force structures to be
reordered. Not all 64-bit platforms are forgiving about unaligned
data.

My concern is that in the absence of a real 64-bit port, Tcl will be
replaced in scientific and engineering applications with other
scripting languages. I suspect this will become more important when
McKinley/Opteron are available.

--gah

Donal K. Fellows

Jun 13, 2002, 8:31:25 AM
"George A. Howlett" wrote:
> My concern is that in the absence of a real 64-bit port, Tcl will be
> replaced in scientific and engineering applications with other
> scripting languages. I suspect this will become more important when
> McKinley/Opteron are available.

If it didn't involve breaking the whole world and his dog, I'd favour doing
something about it right now. (In theory, we could use #ifdef's to restrict the
change to real 64-bit platforms while leaving 32-bit platforms[*] alone, but
that'd be really icky.)

Tcl needs a number of things doing to it to be truly 64-bit ready, as opposed to
just functioning correctly though inefficiently on 64-bit platforms. This
thread just highlights one of them (and not even the most serious one.)

Donal.
[* Which are far more common. ]
--
"[He] would have needed to sell not only his own soul, but have somehow gotten
in on the ground floor of an Amway-like pyramid scheme delivering the souls
of kindergarten students to Satan by the truckload like so many boxes of Girl
Scout Cookies." -- John S. Novak, III <j...@concentric.net>

j...@mrc-lmb.cam.ac.uk

unread,
Jun 13, 2002, 12:07:08 PM6/13/02
to
In <3D08909D...@cs.man.ac.uk> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:

> change to real 64-bit platforms while leaving 32-bit platforms[*] alone, but
> that'd be really icky.)

> [* Which are far more common. ]

I'm not so convinced now. Although the Alpha I'm using was one of the early
64-bit systems, the list now also includes Sun, HP and SGI; i.e. pretty much all
mainstream Unix systems are 64-bit now. The only exceptions are Intel-based
systems (Linux and Windows) and the few Macs out there (although I'm not sure
about those - perhaps modern ones are 64-bit too).

Granted there are many more PC users than unix workstation users, but given
Intel have a 64-bit chip already out there it'll perhaps only be another year
before that changes too. Is Tcl 9 going to be ready before IA64 becomes
mainstream? Probably not...

It's a thorny issue and one that I didn't fully understand (not having
investigated stubs) when I started this thread. I now see the issues in
changing data structures and hence the dilemma involved. I don't much like the
idea of hundreds of #ifdefs everywhere either, and in my case I can live with
the lack of proper 64-bit support.

To be terribly controversial about the whole thing I've started wondering
whether the benefits that stubs bring are outweighed by the lack of
flexibility they impose on the language. In an open-source environment binary
compatibility isn't as big an issue as the freedom to make radical
changes. Any developer ought to have enough knowledge, given source code, to
rebuild extensions. I do get the general feeling that Tcl is dying and issues
like this don't help. Just my random thoughts - take them with a pinch of
salt, as I've been using Tcl for many years and am, on the whole, very happy
with it.

Joe English

unread,
Jun 13, 2002, 12:46:33 PM6/13/02
to
James Bonfield wrote:
>
>To be terribly controversial about the whole thing I've started wondering
>whether the benefits that stubs bring are outweighed by the lack of
>flexibility they impose on the language.

The advent of Stubs has made a *huge* difference for me.
Upgrading Tcl and Tk used to be a major undertaking; it
would take days, sometimes weeks, to hunt down and recompile
all the various extensions, on all the different platforms.
In many cases the extensions themselves would need to be upgraded
(source-level compatibility has _usually_ been pretty good, but
not 100%, and the build system has always been problematic.)

Now it's possible to just rebuild the core and update extensions
piecemeal, or not at all. This is especially helpful during
beta cycles.

The Stubs mechanism also solves some really hairy system-specific
dynamic linking issues.


--Joe English

jeng...@flightlab.com

Chang Li

unread,
Jun 13, 2002, 10:44:34 PM6/13/02
to
j...@mrc-lmb.cam.ac.uk wrote in message news:<aeafvc$4s2$1...@pegasus.csx.cam.ac.uk>...

> In <3D08909D...@cs.man.ac.uk> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:
>
> > change to real 64-bit platforms while leaving 32-bit platforms[*] alone, but
> > that'd be really icky.)
> > [* Which are far more common. ]
>

Icky? Keeping one source base for both 32-bit and 64-bit may require more
than that.



> I'm not so convinced now. Although the Alpha I'm using was one of the early
> 64-bit systems, the list now also includes Sun, HP and SGI; i.e. pretty much all
> mainstream Unix systems are 64-bit now. The only exceptions are Intel-based
> systems (Linux and Windows) and the few Macs out there (although I'm not sure
> about those - perhaps modern ones are 64-bit too).
>

Unfortunately, Alpha is dead because HP/Compaq is replacing it with Itanium.
However, 64-bit processors are popular in Unix today. Intel will release its
second-generation Itanium 2 within three months, and AMD's Opteron will be
released at the end of this year. So for large applications 64-bit is the way
to go.



> Granted there are many more PC users than unix workstation users, but given
> Intel have a 64-bit chip already out there it'll perhaps only be another year
> before that changes too. Is Tcl 9 going to be ready before IA64 becomes
> mainstream? Probably not...
>
> It's a thorny issue and one that I didn't fully understand (not having
> investigated stubs) when I started this thread. I now see the issues in
> changing data structures and hence the dilemma involved. I don't much like the
> idea of hundreds of #ifdefs everywhere either, and in my case I can live with
> the lack of proper 64-bit support.

I do not think hundreds of #ifdefs are needed in this case. You may need an
--enable-64bit option during compilation. A similar option has been used
before, like --enable-threads. Compared to the threading code embedded in the
core, the amount of 64-bit code would be much smaller.

The key point for me is to keep a single source code base for both
32-bit and 64-bit. That is good for code maintenance.

>
> To be terribly controversial about the whole thing I've started wondering
> whether the benefits that stubs bring are outweighed by the lack of
> flexibility they impose on the language. In an open-source environment binary
> compatibility isn't as big an issue as the freedom to make radical
> changes. Any developer ought to have enough knowledge, given source code, to
> rebuild extensions. I do get the general feeling that Tcl is dying and issues
> like this don't help. Just my random thoughts - take them with a pinch of
> salt, as I've been using Tcl for many years and am, on the whole, very happy
> with it.

Stubs are a very important feature of Tcl; other languages lack this.
Tcl + C reaches the best performance and richest features. Without
stubs, upgrades would be painful. In many cases I think recompiling
source code is a waste of time if I have not modified it. Recompilation and
installation are part of the "dirty" work.

Chang

>
> James

j...@mrc-lmb.cam.ac.uk

unread,
Jun 14, 2002, 5:07:21 AM6/14/02
to

> The advent of Stubs has made a *huge* difference for me.

This is what I like to hear. By being deliberately controversial I was rather
hoping my own experiences would be put into context with people claiming how
useful stubs are :)

> The Stubs mechanism also solves some really hairy system-specific
> dynamic linking issues.

Now that I can really sympathise with! Dynamic linking, especially on
hideously broken systems like windows, is something which varies a lot from
system to system. Tcl's load command has been really useful for us.

Donal K. Fellows

unread,
Jun 14, 2002, 11:00:35 AM6/14/02
to
j...@mrc-lmb.cam.ac.uk wrote:
> I'm not so convinced now. Although the Alpha I'm using was one of the early
> 64-bit systems, the list now also includes Sun, HP and SGI; i.e. pretty much all
> mainstream Unix systems are 64-bit now. The only exceptions are Intel-based
> systems (Linux and Windows) and the few Macs out there (although I'm not
> sure about those - perhaps modern ones are 64-bit too).

But a lot of these systems can run in either 32-bit or 64-bit mode. I'm pretty
sure that most builds of Tcl are 32-bit builds, given the extremely slow rate at
which 64-bit-related bugs are reported... ;^)

As I've said, there's a number of outstanding 64-bit issues of which structure
alignment and packing is not the most pressing. Not even close. (Large memory
object handling is the number one problem, and tinkering with that breaks a
truly enormous amount of the core.)

Donal.

"[Windows] XP does feel like it was designed by Fisher Price, I agree."
-- Steve Allen <all...@cs.man.ac.uk>

j...@mrc-lmb.cam.ac.uk

unread,
Jun 14, 2002, 1:17:06 PM6/14/02
to
In <3D0A0513...@cs.man.ac.uk> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:

> But a lot of these systems can run in either 32-bit or 64-bit mode. I'm pretty
> sure that most builds of Tcl are 32-bit builds, given the extremely slow rate at
> which 64-bit-related bugs are reported... ;^)

Indeed we do the same. However there's not really an option as far as
structure packing and size of pointers go (unless you want really slow code).
Anyway, griping aside I _am_ happy with Tcl so please don't take my comments
too personally :)

> alignment and packing is not the most pressing. Not even close. (Large memory
> object handling is the number one problem, and tinkering with that breaks a
> truly enormous amount of the core.)

This is somewhere that size_t needs to be used. It's really hard to do so in a
rigorous way, especially as functions such as printf don't have format
specifiers for these 'unknown' types (I cast them to long and use %ld, just
in case). How many times do we use loop variables stepping through arrays with
"int i" for example? It should be 'size_t i'. I imagine 99% of the C
programmers out there (myself included) are guilty of this.

Joe English

unread,
Jun 14, 2002, 2:49:22 PM6/14/02
to
James Bonfield wrote:
> [...]

>This is somewhere that size_t needs to be used. It's really hard to do so in a
>rigorous way, especially as functions such as printf don't have format
>specifiers for these 'unknown' types (I cast them to long and use %ld, just
>in case).

That was the correct way to printf() implementation-defined
integral types in C89, but C99 broke it. For C99, you have
to use something like:

#include <inttypes.h>
...
size_t nbytes = ... ;
...
printf("nbytes = %" PRIdMAX "\n", (intmax_t)nbytes);

since 'long' is no longer guaranteed to be the largest
integral type.


>How many times do we use loop variables stepping through arrays with
>"int i" for example? It should be 'size_t i'. I imagine 99% of the C
>programmers out there (myself included) are guilty of this.

Using an 'int' as an array index is actually perfectly legitimate
(as long as it's known that there are no more than INT_MAX elements
in the array). 'size_t' is only required when you want to represent
the number of bytes in an (arbitrary) object.

--Joe English

jeng...@flightlab.com

Donal K. Fellows

unread,
Jun 19, 2002, 5:45:19 AM6/19/02
to
j...@mrc-lmb.cam.ac.uk wrote:

> Donal K. Fellows writes:
>> alignment and packing is not the most pressing. Not even close. (Large
>> memory object handling is the number one problem, and tinkering with that
>> breaks a truly enormous amount of the core.)
>
> This is somewhere that size_t needs to be used.

It's worse than that. In a lot of places, numeric values that refer to sizes
need to be signed. And sizes and indices are passed around by reference or by
return-value. And everything is highly interconnected.

I've tried to take on this challenge, tried and failed. There's a humungous
mess in there, and I don't see a way to fix it nicely without breaking backward
compatibility. :^(

[The other points you raise are dancing around the edge of the topic...]

-- Prices aren't rising - discounts are falling! It's WalMart-in-reverse! I'm
not spending more, I'm saving less! -- Chris Ahlstrom <ahls...@home.com>

Chang Li

unread,
Jun 19, 2002, 1:43:37 PM6/19/02
to
j...@mrc-lmb.cam.ac.uk wrote in message news:<aed8ei$nrl$1...@pegasus.csx.cam.ac.uk>...

> In <3D0A0513...@cs.man.ac.uk> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:
>
> This is somewhere that size_t needs to be used. It's really hard to do so in a
> rigorous way, especially as functions such as printf don't have format
> specifiers for these 'unknown' types (I cast them to long and use %ld, just
> in case). How many times do we use loop variables stepping through arrays with
> "int i" for example? It should be 'size_t i'. I imagine 99% of the C
> programmers out there (myself included) are guilty of this.
>

Not guilty. Using "size_t i" instead of "int i" would break tons of C++ code.
That is why the LP64 model is effective. There is no great benefit to a 64-bit
array index; you can use a pointer instead.

Chang

> James

George A. Howlett

unread,
Jun 19, 2002, 8:54:27 PM6/19/02
to
j...@mrc-lmb.cam.ac.uk wrote:

> This is somewhere that size_t needs to be used. It's really hard to do so in a
> rigorous way, especially as functions such as printf don't have format
> specifiers for these 'unknown' types (I cast them to long and use %ld, just
> incase). How many times do we use loop variables stepping through arrays with
> "int i" for example? It should be 'size_t i'. I imagine 99% of the C
> programmers out there (myself included) are guilty of this.

I'd like to get started on it, with full knowledge that it won't be
right the first time. I'd prefer to wade ahead with a 64-bit branch,
unencumbered by 32-bit backward compatibility issues. I'd like a real
64-bit version, one that takes advantage of the 64-bit word length and
the greater address space.

I'm sure there are places in Tcl and Tk where routines will compile
and run on 32 or 64 bit architectures with no change.

I also know there are places where structures will need to be
reordered, ints changed to longs (or long longs), and int32_t and
int64_t will be required. For example, there's lots of places where
ints are used as counters. The code still works on 64-bit platforms,
but it also limits how many things can be counted. As unlikely as it
may seem right now, I'm eventually going to want my hash table to hold
more than 2^32 entries.

--gah

Chang Li

unread,
Jun 20, 2002, 10:11:50 AM6/20/02
to
"George A. Howlett" <g...@gambit.xmen.net> wrote in message news:<uh29u3m...@corp.supernews.com>...

> j...@mrc-lmb.cam.ac.uk wrote:
>
>
> I'd like to get started on it, with full knowledge that it won't be
> right the first time. I'd prefer to wade ahead with a 64-bit branch,
> unencumbered by 32-bit backward compatibility issues. I'd like a real
> 64-bit version, one that takes advantage of the 64-bit word length and
> the greater address space.
>

Almost all 64-bit Unix/Linux systems use the LP64 model, that is, int
is 32-bit while long and pointer are 64-bit. So by replacing int with long
you get a 64-bit integer. However, 64-bit Windows uses the P64 model,
so its long is still 32-bit. This raises a cross-platform problem.
MS likes to be different.



> I'm sure there are places in Tcl and Tk where routines will compile
> and run on 32 or 64 bit architectures with no change.
>
> I also know there are places where structures will need to be
> reordered, ints changed to longs (or long longs), and int32_t and
> int64_t will be required. For example, there's lots of places where
> ints are used as counters. The code still works on 64-bit platforms,
> but it also limits how many things can be counted. As unlikely as it
> may seem right now, I'm eventually going to want my hash table to hold
> more than 2^32 entries.

The hash table is the algorithm that is best suited to 64-bit.

George, I checked your TIP for the 64-bit hash algorithm, but I am wondering
why it has not been approved for addition to the Tcl core. Maybe you need to
ask for a vote.

Chang

>
> --gah

j...@mrc-lmb.cam.ac.uk

unread,
Jun 20, 2002, 1:17:42 PM6/20/02
to

> "George A. Howlett" <g...@gambit.xmen.net> wrote in message news:<uh29u3m...@corp.supernews.com>...
> > j...@mrc-lmb.cam.ac.uk wrote:
> >
> >
> > I'd like to get started on it, with full knowledge that it won't be

Just to clarify confusion - the above statement was written by George Howlett,
not myself.

(Chang Li) writes:
> Almost all 64-bit Unix/Linux systems use the LP64 model, that is, int
> is 32-bit while long and pointer are 64-bit. So by replacing int with long
> you get a 64-bit integer. However, 64-bit Windows uses the P64 model,
> so its long is still 32-bit. This raises a cross-platform problem.
> MS likes to be different.

The most sensible strategy is just to define these as typedefs and get on with
it from there. The P64 model almost certainly has a 64 bit integer type -
quite likely the ghastly "long long int" type! Why they couldn't have just
decided long to mean the longest available integral type I do not know. Well
I do actually - it'd expose too much broken code. I started working with the
DEC Alphas back in the "field test" days and it was amazing how much stuff
just didn't compile (including lots of the GNU stuff). It's all better now of
course.

Anyway, you may find that there are already checks and typedefs for this sort of
thing in the GNU autoconf system. It's bound to be a common
requirement. That'll cover most of the unix systems. For non-unix you can then
just use the same typedefs that autoconf uses, but implemented by hand in the
windows/mac specific header files.

George A. Howlett

unread,
Jun 20, 2002, 10:55:41 PM6/20/02
to
Chang Li <CHA...@neatware.com> wrote:

> The hash table is the algorithm that is best suited to 64-bit.

I've been using the improved version in BLT for some months now. I've
tested it on HPUX and Solaris 2.8. I don't have access to a Win64
box, but I assume it should be fine there too.

> George, I checked your TIP for the 64-bit hash algorithm, but I am wondering
> why it has not been approved for addition to the Tcl core. Maybe you need to
> ask for a vote.

64-bit support isn't a front-burner item for most people yet. It
probably won't be (except for engineering and scientific folks) until
x86-64 is cheaply available. I work in E-CAD where it's already an
issue. I offered the improvements to spur porting Tcl to 64-bits.

--gah


Jeffrey Hobbs

unread,
Jun 21, 2002, 1:02:07 AM6/21/02
to
"George A. Howlett" wrote:

> Chang Li <CHA...@neatware.com> wrote:
> > George, I checked your TIP for the 64-bit hash algorithm, but I am wondering
> > why it has not been approved for addition to the Tcl core. Maybe you need to
> > ask for a vote.
>
> 64-bit support isn't a front-burner item for most people yet. It
> probably won't be (except for engineering and scientific folks) until
> x86-64 is cheaply available. I work in E-CAD where it's already an
> issue. I offered the improvements to spur porting Tcl to 64-bits.

In addition to this, it is not currently binary compatible with
previous versions, which breaks the benefit of stubs. While this
is becoming a more important issue, we either have to determine how
to fit it into 8.x while still maintaining binary compatibility, or
we need to start thinking about Tcl 9.0 sooner rather than later.

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions

George A. Howlett

unread,
Jun 22, 2002, 3:14:37 AM6/22/02
to
Jeffrey Hobbs <Je...@activestate.com> wrote:

> In addition to this, it is not currently binary compatible with
> previous versions, which breaks the benefit of stubs. While this
> is becoming a more important issue, we either have to determine how
> to fit it into 8.x while still maintaining binary compatibility, or
> we need to start thinking about Tcl 9.0 sooner rather than later.

It's safe to say that any real 64-bit port will start its own
compatibility baseline (9.0?). It's unlikely that a 32-bit extension
will work with a 64-bit tclsh or wish. I believe that "dlopen" will
prevent you from loading a 32-bit library into a 64-bit application.
Another reason is that changing Tcl and Tk to make use of 64-bit
performance and addressing will necessarily change internal
structures. That's the reason I say "real" 64-bit port.

Stubs can provide the same benefits for 64-bit versions of Tcl/Tk--
that of backlinking. One thing to consider is separating the 64-bit
sources from the 32-bit. The benefits are that 1) 32-bit applications
can still retain binary compatibility, even beyond 9.0, 2) you avoid
#ifdef hell, and 3) the 64-bit port can take shape without worrying
how it affects the 32-bit version.

--gah


Chang Li

unread,
Jun 23, 2002, 12:50:42 AM6/23/02
to
"George A. Howlett" <g...@gambit.xmen.net> wrote in message news:<uh88uta...@corp.supernews.com>...

> Jeffrey Hobbs <Je...@activestate.com> wrote:
>
> > In addition to this, it is not currently binary compatible with
> > previous versions, which breaks the benefit of stubs. While this
> > is becoming a more important issue, we either have to determine how
> > to fit it into 8.x while still maintaining binary compatibility, or
> > we need to start thinking about Tcl 9.0 sooner rather than later.
>
> It's safe to say that any real 64-bit port will start its own
> compatibility baseline (9.0?). It's unlikely that a 32-bit extension
> will work with a 64-bit tclsh or wish. I believe that "dlopen" will
> prevent you from loading a 32-bit library into a 64-bit application.
> Another reason is that changing Tcl and Tk to make use of 64-bit
> performance and addressing will necessarily change internal
> structures. That's the reason I say "real" 64-bit port.
>

It could, but it would need a VERY complicated wrapper on both the 32-bit and
64-bit sides. And I think that would be worthless.

> Stubs can provide the same benefits for 64-bit versions of Tcl/Tk--
> that of backlinking. One thing to consider is separating the 64-bit
> sources from the 32-bit. The benefits are that 1) 32-bit applications
> can still retain binary compatibility, even beyond 9.0, 2) you avoid
> #ifdef hell, and 3) the 64-bit port can take shape without worrying
> how it affects the 32-bit version.
>

There are pros and cons. The major con is maintenance of the source code.
When you fork the source code base into two, upgrades and bug fixes must be
synchronized between them.

I like the separation because we can drop the burden of the old code
and add more new special features in 64-bit. Tcl source compatibility
is enough.

The selection is revolution or evolution.

Chang

Donal K. Fellows

unread,
Jun 24, 2002, 9:02:09 AM6/24/02
to
"George A. Howlett" wrote:
> It's safe to say that any real 64-bit port will start it's own
> compatibility baseline (9.0?). It's unlikely that a 32-bit extension
> will work with a 64-bit tclsh or wish. I believe that "dlopen" will
> prevent you from loading a 32-bit library into a 64-bit application.

IME, they are usually represented as different machine architectures in the
binary header, and might even be using different instruction sets. Suffice to
say, nobody's seriously trying to interwork 32-bit and 64-bit binary modules.
Or not for long unless they are überhackers... ;^)

> Another reason is that changing Tcl and Tk to make use of 64-bit
> performance and addressing will necessarily change internal
> structures. That's the reason I say "real" 64-bit port.

As I've said, I looked into doing 64-bit adaption properly, but the changes tend
to spread throughout the core rather rapidly. It's more work than you might
naïvely expect, due mainly to TclExecuteByteCode. :^(

> Stubs can provide the same benefits for 64-bit versions of Tcl/Tk--
> that of backlinking. One thing to consider is separating the 64-bit
> sources from the 32-bit. The benefits are that 1) 32-bit applications
> can still retain binary compatibility, even beyond 9.0, 2) you avoid
> #ifdef hell, and 3) the 64-bit port can take shape without worrying
> how it affects the 32-bit version.

One major disadvantage: it forks the code. I'm *really* keen on keeping the
number of active CVS branches down so that unrelated fixes get picked up nicely
instead of having to be added by hand. However, when we do decide to work
towards a major version change (and I'm keen on there never being an 8.5; I
think that this issue, and a number of others too, merit a major change) this
should be one of the things that gets addressed as early in the cycle as
possible.

So... I strongly support doing something about this. Just not yet.

"If RedHat are so purblind that they think computing is about ever fancier
'desktop themes', they are in the interior design business, and as everyone
knows, if you can piss you can paint." -- Steve Blinkhorn

George A. Howlett

unread,
Jun 24, 2002, 1:53:33 PM6/24/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:

> As I've said, I looked into doing 64-bit adaption properly, but the
> changes tend to spread throughout the core rather rapidly. It's

> more work than you might naively expect, due mainly to
> TclExecuteByteCode. :^(

I've looked at TclExecuteByteCode. Can you enumerate what makes this
routine any harder to port?

>> One thing to consider is separating the 64-bit sources from the
>> 32-bit. The benefits are that 1) 32-bit applications can still
>> retain binary compatibility, even beyond 9.0, 2) you avoid #ifdef
>> hell, and 3) the 64-bit port can take shape without worrying how it
>> affects the 32-bit version.

> One major disadvantage: it forks the code. I'm *really* keen on
> keeping the number of active CVS branches down so that unrelated
> fixes get picked up nicely instead of having to be added by hand.

I don't think you can get around that problem. It's going to take
human intervention to determine if the fixes made against the 32-bit
sources will also work for the 64-bits. I wouldn't blindly trust
those changes.

The code base is going to be forked regardless--it's just a matter of
how. You can #ifdef the 64-bit parts in the same CVS branch or you
can have another branch. I think with all the structure redefinitions
it's not worth trying to keep everything in the same branch.

The other advantage is that a 64-bit port can be started now, rather
than later.

--gah

Jeff Hobbs

unread,
Jun 24, 2002, 2:37:25 PM6/24/02
to
"George A. Howlett" wrote:
>
> Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
>
> > As I've said, I looked into doing 64-bit adaption properly, but the
> > changes tend to spread throughout the core rather rapidly. It's
> > more work than you might naively expect, due mainly to
> > TclExecuteByteCode. :^(
>
> I've looked at TclExecuteByteCode. Can you enumerate what makes this
> routine any harder to port?

One issue that I know of, and likely similar ones exist, is that some
of the bytecodes rely on moving actual ints around as ints on the
stack (rather than a pointer to an int obj). I believe this is the
reason that 'incr' is the one command that currently doesn't like
wide integers in 8.4. While correcting this isn't a science project,
it is something that must be done and carefully tested.

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions

Join us in Sept. for Tcl'2002: http://www.tcl.tk/community/tcl2002/

George A. Howlett

unread,
Jun 24, 2002, 4:17:20 PM6/24/02
to
Jeff Hobbs <Je...@activestate.com> wrote:
>> I've looked at TclExecuteByteCode. Can you enumerate what makes this
>> routine any harder to port?

> One issue that I know of, and likely similar ones exist, is that some
> of the bytecodes rely on moving actual ints around as ints on the
> stack (rather than a pointer to an int obj). I believe this is the
> reason that 'incr' is the one command that currently doesn't like
> wide integers in 8.4. While correcting this isn't a science project,
> it is something that must be done and carefully tested.

I'm assuming that a 32-bit int would be translated to a 64-bit long
(LP) or a 64-bit long long (LLP).

--gah


Chang Li

unread,
Jun 25, 2002, 1:48:58 AM6/25/02
to
"George A. Howlett" <g...@siliconmetrics.com> wrote in message news:<k7LR8.55$nD1....@news.uswest.net>...

> Jeff Hobbs <Je...@activestate.com> wrote:
>
> I'm assuming that a 32-bit int would be translated to a 64-bit long
> (LP) or a 64-bit long long (LLP).
>

This may lead to rewriting almost the entire Tcl core. And long is still
32-bit in 64-bit Windows. There is no long long; its 64-bit type is
__int64. So we may still need a typedef like a wide int in Tcl. But there
are some great advantages to using a 64-bit integer. For example, a string
length could be over 2GB. This extreme use may be normal in the future if we
use 64GB of memory. However, if we limit the string length to under 2GB,
little Tcl code needs to be modified.

Chang

> --gah

George A. Howlett

unread,
Jun 25, 2002, 4:37:48 AM6/25/02
to
Chang Li <CHA...@neatware.com> wrote:

> This may lead to rewrite almost entire Tcl core. And the long is
> still 32-bit in Windows 64-bit. There is no long long. Its 64-bit
> type is int64.

You're right, there's no "long long" in Visual C++ 6.0 [The Intel
compiler supports long longs though]. But there are polymorphic types
such as

INT_PTR
LONG_PTR

that are the size of a pointer.

I think many changes are straightforward. You can convert ints and
longs to polymorphic types. That means that Tcl_GetIntFromObj will
return a 32-bit int under Win32 and a 64-bit int under Win64.

It will also help to assume an ANSI C compiler. For example, the
casting of malloc can lead to errors if you forget to include the
prototype.


> So we may still need a typedef like wide int in Tcl. But there are
> some great advantages to use 64-bit integer. For example a string
> length could be over 2GB. This extremly use may be normal in the
> future if we use 64GB memory. However if we limit the string length
> under 2GB, few Tcl codes need to be modified.

My view is there's little point creating a 64-bit version of Tcl using
a 32-bit memory model. It seems like a waste to have fast 64-bit
hardware run software that can't make use of the upper 32 bits.

--gah


Donal K. Fellows

unread,
Jun 25, 2002, 6:28:52 AM6/25/02
to
"George A. Howlett" wrote:
> It will also help to assume an ANSI C compiler. For example, the
> casting of malloc can lead to errors if you forget to include the
> prototype.

I think we're very close to forcing an ANSI compiler already. Some features are
just too valuable.

> My view is there's little point creating a 64-bit version of Tcl using
> a 32-bit memory model. It seems like a waste to have fast 64-bit
> hardware run software that can't make use of the upper 32-bits.

I agree with that. That's one reason why I'm keen on going to 9.0 and not 8.5
after the current release.

"[E]ven now, wars are fought and lost, people are killed and unkilled, toilet
rolls are used and unused, pants are derwear and underwear, all because
of the delicious velvety substance that is Marmite." -- Nathan Weston

Donal K. Fellows

unread,
Jun 25, 2002, 6:22:20 AM6/25/02
to
"George A. Howlett" wrote:
> Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
>> As I've said, I looked into doing 64-bit adaption properly, but the
>> changes tend to spread throughout the core rather rapidly. It's
>> more work than you might naively expect, due mainly to
>> TclExecuteByteCode. :^(
>
> I've looked at TclExecuteByteCode. Can you enumerate what makes this
> routine any harder to port?

More than anything else, the problem is type leakage. Because of the historical
identity in size between ints and longs, there's a lot of variable reuse in TEBC
which would not be warranted on a proper 64-bit port. OK, this is not the only
function that exhibits this property, but it is far and away the worst offender.

The other areas that need tackling are:
memory allocation
lengths and indices
structure packing

Maybe there's other things to do too.

>> One major disadvantage: it forks the code. I'm *really* keen on
>> keeping the number of active CVS branches down so that unrelated
>> fixes get picked up nicely instead of having to be added by hand.
>
> I don't think you can get around that problem. It's going to take
> human intervention to determine if the fixes made against the 32-bit
> sources will also work for the 64-bits. I wouldn't blindly trust
> those changes.

The core is, compared to other things (e.g. the TEA sample extension), remarkably
easy to port. Once we get the main changes done (i.e. get longs in a multitude
of correct places and reorder some structures) there should not be a problem in
keeping things on-track.

> The code base is going be forked regardless--it's just a matter of
> how. You can #ifdef the 64-bit parts in the same CVS branch or you
> can have another branch. I think with all the structure redefinitions
> it's not worth trying to keep everything in the same branch.

I disagree. In particular, branching branches *everything*, and I don't relish
working on keeping, say, the script library up to date with other non-64bit
bugfixes. George, you should have write access to the CVS repository so if you
wish to start a branch to work on this sort of thing, I can't stop you (if you
do, drop me a line telling me the name of the branch so I can list it in TIP#31)
but I will not work on it. From a maintenance point of view, branching is a
major mistake. IMO anyway. I'd much rather the whole core was properly 64-bit
aware, particularly as 64-bit machines will be becoming far more common in the
future, and system memories of even desktop systems are starting to push up
towards the point where 32-bit systems can't address all of it.

As I said before, I support doing something about this. Just Not Yet.

lvi...@yahoo.com

unread,
Jun 26, 2002, 8:07:35 AM6/26/02
to

: From a maintenance point of view, branching is a
:major mistake.

Are we talking branching in the sense that primarily, new development would
be taking place in the 64 bit branch, and those requiring say 32 bit
support would be trying to retrofit fixes as best they could, or just freezing
at say Tcl 8.4 (or 7.6 or whatever) and never moving forward?

--
Support Internet Radio <URL: http://saveinternetradio.org/ >
Join Tcl'2002 in Vancouver http://www.tcl.tk/community/tcl2002/
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.

lvi...@yahoo.com

unread,
Jun 26, 2002, 8:04:53 AM6/26/02
to

: It seems like a waste to have fast 64-bit
:hardware run software that can't make use of the upper 32-bits.

Hopefully when Tcl makes use of 64 bits, it won't slow down its execution.
Currently, that isn't the case - the limited 64-bit support now results
in slower execution for applications which would have run in 32 bits anyway.

Donal K. Fellows

unread,
Jun 27, 2002, 6:11:59 AM6/27/02
to
lvi...@yahoo.com wrote:
>: From a maintenance point of view, branching is a major mistake.
>
> Are we talking branching in the sense that primarily, new development would
> be taking place in the 64 bit branch, and those requiring say 32 bit
> support would be trying to retrofit fixes as best they could, or just
> freezing at say Tcl 8.4 (or 7.6 or whatever) and never moving forward?

I'm talking in the sense of having 32-bit and 64-bit Tcl on different CVS
branches; one of them must be the main branch, and that one will receive all the
fixes to things that have nothing to do with the 32/64 issue, and the other
branch will either be stuck at an ancient version or forever running to catch
up. If it was instead possible to change the main trunk so that it became more
64-bit friendly while remaining good for 32-bit systems (I'm convinced this is
possible, though not backward-compatible) then this branching would not be
necessary.

I'd rather have exactly one non-trunk branch producing releases, and that's the
current release branch (at the moment, that's the one that the 8.3.* releases
came from.)

-- I have to warn you up front that I'm pretty sure you're full of crap, but
it might still be interesting to see your argument.
-- Bill Newman <wne...@netcom.com>

Chang Li

unread,
Jun 27, 2002, 9:29:40 PM6/27/02
to
"Donal K. Fellows" <fell...@cs.man.ac.uk> wrote in message news:<3D1AE4EF...@cs.man.ac.uk>...

> lvi...@yahoo.com wrote:
>
> I'm talking in the sense of having 32-bit and 64-bit Tcl on different CVS
> branches; one of them must be the main branch, and that one will receive all the
> fixes to things that have nothing to do with the 32/64 issue, and the other
> branch will either be stuck at an ancient version or forever running to catch
> up. If it was instead possible to change the main trunk so that it became more
> 64-bit friendly while remaining good for 32-bit systems (I'm convinced this is
> possible, though not backward-compatible) then this branching would not be
> necessary.
>

That is a good idea. Maybe we could add a new 64-bit subdirectory,
similar to win and unix, in the branch. Then we could avoid many #ifdef
troubles. The makefile could find the right path to compile.

Chang

Donal K. Fellows

unread,
Jun 28, 2002, 6:16:32 AM6/28/02
to
Chang Li wrote:
> That is a good idea. Maybe we could add a new 64bit subdirectory
> similar to win, unix in the branch. Then we could avoid many #ifdef
> troubles. The makefile could find the right path to compile.

Don't need it. Things like reordering structures and type modification (once
you wade through all the natural layers of abstraction down to the actual
binary code generated) are really 32-bit neutral; any changes made will be fine
to share with everyone. It can't be done in a binary-compatible way.

If someone with the right permissions[*] wants to start a development branch to
work on these things, feel free. But no releases will be made from it and the
aim will be to merge into mainline as soon as practical (given that it can't be
done for a minor release, and there's probably current 64-bit users who are
after backward compatibility.)

Donal.
[* Anyone interested on working on this currently without CVS access to the
core should contact the TCT on mailto:tcl-...@lists.sf.net with their SF
username so they can be given access. ]

-- Well, I'm not exactly a high-brow cineaste either. The number of Iranian
movies I've seen can be counted on one hand by a guy who lost all his
fingers in a tragic fax machine accident. -- Mike Kozlowski

George A. Howlett

unread,
Jun 28, 2002, 1:08:17 PM6/28/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:
> lvi...@yahoo.com wrote:
>>: From a maintenance point of view, branching is a major mistake.
>>
>> Are we talking branching in the sense that primarily, new development would
>> be taking place in the 64 bit branch, and those requiring say 32 bit
>> support would be trying to retrofit fixes as best they could, or just
>> freezing at say Tcl 8.4 (or 7.6 or whatever) and never moving forward?

> I'm talking in the sense of having 32-bit and 64-bit Tcl on
> different CVS branches; one of them must be the main branch, and
> that one will receive all the fixes to things that have nothing to
> do with the 32/64 issue, and the other branch will either be stuck
> at an ancient version or forever running to catch up. If it was
> instead possible to change the main trunk so that it became more
> 64-bit friendly while remaining good for 32-bit systems (I'm
> convinced this is possible, though not backward-compatible) then
> this branching would not be necessary.

I think there are two basic points.

1. A 64-bit port will certainly force basic internal changes to
both Tcl and Tk. Structures will need to be reordered.
Many variables will need to change from 32-bit ints to 64-bit
ints. This will break binary compatibility with the current
32-bit sources.

2. If binary compatibility was a worthwhile goal for the previous
   versions of Tcl/Tk (who'd have thought I'd be saying this), then
it's still a worthwhile goal [...at least until 32-bit platforms
   become as scarce as an 80286].

Both points suggest that there will be more than a bit of code
separation between 32-bit and 64-bit ports.

In a perfect world, I'd rather have everything in one source tree. I
agree that splitting the two ports will cause headaches. But the
problem of not splitting is that there'll never be a good time to start
the 64-bit port [until it's a day late and a dollar short].

--gah


Chang Li

unread,
Jun 28, 2002, 7:41:37 PM6/28/02
to
"George A. Howlett" <g...@siliconmetrics.com> wrote in message news:<5K0T8.27$_J1.2...@news.uswest.net>...

> I think there are two basic points.
>
> 1. A 64-bit port will certainly force basic internal changes to
> both Tcl and Tk. Structures will need to be reordered.
> Many variables will need to change from 32-bit ints to 64-bit
> ints. This will break binary compatibility with the current
> 32-bit sources.
>
> 2. If binary compatibility was a worthwhile goal for the previous
> versions of Tcl/Tk (who'd have thought I'd be saying this), then
> it's still a worthwhile goal [...at least until 32-bit platforms
> become as scarce as an 80286].
>

There are few 64-bit extensions available, so binary compatibility with
old extensions is unnecessary for 64-bit. The difficulty is separating
the 32-bit and 64-bit code. For example, many printfs need to be
rewritten to print 64-bit values correctly; they are both 32/64-bit
dependent and, for 64-bit, platform dependent. There are four
combinations: 32/win, 32/unix, 64/win, and 64/unix. In the 32-bit
world, ANSI C is standard, so there is no difference between 32/win and
32/unix as far as the standard C library goes. Is there a standard
ANSI C and standard library for 64-bit? I do not think there is now.
Even the Mac C library and the Linux C library differ for 64-bit.

The Tcl core is built on the standard C library. 64-bit cross-platform
deployment is more difficult because of the lack of a 64-bit standard
for that library. Modifying one int to long long may cause a cascade
of modifications through the Tcl sources.

> Both points suggest that there will be more than a bit of code
> separation between 32-bit and 64-bit ports.
>

I don't know how much code needs rewriting if Tcl uses 64-bit memory
allocation, so that a block length and index could be 64-bit. That may
break much Tcl code. In that case the 64-bit core is greatly different
from the 32-bit core; in the semi-64-bit approach there are few
breakages.

> In a perfect world, I'd rather have everything in one source tree. I
> agree that spliting the two ports will cause headaches. But the
> problem of not spliting is that there's never be a good time to start
> the 64-bit port [until it's a day late and a dollar short].
>

If we want to take full advantage of the flat 64-bit memory in the Tcl
core, the two source code branches will be greatly different. If we
adopt the semi approach instead, there are few differences. So the
choice is between taking full advantage of the 64-bit hardware and
preserving software compatibility.

Chang

> --gah

Donal K. Fellows

unread,
Jul 5, 2002, 10:47:17 AM7/5/02
to
"George A. Howlett" wrote:
> I think there are two basic points.
>
> 1. A 64-bit port will certainly force basic internal changes to both Tcl
> and Tk. Structures will need to be reordered. Many variables will need
> to change from 32-bit ints to 64-bit ints. This will break binary
> compatibility with the current 32-bit sources.

Yes. (It'll also break compatibility with the current sources as built on
64-bit platforms, but that's kind-of the point.)

> 2. If binary compatibility was a worthwhile goal for the previous
> versions of Tcl/Tk (who'd have thought I'd be saying this), then it's
> still a worthwhile goal [...at least until 32-bit platforms become as
> scarce as an 80286].

But not between major versions. That's (one reason) why we have major versions
at all. :^) It's been a long time since the last major version...

> Both points suggest that there will be more than a bit of code separation
> between 32-bit and 64-bit ports.

That was a leap in the dark that I didn't follow.

> In a perfect world, I'd rather have everything in one source tree. I agree
> that spliting the two ports will cause headaches. But the problem of not
> spliting is that there's never be a good time to start the 64-bit port
> [until it's a day late and a dollar short].

Could you wait until mid-September?

Donal.
--
"[He] would have needed to sell not only his own soul, but have somehow gotten
in on the ground floor of an Amway-like pyramid scheme delivering the souls
of kindergarten students to Satan by the truckload like so many boxes of Girl
Scout Cookies." -- John S. Novak, III <j...@concentric.net>

Donal K. Fellows

unread,
Jul 5, 2002, 10:55:20 AM7/5/02
to
Chang Li wrote:
> I don't know how much code needs rewriting if Tcl uses 64-bit memory
> allocation, so that a block length and index could be 64-bit. That
> may break much Tcl code. In that case the 64-bit core is greatly
> different from the 32-bit core; in the semi-64-bit approach there are
> few breakages.

The way to handle this is for the core to stop being quite so laissez-faire
about what should be an int (mainly results and flags) and what is a long or
long long (lengths and values.) At the moment, the core is rather messy in this
regard...

> If we want to use the full advantage of the flat 64-bit memory in the
> Tcl core two source code branches will be greatly different.

I don't quite follow that. How will they be different? Why won't a few
typedefs/#defines do the trick once the conversion itself is done?

There's very little code in the core that actually depends on a particular size
of numeric type (and that's usually dealt with so that it works on 64-bit
platforms as well as 32-bit platforms) but there's a lot that is strongly
non-optimal on 64-bit. However, I'm reasonably sure at this stage that the
non-optimality can be significantly reduced without seriously impacting upon the
32-bit builds (though not in a binary-compatible way.)

Just not in 8.4.

George A. Howlett

unread,
Jul 5, 2002, 10:47:49 PM7/5/02
to
Donal K. Fellows <fell...@cs.man.ac.uk> wrote:

>> 2. If binary compatibility was a worthwhile goal for the previous
>> versions of Tcl/Tk (who'd have thought I'd be saying this), then it's
>> still a worthwhile goal [...at least until 32-bit platforms become as
>> scarce as an 80286].

> But not between major versions. That's (one reason) why we have
> major versions at all. :^) It's been a long time since the last
> major version...

I'm not a believer in backward binary compatibility, but there are
people who do make the argument that it's necessary. But I do think
that one needs a better reason for breaking compatibility than the
mere fact that 64-bit platforms exist.

If you break binary compatibility for 32-bit platforms, you may force
extension writers to go back to the bad-old-days of supporting several
versions of Tcl/Tk: 32-bit 9.0, 64-bit 9.0, and 32-bit 8.x.

And users may vote with their feet by holding off. [If downloads are
any indication, there's still a significant portion of users running
with versions previous to 8.3] Every transition between major versions
has been painful. There have to be compelling reasons to keep two
different versions of Tcl around (8.x had them: internationalization,
native look-and-feel, threads, etc.)

If 9.0 is going to be incompatible, then it's not the right starting
point for the 64-bit port. People who want a 64-bit port want to run
their old Tcl scripts. They really prefer stability over new features
(especially if it means that they have to edit their old scripts).
Right now it mainly depends upon the stability of the 8.4 final
release. I could work from a late 8.3 version (it's well tested and I
can compare the results).

--gah
