ANNOUNCE: Colibri version 0.2

Frédéric Bonnet

unread,

Apr 16, 2009, 10:19:13 AM4/16/09

to

ANNOUNCE: Colibri version 0.2
=============================
As part of my ongoing work on the Cloverfield project, I am pleased to announce
the second version of colibri.

WHAT IS CLOVERFIELD?
====================

Wiki page: http://wiki.tcl.tk/Cloverfield

WHAT DOES COLIBRI STAND FOR?
============================

Colibris, known in English as hummingbirds, are a family of birds known for
their small size and high wing speed. The bee hummingbird (Mellisuga helenae),
found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total
length of about 5cm. They are renown for their stationary and backward flight
abilities on par with flying insects, which allow them to feed on the nectar of
plants and flowers in-flight.

I've chosen this name for this project because its goal is to be fast and
lightweight, and to follow the feather and bird theme shared with Tcl and many
related projects.

WHAT IS COLIBRI?
================

Colibri is a string and data type infrastructure. It features:

- Rope-based strings (see http://wiki.tcl.tk/20690) :

* ropes are immutable
... but ...
* extraction, insertion, deletion, concatenation... are fast
* limited compatibility with native C strings allocated in static space
(or guaranteed to survive the whole application lifetime)
* support for the full Unicode range (up to 32-bit codepoints)
* 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with
transparent conversions and access
* custom rope types allows for lazy or procedural representations
* iterator interface for easy access

- Extensible data type sytem dubbed "words"

* similar to Tcl_Obj
* minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can
take more space if needed
* words can have synonyms, i.e. words of any type
* synonyms can form chains of arbitrary length, so a word can have several
synonyms with different representations -- no more shimmering!
* predefined types such as ints, single chars and small strings (up to 3
chars on 32-bit systems) are represented as immediate values: the value
is stored directly in the pointer rather than in an allocated structure

- Fast and efficient cell-based allocator

* page-based allocation for optimal locality of reference and cache use
* 16-byte cells on 32-bit systems fit most needs, but elements can allocate
more cells if needed (up to 63 cells on 32-bit systems)
* overhead is small: 2 bits per 16-byte cell.
* raw alloc performances are competitive with the stock system malloc,
and in many cases much faster (up to 5 times as fast on small strings and
on words vs. Tcl_Obj-like structs)
* single cell allocation (the most frequent case) is very fast

- Automatic memory management thanks to an exact (AKA accurate or precise),
generational, copying, mark-and-sweep, garbage collector

* exact GC implies that roots (externally referenced elements) and
parent-child relations must be declared by the application, using a
simple API
* custom types can define a finalizer; one of the applications is to plug
in externally managed structures (e.g. malloc/free)
* the GC process is fully controllable (pause/resume) so that the
application don't get interrupted unexpectedly
* generational GC limits the overhead by restricting the collection to
newer elements, which are typically short-lived
* longer-living elements are collected less often as they get promoted to
older generations
* promotion is done by moving whole pages between memory pools, optionally
performing compaction when fragmentation exceeds a certain threshold,
thus limiting the overhead and improving the cache-friendliness over time
* contrary to reference-counting schemes (RC), GCs allow circular
references without memory leaks; word synonym chains take advantage of
this, being implemented as circular lists
* studies show that GCs consistently outperform RC in raw performances and
real-world cases, because the cost (both space and time) of maintaining
reference counters outweights the amortized GC overhead, especially with
generational GCs

Colibri is written in plain C and is free of any dependency besides system
libraries. The compiled binary DLL on Windows is about 32kB. The source code is
heavily commented and follows the Tcl quality standards.

HOW DOES COLIBRI RELATE TO TCL?
===============================

From the Cloverfield announcement:

" The last major point of the project is related to implementation and low
level issues. The existing Tcl implementation is notoriously solid,
however some changes are too radical to be performed on the current code
base, so a reimplementation is certainly needed on vast portions. For
example, the string representations, object structures, and virtual
machines will certainly require complete rewrite to accommodate with the
needed changes, which means that a lot of code won't be reused. However
vast portions of library code and algorithms certainly will (clock scan,
bigint, regexp...), as well as the high coding standards and QA that are
the trademarks of our beloved language. "

So Colibri is a candidate infrastructure for Tcl9 as an alternative to the
current Tcl_Obj-based core implementation. I believe that the features provided
by Colibri shall yield higher levels of performances than the current
architecture, at the price of an ABI incompatibility (for which major versions
are made anyway), but with a similar philosophy that should ease conversion of
existing code.

CHANGES SINCE VERSION 0.1
=========================

1. Major rewrite of GC code:

- Roots are now sorted by generation for better performances with large number
of instances.
- Reworked parent-child relation handling. Now uses one cell per parent
instead of one per parent-child pair, in combination with a parent flag
(replaces the now suppressed generation flag).
- Roots and parent-child relations now use cells allocated in a dedicated
memory pool.
- No more per-cell generation flag; all cells in a page belong to the same
generation. Promotion is now immediate upon GC.
- Promotion is done per page rather than per cell: whole pages are moved up
one generation at each GC. CPU overhead is much smaller.
- As a consequence, the first generation is always emptied at each GC, which
speeds up further allocations.
- The last collected generation is by default merged with the first
uncollected one, but may be relocated if fragmentation reaches a certain
threshold; this is done by copying individual cells instead of moving pages.
- Relocation and redirection are now performed inline during the mark phase,
versus during later steps in previous versions. The resulting code is much
simpler, and avoids cache-intensive traversal of whole memory pools.

2. Improved performances in every domain thanks to point 1.

3. Added two high-level data types: vectors and lists.

- Vectors are flat collections of words. Their size is limited, as they must
fit within one single page. They are directly addressable.
- Lists are linear collections of words, made by assembling vectors into
balanced binary trees. They are to vectors what ropes are to strings. Like
ropes, lists are immutable. Iterator and traversal APIs are provided.

PLANNED FEATURES FOR NEXT VERSION
=================================

1. Mutable data types with circular references and graceful degradation to
immutable types:

- Mutable lists. Candidate models include unrolled linked lists and skip lists.
- Maps (AKA dicts) with string keys. Candidate models include Patricia trees
and Ternary Search Trees, but not hash tables.

2. Proper internal and user documentation

WHAT NEEDS TO BE DONE?
======================

My main development platform is Windows, so for now the source archive only
includes Microsoft Visual Studio project files. Microsoft provides a free
edition of their development environment known as Visual Studio Express for
those willing to compile and test the library without having to buy an expensive
license. Other systems need a makefile and/or autoconf scripts.

The code is fairly portable on 32-bit systems. 64-bit support will need more
work because all the internals are fine-tuned and optimized at the bit level;
however porting to 64-bit should be rather straightforward: the algorithms will
remain unchanged, structure access is abstracted behing macros, and cell size
is proportional to the word size (a cell should be able to store 4 pointers,
which add up to 16 bytes on 32-bit systems).

The only part that needs platform-specific code is the low-level page allocator.
Colibri needs system calls that allocate boundary-aligned pages. At present only
a Windows version is provided, but porting to other systems should require
minimal effort, as the platform-specific code is limited to a handful of
functions. Platform-specific peculiarities should not impact the overall
architecture.

Exception handling is poor, especially because Colibri is meant to be part of a
larger system that provides the appropriate infrastructure.

The current implementation is not thread-safe as it uses unprotected global
structures. However I don't think that adding synchronization is necessary given
that it should eventually be integrated into a larger application with the same
appartment threading model as Tcl. The main advantage of having a global
allocator is inter-thread data sharing, and Tcl typically shares no data between
threads. As Colibri is not designed to be a general-purpose solution, and unless
there is a real benefit in using a global thread-safe allocator, converting
global structures to thread-local should fit our needs.

Tests have been run extensively during development, both on stability and
performance. However Colibri lacks a real test suite. The source includes a
test application that has to be hand-edited to run or customize specific tests.
Note that the current values defined by this application require a system with
a large memory size (2GB).

Last, it lacks user and design documentation, although the source code is
extensively commented.

GRAND UNIFICATION SCHEME
========================

A great side project would be to reimplement Tcl over Colibri as a replacement
for the current Tcl_Obj based code. Of course the resulting library would be
incompatible on an ABI level, but this would provide a great testbed for
Colibri in real-world cases (using pure script applications) as well as a
comparison point with the official Tcl core, and will possibly open the path to
Tcl9. This most likely involves a lot of work.

WHERE CAN IT BE FOUND?
======================

Wiki page: http://wiki.tcl.tk/Colibri
Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/
Mailing list: https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri
Direct Download:
- source: http://downloads.sourceforge.net/tcl9/colibri0.2.src.zip?use_mirror=
- Windows binary:
http://downloads.sourceforge.net/tcl9/colibri0.2.win32.zip?use_mirror=

LICENSE
=======

The license is the same as for Tcl.

Gerald W. Lester

unread,

Apr 16, 2009, 11:14:42 AM4/16/09

to

Frédéric Bonnet wrote:
> ANNOUNCE: Colibri version 0.2
> ...

>
> So Colibri is a candidate infrastructure for Tcl9 as an alternative to the
> current Tcl_Obj-based core implementation. I believe that the features
> provided
> by Colibri shall yield higher levels of performances than the current
> architecture, at the price of an ABI incompatibility (for which major
> versions
> are made anyway), but with a similar philosophy that should ease
> conversion of
> existing code.

> ...

>
>
> WHAT NEEDS TO BE DONE?
> ======================
>

> ...

>
> Tests have been run extensively during development, both on stability and
> performance. However Colibri lacks a real test suite. The source includes a
> test application that has to be hand-edited to run or customize specific
> tests.
> Note that the current values defined by this application require a
> system with
> a large memory size (2GB).

>...

First you say you believe it shall yeild higher levels of performance than
Tcl_Obj, then you say performance testing have been done -- what do those
test show in comparison to Tcl_Obj?

Lastly, you state it requires a large memory size (2GB) -- Tcl currently has
a very small footprint and is used extensively in things such as routers
that have very small amounts of memory. The memory issue would likely be a
very big problem in getting this to replace Tcl_Obj, at least IMHO.

--
+------------------------------------------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+

Frédéric Bonnet

unread,

Apr 16, 2009, 12:08:33 PM4/16/09

to

Gerald W. Lester wrote:
>> Tests have been run extensively during development, both on stability and
>> performance. However Colibri lacks a real test suite. The source
>> includes a
>> test application that has to be hand-edited to run or customize
>> specific tests.
>> Note that the current values defined by this application require a
>> system with
>> a large memory size (2GB).
>> ...
>
>
> First you say you believe it shall yeild higher levels of performance
> than Tcl_Obj, then you say performance testing have been done -- what do
> those test show in comparison to Tcl_Obj?

The test application includes a benchmark procedure called testAllocInts that
compares colibri integer words allocation to malloc'd Tcl_Obj-like structures.
The test creates then destroys 20 million integers in a row. On my system I get
the following output (time in seconds):

Words: 20000000 int words...
0.968 create (0% immediate) + 0.172 GC = 1.140
malloc: 20000000 24-byte structures...
3.469 malloc + value + 3.766 free = 7.235

Not only that, but the malloc version takes up more than twice as much memory.
Of course YMMV and it depends on the malloc implementation. But note that while
free takes about the same time as malloc, the GC phase is much faster. And this
is a best case scenario (linear access) for malloc, random malloc/free calls
should perform worse. Tcl_Objs may perform better than malloc because of
pooling, but I believe that colibri should perform better on the long run thanks
to improved memory fragmentation and locality. And in practice, and all things
being equal, GC should give better overall performances than refcounting.

Speaking of integers, colibri also supports so-called immediate values. Contrary
to allocated structures, immediate values are stored within the pointer. This is
a well-known technique among dynamic languages such as Lisp or Smalltalk. As
pointers to structures are always aligned on word boundaries, this leaves the
least significant bits free for tagging. In colibri, if the two lsbs are 0, then
it points to an allocated structure. If the lsb is 1, then the remaining 31 bits
(on 32-bit arch) are used to store an integer value. That way half of the
integer domain can be represented as immediate values, which saves memory and
CPU. When dealing with sole immediate values, the above test gives the following
results:

Words: 20000000 int words...
0.562 create (100% immediate) + 0.000 GC = 0.562
malloc: 20000000 24-byte structures...
3.875 malloc + value + 3.735 free = 7.610

And with zero allocated memory of course. And since integers are, with strings,
the most common data type in real world applications, I believe that immediate
values alone should give a performance boost.

> Lastly, you state it requires a large memory size (2GB) -- Tcl currently
> has a very small footprint and is used extensively in things such as
> routers that have very small amounts of memory. The memory issue would
> likely be a very big problem in getting this to replace Tcl_Obj, at
> least IMHO.

No, colibri is very frugal hopefully. Only the test application requires such an
amount for benchmarking purpose. Colibri allocates memory one page at a time,
and can run in a very small footprint.

Cheers, Fred

George Peter Staplin

unread,

Apr 16, 2009, 3:17:35 PM4/16/09

to

Frédéric Bonnet wrote:

> ANNOUNCE: Colibri version 0.2
> =============================
> As part of my ongoing work on the Cloverfield project, I am pleased to
> announce the second version of colibri.
>
>
> WHAT IS CLOVERFIELD?
> ====================
>
> Wiki page: http://wiki.tcl.tk/Cloverfield

[snip]

> - Automatic memory management thanks to an exact (AKA accurate or
> precise),
> generational, copying, mark-and-sweep, garbage collector
>
> * exact GC implies that roots (externally referenced elements) and
> parent-child relations must be declared by the application, using
> a simple API
> * custom types can define a finalizer; one of the applications is to
> plug
> in externally managed structures (e.g. malloc/free)
> * the GC process is fully controllable (pause/resume) so that the
> application don't get interrupted unexpectedly

That sounds very cool.

> * generational GC limits the overhead by restricting the collection
> to
> newer elements, which are typically short-lived

Wow, this is sounding very amazing. I have the Garbage Collection book and
I have built some small GC in C and assembly language. It sounds like
you're far ahead of me though. I was not able to make a proper
generational collector.

Please do *not* add thread support. I have a feeling that would muck it all
up.

> * longer-living elements are collected less often as they get
> promoted to
> older generations
> * promotion is done by moving whole pages between memory pools,
> optionally
> performing compaction when fragmentation exceeds a certain
> threshold, thus limiting the overhead and improving the
> cache-friendliness over time
> * contrary to reference-counting schemes (RC), GCs allow circular
> references without memory leaks; word synonym chains take
> advantage of this, being implemented as circular lists
> * studies show that GCs consistently outperform RC in raw
> performances and
> real-world cases, because the cost (both space and time) of
> maintaining reference counters outweights the amortized GC
> overhead, especially with generational GCs
>
>
> Colibri is written in plain C and is free of any dependency besides system
> libraries. The compiled binary DLL on Windows is about 32kB. The source
> code is heavily commented and follows the Tcl quality standards.

Are you using mmap() in unix and VirtualAlloc() in Windows for your page
allocation?

Tcl quality standards vary. I personally don't care if code has a lot of
comments, in fact I often find such comments distracting. I care a lot
more about the code having a general overview commentary (if it needs
that), and comments that point out unexpected things, or order dependencies
for state.

With regard to quality: it's much harder to verify a 300 line function with
50 variables, in comparison to the same code with 50 line functions and
better data structuring. I know that some TCT members don't agree with me
on this...

> HOW DOES COLIBRI RELATE TO TCL?
> ===============================
>
> From the Cloverfield announcement:
>
> " The last major point of the project is related to implementation and
> low
> level issues. The existing Tcl implementation is notoriously solid,
> however some changes are too radical to be performed on the current
> code base, so a reimplementation is certainly needed on vast
> portions. For example, the string representations, object structures,
> and virtual machines will certainly require complete rewrite to
> accommodate with the needed changes, which means that a lot of code
> won't be reused. However vast portions of library code and algorithms
> certainly will (clock scan, bigint, regexp...), as well as the high
> coding standards and QA that are
> the trademarks of our beloved language. "

I think we may have to disagree here. My opinion of Tcl's solidity is often
that some areas are solid, and others are really in need of life support.

I mentioned to some other Tcl maintainers/developers my concerns about
the "broken windows." My philosophy is that some code in Tcl has had
broken windows, and rather than cleaning up that code, it gets messier over
time.

If you read some of the bugs that people find in Tcl, or examine the
plentiful bug collection on SourceForge you may change your mind.

Tcl is pretty good, but it could definitely use simplification where
possible.

> WHAT NEEDS TO BE DONE?
> ======================
>
> My main development platform is Windows, so for now the source archive
> only includes Microsoft Visual Studio project files. Microsoft provides a
> free edition of their development environment known as Visual Studio
> Express for those willing to compile and test the library without having
> to buy an expensive license. Other systems need a makefile and/or autoconf
> scripts.

Why aren't you using MinGW or something like Open Watcom (which I think
works well in Windows)?

> The code is fairly portable on 32-bit systems. 64-bit support will need
> more work because all the internals are fine-tuned and optimized at the
> bit level; however porting to 64-bit should be rather straightforward: the
> algorithms will remain unchanged, structure access is abstracted behing
> macros, and cell size is proportional to the word size (a cell should be
> able to store 4 pointers, which add up to 16 bytes on 32-bit systems).

Hmm... Why not write it in assembler if it's going to have such
restrictions?

> The only part that needs platform-specific code is the low-level page
> allocator. Colibri needs system calls that allocate boundary-aligned
> pages. At present only a Windows version is provided, but porting to other
> systems should require minimal effort, as the platform-specific code is
> limited to a handful of functions. Platform-specific peculiarities should
> not impact the overall architecture.

I see. So I guess I would have to port it to unix, if the code meets my
needs for an idea I'm working on.

> Exception handling is poor, especially because Colibri is meant to be part
> of a larger system that provides the appropriate infrastructure.

What does that mean? Can you expand on that?

> The current implementation is not thread-safe as it uses unprotected
> global structures. However I don't think that adding synchronization is
> necessary given that it should eventually be integrated into a larger
> application with the same appartment threading model as Tcl. The main
> advantage of having a global allocator is inter-thread data sharing, and
> Tcl typically shares no data between threads. As Colibri is not designed
> to be a general-purpose solution, and unless there is a real benefit in
> using a global thread-safe allocator, converting global structures to
> thread-local should fit our needs.

Please do *NOT* add thread support.

> Tests have been run extensively during development, both on stability and
> performance. However Colibri lacks a real test suite. The source includes
> a test application that has to be hand-edited to run or customize specific
> tests. Note that the current values defined by this application require a
> system with a large memory size (2GB).
>
> Last, it lacks user and design documentation, although the source code is
> extensively commented.
>
>
> GRAND UNIFICATION SCHEME
> ========================
>
> A great side project would be to reimplement Tcl over Colibri as a
> replacement for the current Tcl_Obj based code. Of course the resulting
> library would be incompatible on an ABI level, but this would provide a
> great testbed for Colibri in real-world cases (using pure script
> applications) as well as a comparison point with the official Tcl core,
> and will possibly open the path to Tcl9. This most likely involves a lot
> of work.

I am interested in this. I know that many hands make light work. I'll
probably post a followup after reviewing your code further...

-George

George Peter Staplin

unread,

Apr 16, 2009, 3:49:48 PM4/16/09

to

Frédéric Bonnet wrote:

> ANNOUNCE: Colibri version 0.2
> =============================
> As part of my ongoing work on the Cloverfield project, I am pleased to
> announce the second version of colibri.
>
>
> WHAT IS CLOVERFIELD?
> ====================
>
> Wiki page: http://wiki.tcl.tk/Cloverfield
>
>

> WHERE CAN IT BE FOUND?
> ======================
>
> Wiki page: http://wiki.tcl.tk/Colibri
> Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/
> Mailing list:
> https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri
> Direct Download:
> - source:
> http://downloads.sourceforge.net/tcl9/colibri0.2.src.zip?use_mirror=
> - Windows binary:
> http://downloads.sourceforge.net/tcl9/colibri0.2.win32.zip?use_mirror=

OK, so I got the source code.

I would suggest some cleanups almost immediately. This code is going to get
messy if people port it to systems other than Win32 and Unix. In fact even
having a unix port would be messy the way things are now. I really don't
like what you have done with the various #ifdef WIN32 paths.

Here is what I would suggest:

Have something like a platform directory. So you have:
platform/win32/platform.h
platform/unix/platform.h

Now, every platform must provide a certain set of features to work with
things like colAlloc.c

Your build system can define the necessary stuff for the "unix" or "win32"
intermediate path.

So I would suggest that platform.h provides a set of macros specific to each
platform. You might even want:

platform/win32/arch/i386
platform/win32/arch/x86_64

and so on, perhaps making that 32 or 64 if the differences really are 32-bit
and 64-bit portability issues.

Your code in colAlloc.c says a lot about what it's doing, but less about
*why*.

This code in colInt.h:
#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))
#define SWAP_PTR(a, b) (((*(int *) &(a)) ^= (*(int *) &(b))),
((*(int *) &(b)) ^= (*(int *) &(a))), ((*(int *) &(a)) ^= (*(int *) &(b))))

says to me: "why!?" Really why not use a temporary variable? Your code is
indeed non-portable from what I see. This kind of interpretation of
pointers as int is really not ideal. If you must do that use an *optional*
C99 feature that allows pointers to be represented in integer types called
intptr_t or uintptr_t.

This seems bogus to me in colInt.h:
#define PAGE_SIZE 1024

The average page is now 4096 bytes or greater. In unix you can use:
sysconf(_SC_PAGESIZE); to get the page size for a system (portably).

Windows has another function for gathering the page size, and IIRC MSDN
mentions that for use with VirtualAlloc().

Note: some systems won't allocate pages if you pass an invalid size for a
page allocation request, while others will round up. BSD systems typically
round up. Linux expects an exact size that is a multiple of the page size
with mmap and munmap.

colInt.h again:

#ifdef WORDS_BIGENDIAN
# define WORD_GET_TYPE_ADDR(word) \
((addr) = ((Col_WordType *)(*(unsigned int *)(word)&~3)),
SWAP(((char *)&(addr))[0], ((char *)&(addr))[3]))
# define WORD_SET_TYPE_ADDR(word, addr) \
(*(unsigned int *)(word) = ((unsigned int)(addr))|2, SWAP(((char *
(word))[0], ((char *)(word))[3]))

This is really looking very non-portable. Why didn't you start off with the
architecture-specific bits more organized?

colRope.c I think could be improved. I would really suggest storing a
length with each C string.
Col_RopeLength():

if ((*data2 & 0xC0) != 0x80) {length++;}

Reusing Tcl's 0xc0 0x80 hack to have the ASCII NUL terminator translated to
an invalid UTF-8 sequence is problematic. Please don't do that. I think
it leads to problems, especially if the string is used somewhere you don't
expect. If you're going to do Cloverfield right, then please use size_t
for string lengths, and avoid using strlen(). strlen() also doesn't scale
to very large strings very well.

Overall I think you have a lot of work to do, especially if you want this to
be portable to a system other than 32-bit Windows.

-George

Donal K. Fellows

unread,

Apr 16, 2009, 4:12:40 PM4/16/09

to

George Peter Staplin wrote:
> Frédéric Bonnet wrote:
>> ANNOUNCE: Colibri version 0.2
>> =============================
>> As part of my ongoing work on the Cloverfield project, I am pleased to
>> announce the second version of colibri.

[...]
> Tcl quality standards vary.

Having standards for how code is formatted helps a lot. If you feel I'm
wrong on this, try marking student projects where every last bit of each
file is formatted differently, sometimes with no indenting, sometimes
with lines over 400 characters long, weird variable naming, etc. Strict
formatting rules mean that people can come to the code and read it
immediately. Tcl has its own rules (the Engineering manual) but to be
honest it doesn't matter what they are really; it's just having the
enforced rules that makes the difference.

Not all code is as clear as it should be, but it is at least usually a
problem at the semantic level and not at the simple/syntax levels. (For
example, the notifier is scary but not because it can't be read; it's
just really very subtle and big enough overall to be hard to hold in
your head at once...)

> I personally don't care if code has a lot of
> comments, in fact I often find such comments distracting. I care a lot
> more about the code having a general overview commentary (if it needs
> that), and comments that point out unexpected things, or order dependencies
> for state.
>
> With regard to quality: it's much harder to verify a 300 line function with
> 50 variables, in comparison to the same code with 50 line functions and
> better data structuring. I know that some TCT members don't agree with me
> on this...

Splitting for the hell of it doesn't help. The good points to divide a
big function up are where the extracted part has its own clear purpose,
simple code paths in and out (preferably one!) and a small "exposure
surface" of shared variables (i.e. a clear "interface contract"). Being
able to share invocations of the extracted function between locations is
double-plus good. If such points can be identified, splitting makes
great sense.

TEBC is a special (extra horrific) case; our very own Augean Stables.

> I think we may have to disagree here. My opinion of Tcl's solidity is often
> that some areas are solid, and others are really in need of life support.
>
> I mentioned to some other Tcl maintainers/developers my concerns about
> the "broken windows." My philosophy is that some code in Tcl has had
> broken windows, and rather than cleaning up that code, it gets messier over
> time.
>
> If you read some of the bugs that people find in Tcl, or examine the
> plentiful bug collection on SourceForge you may change your mind.
>
> Tcl is pretty good, but it could definitely use simplification where
> possible.

I think a discussion of which parts are good or not is a thing for
another (probably long) thread.

>> The code is fairly portable on 32-bit systems. 64-bit support will need
>> more work because all the internals are fine-tuned and optimized at the
>> bit level; however porting to 64-bit should be rather straightforward: the
>> algorithms will remain unchanged, structure access is abstracted behing
>> macros, and cell size is proportional to the word size (a cell should be
>> able to store 4 pointers, which add up to 16 bytes on 32-bit systems).
>
> Hmm... Why not write it in assembler if it's going to have such
> restrictions?

Hmm indeed. (Supporting 64-bit platforms is becoming very important,
especially for server systems.)

>> The current implementation is not thread-safe as it uses unprotected
>> global structures. However I don't think that adding synchronization is
>> necessary given that it should eventually be integrated into a larger
>> application with the same appartment threading model as Tcl. The main
>> advantage of having a global allocator is inter-thread data sharing, and
>> Tcl typically shares no data between threads. As Colibri is not designed
>> to be a general-purpose solution, and unless there is a real benefit in
>> using a global thread-safe allocator, converting global structures to
>> thread-local should fit our needs.
>
> Please do *NOT* add thread support.

Whether or not you (Frédéric, that is) add thread support next is a good
question, but it most certainly should remain as a goal as many of the
heavy-duty Tcl applications (especially server apps) use threads a lot.
It's a fact of modern computing that the number of cores per machine is
going up, and threading is an important way to leverage this. The
reality is that Tcl 9.0 most certainly *must* be thread-supporting.

Donal.

George Peter Staplin

unread,

Apr 16, 2009, 4:26:00 PM4/16/09

to

Donal K. Fellows wrote:

> George Peter Staplin wrote:
>> Frédéric Bonnet wrote:
>>> ANNOUNCE: Colibri version 0.2

>>> The current implementation is not thread-safe as it uses unprotected
>>> global structures. However I don't think that adding synchronization is
>>> necessary given that it should eventually be integrated into a larger
>>> application with the same appartment threading model as Tcl. The main
>>> advantage of having a global allocator is inter-thread data sharing, and
>>> Tcl typically shares no data between threads. As Colibri is not designed
>>> to be a general-purpose solution, and unless there is a real benefit in
>>> using a global thread-safe allocator, converting global structures to
>>> thread-local should fit our needs.
>>
>> Please do *NOT* add thread support.
>
> Whether or not you (Frédéric, that is) add thread support next is a good
> question, but it most certainly should remain as a goal as many of the
> heavy-duty Tcl applications (especially server apps) use threads a lot.
> It's a fact of modern computing that the number of cores per machine is
> going up, and threading is an important way to leverage this. The
> reality is that Tcl 9.0 most certainly *must* be thread-supporting.
>
> Donal.

Donal, I'm well aware of the "cores per machine" argument. Processes can
take advantage of a large number of cores too. Threading as Tcl implements
it is in general no better than spawning another process and communicating
with it over a pipe. You can in fact implement an apartment-like model
with processes and [open |]. You can also do so by using the same script
with different code paths that are run via the argv.

There is in general little advantage to using threads, especially with the
traditional SRC-style threads. See: http://swtch.com/~rsc/thread/

We are after all talking about Cloverfield too, and Cloverfield has already
taken a radical departure from the Tcl syntax in some ways.

-George

George Peter Staplin

unread,

Apr 16, 2009, 4:56:34 PM4/16/09

to

George Peter Staplin wrote:

> Frédéric Bonnet wrote:
>
>> ANNOUNCE: Colibri version 0.2
>> =============================
>> As part of my ongoing work on the Cloverfield project, I am pleased to
>> announce the second version of colibri.
>>
>>
>> WHAT IS CLOVERFIELD?
>> ====================
>>
>> Wiki page: http://wiki.tcl.tk/Cloverfield
>>
>>
>> WHERE CAN IT BE FOUND?
>> ======================
>>
>> Wiki page: http://wiki.tcl.tk/Colibri
>> Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/
>> Mailing list:
>> https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri
>> Direct Download:
>> - source:
>> http://downloads.sourceforge.net/tcl9/colibri0.2.src.zip?use_mirror=
>> - Windows binary:
>> http://downloads.sourceforge.net/tcl9/colibri0.2.win32.zip?use_mirror=

[snip list of problems]

> Overall I think you have a lot of work to do, especially if you want this
> to be portable to a system other than 32-bit Windows.
>
> -George

I have more critical review of this code.

You're using malloc.h (which isn't standard or portable). Standard code
uses stdlib.h for malloc().

The code uses alloca. alloca can fail silently and result in faults in
various areas, when storing values to the pointer returned by alloca.
alloca is also known to disable some optimizations with GCC and other
compilers. alloca is also not a standard function, so systems aren't
required to support it. It also can lead to security problems in some
cases.

The code doesn't check the return value or report errors when VirtualAlloc()
fails.

-George

Donal K. Fellows

unread,

Apr 16, 2009, 4:57:57 PM4/16/09

to

Frédéric Bonnet wrote:
> Speaking of integers, colibri also supports so-called immediate values.
> Contrary to allocated structures, immediate values are stored within the
> pointer. This is a well-known technique among dynamic languages such as
> Lisp or Smalltalk. As pointers to structures are always aligned on word
> boundaries, this leaves the least significant bits free for tagging.

Actually, structure pointers seem to be 8-byte aligned on all modern
desktops and servers, though pointers to components of structures or
members of arrays are not so constrained. This has been true (of
non-embedded systems) for a long time now.

> And with zero allocated memory of course. And since integers are, with
> strings, the most common data type in real world applications, I believe
> that immediate values alone should give a performance boost.

If you really want to cover all bases, add "floating point number" to
that list. They're really pretty common (especially in scientific and
engineering applications) and you don't want to convert them to/from
strings if you can help it. Those conversions are fairly expensive and
there tend to be lots of doubles used in those cases when the
performance matters at all...

Donal.

George Peter Staplin

unread,

Apr 16, 2009, 5:04:23 PM4/16/09

to

Donal K. Fellows wrote:

> Frédéric Bonnet wrote:
>> Speaking of integers, colibri also supports so-called immediate values.
>> Contrary to allocated structures, immediate values are stored within the
>> pointer. This is a well-known technique among dynamic languages such as
>> Lisp or Smalltalk. As pointers to structures are always aligned on word
>> boundaries, this leaves the least significant bits free for tagging.
>
> Actually, structure pointers seem to be 8-byte aligned on all modern
> desktops and servers, though pointers to components of structures or
> members of arrays are not so constrained. This has been true (of
> non-embedded systems) for a long time now.

If you go digging much deeper you might find that the average allocator
these days is doing 16-byte alignment for things like "long double" and SSE
instructions that operate on 128-bit values. Though often users should use
things like posix_memalign() rather than assuming.

-George

Óscar Fuentes

unread,

Apr 16, 2009, 6:04:07 PM4/16/09

to

George Peter Staplin <geor...@xmission.com> writes:

> Donal, I'm well aware of the "cores per machine" argument. Processes
> can take advantage of a large number of cores too. Threading as Tcl
> implements it is in general no better than spawning another process
> and communicating with it over a pipe. You can in fact implement an
> apartment-like model with processes and [open |]. You can also do so
> by using the same script with different code paths that are run via
> the argv.

I'm no expert on Tcl's threading model, but AFAIK it allows to implement
a chain of producers/consumers as M:N:... where M>1, N>1... This is hard
to achieve with piped processes.

OTOH, Tcl's threading model is quite simplistic, which is a good thing
for some tasks and not so good for others.

> There is in general little advantage to using threads, especially with the
> traditional SRC-style threads. See: http://swtch.com/~rsc/thread/

Maybe I'm misinterpreting you (perhaps you are restricting the
discussion to certain programming applications, etc) but the fact is
that chosing among multi-thread programming and multi-processes is not
an implementation detail, but an architectural cornerstone. Like
choosing the programming paradigm the implentation language must
support. It can simplify (or harden) things on orders of
magnitude. Definitely, threads are the superior choice for some
applications.

[snip]

--
Oscar

Donal K. Fellows

unread,

Apr 16, 2009, 6:54:49 PM4/16/09

to

George Peter Staplin wrote:
> Donal, I'm well aware of the "cores per machine" argument. Processes can
> take advantage of a large number of cores too. Threading as Tcl implements
> it is in general no better than spawning another process and communicating
> with it over a pipe. You can in fact implement an apartment-like model
> with processes and [open |]. You can also do so by using the same script
> with different code paths that are run via the argv.

Actually, no. There are substantial per-process overheads too (such as
the encoding and filesystem subsystems) and you avoid having to work
with shared memory segments for dealing with the case where you're
shipping a large chunk of memory to another thread for processing
(shared memory segments are their own special kind of bunch of problems
IIRC). Going to processes only really starts to make sense when you're
working with algorithms that split up nicely; some do, some don't.
There's a huge literature on the subject.

> There is in general little advantage to using threads, especially with the
> traditional SRC-style threads. See: http://swtch.com/~rsc/thread/

I've known about the different thread models for quite a long time now.
There's a tremendous amount of hyperbole spread around about them, often
because the majority of practitioners are not people with CS training.
The fundamental two models are "shared memory" and "message passing". On
a small scale, if you can wrap your head around the techniques required
to code that way, shared memory schemes are very good indeed. However,
there is excellent evidence from across a large space of the whole
computing industry that message passing schemes scale up much *much*
larger (i.e., the internet!)

Donal.

George Peter Staplin

unread,

Apr 16, 2009, 9:52:05 PM4/16/09

to

Donal K. Fellows wrote:

> George Peter Staplin wrote:
>> Donal, I'm well aware of the "cores per machine" argument. Processes can
>> take advantage of a large number of cores too. Threading as Tcl
>> implements it is in general no better than spawning another process and
>> communicating
>> with it over a pipe. You can in fact implement an apartment-like model
>> with processes and [open |]. You can also do so by using the same script
>> with different code paths that are run via the argv.
>
> Actually, no. There are substantial per-process overheads too (such as
> the encoding and filesystem subsystems) and you avoid having to work
> with shared memory segments for dealing with the case where you're
> shipping a large chunk of memory to another thread for processing
> (shared memory segments are their own special kind of bunch of problems
> IIRC). Going to processes only really starts to make sense when you're
> working with algorithms that split up nicely; some do, some don't.
> There's a huge literature on the subject.

How can you work with Tcl when you should really only use one interp with
each thread? The thread sending is done using a pipe in Tcl, and the
notifier. How is that much different than using pipes between processes?
It doesn't copy as much, and the pipe usage is in fact potentially buggy in
some cases in Tcl's thread support.

So far I haven't convinced others that we should fix it, and my fix would
probably be reverted. Hint: writing to a full pipe() with write() and not
checking to see if the write returned an error is a problem. It's also a
problem especially when you consider that it's using non-blocking I/O.
This means the events just get thrown away, so you were supposed to have X
wakeups, and you get Y instead, where X > Y. So the notifier gradually
builds up garbage I would imagine in some rare cases, and some thread
waiters could hang wondering what happened to the event for them (see the
triggerPipe usage in tclUnixNotfy.c). Luckily the timing and interactions
of most threads don't hit this.

There is also the issue of thread safety in Tcl. Tcl assumes certain
operations are atomic, without using gcc atomic ops, or similar things.
See the thread-specific data layer (tclThreadStorage.c) which I know you
(DKF) worked on recently. I cleaned up some of the original, but I still
disagree that it's the best way forward. Before it was using many hash
tables, and now it uses sig_atomic_t. But eh, it works, usually.

>> There is in general little advantage to using threads, especially with
>> the
>> traditional SRC-style threads. See: http://swtch.com/~rsc/thread/
>
> I've known about the different thread models for quite a long time now.
> There's a tremendous amount of hyperbole spread around about them, often
> because the majority of practitioners are not people with CS training.
> The fundamental two models are "shared memory" and "message passing". On
> a small scale, if you can wrap your head around the techniques required
> to code that way, shared memory schemes are very good indeed. However,
> there is excellent evidence from across a large space of the whole
> computing industry that message passing schemes scale up much *much*
> larger (i.e., the internet!)
>
> Donal.

Yes, we need people with CS training, so the ivory tower fellows can tell us
how to do it! ;-) Oh wait, now some ivory tower people at Berkeley are
against threads!?[1] Whatever will we do? It must be those darn Berkeley
people with their LSD creating things like Tcl, and BSD. Their ivory tower
isn't so pure as some others, is it?

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

In closing: we are the knights that say meh! :-)

-George

George Peter Staplin

unread,

Apr 16, 2009, 10:08:54 PM4/16/09

to

George Peter Staplin wrote:
>
> Yes, we need people with CS training, so the ivory tower fellows can tell
> us
> how to do it! ;-) Oh wait, now some ivory tower people at Berkeley are
> against threads!?[1] Whatever will we do? It must be those darn Berkeley
> people with their LSD creating things like Tcl, and BSD. Their ivory
> tower isn't so pure as some others, is it?
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
>

The thought occurred to me that some of you might not read that paper all
the way to the conclusion. Here's part of the conclusion:

"If we expect concurrent programming to be mainstream, and if we demand
reliability and predictability from programs, then we must discard threads
as a programming model. Concurrent programming models can be constructed
that are much more predictable and understandable than threads."

-George

Frédéric Bonnet

unread,

Apr 17, 2009, 4:33:26 AM4/17/09

to

Donal K. Fellows wrote:
> Frédéric Bonnet wrote:
>> Speaking of integers, colibri also supports so-called immediate
>> values. Contrary to allocated structures, immediate values are stored
>> within the pointer. This is a well-known technique among dynamic
>> languages such as Lisp or Smalltalk. As pointers to structures are
>> always aligned on word boundaries, this leaves the least significant
>> bits free for tagging.
>
> Actually, structure pointers seem to be 8-byte aligned on all modern
> desktops and servers, though pointers to components of structures or
> members of arrays are not so constrained. This has been true (of
> non-embedded systems) for a long time now.

That's right, however the minimal alignment is 4 bytes on 32-bit systems anyway
(some architectures such as ARM will crash & burn when dealing with unaligned
data), which leaves 2 lsbs in any case. Anyway Colibri uses 16-byte cells, which
leaves 4 lsbs free for tags. At present I'm using the following tag policy:

> If you really want to cover all bases, add "floating point number" to
> that list. They're really pretty common (especially in scientific and
> engineering applications) and you don't want to convert them to/from
> strings if you can help it.

Double support is only possible on 64-bit systems, only floats can be
represented as immediate values on 32-bit systems. However we can give it a try,
as long as we're using standard IEEE representation: the lsbs lie in the
fractional part. But the precision loss will be 2 bits if we use binary tag 10
(this means that short strings will use 0100), which seriously limits the
usefulness on 32-bit systems (precision drops to 22 bits).

Anyway doubles can still be implemented as words: the available payload size is
8 bytes for a single-cell word, the same as with Tcl_Objs. And values that don't
fit within a pointer (as immediate values) will be upconverted to cell-based
words (this is already the case with integers, see Col_NewIntWord).

> Those conversions are fairly expensive and
> there tend to be lots of doubles used in those cases when the
> performance matters at all..

There is no conversion involved with immediate values, only changes in
representation (simple bit arithmetics).

dkf

unread,

Apr 17, 2009, 5:21:11 AM4/17/09

to

On 17 Apr, 03:08, George Peter Staplin <georg...@xmission.com> wrote:
> The thought occurred to me that some of you might not read that paper all
> the way to the conclusion. Here's part of the conclusion:
>
> "If we expect concurrent programming to be mainstream, and if we demand
> reliability and predictability from programs, then we must discard threads
> as a programming model. Concurrent programming models can be constructed
> that are much more predictable and understandable than threads."

To put it in simple terms, global locks are terrible for performance
(even assuming you don't have deadlock problems) and shared memory
tends to cause headaches. The message-passing model is much easier to
grasp, but can have performance problems if the message size is large.

The Thread package doesn't do too bad at trying to manage this
dichotomy, since it's basically message passing but with some shared
memory facilities that you can opt into.

Donal.

Frédéric Bonnet

unread,

Apr 17, 2009, 5:30:31 AM4/17/09

to

George Peter Staplin wrote:
> I would suggest some cleanups almost immediately. This code is going to get
> messy if people port it to systems other than Win32 and Unix. In fact even
> having a unix port would be messy the way things are now. I really don't
> like what you have done with the various #ifdef WIN32 paths.
>
> Here is what I would suggest:

[suggestion on build system and source tree]

That's something I deliberately let aside until now to concentrate on the core
problems, as the system-specific parts are very small, and I believe that things
should get sorted out once it gets ported to other systems.

> Your code in colAlloc.c says a lot about what it's doing, but less about
> *why*.

Overall, Colibri needs decent design documentation.

> This code in colInt.h:
> #define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))
> #define SWAP_PTR(a, b) (((*(int *) &(a)) ^= (*(int *) &(b))),
> ((*(int *) &(b)) ^= (*(int *) &(a))), ((*(int *) &(a)) ^= (*(int *) &(b))))
>
> says to me: "why!?" Really why not use a temporary variable?

That's the whole point actually. For example, SWAP() is used in other macros,
and so using temp variables in unpractical. FYI I got this technique from the
following page:

http://graphics.stanford.edu/~seander/bithacks.html#SwappingValuesXOR

> Your code is
> indeed non-portable from what I see. This kind of interpretation of
> pointers as int is really not ideal. If you must do that use an *optional*
> C99 feature that allows pointers to be represented in integer types called
> intptr_t or uintptr_t.

But what if the compiler doesn't support C99? I guess I'll have to define the
missing types in the platform stuff. But thanks for your suggestion, I'll
cleanup the code accordingly. I've read somewhere that ints on 64-bit are still
32 bits, so I'd have to change this when porting to 64 bit systems anyway.

> This seems bogus to me in colInt.h:
> #define PAGE_SIZE 1024
>
>
> The average page is now 4096 bytes or greater. In unix you can use:
> sysconf(_SC_PAGESIZE); to get the page size for a system (portably).

Indeed, but PAGE_SIZE defines the *logical* page size, the physical (read:
system) page size is provided by the system (GetSytemInfo() on Win32). So on
systems with 4KB pages, each phys page holds 4 log pages. PAGE_SIZE is used in
pointer arithmetics to know which page a cell belongs to, so it must be known
and constant. Moreover, benchmarks shows that giving PAGE_SIZE a larger value
(e.g. 4096) degrades performances, so using a log page size that is distinct
from the phys page size makes the code more portable and easier to tune.

> colInt.h again:
>
> #ifdef WORDS_BIGENDIAN
> # define WORD_GET_TYPE_ADDR(word) \
> ((addr) = ((Col_WordType *)(*(unsigned int *)(word)&~3)),
> SWAP(((char *)&(addr))[0], ((char *)&(addr))[3]))
> # define WORD_SET_TYPE_ADDR(word, addr) \
> (*(unsigned int *)(word) = ((unsigned int)(addr))|2, SWAP(((char *
> (word))[0], ((char *)(word))[3]))
>
> This is really looking very non-portable. Why didn't you start off with the
> architecture-specific bits more organized?

That's a tradeoff between gathering system-dependent vs. functional units stuff.
Here I chose to keep all word-related definitions in the same place. But once
the build tree is sorted out, I may define a special macro that abstracts the
whole big-endian byte swapping away.

> colRope.c I think could be improved. I would really suggest storing a
> length with each C string.

No! C strings are only there for convenience, i.e. with string litterals. Not
for handling malloc'd C strings. This means that the following is OK:

Col_RopeConcat("a", "b");

Whereas this is not:

char *a = <some string>, *b = <some string>;
Col_RopeConcat(a, b);

So either ropes are pure C string literals (typically short), or they eventually
get stored within rope structures that include a length field. malloc'd strings
must be converted to ropes, actually the whole point of ropes is to replace
dynamic strings.

> Col_RopeLength():
>
> if ((*data2 & 0xC0) != 0x80) {length++;}
>
> Reusing Tcl's 0xc0 0x80 hack to have the ASCII NUL terminator translated to
> an invalid UTF-8 sequence is problematic. Please don't do that.

I already don't ;-) The above code only checks whether the character is the
beginning of a multibyte sequence (it tests the msb), following the UTF-8 format.

> I think
> it leads to problems, especially if the string is used somewhere you don't
> expect.

I don't get this, could you explain? Colibri ropes don't check for UTF-8
correctness, it assumes that the UTF-8 strings it gets are properly validated by
the caller.

> If you're going to do Cloverfield right, then please use size_t
> for string lengths, and avoid using strlen(). strlen() also doesn't scale
> to very large strings very well.

Same remark as above, C strings are only supposed to be string literals.

Moreover, I think I've used size_t everywhere. If not, can you give me the
places where I failed to do so, so I can fix them?

(last point: this is Colibri here, not Cloverfield ;)

> You're using malloc.h (which isn't standard or portable). Standard code
> uses stdlib.h for malloc().

OK, I'll fix that.

> The code uses alloca. alloca can fail silently and result in faults in
> various areas, when storing values to the pointer returned by alloca.
> alloca is also known to disable some optimizations with GCC and other
> compilers. alloca is also not a standard function, so systems aren't
> required to support it. It also can lead to security problems in some
> cases.

The only place where alloca() is used are in variadic versions of procs that
otherwise accept an array of arguments. As argument lists are typically small, I
believe it's safe to use alloca() there, because the goal is to stack-allocate
argument arrays that don't get stored by the callee. But if this causes
portability problems, I'll simply rewrite these variadic procs, I don't want to
malloc temporary arrays in convenience procs.

> The code doesn't check the return value or report errors when VirtualAlloc()
> fails.

That's part of the whole "exception handling is poor" stuff. At this point I've
no idea how to deal with out-of-memory exception: panic and exit, or use
registered callbacks? But given the way it is implemented, such errors only
occur when the memory space is exhausted, so there isn't much to do. This
differs from malloc where allocation may fail due to heap fragmentation.

--
Fred

Frédéric Bonnet

unread,

Apr 17, 2009, 5:46:58 AM4/17/09

to

George Peter Staplin wrote:
> Please do *not* add thread support. I have a feeling that would muck it all
> up.

Thread support will only consist of thread-local storage instead of global
structures. I don't plan to move away from the apartment model, for various
reasons (later on in this thread).

> Are you using mmap() in unix and VirtualAlloc() in Windows for your page
> allocation?

Yes, that's the plan. Colibri doesn't use malloc or heaps, save for a handful of
structures, so it uses low-level system pages.

> Why aren't you using MinGW or something like Open Watcom (which I think
> works well in Windows)?

My primary environment has always been MSVC++. However I don't think there are
many obstacles to porting to other environments, just that I don't have the time
to work on the build system. It's an area where I totally suck, and where others
do a better job than I.

In case you wonder, yes it's a call for contributions ;-)

>> The code is fairly portable on 32-bit systems. 64-bit support will need
>> more work because all the internals are fine-tuned and optimized at the
>> bit level; however porting to 64-bit should be rather straightforward: the
>> algorithms will remain unchanged, structure access is abstracted behing
>> macros, and cell size is proportional to the word size (a cell should be
>> able to store 4 pointers, which add up to 16 bytes on 32-bit systems).
>
> Hmm... Why not write it in assembler if it's going to have such
> restrictions?

I don't see how assembler would help there.

>> GRAND UNIFICATION SCHEME
>> ========================
>>
>> A great side project would be to reimplement Tcl over Colibri as a
>> replacement for the current Tcl_Obj based code. Of course the resulting
>> library would be incompatible on an ABI level, but this would provide a
>> great testbed for Colibri in real-world cases (using pure script
>> applications) as well as a comparison point with the official Tcl core,
>> and will possibly open the path to Tcl9. This most likely involves a lot
>> of work.
>
> I am interested in this. I know that many hands make light work. I'll
> probably post a followup after reviewing your code further...

Thx!

Frédéric Bonnet

unread,

Apr 17, 2009, 5:52:59 AM4/17/09

to

George Peter Staplin wrote:
> We are after all talking about Cloverfield too, and Cloverfield has already
> taken a radical departure from the Tcl syntax in some ways.

Well, this is Colibri, and I'm taking great cares to separate core from
syntactic issues, among others. Even if I use Colibri as part of the Cloverfield
project, and especially to implement a Tcl-like language (which almost certainly
won't be named Cloverfield), it makes no assumption otherwise and can be used by
Tcl or other projects independently from my own work.

Cloverfield is a project, not a language or a library.

Frédéric Bonnet

unread,

Apr 17, 2009, 6:04:22 AM4/17/09

to

Back to the OT: I don't plan to use a threading model in Colibri other than
apartment, for the following reasons:

- global locks kill perfs
- in Tcl, objects are bound to one thread and don't get passed around, and Tcl
is the main target for Colibri
- heap fragmentation is not an issue as Colibri uses pages and makes the
best use of them
- per-thread overhead and allocation granularity are very small
- managing and GC'ing several smaller pools in distinct threads concurrently
is better than freezing the whole app for a longer time on a larger dataset.

Bottom line: per-thread allocator and GC are fine for our purpose. The only
remaining problem is object cloning when passing messages around, but this is
pretty trivial compared to the coding hell that is multi-threaded memory management.

George Peter Staplin

unread,

Apr 17, 2009, 6:52:37 AM4/17/09

to

Frédéric Bonnet wrote:

> George Peter Staplin wrote:
> > This code in colInt.h:
> > #define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^=

> > #(b)))

> > #define SWAP_PTR(a, b) (((*(int *) &(a)) ^= (*(int *) &(b))),
> > ((*(int *) &(b)) ^= (*(int *) &(a))), ((*(int *) &(a)) ^= (*(int *)
> > &(b))))
> >
> > says to me: "why!?" Really why not use a temporary variable?
>
> That's the whole point actually. For example, SWAP() is used in other
> macros, and so using temp variables in unpractical. FYI I got this
> technique from the following page:
>
> http://graphics.stanford.edu/~seander/bithacks.html#SwappingValuesXOR

It's probably more costly in terms of performance, because it involves more
CPU instructions for the XOR operations, and data dependencies. A good
optimizing compiler can just store the temporary variable's value in a
register.

>
> > Your code is
> > indeed non-portable from what I see. This kind of interpretation of
> > pointers as int is really not ideal. If you must do that use an
> > *optional* C99 feature that allows pointers to be represented in
> > integer types called intptr_t or uintptr_t.
>
> But what if the compiler doesn't support C99? I guess I'll have to define
> the missing types in the platform stuff. But thanks for your suggestion,
> I'll cleanup the code accordingly. I've read somewhere that ints on 64-bit
> are still 32 bits, so I'd have to change this when porting to 64 bit
> systems anyway.

While int is generally 32-bit, long is also 32-bit on some 64-bit systems,
such as Windows 64. Most other systems make long be 64-bit, but there were
problems with Windows that made that more difficult.

Tcl uses intptr_t and uintptr_t, and has since I believe 8.5, and possibly
even in 8.4. I'm not really a proponent of this -- just stating the facts.

So in essence Windows 64 would report the following with the sizeof
operator:

int = 4
long = 4
void * = 8

On most other systems, such as BSD, Linux...

int = 4
long = 8
void * = 8

> > colRope.c I think could be improved. I would really suggest storing a
> > length with each C string.
>
> No! C strings are only there for convenience, i.e. with string litterals.
> Not for handling malloc'd C strings. This means that the following is OK:
>
> Col_RopeConcat("a", "b");
>
> Whereas this is not:
>
> char *a = <some string>, *b = <some string>;
> Col_RopeConcat(a, b);
>
> So either ropes are pure C string literals (typically short), or they
> eventually get stored within rope structures that include a length field.
> malloc'd strings must be converted to ropes, actually the whole point of
> ropes is to replace dynamic strings.

OK, fair enough. Thanks for explaining.

> > Col_RopeLength():
> >
> > if ((*data2 & 0xC0) != 0x80) {length++;}
> >
> > Reusing Tcl's 0xc0 0x80 hack to have the ASCII NUL terminator
> > translated to
> > an invalid UTF-8 sequence is problematic. Please don't do that.
>
> I already don't ;-) The above code only checks whether the character is
> the beginning of a multibyte sequence (it tests the msb), following the
> UTF-8 format.

Ah, I see.

> > I think
> > it leads to problems, especially if the string is used somewhere you
> > don't expect.
>
> I don't get this, could you explain? Colibri ropes don't check for UTF-8
> correctness, it assumes that the UTF-8 strings it gets are properly
> validated by the caller.

Well, as you may know some Tcl APIs like Tcl_GetString() are used with
generic Tcl_Obj. Some of those Tcl_Obj may have had their value
constructed from a ByteArray representation (which could contain bytes with
0 values that don't work well when embedded in data with strlen() and
NUL-terminated string APIs). So the Tcl workaround for this is to use 0xc0
0x80 (an invalid UTF-8 sequence). When the string value is converted
(shimmers) back to a ByteArray the 0xc0 0x80 become the ASCII '\0'/NUL
again.

See: http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

So the problem is that if a Tcl program happens to write out some of this
invalid sequence other programs may do strange things with the invalid
UTF-8.

If the 0xc0 0x80 makes it to the file system in a file name, other
applications may have issues with the file name. If used with X11 strings,
other problems can occur too.

>> You're using malloc.h (which isn't standard or portable). Standard code
>> uses stdlib.h for malloc().
>
> OK, I'll fix that.
>
>> The code uses alloca. alloca can fail silently and result in faults in
>> various areas, when storing values to the pointer returned by alloca.
>> alloca is also known to disable some optimizations with GCC and other
>> compilers. alloca is also not a standard function, so systems aren't
>> required to support it. It also can lead to security problems in some
>> cases.
>
> The only place where alloca() is used are in variadic versions of procs
> that otherwise accept an array of arguments. As argument lists are
> typically small, I believe it's safe to use alloca() there, because the
> goal is to stack-allocate argument arrays that don't get stored by the
> callee. But if this causes portability problems, I'll simply rewrite these
> variadic procs, I don't want to malloc temporary arrays in convenience
> procs.

If you're using alloca() you could probably get away with this type of
feature in C99, that has also been a feature of some compilers since before
1999:

void foo(int n) {
int length = n * 2;
int array[length];
...
}

In C99 there are more advanced things you can do than that with the VLA
(variable length array) feature.

>
>> The code doesn't check the return value or report errors when
>> VirtualAlloc() fails.
>
> That's part of the whole "exception handling is poor" stuff. At this point
> I've no idea how to deal with out-of-memory exception: panic and exit, or
> use registered callbacks? But given the way it is implemented, such errors
> only occur when the memory space is exhausted, so there isn't much to do.
> This differs from malloc where allocation may fail due to heap
> fragmentation.

I think having the callback is a good idea. Depending on what the user is
building with Colibri they may need to have the ability to do something
else when an out of memory condition occurs, but I would make the default
abort().

-George

Larry W. Virden

unread,

Apr 17, 2009, 7:26:30 AM4/17/09

to

On Apr 16, 10:19 am, Frédéric Bonnet <fredericbon...@free.fr> wrote:
> ANNOUNCE: Colibri version 0.2

> * support for the full Unicode range (up to 32-bit codepoints)

> * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with
> transparent conversions and access
>

Wonderful! Many of Colibri's features have tremendous possibilities
for performance, etc.

But these particular features address one of Tcl's limits that full
text online and print publishers encounter - while Tcl's unicode
support is certainly better than many other development languages out
there, when someone has to deal with the larger unicode character
set , the limits of Tcl are encountered. I remember discussions in
Tdom about this over the past couple of years.

While there may be lots of discussion about performance and memory
use, right now, Colibri is the only case of which I am aware actively
working to solve the "wider Unicode" situation.

Congratulations on your progress!

dkf

unread,

Apr 17, 2009, 9:56:06 AM4/17/09

to

On 17 Apr, 11:04, Frédéric Bonnet <fredericbon...@free.fr> wrote:
> Bottom line: per-thread allocator and GC are fine for our purpose.

I think I'd agree with that, though there has been quite a bit of
research on threaded GC (especially by the Java and C# people).
Perhaps some of that can be repurposed? It's not an important thing
though.

Donal.

dkf

unread,

Apr 17, 2009, 10:13:50 AM4/17/09

to

On 17 Apr, 11:52, George Peter Staplin <georg...@xmission.com> wrote:

> Frédéric Bonnet wrote:
> > That's the whole point actually. For example, SWAP() is used in other
> > macros, and so using temp variables in unpractical. FYI I got this
> > technique from the following page:
>
> >http://graphics.stanford.edu/~seander/bithacks.html#SwappingValuesXOR
>
> It's probably more costly in terms of performance, because it involves more
> CPU instructions for the XOR operations, and data dependencies. A good
> optimizing compiler can just store the temporary variable's value in a
> register.

It's the data dependencies that are the problem, unless the compiler
is really good at optimization and recognizes the XOR trick. OTOH, a
tempvar shuffle is much easier for the compiler to do the right thing
with. (I'd just use a "standard" temporary for these things.)

> > But what if the compiler doesn't support C99? I guess I'll have to define
> > the missing types in the platform stuff. But thanks for your suggestion,
> > I'll cleanup the code accordingly. I've read somewhere that ints on 64-bit
> > are still 32 bits, so I'd have to change this when porting to 64 bit
> > systems anyway.

C99 integral types are fairly widely supported; they're just a header.
Some of the other features might be less common.

> While int is generally 32-bit, long is also 32-bit on some 64-bit systems,
> such as Windows 64. Most other systems make long be 64-bit, but there were
> problems with Windows that made that more difficult.

There are two main categories of 64-bit system out there,
characterized by the sizes of certain types. As you note above, Win is
P64 (64-bit pointers) whereas Unixes have now settled on LP64 (64-bit
longs and pointers). You even occasionally see ILP64, but they're
really odd.

Tcl gets many things in its API completely wrong in this area, but I
declined to fix any of them in the 8.* series since they're extremely
disruptive of the ABI. Colibri does not need to make those mistakes!

> If the 0xc0 0x80 makes it to the file system in a file name, other
> applications may have issues with the file name. If used with X11 strings,
> other problems can occur too.

OTOH, it's still not safe to assume that the OS uses UTF-8 as its
system encoding (except on OSX) so you're still stuck with internal/
external translation schemes. Which is a great time to make sure that
c0,80 sequences don't infect the outside world.

> > That's part of the whole "exception handling is poor" stuff. At this point
> > I've no idea how to deal with out-of-memory exception: panic and exit, or
> > use registered callbacks? But given the way it is implemented, such errors
> > only occur when the memory space is exhausted, so there isn't much to do.
> > This differs from malloc where allocation may fail due to heap
> > fragmentation.
>
> I think having the callback is a good idea. Depending on what the user is
> building with Colibri they may need to have the ability to do something
> else when an out of memory condition occurs, but I would make the default
> abort().

Depends if it is a "small" or a "large" allocation that fails. Large
ones tend to be recoverable from, but if you've not enough space to
even allocate something small, you're truly stuck as any recovery
strategy won't have enough space to work with. (What counts as "small"
and "large" is also unclear, as is the effect of other processes on
this.) Allocation failures are a PITA...

Donal.

dkf

unread,

Apr 17, 2009, 10:17:21 AM4/17/09

to

On 17 Apr, 12:26, "Larry W. Virden" <lvir...@gmail.com> wrote:
> But these particular features address one of Tcl's limits that full
> text online and print publishers encounter - while Tcl's unicode
> support is certainly better than many other development languages out
> there, when someone has to deal with the larger unicode character
> set , the limits of Tcl are encountered. I remember discussions in
> Tdom about this over the past couple of years.

The problem is that going further than we've got to now really
requires quite a lot of work to do normalization forms, etc. Not
impossible, but people have been short of effort; this is especially
the case for Kevin Kenny, who understands what's needed far more than
I do. I do understand that he's been thinking in terms of ropes too,
but how this has similarities with Frédéric's work, I really don't
know.

Donal.

Larry W. Virden

unread,

Apr 17, 2009, 11:29:21 AM4/17/09

to

I apologize if my response was tinting to indicate criticism of the
TCT. That was not my intent. I understand that there are limited hours
for support of Tcl, and that expanded Unicode is likely to be a lower
priority than other new features, and of course new features are lower
priority than correcting at least some percentage of the outstanding
bugs.

I was just excited to see someone doing some research work that might
lead to future improvements.

I agree that there are a number of issues that must be addressed to
adequately deal with providing Tcl's quality support to wider Unicode
characters.

Kevin Kenny

unread,

Apr 17, 2009, 11:02:51 PM4/17/09

to

George Peter Staplin wrote:

> If the 0xc0 0x80 makes it to the file system in a file name, other
> applications may have issues with the file name. If used with X11 strings,
> other problems can occur too.

Off topic:

If UTF-8 makes it to the file system without being converted to
[encoding system], or to X11 without being converted to the encoding
of the font or the system encoding of the X server, problems are
inevitable. Any string that's going to an external program has to
be converted to that program's encoding. The conversion of Tcl's
UTF-8 to external UTF-8 is not an identity.

I have several reasons for wanting (moving forward) to make it even
less of an identity. I want to hold Tcl's strings in NFC (it's
compact, it compares gracefully, and it's what X11 wants), but
file names on a MacOSX box are NFD, so we need that conversion.
In any case, I'm tired of the fact that Tcl considers \u00e0 to
be a different character from \u0061\u0300 - they are the same
letter and supposed to compare alike. (And before anyone jumps
in to complain, this change will not affect the interpretation
of any character in the range \x00-\xff, so your binary strings
are safe.)

--
73 de ke9tv/2, Kevin

dkf

unread,

Apr 18, 2009, 1:11:32 PM4/18/09

to

On 17 Apr, 16:29, "Larry W. Virden" <lvir...@gmail.com> wrote:
> I apologize if my response was tinting to indicate criticism of the
> TCT. That was not my intent.

I certainly wasn't interpreting it as criticism. I was just explaining
how things are in the hope that it would interest some people. (For
the record, I don't understand unicode normalization at the moment.
This is because I've saved the brain-cells for other things exactly
because I trust others to have that understanding for me.)

Donal.

Frédéric Bonnet

unread,

Apr 18, 2009, 5:58:32 PM4/18/09

to

dkf wrote:
> On 17 Apr, 11:04, Frédéric Bonnet <fredericbon...@free.fr> wrote:
>> Bottom line: per-thread allocator and GC are fine for our purpose.
>
> I think I'd agree with that, though there has been quite a bit of
> research on threaded GC (especially by the Java and C# people).

Indeed. Most use heap based allocation. As heaps are usually contiguous memory
areas, fragmentation must be kept low, and you don't want to manage separate
heaps in concurrent thread because fragmentation would waste too much resources
and lead to memory exhaustion. Besides, data can be alloc'd in one thread and
freed in another. For these reasons, multi-threading support is critical.

However Colibri uses paged allocation, which prevents large-scale fragmentation
and allows discontinuous regions. So memory resource allocation is optimal
across threads because each instance only allocates one page at a time (an
atomic operation). So the most compelling reason for multi-threading support is
gone. The only question mark left is the threading model. Appartment implies
message passing with data duplication. Shared data would require a mutex at the
cell allocation and GC levels. The former is doable but the latter is far from
trivial, and can lead to pathological cases.

However, even if Colibri ropes and words are thread-local, there are still ways
to use shared data models thanks to custom types. For example, a malloc'd area
(or any other externally managed resource) could be shared by several custom
ropes on potentially distinct threads. So we get the best of both worlds.