Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion ANNOUNCE: Colibri version 0.5
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Frédéric Bonnet  
View profile  
 More options Oct 8 2010, 8:44 am
Newsgroups: comp.lang.tcl
From: Frédéric Bonnet <fredericbon...@free.fr>
Date: Fri, 08 Oct 2010 14:44:21 +0200
Local: Fri, Oct 8 2010 8:44 am
Subject: ANNOUNCE: Colibri version 0.5
ANNOUNCE: Colibri version 0.5
=============================
As part of my ongoing work on the Cloverfield project, I am pleased to announce
the fifth version of Colibri.

CHANGES SINCE VERSION 0.4
=========================

1. Removed sequence type. It was awkward, hard to use, and didn't quite fit
    the purpose. Besides, doing a mutable version was too complicated. Replaced
    by list type improvements.

2. Improved list type.

   - support for cyclic lists: each list has a loop length field that, when
     non-zero, gives the length of the terminating loop. Cyclic loops can also
     easily be detected during iteration thanks to the iterator index.
   - support for sparse lists: they can now contain void parts (i.e. whose
     elements are all nil) of arbitrary lengths at the cost of a single machine
     word.

3. Added mutable vector type.

   - can grow and shrink arbitrarily within a reserved size set at creation time
   - can be "frozen" and downgraded to immutable vectors at no cost.

4. Preliminary work for mutable list type.

   - will use the new sparse list facility.
   - will downgrade to immutable form easily.

5. Basic exception handling.

   - catches critical conditions and validation errors with meaningful message
   - default handler calls abort(), can be overridden by user proc.

WHAT IS CLOVERFIELD?
====================

Wiki page: http://wiki.tcl.tk/Cloverfield

WHAT DOES COLIBRI STAND FOR?
============================

Colibris, known in English as hummingbirds, are a family of birds known for
their small size and high wing speed. The bee hummingbird (Mellisuga helenae),
found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total
length of about 5cm. They are renown for their stationary and backward flight
abilities on par with flying insects, which allow them to feed on the nectar of
plants and flowers in-flight.

I've chosen this name for this project because its goal is to be fast and
lightweight, and to follow the feather and bird theme shared with Tcl and many
related projects.

WHAT IS COLIBRI?
================

Colibri is a string and data type infrastructure. It features:

  - Rope-based strings (see http://wiki.tcl.tk/20690) :

      * ropes are immutable
        ... but ...
      * extraction, insertion, deletion, concatenation... are fast
      * limited compatibility with native C strings allocated in static space
        (or guaranteed to survive the whole application lifetime)
      * support for the full Unicode range (up to 32-bit codepoints)
      * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with
        transparent conversions and access
      * custom rope types allows for lazy or procedural representations
      * iterator interface for easy access

  - Extensible data type sytem dubbed "words"

      * similar to Tcl_Obj
      * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can
        take more space if needed
      * words can have synonyms, i.e. words of any type
      * synonyms can form chains of arbitrary length, so a word can have several
        synonyms with different representations -- no more shimmering!
      * predefined types such as ints, single chars and small strings (up to 3
        chars on 32-bit systems) are represented as immediate values: the value
        is stored directly in the pointer whenever possible rather than in an
        allocated structure. Everything is abstracted behind accessors.
      * several high level datastructures are provided, such as vectors (flat
        arrays) and lists (binary trees of vectors with support for cycles and
        sparse representation), along with an easy to use iterator interface

  - Fast and efficient cell-based allocator

      * page-based allocation for optimal locality of reference and cache use
      * 16-byte cells on 32-bit systems fit most needs, but elements can allocate
        more cells if needed (up to 63 cells on 32-bit systems)
      * overhead is small: 2 bits per 16-byte cell.
      * raw alloc performances are competitive with the stock system malloc,
        and in many cases much faster (up to 5 times as fast on small strings and
        on words vs. Tcl_Obj-like structs)
      * single cell allocation (the most frequent case) is very fast

  - Automatic memory management thanks to an exact (AKA accurate or precise),
    generational, copying, mark-and-sweep, garbage collector

      * exact GC implies that roots (externally referenced elements) and
        parent-child relations must be declared by the application, using a
        simple API
      * custom types can define a finalizer; one of the applications is to plug
        in externally managed structures (e.g. malloc/free)
      * the GC process is fully controllable (pause/resume) so that the
        application don't get interrupted unexpectedly
      * generational GC limits the overhead by restricting the collection to
        newer elements, which are typically short-lived
      * longer-living elements are collected less often as they get promoted to
        older generations
      * promotion is done by moving whole pages between memory pools, optionally
        performing compaction when fragmentation exceeds a certain threshold,
        thus limiting the overhead and improving the cache-friendliness over time
      * contrary to reference-counting schemes (RC), GCs allow circular
        references without memory leaks; word synonym chains take advantage of
        this, being implemented as circular lists
      * studies show that GCs consistently outperform RC in raw performances and
        real-world cases, because the cost (both space and time) of maintaining
        reference counters outweights the amortized GC overhead, especially with
        generational GCs

Colibri is written in plain C and is free of any dependency besides system
libraries. The compiled binary DLL on Windows is about 40kB. The source code is
heavily commented and follows the Tcl quality standards.

HOW DOES COLIBRI RELATE TO TCL?
===============================

 From the Cloverfield announcement:

"   The last major point of the project is related to implementation and low
     level issues. The existing Tcl implementation is notoriously solid,
     however some changes are too radical to be performed on the current code
     base, so a reimplementation is certainly needed on vast portions. For
     example, the string representations, object structures, and virtual
     machines will certainly require complete rewrite to accommodate with the
     needed changes, which means that a lot of code won't be reused. However
     vast portions of library code and algorithms certainly will (clock scan,
     bigint, regexp...), as well as the high coding standards and QA that are
     the trademarks of our beloved language.  "

So Colibri is a candidate infrastructure for Tcl9 as an alternative to the
current Tcl_Obj-based core implementation. I believe that the features provided
by Colibri shall yield higher levels of performances than the current
architecture, at the price of an ABI incompatibility (for which major versions
are made anyway), but with a similar philosophy that should ease conversion of
existing code.

PLANNED FEATURES FOR NEXT VERSIONS
==================================

1. Mutable data types with graceful degradation to immutable forms:

  - Mutable lists.
  - Maps (AKA dicts) with string keys. Candidate models include Patricia trees
    and Ternary Search Trees, but not hash tables.

2. Proper internal and user documentation

3. Better test suite

WHAT NEEDS TO BE DONE?
======================

My main development platform is Windows, so the source archive primarily
includes Microsoft Visual Studio project files. Microsoft provides a free
edition of their development environment known as Visual Studio Express for
those willing to compile and test the library without having to buy an expensive
license. Other systems need a makefile and/or autoconf scripts.

I also uses an Ubuntu Karmic Koala 9.10 system for Linux development, so the
archive also includes minimalist GNU makefiles for building using the included
GCC compiler. However it makes no use of other parts of the GNU toolchain
(autoconf and the like).

The code is fairly portable on 32-bit systems. 64-bit support will need more
work because all the internals are fine-tuned and optimized at the bit level;
however porting to 64-bit should be rather straightforward: the algorithms will
remain unchanged, structure access is abstracted behing macros, and cell size
is proportional to the machine word size (a cell should be able to store 4
words, which add up to 16 bytes on 32-bit systems).

The only part that needs platform-specific code is the low-level page allocator.
Colibri needs system calls that allocate boundary-aligned pages. At present both
Windows and Unix (Linux) version is provided, the latter using mmap. Porting to
other systems should require only minimal effort, as the platform-specific code
is limited to a handful of functions gathered in a platform-specific source
tree. Platform-specific peculiarities should not impact the overall
architecture.

Tests have been run extensively during development, both on stability and
performance. However Colibri lacks a real test suite (including unit testings).
The source includes a test application that has to be hand-edited to run or
customize specific tests. Note that some stress tests are configured in such a
way that they require a system with a large memory size (2GB), so make sure to
tune their parameters before running them.

Last, it lacks user and design documentation, although the source code is
extensively commented.

GRAND UNIFICATION SCHEME
========================

A great side project would be to reimplement Tcl over Colibri as a replacement
for the current Tcl_Obj based code. Of course the resulting library would be
incompatible on an ABI level, but this would provide a great testbed for
Colibri in real-world cases (using pure script applications) as well as a
comparison point with the official Tcl core, and will possibly open the path to
Tcl9. This most likely involves a lot of work.

WHERE CAN IT BE FOUND?
======================

Wiki page: http://wiki.tcl.tk/Colibri
Project page @ SourceForge.net: http://sourceforge.net/projects/tcl9/
Mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri
Direct Download:
     - source:
http://sourceforge.net/projects/tcl9/files/colibri/colibri0.5/colibri...
     - Windows binary:
http://sourceforge.net/projects/tcl9/files/colibri/colibri0.5/colibri...
     - Linux binary (x86):
http://sourceforge.net/projects/tcl9/files/colibri/colibri0.5/colibri...

LICENSE
=======

The license is the same as for Tcl.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.