TIP #317: Extend binary Ensemble with Binary Encodings

Donal K. Fellows

May 3, 2008, 9:26:53 AM

TIP #317: EXTEND BINARY ENSEMBLE WITH BINARY ENCODINGS
========================================================
Version: $Revision: 1.1 $
Author: Pat Thoyts <patthoyts_at_users.sourceforge.net>
State: Draft
Type: Project
Tcl-Version: 8.6
Vote: Pending
Created: Saturday, 03 May 2008
URL: http://www.tcl.tk/cgi-bin/tct/tip/317.html
Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

This TIP extends the *binary* command with implementations in C of
commonly used binary encodings. In particular the /base64/ encoding is
implemented but the Tcl ensemble scheme [TIP #112] can be used to
provide simple extension of the implemented formats.

SPECIFICATION
===============

The *binary* command ensemble will be extended to include two new
subcommands, *encode* and *decode*. Each subcommand will accept two
arguments. The first is the name of an encoding format and the second
is the data to be operated upon.

*binary encode* /format data/

*binary decode* /format data/

In keeping with the nature of the *binary* command, the /data/ argument
is treated as a byte array. This means that users should ensure their
data is already in a suitable character encoding before applying a
binary encoding. This is already a requirement for other
implementations of this functionality (e.g. the tcllib and Trf
packages).
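
To see why the character encoding has to be settled first, here is a minimal sketch in C. It is not taken from the TIP's patch; `b64_encode` and the two byte arrays are hypothetical names made up for this illustration. The same character, é, is stored as two bytes in UTF-8 and one byte in ISO 8859-1, so its base64 form differs.

```c
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648). */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Hypothetical helper: base64-encode len bytes of src into dst.
 * dst must hold at least 4*((len+2)/3) + 1 bytes.
 * Returns the number of characters written (excluding the NUL).
 */
size_t b64_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t i = 0, o = 0;

    /* Full 3-byte groups become 4 output characters. */
    while (i + 3 <= len) {
        unsigned long v = ((unsigned long) src[i] << 16)
                        | ((unsigned long) src[i + 1] << 8)
                        | src[i + 2];
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = B64[(v >> 6) & 63];
        dst[o++] = B64[v & 63];
        i += 3;
    }
    /* One or two trailing bytes are padded with '='. */
    if (len - i == 1) {
        unsigned long v = (unsigned long) src[i] << 16;
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = '=';
        dst[o++] = '=';
    } else if (len - i == 2) {
        unsigned long v = ((unsigned long) src[i] << 16)
                        | ((unsigned long) src[i + 1] << 8);
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = B64[(v >> 6) & 63];
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}

/* The same character, e-acute, as two different byte sequences. */
const unsigned char E_ACUTE_UTF8[]   = { 0xC3, 0xA9 };  /* UTF-8 */
const unsigned char E_ACUTE_LATIN1[] = { 0xE9 };        /* ISO 8859-1 */
```

Encoding the UTF-8 bytes gives `w6k=`, while the ISO 8859-1 byte gives `6Q==`. The encoder only ever sees bytes, so whoever decodes the result has to know which character encoding was applied beforehand.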

The initial set of binary encodings consists of *base64*, *uuencode*
and *hex*. The implementation of the *encode* and *decode* subcommands
will make use of the Tcl ensemble command mechanism ([TIP #112]) and
will therefore be extensible via the ensemble mechanism.
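
Of the three formats, *hex* is the simplest to picture. The following is a rough sketch in C of what such an encoder and decoder could look like; it is not the reference implementation. The function names are hypothetical, and emitting lowercase digits is an assumption, since the TIP does not say which case is produced.

```c
#include <stddef.h>

static const char HEXDIGITS[] = "0123456789abcdef";

/* Encode len bytes as 2*len hex digits plus a NUL; dst needs 2*len + 1 bytes. */
void hex_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t i;

    for (i = 0; i < len; i++) {
        dst[2 * i]     = HEXDIGITS[src[i] >> 4];   /* high nibble */
        dst[2 * i + 1] = HEXDIGITS[src[i] & 15];   /* low nibble */
    }
    dst[2 * len] = '\0';
}

/* Value of one hex digit (either case), or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode len hex digits into len/2 bytes.
 * Returns the number of bytes written, or -1 on odd length or a bad digit.
 */
long hex_decode(const char *src, size_t len, unsigned char *dst)
{
    size_t i;

    if (len % 2 != 0) return -1;
    for (i = 0; i < len; i += 2) {
        int hi = hex_value(src[i]);
        int lo = hex_value(src[i + 1]);
        if (hi < 0 || lo < 0) return -1;
        dst[i / 2] = (unsigned char) ((hi << 4) | lo);
    }
    return (long) (len / 2);
}
```

With these helpers the three bytes of "Tcl" encode to `54636c`, and decoding accepts either case.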

REFERENCE IMPLEMENTATION
==========================

A patch against the Tcl HEAD (8.6) is located at
<URL:http://sf.net/tracker/?func=detail&aid=1956530&group_id=10894&atid=310894>

COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows

USCode

May 5, 2008, 7:17:33 PM
Donal K. Fellows wrote:
> TIP #317: EXTEND BINARY ENSEMBLE WITH BINARY ENCODINGS
...

>
> This TIP extends the *binary* command with implementations in C of
> commonly used binary encodings. In particular the /base64/ encoding is
> implemented but the Tcl ensemble scheme [TIP #112] can be used to
> provide simple extension of the implemented formats.
>
...

> The initial set of binary encodings consists of *base64*, *uuencode*
> and *hex*. The implementation of the *encode* and *decode* subcommands
> will make use of the Tcl ensemble command mechanism ([TIP #112]) and
> will therefore be extensible via the ensemble mechanism.

What is the purpose of moving these into the core vs. enhancing tcllibc
with base64 and keeping the other encodings there as well? Seems like
something like this that has the potential to become stale (
http://en.wikipedia.org/wiki/Uuencode ) should stay out of the core and
remain in a separate package?

Jeff Hobbs

May 5, 2008, 7:24:21 PM
> something like this that has the potential to become stale (http://en.wikipedia.org/wiki/Uuencode) should stay out of the core and

> remain in a separate package?

base64 has the potential to become stale?

Jeff

USCode

May 6, 2008, 12:06:34 AM
=)
Well, I don't suppose base64 or hex will, but uuencode is falling out of
favor, isn't it? Maybe not, I could be wrong, and it wouldn't be the
first time! BUT *incremental* core bloat is a bad thing so shouldn't
every new feature be heavily scrutinized as to whether it REALLY needs
to be in the core or just implemented as an extension? Lean and mean.
Sounds like this is a case where the feature passed that scrutiny ...

Pat Thoyts

May 6, 2008, 6:17:34 AM
USCode <do...@spamon.me> writes:

The difference between uuencode and base64 is solely the table of
output chars. UUencode uses a different set from base64. The encode is
the same function. So the bloat is 65 bytes.
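
The point about sharing one encoder can be sketched in C as follows. This is an illustration, not the code from the patch: `encode_groups` and the table names are hypothetical, the uuencode table uses the common variant that writes the value 0 as a backquote, and only complete 3-byte groups are handled (padding and uuencode line framing are left out). Each table is 64 characters plus a terminating NUL, which is presumably where the figure of 65 bytes comes from.

```c
#include <stddef.h>

/* 64 output characters each, plus the NUL that ends the string literal. */
static const char BASE64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char UUENCODE_TABLE[] =
    "`!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";

/*
 * Encode every complete 3-byte group of src through the given table.
 * Trailing bytes that do not fill a group are ignored in this sketch.
 * dst needs 4*(len/3) + 1 bytes. Returns the characters written.
 */
size_t encode_groups(const char *table, const unsigned char *src,
                     size_t len, char *dst)
{
    size_t i, o = 0;

    for (i = 0; i + 3 <= len; i += 3) {
        unsigned long v = ((unsigned long) src[i] << 16)
                        | ((unsigned long) src[i + 1] << 8)
                        | src[i + 2];
        dst[o++] = table[(v >> 18) & 63];
        dst[o++] = table[(v >> 12) & 63];
        dst[o++] = table[(v >> 6) & 63];
        dst[o++] = table[v & 63];
    }
    dst[o] = '\0';
    return o;
}
```

Passing `BASE64_TABLE` turns "Cat" into `Q2F0`; passing `UUENCODE_TABLE` turns the same bytes into `0V%T`. Only the table argument changes between the two calls.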
--
Pat Thoyts http://www.patthoyts.tk/
To reply, rot13 the return address or read the X-Address header.
PGP fingerprint 2C 6E 98 07 2C 59 C8 97 10 CE 11 E6 04 E0 B9 DD

Torsten Berg

May 6, 2008, 8:24:42 AM

>  TIP #317: EXTEND BINARY ENSEMBLE WITH BINARY ENCODINGS
> ========================================================
>  Version:      $Revision: 1.1 $
>  Author:       Pat Thoyts <patthoyts_at_users.sourceforge.net>
>  State:        Draft
>  Type:         Project
>  Tcl-Version:  8.6
>  Vote:         Pending
>  Created:      Saturday, 03 May 2008
>  URL:          http://www.tcl.tk/cgi-bin/tct/tip/317.html
>  Post-History:

I have made a package from this code for testing and playing around
(with Tcl 8.4). If someone is interested: http://tcl.typoscriptics.de/misc/recode.c

It adds the two commands [encode] and [decode] to Tcl and you use
them like this (similar to the proposed syntax in the TIP):

encode uu "data to be uuencoded"
decode hex "data to be recoded from hex"
encode base64 "data to be base64 encoded"

Very nice job, Pat! Thanks for this one. I'd vote yes, if I was allowed
to.

I had to do some adjustments to make it compile without warnings on my
Mac with gcc 4.0.1. The warning 'pointer targets in assignment differ
in signedness' went away when replacing

unsigned char *data = NULL;

with

char *data = NULL;

in some functions.
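
An alternative to changing the declaration is an explicit cast, which also silences the warning while keeping the bytes unsigned. The sketch below shows why that can matter for code that shifts bytes around. `get_string` is a hypothetical stand-in for whatever call returns a plain `char *`; which call in the patch actually triggers the warning has not been checked here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a library call that returns plain char *. */
static const char *get_string(void)
{
    return "\xff\x01";   /* the first byte has its high bit set */
}

/* Top six bits of the first byte, as a base64-style encoder would use them. */
unsigned top_six_bits_unsigned(void)
{
    /* Explicit cast: no signedness warning, and bytes stay in 0..255. */
    const unsigned char *data = (const unsigned char *) get_string();

    return data[0] >> 2;   /* 0xff >> 2 is 63 */
}

/* The same computation through plain char needs a mask to be portable. */
unsigned top_six_bits_plain(void)
{
    const char *data = get_string();

    /* Where char is signed, data[0] is negative here, so mask first. */
    return ((unsigned) data[0] & 0xffu) >> 2;
}
```

Both functions return 63. Without the mask in the second one, a platform with signed `char` would produce a value far outside the range of a 64-entry table.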


Torsten

USCode

May 6, 2008, 12:51:08 PM
Pat Thoyts wrote:
> The difference between uuencode and base64 is solely the table of
> output chars. UUencode uses a different set from base64. The encode is
> the same function. So the bloat is 65 bytes.
Wow, downright trim! Well it sounds like this one is a go! =)

USCode

May 6, 2008, 6:02:55 PM
Pat Thoyts wrote:
> The difference between uuencode and base64 is solely the table of
> output chars. UUencode uses a different set from base64. The encode is
> the same function. So the bloat is 65 bytes.

Thanks Pat. But why the TIP to put it in the core vs. keeping it in
tcllibc?

Andreas Leitgeb

May 7, 2008, 3:18:30 AM
Pat Thoyts <cngg...@hfref.fbheprsbetr.arg> wrote:
> The difference between uuencode and base64 is solely the table of
> output chars. UUencode uses a different set from base64. The encode is
> the same function. So the bloat is 65 bytes.

Are you sure? I vaguely remember some uuencode format, where each
line was preceded with a char that indicated the length of the line.

But then again, that might be outside the scope of the actual
encoding/decoding algorithm.
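
The line format remembered here can be sketched as follows. In traditional uuencode each line carries at most 45 input bytes and starts with one character giving that count, encoded the same way as the data. This is an illustrative C sketch with hypothetical function names, not the TIP's code, and it uses the variant that writes the value 0 as a backquote.

```c
#include <stddef.h>

/* Map a 6-bit value to its uuencode character (0 is written as '`'). */
static char uu_char(unsigned v)
{
    v &= 63;
    return v ? (char) (32 + v) : '`';
}

/*
 * Encode one line of at most 45 bytes.
 * Output: length character, encoded groups, '\n', NUL.
 * dst needs 1 + 4*((len+2)/3) + 2 bytes.
 * Returns the characters written, or 0 if len is greater than 45.
 */
size_t uu_encode_line(const unsigned char *src, size_t len, char *dst)
{
    size_t i, o = 0;

    if (len > 45) return 0;
    dst[o++] = uu_char((unsigned) len);          /* the length prefix */
    for (i = 0; i < len; i += 3) {
        /* Missing bytes in the final group are treated as zero. */
        unsigned b0 = src[i];
        unsigned b1 = (i + 1 < len) ? src[i + 1] : 0;
        unsigned b2 = (i + 2 < len) ? src[i + 2] : 0;

        dst[o++] = uu_char(b0 >> 2);
        dst[o++] = uu_char((b0 << 4) | (b1 >> 4));
        dst[o++] = uu_char((b1 << 2) | (b2 >> 6));
        dst[o++] = uu_char(b2);
    }
    dst[o++] = '\n';
    dst[o] = '\0';
    return o;
}
```

Encoding "Cat" produces the line `#0V%T`, where `#` is the length 3 and `0V%T` is the data. The per-group transformation is the part shared with base64; the length prefix and line splitting sit on top of it.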

schlenk

May 7, 2008, 4:42:37 AM

If base64 gets into the core (which is likely, because Tk includes a
base64 thing for GIF image data) it would be even more bloat to put it
into tcllibc (and most of the hashes are already in tcllib as c code
for example md5).

But that raises another question:
- How should an extension author support extensions of such a built in
ensemble?
a) Register some sub sub commands on load
b) Provide a command to register the appropriate parts on request
c) Let the user figure it out alone

Michael

Donal K. Fellows

May 7, 2008, 6:20:23 AM
schlenk wrote:
> How should an extension author support extensions of such a built in
> ensemble?
> a) Register some sub sub commands on load
> b) Provide a command to register the appropriate parts on request
> c) Let the user figure it out alone

Easy. It's option a), always. Lazy loading when someone does a
[package require] is almost certainly a bad idea.

Donal.

schlenk

May 7, 2008, 7:16:35 AM

Hmm, maybe I did not make clear what I was really trying to say; some
examples might help:

a)
package require asn 2.0
binary decode asn $block

b)
package require asn 2.0
asn::register_decoders
binary decode asn $block

c)
package require asn 2.0
namespace ensemble ...
binary decode asn $block

So basically would it be a) or b) with those examples, neither do
'lazy loading'.
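
The difference between a) and b) is only who calls the registration. As a language-neutral illustration, here is a C sketch of a name-to-function registry with a load hook that registers a format by itself, as in option a). Everything in it is hypothetical: it is an analogy for the idea, not the Tcl ensemble API and not code from any real extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical codec signature: encode len bytes of src into dst. */
typedef size_t (*codec_fn)(const unsigned char *src, size_t len, char *dst);

#define MAX_CODECS 16

static struct { const char *name; codec_fn fn; } registry[MAX_CODECS];
static size_t registry_count = 0;

/* Look a format up by name; NULL if nobody has registered it. */
codec_fn codec_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < registry_count; i++) {
        if (strcmp(registry[i].name, name) == 0) return registry[i].fn;
    }
    return NULL;
}

/* Add a format. Returns 0 on success, -1 if the name is taken or the table is full. */
int codec_register(const char *name, codec_fn fn)
{
    if (registry_count == MAX_CODECS || codec_lookup(name) != NULL) return -1;
    registry[registry_count].name = name;
    registry[registry_count].fn = fn;
    registry_count++;
    return 0;
}

/* Toy codec so the sketch has something to register: copies bytes unchanged. */
static size_t demo_encode(const unsigned char *src, size_t len, char *dst)
{
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Option a): the extension's load hook registers the format itself. */
int demo_extension_load(void)
{
    return codec_register("demo", demo_encode);
}
```

Under option b) the body of `demo_extension_load` would instead live in a separate function that the user has to call after loading.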

Michael

Donal K. Fellows

May 7, 2008, 9:21:09 AM
schlenk wrote:
> So basically would it be a) or b) with those examples, neither do
> 'lazy loading'.

I'd go with a) there again. :-)

Donal.

USCode

May 7, 2008, 1:05:29 PM
schlenk wrote:
> If base64 gets into the core (which is likely, because Tk includes a
> base64 thing for GIF image data) it would be even more bloat to put it
> into tcllibc (and most of the hashes are already in tcllib as c code
> for example md5).

But why put them in the core in the first place if the other encodings
have been living happily in tcllibc up to this point? Or maybe they
haven't? So what's the driving factor to move them to the core?
I'm not arguing either way, just not sure what the
motivation/justification is? There must be some reason driving it.

Pat Thoyts

May 7, 2008, 8:59:01 PM
USCode <do...@spamon.me> writes:

Why the heck not? It's a small amount of code and provides a fast
method to do a standard job. I can add the code to tcllib's critcl
extension too - it can't count as bloat as it's too small.

I actually wrote the code after seeing some comments asking for this
or something similar in tcl-core and this seems a good way to plug it
in. I should check Tk's usage of this too - possibly we could use a
more direct C API into this too.

USCode

May 8, 2008, 12:01:55 AM
Pat Thoyts wrote:

> USCode <do...@spamon.me> writes:
> Why the heck not? It's a small amount of code and provides a fast
> method to do a standard job. I can add the code to tcllib's critcl
> extension too - it can't count as bloat as it's too small.
>
> I actually wrote the code after seeing some comments asking for this
> or something similar in tcl-core and this seems a good way to plug it
> in. I should check Tk's usage of this too - possibly we could use a
> more direct C API into this too.
>
Just seems like something that doesn't belong in the core language, just
as you wouldn't want to put something like MIME or zlib into the core.

It's one of many binary-to-text encoding schemes available,
with new ones likely to come down the road. It seems like its logical
home should remain in tcllibc, along with base32, uuencode, yEnc, etc.

Torsten Berg

May 8, 2008, 3:13:55 AM
> Pat Thoyts wrote:
> > Why the heck not? It's a small amount of code and provides a fast
> > method to do a standard job. I can add the code to tcllib's critcl
> > extension too - it can't count as bloat as it's too small.

I have to agree. The definition of 'bloat' is blurry. Who decides how
much counts as bloat? 20k, 200k, or 2M? Size cannot be the point unless
the TIP proposes some non-standard feature that is unlikely to be
used by many (oh, well, a matter of definition again). But the TIP
provides a framework for encoding and decoding data, along with
some common implementations. I think this framework in itself is worth
having in the core. It makes extending the "encode/decode" subcommands
easy from the script level.

If I remember correctly, having the code in tcllib(c), you only
benefit from the speed-up, if you have a compiler on the machine
running the code. This may not always be the case.

On May 8, 6:01 am, USCode <d...@spamon.me> wrote:
> Just seems like something that doesn't belong in the core language, just
> as you wouldn't want to put something like MIME or zlib into the core.

Hm, there is TIP 234, which adds zlib support to the core. This is
another standard feature, I would love to have on board!! To have some
sort of compression algorithm built-in, is a plus. So why not zlib?
Many, many people use zlib.

But ok, I see the point. Next comes PDF support, printing, usb
communication, whatever. All standard features for a programming task.
All in the core? Perhaps we need a real tcllibc, a collection of
binary extensions that can be distributed alongside the core ... just
like tcllib, but not hidden behind the scenes but up-front.

Just my $0.02

Torsten

USCode

May 8, 2008, 11:58:33 AM
Torsten Berg wrote:
>
> But ok, I see the point. Next comes PDF support, printing, usb
> communication, whatever. All standard features for a programming task.
> All in the core? Perhaps we need a real tcllibc, a collection of
> binary extensions that can be distributed alongside the core ... just
> like tcllib, but not hidden behind the scenes but up-front.
>
> Just my $0.02
>
> Torsten
Thanks Torsten. I think 'bloat' isn't the only concern here but a
byproduct of moving things into the core that, while convenient, aren't
in the long-term interests of Tcl (In my humble and insignificant
opinion of course, since I'm not a member of the TCT, my opinions are
merely academic!). Seems as if the Tcl core should provide the key
building blocks and infrastructure for building applications but not be
choosing the winners & losers of the various compression schemes,
encoding schemes, file output types, etc.

Kevin Kenny

May 8, 2008, 11:19:47 PM
USCode wrote:
> Thanks Torsten. I think 'bloat' isn't the only concern here but a
> byproduct of moving things into the core that, while convenient, aren't
> in the long-term interests of Tcl (In my humble and insignificant
> opinion of course, since I'm not a member of the TCT, my opinions are
> merely academic!). Seems as if the Tcl core should provide the key
> building blocks and infrastructure for building applications but not be
> choosing the winners & losers of the various compression schemes,
> encoding schemes, file output types, etc.

So far, so good. But in the Core we use base64 ourselves, and it's
high time that was integrated at the script level. We also *want* to
have access to zlib for embedding virtual filesystems, so that
one is also likely to be bundled. In short, we should plan to
bundle the bits that the Core itself needs for other reasons.

--
73 de ke9tv/2, Kevin

USCode

May 9, 2008, 12:21:41 AM
Thanks Kevin. Can you elaborate a bit more on the use of zlib for
embedding virtual filesystems?
Thanks!

Kevin Kenny

May 9, 2008, 1:04:53 AM
USCode wrote:
> Thanks Kevin. Can you elaborate a bit more on the use of zlib for
> embedding virtual filesystems?

It's not my project, so I'm sort of sketchy on the details.
But the idea is to zip up things like the encoding tables,
the time zone tables, the message catalogs, and the library
scripts and embed them inside the DLL. Essentially, a core
version of Starkit technology. (Exported so that apps and
extensions can use it too!)

Pat Thoyts

May 10, 2008, 3:36:10 PM
USCode <do...@spamon.me> writes:

>Thanks Torsten. I think 'bloat' isn't the only concern here but a
>byproduct of moving things into the core that, while convenient,
>aren't in the long-term interests of Tcl (In my humble and
>insignificant opinion of course, since I'm not a member of the TCT,
>my opinions are merely academic!). Seems as if the Tcl core should

Neither am I. Your opinion carries at least as much weight as my own.

>provide the key building blocks and infrastructure for building
>applications but not be choosing the winners & losers of the various
>compression schemes, encoding schemes, file output types, etc.

My impetus is practicality. In practice base64 is used a lot. In
practice this kind of encoding stuff is slow in pure-Tcl. In practice
large numbers of people for one reason or another do not make use of
extensions - so we get endless comments that tcl is slow to process
MIME data - or that it cannot process MIME messages over some limit or
whatever.

You mentioned zlib earlier. I want that in-core too.

USCode

May 10, 2008, 4:43:02 PM
Pat Thoyts wrote:
> USCode <do...@spamon.me> writes:
>> Thanks Torsten. I think 'bloat' isn't the only concern here but a
>> byproduct of moving things into the core that, while convenient,
>> aren't in the long-term interests of Tcl (In my humble and
>> insignificant opinion of course, since I'm not a member of the TCT,
>> my opinions are merely academic!). Seems as if the Tcl core should
>
> Neither am I. Your opinion carries at least as much weight as my own.
>
Well I think your opinion definitely should carry more weight than mine,
you've been a key contributor and maintainer of many parts of Tcl/Tk and
I certainly appreciate it!!!

Fredderic

May 13, 2008, 2:37:57 PM
On Sat, 10 May 2008 19:36:10 GMT,
Pat Thoyts <cngg...@hfref.fbheprsbetr.arg> wrote:

> You mentioned zlib earlier. I want that in-core too.

Having encode/decode in the [binary] command is all good and well if
you're processing an entire file, or some such. But what if you wish to
receive or send a base64 encoded stream (potentially un-ending).


I hope the underlying encode/decode framework will be designed to
operate in a stream-based fashion, i.e. it would be divided into
separate functions:

1) set up and initialise a structure to hold any internal state.

2) read as much of the input data as possible, indicating how much of
it was consumed (ala [gets]), and deposit the converted data somewhere.

3) obtain the current state of the encoder (last-block-incomplete,
encoding error, etc.)

4) clean up the structure holding the aforementioned internal state.

as well as any miscellaneous other useful stuff.
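
The four steps above can be sketched in C for base64 encoding. This is one possible shape, not the TIP's design, and all names are hypothetical. Two simplifications are made: an encoder can always buffer leftover bytes, so the update call consumes all of its input (a decoder would more likely need to report how much it consumed), and the last step flushes the padded tail as well as resetting the state.

```c
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Step 1: state carried between calls. */
typedef struct {
    unsigned char pending[3];  /* bytes waiting to fill a 3-byte group */
    size_t npending;           /* 0..2 between calls */
} b64_stream;

void b64_stream_init(b64_stream *s)
{
    s->npending = 0;
}

/* Turn one complete 3-byte group into 4 characters. */
static void emit_group(const unsigned char *g, char *dst)
{
    dst[0] = B64[g[0] >> 2];
    dst[1] = B64[((g[0] & 3) << 4) | (g[1] >> 4)];
    dst[2] = B64[((g[1] & 15) << 2) | (g[2] >> 6)];
    dst[3] = B64[g[2] & 63];
}

/*
 * Step 2: feed len bytes. Complete groups are written to dst; the
 * remainder is kept in the state. dst needs 4*((len+2)/3) + 1 bytes.
 * Returns the characters written.
 */
size_t b64_stream_update(b64_stream *s, const unsigned char *src,
                         size_t len, char *dst)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        s->pending[s->npending++] = src[i];
        if (s->npending == 3) {
            emit_group(s->pending, dst + o);
            o += 4;
            s->npending = 0;
        }
    }
    dst[o] = '\0';
    return o;
}

/* Step 3: query the state; non-zero means the last block is incomplete. */
size_t b64_stream_pending(const b64_stream *s)
{
    return s->npending;
}

/*
 * Step 4: flush the padded tail and reset the state.
 * dst needs 5 bytes. Returns 0 or 4 characters written.
 */
size_t b64_stream_finish(b64_stream *s, char *dst)
{
    size_t o = 0;

    if (s->npending == 1) {
        dst[o++] = B64[s->pending[0] >> 2];
        dst[o++] = B64[(s->pending[0] & 3) << 4];
        dst[o++] = '=';
        dst[o++] = '=';
    } else if (s->npending == 2) {
        dst[o++] = B64[s->pending[0] >> 2];
        dst[o++] = B64[((s->pending[0] & 3) << 4) | (s->pending[1] >> 4)];
        dst[o++] = B64[(s->pending[1] & 15) << 2];
        dst[o++] = '=';
    }
    s->npending = 0;
    dst[o] = '\0';
    return o;
}
```

Feeding "He" and then "llo" yields nothing, then `SGVs`, and the finish call adds `bG8=`, which together is the base64 form of "Hello". A one-shot command would simply call init, update and finish in sequence.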

[binary] would then simply need to call each function in turn,
presumably indicating how many characters were actually consumed in
some fashion so that it can still be used as a poor man's substitute in
the absence of a proper channel stream-based approach. (Of course,
that means that a slightly more intelligent method of adding new
encodings would be preferable.)


That channel approach, on the other hand (and which can be implemented
at a later date), would invoke step 1 once at channel specification
time, step 2 repeatedly as new data is presented, and finally step 4
once when the channel is shut down, providing maximum through-put and
potentially the ability to produce some output from a partial input
block. CRLF translation, character encoding, and just about anything
else imaginable, would be re-implemented as stream filters not only
allowing them to be linked together in freakish new ways if someone so
desired, but also making eol translation available to the [binary]
command at the same time (perhaps it would be better provided by
[encoding]?).

This new channel mechanism could then provide the mechanics to
intelligently move data along a chain of these encodings (including, if
possible, handling the case where the data can be processed in-place
rather than being copied to another buffer). The channel mechanism,
when done right, would also allow either script or file/socket IO at
either end, obsoleting any special support for explicit pipes (which
may be used internally at either or both ends, or even elsewhere to
cross thread boundaries) or that evil [fcopy] command (which would now
become a wrapper to set up a channel with file IO drivers at both
ends, and its existing logic absorbed to provide file-to-file
background transfers). In the event that an encoding chain isn't
provided, the channel mechanism would simply construct one from
settings provided by [fconfigure] as soon as the channel is used,
maintaining compatibility with the existing channel APIs (you could
have a pair of "standard" channel filters which implement the current
[fconfigure] settings, allowing it to be packed among custom elements).

You could then set up a channel to perform base64 decoding, zlib
decompression, and null translation, allowing you to feed in your raw
base64 data (either with [puts] or by having it attached to a file or
socket), and read off variable-length null-terminated records with a
series of simple [gets] statements.

Now THAT would be something to be proud of.


Fredderic

Pat Thoyts

May 14, 2008, 5:10:59 AM
Fredderic <my-nam...@excite.com> writes:

>On Sat, 10 May 2008 19:36:10 GMT,
>Pat Thoyts <cngg...@hfref.fbheprsbetr.arg> wrote:
>
>> You mentioned zlib earlier. I want that in-core too.
>
>Having encode/decode in the [binary] command is all good and well if
>you're processing an entire file, or some such. But what if you wish to
>receive or send a base64 encoded stream (potentially un-ending).

[snip]

chan create can let you do this right now.

Once channel transformations are finished (TIP xxx) then you will be
able to write a channel transform in script and this will give you the
necessary building block for a base64 transform

Donal K. Fellows

May 14, 2008, 5:57:07 AM
Pat Thoyts wrote:
> Once channel transformations are finished (TIP xxx) then you will be

TIP #230. http://tip.tcl.tk/230.html

Donal.

Fredderic

May 17, 2008, 2:38:17 AM
On Wed, 14 May 2008 09:10:59 GMT,
Pat Thoyts <cngg...@hfref.fbheprsbetr.arg> wrote:

> Fredderic <my-nam...@excite.com> writes:
>>Pat Thoyts <cngg...@hfref.fbheprsbetr.arg> wrote:
>>> You mentioned zlib earlier. I want that in-core too.
>> Having encode/decode in the [binary] command is all good and well if
>> you're processing an entire file, or some such. But what if you
>> wish to receive or send a base64 encoded stream (potentially
>> un-ending).
> [snip]
> chan create can let you do this right now.
> Once channel transformations are finished (TIP xxx) then you will be
> able to write a channel transform in script and this will give you the
> necessary building block for a base64 transform

I had a look at [chan create] a while back, but it didn't seem to do
what I wanted at the time (can't remember exactly what that was,
now, other than it involved having an external command at the drain
end)... And hadn't really looked at it since...

TIP #230, as Donal says... Just having a look at that now, quite nice
indeed. I'm not sure yet why a transformation can be read-only or
write-only... I suppose it's a limitation of the API...?

But, why not implement the [binary encode/decode] stuff as per that
spec, as a transformation handler? [binary encode handler] could
simply invoke the passed transformation handler's init, write and
finalise methods as a unit, and return the result. Not sure how you'd
handle the case where not all of the data was transformable, but that
would be as simple as accepting a string, a variable to append the
result to, and returning the number of bytes actually consumed from the
string.

Likewise re-implement CRLF/NULL transformation, character set encoding,
etc., all as transformations, and [binary encode/decode] will be able
to do the lot. An example of this is for web page handling. You could
build your web page with TCL standard newlines, use [binary encode] to
apply CRLF translation, grab the size of the data for the Content-Length
field, and then toss it all out a binary-mode socket.
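
The ordering in that web page example matters because the translation changes the byte count. A small C sketch of the translate-then-measure step follows; `lf_to_crlf` is a hypothetical helper written for this illustration, not part of any proposed API.

```c
#include <stddef.h>

/*
 * Copy src to dst, turning each bare '\n' into "\r\n".
 * A '\n' already preceded by '\r' is left alone.
 * dst needs 2*len + 1 bytes. Returns the translated length in bytes,
 * which is the value a Content-Length header would have to carry.
 */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r')) {
            dst[o++] = '\r';
        }
        dst[o++] = src[i];
    }
    dst[o] = '\0';
    return o;
}
```

A 4-byte input such as "a\nb\n" becomes 6 bytes after translation, so measuring the size before translating would give the wrong header value.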


Another useful ability would be to have a channel that doesn't go
anywhere (presumably just a script-to-script channel with nothing taking
notice at the receiving end). You write data to the channel, it
performs the appropriate transformations and then simply buffers it up,
allocating decent sized chunks of memory at a time which it holds in a
list. (Instead of continually having to reallocate a single contiguous
piece of memory as the string grows piece by piece.) When you're done,
you can use [chan pending] to check the current size of the buffered
data, or whatever else, and then either serialise the whole buffer into
a string in one sweep (using a simple [read]), or [fcopy] it to another
channel. In the [fcopy] case, if the outgoing socket is pure binary, it
might even be able to get away with simply moving the channel's buffers
in their entirety to the outgoing channel, without having to touch the
data at all until it finally gets transmitted.


Fredderic

Donal K. Fellows

May 17, 2008, 3:30:48 AM
Fredderic wrote:
> But, why not implement the [binary encode/decode] stuff as per that
> spec, as a transformation handler?

Why require the use of a channel when all you want to do is encode or
decode some base64 data in a string already? A number of use-cases for
this sort of thing are *exactly* that situation (it comes up in
handling images embedded in Tcl scripts, and also when working with
XML InfoSets).
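
For that in-memory case nothing more than a single function call is needed. As a rough illustration in C (a sketch with hypothetical names, not the TIP's implementation), a one-shot decoder for padded base64 without embedded whitespace could look like this:

```c
#include <stddef.h>

/* Value of one base64 character, or -1 if it is not in the alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Decode len characters of padded base64 (no whitespace) into dst.
 * dst needs 3*(len/4) bytes.
 * Returns the decoded length, or -1 if the input is malformed.
 */
long b64_decode(const char *src, size_t len, unsigned char *dst)
{
    size_t i, o = 0;

    if (len % 4 != 0) return -1;
    for (i = 0; i < len; i += 4) {
        int v[4], k, pad = 0;

        for (k = 0; k < 4; k++) {
            char c = src[i + k];

            if (c == '=') {
                /* Padding may only end the final group. */
                if (i + 4 != len || k < 2) return -1;
                v[k] = 0;
                pad++;
            } else {
                if (pad > 0) return -1;      /* data after padding */
                v[k] = b64_value(c);
                if (v[k] < 0) return -1;
            }
        }
        dst[o++] = (unsigned char) ((v[0] << 2) | (v[1] >> 4));
        if (pad < 2) dst[o++] = (unsigned char) (((v[1] & 15) << 4) | (v[2] >> 2));
        if (pad < 1) dst[o++] = (unsigned char) (((v[2] & 3) << 6) | v[3]);
    }
    return (long) o;
}
```

Decoding `SGVsbG8=` gives the five bytes of "Hello" with no channel or stream state involved.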

Donal.
