On 2013-03-14, ais523 wrote:
> Also, always doing things the opposite of the intuitive way is something
> I'd warn you against; it's approaching on dangerous consistency. You
> should feel free to mix it up a little. In this case, I note that the
> shift codes have to strictly alternate between letters and figures in
> your system, which seems like an awkward inefficiency. I suggest you
> just use the same shift code to alternate between lengths and
> characters, freeing up the other shift code for something entirely
> different.
That's a good point. But it's not really inefficient, as it allows you
to convert a sequence even if the start is truncated, by just skipping
until you find a length block; if you use just one shift code you can't
do that, as you don't know what is length and what is data. I could make
sure that you can tell a length block from a data block from the
contents, but I suspect this would be less efficient than using both
shift codes.
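The resynchronisation argument can be sketched roughly as follows. This
is only an illustration: LEN_SHIFT and DATA_SHIFT are hypothetical
stand-ins for whichever of the two Baudot shift codes introduces a
length block and a data block, and the token list is a made-up
abstraction of the stream, not the actual symbol assignment.

```python
# Hypothetical stand-ins for the two Baudot shift codes; which real shift
# code plays which role is an assumption made for this sketch only.
LEN_SHIFT = "LEN_SHIFT"
DATA_SHIFT = "DATA_SHIFT"


def resync(tokens):
    """Drop everything before the first length block.

    Returns the tail of `tokens` starting at the first LEN_SHIFT, or an
    empty list if no length block is left in the stream.
    """
    for i, tok in enumerate(tokens):
        if tok == LEN_SHIFT:
            return list(tokens[i:])
    return []
```

With a single shift code the same scan would stop at the first shift it
sees, with no way to tell whether the block it landed on holds lengths
or data.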
Also, you can still use sequences of shift codes to do something
completely different, just like CLC-INTERCAL uses them to introduce
lowercase letters and symbols not present in Baudot. I just have to
state that the meaning of empty blocks is undefined and leave that for
future extensions.
Alternatively I can use a different approach, interleaving the data and
length blocks and occasionally introducing a shift code which means
"start of a new interleaved block". For example, a sequence of three symbols,
with lengths 3, 4 and 1 (call it A1,A2,A3,B1,B2,B3,B4,C1) could be
represented as:
length=3,A1,length=4,A2,length=1,A3,length=0,B1,B2,B3,B4,C1
where length=X means the appropriate Baudot symbol to indicate that
length (to be specified) and length=0 switches from interleaved lengths
and data to data only until the end of the block; or it could be:
length=3,A1,length=0,A2,A3,shift,length=4,B1,length=1,B2,length=0,B3,B4,C1
where the "shift" separates the two blocks. This also leaves the other
shift code for something completely different. The encoding can also
be made more efficient by introducing "double length" codes, which
specify two lengths in a single symbol; whether these can be used
depends on your data. There are 30 possible values and the maximum
length is 7 Baudot characters, so length=0 to length=7 represent single
lengths, and the 20 values length=8 to length=27 could represent two
lengths, the first in the range 1 to 4 and the second in the range 0 to
4; the remaining 2 Baudot characters could also be used for something
completely different. So the above sequence could also be represented
as one of (writing length=X:Y to represent double lengths):
length=3:4,A1,length=1:0,A2,A3,B1,B2,B3,B4,C1
length=3:4,A1,A2,A3,B1,B2,B3,B4,shift,length=1:0,C1
length=3:4,A1,A2,A3,B1,B2,B3,B4,shift,length=1,C1,length=0
etc.
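One possible assignment of the double length values, as a sketch only:
the formula 8 + (X-1)*5 + Y is an assumption chosen because it fills
length=8 to length=27 exactly, and the names pack_double and unpack are
made up for the illustration. The real assignment is still to be
specified.

```python
SINGLE_MAX = 7      # length=0 .. length=7 are single lengths
DOUBLE_BASE = 8     # length=8 .. length=27 are double lengths


def pack_double(first, second):
    """Map a pair (first in 1..4, second in 0..4) to a value in 8..27."""
    if not (1 <= first <= 4 and 0 <= second <= 4):
        raise ValueError("pair not representable as a double length")
    # Assumed layout: five consecutive values per value of `first`.
    return DOUBLE_BASE + (first - 1) * 5 + second


def unpack(value):
    """Return the list of lengths carried by one length symbol."""
    if 0 <= value <= SINGLE_MAX:
        return [value]
    if DOUBLE_BASE <= value <= 27:
        first, second = divmod(value - DOUBLE_BASE, 5)
        return [first + 1, second]
    # 28 and 29 are the two values kept for something else.
    raise ValueError("not a length symbol in this sketch")
```

Under this assumed layout length=3:4 would be the value 22 and
length=1:0 the value 8.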
This seems very intuitive to me, so I'm not doing the opposite of
intuitive :-) I hope nobody copies it to make it a standard.
I also just thought of an advantage of using this sort of encoding,
where you have several possible representations of a sequence: you could
hide a different sequence in it, like you can hide a whitespace program
in another program. How to do that is left as an exercise for the
reader.
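That one sequence has several representations can be checked with a
small decoder. This is a minimal sketch, assuming the tokens are written
as comma-separated text exactly as in the examples above ("length=N",
"shift", anything else is data) and assuming the strict reading of the
scheme: lengths and data alternate until length=0, then everything up to
the next "shift" is data. It only handles the single-length forms;
double lengths and the hiding trick are not attempted. The function
names are made up for the illustration.

```python
def split_by_lengths(lengths, data):
    """Cut `data` into consecutive pieces of the given lengths."""
    if sum(lengths) != len(data):
        raise ValueError("lengths do not match the amount of data")
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out


def decode(text):
    """Decode a comma-separated token string into a list of symbols."""
    symbols = []
    lengths, data = [], []
    data_only = False     # set once length=0 has been seen in this block
    want_length = True    # while interleaving: is a length due next?
    for tok in text.split(","):
        if tok == "shift":
            # End of an interleaved block: emit it and start a new one.
            symbols += split_by_lengths(lengths, data)
            lengths, data = [], []
            data_only, want_length = False, True
        elif data_only or not want_length:
            data.append(tok)
            want_length = True
        else:
            if not tok.startswith("length="):
                raise ValueError("expected a length, got " + tok)
            n = int(tok[len("length="):])
            if n == 0:
                data_only = True
            else:
                lengths.append(n)
                want_length = False
    return symbols + split_by_lengths(lengths, data)


one = "length=3,A1,length=4,A2,length=1,A3,length=0,B1,B2,B3,B4,C1"
two = ("length=3,A1,length=0,A2,A3,shift,"
       "length=4,B1,length=1,B2,length=0,B3,B4,C1")
same = decode(one) == decode(two)   # both give the A, B and C symbols
```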
C