Modified UTF8 encoding for APL input.

luserdroog

Jul 19, 2016, 12:52:13 AM

I've been slowly working on the external interfacing of my
interpreter so that it can take input other than the keyboard.
(Have not addressed alternate output yet, so it still assumes
a vt220-compatible terminal like xterm.)

I partially described the idea in this comp.lang.c thread, where
I worried about the finer points of UTF8 encoding; it is
tangentially related to my C code, which was also reviewed in
that group some time ago:
https://groups.google.com/d/topic/comp.lang.c/JtcjL5a4kmk/discussion

Olmec will accept UTF8-encoded input (once I finish implementing
this stuff). But it will also hijack the "control block" of
first-byte values that UTF8 never uses: the 10xxxxxx pattern,
which is reserved for continuation bytes. That leaves 6 free bits,
so 64 APL characters can have a 1-byte encoding (for the golfers).
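A minimal sketch of the decoding side in C, assuming the rule described above: a 0xxxxxxx byte is ASCII, a 10xxxxxx byte in first position indexes a 64-entry table, and everything else is ordinary UTF8. The names `apl_table` and `decode_char` are hypothetical, and the table contents are placeholders, since the post does not list Olmec's actual set of 64 characters. Overlong-form and surrogate checks are left out to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical default table: Olmec's real choice of 64 characters
   is not given in the post, so only a few slots are filled here.
   A zero entry means "unassigned". */
static const uint32_t apl_table[64] = {
    [0] = 0x2190, /* ← */
    [1] = 0x2373, /* ⍳ */
    [2] = 0x2374, /* ⍴ */
    [3] = 0x236C, /* ⍬ */
};

/* Decode one character from s (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 on malformed input.
   Overlong-form and surrogate checks are omitted for brevity. */
static size_t decode_char(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                  /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    }
    if (s[0] < 0xC0) {                  /* 10xxxxxx as a first byte is   */
        c = apl_table[s[0] & 0x3F];     /* illegal in UTF8, so its low 6 */
        if (c == 0)                     /* bits index the APL table      */
            return 0;
        *cp = c;
        return 1;
    }
    if (s[0] < 0xE0)      { len = 2; c = s[0] & 0x1F; }  /* 110xxxxx */
    else if (s[0] < 0xF0) { len = 3; c = s[0] & 0x0F; }  /* 1110xxxx */
    else if (s[0] < 0xF8) { len = 4; c = s[0] & 0x07; }  /* 11110xxx */
    else
        return 0;
    if (n < len)                        /* truncated sequence */
        return 0;
    for (i = 1; i < len; i++) {         /* continuation bytes: 10xxxxxx */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}
```

Because valid UTF8 never starts a character with 10xxxxxx, ordinary UTF8 input decodes exactly as before; only the otherwise-illegal lead bytes gain a meaning. And since a well-formed multi-byte sequence always consumes its own continuation bytes, a 1-byte APL code can only be read where a new character begins. For example, the single byte 0x81 decodes to ⍳ with this placeholder table, while E2 8D B4 still decodes to ⍴ as standard UTF8.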

So the new idea I have not yet shared is this: take that set
of 64 APL characters and expose it as a Quad-variable, a
system variable. Then a program could dynamically alter its
encoding if it heavily used some other character that I didn't
choose for the default set of 64.
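One way the encoding side might look if the 64-character set is a mutable system variable, sketched in C. The names `short_table`, `set_short_table` and `encode_char` are hypothetical stand-ins (the post does not name the Quad-variable); the sketch assumes that assigning to the variable replaces the whole set, after which any character in the set is written as one 10xxxxxx byte and everything else falls back to standard UTF8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the proposed Quad-variable: the current
   set of 64 characters that get a 1-byte (10xxxxxx) encoding. */
static uint32_t short_table[64];

/* Rebind the whole set, as assigning to the Quad-variable might. */
static void set_short_table(const uint32_t table[64])
{
    memcpy(short_table, table, sizeof short_table);
}

/* Encode cp into out (room for 4 bytes). Returns the byte count,
   or 0 if cp is not a Unicode scalar value. */
static size_t encode_char(uint32_t cp, unsigned char *out)
{
    size_t i;

    if (cp < 0x80) {                        /* ASCII stays 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    for (i = 0; i < 64; i++)                /* in the current set? */
        if (short_table[i] == cp) {
            out[0] = (unsigned char)(0x80 | i);
            return 1;
        }
    if (cp < 0x800) {                       /* otherwise standard UTF8 */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With ⍳ in slot 1, ⍳ encodes as the single byte 0x81 while ⍴ costs three bytes (E2 8D B4); after rebinding the set to put ⍴ in slot 2, ⍴ shrinks to the single byte 0x82. A linear search over 64 entries is just the simplest choice here. Note that bytes written under one table only decode correctly under the same table, so a program that rebinds the set would presumably need to do so before reading or writing any such bytes.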

--
for the children

luserdroog

Jul 19, 2016, 1:20:33 AM

Oops, sent too soon.

I did of course study the prior art helpfully collected by Adam:
http://meta.codegolf.stackexchange.com/questions/9428/when-can-apl-characters-be-counted-as-1-byte-each

But none of them work the way I want: with a single-byte
encoding that overlays nicely with utf8.
