
Support 🐰 (U+1F430) as ". combination character


clark.m...@gmail.com

Dec 28, 2018, 4:07:41 PM
The Intercal manual states:

> Note: In the interests of simplifying the sometimes overly-complex form of expressions, INTERCAL allows a spark-spot combination ('.) to be replaced with a wow (!). Thus '.1~.2' is equivalent to !1~.2', and 'V.1$.2' is equivalent to "V!1$.2'".

> Combining a rabbit-ears with a spot to form a rabbit is not permitted, although the programmer is free to use it should he find an EBCDIC reader which will properly translate a 12-3-7-8 punch.

I feel that now that Unicode has finally added a rabbit symbol, we would be remiss to fail to support 🐰 (U+1F430) to simplify ". when found in our source.
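
As an illustration of the proposed substitution, here is a minimal sketch in C that expands both shorthands textually: `!` into `'.` and the UTF-8 rabbit into `".`. The function name `expand_shorthand` and its buffer interface are hypothetical and not taken from any INTERCAL compiler; a real lexer would also have to track spark/ears nesting, which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any INTERCAL compiler.
   Copies src to dst (capacity cap bytes, including the terminating NUL),
   expanding '!' to spark-spot ('.) and the UTF-8 rabbit to
   rabbit-ears-spot (".).  Returns 0 on success, -1 if dst is too small. */
int expand_shorthand(const char *src, char *dst, size_t cap)
{
    static const char rabbit[] = "\xF0\x9F\x90\xB0"; /* U+1F430 in UTF-8 */
    size_t out = 0;

    while (*src != '\0') {
        const char *rep = NULL;   /* two-character replacement, if any */
        size_t skip = 1;          /* input bytes consumed this step */

        if (*src == '!') {
            rep = "'.";
        } else if (strncmp(src, rabbit, 4) == 0) {
            rep = "\".";
            skip = 4;
        }

        if (rep != NULL) {
            if (out + 2 >= cap) return -1;  /* keep room for the NUL */
            dst[out++] = rep[0];
            dst[out++] = rep[1];
        } else {
            if (out + 1 >= cap) return -1;
            dst[out++] = *src;
        }
        src += skip;
    }
    if (out >= cap) return -1;
    dst[out] = '\0';
    return 0;
}
```

With this sketch, `!1~.2'` expands to `'.1~.2'`, and a rabbit followed by `1~.2"` expands to `".1~.2"`.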

ais523

Dec 28, 2018, 4:52:49 PM
Good idea! I decided to implement this; here's the patch I used:

=== CUT HERE ===
---
src/lexer.l | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/lexer.l b/src/lexer.l
index d1ade32..91da32e 100644
--- a/src/lexer.l
+++ b/src/lexer.l
@@ -106,19 +106,25 @@ void yyerror(const char *errtype);
* generated by lex(1) and does normal tokenizing.
*/

+static bool bangflag = false;
+
#undef getc
int getc(FILE *fp)
{
extern FILE* yyin;

- static bool bangflag = false;
static bool backflag = false;
-
static bool eolflag = false;
+ static bool bangcheck = false; /* AIS: for lexer-based bang checks */

if ((size_t)(lineptr - linebuf) > sizeof linebuf)
ick_lose(IE666, iyylineno, (char *)NULL);

+ if (bangcheck)
+ {
+ bangcheck = false;
+ return 0;
+ }
if (bangflag)
{
bangflag = false;
@@ -160,13 +166,14 @@ int getc(FILE *fp)

eolflag = false;

- if (c == '!')
+ if (c == '!' || c == 0xB0)
{
- *lineptr++ = '!';
- bangflag = true;
- return(c = '\'');
+ /* AIS: A potential bang/rabbit; insert a NUL character after it to
+ give the lexer time to set the bangflag */
+ bangcheck = true;
}
- else if (c == '\b') /* convert ctrl-H (backspace) to
+
+ if (c == '\b') /* convert ctrl-H (backspace) to
two chars "^" and "H" so lex can take it */
{
*lineptr++ = '\b';
@@ -226,6 +233,7 @@ I [A-Z]

%%

+\x00 ;
{D} {yylval.numval = myatoi(yytext); return(NUMBER);}
\_ {return(NOSPOT);}
\. {return(ick_ONESPOT);}
@@ -292,9 +300,17 @@ V {yylval.numval = OR; return(UNARY);}
CLOSE\(SPARK\|EARS\),
and CLEARSPARKEARSTACK */
return(temp?OPENSPARK:CLOSESPARK);}
+\! {char temp = sparkearsstack[sparkearslev/32]&1;
+ STACKSPARKEARS(0);
+ bangflag=1; /* AIS: New bangflag handling */
+ return(temp?OPENSPARK:CLOSESPARK);}
\" {char temp = sparkearsstack[sparkearslev/32]&1;
STACKSPARKEARS(1);
return(temp?CLOSEEARS:OPENEARS);}
+\xF0\x9F\x90\xB0 {char temp = sparkearsstack[sparkearslev/32]&1;
+ STACKSPARKEARS(1);
+ bangflag=1; /* AIS: The rabbit case is new */
+ return(temp?CLOSEEARS:OPENEARS);}

\({W}{D}\) {SETLINENO; yylval.numval = myatoi(yytext); return(LABEL);}
=== CUT HERE ===

It's pretty hacky, but then so was the existing patch for this, and I
wanted to get something working.
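
For readers wondering where the byte values in the patch come from: the lexer rule matches the rabbit as the four bytes `\xF0\x9F\x90\xB0`, and the `0xB0` test in `getc` appears to correspond to the last of those bytes. The sketch below is a generic UTF-8 encoder (the name `utf8_encode` is an assumption for illustration, not something in the C-INTERCAL source) that can be used to check that U+1F430 does encode to that sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper, not part of C-INTERCAL.
   Encodes code point cp as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx + 2 continuation */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid scalars */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x1F430, buf)` returns 4 and fills `buf` with F0 9F 90 B0, the same bytes as in the new lexer rule.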

--
ais523

ais523

Dec 28, 2018, 4:54:32 PM
ais523 wrote:
> clark.m...@gmail.com wrote:
>> The Intercal manual states:
>>
>>> Note: In the interests of simplifying the sometimes overly-complex
>>> form of expressions, INTERCAL allows a spark-spot combination ('.) to
>>> be replaced with a wow (!). Thus '.1~.2' is equivalent to !1~.2', and
>>> 'V.1$.2' is equivalent to "V!1$.2'".
>>
>>> Combining a rabbit-ears with a spot to form a rabbit is not permitted,
>>> although the programmer is free to use it should he find an EBCDIC
>>> reader which will properly translate a 12-3-7-8 punch.
>>
>> I feel that now that Unicode has finally added a rabbit symbol, we
>> would be remiss to fail to support 🐰 (U+1F430) to simplify ". when
>> found in our source.
>
> Good idea! I decided to implement this; here's the patch I used:

In case this somehow ends up in a released version of C-INTERCAL, what
name should I use when crediting you for the inspiration behind this
change?

--
ais523

spartan.the

Feb 4, 2019, 6:54:25 PM
I think this is a great moment in the history of INTERCAL. The modern wind of change. Carelessly ignoring complexities of UTF-8 and UTF-16 with all these BOM headers and stuff like that. That costs megabytes of software added into operating systems. (1.44 floppies are gone).

I think we need some kind of INTERCAL standards committee to continue. A web site plus a discussion group to discuss the innovations. Plus a matrix of the features supported by different compilers, similar to what we see for browsers and their versions. For example, it may take a bit more time for Mr. Claudio to adjust his compiler to support that Unicode rabbit.

ais523

Feb 4, 2019, 7:25:34 PM
spartan.the wrote:
> I think this is a great moment in the history of INTERCAL. The modern
> wind of change. Carelessly ignoring complexities of UTF-8 and UTF-16
> with all these BOM headers and stuff like that. That costs megabytes
> of software added into operating systems. (1.44 floppies are gone).

Well, C-INTERCAL is happy to let you mix encodings in a single file (you
can use mixed Latin-1 and UTF-8 if you wish; IIRC there are no
collisions within characters C-INTERCAL actually uses, and other
characters don't need to be decoded as they're only ever output
literally). I'm not sure whether that counts as ignoring complexities or
adding new ones.

That said, Latin-1 doesn't have a rabbit, so you have to use the UTF-8
version.
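
To make the mixed-encoding idea concrete, here is a minimal sketch of one way to read such a file: try to decode a well-formed UTF-8 sequence first, and fall back to treating the byte as Latin-1 if that fails. This is an assumption-laden illustration, not how C-INTERCAL does it (per the above, C-INTERCAL only needs to recognise the specific characters it uses); the name `decode_mixed` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from C-INTERCAL.
   Decodes one character from s, where n >= 1 bytes are available.
   A well-formed UTF-8 sequence is decoded as UTF-8; anything else is
   taken as a single Latin-1 byte.  Stores the code point in *cp and
   returns the number of bytes consumed. */
size_t decode_mixed(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each length (rejects overlong forms). */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t v;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }            /* ASCII */

    if ((s[0] & 0xE0) == 0xC0)      { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else { *cp = s[0]; return 1; }   /* not a UTF-8 lead byte: Latin-1 */

    if (len > n) { *cp = s[0]; return 1; }                /* truncated */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = s[0]; return 1; }
        v = (v << 6) | (s[i] & 0x3F);
    }

    /* Overlong, out of range, or surrogate: fall back to Latin-1. */
    if (v < min_cp[len] || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) {
        *cp = s[0];
        return 1;
    }

    *cp = v;
    return len;
}
```

For example, ¢ is the single byte A2 in Latin-1 and the two bytes C2 A2 in UTF-8; this sketch yields U+00A2 for both. The fallback is only safe when, as described above, no Latin-1 text in the file happens to form a valid UTF-8 sequence: the Latin-1 pair "Â¢" (C2 A2) would be read as a single ¢.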

--
ais523

spartan.the

Feb 7, 2019, 7:13:01 AM
ais523 wrote:
> Well, C-INTERCAL is happy to let you mix encodings in a single file (you
> can use mixed Latin-1 and UTF-8 if you wish; IIRC there are no
> collisions within characters C-INTERCAL actually uses, and other
> characters don't need to be decoded as they're only ever output
> literally). I'm not sure whether that counts as ignoring complexities or
> adding new ones.
>
> That said, Latin-1 doesn't have a rabbit, so you have to use the UTF-8
> version.

I'm glad to see C-INTERCAL becoming the "industry standard" INTERCAL compiler.

clark.m...@gmail.com

Feb 17, 2019, 11:09:44 PM
On Friday, December 28, 2018, 14:54:32 (UTC-7), ais523 wrote:
> In case this somehow ends up in a released version of C-INTERCAL, what
> name should I use when crediting you for the inspiration behind this
> change?

You can just credit me as "Michael Clark" or as "Iiridayn" (i i r i; no L in there). I mostly go by the latter online, when I don't care to avail myself of the pseudo-anonymity of my legal name.

ais523

Feb 17, 2019, 11:44:08 PM
OK. The change is now online in my C-INTERCAL git repository at
<http://nethack4.org/media/intercal.git/> (along with some
documentation for the change). If and when the next version is released,
your suggestion will be implemented in it.

--
ais523