Are there INTERCAL obfuscators available?

spartan.the

Sep 3, 2014, 2:32:50 PM

In subject.

ais523

Sep 3, 2014, 9:00:04 PM

spartan.the wrote:
> Are there INTERCAL obfuscators available?

The most I've seen done towards obfuscating INTERCAL is whitespace
removal; this does make programs harder to read, although it's not that
difficult to automatically reindent INTERCAL into a sane syntax again
(perhaps someone should write an indent(1) equivalent for INTERCAL).

However, INTERCAL obfuscators mostly don't exist for the reason that
automatically doing any sort of transformation on INTERCAL source is
difficult. For instance, consider the following program:

DO COME FROM COMMENTS
DO READ OUT #1
DON'T REA
DO UT #2
PLEASE GIVE UP

This program is perfectly legal, and will print I infinitely many
times. However, if you remove all the whitespace, it does something
entirely different (printing I once, then exiting). I know at least one
INTERCAL programmer who is of the opinion that the standard technically
allows whitespace within keywords, in which case the program would be
the same with or without whitespace, but this is a minority view and
one not supported by any compiler I'm aware of.

In general, the issue of delimiting statements (shown by the above
program) is a tricky one. We know from the INTERCAL-72 manual that each
statement begins where the previous statement ends. However, because an
INTERCAL statement can be an effectively arbitrary sequence of
characters, the only way to tell where a statement ends is by looking
to see where the next statement begins. This infinite regress is
reasonably awkward for implementors; C-INTERCAL's solution is to guess
that one statement transitions to the next at a statement identifier,
but perhaps there's a better approach.
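
The "guess at a statement identifier" idea can be sketched in Python. This is only an illustration, not C-INTERCAL's actual lexer: the keyword list is a tiny hypothetical subset, and the names split_statements and match_any are made up for this sketch. It scans left to right, consumes whole keywords (so the "DO" hidden inside "READOUT" is not taken for an identifier), and starts a new statement at each DO or PLEASE. Run on the program above, it finds five statements with the whitespace and four without.

```python
import re

# Tiny, hypothetical subset of INTERCAL keywords; a real lexer knows many more.
# Whitespace is allowed between the words of a keyword, never inside a word.
IDENTIFIERS = [re.compile(p) for p in (r"PLEASE\s*DO", r"PLEASE", r"DO")]
OTHER_KEYWORDS = [
    re.compile(p)
    for p in (r"READ\s*OUT", r"COME\s*FROM", r"GIVE\s*UP", r"N'T", r"NOT")
]


def match_any(patterns, source, pos):
    """Return the end offset of the first pattern matching at pos, or None."""
    for pattern in patterns:
        m = pattern.match(source, pos)
        if m:
            return m.end()
    return None


def split_statements(source):
    """Guess statement boundaries: a new statement starts at each identifier."""
    starts, pos = [], 0
    while pos < len(source):
        end = match_any(IDENTIFIERS, source, pos)
        if end is not None:
            starts.append(pos)
        else:
            # Consume a whole keyword so the "DO" inside "READOUT" is not
            # mistaken for an identifier; otherwise skip a single character.
            end = match_any(OTHER_KEYWORDS, source, pos) or pos + 1
        pos = end
    bounds = starts + [len(source)]
    # Strip whitespace from each statement so the two runs are comparable.
    return ["".join(source[a:b].split()) for a, b in zip(bounds, bounds[1:])]


PROGRAM = """DO COME FROM COMMENTS
DO READ OUT #1
DON'T REA
DO UT #2
PLEASE GIVE UP
"""

spaced = split_statements(PROGRAM)
squeezed = split_statements("".join(PROGRAM.split()))
print(len(spaced), spaced)
print(len(squeezed), squeezed)
```

With the whitespace intact, "DON'T REA" and "DO UT #2" come out as separate statements; with it removed, they merge into "DON'TREADOUT#2", which is the change in meaning described above.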

Another possibility for obfuscation would be to automatically rewrite,
split, and combine expressions. For instance, the following relatively
clear sequence of two calculate statements:

DO .1 <- #4
DO .2 <- #5

can be written as one, less clear statement:

DO .1$.2 <- #4$#5

or even more confusingly, via constant-folding:

DO .1$.2 <- #49
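
The constant can be checked with a short Python sketch of the mingle operator ($), which interleaves the bits of two 16-bit values, with the left operand's bits taking the more significant slot of each pair. The function names mingle and unmingle are made up for this illustration; unmingle stands in for what the reverse assignment has to do to recover .1 and .2.

```python
def mingle(a, b):
    """Interleave two 16-bit values into one 32-bit value.

    Bits of `a` land in the odd (more significant) positions of each pair,
    bits of `b` in the even positions.
    """
    result = 0
    for bit in range(16):
        result |= ((a >> bit) & 1) << (2 * bit + 1)
        result |= ((b >> bit) & 1) << (2 * bit)
    return result


def unmingle(value):
    """Split a 32-bit value back into the two 16-bit operands of a mingle."""
    a = b = 0
    for bit in range(16):
        a |= ((value >> (2 * bit + 1)) & 1) << bit
        b |= ((value >> (2 * bit)) & 1) << bit
    return a, b


print(mingle(4, 5))   # 49
print(unmingle(49))   # (4, 5)
```

In binary, 4 is 100 and 5 is 101; interleaving gives 110001, which is 49.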

(To gain more obfuscation, you would overload a variable to mean
".1$.2" then assign to that variable, which has the advantage that it
also works in CLC-INTERCAL.)

This sort of transformation can be done automatically in simple cases,
but still runs the risk of changing the meaning of the program. The
most obvious problem is if someone does COME FROM CALCULATING, but
there can be more subtle problems related to the speed at which threads
run. (The threading problem, at least, could be partially solved via
use of a no-op statement; the only safe no-op statement to use for the
purpose, incidentally, is DON'T GIVE UP, due to the lack of a GIVING UP
gerund.)

Another possibility is to use a program like yuk to convert INTERCAL
expressions into C, then convert the resulting C back into INTERCAL.
This may or may not make the program clearer. Perhaps it'd be more
useful for deobfuscating. (Another advantage of code like the reverse
assignment shown above is that it confuses C-INTERCAL's optimizer,
making automatic deobfuscation harder.)

--
ais523
C-INTERCAL comaintainer

Claudio Calvelli

Sep 4, 2014, 3:24:10 AM

On 2014-09-04, ais523 <ais...@nethack4.org> wrote:
> However, INTERCAL obfuscators mostly don't exist for the reason that
> automatically doing any sort of transformation on INTERCAL source is
> difficult. For instance, consider the following program:

I tried that once... it's possible, but definitely not easy.

> (To gain more obfuscation, you would overload a variable to mean
> ".1$.2" then assign to that variable, which has the advantage that it
> also works in CLC-INTERCAL.)

Yes, I should allow assignment to calculations. We all know it's a
necessary feature which is planned for the next pre-escape, whenever
that happens. However I'd obfuscate by assigning to constants:

DO #1 <- #2
DO $1 <- #1

(this obviously assigns #2 to $2 and undoing the side effect on #1 is
left as an exercise to the reader; Oh, and this probably only works
in CLC-INTERCAL).

> (The threading problem, at least, could be partially solved via
> use of a no-op statement; the only safe no-op statement to use for the
> purpose, incidentally, is DON'T GIVE UP, due to the lack of a GIVING UP
> gerund.)

Very old versions of CLC-INTERCAL allowed that. That was due to a
documented compiler bug and of course all users are urged to upgrade to
a CLC-INTERCAL compiler which is less than 14 years old.

> Another possibility is to use a program like yuk to convert INTERCAL
> expressions into C, then convert the resulting C back into INTERCAL.

Well, that's similar to the way I was going to do that all those years ago:
the CLC-INTERCAL compiler doesn't need to produce Perl, it can produce
other languages and the idea was to allow any input language to be used as
output too, possibly after (de)optimising. (There are traces of this all
over the CLC-INTERCAL compiler, the reason the grammars for the parser
look, ahem, unusual is because they are meant to be operated as code
generators as well as parsers, although the functionality to do that is
simply not there).

One day when I feel like completing the next pre-escape of CLC-INTERCAL
I may look into obfuscators again. And think of the advantage of
operating CLC-INTERCAL in "reverse": in goes Perl (or C, or COBOL),
out goes INTERCAL (or Brainf*ck, or Whitespace).

Claudio Calvelli, author and allegedly maintainer of CLC-INTERCAL.

Claudio Calvelli

Sep 4, 2014, 3:29:47 AM

On 2014-09-04, Claudio Calvelli <c.n...@w42.org.invalid> wrote:
> DO #1 <- #2
> DO $1 <- #1

I meant of course ".1" where I said "$1". I blame that on writing Perl
instead of sensible languages like INTERCAL (after all, the CLC-INTERCAL
compiler is sensible by definition, since "sick" was originally the
Sensible Intercal Compiler Kludge - although the manual now refers to a
different meaning for it).

Anyway, maybe it's time that I go back to writing compilers. If only I
could figure out what some of my code does (I'm referring to the
un-escaped CLC-INTERCAL 1.-94.-1, which is about 75% done apparently).

C

spartan.the

Sep 20, 2014, 2:26:08 PM

I did not understand what DO UT means.

Claudio Calvelli

Sep 20, 2014, 2:52:46 PM

On 2014-09-20, spartan.the <spart...@gmail.com> wrote:
> I did not understand what DO UT means.

You need to read it together with the previous line:

DO REA
DO UT #2

ignoring whitespace this becomes:

DOREADOUT#2

or adding spaces somewhere else for legibility

DO READ OUT #2

however all existing compilers ignore whitespace BETWEEN keywords, but
not INSIDE them, so

DOREADOUT#2

would be legal while

DO REA DO UT #2

would result in a runtime error (if executed).

C

spartan.the

Sep 20, 2014, 10:36:06 PM

That's interesting. I wonder whether combining whitespace-insensitivity within keywords with INTERCAL's comment style would leave it undefined what the compiler is supposed to do.

spartan.the

Sep 22, 2014, 2:55:57 AM

> (...) For instance, consider the following program:
>
> DO COME FROM COMMENTS
> DO READ OUT #1
> DON'T REA
> DO UT #2
> PLEASE GIVE UP
>
> This program is perfectly legal (...)

I had already been looking at the C-INTERCAL 0.29 Revamped Instruction Manual, where I found this statement:

Whitespace is generally insignificant in INTERCAL programs; it cannot be added in the middle of a keyword (unless the keyword contains whitespace itself) or inside a decimal number, but it can be added more or less anywhere else, and it can be removed from anywhere in the program as well.

http://catb.org/esr/intercal/ick.htm#Syntax

And then I remembered this elegant INTERCAL comment style... Oh, only now do I get your point.

It's obvious that any decent INTERCAL obfuscator should add such keyword-like comments to the code.

Then I wondered whether this "obfuscate with comments" style can be applied to other languages as well. Consider C (or C++); you have probably seen lines of code like this:

i = i + 1; // increment i by 1

Smarter scholars write it as:

i++; // increment i by 1

(This is like teasing Douglas Crockford, but smarter scholars try to challenge authorities.) So I came up with an idea for writing a no-op while pretending it is not one (strictly, this is undefined behaviour in C; in Java, and in C++ since C++17, it leaves i unchanged):

i = i++; // increment i by 1

That's what I like INTERCAL for. It's the opposite of what these agile folks do: nobody has yet learned what they are programming, but the budget is spent and they move on to another program. INTERCAL scholars spend time learning from the masters, but once they have learned, the lessons are difficult to forget.