Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Self-Reproducing Programs (long)

3 views
Skip to first unread message

Terry Reedy

unread,
Dec 10, 1997, 3:00:00 AM12/10/97
to

SHORTEST SELF-REPRODUCING PROGRAMS (SRPs)
FOR THE PYTHON COMPUTER LANGUAGE

Terry J. Reedy, tjr...@udel.edu
1997 Dec 10


*The SSRP Exercise*

In his 1983 Turing Award lecture, Ken Thompson discussed the following
exercise: "write a source program that, when compiled and executed,
will produce as output an exact copy of its source". In other words,
find a solution (as short as possible) to the following fixed-point
equation:
text = execution(text).

To make the challenge specific, we must specify both the execution
function and its domain/range of legal texts. Thompson wrote a 233
lines of about 2000 chars that solved
ascii-text-file = unix-execution(c-complilation&link(ascii-text-file))
This has since been considerably improved upon. Language specific
variations have been discussed at various times in several newsgroups,
including comp.lang.python (c.l.p). This article presents solutions to
six variations of the SRP problem using the Python interpreter as the
execution function.

Note: If you have never encountered the SRP problem before, you might
want to think about it first before reading onward.

*String Value Displays*

We first examine various ways to display a Python string, as
illustrated by the following interactive input and output.

>>>s='a'
>>>print s
a
>>>s, str(s), len(s), len(str(s))
('a', 'a', 1, 1)
>>>`s`, repr(s), len(`s`), len(repr(s))
("'a'", "'a'", 3, 3)

The output of repr() [or its abbreviation ``] looks a bit funny with
the extra quotes added to the string, but this is done so that
evaluating the result returns the original object. In other words,
repr() is a pre-inverse for eval(), so that any string (or None,
number, tuple, list, or dict) is a fixed point of the composed function
eval(repr()).

>>>s==eval(`s`)
1
# whereas eval(s) leads to a "NameError: a" message

Repr() is one key to compactly solving Python SSRP problems.

*Command Line Text File*

First let us solve text.py = batch-python(text.py), which in practice
looks like

prompt> python text.py >text.py

Equality can be verified by displaying both text.py (Unix cat or Dos
type) and the uncaptured output or by capturing to text2.py and
comparing with a file compare program.

The trivial solution is an empty file, which leads to an empty file.
Excluding this, output requires a print command, but attempts like
print 'print "print ... run into problems of quoting and infinite
recursion. We need another key: the Python 'format-string % value'
string construction syntax.

In 'Self-replicating Python' (c.l.p 1996/10/04) Greg Stein proposed a
two-line SRP.

x='x=%s\012print x%%`x`'
print x%`x`

Later the same day, W. Craig Trader replied with the slightly shorter

x='x=%s;print x%%`x`';print x%`x`

I believe this to be minimal for SRP file input. The expression x%`x`
inserts a quoted version of the format string x into the format string
x in place of %s. The %% within x condenses to % to reproduce the
expression x%`x`, while the %% within `x` is inserted intact to
reproduce the %% within x.

Note: On startup, the interpreter may, depending on its configuration,
output to stderr a version line and a copyright line. On shutdown, it
outputs to stderr a final newline, which appears on screen as a blank
line. These do not appear in a stdout-only redirected-output file and
are to be ignored when input and output are compared on the screen.

*Command Line Input String*

Like many programs, command-line python optionally takes its input from
a literal string instead of a file. If 'text' is a SRP for this
version of the problem, we will see

prompt>python -c "text"
text

The file solution given above almost works here also. But there is one
point previously glossed over. A Python print command outputs a
newline unless it is suppressed by a trailing ','. So an SRP.py file
must also end with a newline to compare exactly equal to its output.
However, "text" cannot contain a literal newline, so the output text
must not end with one either. If we try the following variation

x='x=%s;print x%%`x`,';print x%`x`,

the newline is (initially) suppressed, but at the cost of having a
space appended instead (as can be verified by executing the above with
';print 1' appended). If we try to compensate by adding a trailing ' '
to the string literal assigned to x, then the output ends with two
spaces, etcetera. Even if we could solve this problem, it would not
matter: before the interpreter exits, it 'finishes' an 'unfinished'
print line by appending the 'missing' newline. The newline suppression
is only temporary.

The solution is to switch to the file.write() method, which writes
exactly what we tell it to. The following one-line command, written
here on two lines here to fit width limits, is a command line input
string fixed point.

from sys import stdout;
x='from sys import
stdout;x=%s;stdout.write(x%%`x`)';stdout.write(x%`x`)

Slightly shorter is

import sys;f=sys.stdout;
x='import sys;f=sys.stdout;x=%s;f.write(x%%`x`)';f.write(x%`x`)

Note: The discussion ignores the trivial solution of no input, with
again leads to no output. It also assumes that python gets "text"
unchanged. In later versions of MS-DOS, '%identifier%' and '%%' on a
command line act much as '%(identified)s' and '%%' in a Python format
string. The former gets replaced with the environmental value of
'identifier' if one exists or with nothing if not. The latter reduces
to '%'. On such systems, each command-line text above would need each
% doubled.

*Interactive Text Input*

The solutions above all work interactively, but they are not minimal.
In its interactive mode read-eval-print loop, python echoes the values
of expressions without requiring a print statement. Condensing the
file SRP by deleting print and adjusting the remainer gives

>>>x='x=%s;x%%`x`';x%`x`
"x='x=%s;x%%`x`';x%`x`"

Do the surrounding double quotes on the output display matter? For the
moment, it's a moot point. Single-quoted strings and other non-string
literals echo as themselves. As pointed out by Andrew C. Greenberg, in
a response on 1996/10/05, the shortest interactive text input
'programs' are the minimal expressions of each type: '', (), [], {},
and the digits 0 - 9.

What about "the shortest one of all, the empty string!". Does it
quality as the interactive SSRP? No, but for a subtle technical
reason. All interactive inputs must be terminated by a newline, so
the shortest possible input is a newline, not nothing. By choice of
Python's designer, blank line inputs are not echoed. Consider the
following sequence:

>>>''
''
>>>0
0
>>>
>>>_ <= the cursor waiting for more input

There is no blank output line after the blank input line.

Note: Consistency would suggest that there should be, but leaving it
out economizes screen space. For the same reason, a calculated value
of 'None' is also not echoed.

*Interactive String Command*

Let us redefine the execution function and its domain-range so that we
compare the input string to the value calculated rather than to the
text displayed. This rule requires that the calculated value be a
string. It also eliminates all string literals since, after stripping
quotes, their values are at least two characters shorter than the
input. Consequently, we need a program ending with a string
expression. I believe the shortest possible to be the one already
given:

x='x=%s;x%%`x`';x%`x`

With our current rules, this is a SRP since the surrounding double
quotes that appear on display are not part of the calculated string
value.

Note: One theme of this essay is that solutions depend on the exact
specification of a problem. For text input and output, quotation and
termination are two issues needing careful consideration. At least for
Python, the concept 'self-reproducing program' is fuzzier than it might
at first seem.

*Interactive String Expression*

The SSRP above is an assignment statement followed by a string
expression. Let us prohibit explicit assignment and seek a single
self-reproducing string expression. In Python terms, we seek a
solution to 's=eval(s)'. We can test proposed solutions with
's==eval(s)'. This specification excludes string literals. As
mentioned in the second section, they solve a different equation:
s=eval(repr(s)), and generally crash eval(s).

Dr. Dobb's Journal, 1991 September, page 18, quotes the following Lisp
solution (reformatted for clarity) from John McCarthy and Carolyn
Talcott:

( (lambda (x) (list x (list (quote quote) x)))
(quote (lambda (x) (list x (list (quote quote) x))) ) )

This elegant program/expression is a list with structure (X (quote X)).
Its two members are a function defined in place and an argument
consisting of a quoted version of the function definition. The
function creates a new list consisting of its argument and a quoted
version thereof. This results in the original list. (For visual
equality, the input text needs the original spacing and line breaks
that match the output format of a particular Lisp interpreter.)

Python's use of commas to separate tuple/list members prohibits direct
translation; we can either write ((lambda definition)(argument)) or
((mem1),(mem2)) but not both at once. We can, however, rewrite the
lisp expression to calculate a string. After fiddling around with
single and double quotes and backslashes and getting some near misses,
I stumbled on `` and came up with a Python solution with structure
X('X'):

(lambda s: s+"("+`s`+")")('(lambda s: s+"("+`s`+")")')

When the resulting string is displayed, it is surrounded by single
quotes and has backslashes inserted before the interior 's. But, by the
rules of this section, these do not count. Success is more easily
verified by 'print'ing the expression, to avoid the added quotation, or
by
>>> s='''(lambda s: s+"("+`s`+")")('(lambda s: s+"("+`s`+")")')'''
>>> s==eval(s)
1

Using % gives us a shorter and less symmetrical SRP expression that I
suspect is minimal. Its structure is X('X(%s)'), except that X within
the quotes has %% instead of %.

(lambda x:x%`x`)('(lambda x:x%%`x`)(%s)') #

A variation of the current quest is to seek a single, self-reproducing,
'print expression' command. The print-equivalents of the above
expressions, which work in batch mode also, are

print(lambda s:s+"("+`s`+")")('print(lambda s:s+"("+`s`+")")')
print(lambda x:x%`x`)('print(lambda x:x%%`x`)(%s)')

*Educational Value*

Thompsom compared the writing of SRPs to the running of three-legged
races. I think this unnecessarily trivialized the problem. The
concept of fixed points is extremely important in mathematics. Finding
them, if there are any, is part of understanding any function or
process.

The concept also applies to programs. A sort is stable if each subset
of duplicate keys is fixed within itself. Many algorithms are expected
to map all input to a subset of fixed points. For instance, if just()
hyphenates and justifies a paragraph and y = just(x), we would like y =
just(y) also. On the other hand, if y equals both compress(x) and
compress(y) and x!=y, then expand(y) is ambiguous.

In any case, writing and understanding SRPs for a particular language
requires carefull attention to several of its features. This exercise
might be for programming courses, with instructors giving appropriate
hints.


0 new messages