Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

The purpose of Lisp syntax to model AST

560 views
Skip to first unread message

Rickert

unread,
Jul 4, 2012, 2:38:06 AM7/4/12
to
Lisp syntax represents AST as far as I know, but in high level format to allow human to easily read and modify, at the same time make it easy for the machine to process the source code as well.

For this reason, in Lisp, it is said that code is data and data is code, since code (s-epxression) is just AST, in essence. We can plug in more ASTs (which is our data, which is just lisp code) into other ASTs (lisp code) or independently to extend its functionality and manipulate it on the fly (runtime) without having to recompile the whole OS to integrate new code.In other languages, we have to recompile from to turn the human-language source code into valid AST before it is compiled into code.

Is this the reason for Lisp syntax to be designed like it is (represent an AST but is human readable, to satisfy both human and the machine) in the first place? To enable stronger (on the fly - runtime) as well as simpler (no recompile, faster) communication between man-machine?

I heard that the Lisp machine only has a single address space which holds all data. In operating system like Linux, the programmers only have virtual address space and pretend it to be the real physical address space and can do whatever they want. Data and code in Linux are separated regions, because effectively, data is data and data is code. In normal OS written in C (or C like language), it would be very messy if we only operate a single address space for the whole system(physical memory) and mixing data with code would be very messy.

In Lisp Machine, since code is data and data is code, is this the reason for this to have only a single address space (without the virtual layer)? Since we have GC and no pointer, should it be safe to operate on physical memory without breaking it (since having only 1 single space is a lot less complicated)?

RG

unread,
Jul 4, 2012, 6:05:43 AM7/4/12
to
In article <a1454336-49ee-49e2...@googlegroups.com>,
Rickert <solidi...@gmail.com> wrote:

> Lisp syntax represents AST as far as I know, but in high level format to
> allow human to easily read and modify, at the same time make it easy for the
> machine to process the source code as well.

No. Lisp syntax is (merely) a *serialization* format, like JSON. It is
neither "high level" nor "low level". Those concepts do not really
apply to serialization formats.

> For this reason, in Lisp, it is said that code is data and data is code,
> since code (s-epxression) is just AST, in essence.

Again no. In Lisp, the semantics of the language are defined directly
on the underlying data structures themselves, not on the serialization
as in other programming languages.

All languages have a syntax which is a serialization of an underlying
data structure. The difference is that in Lisp the underlying data
structure is directly and intentionally accessible to the user whereas
in other languages the underlying data structure is hidden and
accessible only to the compiler.

> I heard that the Lisp machine only has a single address space which holds all
> data. In operating system like Linux, the programmers only have virtual
> address space and pretend it to be the real physical address space and can do
> whatever they want. Data and code in Linux are separated regions, because
> effectively, data is data and data is code. In normal OS written in C (or C
> like language), it would be very messy if we only operate a single address
> space for the whole system(physical memory) and mixing data with code would
> be very messy.

Again no. Lisp is no different from C in this regard. Data structures
are just much easier to implement if the underlying representations
share an address space. It is possible in any language to implement
data structures that span multiple address spaces, but it's not easy and
very rarely done.

rg

Pascal J. Bourguignon

unread,
Jul 4, 2012, 6:56:59 AM7/4/12
to
RG <rNOS...@flownet.com> writes:

> In article <a1454336-49ee-49e2...@googlegroups.com>,
> Rickert <solidi...@gmail.com> wrote:
>
>> Lisp syntax represents AST as far as I know, but in high level format to
>> allow human to easily read and modify, at the same time make it easy for the
>> machine to process the source code as well.
>
> No. Lisp syntax is (merely) a *serialization* format, like JSON. It is
> neither "high level" nor "low level". Those concepts do not really
> apply to serialization formats.

What difference do you make between "X represents Y" and "X is a
serialization format for Y"?


Otherwise, indeed, the terminology for syntax trees is not "high level"
or "low level", but "abstract" and "concrete". Abstract Syntax Tree is
the purified tree, which is what we usually have with lisp forms, while
the Concrete Syntax Tree has leaves to represent spurious characters.


if condexpr then thenexpr else elseexpr fi -- concrete syntax
| | | | | | |
\ expr | expr | expr /
\ | | | | | / -- CST
-----------------------------------
ifexpr



(if condexpr thenexpr elsexpr) -- abstract syntax

which represents the tree:

condexpr thenexpr elseexpr
\ | /
\ | /
--------------- -- AST
if



>> For this reason, in Lisp, it is said that code is data and data is code,
>> since code (s-epxression) is just AST, in essence.
>
> Again no. In Lisp, the semantics of the language are defined directly
> on the underlying data structures themselves, not on the serialization
> as in other programming languages.
>
> All languages have a syntax which is a serialization of an underlying
> data structure. The difference is that in Lisp the underlying data
> structure is directly and intentionally accessible to the user whereas
> in other languages the underlying data structure is hidden and
> accessible only to the compiler.

The important word here is "directly". For S-expression, there's a
simple and direct isomorphism between the textual representation and the
data structure represented. No other language has this direct
isomorphism. (Not even the CL:LOOP language, which is why it's so much
critisized, and of course, this includes the more sophisticated reader
macros that implement other concrete syntaxes).


This is because of this isomorphism that we are entitled to (and can
easily in general) ignore the difference between the textual
representation and the data structure of S-expressions, and just talk
about S-expressions.

If I say that I have a list (a b c), it is obvious to all that I don't
have a sequence of characters "(a b c)", but a chain of cons cells
referencing symbols!




>> I heard that the Lisp machine only has a single address space which holds all
>> data. In operating system like Linux, the programmers only have virtual
>> address space and pretend it to be the real physical address space and can do
>> whatever they want. Data and code in Linux are separated regions, because
>> effectively, data is data and data is code. In normal OS written in C (or C
>> like language), it would be very messy if we only operate a single address
>> space for the whole system(physical memory) and mixing data with code would
>> be very messy.
>
> Again no. Lisp is no different from C in this regard. Data structures
> are just much easier to implement if the underlying representations
> share an address space. It is possible in any language to implement
> data structures that span multiple address spaces, but it's not easy and
> very rarely done.

That wasn't the point.

But indeed, it's not strictly speaking relative to the language.
AFAIK, in lisp machines, there were lisp primitives that allowed to
compute random locations in memory. And on the other hand, AFAIK, there
is nothing in the C standard that allows you to compute random locations
in memory.

That is, at the language standard level, the memory is as protected in
ANSI C as it is in ANSI Common Lisp.

But at the implementation level, there are ways to obtain pointers to
random locations in memory from a random integer.

Now if you use an implementation that is more restrictive in this
respect, (eg. in C an implementation could signal an error everytime you
cast an integer to a pointer that is not an integer that was obtained
from casting a pointer (of the same type) to that integer, which is all
that the ANSI C language allows IIRC), then you can indeed have complex
systems running in the same address space without encountering often the
problem you get when you don't have this control.


Or, said otherwise, the protection that is provided by the hardware
(MMU) and the OS can also be provided by a bug-free compiler, with the
restriction that all programs running be compiled with that compiler,
and that no uncontrolled memory accesses be allowed (at least in user
code).


--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.

Stefan Mandl

unread,
Jul 4, 2012, 2:40:31 PM7/4/12
to
> [...]
> The important word here is "directly". For S-expression, there's a
> simple and direct isomorphism between the textual representation and the
> data structure represented. No other language has this direct
> isomorphism.

You are forgetting about Prolog which is very similar to Lisp regarding
this matter ;-)

> [...]

-- Stefan

RG

unread,
Jul 5, 2012, 12:47:17 AM7/5/12
to
In article <87pq8bg...@kuiper.lan.informatimago.com>,
"Pascal J. Bourguignon" <p...@informatimago.com> wrote:

> RG <rNOS...@flownet.com> writes:
>
> > In article <a1454336-49ee-49e2...@googlegroups.com>,
> > Rickert <solidi...@gmail.com> wrote:
> >
> >> Lisp syntax represents AST as far as I know, but in high level format to
> >> allow human to easily read and modify, at the same time make it easy for
> >> the
> >> machine to process the source code as well.
> >
> > No. Lisp syntax is (merely) a *serialization* format, like JSON. It is
> > neither "high level" nor "low level". Those concepts do not really
> > apply to serialization formats.
>
> What difference do you make between "X represents Y" and "X is a
> serialization format for Y"?

Serialization is a very particular kind of representation. All
serializations are representations, but not all representations are
serializations.

> If I say that I have a list (a b c), it is obvious to all that I don't
> have a sequence of characters "(a b c)", but a chain of cons cells
> referencing symbols!

Actually, that's not obvious. It is only because of a fairly obscure
(to non-Lisp programmers, which is most programmers) syntactic
convention, namely, that (a b c) is a shorthand for (a . (b . (c .
nil))), that this is so. It's only "obvious" once you internalize this
convention. To illustrate the non-obviousness, consider the character
sequence "#(a b c)". Is should be obvious that what this represents is
not obvious. :-)

rg

Giorgos Keramidas

unread,
Jul 5, 2012, 5:15:23 AM7/5/12
to
It's also conceivable that a Lisp implementor may opt to store
'constant' data in a read-only memory segment, so that e.g. when someone
tries to modify the symbol bound by a defconst form, the operating
system will trap this and handle it as an error. It's not something
I've seen done until now, but there's no reason why it wouldn't be
possible; like what some C implementations may do for what C calls
'string constants'.

Duane Rettig

unread,
Jul 6, 2012, 11:24:25 AM7/6/12
to
On Jul 5, 2:15 am, Giorgos Keramidas <keram...@ceid.upatras.gr> wrote:
> On Wed, 04 Jul 2012 03:05:43 -0700, RG <rNOSPA...@flownet.com> wrote:
> > In article <a1454336-49ee-49e2...@googlegroups.com>,
Kinda like this?

International Allegro CL [master]
8.2 [Linux (x86)] (Jun 15, 2012 10:17)
Copyright (C) 1985-2010, Franz Inc., Oakland, CA, USA. All Rights
Reserved.

This development copy of Allegro CL is licensed to:
Franz Inc. Staff

;; Optimization settings: safety 1, space 1, speed 1, debug 2.
;; For a complete description of all compiler switches given the
;; current optimization settings evaluate (EXPLAIN-COMPILER-SETTINGS).
CL-USER(1): (make-package :foo :use nil)
#<The FOO package>
CL-USER(2): (in-package :foo)
#<The FOO package>
FOO(3): (cl:eq 'list 'cl:list)
COMMON-LISP:NIL
FOO(4): (cl:setf (cl:schar (cl:symbol-name 'list) 0) #\F)
Error: Attempt to store into purespace address #x1d4cc38.
[condition type: PURESPACE-WRITE-ERROR]

Restart actions (select using :continue):
0: Return to Top Level (an "abort" restart).
1: Abort entirely from this (lisp) process.
[1] FOO(5):

String data is easy to make read-only. It is when you start to deal
with objects that have pointers to other objects that the prospect
becomes less appealing, because the same mechanism which keeps the
user from accidentally modifying the constant also makes it harder for
the system to manipulate, e.g. for relocation at startup time.

Duane

Barry Margolin

unread,
Jul 6, 2012, 1:41:26 PM7/6/12
to
In article
<ea3b90e3-ca73-4dd8...@f8g2000pbf.googlegroups.com>,
That's not usually too much of a handicap, since the system can change
the read-only status of a memory page. So during startup it can be
read-write, and then it makes it read-only when it's done initializing.

I remember back when I used to use Lisp Machines, one of our
applications had a bug that inadvertently modified the value cell of
NIL. After we reported this to Symbolics, they moved T and NIL to
read-only regions.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Giorgos Keramidas

unread,
Jul 7, 2012, 4:49:11 PM7/7/12
to
On Fri, 6 Jul 2012 08:24:25 -0700 (PDT), Duane Rettig <d...@franz.com> wrote:
>On Jul 5, 2:15 am, Giorgos Keramidas <keram...@ceid.upatras.gr> wrote:
>> It's also conceivable that a Lisp implementor may opt to store
>> 'constant' data in a read-only memory segment, so that e.g. when someone
>> tries to modify the symbol bound by a defconst form, the operating
>> system will trap this and handle it as an error.  It's not something
>> I've seen done until now, but there's no reason why it wouldn't be
>> possible; like what some C implementations may do for what C calls
>> 'string constants'.
>
> Kinda like this?
>
> International Allegro CL [master]
> 8.2 [Linux (x86)] (Jun 15, 2012 10:17)
...
> CL-USER(1): (make-package :foo :use nil)
> #<The FOO package>
> CL-USER(2): (in-package :foo)
> #<The FOO package>
> FOO(3): (cl:eq 'list 'cl:list)
> COMMON-LISP:NIL
> FOO(4): (cl:setf (cl:schar (cl:symbol-name 'list) 0) #\F)
> Error: Attempt to store into purespace address #x1d4cc38.
> [condition type: PURESPACE-WRITE-ERROR]
>
> Restart actions (select using :continue):
> 0: Return to Top Level (an "abort" restart).
> 1: Abort entirely from this (lisp) process.
> [1] FOO(5):

Yep, exactly like this :-)

> String data is easy to make read-only. It is when you start to deal
> with objects that have pointers to other objects that the prospect
> becomes less appealing, because the same mechanism which keeps the
> user from accidentally modifying the constant also makes it harder for
> the system to manipulate, e.g. for relocation at startup time.

That's true. I can think of hacks like:

(let ((*read-only-pages-are-relocatable* t))
(relocate 'foo 'bar))

but this is not going to address _any_ problem with objects that contain
pointers to other objects, or what the garbage collector may have to do
if it has to shuffle stuff around to make space.

That's ok though, since the original poster wondered if Lisp *requires*
a single address space, and I think that has been conclusively answered
by the Allegro CL trace above.

evrim

unread,
Jul 11, 2012, 2:46:24 AM7/11/12
to

Rickert

unread,
Jul 11, 2012, 11:42:20 PM7/11/12
to
Thanks for all the answer so far. I ask this because it is said that one of the advantage of Lisp is single address space:

http://linuxfinances.info/info/lisposes.html
>
>A safe language means a reliable environment without the need to separate tasks out into their own separate memory spaces.
>
>The "clearly separated process" model characteristic of Unix has potent merits when dealing with software that might be unreliable to the point of being unsafe, as is the case with code written in C or C++ , where an invalid pointer access can "take down the system." MS-DOS and its heirs are very unreliable in that sense, where just about any program bug can take the whole system down; "Blue Screen of Death" and the likes.
>
>If the whole system is constructed and coded in Lisp, the system is as reliable as the Lisp environment. Typically this is quite safe, as once you get to the standards-compliant layers, they are quite reliable, and don't offer direct pointer access that would allow the system to self-destruct.

http://www.loper-os.org/?p=231
>Volatile storage devices (i.e. RAM) shall serve exclusively as read/write cache for non-volatile storage devices. From the perspective of all software except for the operating system, the machine must present a single address space which can be considered non-volatile. No computer system obeys this law which takes longer to fully recover its state from a disruption of its power source than an electric lamp would.

Single address space, as it is stated, holds all the running processes in the same memory space. I am just curious why people insist that single address space is better. With virtual memory, the environment is safer because it is managed by the OS. Processes can't violate the memory space of each other, and programmers won't have to manage physical memory directly to make their apps not clash with other memory regions. Further more, with virtual memory, we can have more memory than actual RAM.

I relate it to the AST like syntax of Lisp, to try to explain how it fits the single space model. As many of you said, Lisp syntax is a serialization format in the form of AST, its code can be treated as textual data and later be evaluated to comprehend the meanings behind the symbolic data, and the textual data can turn into running code. This is what I think. Since code is data and data is code, there should be distinct segments between data and code in memory (like in Linux and C), but a single segment for code and data. Since code and data do not live in separated and fixed size segments, it should be easier to extend the functionalities ,for example, of a running process without having to recompile the whole program to reallocate static memory segments for both code and data.
Message has been deleted
Message has been deleted
Message has been deleted
0 new messages