At the heart of Lisp and Prolog

Pete Abel

unread,

Dec 19, 2001, 2:28:33 PM12/19/01

to

Background:
Im interested in the way knowledge can be represented and processed in
computers; from a programming point of view.

Question 1:
What is the "exact" model used to represent "lists" internally (in
memory) in Lisp? Is it, for example, a simple straightforward "tree"
data structure?

Question 2:
What is the exact model used to represent FACTS internally (in memory)
in Prolog? Is it simply a "tree" data structure?

I'd appreciate "as many" details as possible.
(Are there any detailed resources on this subject on the web?)

Your help is most appreciated. Thank you.
Pete

Wouter Van Santvliet

unread,

Dec 19, 2001, 8:28:43 PM12/19/01

to

A Lisp implementation typically uses a single linked list data type. That
means that the basic building blocks for the representation of lists are
pairs. The first element in the pair (car) holds an element of the list. The
second element in the pair (cdr) holds a reference (pointer) to the
remainder of the list. (or at least the first pair of the remainder of the
list) The end of a list is represented by nil, a special value indicating
that further dereferencing is not permitted. The car and cdr primitives
allow one to access both elements of such a pair. The cons primitive creates
such a pair, and initializes it with its arguments. The car of such a pair
can be a list itself, for representing nested lists. For a so called "dotted
pair", the cdr does not hold a list, but a non-list atom. (notation: (a .
b)) A schematic example might help here. You will find it in most Lisp
text-books.

Note that a "program" or function in Lisp is also represented as a list. An
expression like a+b would typically be represented by a list (+ a b). This
eliminated the barrier between code and the data it manipulates, and is a
common feature in AI languages.

Prolog facts are represented as terms. Implementations vary as far as memory
management is concerned. Basically, a term is stored in a consecutive series
of elements in memory (an array), where the first element of the array holds
the functor of the term, and the reaminder of the array holds the arguments
of the term, in the correct order. The term f(a,b) for example is
represented by an array |f|a|b|. An argument of a term can be (a reference
to) another term. Prolog engines allocate memory for term representations on
a stack or a heap. The memory management of prolog is too complex to
describe in a couple of lines. Text-books will often ignore the
representation of terms in Prolog.

The same remark as for Lisp applies: prolog programs (clauses and facts) are
represented as terms too, eliminating the barrier between code and data. A
clause p :- q for example is an infix notation for the term ':-'(p,q).

Prolog supports a list data type, but this is "emulated" by means of terms
with a dot (.) as a functor. The list [a,b,c] for example is represented by
the term .(a,.(b,.(c,[]))).

In fact the list and term data types as supported by Lisp and Prolog
respectively are equivalent in terms of what they can represent. The prolog
list data type shows that for any list there is an equivalent representation
using terms. The Lisp representation of terms (a+b basically is a term with
funcor +, using infix notation) illustrates that terms can be represented
using lists.

Conceptually, both nested lists and nested terms can be viewed as trees
also. (and both nested lists or terms can be used to represent tree
structures) The physical representation used by implementations of both
languages however is as explained before.

"Pete Abel" <pete...@hotmail.com> wrote in message
news:b119f458.01121...@posting.google.com...

Bart Demoen

unread,

Dec 20, 2001, 4:57:23 AM12/20/01

to

I am not quite happy with what

Wouter Van Santvliet wrote:

> Prolog facts are represented as terms.

...

> The same remark as for Lisp applies: prolog programs (clauses and facts) are
> represented as terms too, eliminating the barrier between code and data. A
> clause p :- q for example is an infix notation for the term ':-'(p,q).

Prolog facts (and rules) (can) have more than one representation: the user sees them in
sources files as terms. Whether internally they have the same representation as runtime
generated terms (like in X =.. [Y,A,B]) ... most often not: static Prolog facts (and rules) are
in most systems represented as virtual machine code (this is for emulators) or as machine
code (some native code systems like SICStus Prolog on Sparc or GNU-Prolog). Dynamic
facts and rules ... it depends on the system: some like to keep them as Prolog terms (not
on the heap usually) and have a fast assert, but a slower execution of these dynamic code;
others like to compile up to a certain point on the fly (Prolog systems have a form of
Jit compilation since at least 1885) and have a slower assert but a faster execution of the
dynamic code ...

>
> Prolog supports a list data type, but this is "emulated" by means of terms
> with a dot (.) as a functor. The list [a,b,c] for example is represented by
> the term .(a,.(b,.(c,[]))).
>

Again there is a difference between source representation and internal representation: whether
you use [a,b,c] or .(a,.(b,.(c,[]))) in a file, most systems will have a representation internally that is
close to the LISP internal representation with a car and a cdr (not cudr-coding though). I know
of only one system (BinProlog) that uses internally the . as well (in some way at least).

Cheers

Bart Demoen

Wouter Van Santvliet

unread,

Dec 20, 2001, 3:17:03 PM12/20/01

to

"Bart Demoen" <b...@cs.kuleuven.ac.be> wrote in message
news:3C21B603...@cs.kuleuven.ac.be...

The remarks of Bart are correct, and much more specific than the generic
explanation I have provided. They refer to implementations of Lisp and
Prolog environments however, and the efforts that have been made to improve
the behaviour of such environments, in particular wrt performance.

The representations I described are "original" in the sense that they have
been the foundation on which the languages have been designed. A large
number of primitives relate directly to the list and term data structures.
Any implementation that respects the language specification will have to
provide those primitives. (I am not only thinking of the (de)composition
primitives for lists and terms, but also of dynamically added functions or
facts and clauses, introspection, and in the case of Lisp of the subtle
semantics of a substantial part of the primitives.)

For understanding the implementations of Lisp or Prolog better, the remarks
of Bart are most valuable. If one is
> ... interested in the way knowledge can be represented and processed in

> computers; from a programming point of view.

however, they seem to complicate matters more instead of adding to the
general insight.

Bart Demoen

unread,

Dec 21, 2001, 3:15:28 AM12/21/01

to

Wouter Van Santvliet wrote:

> For understanding the implementations of Lisp or Prolog better, the remarks
> of Bart are most valuable. If one is
> > ... interested in the way knowledge can be represented and processed in
> > computers; from a programming point of view.
> however, they seem to complicate matters more instead of adding to the
> general insight.

On better reading the original question, I realize it was asking about many levels at once.
I reacted to the part "What is the exact model used to represent FACTS internally
(in memory) in Prolog?" (and the same about lists)
Anyway - is the original poster satisfied or does he want more :-)

Bart Demoen

Jens Kilian

unread,

Dec 21, 2001, 7:06:21 AM12/21/01

to

Bart Demoen <b...@cs.kuleuven.ac.be> writes:
> (Prolog systems have a form of
> Jit compilation since at least 1885)

Wow. Prolog seems to be older than I ever thought!

;-)

> Again there is a difference between source representation and
> internal representation: whether you use [a,b,c] or
> .(a,.(b,.(c,[]))) in a file, most systems will have a representation
> internally that is close to the LISP internal representation with a
> car and a cdr (not cudr-coding though). I know of only one system
> (BinProlog) that uses internally the . as well (in some way at
> least).

IIRC, BinProlog uses a generalized form of cdr-coding when it can, putting
the functor cell of the last argument of a term into the space of that
argument. BinProlog puts type tags on data objects to make this work;
most other Prologs (that I know of) use tagged pointers.

Bye,
Jens.
--
mailto:j...@acm.org phone:+49-7031-464-7698 (TELNET 778-7698)
http://www.bawue.de/~jjk/ fax:+49-7031-464-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]

Thorsten Seelend

unread,

Dec 23, 2001, 9:47:23 AM12/23/01

to

"Pete Abel" <pete...@hotmail.com> schrieb im Newsbeitrag
news:b119f458.01121...@posting.google.com...

An additional remark.

In Lisp the base data structure isn't a list but a so called
Structured Expression (SEXpression). Every non atomic data
internally is represented as a "dotted pair". For example:

(peter . 12) is represented as

---------
| . | . |
----------
. |
....|..> peter
|
|--> 12

Historically based on the machine language of the IBM computer on
that Lisp first was implemented, these to parts (pointers) are
called CAR (Contents of Address Register) and CDR (Contents of
Displacement Register) respectively.

Because of the fundemantal meaning of lists, they got a special
representation:

(a b c)

which in fact is just an abbreviation and identical to:

(a . (b . (c . nil)))

So internally, Prolog and Lisp represents all structured data the
same way; even the "operator" is the same: The dot.

By
Thorsten

Rahul Jain

unread,

Dec 23, 2001, 6:33:27 PM12/23/01

to

"Thorsten Seelend" <see...@gmx.de> writes:

> In Lisp the base data structure isn't a list but a so called
> Structured Expression (SEXpression). Every non atomic data
> internally is represented as a "dotted pair". For example:

How is a hash-table internally represented as a cons cell?

See cmucl for an example implementation where it's not. I'm at a loss
to find a sample implementation where it is a cons cell.

--
-> -/- - Rahul Jain - -\- <-
-> -\- http://linux.rice.edu/~rahul -=- mailto:rahul...@usa.net -/- <-
-> -/- "I never could get the hang of Thursdays." - HHGTTG by DNA -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
Version 11.423.999.220020101.23.50110101.042
(c)1996-2000, All rights reserved. Disclaimer available upon request.

Erik Naggum

unread,

Dec 24, 2001, 1:26:32 AM12/24/01

to

* "Thorsten Seelend" <see...@gmx.de>

| In Lisp the base data structure isn't a list but a so called Structured
| Expression (SEXpression). Every non atomic data internally is
| represented as a "dotted pair".

That was an amazing number of errors in so little space! Are you coming
from the Scheme camp with this "insight" into Lisp? Since you have
managed to believe this stuff despite a huge amount of corrections every
time someone says something equally wrong, all textbooks on Lisp correct
these misconceptions, and and no textbook on Lisp reinforce these strange
ideas, I presume that they are taken out of nowhere, you believe it very
strongly (or you would not be unafraid to post this nonsense in public
view of thousands of Lisp experts), and that you will not listen to
anyone who corrects you this time, either, this is for the people who
could mistake your article for containing information:

There is no _one_ basic data structure in Lisp. First, "basic" depends
on what you are looking at, and how you are looking at it. _If_ you are
looking at the symbolic (note: not "structured") expression, well, then
the list implemented with the cons cell has some interesting properties,
and atom/non-atom is a valid distinction. Not otherwise. Second, Lisp,
like every other language, has arrays. Lisp 1.5 had arrays. (In LISP
1.5 Programmer's Manual, 4.4 The Array Feature starts "Provision is made
in LISP 1.5 for allocating blocks of storage for data.") Even Scheme has
arrays, so they must be among the _truly_ basic types. Third, while it
is true that a cons cell is not an atom, the incompetent wording "every
non-atomic data" implies that there are more than one type of non-atomic
data. There is only _one_ non-atomic data type, the cons cell. The cons
cell _is_ a dotted pair, so the statement you made is really incompetent.
Oh, and the adjective "base" has a different meaning than you think --
look it up in a dictionary.

The Symbolic Expression is abbreviated "S-expression". The way you have
written it, the function sexp would always return true. <-- joke.

An S-expression is a list of S-expressions _or_ an atom, so "atom" covers
the whole rest of the type hierarchy. Now, it really did look like you
said that all _atoms_ are represented as a dotted pair, because the way
you wrote about the cons cell is so incompetent.

| Historically based on the machine language of the IBM computer on that
| Lisp first was implemented, these to parts (pointers) are called CAR
| (Contents of Address Register) and CDR (Contents of Displacement
| Register) respectively.

Sigh. It was DECREMENT, OK? What _is_ your source for all this crap?

These operators referred to halves of a machine word because memory was
usually less than half of what machine words could address. A common way
to implement a push-down list (now called a stack) was with a machine
word that contained a pointer to the top of the stack in one half and the
negative of the size of the allocated area in the other half. Then you
would add 1 to each half when you pushed something on the push-down list
(which grew upward) and when the decrement half overflowed to (positive)
zero, you had a stack overflow condition. Checking for this condition
was part of the push and pop instructions. All pretty clever. Using the
two halves of a machine word for a cons cell actually required the cdr to
be negative, which is all documented in the Lisp 1.5 Programmer's Manual,
which you should acquire and read before the next time you wish to shoot
your mouth off in a Lisp experts forum.

| So internally, Prolog and Lisp represents all structured data the
| same way; even the "operator" is the same: The dot.

This is Just Plain Wrong.

I think you need to actually _read_ the LISP 1.5 Programmer's Manualą
before you talk any more about it, but more than that, read some of the
stuff that has been done in Lisp over the past 40 years since the first
article in the CACM. Incidentally, I have not been able to acquire a
copy of the LISP 1 Programmer's Manual.

I _really_ hope you know more about Prolog than you know about Lisp.

///
ą ISBN 0-262-13011-4
<URL:http://www.amazon.com/exec/obidos/ASIN/0262130114>
--
The past is not more important than the future, despite what your culture
has taught you. Your future observations, conclusions, and beliefs are
more important to you than those in your past ever will be. The world is
changing so fast the balance between the past and the future has shifted.

Pete Abel

unread,

Dec 24, 2001, 12:09:29 PM12/24/01

to

Wouter Van Santvliet wrote:
{
The term f(a,b) for example is represented by an array |f|a|b|. An

argument of a term can be (a reference to) another term. Prolog
engines allocate memory for term representations on a stack or a
heap. The memory management of prolog is too complex to describe in a

couple of lines. Text-books will often IGNORE the representation of
terms in Prolog.
}

This was the "closest" answer to what I've asked about:
{
Question:

What is the exact model used to represent FACTS internally (in memory)
in Prolog? Is it simply a "tree" data structure?
}

A few "facts" first, I use Turbo Prolog 2.0 for the purpose of
experimentation. And I use C++/VC++ to write code. And my "goal"
starts by trying to simulate the functionality of Prolog's inference
engine... first by finding the right data structure to represent ALL
facts and rules, then by finding the right search algorithm based on
"that" data structure.... but the main problem was the immense
complexity of the "data structure management algorithm" that I've
deduced by "tracing" Turbo Prolog 2.0 goal-execution strategy!

For example, using "arrays" only to represent "terms" is relatively
easy, but requires more "pointers" work... on the other hand, trees
are far more complicated but require less attention to the "framework"
pointers!

/* Question */
In other words, what is the model used (e.g. in Turbo Prolog 2.0) to
represent data (MULTIPLE facts of the form: f(a,b)) in memory so that
future "goals" can be answered (found: looked for in the data
structure) easily?

I'd appreciate more details on "this specific point". Thank you.
Pete

Vadim Radionov

unread,

Dec 24, 2001, 1:34:32 PM12/24/01

to

In comp.lang.lisp Thorsten Seelend <see...@gmx.de> wrote:

TS> In Lisp the base data structure isn't a list but a so called
TS> Structured Expression (SEXpression).

As I know, the abbrivation s-expressition has meaning: "Symbolic expression"

---
RVP

Kalle Olavi Niemitalo

unread,

Dec 26, 2001, 4:40:13 PM12/26/01

to

Rahul Jain <rj...@sid-1129.sid.rice.edu> writes:

> "Thorsten Seelend" <see...@gmx.de> writes:
>
> > Every non atomic data internally is represented as a "dotted pair".
>

> How is a hash-table internally represented as a cons cell?

Hash tables are atomic. ;-)

Jens Kilian

unread,

Jan 7, 2002, 8:58:03 AM1/7/02

to

pete...@hotmail.com (Pete Abel) writes:
> /* Question */
> In other words, what is the model used (e.g. in Turbo Prolog 2.0) to
> represent data (MULTIPLE facts of the form: f(a,b)) in memory so that
> future "goals" can be answered (found: looked for in the data
> structure) easily?

If you can get your hands on a copy of the first edition of _Prolog for
Programmers_ (F. Kluzniak & S. Szpakowicz, Academic Press), you'll find
source code for a tiny Prolog interpreter written in Pascal.
It's not very powerful, though.

Most modern Prologs are compiled, using a nice abstract machine invented
by D.H.D. Warren (the WAM). IIRC, Hassan Ait-Kaci's book on the WAM is
available online somewhere.

HTH,