
Memory layout for tagged values?


rupert...@googlemail.com

Feb 10, 2015, 6:14:12 AM
Hi,

My WAM implementation is a toy compiler, so I haven't yet worried too hard about the best layout for byte-code instructions, or the memory layout for the heap, as it evolves to become less of a toy. I am now at the point of thinking this out and would appreciate some advice. How do other Prolog compilers deal with tagged values on the heap?

WAM values on the heap are tagged, either STR for a tuple or REF for a pointer. To these we can add LIS and CONS for abbreviating the representation of lists as tuples.

This is all I have at present: I am using a 32-bit heap word, with the top 2 bits holding the tag. That gives a maximum pointer size of 30 bits, but that's good enough for a toy compiler.
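This top-bit scheme can be sketched as follows (the type and function names are my own, purely illustrative, not from any real system):

```c
#include <stdint.h>

/* A 32-bit heap word: tag in the top 2 bits, 30-bit value/pointer below.
 * Tag names follow the WAM convention described above. */
enum wtag { TAG_REF = 0, TAG_STR = 1, TAG_LIS = 2, TAG_CON = 3 };

typedef uint32_t cell;

static inline cell make_cell(enum wtag t, uint32_t val) {
    return ((cell)t << 30) | (val & 0x3FFFFFFFu);   /* val is limited to 30 bits */
}

static inline enum wtag cell_tag(cell c) { return (enum wtag)(c >> 30); }
static inline uint32_t  cell_val(cell c) { return c & 0x3FFFFFFFu; }
```

Encoding and decoding are a shift and a mask each, at the cost of the two address bits the thread goes on to discuss.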

To bring my work on to a fuller implementation of Prolog, I need to add integer, character and real types too. I presume these need to be tagged, not just blindly cast at runtime, since ISO has predicates such as integer/1 and real/1 to test types at runtime.

That gives me 7 tags in total, or 3 bits. I think using a 64-bit word and taking the top 3 for the tag would be a little odd and limiting. I would also like to keep some spare room in the tag list for experimenting with non-standard types, so an 8-bit tag at least.

If I use 8 bits and a 64-bit word, that's 72 bits, which if packed in the heap means unaligned memory access. Should I just go for a full 64/64 tag/value and keep everything aligned? (That also leaves room for longer floating-point values.) Yes, tagged values in Prolog are a bit of a pain, and this is going to eat memory bandwidth, but then speed is not really Prolog's essence.

Should room be left in the tag for other purposes, reference counts for a garbage collector, for example?

Does anyone know off-hand what layouts the main Prolog compilers use?

Rupert

R Kym Horsell

Feb 10, 2015, 7:05:27 AM
[truncating to 78 cols]
rupert...@googlemail.com <rupert...@googlemail.com> wrote:
> Hi,
> My WAM implementation is a toy compiler, so I didn't worry too hard yet abou
...

All of these things have been largely hashed out in the literature
over the past 30 years. And the literature is vast, since it is
not restricted to just Prolog but includes the functional,
threaded, and OO implementations, too.

Depending on your aims you may head down the YAWN (Yet Another WAM
Nachine) route and have heap-based terms with REF links to save
on space, or you can imagine this is some era when a video chip
has 100s of 256-bit processors, GB of memory, and can afford to
be a dataflow version of a Prolog evaluator that passes around
self-contained copies of terms with no REF ptrs.

Considering the low-level concept of tags -- you can have
the top few bits as tags; some old LISPs used the bottom few bits
as tags because it was more efficient on the architectures
of the time (and they could be used as offsets into tables directly);
or you can keep the tags off the pointers and put them in the head
of each object; or, if you are considering a 64-bit architecture,
have one half a pointer to an object/class, and the other half a ptr
to a param block, and use FORTH-type threaded eval to select
the appropriate put/get/unify code from the "class pointer" (aka tag).

Or you can forget tags altogether and just use address comparisons.
If a ptr is in a given range then it's a struct, another range it's
an atom, another range a list, etc. Sometimes comparing a 32bit or
64bit val is faster than trying to extract 4 bits even from the lsb of a ptr.

Objects can be heap-based, or using some ideas from functional
term-rewrite type systems might be constructed and manipulated on
a stack to take advantage of memory locality or TOS-relative addressing.

And if you're aiming to make unification or indexing efficient
you might consider ideas from parsing -- I'm sure I'm not the only
one that built a toy using yacc/lex to "recognise" clause heads from their
tags and functors and eliminated choicepoints at "compile" time.


--
Maybe if atmospheric CO2 levels increase to 10 times what they are now
(and hence start being comparable to those over most of the time these
creatures have existed) you should start worrying.
[Because you would be in 350 mn BC -- see
<http://geocraft.com/WVFossils/PageMill_Images/image277.gif> ].
-- Peter Webb <r.peter.webb...@gmail.com>, 14 Mar 2012 00:40 +1100

rupert...@googlemail.com

Feb 10, 2015, 10:18:47 AM
On Tuesday, February 10, 2015 at 12:05:27 PM UTC, kym wrote:
> [truncating to 78 cols]
> rupert...@googlemail.com <rupert...@googlemail.com> wrote:
> > Hi,
> > My WAM implementation is a toy compiler, so I didn't worry too hard yet abou
> ...
>
> Depending on your aims you may head down the YAWN (Yet Another WAM
> Nachine)

It's going to be a YAWN, as you call it. My aim at the moment is to get from a toy system to just enough Prolog to support the small amount of code I have written, which really doesn't push the envelope.

Thanks for your suggestions though, some good ideas there to look into longer term.

Do you have any thoughts on what the simplest garbage collector for a WAM would be?

Rupert

Ulrich Neumerkel

Feb 10, 2015, 11:35:32 AM
"rupert...@googlemail.com" <rupert...@googlemail.com> writes:
>This is all I have at the present, I am using a 32-bit heap word, and using
> the top 2 bits to hold the tag. This gives a max pointer size of 30 bits,
>but that's good enough for a toy compiler.

You could use the lower two bits and thus get the full address space.

>To bring my work on to a fuller implementation of Prolog, I need to add
>integer, character and real types too. I presume these need to be tagged, not
>just blindly cast at runtime, since ISO has queries such as integer/1, real/1
>to test types at runtime.

There is no cast in ISO Prolog nor any other Prolog I know of.
And there is no separate character type. ISO Prolog uses atoms of
length one as characters. And it's not real/1 but float/1. In fact,
there is a note on this in ISO/IEC 13211-1:

| 2 The floating point type has commonly, but misleadingly,
| been known as "real" in many Prolog processors.

rupert...@googlemail.com

Feb 10, 2015, 4:31:03 PM
On Tuesday, February 10, 2015 at 4:35:32 PM UTC, Ulrich Neumerkel wrote:
> "rupert...@googlemail.com" <rupert...@googlemail.com> writes:
> >This is all I have at the present, I am using a 32-bit heap word, and using
> > the top 2 bits to hold the tag. This gives a max pointer size of 30 bits,
> >but that's good enough for a toy compiler.
>
> You could use the lower two bits and thus get the full address space.

Pointer compression, yes, I had considered that, but I still need tags for other types too, so I'm going to run to more than 2 bits. And I don't really want to take the bottom/top few bits of integer values, for example, or I end up with some weird 61-bit arithmetic.

I suppose I could have 8 tag types in 3 bits, and use the bottom 3, effectively giving a 64-bit address space with pointer compression and everything word-aligned to 64 bits. If the tag type is not a pointer, then the int/float/whatever comes in the next word(s), so it can be 64/128/whatever bits long.
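A sketch of this low-bit scheme, under the stated alignment assumption (all names invented for illustration): 8-byte alignment leaves the bottom 3 pointer bits free, tag 0 means an untagged REF pointer, and non-pointer payloads live in the following word(s) with all 64 bits intact.

```c
#include <stdint.h>

enum ltag { TAG_REF = 0, TAG_STR = 1, TAG_LIS = 2, TAG_BOXED = 7 };

typedef uint64_t cell;

/* Stands in for a heap object; forced to 8-byte alignment so the
 * bottom 3 bits of its address are guaranteed zero. */
static _Alignas(8) uint64_t example_obj;

static inline cell tag_ptr(void *p, enum ltag t) {
    return (cell)(uintptr_t)p | (cell)t;        /* p must be 8-byte aligned */
}

static inline enum ltag cell_tag(cell c) { return (enum ltag)(c & 7u); }

static inline void *cell_ptr(cell c) {
    return (void *)(uintptr_t)(c & ~(cell)7u);  /* mask the tag off */
}
```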

I'm still wondering though, am I going to need to leave some space somewhere for reference counts for garbage collection?

> There is no cast in ISO Prolog nor any other Prolog I know of.

Yes, so float and int must be tagged so their type is known at runtime.

Prolog:
?- A is a+b.
ERROR: is/2: Arithmetic: `a/0' is not a function

C:
int A = 'a' + 'b'; // A = 97 + 98 = 195, the sum of the ASCII codes.

Ulrich Neumerkel

Feb 10, 2015, 5:20:56 PM
"rupert...@googlemail.com" <rupert...@googlemail.com> writes:
>On Tuesday, February 10, 2015 at 4:35:32 PM UTC, Ulrich Neumerkel wrote:
>> "rupert...@googlemail.com" <rupert...@googlemail.com> writes:
>> >This is all I have at the present, I am using a 32-bit heap word, and using
>> > the top 2 bits to hold the tag. This gives a max pointer size of 30 bits,
>> >but that's good enough for a toy compiler.
>>
>> You could use the lower two bits and thus get the full address space.
>
>Pointer compression, yes I had considered that, but I still need tags for other types too,

Usually 32-bit words are aligned anyway, so 00 is the tag for addresses.

>> There is no cast in ISO Prolog nor any other Prolog I know of.
>
>Yes, so float and int must be tagged so their type is known at runtime.
>
>Prolog:
>?- A is a+b.
>ERROR: is/2: Arithmetic: `a/0' is not a function
>
>C:
>int A = 'a' + 'b'; // A = 97 + 98 = 195, the sum of the ASCII codes.

We must have some very different terminology.

rupert...@googlemail.com

Feb 11, 2015, 3:39:21 AM
On Tuesday, February 10, 2015 at 10:20:56 PM UTC, Ulrich Neumerkel wrote:
> "rupert...@googlemail.com" <rupert...@googlemail.com> writes:
> >On Tuesday, February 10, 2015 at 4:35:32 PM UTC, Ulrich Neumerkel wrote:
> >> "rupert...@googlemail.com" <rupert...@googlemail.com> writes:
> >> >This is all I have at the present, I am using a 32-bit heap word, and using
> >> > the top 2 bits to hold the tag. This gives a max pointer size of 30 bits,
> >> >but that's good enough for a toy compiler.
> >>
> >> You could use the lower two bits and thus get the full address space.
> >
> >Pointer compression, yes I had considered that, but I still need tags for other types too,
>
> Usually 32-bit words are aligned anyway, so 00 is the tag for addresses.

I'll go for 64-bit and use 000 for the REF tag. The LIS tag for lists holds the address of the first element of the pair that makes up the list element object, so I can't use 000 for all address types. Masking the tag bits off a register with an AND won't add a noticeable overhead on today's CPUs.

That could give me up to 7 address types, with 111 meaning a non-address type: look in some of the higher-up bits to find out what the actual type is.

> >> There is no cast in ISO Prolog nor any other Prolog I know of.
> >
> >Yes, so float and int must be tagged so their type is known at runtime.
> >
> >Prolog:
> >?- A is a+b.
> >ERROR: is/2: Arithmetic: `a/0' is not a function
> >
> >C:
> >int A = 'a' + 'b'; // A = 97 + 98 = 195, the sum of the ASCII codes.
>
> We must have some very different terminology.

Not really, I'm just being too approximate in how I'm explaining things. In C there is no runtime type information: you can cast a pointer to void* and then cast it again to anything else, say int*, and access any memory as integers even though it may originally have been used to store, for example, a string. (Going via void* is perhaps not necessary; I mention it only as a way to scrub a pointer of all type information at compile time, in case the compiler tries to stop you.) That is why I say you can 'blindly' cast things in C. Prolog knows the type at runtime, as evidenced by the runtime type checks, so it must be necessary to have different tag types for int and float.
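The point about C's blind casts can be made concrete; `blind_bits` is a made-up helper that reinterprets a double's storage as an integer, with no runtime check anywhere (memcpy is used rather than a pointer cast to stay within strict-aliasing rules):

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double. Nothing at run time records
 * that these bytes were "really" a float; a Prolog heap cell, by
 * contrast, must carry a tag so integer/1 and float/1 can answer. */
static uint64_t blind_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

For instance, `blind_bits(1.0)` yields the IEEE 754 encoding 0x3FF0000000000000, and the language never objects.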

add(X, Y, R) :- R is X + Y.
|: % user://1 compiled 0.00 sec, 824 bytes
true.

?- add(2,3,R).
R = 5.

?- add(2.5,5.6,R).
R = 8.1.

?- add(2.5,5,R).
R = 7.5.

Perhaps the compiler constructed 3 versions of the add predicate, depending on the types of the arguments that I called it with? Or perhaps it just selected the right kind of '+' operation at runtime, depending on the type tags.

Rupert

rupert...@googlemail.com

Feb 11, 2015, 4:10:47 AM
On Tuesday, February 10, 2015 at 9:31:03 PM UTC, rupert...@googlemail.com wrote:
> I'm still wondering though, am I going to need to leave some space somewhere
> for reference counts for garbage collection?

Having scanned some of the literature on GC in Prolog, I think I may well need to leave some space in the tags to support GC. For example, somewhere to put the 'mark' in a simple mark and sweep type algorithm. However, I can always make room for this by putting it in a separate array, with one entry for each heap cell, so I don't think I need to worry about this at the moment.
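The side-table idea can be sketched like this (a hypothetical mark bitmap, one bit per heap cell; `HEAP_CELLS` and all names are invented):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HEAP_CELLS 1024   /* illustrative heap size */

/* One mark bit per heap cell, kept outside the cells themselves,
 * so the cell representation needs no spare GC bits. */
static uint8_t mark_bits[(HEAP_CELLS + 7) / 8];

static inline void gc_mark(size_t cell)   { mark_bits[cell >> 3] |= (uint8_t)(1u << (cell & 7)); }
static inline int  gc_marked(size_t cell) { return (mark_bits[cell >> 3] >> (cell & 7)) & 1; }

/* Clear all marks (as at the start of a mark phase), mark one cell,
 * and check that only that cell is marked. */
static int mark_demo(void) {
    memset(mark_bits, 0, sizeof mark_bits);
    gc_mark(42);
    return gc_marked(42) && !gc_marked(43);
}
```

The bitmap costs one bit per cell of extra memory but leaves the tag layout untouched, which matches the "worry about it later" plan above.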

Rupert

Ulrich Neumerkel

Feb 11, 2015, 9:42:20 AM
"rupert...@googlemail.com" <rupert...@googlemail.com> writes:
>> Usually 32-bit words are aligned anyway, so 00 is the tag for addresses.
>
>I'll go for 64-bit and use 000 for the REF tag. The LIS tag for lists holds
> the address of the first element of the pair that make up the list element
> object, so I can't use 000 for all address types. Masking a register with
>AND 000 won't add a noticeable overhead on todays CPUs.
>
>That could give me up to 7 address types, and 111 means a non-address type,
> look in some of the higher up bits to find out what the actual type is.

Try tag-on-data. That is, have a **single** pointer type, use the
remaining 1 or 2 bits for GC, and encode all other information
in the other bits. It makes for a much cleaner representation.
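One way to picture tag-on-data, as a non-authoritative sketch with invented names: the heap cell is a plain pointer, and the pointed-to object starts with a header word whose low bits are reserved for the GC and whose remaining bits encode the type.

```c
#include <stdint.h>

enum ttype { T_VAR, T_ATOM, T_INT, T_FLOAT, T_STRUCT };

typedef struct {
    uintptr_t header;   /* bits 0-1: GC marks, bits 2+: type code */
} obj;

static void set_type(obj *o, enum ttype t) { o->header = ((uintptr_t)t << 2) | (o->header & 3u); }
static enum ttype get_type(const obj *o)   { return (enum ttype)(o->header >> 2); }
static void set_gc(obj *o, unsigned bits)  { o->header = (o->header & ~(uintptr_t)3u) | (bits & 3u); }
static unsigned get_gc(const obj *o)       { return (unsigned)(o->header & 3u); }

/* Setting the type must not disturb the GC bits, and vice versa. */
static int header_demo(void) {
    obj o = {0};
    set_gc(&o, 2);
    set_type(&o, T_FLOAT);
    return get_type(&o) == T_FLOAT && get_gc(&o) == 2;
}
```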

There is no reason to be worried about integers not being large enough,
since you want bignums anyway. IF, SWI, YAP, B, SICStus, Ciao all have
them. And for floats you will use 64-bit doubles.

>'blindly' cast things in C. Prolog knows the type at runtime as evidenced by the
>runtime type checks, so it must be necessary to have different tag types for
> int and float.
>
>add(X, Y, R) :- R is X + Y.
>|: % user://1 compiled 0.00 sec, 824 bytes
>true.
>
>?- add(2,3,R).
>R = 5.
>
>?- add(2.5,5.6,R).
>R = 8.1.
>
>?- add(2.5,5,R).
>R = 7.5.
>
>Perhaps the compiler constructed 3 versions of the add predicate, depending
> on the types of the arguments that I called it with? Or perhaps it just
> selected the right kind of '+' operation at runtime, depending on the type
>tags.

That is done at runtime.
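That runtime selection can be sketched as a dispatch on the operand tags (a hypothetical boxed-number representation, not any particular system's):

```c
#include <stdint.h>

enum ntag { TAG_INT, TAG_FLT };

typedef struct {
    enum ntag tag;
    union { int64_t i; double f; } u;
} num;

/* is/2 inspects both tags: int+int stays an int, anything involving
 * a float promotes to float -- chosen at run time, as stated above. */
static num num_add(num a, num b) {
    num r;
    if (a.tag == TAG_INT && b.tag == TAG_INT) {
        r.tag = TAG_INT; r.u.i = a.u.i + b.u.i;
    } else {
        double x = (a.tag == TAG_INT) ? (double)a.u.i : a.u.f;
        double y = (b.tag == TAG_INT) ? (double)b.u.i : b.u.f;
        r.tag = TAG_FLT; r.u.f = x + y;
    }
    return r;
}
```

This mirrors the 2+3 and 2.5+5 queries in the transcript: one predicate, with the '+' variant picked per call.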

rupert...@googlemail.com

Feb 12, 2015, 9:53:51 AM
On Wednesday, February 11, 2015 at 2:42:20 PM UTC, Ulrich Neumerkel wrote:
> >I'll go for 64-bit and use 000 for the REF tag. The LIS tag for lists holds
> > the address of the first element of the pair that make up the list element
> > object, so I can't use 000 for all address types. Masking a register with
> >AND 000 won't add a noticeable overhead on todays CPUs.
> >
> >That could give me up to 7 address types, and 111 means a non-address type,
> > look in some of the higher up bits to find out what the actual type is.
>
> Try tag-on-data. That is, have a **single** pointer type, use the
> remaining 1 or 2 bits for GC, and encode all other information
> in the other bits. It makes for a much cleaner representation.

Hi. Can you explain how to arrive at a single pointer type? The problem is that I have two, REF and LIS, and I need to know which is which. Thanks.

Jan Burse

Feb 12, 2015, 10:29:46 AM
Hi,

I guess I will not contribute to your answer. Instead
I will challenge your question, regarding its underlying
assumption.

What if we are no longer in the year 1970, but in the year 2010? What if memory is cheap?
How about placing the tag in the record the pointer is pointing to, instead of in the pointer itself?

Taking the same idea one step further: what about not
placing a tag there, but instead starting the record
directly with another pointer that points into
a function table? This is actually OO.
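In C, this OO layout might look like the following sketch (the operation set and all names are invented for illustration):

```c
#include <stddef.h>

struct term;

/* Per-type function table: the moral equivalent of a vtable. */
struct term_ops {
    const char *name;                       /* type name, for debugging */
    int (*is_atomic)(const struct term *);  /* one "virtual method" */
};

/* Every record starts with a pointer into a function table;
 * that "class pointer" replaces the tag entirely. */
struct term {
    const struct term_ops *ops;
};

static int atom_is_atomic(const struct term *t)  { (void)t; return 1; }
static int struc_is_atomic(const struct term *t) { (void)t; return 0; }

static const struct term_ops atom_ops  = { "atom",   atom_is_atomic };
static const struct term_ops struc_ops = { "struct", struc_is_atomic };

static const struct term an_atom = { &atom_ops };
static const struct term a_struc = { &struc_ops };
```

Type dispatch becomes `t->ops->is_atomic(t)`: no tag test anywhere, at the cost of one header word per record, which is the memory-is-cheap trade-off being proposed.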

What can be done with it, where will it break? Is
it true that it will break for atomics? Why not
preallocate some objects for a couple of atomics(*),
and go with the heap for the rest?

Bye

(*) Like the numbers -128 .. 127, which
are frequently used.

rupert...@googlemail.com schrieb:

Jan Burse

Feb 12, 2015, 11:13:40 AM
Jan Burse schrieb:
> and go with the heap for the rest?
Or the stack, or the environment, whatever you have for
the life cycle of the objects in Prolog.

Ulrich Neumerkel

Feb 12, 2015, 12:20:33 PM

Ulrich Neumerkel

Feb 12, 2015, 12:25:11 PM
Do you know a system that stores floats or lists somewhere else?

Jan Burse

Feb 12, 2015, 1:34:41 PM
ulr...@mips.complang.tuwien.ac.at (Ulrich Neumerkel) schrieb:
Jekejeke Prolog doesn't have a special stack array
or a special environment array; everything is a non-array
data structure on the heap. The only arrays I use, small
and large in number, are for compounds.

Including floats, which are the Java Double datatype,
and lists, which have no special datatype in
Jekejeke Prolog and are just compounds. See
for example here:

http://www.jekejeke.ch/idatab/doclet/prod/en/docs/05_run/15_stdy/06_bench/04_object/01_memory.html

And Jekejeke Prolog is kind of tagless. At least I
wouldn't know that Java uses tagged pointers. Just
pointers pointing to records that share some function
maps defined via the Java classes.

The thread stack is only used for some operations, for
example unification, copying, etc. But tail-recursive
algorithms are also used, which don't need much stack.

As a result when Jekejeke Prolog is started there
are no parameters for the size of trail, stack,
environment or whatever. Everything can grow
arbitrarily provided it fits in the heap, also
when multiple threads run.

Bye

Jan Burse

Feb 12, 2015, 1:42:36 PM
Jan Burse schrieb:
> The thread stack is only used for some ops, for
> example unification, copy, etc.. But then also
> tail recursive algorithms are used, which don't
> need much stack.

The thread stack --> Each thread stack

BTW: There is also no copying like in the WAM
when a predicate is invoked:

"It should be noted, that we divert from the architecture of the Warren
Abstract Machine (WAM) in two respects. The first difference can be seen
in the way goals of a clause are invoked. In the WAM machine the
skeleton display pointer pair approach is broken for goals. It is
explicitly noted in [4] that goal invocations are dynamically created by
assembling the arguments and then invoking the predicate. In our
architecture we stick to the skeleton display pointer pair approach.
Each goal in a clause is simply a different skeleton pointer which is
completed by the display pointer of the clause instantiation."

Bye

P.S.: "The second difference is now found in the way how we deal with
the cut. Jekejeke Prolog differs from other Prolog as far as we allow
naked variables and cut transparent predicates. For this purpose the
frame also contains a prune field, which points to the true frame that
will belong to a cut. As a result we can invoke cuts inside
meta-predicates with the desired result. Eventually this could also be
realized inside a WAM, but the default mode of analysis and execution is
that a cut is statically recognized inside a clause and that the cut
will only prune its own frame and not an eventually inherited frame."

http://www.jekejeke.ch/idatab/doclet/prod/en/docs/05_run/15_stdy/06_bench/04_object/02_resolution.html

Jan Burse

Feb 12, 2015, 1:54:38 PM
Hi,

Again, I guess I will not contribute to your answer. Instead
I will challenge your question one more time, regarding
its underlying assumption.

The next underlying assumption I am challenging is the
idea of having executable code in the form of a code array,
with a program counter that points into this code array.

It is well known that operational semantics can often
be expressed in so-called small-step semantics. If we
look at small-step semantics, we see that the program
counter has been replaced by substructures.

Why not also apply this idea to Prolog? Why need code,
when we can work with structures?

Bye

"Structural operational semantics (also called structured operational
semantics or small-step semantics) was introduced by Gordon Plotkin in
(Plotkin81) as a logical means to define operational semantics. The
basic idea behind SOS is to define the behavior of a program in terms of
the behavior of its parts, thus providing a structural, i.e., syntax
oriented and inductive, view on operational semantics. An SOS
specification defines the behavior of a program in terms of a (set of)
transition relation(s). SOS specifications take the form of a set of
inference rules which define the valid transitions of a composite piece
of syntax in terms of the transitions of its components."

http://en.wikipedia.org/wiki/Operational_semantics#Structural_operational_semantics

rupert...@googlemail.com schrieb:

rupert...@googlemail.com

Feb 12, 2015, 4:20:58 PM
On Thursday, February 12, 2015 at 3:29:46 PM UTC, Jan Burse wrote:
> I guess I will not contribute to your answer.
> ...
> How about place some tag in the record the pointer is
> pointing to, instead of the pointer itself.

Well, I think this does answer one question, which is how to arrive at a single pointer type.

Rupert

R Kym Horsell

Feb 12, 2015, 5:01:43 PM
You perhaps have some additional, unstated assumptions in play.

Having type information inside the object itself is the extreme
case where a pointer is just a pointer with no tag bits in it.
Hence it comes in notionally one kind.

--
["There has been no warming for 18 years":]
BBC: Do you agree that from 1995 to the present there has been no
statistically-significant global warming
Phil Jones: Yes, but only just. I also calculated the trend for the
period 1995 to 2009. This trend (0.12C per decade) is positive, but
not significant at the 95% significance level. The positive trend is
quite close to the significance level. Achieving statistical
significance in scientific terms is much more likely for longer
periods, and much less likely for shorter periods.

R Kym Horsell

Feb 12, 2015, 5:19:21 PM
R Kym Horsell <k...@kymhorsell.com> wrote:
> rupert...@googlemail.com <rupert...@googlemail.com> wrote:
>> On Thursday, February 12, 2015 at 3:29:46 PM UTC, Jan Burse wrote:
>>> I guess I will not contribute to your answer.
>>> ...
>>> How about place some tag in the record the pointer is
>>> pointing to, instead of the pointer itself.
>> Well, I think this does answer one question, which is how to arrive at a single pointer type.
>> Rupert
> You perhaps have some additional, unstated, assumptions going.
> Having type information inside the object itself is the extreme
> case where a pointer is just a pointer with no tag bits in it.
> Hence it comes in notionally one kind.
...

Perhaps to re-iterate:

If you don't want tags anywhere -- even inside the object -- you
can separate the address spaces into the different types of objects.
Determining the type is just then a range check on the addr (still
just a simple pointer).
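A minimal sketch of the range-check idea, with arbitrary made-up region bounds (a real system would reserve these regions when allocating its heaps):

```c
#include <stdint.h>

enum rtype { T_ATOM, T_LIST, T_STRUCT, T_UNKNOWN };

/* Each type lives in its own region of the address space;
 * the bounds below are purely illustrative. */
#define ATOM_BASE   0x10000000u
#define LIST_BASE   0x20000000u
#define STRUCT_BASE 0x30000000u
#define STRUCT_END  0x40000000u

/* Classifying a pointer is just comparisons -- no shifts or masks.
 * The most common type is checked first, as suggested above. */
static enum rtype classify(uintptr_t p) {
    if (p >= LIST_BASE && p < STRUCT_BASE)  return T_LIST;
    if (p >= ATOM_BASE && p < LIST_BASE)    return T_ATOM;
    if (p >= STRUCT_BASE && p < STRUCT_END) return T_STRUCT;
    return T_UNKNOWN;
}
```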

You can optimise this code a bit to check ranges in order
with the most common type (e.g. list or ref) first, or use compile
time checking to eliminate ranges that can not possibly apply at
that point in the code. (E.g. mode declarations available at compile time).

If you do some timing checks on even modern architectures you will find
shifting can be very, very slow, because a full 64-bit shifter is
a very expensive and not very utilized piece of real estate on a chip.

It's sometimes easier for the maker to have a shifter that does a few
selected shifts and rotate the quantity around a few times to extract the
desired result.

On those types of hardware a simple pointer comparison can be 2-5x or more
faster than trying to extract type bits from either the pointer and
esp the object, let alone try to use those bits to jump to some
appropriate piece of code. ("Pipeline stall").
Even if you want to use typed pointers e.g. it's sometimes
more efficient to use a cascade of conditional instructions
(i.e. some machines allow an individual instruction to be executed
iff some condition bit or comparison is "true"; conditional moves,
conditional exchanges and even conditional arithmetic is implemented
on some fairly common h/w) than a conditional branch off to some piece of
code to handle atoms, nil, variables, etc.

--
(Chumpsky initiated thread: Thanks To The IPCC, the Public Doesn't Know
Water Vapor Is Most Important Greenhouse Gas).
Water vapour is the most important GHG, so it says right in IPCC
reports. But the public doesn't read detailed IPCC reports, they
don't even read the digested summaries written for politicians. Hence
the public is easily misled about CO2 being the only important
greenhouse gas.
-- K Dobranski aka D'Oh D'Oh Dobrainsky, Feb 2015
[Putting science material in reports and report summaries is the way to
hide them and mislead the public? Convoluted!]

Jan Burse

Feb 12, 2015, 5:50:35 PM
R Kym Horsell schrieb:
> On those types of hardware a simple pointer comparison can be 2-5x or more
> faster than trying to extract type bits from either the pointer and
> esp the object, let alone try to use those bits to jump to some
> appropriate piece of code. ("Pipeline stall").

You can look at the performance of the OO approach from
the following angle: one can expect the vtables of
the records to remain in cache, since there are only a
couple of classes.(*)

So I guess calling a function from the vtable might
be relatively cheap compared to tags, whether the tags
are realized as bits or as range checks.

But the OO approach is difficult to code in practice:
the usual languages that support OO force one
to distribute the code among a couple of classes, and
such code is difficult to maintain.

So I ended up with some tagging-like code, using
instanceof checks much as I would check a pointer, so
as to have related code in one place.
If the language were a little bit more flexible, I
would group the same aspect of different classes in
one file, something in pseudo Prolog module
code like:

File unify.p:

class1:unify(Other) :- ... Self ...

class2:unify(Other) :- ... Self ...

etc..

Bye

(*)
http://stackoverflow.com/a/1292773/502187


Jan Burse

Feb 12, 2015, 6:03:18 PM
Jan Burse schrieb:
>
> You can look at the performance for the OO approach from
> the following angle. It is to expect that the vtables of
> the records remain in cache, since there are only a
> couple of classes.(*)

This paper says median overhead from 5.2% to 13.7%
compared to a direct call with known address or
some such.

http://www.cs.ucsb.edu/~urs/oocsb/papers/oopsla96.pdf

R Kym Horsell

Feb 12, 2015, 6:33:06 PM
Jan Burse <janb...@fastmail.fm> wrote:
> R Kym Horsell schrieb:
>> On those types of hardware a simple pointer comparison can be 2-5x or more
>> faster than trying to extract type bits from either the pointer and
>> esp the object, let alone try to use those bits to jump to some
>> appropriate piece of code. ("Pipeline stall").
> You can look at the performance for the OO approach from
> the following angle. It is to expect that the vtables of
> the records remain in cache, since there are only a
> couple of classes.(*)

Yes, there are various attractions for the OO approach.
Primarily -- as has been said -- clean because you never should
have to test the type of an object; just call the virt proc.

Modern chips can also do a bit better job on the low-level
execution of subroutine calls compared with conditional jumps/branches.

With a subroutine you can definitely start fetching the subroutine
code into the pipeline as soon as you see it because you *will* be going there;
but with a conditional branch "if tag==nil then goto L42" you need to start
fetching from 2 places in case the test succeeds or in case it fails.

When I last looked the major chips could do this
fetch-ahead to a depth of 3 or 4 -- i.e. fetch in paralell from
a dozen different places in the memory space or instr cache. Pretty
amazing compared with what even "supercomputers" did back in the mid-90s
when I was last in school. :)

> So I guess calling a function from the vtable might
> be relatively cheap compared to tags. Compared to tags
> realized as bits or compared to tags realized as
> range checks.
> But the OO approach is practically difficult to code,
> the usual languages that support OO will force one
> to distribute the code among a couple of classes, and
> this code is difficult to maintain.
> So I ended up with some tagging like code, by using
> instanceof checks, like I would check a pointer, so
> to have related code in one place.

Sounds like a case of needing a little special-purpose
language and a pre-processor or code generator to split
the code into appropriate subclasses.
But, sigh, it would probably end up looking like the definition
of the usual specialised WAM instructions. :)

> If the language were a little bit more flexible, I
> would group the same aspect of different classes in
> one file, something in pseudo Prolog module
> code like:
> File unify.p:
> class1:unify(Other) :- ... Self ...
> class2:unify(Other) :- ... Self ...
> etc..
> Bye
> (*)
> http://stackoverflow.com/a/1292773/502187

--
<http://en.wikipedia.org/wiki/File:2000_Year_Temperature_Comparison.png>
Anybody with a I.Q. level of 80 or higher can see that what we are
experiencing right now... is perfectly normal on this planet..
-- Cartooni aka Cartoony aka Robert Williamson, 20 Dec 2013

rupert...@googlemail.com

Feb 13, 2015, 4:21:38 AM
That made for an interesting read. I'm only interested in Intel x86 CPUs at the moment; do you know if this is accurate on the latest architecture?

I had assumed that bit twiddling, shifting, masking etc., as well as basic arithmetic, were all very fast, single-cycle operations on registers, and in many cases likely to run in parallel with other operations. I generally assume that packing bits into a word, despite requiring some code to unpack them, is going to be a win simply because it reduces memory bandwidth.

rupert...@googlemail.com

Feb 13, 2015, 4:29:44 AM
On Thursday, February 12, 2015 at 6:54:38 PM UTC, Jan Burse wrote:
> The next underlying assumption I am challenging is the
> idea of having executable code in the form a code array,
> and a program counter that points into this code array.
>
> It is already known that often operational semantics can
> be expressed in so called small step-semantics. If we
> look at small-step semantics, we see that the program
> counter has been replaced by sub structures.
>
> Why not also apply this idea to Prolog? Why need code,
> when we can work with structures?

Hi Jan, A question about this:

Using this approach do you end up with something comparable to a binary byte-code format in which code can be distributed? That is, can your structures easily be serialized into files, or are you distributing programs as source code?

Rupert

Jan Burse

Feb 13, 2015, 5:48:39 AM
rupert...@googlemail.com schrieb:
>> Why not also apply this idea to Prolog? Why need code,
>> >when we can work with structures?
> Hi Jan, A question about this:
>
> Using this approach do you end up with something
> comparable to a binary byte-code format in which
> code can be distributed? That is, can your
> structures easily be serialized into files, or
> are you distributing programs as source code?
>
> Rupert

I have the feeling there is also a misconception
at work here. One reason for converting Prolog code
into some code other than the Prolog text is, for
example, to solve the bootstrapping problem.

Take the following bootstrapping problem:
- You have a term reader written in Prolog.
- You have a consult command written in Prolog.
- But when you start up your Prolog you don't yet have
them ready, so you must load some other code,
for example byte code, to get the term reader
and the consult command, so that you can then
load other Prolog text (ASCII).

But if you have written the term reader anyway
in another language, for example Java, and the
consult command/intelligent loader also in
another language, for example Java, then there is
never a need for some other code, for example
byte code.

You can just go with this functionality which
you have in your host language, for example
Java, so that you can then load other Prolog
text (ASCII).

The remaining question is only: is it fast?
Well, you can measure yourself. There is hardly
any advantage of other code, for example byte
code, over Prolog text (ASCII). SWI-Prolog can
parse and load Prolog text (ASCII) very quickly;
I did some testing, and it is faster than
Jekejeke Prolog's text parsing.

And I consider Jekejeke Prolog's text parsing
already fast. So if you are not planning to
do the byte code bootstrap stuff, there is no
need to have some other code. Actually I was
researching the minimality of the kernel
recently. Besides having read/consult ops, you
also need to manually add some ur-predicates.

You can then also quasi-bootstrap a great
part of the interpreter; for example file name
resolution etc. need not be put into the
initial code. So you can have a very large portion
of the interpreter itself in Prolog text
(ASCII).

Bye

P.S.: Replace (ASCII) by (Unicode).

Jan Burse

unread,
Feb 13, 2015, 5:52:22 AM2/13/15
to
Jan Burse schrieb:
> And I consider Jekejeke Prolog text parsing
> already fast. So if you are not planning to
> do the byte code bootstrap stuff, there is no
> need to have some other code. Actually I was
> researching into the minimality of the kernel
> recently. Besides have read/consult ops, you
> need also to manually add some ur-predicates.

Most of the time the parser is anyway faster
than the hard disk. What you probably measure
is only the hard disk. But I have to check again;
probably there are also papers around showing
that issue.

What can slow down parsing is some term
expansion, especially if it applies some
algorithms. For example my forward chaining
component does a rule conversion using a
naive algorithm, which is still a little slow.

Bye

Jan Burse

unread,
Feb 13, 2015, 6:09:04 AM2/13/15
to
Jan Burse schrieb:
>
> You can then quasi bootstrap also a great
> part of the interpreter, for example file name
> resolution etc.., need not be put into the
> initial code. So you can have very large portion
> of the interpreter itself also in Prolog text
> (ASCII).

Well, the above is of course the poor man's
solution. Some advanced systems would have some
code, and loading is probably done by just
memory-mapping some disk image.

But I haven't yet seen a situation where I would
wish for that. There are situations where the startup
time of a Prolog interpreter matters, for example
if you drive a web server CGI style.

But you can also drive a web server thread style:
have your interpreter loaded, and serve pages
by threads.

A further option would be to provide something
like class sharing in Java/Android: a zygote(*)
that provides a new process, but where there is
some sharing of a kernel.

Bye

(*) Copy On Write (COW).
http://anatomyofandroid.com/2013/10/15/zygote/

http://www.google.com.ar/patents/US8261263

rupert...@googlemail.com

unread,
Feb 13, 2015, 6:13:43 AM2/13/15
to
On Friday, February 13, 2015 at 10:48:39 AM UTC, Jan Burse wrote:
> rupert...@googlemail.com schrieb:
> >> Why not also apply this idea to Prolog? Why need code,
> >> when we can work with structures?
> > Hi Jan, A question about this:
> >
> > Using this approach do you end up with something
> > comparable to a binary byte-code format in which
> > code can be distributed? That is, can your
> > structures easily be serialized into files, or
> > are you distributing programs as source code?
>
> I have the feeling here is also a misconception
> at work.

I was thinking of another reason besides bootstrapping and 'is it fast'.

Prolog is a comparatively obscure language, and a byte-code-compiled program written in it even more so. Distributing programs in binary form will provide some protection of my intellectual property through obscurity. I was also considering techniques like encrypting the byte code and weaving the decryption into a special version of the byte code machine, and other techniques that could contribute to protecting the byte code itself. Of course, none of this guarantees not being hacked, but it will provide a significant deterrent.

The product I am thinking of creating will have an AI core written in Prolog, which will be its real value. Other parts will likely be in Java and a web application stack, and will be much easier to reverse engineer. Without the AI core, though, there will not be much value in them; they will be better protected by being open sourced and GPL'ed.

Another reason for byte code, which is perhaps more historical, is that it provides a convenient stepping stone to compiling down to native code. I wrote an LLVM back-end for an earlier version of one of my byte code machines, as a proof of concept, and taking the naive approach of directly implementing the same machine architecture and emitting LLVM code to manipulate it in exactly the same way the interpreter works turned out to be fairly easy to do. This could be considered a straight mapping of the byte code machine down to native; I think a lot could be gained in performance terms by considering a less straight mapping and what optimizations could be applied at that stage.

No reason at all, though, why LLVM code could not be output that directly maps your more OO approach. A non-WAM machine architecture might make a far better starting point for the LLVM stage.

You'll be getting native compilation through the JVM anyway, and it's a well-optimized machine, so I think your approach is a very good one.

Rupert

Jan Burse

unread,
Feb 14, 2015, 3:03:26 PM2/14/15
to
rupert...@googlemail.com schrieb:
> You'll be getting native compilation through the JVM anyway,
> and its a well optimized machine, so I think your approach
> is a very good one.

Thanks for the roses. But I don't generate any byte code
for the JVM. In the small-step semantics implementation
there is no program counter and no code whatsoever, as
I already said. It's just structures that are followed somehow.

Take for example the following is/2 goal:

?- X is 1+2.

In the code approach this might be compiled
as follows, without any constant-evaluation
optimization:

LDI 1 /* pushes 1 on the stack */
LDI 2 /* pushes 2 on the stack */
ADD /* pop two elements, add them, push the result */
UNIFY X /* unify the result with X */

But in the small-step semantics, well, you would
have the following tree somewhere in memory:

+
/ \
1 2

And a single evaluator algorithm, evaluating this tree.
So the small-step semantics is an interpreter approach,
not a compiler approach.
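The interpreter approach described above can be sketched in a few lines of Java. The class names here are purely illustrative, not Jekejeke's actual classes: the point is only that the term is an object tree and a single evaluator algorithm walks it directly, with no byte code and no program counter involved.

```java
// Illustrative sketch only (not Jekejeke's code): an arithmetic term
// is either an integer leaf or a compound node, and one evaluator
// walks the structure directly instead of executing byte code.
abstract class Term {
    abstract long eval();
}

class Int extends Term {
    final long value;
    Int(long value) { this.value = value; }
    long eval() { return value; }
}

class Add extends Term {
    final Term left, right;
    Add(Term left, Term right) { this.left = left; this.right = right; }
    long eval() { return left.eval() + right.eval(); }
}

public class TreeEval {
    public static void main(String[] args) {
        // The tree for "X is 1+2" from the post:
        //       +
        //      / \
        //     1   2
        Term goal = new Add(new Int(1), new Int(2));
        System.out.println(goal.eval()); // prints 3
    }
}
```

The four byte code instructions above (LDI, LDI, ADD, UNIFY) collapse into one `eval()` call on the structure; binding the result to X would happen in the caller.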

For evaluating arithmetic expressions the small-step
semantics is not so interesting; one could go with
the big-step semantics.

But for Prolog search it is interesting, since the
small steps are iterated, and this leads naturally
to a trampoline realization of the interpreter,
namely not using the procedure stack itself for the
search algorithm.

But the JVM helps in implementing these algorithms. I
actually see a massive improvement in performance
between cold run and warm run, when the JIT has
kicked in.

But using the JVM or any other host language for the
interpreter algorithms is not all glitter. There are
a lot of problems which might be simpler in other
VMs, for example the Erlang VM.

What bothers me most is the Foreign Function
Interface (FFI). A paper that illustrates the
problems is found here:

Unipycation: A Case Study in Cross-Language Tracing
Edd Barrett, Carl Friedrich Bolz, Laurence Tratt, 2013
http://eddbarrett.co.uk/pubs/pdf/barrett2013unipycation.pdf

Bye

rupert...@googlemail.com

unread,
Feb 17, 2015, 2:54:52 AM2/17/15
to
On Saturday, February 14, 2015 at 8:03:26 PM UTC, Jan Burse wrote:
> So the small-step semantics is an interpreter approach,
> not a compiler approach.

So is it fast? ;-)

> For evaluating arithmetic expressions the small-step
> semantics is not so interesting, one could go with
> the big-step semantics.
>
> But for Prolog search it is interesting, since the
> small-steps are interated, and this leads naturally
> to a trampolin realization of the interpreter. Namely
> not using the procedure stack itself for the search
> algorithm.

I see. A stack is well suited to depth-first search. Does moving away from using a stack make it easier to implement other search algorithms? Constraint-guided searches, heuristic searches, searches guided by probabilities and so on?

Rupert

Jan Burse

unread,
Feb 17, 2015, 3:36:41 AM2/17/15
to
rupert...@googlemail.com schrieb:
> I see. A stack is well suited to depth first search.
> Does moving away from using a stack make it easier
> to implement other search algorithms? Constraint
> guided searches, heuristic searches or searches
> guided by probabilities and so on?

There will still be a stack. But in the trampoline
approach one isn't forced to use the native
stack of the host language.

In the case of JVM each thread gets a preallocated
fixed size stack, and if you reach the limit all
you get is a stack-overflow error.(*)

So using a trampoline has the following advantages:

- One is not bound to the stack size limit of the
threads; respectively, each thread's stack can be
kept small.

- Stack-based optimizations such as last call
optimization resp. tail call optimization
can be implemented.

In my heap-based stack realization, last
call optimization is just some pointer
twiddling. Without last call optimization
Prolog doesn't run well.
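The trampoline idea can be sketched as follows. This is a minimal illustration with made-up names, not Jekejeke's API: each step returns either a final value or the next step to run, and a flat driver loop follows the chain, so the JVM thread stack stays at constant depth and a "last call" is just reassigning the current step pointer.

```java
// Hedged sketch of a trampoline: deep computations run in a loop
// over heap-allocated steps instead of recursing on the JVM stack.
import java.util.function.Supplier;

final class Trampoline {
    // A step either yields a final value or defers to the next step.
    interface Step<T> {
        boolean done();
        T value();
        Step<T> next();
    }

    static <T> Step<T> finish(T v) {
        return new Step<T>() {
            public boolean done() { return true; }
            public T value() { return v; }
            public Step<T> next() { throw new IllegalStateException(); }
        };
    }

    static <T> Step<T> more(Supplier<Step<T>> s) {
        return new Step<T>() {
            public boolean done() { return false; }
            public T value() { throw new IllegalStateException(); }
            public Step<T> next() { return s.get(); }
        };
    }

    // The driver loop: constant JVM stack depth, however long the chain.
    static <T> T run(Step<T> step) {
        while (!step.done())
            step = step.next();   // "last call" = pointer reassignment
        return step.value();
    }

    // Written as direct recursion this would overflow the JVM stack
    // for large n; as a trampoline it runs in constant stack space.
    static Step<Long> count(long n, long acc) {
        if (n == 0) return finish(acc);
        return more(() -> count(n - 1, acc + 1));
    }

    public static void main(String[] args) {
        System.out.println(run(count(1_000_000, 0))); // prints 1000000
    }
}
```

The trade-off is one small heap allocation per step, which the JVM's generational collector handles cheaply; in exchange, the search machinery is no longer bound by the thread's fixed `-Xss` stack size.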

Bye


(*)
In most JVMs it is like that. I don't know whether
there are currently JVMs that allow their
stacks to grow.

rupert...@googlemail.com

unread,
Feb 17, 2015, 7:04:08 AM2/17/15
to
On Tuesday, February 17, 2015 at 8:36:41 AM UTC, Jan Burse wrote:
> In the case of JVM each thread gets a preallocated
> fixed size stack, and if you reach the limit all
> you get is a stack-overflow error.(*)
>
> So using a trampolin has the following advantages:
>
> - One is not bound to the size limit of the
> threads, respectively each threads can be
> kept small.
>
> - Stack based optimizations such as last call
> optimizations resp. tail call optimization
> can be implemented.

I have a distant memory that tail call optimization was in the original JVM specification. At least I remember a fellow student 15 years ago trying to compile a functional language onto the JVM and use its tail call feature, and asking me why it didn't work; it was in the spec but never implemented.

At any rate, they seem to have implemented it in version 8; presumably lambdas weren't going to be much fun without recursion...

http://www.drdobbs.com/jvm/tail-call-optimization-and-java/240167044

rupert...@googlemail.com

unread,
Feb 17, 2015, 7:06:34 AM2/17/15
to
On Tuesday, February 17, 2015 at 8:36:41 AM UTC, Jan Burse wrote:
> rupert...@googlemail.com schrieb:
> > I see. A stack is well suited to depth first search.
> > Does moving away from using a stack make it easier
> > to implement other search algorithms? Constraint
> > guided searches, heuristic searches or searches
> > guided by probabilities and so on?
>
> There still will be a stack. But in the trampolin
> approach one isn't forced to use the native
> stack of the host language.

Cool. What I was really getting at is: does the small-step semantics approach help at all with implementing non-depth-first search orderings, as compared with the WAM?

Jan Burse

unread,
Feb 17, 2015, 11:31:28 AM2/17/15
to
rupert...@googlemail.com schrieb:
>> There still will be a stack. But in the trampolin
>> approach one isn't forced to use the native
>> stack of the host language.
>
> Cool. What I was really getting at, is does the
> small-step semantics approach help at all with
> implementing non-depth first search orderings as
> compared with WAM?

I was using the term "small-step semantics" more
as a moniker for interpreters that work with
structures, instead of compilers that work with
some code.

Concerning non-Prolog search strategies, I tend
to implement them in Prolog itself, instead of in
a host language.

But there is a vast literature about SAT
solvers, SMT solvers and ATPs implemented in a
host language. I don't know how many of them
qualify as "small-step". Probably much more so
than a Prolog interpreter.

I consider the compilation idea quite specific
to Prolog; it started with some publications
by Warren, and there was also the idea of having
an alternative to LISP.

Bye

