
Message trees can be fast


Mike Austin
Nov 7, 2009, 3:14:03 PM
It's been said that interpreting a message tree will likely be much slower than
interpreting bytecode. While developing Impulse, I wanted to see how far I
could push message trees, and recent optimizations have proven fruitful. In
this microbenchmark (I know, I know), it takes under 5 seconds to iterate
3,600,000 times while performing a simple coordinate rotation:

rotations = 360 * 10000

x0 = 10
y0 = 0

(1..rotations) each: a ->
  a = a * (PI / 180)
  x1 = (x0 * a cos) - (y0 * a sin)
  y1 = (x0 * a sin) + (y0 * a cos)
end

/impulse-0.5.3/bench$ time impulse rotate.im
real 0m4.893s
user 0m4.832s
sys 0m0.012s

So I'm happy :) This is on par with Ruby 1.9 and faster than Python 3, though
there are some optimizations which I'll have to evaluate in a larger-scale
program. Some of the optimizations include minimal dynamic allocation,
specialized math messages, and inlining all core functions. Also, message
chains, such as "a sin abs", are arrays instead of trees.

The only thing left I can optimize is the lookup, which currently uses a
std::map. Are there any other near-perfect hash algorithms besides cuckoo
hashing? I'd like to try the CMPH library but it's LGPL'd and I use an MIT
license.

Regards,
Mike

BGB / cr88192
Nov 7, 2009, 3:59:01 PM

"Mike Austin" <mi...@mike-nospam-austin.com> wrote in message
news:D6qdnQ3NHuiRSGjX...@giganews.com...

> It's been said that interpreting a message tree will likely be much slower
> than interpreting bytecode. While developing Impulse, I wanted to see how
> far I could push message trees, and recent optimization have proven
> fruitful. In this microbenchmark (I know, I know), it takes under 5
> seconds to iterate 3,600,000 times while performing a simple coordinate
> rotation:
>

don't know what a message tree is, can't find info...

but, recently, I have been having fairly good success with a hash table,
structs, and function pointers...


> rotations = 360 * 10000
>
> x0 = 10
> y0 = 0
>
> (1..rotations) each: a ->
> a = a * (PI / 180)
> x1 = (x0 * a cos) - (y0 * a sin)
> y1 = (x0 * a sin) + (y0 * a cos)
> end
>
> /impulse-0.5.3/bench$ time impulse rotate.im
> real 0m4.893s
> user 0m4.832s
> sys 0m0.012s
>
> So I'm happy :) This is on par with Ruby 1.9 and faster than Python 3,
> granted there are some optimization which I'll have to evaluate in a
> larger scale program. Some of the optimizations include minimal dynamic
> allocation, specialized math messages and inlining all core functions.
> Also, message chains, such as "a sin abs", are arrays instead of trees.
>

yep.


> The only thing left I can optimize is the lookup, which currently uses a
> std::map. Are there any other near-perfect hash algorithms besides cuckoo
> hashing? I'd like to try the CMPH library but it's LGPL'd and I use an
> MIT license.
>

LGPL should not be problematic for MIT, since LGPL does not have the exact
same properties as GPL (it will not try to force itself on the rest of the
codebase, ...).

although, granted, you can't freely copy code back and forth either, since
the LGPL would remain in effect on the copied code.


well, hashing is not too difficult to implement, although many of the
well-known variations (such as those on Wikipedia) are IMO not very good...

my usual algo for string hashing is:
hi=0;
while(*s)hi=(hi*4093)+(*s++);

or for integers:
hi=val*4093;

which can be converted to the index for a hash table via:
i=(hi>>12)&4095; //for a 12-bit hash.

I also typically step to subsequent indices via a pseudo-random sequence:
hi=hi*4093;

IME, this usually does fairly well (and it is often both faster and better
behaved than use of shifts and xor to produce the hash...).


a slight variation of this is to use a mersenne prime, but I have noted that
mersenne primes have different hashing behavior than most other primes, so
one has to adjust the algo in response (using a mersenne prime as presented
above will give poor results).

...


> Regards,
> Mike


Rod Pemberton
Nov 8, 2009, 6:16:03 AM
"BGB / cr88192" <cr8...@hotmail.com> wrote in message
news:hd4n2l$3rv$1...@news.albasani.net...

>
> my usual algo for string hashing is:
> hi=0;
> while(*s)hi=(hi*4093)+(*s++);
>

BGB, you've mentioned hashing quite a bit lately. And, you've posted quite
a few algos... You either have a lot of them, or you keep changing them in
an attempt to improve them.

My brute-force tests of nearly 50 hashing algorithms and variations from
various sources on the Internet showed that there are two good algorithms
publicly available:

Austin Appleby's MurmurHash2
Bob Jenkins' hashlittle()

IIRC, Jenkins' hashlittle() is from lookup3.c. His hash from lookup2 performs
very similarly to it in terms of collisions.

The Jenkins' hash generates 10% or so more collisions than Appleby's.
However, both are very low compared to everything else I tested. For the
most part, I didn't check to see which hashes failed the various Bob
Jenkins' hash tests. (Most fail.) Appleby's MurmurHash2 passes them.
IIRC, the speed of one of them (Jenkins ?) was very sensitive to different
optimization levels with the GCC compiler. E.g., why is my code with this
hash fast with -O but slow with -O2?

Another thing to note is Paul Hsieh's highly respected hashes didn't perform
well. You should also note that quite a few respected mathematicians in the
list have come up with absolutely horrid hashes. But, one should remember
that many of these were created decades ago and weren't intended for the
data we're throwing at them today.

By programmatic brute force, I've tested every 4-byte ASCII sequence and a
large percentage (I'd guess 60%...) of the 5-byte ASCII sequences (up to a
2GB file - the DOS limit). I was planning to rewrite the test programs so I
could test the entire 5-byte ASCII sequence range, but I didn't get around
to it, and I concluded that what I had was probably good enough. Sorry, I
didn't have need of hashes of binary sequences, the preferred test.
However, I believe the results on the ASCII sequences, since they are
consecutive binary values over the text range, are representative of the
hashes' performance in regards to collisions.

All the other hashes fell into two categories: too slow (roughly 33%) to be
useful, too many collisions (remainder). I've speed/collision tested the
following. Sorry, these may have my naming for them instead of their actual
names... Remember, the list includes the good hashes above, but the
rest are "bad". The list has been sorted alphabetically.

additive hash
Adler checksum
Arash Partow hash
Austin Appleby's MurmurHash2
Austin Appleby's MurmurHash2 aligned
Bob Jenkins' hash
Bob Jenkins' hashlittle
Bob Jenkins' oneAtATimeHash
Bob Jenkins' oneAtATimeHashPH
BP hash (unknown)
Bruno Preiss hash
CRC16
CRC32
CRC32PH (parallel hash)
DJ Bernstein
DJ Bernstein 5381 sum
DJ Bernstein 5381 xor
DJ Bernstein >>16 variation
DJ Bernstein modified
DJ Bernstein original
Donald Knuth hash
elf hash
Fletcher checksum
FNV (Fowler,Noll,Vo) Hash
GAWK hash
hash 65599
Julienne Walker hash
Justin Sobel hash
K&R >>16 variation
K&R hash
Modified FNV (Fowler,Noll,Vo)
Ozan Yigit's SDBM hash
Paul Hsieh SFHAsm (Super Fast Hash in asm)
Paul Hsieh SuperFastHash
Peter Kankowski x17
Peter Kankowski x17 unrolled
Peter Weinberger RD hash
Peter Weinburger hash
Ramakrishna hash
Robert Sedgwicks hash
rot hash
shift-add-xor
TCL HashString
xor hash

> although, granted, you can't freely copy code back and forth either, since
> the LGPL would remain in effect on the copied code.
>

Unh...

AIUI, under US law, a copyright applies to the *entire* work as a whole,
not little sub-pieces. I.e., if you take a small piece from here and insert
it over there in a larger work, the copyright of the larger work applies...
This can lead to obvious problems, since a larger sub-piece, if unique
enough, can also constitute an entire work under the law. Interpretation is
almost entirely up to the judge...

One of the irritants of the GNU FAQs on the GPL is that they say Public
Domain code (i.e., no copyright) can be inserted into copyrighted code.
But, under US and international law, such a work was copyrighted *once*
automatically and therefore it can't be copyrighted again. Inserting it
into GPL'd code - where the copyright applies to the *entire* work -
attempts to copyright the Public Domain code... Of course, this only applies
if that piece of PD code is unique and large enough to be considered an
entire work. But, I've seen quite a few situations in GPL where they copied
the entire PD work into GPL'd code. Although I've conversed with an IP
attorney on this issue, I'm not an IP attorney, so... My recommendation is
to treat every license (w/copyrighted code) and Public Domain code as
unmixable entities. I.e., don't mix PD and GPL. Don't mix BSD and LGPL.
etc. If you want or intend to do so, I'd check with the OSI (Open Source
Initiative) first, for their legal recommendations, not FSF's. Some of
FSF's "interpretations" seem somewhat dubious.


Rod Pemberton


BGB / cr88192
Nov 8, 2009, 9:25:50 AM

"Rod Pemberton" <do_no...@nohavenot.cmm> wrote in message
news:hd69d5$umo$1...@aioe.org...

> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:hd4n2l$3rv$1...@news.albasani.net...
>>
>> my usual algo for string hashing is:
>> hi=0;
>> while(*s)hi=(hi*4093)+(*s++);
>>
>
> BGB, you've mentioned hashing quite a bit lately. And, you've posted
> quite
> a few algo's... You either have alot of them or you keep changing in an
> attempt them to improve them.
>

I have a collection of them I use for different tasks, where each one does
slightly different things.
I have been using them for around 8 years now, so I have collected a few
variants.


the above sequence is mostly for calculating a hash value for strings, and
is best suited for hash tables in the 4096-1M range.

another variation is to hash integer data.
and, sometimes, I combine pre-existing hashes (this used in my object
system).


similarly, the tables themselves can be used in different ways:

as a single-stop cache (I call these "caching hashes", since they only
cache things rather than working as a means of "permanent" storage, and in
general one can NULL all the entries in a caching hash with little or no
ill effect);

as a collection of linked lists, which is usually a variant of a "storage
hash";

or with chains inside the hash, which is where I often use pseudo-random
stepping.
...


> My brute force tests of near 50 hashing algorithms and variations from
> various sources on the Internet was that there are two good algorithms
> publicy available:
>
> Austin Appleby's MurmurHash2
> Bob Jenkins' hashlittle()
>
> IIRC, Jenkin's hashlittle is from lookup3.c. His hash from lookup2
> performs
> very similarly to it in terms of collisions.
>

yes, ok.

I have not looked much at others' algos, as I have my own variants and
experiences as to which works best in which sort of task.


> The Jenkins' hash generates 10% or so more collisions than Appleby's.
> However, both are very low compared to everything else I tested. For the
> most part, I didn't check to see which hashes failed the various Bob
> Jenkins' hash tests. (Most fail.) Appleby's MurmurHash2 passes them.
> IIRC, the speed of one of them (Jenkins ?) was very sensitive to different
> optimization levels with the GCC compiler. E.g., why is my code with this
> hash fast with -O but slow with -O2?
>
> Another thing to note is Paul Hsieh's highly respected hashes didn't
> perform
> well. You should also note that quite a few respected mathematicians in
> the
> list have come up with absolutely horrid hashes. But, one should remember
> that many of these were created decades ago and weren't intended for the
> data we're throwing at them today.
>

yep.

I think most of my hash functions are derived from one of Donald Knuth's
PRNGs (the original idea was derived from the one found in Newlib).

experimentally, I found it went faster and had better collision behavior
than the more common variants I had used prior, which usually involved
shifts and xors.

for example:
hi=0; while(*s)hi=(hi<<3)^(hi>>7)^(*s++);
and similar...

which had gotten 'pwned' by my discovery of the approach I have been using
since.


> By programatic brute force, I've tested every 4-byte ASCII sequence and a
> large percentage (I'd guess 60%...) or so of the 5-byte ASCII sequences
> (upto 2GB file - DOS limit). I was planning to rewrite the test programs
> so
> I could test the entire 5-byte ASCII sequence range, but I didn't get
> around
> to it. And, I concluded that what I had was probably good enough. Sorry,
> I
> didn't have need of hashes of binary sequences, the preferred test.
> However, I believe the results on the ASCII sequences, since they are
> consecutive binary values over the text range, are representative of the
> hash's performance in regards to collisions.
>

yeah...

mine I had refined before through the use of random collections of longer
strings, and sometimes random data. others had used random chunks from
strings which were "sliced and diced" for testing.
...

ok.

PD is "free for whatever use", FWIW, and using PD code in a non-PD project
does not effectively change its status for the pre-existing code. as for
PD-code embedded in a derived work, this is less certain (say, if one wants
to copy it back out as PD). they may have to instead resort to the
originals.


there are ways to do it, but the key is not to "mix" the code, as in, place
code with one license inside source files with another license.

so, in general, it is my practice to divide everything up into independent
modules, and then let each module have an overall license...


then the issue is, what of the final linked result?...
well, with GPL, the whole thing goes under GPL (this is the "sticky"
property so many dislike of the GPL).
with LGPL, it does not.

with BSD-style licenses, or PD, the combined result can be distributed
under "whatever" license, but this does not generally change the code
itself.

or such...

Mike Austin
Nov 8, 2009, 2:20:46 PM

BGB / cr88192 wrote:
> "Mike Austin" <mi...@mike-nospam-austin.com> wrote in message
> news:D6qdnQ3NHuiRSGjX...@giganews.com...
>> It's been said that interpreting a message tree will likely be much slower
>> than interpreting bytecode. While developing Impulse, I wanted to see how
>> far I could push message trees, and recent optimization have proven
>> fruitful. In this microbenchmark (I know, I know), it takes under 5
>> seconds to iterate 3,600,000 times while performing a simple coordinate
>> rotation:
>>
>
> don't know what a message tree is, can't find info...

A message tree is just a parse tree for a message-based interpreter. For
example "2.pow(n).sin()":

{ 2L, new Message("pow", new Message("n")), new Message("sin") };

So would it be ok to include the CMPH library in my distribution?

> although, granted, you can't freely copy code back and forth either, since
> the LGPL would remain in effect on the copied code.
>
>
> well, hashing is not too difficult to implement, although many of the
> well-known variations (such as those on Wikipedia) are IMO not very good...
>
> my usual algo for string hashing is:
> hi=0;
> while(*s)hi=(hi*4093)+(*s++);
>
> or for integers:
> hi=val*4093;
>
> which can be converted to the index for a hash table via:
> i=(hi>>12)&4095; //for a 12-bit hash.
>
> I also typically step to subsequent indices via a pseudo-random sequence:
> hi=hi*4093;
>
> IME, this usually does fairly well (and it is often both faster and better
> behaved than use of shifts and xor to produce the hash...).

Thanks for the tips. The items in my table are pointers, averaging roughly
10 to 20 entries per table. I've heard it's better to use randomly
generated integers instead, so I could switch to that if it's faster.

Dmitry A. Kazakov
Nov 8, 2009, 2:37:22 PM

On Sun, 08 Nov 2009 11:20:46 -0800, Mike Austin wrote:

> BGB / cr88192 wrote:
>> "Mike Austin" <mi...@mike-nospam-austin.com> wrote in message
>> news:D6qdnQ3NHuiRSGjX...@giganews.com...
>>> It's been said that interpreting a message tree will likely be much slower
>>> than interpreting bytecode. While developing Impulse, I wanted to see how
>>> far I could push message trees, and recent optimization have proven
>>> fruitful. In this microbenchmark (I know, I know), it takes under 5
>>> seconds to iterate 3,600,000 times while performing a simple coordinate
>>> rotation:
>>
>> don't know what a message tree is, can't find info...
>
> An message tree is just a parse tree for a message based interpreter. For
> example "2.pow(n).sin()":
>
> { 2L, new Message("pow", new Message("n")), new Message("sin") };

So, why should there be a parse tree and not reverse Polish? (I hope
that "new" does not mean heap allocation?)

A tree might be very useful for semantic analysis and optimization, but not
as an interpretable code.

If it is an interpreter, then it is not clear why you would generate a parse
tree instead of executing directly on-the-fly; if it is a code generator,
then clearly a tree is a poor code representation format.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Mike Austin
Nov 8, 2009, 3:19:41 PM

Rod Pemberton wrote:
> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:hd4n2l$3rv$1...@news.albasani.net...
>> my usual algo for string hashing is:
>> hi=0;
>> while(*s)hi=(hi*4093)+(*s++);
>>
>
> BGB, you've mentioned hashing quite a bit lately. And, you've posted quite
> a few algo's... You either have alot of them or you keep changing in an
> attempt them to improve them.
>
> My brute force tests of near 50 hashing algorithms and variations from
> various sources on the Internet was that there are two good algorithms
> publicy available:
>
> Austin Appleby's MurmurHash2
> Bob Jenkins' hashlittle()

Wow, MurmurHash2 is amazingly simple. I'm not an expert in hashing algorithms;
do you have any recommendations for using it to map pointers to array indexes?

std::map<Symbol*, Value> _slots;

where Symbol* is 1,000 to 5,000 pointers mapped to an average of 30 Values per
object.

Wow, TCL hashing is at the bottom? Isn't the entire language reliant on string
hashing? :)

BGB / cr88192
Nov 8, 2009, 4:20:15 PM

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1l0imnbrgtpt1.1qcd2ne20nyyj$.dlg@40tude.net...

> On Sun, 08 Nov 2009 11:20:46 -0800, Mike Austin wrote:
>
>> BGB / cr88192 wrote:
>>> "Mike Austin" <mi...@mike-nospam-austin.com> wrote in message
>>> news:D6qdnQ3NHuiRSGjX...@giganews.com...
>>>> It's been said that interpreting a message tree will likely be much
>>>> slower
>>>> than interpreting bytecode. While developing Impulse, I wanted to see
>>>> how
>>>> far I could push message trees, and recent optimization have proven
>>>> fruitful. In this microbenchmark (I know, I know), it takes under 5
>>>> seconds to iterate 3,600,000 times while performing a simple coordinate
>>>> rotation:
>>>
>>> don't know what a message tree is, can't find info...
>>
>> An message tree is just a parse tree for a message based interpreter.
>> For
>> example "2.pow(n).sin()":
>>
>> { 2L, new Message("pow", new Message("n")), new Message("sin") };
>
> So, why there should be a parse tree and not the Polish reverse? (I hope
> that "new" does not mean heap allocation?)
>

hmm...
well, I use RPN, and it works well enough.
it is fairly easy to work with and produce, but has the downside of being
very ordering-dependent, which has caused its fair share of trouble for me
in my C compiler's codegen.

it also doesn't help much that normal functions use right to left ordering,
but many builtins use left to right, and VM's such as the JVM and .NET
default to left to right (and, there is no obvious way to automatically flip
the argument ordering with RPN).

all this has led to much hackery...

I have at times on/off considered redesigning the RPN-based IL as a much
"cleaner" RPN-based language (as in, much closer to PostScript, and
operating at a higher level of abstraction), and at other times, I have
considered abandoning it altogether. thus far, neither has happened...


> A tree might be very useful for semantic analysis and optimization, but
> not
> as an interpretable code.
>

I guess this much depends on other factors, but in general I would be
inclined to agree.
internally, a lot of my stuff uses XML, so directly using the parse trees
for an interpreter would probably be a no-go (unless speed is not really
much of a priority).


> If it is an interpreter, then it is not clear why to generate a parse tree
> instead of a direct execution on-the-fly, when it is a code generator,
> then
> clearly, tree is a poor code representation format.
>

errm, direct textual interpretation I would think would have its own
problems, such as not being able to deal effectively with looping, ... OTOH,
direct parse trees would be a little faster, could deal more readily with a
lot more constructs, ... but, as a down side, are still not all that fast...

Mike Austin
Nov 8, 2009, 5:33:31 PM
Your message didn't seem to make it to my newsgroup provider, so I'm
using Google groups to reply.

On Nov 8, 11:37 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:


> On Sun, 08 Nov 2009 11:20:46 -0800, Mike Austin wrote:
> > BGB / cr88192 wrote:

> >> "Mike Austin" <m...@mike-nospam-austin.com> wrote in message


> >>news:D6qdnQ3NHuiRSGjX...@giganews.com...
> >>> It's been said that interpreting a message tree will likely be much slower
> >>> than interpreting bytecode.  While developing Impulse, I wanted to see how
> >>> far I could push message trees, and recent optimization have proven
> >>> fruitful.  In this microbenchmark (I know, I know), it takes under 5
> >>> seconds to iterate 3,600,000 times while performing a simple coordinate
> >>> rotation:
>
> >> don't know what a message tree is, can't find info...
>
> > An message tree is just a parse tree for a message based interpreter.  For
> > example "2.pow(n).sin()":
>
> > { 2L, new Message("pow", new Message("n")), new Message("sin") };
>
> So, why there should be a parse tree and not the Polish reverse? (I hope
> that "new" does not mean heap allocation?)

I guess you could say it's a bit of both - the tree comes into play
with message arguments, but an expression with no arguments is similar
to reverse Polish. There is heap allocation in the parser, but not in
the interpreter. Maybe "virtual machine" is a better term.

> A tree might be very useful for semantic analysis and optimization, but not
> as an interpretable code.
>
> If it is an interpreter, then it is not clear why to generate a parse tree
> instead of a direct execution on-the-fly, when it is a code generator, then
> clearly, tree is a poor code representation format.

When I say a parse tree, I mean a very specialized tree where each
"Value" is basically an opcode, which either returns a literal or
invokes "eval()". If the value is a message, it calls "send()" (invoke
method) on the receiver, which looks the message up and executes it. In
pseudocode:

Value code[] = { Value(2L), Value(new Message("sin")) };

for (ip in code) {
    value = ip.eval(value);
}

There is some abstraction going on, but in the end it's really just a
type of function dispatching. And with that, so far I'd say it's fast
enough to warrant not implementing a bytecode virtual machine with
its own semantics and other issues. It also has benefits when
converting to machine code later on:

http://central.kaserver5.org/Kasoft/Typeset/JavaTree/Pt00.html

Mike

> --
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

Rod Pemberton
Nov 9, 2009, 1:38:02 AM
"Mike Austin" <mi...@mike-nospam-austin.com> wrote in message
news:Jb6dnaOW6s5DumrX...@giganews.com...

> Rod Pemberton wrote:
> >
> > My brute force tests of near 50 hashing algorithms and variations from
> > various sources on the Internet was that there are two good algorithms
> > publicy available:
> >
> > Austin Appleby's MurmurHash2
> > Bob Jenkins' hashlittle()
>
> Wow, MurmurHash2 is amazingly simple. I'm not an expert in hashing
> algorithms, do you have any recommendations for using it for pointers
> to array indexes?
>

Sorry, the extent of my use was my personal testing.

> [...]


>
> Wow, TCL hasing is at the bottom?

No. As stated, the list was sorted alphabetically for the post. I.e., the
ordering contains no useful information other than names of authors. This
may be useful in locating some of them.

I can't tell you TCL hash's overall rank in terms of collisions - since I
don't know. I performed a speed test followed by a series of smaller
collision tests to eliminate poorer algorithms. I didn't save results for
these non-performers since my goal was the results of the larger brute force
test on good algorithms.

I can tell you TCL hashing's rank in terms of time: 13th out of 45.
MurmurHash2 ranked 7th. hashlittle() ranked 2nd. (IMO, the 1st place
result should be discarded...) So, if you need a little more speed:
Jenkins. If you need fewer collisions: Appleby.


Rod Pemberton


Robbert Haarman
Nov 9, 2009, 2:37:44 AM
Hi Rod,

Your investigation of hash algorithms sounds really interesting.

Would you consider putting up a webpage with your results and the
algorithms you've tested?

Kind regards,

Bob

Dmitry A. Kazakov
Nov 9, 2009, 4:13:01 AM
On Sun, 8 Nov 2009 14:33:31 -0800 (PST), Mike Austin wrote:

> Your message didn't seem to make it to my newsgroup provider, so I'm
> using Google groups to reply.

Strange, you are not the first one who told me that. Maybe some pieces of
the Berlin wall are still there (I am posting from Germany (:-))

> On Nov 8, 11:37 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Sun, 08 Nov 2009 11:20:46 -0800, Mike Austin wrote:
>>> BGB / cr88192 wrote:
>>>> "Mike Austin" <m...@mike-nospam-austin.com> wrote in message
>>>>news:D6qdnQ3NHuiRSGjX...@giganews.com...
>>>>> It's been said that interpreting a message tree will likely be much slower
>>>>> than interpreting bytecode. While developing Impulse, I wanted to see how
>>>>> far I could push message trees, and recent optimization have proven
>>>>> fruitful. In this microbenchmark (I know, I know), it takes under 5
>>>>> seconds to iterate 3,600,000 times while performing a simple coordinate
>>>>> rotation:
>>
>>>> don't know what a message tree is, can't find info...
>>
>>> An message tree is just a parse tree for a message based interpreter. For
>>> example "2.pow(n).sin()":
>>
>>> { 2L, new Message("pow", new Message("n")), new Message("sin") };
>>
>> So, why there should be a parse tree and not the Polish reverse? (I hope
>> that "new" does not mean heap allocation?)
>
> I guess you could say it's a bit of both - the tree comes into place
> with message arguments, but an expression with no arguments is similar
> to reverse Polish.

In a reverse Polish expression, the arguments are on the "stack". The
advantage is that you need not see the operation before its arguments are
evaluated; i.e. you save one stack or, equivalently, one pass during
evaluation.

BTW, there is a case where that is rather a disadvantage, I mean the
short-circuit operations.

> There is heap allocation in the parser, but not in
> the interpreter.

But you don't need heap allocation in the parser; maybe later, when (and
if) the tree is dealt with at the semantic analysis stage. Before that,
the arguments and nodes are allocated in LIFO order (i.e. on a stack).

When I use a parse tree, I still allocate the nodes of the tree in an
arena pool organized LIFO.

>> A tree might be very useful for semantic analysis and optimization, but not
>> as an interpretable code.
>>
>> If it is an interpreter, then it is not clear why to generate a parse tree
>> instead of a direct execution on-the-fly, when it is a code generator, then
>> clearly, tree is a poor code representation format.
>
> When I say a parse tree, I mean a very specialized tree where each
> "Value" is basically an opcode, which either returns a literal or
> invokes "eval()". If the value is a message, it calls "send
> ()" (invoke method) on the receiver to send the message, looks for it
> and executes it. In pseudo code:
>
> Value code[] = { Value(2L), Value(new Message("sin")) };
>
> for (ip in code) {
> value = ip.eval(value);
> }
>
> There is some abstraction going on, but in the end it's really just a
> type of function dispatching. And with that, so far I say it's fast
> enough to warrant not implementing a bytecode virtual machine with
> it's own semantics and other issues. It also has benefits when
> converting to machine code laster on:

It is of less importance how we name it. What really matters is whether it
is "prefix" or "suffix". In order to evaluate a "prefix notation" in a
stack-oriented environment, one needs the extra overhead of [implicitly]
converting it into "suffix". This is why I was wondering. During parsing
there is a certain point where you can choose which path you go: you can
generate either a "prefix" or a "suffix" thing at *no* extra cost for
either. So choosing the "prefix" way needs a justification, e.g. a need to
do semantic analysis later. For an interpreter, this is unjustified.

bartc
Nov 9, 2009, 7:03:55 PM

"Mike Austin" <mi...@mike-nospam-austin.com> wrote in message
news:D6qdnQ3NHuiRSGjX...@giganews.com...

> It's been said that interpreting a message tree will likely be much slower
> than interpreting bytecode. While developing Impulse, I wanted to see how
> far I could push message trees, and recent optimization have proven
> fruitful. In this microbenchmark (I know, I know), it takes under 5
> seconds to iterate 3,600,000 times while performing a simple coordinate
> rotation:
>
> rotations = 360 * 10000
>
> x0 = 10
> y0 = 0
>
> (1..rotations) each: a ->
> a = a * (PI / 180)
> x1 = (x0 * a cos) - (y0 * a sin)
> y1 = (x0 * a sin) + (y0 * a cos)
> end
>
> /impulse-0.5.3/bench$ time impulse rotate.im
> real 0m4.893s
> user 0m4.832s
> sys 0m0.012s
>
> So I'm happy :) This is on par with Ruby 1.9 and faster than Python 3,
> granted

Being faster than Ruby or Python is not difficult...

However I think this test might be dominated by the time taken for the sin
and cos functions (especially if they are each executed twice, if that
optimisation is not done, and maybe if they have trouble dealing with very
large angles...)

I don't know how fast your machine is, but on mine Python took some 9
seconds, while a C version using gcc took between 1 and 2 seconds
(optimised/not optimised).

I think the fastest possible speed must be with some form of bytecode (if
you stay with interpretation), but if you want to use parse trees, I don't
think it will slow things down so much, (maybe 2-4 times?), but the
performance depends on lots of other factors.

> there are some optimization which I'll have to evaluate in a larger scale
> program. Some of the optimizations include minimal dynamic allocation,
> specialized math messages and inlining all core functions. Also, message
> chains, such as "a sin abs", are arrays instead of trees.

> The only thing left I can optimize is the lookup, which currently uses a
> std::map.

What's the lookup for?

--
Bartc


Mike Austin
Nov 9, 2009, 10:32:23 PM
On Nov 9, 4:03 pm, "bartc" <ba...@freeuk.com> wrote:
> "Mike Austin" <m...@mike-nospam-austin.com> wrote in message

>
> news:D6qdnQ3NHuiRSGjX...@giganews.com...
>
> > It's been said that interpreting a message tree will likely be much slower
> > than interpreting bytecode.  While developing Impulse, I wanted to see how
> > far I could push message trees, and recent optimization have proven
> > fruitful.  In this microbenchmark (I know, I know), it takes under 5
> > seconds to iterate 3,600,000 times while performing a simple coordinate
> > rotation:
>
> > rotations = 360 * 10000
>
> > x0  = 10
> > y0  = 0
>
> > (1..rotations) each: a ->
> >   a = a * (PI / 180)
> >   x1 = (x0 * a cos) - (y0 * a sin)
> >   y1 = (x0 * a sin) + (y0 * a cos)
> > end
>
> > /impulse-0.5.3/bench$ time impulse rotate.im
> > real 0m4.893s
> > user 0m4.832s
> > sys 0m0.012s
>
> > So I'm happy :) This is on par with Ruby 1.9 and faster than Python 3,
> > granted
>
> Being faster than Ruby or Python is not difficult...

I never said it was :) But I am happy being just as fast as those
languages right now, without having to resort to bytecode.

> However I think this test might be dominated by the time taken for the sin
> and cos functions (especially if they are each executed twice, if that
> optimisation is not done, and maybe if they have trouble dealing with very
> large angles...)

Very true, the benchmark would be more accurate in terms of raw speed
if it didn't call expensive native functions. I'll try it with sine
and cosine tables instead.

> I don't know how fast your machine is, but on mine Python took some 9
> seconds, while a C version using gcc took between 1 and 2 seconds
> (optimised/not optimised).
>
> I think the fastest possible speed must be with some form of bytecode (if
> you stay with interpretation), but if you want to use parse trees, I don't
> think it will slow things down so much, (maybe 2-4 times?), but the
> performance depends on lots of other factors.

I'm not sure, with a highly dynamic language. In a typed language
it's possible to do much more optimization. Even if it's a tiny bit
slower, I like the idea of keeping the semantics of the language in
the compiled code. The language Io and, of course, Lisp have the same
feature.

> > there are some optimization which I'll have to evaluate in a larger scale
> > program.  Some of the optimizations include minimal dynamic allocation,
> > specialized math messages and inlining all core functions.  Also, message
> > chains, such as "a sin abs", are arrays instead of trees.
> > The only thing left I can optimize is the lookup, which currently uses a
> > std::map.
>
> What's the lookup for?

For looking up methods, of course :) If I write "10 foo", it must
look up "foo", then evaluate it.

> --
> Bartc

Mike Austin

Nov 10, 2009, 12:50:16 AM
Dmitry A. Kazakov wrote:
> On Sun, 8 Nov 2009 14:33:31 -0800 (PST), Mike Austin wrote:
>
>> Your message didn't seem to make it to my newsgroup provider, so I'm
>> using Google groups to reply.
>
> Strange, you are not the first one who told me that. Maybe some pieces of
> the Berlin wall are still there (I am posting from Germany (:-))

It did eventually arrive, so next time I'll just wait a little longer :)

>> On Nov 8, 11:37 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>> wrote:
>>> On Sun, 08 Nov 2009 11:20:46 -0800, Mike Austin wrote:
>>>> BGB / cr88192 wrote:
>>>>> "Mike Austin" <m...@mike-nospam-austin.com> wrote in message
>>>>> news:D6qdnQ3NHuiRSGjX...@giganews.com...
>>>>>> It's been said that interpreting a message tree will likely be much slower
>>>>>> than interpreting bytecode. While developing Impulse, I wanted to see how
>>>>>> far I could push message trees, and recent optimization have proven
>>>>>> fruitful. In this microbenchmark (I know, I know), it takes under 5
>>>>>> seconds to iterate 3,600,000 times while performing a simple coordinate
>>>>>> rotation:
>>>>> don't know what a message tree is, can't find info...
>>>> An message tree is just a parse tree for a message based interpreter. For
>>>> example "2.pow(n).sin()":
>>>> { 2L, new Message("pow", new Message("n")), new Message("sin") };
>>> So, why there should be a parse tree and not the Polish reverse? (I hope
>>> that "new" does not mean heap allocation?)
>> I guess you could say it's a bit of both - the tree comes into place
>> with message arguments, but an expression with no arguments is similar
>> to reverse Polish.
>
> In Polish expression arguments are on the "stack". The advantage of it is
> that you need not to see the operation before its arguments evaluation,
> e.g. you save one stack or, equivalently, one pass upon evaluation.

I do like the simplicity of stack based languages, but they have the
disadvantage you point out. Impulse, if I didn't mention, is an object-based
language. So expressions with no arguments, such as "10 sin abs", resemble a
stack based language; the equivalent would be "10.sin().abs()" in other
similar "oo" languages.

> BTW, there is a case where that is rather a disadvantage, I mean the
> short-circuit operations.
>
>> There is heap allocation in the parser, but not in
>> the interpreter.
>
> But you don't need heap allocation in the parser, maybe later, when (and
> if) the tree is dealt with at the semantic analysis stage. Before that the
> arguments and nodes are allocated in LIFO order (e.g. stack).
>
> When I use parse tree, I still allocate the nodes of the tree in an arena
> pool organized LIFO.

That's a good idea. Do you do this simply to avoid keeping track of dynamic
allocations, or is there another reason? I assume this helps a little with
locality of reference also.

Impulse is object based, so the syntax is similar to
"receiver.method(arg1, arg2, ...)" as in other languages. You might call this
infix, although I don't know if that still holds with multiple arguments. In Ruby:

(1..5).zip(1..5)

I'm not fond of the way Ruby implements zip - one list as the receiver, and one
as an argument. So instead, I chose to pass multiple objects as the receiver,
in a list:

[1..5, 1..5] zip: a, b -> a + b

This would return [2, 4, 6, 8, 10]

Mike

Dmitry A. Kazakov

Nov 10, 2009, 4:14:37 AM
On Mon, 09 Nov 2009 21:50:16 -0800, Mike Austin wrote:

> Dmitry A. Kazakov wrote:

>> But you don't need heap allocation in the parser, maybe later, when (and
>> if) the tree is dealt with at the semantic analysis stage. Before that the
>> arguments and nodes are allocated in LIFO order (e.g. stack).
>>
>> When I use parse tree, I still allocate the nodes of the tree in an arena
>> pool organized LIFO.
>
> Thats a good idea. Do you do this simply to avoid keeping track of dynamic
> allocations, or is there another reason?

Basically only for performance reasons. Since it is for free, why not take
advantage of it? The language that I am using allows user-defined
pools, so it is really free, even syntactically. I write "new" as if it
were a heap, but because the target pointer has a pool-specific type, it
goes into my arena pool organized as a segmented stack.

> I assume this helps a little with
> locality of reference also.

Yes, and because it is statically typed, you cannot mistakenly point
somewhere outside the pool.

>> It is of less importance how we would name it. What really matters is
>> whether it is "prefix" or "suffix". In order to evaluate a "prefix
>> notation" in a stack-oriented environment one needs an extra overhead of
>> [implicit] converting it into "suffix". This is why I was wondering. During
>> parsing there is a certain point where you can choose, which path you go.
>> You can generate either a "prefix" or a "suffix" thing at *no* extra cost
>> for either. So choosing a "prefix" way needs a justification, e.g. a need
>> in doing sematic analysis later. For an interpreter case this is
>> unjustified.
>
> Impulse is object based, so the syntax is similar to
> "receiver.method(arg1, arg2, ...)" as in other languages.

I didn't mean the language syntax, so I put prefix and suffix in quotation
marks. The source language can be prefix, infix or postfix, whatever. That
is unrelated to the target / intermediate code. The parse tree is a sort of
that code, in some language different from the source language. That
language is "prefix". If you have a stack machine to interpret code in this
language, you need an extra overhead compared to a "suffix" intermediate
language.

> You might call this
> infix, although I don't know if that still holds with multiple arguments. In Ruby:
>
> (1..5).zip(1..5)

But note that prefix, postfix, infix are syntax terms. Receiver (or else a
controlling argument) is a semantic term. You can have it in any context.
Some languages tie certain syntax flavors to certain semantic meanings. In
my view, that is a bad idea. I prefer a language design where a subprogram
can be associated with any of the three major syntactic forms of a call:

* operational, prefix: op a, infix: a op b, postfix: a op
* functional: op(a,b,c)
* indexing: a(b,c)

I don't want the syntax to determine whether a,b or c is controlling. Maybe
they are all controlled, e.g. in multiple dispatch.

Talking about the language design, the idea of an object "receiving
objects" is all wrong to me. An object receives nothing. It is passed to an
operation along with some other objects. It is not even always specified
whether this passing is by reference or, say, copy out, apply, copy
back (by value). It is rubbish to pretend that "1 receives 1" in 1+1.

bartc

Nov 10, 2009, 5:49:31 AM
Mike Austin wrote:
> On Nov 9, 4:03 pm, "bartc" <ba...@freeuk.com> wrote:
>> "Mike Austin" <m...@mike-nospam-austin.com> wrote in message
>> news:D6qdnQ3NHuiRSGjX...@giganews.com...

>>> there are some optimization which I'll have to evaluate in a larger
>>> scale program. Some of the optimizations include minimal dynamic
>>> allocation, specialized math messages and inlining all core
>>> functions. Also, message chains, such as "a sin abs", are arrays
>>> instead of trees.
>>> The only thing left I can optimize is the lookup, which currently
>>> uses a std::map.
>>
>> What's the lookup for?
>
> For looking up methods, of course :) If I write "10 foo", it must
> look up "foo", then evaluate it.

Oh, OK. Although I would have expected most names to have been resolved
during a compilation stage (if you call the source code to parse tree
translator a compiler).

Only names such as foo.bar would need to be resolved at runtime, that is, if
there is more than one .bar and the type of foo is not known until then.

But maybe your language works a little differently, eg. the foo in "10 foo"
is different from the foo in "AB" foo.

--
Bartc

Mike Austin

Nov 10, 2009, 5:08:50 PM
On Nov 10, 1:14 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> On Mon, 09 Nov 2009 21:50:16 -0800, Mike Austin wrote:
> > Dmitry A. Kazakov wrote:
> >> But you don't need heap allocation in the parser, maybe later, when (and
> >> if) the tree is dealt with at the semantic analysis stage. Before that the
> >> arguments and nodes are allocated in LIFO order (e.g. stack).
>
> >> When I use parse tree, I still allocate the nodes of the tree in an arena
> >> pool organized LIFO.
>
> > Thats a good idea.  Do you do this simply to avoid keeping track of dynamic
> > allocations, or is there another reason?
>
> Basically only for performance reasons. Since it is for free, why not to
> take this advantage? The language that I am using allows user-defined
> pools, so it its really free, even syntactically. I write "new" as if it
> were a heap, but because the target pointer has a pool-specific type, it
> goes into my arena pool organized as a segmented stack.

Can you say which language? I know this can be done in C++ with the
placement operator new(), e.g. "new (pool) Message()", and Objective-C
supports resource pools.

> > I assume this helps a little with
> > locality of reference also.
>
> Yes, and because it is statically typed, you cannot mistakenly point
> somewhere outside the pool.
>
> >> It is of less importance how we would name it. What really matters is
> >> whether it is "prefix" or "suffix". In order to evaluate a "prefix
> >> notation" in a stack-oriented environment one needs an extra overhead of
> >> [implicit] converting it into "suffix". This is why I was wondering. During
> >> parsing there is a certain point where you can choose, which path you go.
> >> You can generate either a "prefix" or a "suffix" thing at *no* extra cost
> >> for either. So choosing a "prefix" way needs a justification, e.g. a need
> >> in doing sematic analysis later. For an interpreter case this is
> >> unjustified.
>
> > Impulse is object based, so the syntax is similar to
> > "receiver.method(arg1, arg2, ...)" as in other languages.
>
> I didn't mean the language syntax, so I put prefix and suffix in quotation
> marks. The source language can be prefix, infix or postfix, whatever. That
> is unrelated to the target / intermediate code. The parse tree is a sort of
> that code, in some language different from the source language. That
> language is "prefix". If you have a stack machine to interpret code in this
> language, you need an extra overhead compared to a "suffix" intermediate
> language.

Then I would say the interpreter uses "infix" internally.
"2 pow: 8" is literally interpreted as:

evaluate "2", which returns 2, then evaluate "pow:" with argument "8"
in the context of the previous value.

> > You might call this
> > infix, although I don't know if that still holds with multiple arguments.  In Ruby:
>
> > (1..5).zip(1..5)
>
> But note that prefix, postfix, infix are syntax terms. Receiver (or else a
> controlling argument) is a semantic term. You can have it in any context.
> Some languages tie certain syntax flavors to certain semantic meanings. In
> my view, that is a bad idea. I prefer a language design where a subprogram
> can be associated with any of the three major syntactic form of a call:
>
> * operational, prefix: op a, infix: a op b, postfix a op
> * functional: op(a,b,c)
> * indexing: a(b,c)
>
> I don't want the syntax to determine whether a,b or c is controlling. Maybe
> they are all controlled, e.g. in multiple dispatch.

Dylan is one of my favorite languages, although I don't use it much.
It has multi-methods and can simulate object oriented syntax very
easily. "receiver" in this case can be on the left of the expression,
or passed as an argument:

time-of-day.total-seconds;

is syntactic sugar (or salt if you dislike it) for:

total-seconds(time-of-day);

I just have a preference towards the dot notation syntax because of
other languages I've used. I like postfix and infix notation because
it reads left to right:

("abc" split) join: ","

vs.

join(split("abc"), ",")

But that's just me. I don't expect everyone to like this syntax, and
there are good reasons to use prefix notation.

> Talking about the language design, the idea of an object "receiving
> objects" is all wrong to me. Object receives nothing. It is passed to an
> operation of along with some other objects. It is not even always specified
> whether this passing is by reference and not, say, copy out, apply, copy
> back (by value). It is rubbish to pretend that "1 receives 1" in 1+1.

Objects receive messages, such as "foo", which may or may not contain
arguments. In "1 + 1", "1" receives the message "+" with the argument
"1" - it's just infix syntax (and semantics). This is how Smalltalk
works, along with many languages derived from it.

I appreciate your feedback! Thanks.

Mike Austin

Nov 10, 2009, 5:33:42 PM

As in Smalltalk, Python, Ruby, Lua, Io, Groovy, etc., the language
is highly dynamic and does not bind methods at compile time. There
are a few optimizations under the hood, of course. I chose this
because of the tasks I want to use it for - live user interfaces,
games, simulations, possibly distributed applications. Fast math can
be done with a vector library, so I'm not concerned about computation
speed.

Mike

> --
> Bartc

Dmitry A. Kazakov

Nov 11, 2009, 4:17:28 AM
On Tue, 10 Nov 2009 14:08:50 -0800 (PST), Mike Austin wrote:

> On Nov 10, 1:14 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Mon, 09 Nov 2009 21:50:16 -0800, Mike Austin wrote:
>>> Dmitry A. Kazakov wrote:
>>>> But you don't need heap allocation in the parser, maybe later, when (and
>>>> if) the tree is dealt with at the semantic analysis stage. Before that the
>>>> arguments and nodes are allocated in LIFO order (e.g. stack).
>>
>>>> When I use parse tree, I still allocate the nodes of the tree in an arena
>>>> pool organized LIFO.
>>
>>> Thats a good idea. Do you do this simply to avoid keeping track of dynamic
>>> allocations, or is there another reason?
>>
>> Basically only for performance reasons. Since it is for free, why not to
>> take this advantage? The language that I am using allows user-defined
>> pools, so it its really free, even syntactically. I write "new" as if it
>> were a heap, but because the target pointer has a pool-specific type, it
>> goes into my arena pool organized as a segmented stack.
>
> Can you say which language? I know this can be done in C++ with
> placement operator new(), eg. "new (pool) Message()", and Objective-C
> support resource pools.

It is Ada. Unlike C++, in Ada pointers are of named types. So the
pool is associated not with the object's type, which can be allocated
anywhere, but with the type of a pointer to the object. Therefore you don't
need to specify the pool when you do "new"; the expected pointer type
determines it. Another advantage of properly typed pointers is that you
could, for example, declare comparison operations for different pointer
types of the same target type, e.g. pointers to sort by name, pointers to
sort by ID, etc.

>> I didn't mean the language syntax, so I put prefix and suffix in quotation
>> marks. The source language can be prefix, infix or postfix, whatever. That
>> is unrelated to the target / intermediate code. The parse tree is a sort of
>> that code, in some language different from the source language. That
>> language is "prefix". If you have a stack machine to interpret code in this
>> language, you need an extra overhead compared to a "suffix" intermediate
>> language.
>
> Then I would say the interpreter uses "infix" internally.
> "2 pow: 8" is literally interpreted as:
>
> evaluate "2", which returns 2, then evaluate "pow:" with argument "8"
> in the context of the previous value.

Let you have a tree:

      |   <-- we are here
     pow
     / \
    2   8

then it is "prefix", if you hold to the tree root. The root is what you see
first. Tree traversal is equivalent to a conversion of this "prefix"
notation into "suffix". But to evaluate a tree, you need more than that.
You need two passes: first up to the leaves, then down to the root, carrying
the intermediate results with you. With a "suffix" notation you need only one
pass.

If I wanted to use tree as a code, I would add some additional horizontal
links across the tree and backward links down the tree, in order to skip
the first pass:

      |
     pow
     / \  \
 -> 2 --- 8

(that wouldn't be a tree anymore)

>> Talking about the language design, the idea of an object "receiving
>> objects" is all wrong to me. Object receives nothing. It is passed to an
>> operation of along with some other objects. It is not even always specified
>> whether this passing is by reference and not, say, copy out, apply, copy
>> back (by value). It is rubbish to pretend that "1 receives 1" in 1+1.
>
> Objects receive messages, such as "foo", which may or may not contain
> arguments. In "1 + 1", "1" receives the message "+" with the argument
> "1" - it's just infix syntax (an semantics).

The semantics of + is given in mathematics. Mathematically + is not defined
on integer, it is on a Cartesian product of integers. In the traditional
mathematical notation

+ : integer x integer --> integer

+ operates on a tuple of integers, not on one integer. If anything
"receives" here, it is the tuple (but in effect nothing does). So if you
wanted to turn it upside down, you could say that (1,1) "received" +.
However, that is still wrong, because "receive" presumes statefulness,
while tuples are stateless.

> This is how Smalltalk
> works, along with many languages derived from it.

They did it wrong. The reason for that fault was that they mistakenly
separated the types of objects (OO) from the types of arguments (non-OO).
That makes the language impure: there are objects and not-so-objects...

Mike Austin

Nov 11, 2009, 7:48:27 PM
On Nov 11, 1:17 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

One of these days I want to learn Ada. Do you recommend GNAT, and
which dialect is better to start with?

This is how it is stored and interpreted:

    2
    |
   pow:
      \
       8

> >> Talking about the language design, the idea of an object "receiving
> >> objects" is all wrong to me. Object receives nothing. It is passed to an
> >> operation of along with some other objects. It is not even always specified
> >> whether this passing is by reference and not, say, copy out, apply, copy
> >> back (by value). It is rubbish to pretend that "1 receives 1" in 1+1.
>
> > Objects receive messages, such as "foo", which may or may not contain
> > arguments.  In "1 + 1", "1" receives the message "+" with the argument
> > "1" - it's just infix syntax (an semantics).
>
> The semantics of + is given in mathematics. Mathematically + is not defined
> on integer, it is on a Cartesian product of integers. In the traditional
> mathematical notation
>
>     + : integer x integer --> integer
>
> + operates a tuple of integers, it is not one integer. It is a tuple to
> "receive" anything (but in effect nothing). So if you wanted to turn it
> upside down, you could say that (1,1) "received" +. However that is still
> wrong, because "receive" presumes statefulness, while tuples are stateless.

In strict mathematical terms, yes it's wrong. But it's not a (pure)
mathematical language. Smalltalk is even more wrong in that it
doesn't have operator precedence, but I forgive it. Its focus is not
on pure mathematics.

> > This is how Smalltalk
> > works, along with many languages derived from it.
>
> They did it wrong. The reason for that fault was, that they mistakenly
> separated types of objects (OO) from the types of arguments (non-OO). That
> makes the language impure: there are objects and not-so-objects...

You mean because not all arguments are used for dispatching? I may
have mentioned it, but Dylan is another favorite language of mine,
which does support multi-methods. There was a language called Slate
that added multi-methods to the Smalltalk syntax, but I disliked it
because the syntax got a little hairy.

Dmitry A. Kazakov

Nov 12, 2009, 3:57:45 AM
On Wed, 11 Nov 2009 16:48:27 -0800 (PST), Mike Austin wrote:

> On Nov 11, 1:17 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:

>> [33 quoted lines suppressed]


>
> One of these days I want to learn ADA. Do you recommend GNAT, and
> what dialect is better to start with?

GNAT is good. There are no Ada dialects. There are three language revisions
(Ada is an ISO standard), customarily called Ada 83, Ada 95, and Ada 2005.

>> [20 quoted lines suppressed]


>
> In strict mathematical terms, yes it's wrong. But it's not a (pure)
> mathematical language. Smalltalk is even more wrong in that it
> doesn't have operator precedence, but I forgive it. Its focus is not
> on pure mathematics.

I think it is a misconception to believe that a language can be
mathematically impure. In what sense could that happen? Does it contradict
anything in mathematics, like 2+2=5? Surely it does not. Then what is
the point? I always laughed at the claims that an OO circle is not an
ellipse.

Further, the concepts of type and class existed in mathematics long before
Simula and OO. There is nothing new in OO. There is also nothing there that
could or should contradict mathematics. Beyond some sloppy wording (like
"receiving messages") and some poor hobby philosophy (about "everything is
an object"), OO can be done in a perfectly mathematical way.

>> [6 quoted lines suppressed]


>
> You mean because not all arguments are used for dispatching? I may
> have mentioned it, but Dylan is another favorite language of mine,
> which does support multi-methods. There was a language called Slate
> that added multi-methods to the Smalltalk syntax, but I disliked it
> because the syntax got a little hairy.

I think that support for multiple dispatch in the context of strong typing
is one of the most interesting problems, and it is not yet solved.

llothar

Nov 18, 2009, 9:43:55 AM
On 7 Nov., 21:14, Mike Austin <m...@mike-nospam-austin.com> wrote:
> It's been said that interpreting a message tree will likely be much slower than
> interpreting bytecode.  While developing Impulse, I wanted to see how far I

Be careful with these microbenchmarks.

One of the problems with tree evaluation is that the memory density is
pretty low. This means you run into L1 and L2 cache misses often, and they
are expensive. An L1 cache miss is about 30 cycles (60 operations you could
have done in the meantime), and an L2 miss is even close to 200 cycles.

Optimize for density and prefetching. With bytecode you always have a full
cache line of code prefetched. With memory trees you have a few more random
accesses to find the child nodes. Bytecode + argument stack are much better
here.

Mike Austin

Nov 18, 2009, 8:52:23 PM
llothar wrote:
> On 7 Nov., 21:14, Mike Austin <m...@mike-nospam-austin.com> wrote:
>> It's been said that interpreting a message tree will likely be much slower than
>> interpreting bytecode. While developing Impulse, I wanted to see how far I
>
> Be care with this microbenchmarks.
>
> One of the problems with tree evaluation is that the memory density is
> pretty low.
> This means you run into L1 and L2 cache misses often. And they are
> expensive. A L1
> cache miss is about 30 cycles or 60 operations you can do in the
> meantime and an
> L2 hit is even close to 200 cycles.

If I implement Dmitry's suggestion of allocating messages from a pool, it
should help, but you're right - it will never be as small as bytecode. The
smallest node right now is 16 bytes in a 64-bit app.

> Optimize for density and prefetching. With byte code you always have a
> full cache line
> of code prefetched. With memory trees you have a few more random
> access to find the
> child nodes. Bytecode + argument stack are much better here.

Maybe I'll experiment with a hybrid until I can benchmark larger programs -
something that will keep the size down and the messages closer. Thanks for the
feedback, it's good to know.

Mike
