Message from discussion Writing a Compiler: Lisp or Scheme or Haskell?
Newsgroups: comp.lang.lisp
From: Kaz Kylheku <kkylh...@gmail.com>
Date: Thu, 28 May 2009 17:01:09 +0000 (UTC)
Local: Thurs, May 28 2009 1:01 pm
Subject: Re: Writing a Compiler: Lisp or Scheme or Haskell?
On 2009-05-27, Nicolas Neuss <lastn...@math.uni-karlsruhe.de> wrote:

> Isaac Gouy <igo...@yahoo.com> writes:

>> [...] Is the Lisp community now so small

> Yes.

>> that there are no workaday programmers capable of doing a good job on
>> trivial programs - just newbies and language implementors?

> No.  But unfortunately, it is sufficiently small that IMO neither newbies
> nor capable programmers should waste time on a badly designed and moderated
> benchmark game.

What is so badly designed about it? I've looked at some of these programs
and come to the conclusion that the benchmark is fair.

For instance, if we look at the knucleotide benchmark, where SBCL has its ass
kicked by the likes of Lua, several things are obvious.

Firstly, everyone is using a crappy algorithm to solve the problem.  This must
be a consequence of the shootout rules. Now maybe that is not representative of
the real world, but a contest must have rules so that some things are held
equal in order that we may compare other things.

In the knucleotide benchmark, the task is to analyze a long gene by looking
for nucleotide subsequences. The program must calculate the frequencies
of the individual nucleotides, and all 16 possible digraphs, and count
the occurrences of various longer subsequences.

Now this could obviously be done by a state machine, custom tailored
to the set of subsequences, making a single pass over the data.
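
To make the state machine idea concrete, here is a minimal sketch of one way
to do it: an Aho-Corasick automaton built from the set of subsequences, which
then counts every (overlapping) occurrence in one pass over the data. This is
not taken from any shootout entry; the function names and the sample patterns
are made up for illustration.

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick tables: transitions, failure links, outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> patterns that end when we reach state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first walk so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Patterns ending at the suffix state also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def count_matches(text, patterns):
    """Count overlapping occurrences of each pattern in a single pass."""
    patterns = list(dict.fromkeys(patterns))  # drop duplicate patterns
    goto, fail, out = build_automaton(patterns)
    counts = {p: 0 for p in patterns}
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            counts[pat] += 1
    return counts

# Sample patterns chosen only for illustration.
print(count_matches("GGTATTGGTA", ["GGT", "GGTA", "GGTATT"]))
# prints {'GGT': 2, 'GGTA': 2, 'GGTATT': 1}
```

The text is read exactly once, no substrings are extracted and no hash table
is consulted per position.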

But the way these programs are doing it is building associative maps of
subsequences, and then querying the maps.

So this is a benchmark of the programming language's string manipulation
capabilities: extracting substrings and associative mapping where the
keys are strings made up of the letters ACGT.
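
For comparison, the map-based approach boils down to something like the
following sketch. It is not copied from any of the submitted programs; the
function names are hypothetical and the sample sequence is made up.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping substring of length k, keyed by the substring."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def frequencies(sequence, k):
    """Relative frequency (in percent) of each length-k substring."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}

seq = "GGTATTGGTA"                 # made-up sample data
singles = kmer_counts(seq, 1)      # individual nucleotides
pairs = kmer_counts(seq, 2)        # the digraphs
print(singles["G"], pairs["GG"])   # prints 4 2
```

Every position in the sequence costs one substring extraction plus one hash
and lookup on a string key, which is exactly the work being measured.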

The Lua program is straightforward code. It doesn't concern itself with
details like what kind of hash table to construct.  It just uses the awk-like
subscripting syntax. Yet it beats the SBCL code which has been tweaked with
ugly declarations all over the place, and a custom sxhash and equality function
for gene sequences.

There is clearly something wrong there.

On the other hand, I don't see anything on the Lua website that even breathes a
hint at the possibility that Lua character strings might support international
text via wide characters. I suspect they are 8 bit characters.  The
lua-users.org wiki confirms this:

  http://lua-users.org/wiki/LuaUnicode

  ``A Lua string is an arbitrary sequence of values which have at least 8 bits
  (octets); they map directly into the char type of the C compiler.''

So that right there cuts memory consumption and bandwidth in half. A string
that occupies two cache lines in the Lisp program using 16 bit strings might
fit into one cache line under Lua, and so is hashed about twice as fast, when
memory bandwidth is the bottleneck! The programs iterate repeatedly over a
large array of data, which is too large for the L1 cache, and which takes up
twice the space in the Lisp program.
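
The cache line arithmetic can be sketched as follows. The 64-byte line size
is an assumption (it is typical, not universal), and the calculation ignores
object headers and alignment; the 16-bit figure is simply the character width
assumed above.

```python
CACHE_LINE_BYTES = 64  # assumption: a typical cache line size

def cache_lines(num_chars, bytes_per_char):
    """Cache lines touched by the character data alone (headers ignored)."""
    total_bytes = num_chars * bytes_per_char
    return (total_bytes + CACHE_LINE_BYTES - 1) // CACHE_LINE_BYTES  # ceiling

# A 60-character string: 60 bytes at 8 bits per character, 120 at 16 bits.
print(cache_lines(60, 1), cache_lines(60, 2))  # prints 1 2
```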

But faster strings are not a substitute when you need international strings.

Also, is it a big deal if the default string hashing function doesn't work so
well over strings of 16 bit characters which have only two bits of entropy per
character?
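
To illustrate what "two bits of entropy per character" means, here is a
sketch of a key function specialized for this alphabet: it packs each letter
into two bits, so distinct ACGT strings get distinct integer keys. This is
not what the SBCL entry's custom sxhash does (I am not reproducing that
code); the names below are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # two bits per nucleotide

def pack(kmer):
    """Pack an ACGT string into an integer, two bits per character."""
    key = 1  # leading sentinel bit, so that "A" and "AA" get different keys
    for ch in kmer:
        key = (key << 2) | CODE[ch]
    return key

def count_packed(sequence, k):
    """Count length-k substrings, keyed by packed integer, not by string."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        key = pack(sequence[i:i + k])
        counts[key] = counts.get(key, 0) + 1
    return counts

print(pack("GGT"))  # prints 107, which is 0b1101011
```

This only works because the input alphabet is known in advance, which is not
something a general-purpose string hash can assume.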

Coming up with hashing functions that give a good distribution for a wide
range of inputs, /and/ which are fast, is not easy.

Should the SBCL guys drop everything and fix the hashing function so that
it improves the knucleotide benchmark? What if something runs slower
because of it?

C++ trounced everything on this benchmark. So let's all use C++!  Currently,
the result of the C implementation of the benchmark is that it doesn't build
due to compile errors.  Clearly, C is difficult to use and unreliable, so let's
not use that.

I think that some people are reading way too much into these benchmarks,
using them as an excuse for disingenuous trolling.

