
Advanced data structures


jacob navia

Dec 11, 2009, 7:55:43 PM
I have started reading
"Advanced data structures" from Peter Brass.

He says:

<quote>
This is not a book on object-oriented programming, but on the
algorithmic side of data structures. In May 2009 I finally removed the
ps/pdf-files of the book; go and buy it. Indeed the printed version
looks nice, and the copyeditor and the production people from Cambridge
UP did good work.
<end quote>

I find this book excellent. Everything is written in C, it
is clear, and explains very well the (quite difficult) stuff.

I see that I will have to read this before going further in the
container library. This is exactly the book I was looking for.

It is obvious too, that C is a good language for conveying
algorithmic thought. No, it is NOT just a language for some
obscure embedded systems.

jacob

Richard Heathfield

Dec 11, 2009, 8:02:45 PM
jacob navia wrote:
> I have started reading
> "Advanced data structures" from Peter Brass.

<snip>

> I find this book excellent. Everything is written in C, it
> is clear, and explains very well the (quite difficult) stuff.

I'm glad you like it. I haven't read it, but you make it sound worth the
trouble.

<snip>

> It is obvious too, that C is a good language for conveying
> algorithmic thought. No, it is NOT just a language for some
> obscure embedded systems.

I couldn't agree more. It is precisely because (well-written) C is so
good at describing algorithms in all their nitty-gritty detail that it
is the lingua franca of nearly all the best general programming books.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Ian Collins

Dec 11, 2009, 8:03:53 PM
jacob navia wrote:
>
> It is obvious too, that C is a good language for conveying
> algorithmic thought. No, it is NOT just a language for some
> obscure embedded systems.

Who said it was?

--
Ian Collins

bartc

Dec 11, 2009, 8:22:44 PM

"Richard Heathfield" <r...@see.sig.invalid> wrote in message
news:i4qdnVefnqKEdr_W...@bt.com...

> jacob navia wrote:
>> I have started reading
>> "Advanced data structures" from Peter Brass.
>
> <snip>
>
>> I find this book excellent. Everything is written in C, it
>> is clear, and explains very well the (quite difficult) stuff.
>
> I'm glad you like it. I haven't read it, but you make it sound worth the
> trouble.
>
> <snip>
>
>> It is obvious too, that C is a good language for conveying
>> algorithmic thought. No, it is NOT just a language for some
>> obscure embedded systems.
>
> I couldn't agree more. It is precisely because (well-written) C is so good
> at describing algorithms in all their nitty-gritty detail that it is the
> lingua franca of nearly all the best general programming books.

There might be more apt languages than C (not everyone is comfortable with
zero-based indexing for example), but if it's a choice between 'MIX' and C,
then C wins.

--
bartc

Frank

Dec 11, 2009, 11:38:59 PM

Sounds great. How much is it? Does it come with a companion disc?
--
frank

Frank

Dec 12, 2009, 2:59:16 AM

I find a lot of C literature slanted this way. You certainly can't blame a
guy for publishing any technical book, but the embedded stuff usually has a
lot of proprietary and non-standard material. I know I'm not the owner, so I
lose interest.

No one will ever put a label on standard C and say "this belongs to Rupert
Murdoch." I'm fine with our Switzerland ethic of making it commercially
viable.

Chuck would never say a program were portable unless it could be ported to
a clicker. I think jacob looks at portability like I do.
--
frank

jacob navia

Dec 12, 2009, 3:06:25 AM
Frank wrote:

>
> Sounds great. How much is it? Does it come with a companion disc?

I paid $50.62 at Amazon.com (new), but maybe you can find it for less. There is
a used version (at Amazon) starting at $45; other sites may have better offers.

The code of the book is available at the author's web site. Just cut/paste
from there.

jacob

Richard Heathfield

Dec 12, 2009, 3:24:07 AM
bartc wrote:
>
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> news:i4qdnVefnqKEdr_W...@bt.com...

<snip>

>> It is precisely because (well-written) C is so
>> good at describing algorithms in all their nitty-gritty detail that it
>> is the lingua franca of nearly all the best general programming books.
>
> There might be more apt languages than C (not everyone is comfortable
> with zero-based indexing for example), but if it's a choice between
> 'MIX' and C, then C wins.

Anyone who is not comfortable with 0-based indexing needs to buy a new
mattress. The contortions you have to go through to use other bases just
melt away when you switch to 0.
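
A minimal sketch of the point (an editor's illustration, not Richard's
code): with 0-based indexing and half-open ranges, splitting [0, n) at
any m gives [0, m) and [m, n), with no +1/-1 adjustments anywhere.

#include <stdio.h>

/* Sum the half-open range a[lo..hi). */
static int sum(const int *a, int lo, int hi)
{
    int s = 0, i;
    for (i = lo; i < hi; i++)
        s += a[i];
    return s;
}

int main(void)
{
    int a[] = {1, 2, 3, 4, 5};
    int n = (int)(sizeof a / sizeof a[0]);
    int m = n / 2;
    /* The whole range is exactly the two halves. */
    printf("%d == %d + %d\n", sum(a, 0, n), sum(a, 0, m), sum(a, m, n));
    return 0;
}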

For its power to communicate algorithmic ideas readably, I would choose
C not just over MIX, but over Java, C++, Li*p... everything. Have you a
more apt language in mind?

Seebs

Dec 12, 2009, 3:42:38 AM
On 2009-12-12, Richard Heathfield <r...@see.sig.invalid> wrote:
> For its power to communicate algorithmic ideas readably, I would choose
> C not just over MIX, but over Java, C++, Li*p... everything. Have you a
> more apt language in mind?

In terms of number of people who can read them, probably not. For many
algorithms, though, I'd probably find it easier to express them in Ruby,
and I'm told Icon is also excellent if you know it...

C is pretty expressive, but a lot of things end up with a bunch of
overhead that isn't actually the implementation of the algorithm, but
rather, the bookkeeping to allow that implementation to work.

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Frank

Dec 12, 2009, 3:49:28 AM
On Sat, 12 Dec 2009 08:24:07 +0000, Richard Heathfield wrote:

> bartc wrote:
>>
>> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>> news:i4qdnVefnqKEdr_W...@bt.com...
>
> <snip>
>
>>> It is precisely because (well-written) C is so
>>> good at describing algorithms in all their nitty-gritty detail that it
>>> is the lingua franca of nearly all the best general programming books.
>>
>> There might be more apt languages than C (not everyone is comfortable
>> with zero-based indexing for example), but if it's a choice between
>> 'MIX' and C, then C wins.
>
> Anyone who is not comfortable with 0-based indexing needs to buy a new
> mattress. The contortions you have to go through to use other bases just
> melt away when you switch to 0.

It's not the indexing. I wend in and out of Fortran and C. In support of
your point, I found that the Fortran bit model for integers is right to
left, from 0 to wordsize - 1. It's a serious gotcha in some
circumstances: e.g. your word is shifted off its axis by one, you've lost a
bit, and another has come out of nowhere.


>
> For its power to communicate algorithmic ideas readably, I would choose
> C not just over MIX, but over Java, C++, Li*p... everything. Have you a
> more apt language in mind?

If I take your "you" to be familiar plural, then I would distinguish
between algorithm and caller. I love the C bindings in Fortran! I can
look at output in a language where I can write
print *, anything
, while the computational capability came out of a can named C.
--
frank

jacob navia

Dec 12, 2009, 4:03:41 AM
Seebs wrote:

> On 2009-12-12, Richard Heathfield <r...@see.sig.invalid> wrote:
>> For its power to communicate algorithmic ideas readably, I would choose
>> C not just over MIX, but over Java, C++, Li*p... everything. Have you a
>> more apt language in mind?
>
> In terms of number of people who can read them, probably not. For many
> algorithms, though, I'd probably find it easier to express them in Ruby,
> and I'm told Icon is also excellent if you know it...
>

In Ruby, you have built-in hash tables, sets, and many other containers.
You have to use the implementation that the language offers. Obviously
you *could* write your own, but all performance considerations would
be difficult to measure in an interpreter.

Ruby also has a way of overloading operators that sometimes looks quite
weird:

[1,2,3]+[3,4,5] --> [4,6,8] ?
[1,2,3]-[3,4,5] --> [-2,-2,-2] ?

No. [1,2,3]-[3,4,5]-->[1,2], and [1,2,3]+[3,4,5]-->[1,2,3,4,5]

This is surely not really obvious for people not speaking Ruby
fluently...

> C is pretty expressive, but a lot of things end up with a bunch of
> overhead that isn't actually the implementation of the algorithm, but
> rather, the bookkeeping to allow that implementation to work.
>
> -s

But exactly THAT makes you AWARE of that overhead, which is an inherent
part of the algorithm's cost!

For instance, if we design an algorithm without explicit free, using a
garbage collector (and with Java you do not have any other choice), the
difference in cost between an algorithm that uses memory sparingly and one that
needlessly thrashes memory is not visible in the code you write.

Hiding "details" can be OK, but when designing an algorithm and
evaluating its cost, those details do matter. In Ruby or Java,
memory management costs are hidden by the language, and you are
forced to use the OO paradigm.

James Dow Allen

Dec 12, 2009, 5:13:51 AM
On Dec 12, 4:03 pm, jacob navia <ja...@spamsink.net> wrote:

> [Check out Peter Brass's book. Source code on-line.]

I looked at the source for tries (digital search trees) and was
disappointed to see that he didn't use "traversor structures."
I know Ben Pfaff uses such things, and it seems an excellent idea.
Do others use them? Can Brass's examples be recommended if he
*doesn't* use them?

> Seebs wrote:


> > C is pretty expressive, but a lot of things end up with a bunch of
> > overhead that isn't actually the implementation of the algorithm, but
> > rather, the bookkeeping to allow that implementation to work.
>
> > -s

This seems to me to be a very apt description, though the notion
of *which* overhead is tedious might vary: there's "bookkeeping"
like includes and prototypes, and there's the tedium of pointer
manipulations or bit twiddling.

> But exactly THAT makes you AWARE of that overhead, which is an inherent
> part of the algorithm's cost!

I agree. The close similarity, at least in character, between
low-level C and a machine's likely instruction set, makes it the
more comfortable choice for hardware-interfacing tasks like
memory management.

James Dow Allen

Bruce Cook

Dec 12, 2009, 5:44:53 AM
bartc wrote:

>
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message

[...]


> There might be more apt languages than C (not everyone is comfortable with
> zero-based indexing for example), but if it's a choice between 'MIX' and
> C, then C wins.

I'm interested in why you'd pick zero-based indexing as something people
would specifically have problems with in C.

I personally find zero-based indexing fairly natural; other languages
with 1-based indexing, such as Fortran and Cobol, used to bug me, especially after
I had started programming in C and "knew there was a better way" :)

Zero-based indexing certainly makes more sense mathematically.

Bruce

Frank

Dec 12, 2009, 6:06:55 AM

Sounds good. If anyone in the Beehive State wants to ask me what I want
for Christmas, I can be precise.

Cheers,
--

Nick

Dec 12, 2009, 6:14:56 AM
Bruce Cook <bruce-...@noreplybicycle.synonet.comnoreply> writes:

Once you get into "friendly" string manipulation, there are a few arguments
for 1-based indexing, particularly when combined with "zero is false,
everything else is true". For example, it would be "nicer" in
Javascript to be able to write »if(somestring.indexOf("test"))« rather
than having to add »!=-1« at the end. Of course, C using pointers for
this manages to do it quite well with the NULL return from strstr.
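
As a quick sketch of that C idiom (an illustration, not from Nick's
post):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *s = "this is a test string";
    /* strstr returns a pointer to the match, or NULL if absent,
       so the result reads naturally as a condition -- no -1 sentinel. */
    if (strstr(s, "test"))
        puts("found");
    return 0;
}
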
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

Message has been deleted
Message has been deleted

jacob navia

Dec 12, 2009, 7:50:14 AM
James Dow Allen wrote:

> On Dec 12, 4:03 pm, jacob navia <ja...@spamsink.net> wrote:
>
>> [Check out Peter Brass's book. Source code on-line.]
>
> I looked at the source for tries (digital search trees) and was
> disappointed to see that he didn't use "traversor structures."
> I know Ben Pfaff uses such things, and it seems an excellent idea.
> Do others use them? Can Brass's examples be recommended if he
> *doesn't* use them?
>

I do not know, but the subject of tries is treated very extensively
in that book. Pages 335 to 356 are about tries, and he
shows several implementations.

Maybe you didn't look at the correct one?

Because in many examples he shows a first "trivial" implementation that
he later refines more and more. If you would explain what you
mean by "traversor structures", we could go further.

jacob

jacob navia

Dec 12, 2009, 8:49:46 AM
Stefan Ram wrote:

> jacob navia <ja...@spamsink.net> writes:
>> I find this book excellent. Everything is written in C, it
>
> It seems to allocate a stack entry
>
> st = (stack_t *) malloc( ...
>
> and then, in the next line
>
> st->base = ...
>
> it seems to use »st« without ever taking account for the
> possibility that »st« might be 0?
>
> This might be fine when a program is used by the programmer
> himself, who can cope with segment violations or other
> strange runtime-behavior. But it might not be appropriate
> when software is intended to be delivered as a product to a
> customer who needs to depend on it.
>
He does that, as he explains in the text, because those calls are replaced
in the next section by a heap implementation where he builds an
intermediate layer (as I did for the list container) that allocates
nodes from a linked list of blocks.

The book is not a foolproof recipe book; it is a book about
advanced data structures. The code is just to illustrate the text!
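
For what it's worth, a minimal sketch of the checked version under
discussion; the stack_t layout here is guessed from the quoted fragment,
not taken from Brass's actual code:

#include <stdlib.h>

typedef struct {
    int *base;
    int *top;
    int size;
} stack_t;

stack_t *create_stack(int size)
{
    stack_t *st = malloc(sizeof *st);
    if (st == NULL)
        return NULL;                 /* let the caller decide what to do */
    st->base = malloc(size * sizeof *st->base);
    if (st->base == NULL) {
        free(st);                    /* don't leak the descriptor */
        return NULL;
    }
    st->top = st->base;
    st->size = size;
    return st;
}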

jacob navia

Dec 12, 2009, 9:10:38 AM
jacob navia wrote:

>
> The book is not a foolproof recipe book; it is a book about
> advanced data structures. The code is just to illustrate the text!

What I mean by this sentence is that code for an
algorithm is easier to read if the error handling code is not
shown.

What I am interested in with this book is to learn about the
data structures themselves, not about good software practices,
testing the return value of malloc, writing int main(void) or
similar problems.

In my implementation of the heap I did test for NULL return,
and I issue the corresponding error. In a book about advanced
data structures that is just a distracting detail.

Message has been deleted

James Dow Allen

Dec 12, 2009, 9:56:06 AM
On Dec 12, 7:50 pm, jacob navia <ja...@spamsink.net> wrote:
> James Dow Allen wrote:
> > On Dec 12, 4:03 pm, jacob navia <ja...@spamsink.net> wrote:

> >> [Check out Peter Brass's book.  Source code on-line.]
>
> > I looked at the source for tries (digital search trees) and was
> > disappointed to see that he didn't use "traversor structures."
> > I know Ben Pfaff uses such things, and it seems excellent idea.
> > Do others use them?  Can Brass' examples be recommended if he
> > *doesn't* use them?
>
> Because in many examples he shows a first "trivial" implementation that
> he later refines more and more. If you would explain what you
> mean by "traversor structures", we could go further.

I examined several modules linked from
http://www-cs.ccny.cuny.edu/~peter/dstest.html
all with the same unfortunate result. Let us admit,
however, that much or most other tree-traversing code also
does *not* use such "traversor structures". Ben Pfaff's
code does and, whether or not he claims it as original
invention, he can doubtless do better than I at lucid
discussion thereof.

Not counting the rather trivial
typedef char object_t;
Brass's code declares a single object type
(node), where I'd use at least another two.
I'd add a table or table-descriptor type, rather than
having just a certain node (the root) be all
there is to a "tree."
surely introduce for all but the most trivial
search tree is "traversor", which holds the
state of a search in progress (or search completed).

To understand the need, consider that we operate
the tree searcher as a co-routine, returning the
entire tree in pre- or post-order. Brass's
code can't do that: it has create(), find(), insert()
and delete() but no treewalker(). When you write
treewalker() you'll see the need for the traversor
object; once you implement it you find it convenient
to use during find(), delete(), etc. as well.
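
A rough sketch of the idea (an illustration with invented names, neither
Ben Pfaff's code nor Brass's): the traversor holds the explicit stack of
an in-order walk, so the caller can pull one node at a time, co-routine
style.

#include <stdlib.h>

typedef struct node {
    int key;
    struct node *left, *right;
} node;

typedef struct traversor {
    node *stack[64];        /* assumes tree depth <= 64 */
    int depth;
} traversor;

static void trav_descend(traversor *t, node *n)
{
    while (n != NULL) {     /* push the whole left spine */
        t->stack[t->depth++] = n;
        n = n->left;
    }
}

void trav_init(traversor *t, node *root)
{
    t->depth = 0;
    trav_descend(t, root);
}

node *trav_next(traversor *t)   /* NULL when the walk is done */
{
    node *n;
    if (t->depth == 0)
        return NULL;
    n = t->stack[--t->depth];
    trav_descend(t, n->right);
    return n;
}

Typical use is a pull loop such as
for (trav_init(&t, root); (n = trav_next(&t)) != NULL; ) visit(n);
and the same state object can then serve find(), delete(), etc. as well.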

James Dow Allen

Ben Bacarisse

Dec 12, 2009, 10:08:40 AM
James Dow Allen <jdall...@yahoo.com> writes:

> On Dec 12, 4:03 pm, jacob navia <ja...@spamsink.net> wrote:
>
>> [Check out Peter Brass's book. Source code on-line.]
>
> I looked at the source for tries (digital search trees) and was
> disappointed to see that he didn't use "traversor structures."
> I know Ben Pfaff uses such things, and it seems an excellent idea.
> Do others use them? Can Brass's examples be recommended if he
> *doesn't* use them?

I'll pass on that for the moment (for one thing, I have not read the
book so there may be good reasons to keep the emphasis on other
aspects of the implementation -- it would not be fair to comment on
the basis of the code alone).

However, it seems a shame not to have had someone who knows C really
well read over the code first. There is, of course, cast malloc calls
(almost everyone does this so I suppose that fight is over) but there
is also sizeof(char) and (int)'\0'. In fact lots of expressions that
will promote to int are cast to int, often with extra parentheses.

We also have an idiosyncratic layout that is never consistent. Almost
every possible arrangement of spaces is used with parentheses ([] and
() alike) and operators, for example.

That last is just my being fussy, but there is also a fixed size stack
that can overflow in the trie delete code and comments like:

next_char += 1; /* move to next character */

I'd also prefer to see returns from malloc checked and a rather less
cavalier attitude to integer overflow. There are real cases
where un-trapped overflow leads to a negative hash table index and
where other assumptions about arithmetic lead to similar UB. I could
provoke a segmentation fault from the example code very easily.

If the book is otherwise good, all of these are a real shame.
Someone could probably have corrected the code with only a few days'
work.
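
To illustrate the hash-index point (a sketch with invented names, not
the book's code): doing the arithmetic in an unsigned type cannot go
negative, whereas signed overflow is undefined and, where it wraps,
can produce a negative index.

#include <stddef.h>

size_t hash_string(const char *s, size_t table_size)
{
    size_t h = 0;
    /* size_t arithmetic wraps harmlessly; a signed int here
       could overflow (UB) and end up negative. */
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % table_size;   /* always a valid index */
}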

<snip>
--
Ben.

Ben Bacarisse

Dec 12, 2009, 10:25:08 AM
jacob navia <ja...@spamsink.net> writes:

Stefan Ram wrote:


>> jacob navia <ja...@spamsink.net> writes:
>>> I find this book excellent. Everything is written in C, it
>>
>> It seems to allocate a stack entry
>>
>> st = (stack_t *) malloc( ...
>>
>> and then, in the next line
>>
>> st->base = ...
>>
>> it seems to use »st« without ever taking account for the
>> possibility that »st« might be 0?
>>
>> This might be fine when a program is used by the programmer
>> himself, who can cope with segment violations or other
>> strange runtime-behavior. But it might not be appropriate
>> when software is intended to be delivered as a product to a
>> customer who needs to depend on it.
>>
> He does that, as he explains in the text, because they are replaced
> in the next section by a heap implementation where he builds an
> intermediate layer (as I did for the list container) that allocates
> nodes from a linked list of blocks.

I don't see how that solves the problem for a malloc call. Also, if
the free list code is as in the online examples, it too suffers from
un-tested malloc returns so the problem is simply duplicated rather
than solved.

> The book is not a foolproof recipe book; it is a book about
> advanced data structures. The code is just to illustrate the text!

--
Ben.

Eric Sosman

Dec 12, 2009, 10:55:09 AM
On 12/12/2009 9:54 AM, Stefan Ram wrote:

> jacob navia<ja...@spamsink.net> writes:
>> What I mean by this sentence is that code for an
>> algorithm is easier to read if the error handling code is not
>> shown.
>
> I have another definition of »algorithm«: An algorithm is
> the implementation of some feature using the means of a
> specific target language. [...]

You're free to use words in your own way ("Who's to be
master?" -- H.D.), but your definition of "algorithm" is at
variance with the accepted meaning. That's likely to create
confusion and raise barriers to communication, just as if
you made an appointment to meet someone at noon, using your
own private definition of "noon" as "when the Moon rises."

But back to "algorithm:" Under your definition, there's
one "algorithm" for finding the GCD of two numbers in C,
another for C++, another for Basic, another for Java, ...
As a competent programmer, it seems you must learn a new
"algorithm" for every programming language, perhaps even for
every framework. That seems an awfully heavy cognitive burden.

In fact, under your definition "Euclid's Algorithm" is
an empty fiction, and ought to be something like "Euclid's
Recipe" or "Euclid's Way Of Going About It." There is
clearly an implementation-independent Something about this
GCD-finding approach; what do you call that i-i S, since
you use "algorithm" for something else?

Please answer before "noon." ;-)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Message has been deleted

jacob navia

Dec 12, 2009, 11:29:07 AM
Stefan Ram wrote:
>
> If you have another definition of »algorithm« than I do,
> feel free to post its wording.
>

www.dictionary.com:

al⋅go⋅rithm
–noun
A set of rules for solving a problem in a finite number of steps, as for
finding the greatest common divisor.

Origin:
1890–95; var. of algorism, by assoc. with Gk arithmós number. See
arithmetic

Note that no programming language (or language at all) is mentioned.
An algorithm is a set of rules to do something mathematically. The only
condition is that the number of steps should be finite.

jacob navia

Dec 12, 2009, 11:34:56 AM
Stefan Ram wrote:
> jacob navia <ja...@spamsink.net> writes:
>> In my implementation of the heap I did test for NULL return,
>> and I issue the corresponding error. In a book about advanced
>> data structures that is just a distracting detail.
>
> I would not actually issue from a library function in the
> sense of a side-effect with access to external files or
> consoles or so, but just return 0.

I call the established error function for the container where the error
happens. Depending on the severity of the error, that function
(which the user of the library can replace with a function of
his/her own) exits, shows an error message, or just returns.
If the library error function returns, then I just pass the error
to the calling function.

jacob

Message has been deleted
Message has been deleted

Phil Carmody

Dec 12, 2009, 11:47:34 AM
r...@zedat.fu-berlin.de (Stefan Ram) writes:
> jacob navia <ja...@spamsink.net> writes:
>>al⋅go⋅rithm
>>–noun

>>A set of rules for solving a problem in a finite number of steps, as for
>>finding the greatest common divisor.
>>Note that no programming language (or language at all) is mentioned.
>>An algorithm is a set of rules to do something mathematically. The only
>>condition is that the number of steps should be finite.
>
> Those »steps« must assume that some »elementary operations«
> are already given. Therefore, the steps depend on those
> elementary operations.
>
> You can not give any algorithm without this assumption
> (of a given set of elementary operations) (just try it).

Yes, but those steps do not need to assume that those
elementary operations are native to the language, merely
implementable in that language.

Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1

Message has been deleted

Ben Bacarisse

Dec 12, 2009, 12:32:31 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
>>>Those »steps« must assume that some »elementary operations«
>>>are already given. Therefore, the steps depend on those
>>>elementary operations.

>>Yes, but those steps do not need to assume that those
>>elementary operations are native to the language, merely
>>implementable in that language.
>

> For a calculation to be actually executable in a language,
> it is not sufficient for those operations to be merely
> "implementable". They have to be /actually implemented/. Thus,
> their implementation is a necessary part of the implementation
> of the calculation.
>
> In Haskell, the calculation of the GCD of x and y just seems to be:
>
> gcd x y
>
> http://www.zvon.org/other/haskell/Outputprelude/gcd_f.html
>
> . In C, it might be written instead:
>
> int gcd( int a, int b )
> { int c; while( a ){ c = a; a = b % a; b = c; } return b; }
>
> (...)
>
> gcd( x, y )

But these are different algorithms in both senses -- i.e. both Eric's
and your use of the term -- so the example is not a good one. Eric's
question is what, if anything, you call the idea shared by these
two functions (one Haskell, the other C):

igcd a 0 = a
igcd a b = igcd b (a `rem` b)

int igcd(int a, int b)
{
    return b ? igcd(b, a % b) : a;
}

Most people are happy to say that they embody some common thing -- a
recursive GCD algorithm -- despite the huge differences in form and,
indeed, semantics. Do you not bother to name or talk about this
commonality?

<snip>
--
Ben.

Phil Carmody

Dec 12, 2009, 12:36:25 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:
> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
>>>Those »steps« must assume that some »elementary operations«
>>>are already given. Therefore, the steps depend on those
>>>elementary operations.
>>Yes, but those steps do not need to assume that those
>>elementary operations are native to the language, merely
>>implementable in that language.
>
> For a calculation to be actually executable in a language,
> it is not sufficient for those operations to be merely
> "implementable". They have to be /actually implemented/.

You're obviously not a theorist. Actual executability is pretty
darn irrelevant when it comes to theoretical algorithmics.

> Thus,
> their implementation is a necessary part of the implementation
> of the calculation.

As that depends on what is false to theorists, it's just as false.

> In Haskell, the calculation of the GCD of x and y just seems to be:
>
> gcd x y

No. _A_ calculation of the GCD is that. There's nothing preventing
an explicit Euclidean algorithm.

Message has been deleted

Ben Bacarisse

Dec 12, 2009, 12:47:27 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>> igcd a 0 = a
>> igcd a b = igcd b (a `rem` b)
>>
>> int igcd(int a, int b)
>> {
>>     return b ? igcd(b, a % b) : a;
>> }
>
>>Most people are happy to say that they embody some common thing -- a
>>recursive GCD algorithm -- despite the huge differences in form and,
>>indeed, semantics. Do you not bother to name or talk about this
>>commonality?
>

> I agree that they share something. It is the exploitation of
> the same mathematical theorem.

That's fair enough, but I won't be using your terminology any time
soon and I think you'll have to accept that most people call this
commonality "an algorithm" not "the exploitation of a mathematical
theorem".

> I just do not like the idea of this theorem being useful to
> calculate the GCD in each and every programming language,
> because, for example:

Yes, but I don't think anyone said that. If you have recursion and a
mod operation in language X, the above "algorithm" can be represented
in X. No one has said it is good to do so or the best way to calculate
GCD or any such thing.

> - when the language already does have a GCD, it is not
> required, and
>
> - when the language does not have a primitive remainder
> operation, there might be better means to perform
> the calculation (possibly exploiting other mathematical
> theorems).

--
Ben.

Richard Harter

Dec 12, 2009, 12:58:08 PM
On 12 Dec 2009 16:23:35 GMT, r...@zedat.fu-berlin.de (Stefan Ram)
wrote:

>Eric Sosman <eso...@ieee-dot-org.invalid> writes:
>>>I have another definition of »algorithm«: An algorithm is
>>>the implementation of some feature using the means of a
>>>specific target language. [...]

>>one "algorithm" for finding the GCD of two numbers in C,
>>another for C++, another for Basic, another for Java, ...
>

> The GCD requires the mod operation. So, when a language
> does not have a mod operation, an implementation of the
> mod operation becomes part of the implementation of the
> GCD, otherwise it doesn't.

There is a distinction that you are blurring over - the
distinction between an algorithm and the implementation of an
algorithm. Algorithms are ordinarily understood to be a
description of a suite of steps that can be taken to achieve some
specified results. The key point is that the algorithm specifies
what the steps must do rather than how they do what they must do.

As a matter of convenience programming languages are commonly
used to describe algorithms. Unfortunately most programming
languages express "how to do something" but not "what is to be
done". As a result, algorithms are often specified in terms of
an implementation. This is a regular source of confusion - it
becomes easy to confuse algorithms and their implementations.


Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.

Message has been deleted

Seebs

Dec 12, 2009, 1:28:46 PM
On 2009-12-12, jacob navia <ja...@spamsink.net> wrote:
> In ruby, you have built-in hash tables, sets, and many other containers.
> You have to use the implementation that the language offers. Obviously
> you *could* write your own, but all performance considerations would
> be difficult to measure in an interpreter.

Yeah. On the other hand, there's usually not much REASON to use other
implementations.

> This is surely not really obvious for people not speaking ruby
> fluently...

Not especially, no.

> But exactly THAT makes you AWARE of that overhead, that is an inherent
> part of the algorithm's cost!

Yes. I use Ruby for things where I'm pretty sure modern computers are
Fast Enough. Then I'm pretty much okay with ignoring the overhead in most
cases, and for most of what I write, the cost of the time it would take me
to implement it in a less expressive language far exceeds the cost of the
computer Ruby will need to get it done faster anyway. :)

> Hiding "details" can be OK, but when designing an algorithm and
> evaluating its cost, those details do matter. In ruby or java
> memory management costs are hidden by the language, and you are
> forced to use the OO paradigm.

Often, the details matter much less than you might expect. I did an
iterative fractal design once, and spent quite a while paying attention
to those low-level details, finally managing to get about a 40% increase.
Then I eliminated the 0-length lines from the end of one segment to the
beginning of the next, and by about depth 5-6, I had eliminated more
than half of the points drawn, yielding a >50% increase, and >1000% for
the levels of depth where that pushed the data set outside what you could
keep in memory at a time...

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Malcolm McLean

Dec 12, 2009, 1:39:52 PM
"Seebs" <usenet...@seebs.net> wrote in message
>
> C is pretty expressive, but a lot of things end up with a bunch of
> overhead that isn't actually the implementation of the algorithm, but
> rather, the bookkeeping to allow that implementation to work.
>
The problem is that it leads to a dichotomy between "teaching" and
"industrial" code. I chose C for Basic Algorithms, but it meant that some of
the functions would fail on extreme inputs. The only way round is to add
lots of error-checking and memory management code, which would distract from
the algorithm.

However, the publishers didn't like the fact that it was in C.


Flash Gordon

Dec 12, 2009, 1:38:42 PM
Stefan Ram wrote:
> r...@zedat.fu-berlin.de (Stefan Ram) writes:
>> In Haskell, the GCD is very »straightforward«, as it's
>> fundamental assertions can be written »directly«:
>
> »it's« should be »its« instead.
>
>> But in Prolog, a »Tricky version« sometimes is needed:
>
> And the most obvious example: In a language, where the GCD
> can be calculated with the operator »o«, the algorithm for
> the GCD of x and y is just »x o y«.

That is no more an algorithm for GCD than the word "algorithm" is a
definition of the word "algorithm". At least, it is not an algorithm in
the sense the word is commonly used in English, computer science or
mathematics. Consider that it is common to talk about whether the C
library function qsort actually uses the quicksort algorithm. It is also
possible to talk about which algorithm a Haskell implementation used to
calculate the GCD when you use its GCD operator (without needing to
worry about the language Haskell has been implemented in). You can also
ask if there is an algorithm to solve the halting problem without
needing to specify the language you would implement it in.
--
Flash Gordon

Message has been deleted

Kaz Kylheku

Dec 12, 2009, 2:29:03 PM
On 2009-12-12, jacob navia <ja...@spamsink.net> wrote:
> For instance if we design an algorithm without explicit free, using a
> garbage collector (and with JAVA you do not have any other choice) the
> cost of an algorithm that uses memory sparingly and another that
> needlessly thrashes memory is not visible in the code you write.

That is false.

Algorithms which use memory reveal it by the calls that they make to
functions which allocate memory. Garbage collection does not eliminate
the allocation calls.

If we can count the allocation calls, we know how much the algorithm
``conses'', and that tells us how much pressure it exerts on the garbage
collector.

Message has been deleted

Eric Sosman

Dec 12, 2009, 3:42:43 PM
On 12/12/2009 11:23 AM, Stefan Ram wrote:
> [...]
> If you have another definition of »algorithm« than I do,

> feel free to post its wording.

Knuth devotes nine full pages to this topic, and I decline
to type them (besides, that much bulk might stretch the bounds
of the "fair use" doctrine for copyright).

In those nine pages, the only part that seems to come at
all close to your mechanistic definition is the requirement of
"definiteness:" We must know how to carry out each step the
algorithm specifies. An algorithm cannot say "Set N equal to
the smallest even number not expressible as a sum of two primes,"
because we do not know how to find such a number (we do not even
know whether such a number exists). In your example of a system
without a direct means of calculating remainders, an algorithm
cannot just say "take the remainder."

... except that it can! We *do* know how to calculate
remainders without an explicit "remainder" operation, or even
without "divide" and "multiply," if it comes to that. We have
(wait for it) algorithms for doing such calculations, and we
can use them as sub-algorithms in our master algorithm. This
does not, I claim, change the master algorithm in the slightest:
It says "Find the remainder," and we use whatever means we choose
to do so: A built-in remainder operator like `%', or a divide-
multiply-subtract sequence, or even just a subtraction loop:

unsigned int remainder(unsigned int val, unsigned int div)
{
    while (val >= div)
        val -= div;
    return val;
}

I agree that an implementation of Euclid's Algorithm using
this remainder() function will be a different implementation than
one using `%', and will probably have different timing and so on.
But I do not subscribe to the idea that Euclid's Algorithm itself
is in any way dependent on the mechanisms used to carry out the
steps it calls for. It's a musical score; the implementation is
a performance.

--
Eric Sosman
eso...@ieee-dot-org.invalid

jacob navia

Dec 12, 2009, 4:00:20 PM
Kaz Kylheku wrote:

> On 2009-12-12, jacob navia <ja...@spamsink.net> wrote:
>> For instance if we design an algorithm without explicit free, using a
>> garbage collector (and with JAVA you do not have any other choice) the
>> cost of an algorithm that uses memory sparingly and another that
>> needlessly thrashes memory is not visible in the code you write.
>
> That is false.
>
> Algorithms which use memory reveal it by the calls that they make to
> functions which allocate memory.

True

> Garbage collection does not eliminate
> the allocation calls.
>

But it eliminates the disposal and memory management calls. Yes, they
reveal half of the job, but not all of it.

An algorithm that manages a free list to avoid free/malloc calls is
much more complex than one that doesn't and has just the allocation
calls.

> If we can count the allocation calls, we know how much the algorithm
> ``conses'', and that tells us how much pressure it exerts on the garbage
> collector.

But we have no control over the workings of the release part. Unless,
of course, we do our own memory management using preallocated arrays
(which is possible in Java, but then we would be writing C in Java...)
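
A compressed sketch of the kind of intermediate layer being described
(the shape is assumed; this is neither jacob's container code nor
Brass's): nodes are carved out of larger blocks and recycled through a
free list, so the algorithm proper never calls free. (The blocks
themselves are never returned to the system in this sketch.)

#include <stdlib.h>

#define BLOCK_NODES 256

typedef struct node {
    struct node *next;   /* doubles as the free-list link */
    /* ... payload fields ... */
} node;

static node *free_list = NULL;

node *get_node(void)
{
    node *n;
    if (free_list == NULL) {
        /* Grab a whole block and chain its nodes together. */
        node *block = malloc(BLOCK_NODES * sizeof *block);
        int i;
        if (block == NULL)
            return NULL;
        for (i = 0; i < BLOCK_NODES - 1; i++)
            block[i].next = &block[i + 1];
        block[BLOCK_NODES - 1].next = NULL;
        free_list = block;
    }
    n = free_list;
    free_list = free_list->next;
    return n;
}

void return_node(node *n)
{
    n->next = free_list;   /* recycle instead of calling free */
    free_list = n;
}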


Keith Thompson

Dec 12, 2009, 4:02:03 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:
> r...@zedat.fu-berlin.de (Stefan Ram) writes:
>>I only do not deem such an approach to be »language
>>independent« - as it is sometimes claimed - because it might
>>not be possible with other types of languages (like
>>functional or logical languages). Insofar, I say that there
>>is no "language-independent algorithm", i.e., a general
>>recipe, that can be translated nearly mechanically into
>>every programming language.
>
> Especially, one effect of such thinking is to write C code
> by applying »general algorithms« for list structures,
> ignoring that in C, memory management is an integral part of
> such algorithms, while in, say, Java, it is not.

It shouldn't have such an effect for competent programmers.

You can, for example, implement an AVL tree insertion algorithm in C,
C++, Java, Perl, Ada, INTERCAL, x86 assembly language, and so forth.
The resulting code is likely to be very different, particularly
regarding memory management and error handling.

I'd say that the *algorithm* is an abstraction, the part of the
method that doesn't necessarily change from one language to another.
(And if it does change, perhaps because some language provides
some fancy feature that makes the whole process easier, then I'd
say you're implementing a different algorithm in that language.)

There's a lot of handwaving here, and non-procedural languages
mess up the argument in ways that I haven't thought through.

[Stefan, why do you indent your text by two columns? I find
that it makes your articles slightly more difficult to read.
Indentation on Usenet normally denotes quoted text.]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ian Collins

Dec 12, 2009, 4:08:06 PM
Bruce Cook wrote:
> bartc wrote:
>
>> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> [...]
>> There might be more apt languages than C (not everyone is comfortable with
>> zero-based indexing for example), but if it's a choice between 'MIX' and
>> C, then C wins.
>
> I'm interested in why you'd pick zero-based indexing as something people
> would specifically have problems with in C.
>
> I personally find zero-based indexing fairly natural; other languages
> with 1-based indexing, such as Fortran and Cobol, used to bug me, especially after
> I had started programming in C and "knew there was a better way" :)

It's a pity K&R2 abandoned 0 based chapter numbers!

--
Ian Collins

Eric Sosman

Dec 12, 2009, 4:08:59 PM
On 12/12/2009 2:26 PM, Stefan Ram wrote:
> Eric Sosman<eso...@ieee-dot-org.invalid> writes:
>> In fact, under your definition "Euclid's Algorithm" is
>> an empty fiction, and ought to be something like "Euclid's
>> Recipe" or "Euclid's Way Of Going About It." There is
>> clearly an implementation-independent Something about this
>> GCD-finding approach; what do you call that i-i S, since
>> you use "algorithm" for something else?
>
> When one considers a family of certain procedural languages
> (like C and Java) that share certain features, one can
> indeed map some corresponding features to each other and
> then say that two different code units in those languages
> use an equivalent approach, which might be called »algorithm«.

>
> I only do not deem such an approach to be »language
> independent« - as it is sometimes claimed - because it might
> not be possible with other types of languages (like
> functional or logical languages). Insofar, I say that there
> is no "language-independent algorithm", i.e., a general
> recipe, that can be translated nearly mechanically into
> every programming language.

Okay, fine: As I said at the outset, you're entitled to use
words to mean whatever you please. Just don't expect to be
understood, because others are not likely to have access to your
private dictionary. (Even when you offer excerpts, few people
will be highly motivated to memorize them just for the pleasure
of deciphering your idiosyncratic usage.)

Communication is founded on convention; if you disregard the
conventions you communicate less effectively.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Message has been deleted
Message has been deleted

Antoninus Twink

Dec 12, 2009, 5:25:55 PM
On 12 Dec 2009 at 21:02, Keith Thompson wrote:
> There's a lot of handwaving here, and non-procedural languages
> mess up the argument in ways that I haven't thought through.

I find no mention of non-procedural languages in the ISO C Standard, so
you have once again violated your idiotic "topicality" constraints.

Hypocrite.

> [Stefan, why do you indent your text by two columns? I find
> that it makes your articles slightly more difficult to read.
> Indentation on Usenet normally denotes quoted text.]

Interesting that you don't challenge your buddy Sosman's eccentric
indentation and annoyingly small line width.

Hypocrite.

Phil Carmody

Dec 12, 2009, 5:31:25 PM
Keith Thompson <ks...@mib.org> writes:
> r...@zedat.fu-berlin.de (Stefan Ram) writes:
>> r...@zedat.fu-berlin.de (Stefan Ram) writes:
>>>I only do not deem such an approach to be »language
>>>independent« - as it is sometimes claimed - because it might
>>>not be possible with other types of languages (like
>>>functional or logical languages). Insofar, I say that there
>>>is no "language-independent algorithm", i.e., a general
>>>recipe, that can be translated nearly mechanically into
>>>every programming language.
>>
>> Especially, one effect of such thinking is to write C code
>> by applying »general algorithms« for list structures,
>> ignoring that in C, memory management is an integral part of
>> such algorithms, while in, say, Java, it is not.
>
> It shouldn't have such an effect for competent programmers.
>
> You can, for example, implement an AVL tree insertion algorithm in C,
> C++, Java, Perl, Ada, INTERCAL, x86 assembly language, and so forth.

<humour type="phil's" explanation="in headers">
I don't necessarily agree with your logical inference there. Just
because you can implement it in all those imperative languages doesn't
mean you can implement it in a stack language.
</humour>

> The resulting code is likely to be very different, particularly
> regarding memory management and error handling.

And determinism, please! I'm sure you can see what angle I come from.

Phil Carmody

Dec 12, 2009, 5:32:56 PM
Keith Thompson <ks...@mib.org> writes:
> r...@zedat.fu-berlin.de (Stefan Ram) writes:
[...]

> [Stefan, why do you indent your text by two columns? I find
> that it makes your articles slightly more difficult to read.
> Indentation on Usenet normally denotes quoted text.]

I'm tempted to dub the style "blockhead quoting".

Flash Gordon

Dec 12, 2009, 6:14:28 PM
> Sometimes, some people need to disagree with some of those
> conventions. For example, there is a certain convention
> allowing one to speak of entities, such as »god«, »absolute
> time«, »sadness« or »dark matter«. But some speakers might
> like to emphasize that they do not believe there is an actual
> entity in the world that corresponds to some of these words.

Fine, say that algorithms don't exist. There are plenty of words for
things that don't exist, such as bandersnatch. Appropriating the word
for some other purpose just because you believe what it refers to does
not exist only leads to confusion. If you don't believe dark matter
exists, you could just as well use that as your term!
--
Flash Gordon

Bruce Cook

Dec 12, 2009, 8:27:04 PM
Nick wrote:

> Bruce Cook <bruce-...@noreplybicycle.synonet.comnoreply> writes:
>
>> bartc wrote:
>>
>>>
>>> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>> [...]
>>> There might be more apt languages than C (not everyone is comfortable
>>> with zero-based indexing for example), but if it's a choice between
>>> 'MIX' and C, then C wins.
>>
>> I'm interested in why you'd pick zero-based indexing as something people
>> would specifically have problems with in C.
>>
>> I personally find zero-based indexing fairly natural; other languages
>> with 1-based indexing, such as Fortran and Cobol, used to bug me,
>> especially after I had started programming in C and "knew there was a
>> better way" :)
>>

>> Zero-based indexing certainly makes more sense mathematically.
>
> Once you get into "friendly" string manipulation, there are a few
> arguments for 1-based indexing, particularly when combined with "zero is
> false,
> everything else is true". For example, it would be "nicer" in
> Javascript to be able to write »if(somestring.indexOf("test"))« rather
> than having to add »!=-1« at the end. Of course, C using pointers for
> this manages to do it quite well with the NULL return from strstr.

Yeah, I agree, zero-based indexing is a pain in function results. I think
this is more a problem of using the single value range for signaling in
function returns than of zero-based indexing itself.

Perl almost gets it right by having the undef & !exists conditions on
scalars, but its usefulness falls down by treating 0 and undef in the same
manner in bool expressions, so you can't just use the same return result to
test function failure & result values.

Try-except steps around this by providing a separate signaling channel and
a standard error/exception handling mechanism for the language, but
introduces its own set of problems, I think.

Bruce

Bruce Cook

Dec 12, 2009, 8:42:31 PM
Stefan Ram wrote:

> Eric Sosman <eso...@ieee-dot-org.invalid> writes:
>>>I have another definition of »algorithm«: An algorithm is


>>>the implementation of some feature using the means of a
>>>specific target language. [...]
>>one "algorithm" for finding the GCD of two numbers in C,
>>another for C++, another for Basic, another for Java, ...
>
> The GCD requires the mod operation. So, when a language
> does not have a mod operation, an implementation of the
> mod operation becomes part of the implementation of the
> GCD, otherwise it doesn't.
>

> In Haskell, the GCD is very »straightforward«, as it's

> fundamental assertions can be written »directly«:
>
> http://en.literateprograms.org/Euclidean_algorithm_(Haskell)


>
> But in Prolog, a »Tricky version« sometimes is needed:
>

> http://en.literateprograms.org/Euclidean_algorithm_%28Prolog%29
>
> , which is another means to calculate the GCD.
[...]

In my experience, any algorithm I implement is the same
regardless of the language. I will use the tools the language supplies &
program within the constraints of the language but the algorithm is
recognizably the same.

A programming language simply provides a toolkit to express algorithms;
different languages have different ways of looking at things & different
tools you can use to attack problems but essentially a quicksort in one
language is the same as a quicksort in another (otherwise by definition it's
not a quicksort).

C implementations provide a quicksort library function (other languages may
have this built in as a language primitive). If you use those (and you
should) you're not implementing a quicksort algorithm, you're using a
language tool.

(and just for the pedants I'm using the generic form of "you")

Bruce

Keith Thompson

Dec 12, 2009, 10:19:47 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Keith Thompson <ks...@mib.org> writes:
>>[Stefan, why do you indent your text by two columns? I find
>>that it makes your articles slightly more difficult to read.
>>Indentation on Usenet normally denotes quoted text.]
>
> I do indent the main body of natural language text so as to
> distinguish it from quotations of formal language text,
> while I often do not indent these quotations of formal
> language text.
>
[fairly reasonable explanation snipped]

Ok, but since the vast majority of other posters consistently write
original text left-justified, using a different and inconsistent
convention is more likely to cause confusion than to convey whatever
information you're trying to convey.

Python indentation is unlikely to be relevant here in comp.lang.c,
and C code can almost always be indented without loss of meaning.
Even if you'd rather not indent C code, it looks different enough
from plain text that including a block of code between paragraphs
of English won't cause any problems.

See "beautiful new impediments to understanding".

Give it some thought.

Keith Thompson

Dec 12, 2009, 10:22:37 PM
Bruce Cook <bruce-...@noreplybicycle.synonet.comnoreply> writes:
[...]

> C implementations provide a quicksort library function (other languages may
> have this built in as a language primitive). If you use those (and you
> should) you're not implementing a quicksort algorithm, you're using a
> language tool.
[...]

Quibble: C provides a qsort() library function, which may or may
not be an implementation of the Quicksort algorithm (though the
name was almost certainly inspired by the name of the algorithm).

Kenny McCormack

Dec 12, 2009, 10:34:54 PM
In article <ln8wd7l...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:
...

>See "beautiful new impediments to understanding".
>
>Give it some thought.

Isn't it funny how easy it is to cause Kiki to lose his precarious sense
of balance?

Bruce Cook

Dec 12, 2009, 11:06:50 PM
Keith Thompson wrote:

> Quibble: C provides a qsort() library function, which may or may
> not be an implementation of the Quicksort algorithm (though the
> name was almost certainly inspired by the name of the algorithm).

:) yep, true.


Curtis Dyer

Dec 13, 2009, 4:13:14 AM
On 12 Dec 2009, Seebs <usenet...@seebs.net> wrote:

> On 2009-12-12, Richard Heathfield <r...@see.sig.invalid> wrote:
>> For its power to communicate algorithmic ideas readably, I
>> would choose C not just over MIX, but over Java, C++, Li*p...
>> everything. Have you a more apt language in mind?

I would have to agree, and I actually think PHP would be pretty
good, too. I'm biased, though; it was the first programming language
I learned, so it has a special place in my heart. (I know, I'm cheesy
and lame.)

<snip>

> C is pretty expressive, but a lot of things end up with a bunch
> of overhead that isn't actually the implementation of the
> algorithm, but rather, the bookkeeping to allow that
> implementation to work.

Although I don't have a large amount of experience with C, it still
seems that, for the most part, you could modularize the bookkeeping
code so that the actual implementation of the algorithm will read
more clearly. Also, as Jacob mentioned, sometimes the explicit
handling of low-level things like memory management can be relevant in
crafting the actual implementation of an algorithm.

--
"Don't worry about efficiency until you've attained correctness."
~ Eric Sosman, comp.lang.c

Dann Corbit

Dec 14, 2009, 2:54:32 PM
In article <HoudncDt8dANz77W...@bt.com>,
r...@see.sig.invalid says...

>
> bartc wrote:
> >
> > "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> > news:i4qdnVefnqKEdr_W...@bt.com...
>
> <snip>
>
> >> It is precisely because (well-written) C is so
> >> good at describing algorithms in all their nitty-gritty detail that it
> >> is the lingua franca of nearly all the best general programming books.

> >
> > There might be more apt languages than C (not everyone is comfortable
> > with zero-based indexing for example), but if it's a choice between
> > 'MIX' and C, then C wins.
>
> Anyone who is not comfortable with 0-based indexing needs to buy a new
> mattress. The contortions you have to go through to use other bases just
> melt away when you switch to 0.

>
> For its power to communicate algorithmic ideas readably, I would choose
> C not just over MIX, but over Java, C++, Li*p... everything. Have you a
> more apt language in mind?

If you use C++ templates, then you can write algorithms generically.
These generic algorithms are still typesafe.

That is the reason that we find huge algorithmic solution sets like STL
and BOOST in C++ but not in C.

You can, of course, accomplish the same thing in C, but it is harder and
less natural.

jacob navia

Dec 14, 2009, 3:08:49 PM
Dann Corbit wrote:

>
> If you use C++ templates, then you can write algorithm generically.
> These generic algorithms are still typesafe.
>

Yes. The problem with C++ is in other parts of the language and
in its enormous complexity. Templates are a good idea. You can
accomplish pretty much the same thing with a preprocessed text
file in C that takes a template file and adapts it to the specific
data type given.
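
A minimal sketch of that technique (an invented example, not jacob's
actual container library): a macro "template" instantiated once per
element type, with token pasting supplying the type-specific names.

#include <stdlib.h>

#define DEFINE_LIST(TYPE, SUFFIX)                                 \
    typedef struct list_##SUFFIX {                                \
        TYPE value;                                               \
        struct list_##SUFFIX *next;                               \
    } list_##SUFFIX;                                              \
                                                                  \
    static list_##SUFFIX *push_##SUFFIX(list_##SUFFIX *head,      \
                                        TYPE v)                   \
    {                                                             \
        list_##SUFFIX *n = malloc(sizeof *n);                     \
        if (n == NULL)                                            \
            return head;  /* failure: list left unchanged */      \
        n->value = v;                                             \
        n->next = head;                                           \
        return n;                                                 \
    }

DEFINE_LIST(int, int)      /* defines list_int and push_int */
DEFINE_LIST(double, dbl)   /* defines list_dbl and push_dbl */

Each instantiation is type-checked by the compiler, which is roughly
what a C++ template expansion buys, minus the automatic deduction.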

> That is the reason that we find huge algorithmic solution sets like STL
> and BOOST in C++ but not in C.
>

I think that no development is done in C in this field because C has been
abandoned as a development language for this type of solution, not
because C can't do it. The book I proposed has red/black trees, for instance,
which form the basis for several data structures in the STL. They can
be programmed in C, obviously using void pointers, but still in a
fairly simple manner.

> You can, of course, accomplish the same thing in C, but it is harder and
> less natural.
>

Yes, you can accomplish the same thing in C++, but using that complex language
is harder and less natural than in C.

:-)

jacob

Kaz Kylheku

Dec 14, 2009, 3:38:04 PM

The reason the solutions are huge in C++ templates is that the C++ template
language is extremely clumsy. It's driven by declarations, and can fail to do
even some obvious inference. Classes have to be used in order to emulate
type-valued functions. If you want a function f:U, V -> T where U, V and T are
types, rather than values, then you can't just write a function. You have to
write a class, where the return value T is hacked as some member. And you
can't write a body of algorithmic code to compute that member; it all has to be
done in terms of templates: the class member which returns the function's
type-value is either just declared literally, or computed using calls
to other templates, which is very limited in terms of expressivity.
Things like if/else decisions all have to be coded as template classes.
Then you have to write reams of specializations for that class for various
instantiations of U and V. Oh yeah, and the C++ template system sometimes
doesn't even know when someting is intended to be a type, so you have to remind
it with ``typename''. The whole thing is the avoidance of giving the
programmer what is really needed: namely a compile-time meta-programming
language, in which you have functions and variables whose values are types and
pieces of syntax.

The C++ template system is the work of people who are so gifted that they don't
think they have to study the computer science which came before them.

Look at, say, Template Haskell. There we don't have the huge algorithmic
solution sets of STL and BOOST. (Much of what C++ templates do doesn't
even need the TH extension, just the straight language.)
Speaking of which, I'm now revisiting the TH paper
(http://www.haskell.org/th/papers/meta-haskell.ps) and have noticed it has some
juicy things to say about C++ templates, in section 10.1, which is mostly
praise for C++ templates, but also this:

``It is (now) widely recognized that this type-system computation language
[of C++ templates] is simply an extraordinarily baroque functional language,
full of /ad hoc/ coding tricks and conventions.''

Bingo!

Dann Corbit

unread,
Dec 14, 2009, 4:11:59 PM12/14/09
to
In article <200912141...@gmail.com>, kkyl...@gmail.com says...

I don't really know the Haskell language, though it does sound
interesting.

I will download and read that cited paper.

Kaz Kylheku

unread,
Dec 14, 2009, 4:31:55 PM12/14/09
to

That's a terrible starting point for Haskell, describing the workings of a
Template extension, but good luck! :)

bartc

unread,
Dec 14, 2009, 8:12:14 PM12/14/09
to

"Richard Heathfield" <r...@see.sig.invalid> wrote in message
news:HoudncDt8dANz77W...@bt.com...

> bartc wrote:
>>
>> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>> news:i4qdnVefnqKEdr_W...@bt.com...
>
> <snip>
>
>>> It is precisely because (well-written) C is so good at describing
>>> algorithms in all their nitty-gritty detail that it is the lingua franca
>>> of nearly all the best general programming books.
>>
>> There might be more apt languages than C (not everyone is comfortable
>> with zero-based indexing for example), but if it's a choice between 'MIX'
>> and C, then C wins.
>
> Anyone who is not comfortable with 0-based indexing needs to buy a new
> mattress. The contortions you have to go through to use other bases just
> melt away when you switch to 0.

I think there are more contortions with starting from zero; you need to deal
with 0, N-1 and N. With 1-based, you deal with 1,N and only sometimes N-1.
(This is for counting or enumerating things; for measuring, then starting
from zero is better.)
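
For what it's worth, the two styles side by side (a trivial sketch):

#include <stdio.h>

int main(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    size_t n = sizeof a / sizeof a[0];
    size_t i;

    for (i = 0; i < n; i++)        /* 0-based: half-open range [0, n) */
        printf("%d ", a[i]);
    putchar('\n');

    for (i = 1; i <= n; i++)       /* 1-based view: closed range [1, n] */
        printf("%d ", a[i - 1]);   /* shift back to the real base */
    putchar('\n');
    return 0;
}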

> For its power to communicate algorithmic ideas readably, I would choose C
> not just over MIX, but over Java, C++, Li*p... everything. Have you a more
> apt language in mind?

Pseudo-code will do me. That is, 1-based pseudo-code.

And printed to make nice use of bold and italics. Printed C code always used
to look anaemic because of the thin courier font used (but perhaps that has
changed recently, and with stuff being on-line).

--
Bartc

Ian Collins

unread,
Dec 14, 2009, 11:08:31 PM12/14/09
to
jacob navia wrote:
> Dann Corbit wrote:
>>
>> If you use C++ templates, then you can write algorithm generically.
>> These generic algorithms are still typesafe.
>>
>
> Yes. The problem with C++ is in other parts of the language and
> in its enormous complexity.

Complexity which, as we have discussed before, you don't have to use.
In C++, just like in C, you don't have to pay for what you don't use.

As I posted a couple of days ago, any given problem will require a fixed
amount of complexity to solve. The programmer has the choice of
implementing the complexity himself, or leaving it to the compiler writer!

> Templates are a good idea. You can
> accomplish pretty much the same thing with a preprocessed text
> file in C, that takes a template file and adapts it to the specific
> data type given.

Which is what early C++ implementations did until they grew up.

--
Ian Collins

jacob navia

unread,
Dec 15, 2009, 2:35:46 AM12/15/09
to
Kaz Kylheku wrote:

I developed a compiler that does exactly that. You have a library of
entry points into the compilation environment, where you can specify
templates in C. There are two paradigms:


1) Event oriented:
The compiler generates a stream of compilation events that you can
subscribe to, much as in GUI event-driven programming, where
the GUI generates events that you handle.
The types of events are:
begin/end compilation
begin/end function
begin/end statement
Syntax error (here you add syntax as you wish)
etc (many others omitted)

2) Template oriented:
You define a compile-time function that resides in some dynamic
library, and you call it from your program. For instance you
define a function "list" that will generate a list type, and
you call it with
list<int>
That function will generate a type definition.

All code generation is based on a text interface: in all cases, the
generated code is a text stream. That text can contain further templates
that will provoke other events. When an event is active, however, the
system will never generate a nested event for the handler that is
currently running.
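
Purely as illustration, the event side could be exposed to C code along
these lines (every name here is hypothetical; I am not quoting the
actual interface):

/* Hypothetical event-callback sketch, not taken from any product. */
typedef enum {
    EV_BEGIN_COMPILATION, EV_END_COMPILATION,
    EV_BEGIN_FUNCTION,    EV_END_FUNCTION,
    EV_BEGIN_STATEMENT,   EV_END_STATEMENT,
    EV_SYNTAX_ERROR
} compile_event;

/* The context pointer lets a handler carry state from one event to
   the next, e.g. to gather statistics. */
typedef void (*event_handler)(compile_event ev, void *context);

void register_handler(compile_event ev, event_handler fn, void *context);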


Now, this would need an army of people to do that, and I am alone.


>
> The C++ template system is the work of people who are so gifted that they don't
> think they have to study the computer science which came before them.
>
> Look at, say, Template Haskell. There we don't have the huge algorithmic
> solution sets of STL and BOOST. (Much of what C++ templates do doens't
> even need the TH extension, just the straight language).
> Speaking of which, I'm now revisiting the TH paper
> (http://www.haskell.org/th/papers/meta-haskell.ps) and have noticed it has some
> juicy things to say about C++ templates, in section 10.1, which is mostly
> praise for C++ templates, but also this:
>
> ``It is (now) widely recognized that this type-system computation language
> [of C++ templates] is simply an extraordinarily baroque functional language,
> full of /ad hoc/ coding tricks and conventions.''
>
> Bingo!

In my system you *can* do assignments, since everything is programmed in
C. You can save context from an event into the next, for instance just
to gather statistics, etc.

jacob

gwowen

unread,
Dec 15, 2009, 4:56:03 AM12/15/09
to
On Dec 14, 8:38 pm, Kaz Kylheku <kkylh...@gmail.com> wrote:
>
>   ``It is (now) widely recognized that this type-system computation language
>   [of C++ templates] is simply an extraordinarily baroque functional language,
>   full of /ad hoc/ coding tricks and conventions.''
>
> Bingo!

The problem you identify with C++ templates is that they're now being
routinely used for things which the initial design never imagined. As
originally envisaged, they were typesafety-enhanced macros,
similar to those which Jacob describes. Then people discovered you could
use them as a metaprogramming language, and all the baroque tricks that
you mention appeared, and people began to utilise the full power,
which required contorting their clunky syntax beyond the comprehension
of mere mortals. It's a baroque functional language because it was
never designed to be a functional language -- it was an (unhappy)
accident.

Richard

unread,
Dec 15, 2009, 5:05:43 AM12/15/09
to
gwowen <gwo...@gmail.com> writes:

It became apparent to me from day one that any C++ project that was
allowed to grow without "common sense" code review would develop into a
hellish mess. And so it did. Smart arses utilising their "an orange
is a fruit" C++ tutorial classes at full throttle, and before you knew
it you had an unreadable mess requiring detailed knowledge of the
intricacies of about 2000 classes and their overloaded operators and
"friend" dependencies.

--
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c

Nick Keighley

unread,
Dec 15, 2009, 6:01:36 AM12/15/09
to

I'm reading Alexandrescu's "Modern C++ Design" at the moment. I'm not
sure if it's brilliant, or crazed. A lot of what he says makes
incredible sense (exactly how your "class" works depends on a set of
policies, which are small "classes" that decide things [how is memory
allocated, can you copy the thing, etc.]) and then there's
machinery with things like "partial template specialization" and
"template templates". Still, I'm sure I'll have a merry Christmas.

Nick Keighley

unread,
Dec 15, 2009, 6:06:30 AM12/15/09
to
On 12 Dec, 10:44, Bruce Cook <bruce-

use...@noreplybicycle.synonet.comnoreply> wrote:
> bartc wrote:
> > "Richard Heathfield" <r...@see.sig.invalid> wrote in message

> > There might be more apt languages than C (not everyone is comfortable with


> > zero-based indexing for example), but if it's a choice between 'MIX' and
> > C, then C wins.
>

> I'm interested in why you'd pick zero-based indexing as something people
> would specifically have problems with in C.
>
> I personally find zero-based indexing fairly natural, other languages which
> 1 based indexing such as Fortran and Cobol used to bug me, especially after
> I had started programming in C and "knew there was a better way" :)

why do we have to be hand-cuffed to any particular base?

> zero based indexing certainly makes more sense mathematically.

I don't see why. It makes the index calculation slightly easier, but
shouldn't that be the compiler writer's concern, not ours?

Richard

unread,
Dec 15, 2009, 6:26:16 AM12/15/09
to
Nick Keighley <nick_keigh...@hotmail.com> writes:

Because of real-life considerations with real-life
processors. Processors weren't always as powerful and flexible as they
are now, with arbitrary indexing "for free".

Starting at a zero offset simply saves calculations/additions, or even a
look-up of the delta value (index base), needed to address the correct
memory range.
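
Roughly, the arithmetic behind a[i] (a back-of-the-envelope sketch, not
real compiler output):

#include <stdio.h>

int main(void)
{
    enum { N = 10, BASE = 1 };  /* BASE: a hypothetical lower bound */
    double storage[N];
    char *base = (char *)storage;
    size_t i = 3;

    /* 0-based: offset = i * sizeof(element) */
    double *p0 = (double *)(base + i * sizeof(double));

    /* BASE-based: offset = (i - BASE) * sizeof(element); the extra
       subtraction disappears only if the compiler can pre-bias the
       base address at compile time */
    double *p1 = (double *)(base + (i - BASE) * sizeof(double));

    printf("%p %p\n", (void *)p0, (void *)p1);
    return 0;
}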

Nick Keighley

unread,
Dec 15, 2009, 6:34:15 AM12/15/09
to
On 12 Dec, 21:08, Ian Collins <ian-n...@hotmail.com> wrote:
> Bruce Cook wrote:

<snip>

> > I personally find zero-based indexing fairly natural, other languages which
> > 1 based indexing such as Fortran and Cobol used to bug me, especially after
> > I had started programming in C and "knew there was a better way" :)
>

> It's a pity K&R2 abandoned 0 based chapter numbers!

I've read computer science books that used zero-based chapter
numbering (bit of a pose I thought...)

Nick Keighley

unread,
Dec 15, 2009, 6:44:54 AM12/15/09
to
On 15 Dec, 11:26, Richard <rgrd...@gmail.com> wrote:

> Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
> > On 12 Dec, 10:44, Bruce Cook <bruce-
> > use...@noreplybicycle.synonet.comnoreply> wrote:
> >> bartc wrote:
> >> > "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>
> >> > There might be more apt languages than C (not everyone is comfortable with
> >> > zero-based indexing for example), but if it's a choice between 'MIX' and
> >> > C, then C wins.
>
> >> I'm interested in why you'd pick zero-based indexing as something people
> >> would specifically have problems with in C.
>
> >> I personally find zero-based indexing fairly natural, other languages which
> >> 1 based indexing such as Fortran and Cobol used to bug me, especially after
> >> I had started programming in C and "knew there was a better way" :)
>
> > why do we have to be hand-cuffed to any particular base?
>
> >> zero based indexing certainly makes more sense mathematically.
>
> > I don't see why. It makes the index calculation slightly easier but
> > shouldn't that be the compiler writers concern, not ours?
>
> Because of real life considerations with real life
> processors.

I think real life considerations with er... real life should be taken
into consideration. Some things map naturally onto non-zero based
indexing. At least Pascal agrees with me...


> Processors weren't always as powerful and flexible as now
> with arbitrary "for free" indexing.
>
> Starting at zero offset simply saves calculations/additions or even a
> look up to calculate the delta value (index base) in order to correctly
> address the correct memory range.

I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

[Incidently I don't expect to win a "zero-indexing is bad" argument in
clc but it's fun trying]

jacob navia

unread,
Dec 15, 2009, 6:48:43 AM12/15/09
to
Nick Keighley wrote:

>
> I'd have thought the cost would be at compile time. Just add one (or
> whatever) to the base address.
>
> [Incidently I don't expect to win a "zero-indexing is bad" argument in
> clc but it's fun trying]

The programming language APL had a system variable, QuadIO ("Index
Origin"), that could be either 1 or 0. It was applied systematically to
all array indices, and allowed you to start array indices at either 1 or
0, as you wished.

Richard

unread,
Dec 15, 2009, 6:51:59 AM12/15/09
to
Nick Keighley <nick_keigh...@hotmail.com> writes:

> On 15 Dec, 11:26, Richard <rgrd...@gmail.com> wrote:
>> Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
>> > On 12 Dec, 10:44, Bruce Cook <bruce-
>> > use...@noreplybicycle.synonet.comnoreply> wrote:
>> >> bartc wrote:
>> >> > "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>>
>> >> > There might be more apt languages than C (not everyone is comfortable with
>> >> > zero-based indexing for example), but if it's a choice between 'MIX' and
>> >> > C, then C wins.
>>
>> >> I'm interested in why you'd pick zero-based indexing as something people
>> >> would specifically have problems with in C.
>>
>> >> I personally find zero-based indexing fairly natural, other languages which
>> >> 1 based indexing such as Fortran and Cobol used to bug me, especially after
>> >> I had started programming in C and "knew there was a better way" :)
>>
>> > why do we have to be hand-cuffed to any particular base?
>>
>> >> zero based indexing certainly makes more sense mathematically.
>>
>> > I don't see why. It makes the index calculation slightly easier but
>> > shouldn't that be the compiler writers concern, not ours?
>>
>> Because of real life considerations with real life
>> processors.
>
> I think real life considerations with er... real life should be taken
> into consideration. Some things map naturally onto non-zero based
> indexing. At least Pascal agrees with me...

The high level language with little thought given to efficiency. C
is/was much more efficient. Trade off.

>
>> Processors weren't always as powerful and flexible as now
>> with arbitrary "for free" indexing.
>>
>> Starting at zero offset simply saves calculations/additions or even a
>> look up to calculate the delta value (index base) in order to correctly
>> address the correct memory range.
>
> I'd have thought the cost would be at compile time. Just add one (or
> whatever) to the base address.
>
> [Incidently I don't expect to win a "zero-indexing is bad" argument in
> clc but it's fun trying]

--

gwowen

unread,
Dec 15, 2009, 7:47:52 AM12/15/09
to
On Dec 15, 11:48 am, jacob navia <ja...@nospam.org> wrote:
> The programming language APL had a system variable QuadIO, for Index Origin that
> would be either 1 or 0. It was systematically added to all array indices, and allowed
> you to start arrays indices either with 1 or zero, as you wish.

And, of course, Fortran allows the user to define an arbitrary range
for the indexing of an array/matrix, which can be quite useful. For
example, if you're exploiting symmetry in your algorithm it's sometimes
helpful to declare an array "real arr(-100:100,201)", which is a
201x201 array indexed from -100 to +100 on the first co-ordinate, and
1 to 201 on the second.

I'm not sure it helps all that much, though.
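
The nearest C idiom I know is to keep the storage 0-based and hide the
shift behind a macro (a sketch; ARR is just an illustrative name):

/* A rough C analogue of Fortran's arr(-100:100, 1:201). */
static double arr[201][201];

/* i runs -100..100, j runs 1..201; ARR(-100, 1) is arr[0][0]. */
#define ARR(i, j) arr[(i) + 100][(j) - 1]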

Richard Harter

unread,
Dec 15, 2009, 10:30:53 AM12/15/09
to
On Sat, 12 Dec 2009 10:44:53 GMT, Bruce Cook
<bruce-...@noreplybicycle.synonet.comnoreply> wrote:

>bartc wrote:
>
>>
>> "Richard Heathfield" <r...@see.sig.invalid> wrote in message

>[...]


>> There might be more apt languages than C (not everyone is comfortable with
>> zero-based indexing for example), but if it's a choice between 'MIX' and
>> C, then C wins.
>
>I'm interested in why you'd pick zero-based indexing as something people
>would specifically have problems with in C.
>
>I personally find zero-based indexing fairly natural, other languages which
>1 based indexing such as Fortran and Cobol used to bug me, especially after
>I had started programming in C and "knew there was a better way" :)
>

>zero based indexing certainly makes more sense mathematically.

From a mathematical viewpoint, 1 based is appropriate for
counting elements and 0 based for counting/measuring intervals.


Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.

Keith Thompson

unread,
Dec 15, 2009, 11:09:40 AM12/15/09
to
jacob navia <ja...@nospam.org> writes:
> Nick Keighley wrote:

Perl also has (had) a system variable, set to 0 by default,
that determines the index of the first element in an array.
It's hardly ever used, and its use is strongly discouraged.
(Perl arrays are 0-based.)

This isn't an argument for or against 0-based or 1-based indexing;
controlling it via a language-defined global variable was a bad idea.


Kaz Kylheku

unread,
Dec 15, 2009, 1:46:17 PM12/15/09
to
On 2009-12-15, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> jacob navia <ja...@spamsink.net> writes:
>>"Advanced data structures" from Peter Brass.
>
> From a technical point of view, there is no means to tell,
> how "advanced" a data structure is, it depends on the
> reader, whether he

We can safely understand ``advanced'' to mean ``anything more
complicated than a linked list, unbalanced binary search tree, or
directly addressed array''.

Flash Gordon

unread,
Dec 15, 2009, 2:26:12 PM12/15/09
to
Nick Keighley wrote:
> On 15 Dec, 11:26, Richard <rgrd...@gmail.com> wrote:
>> Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
>>> On 12 Dec, 10:44, Bruce Cook <bruce-
>>> use...@noreplybicycle.synonet.comnoreply> wrote:
>>>> bartc wrote:
>>>>> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>>>>> There might be more apt languages than C (not everyone is comfortable with
>>>>> zero-based indexing for example), but if it's a choice between 'MIX' and
>>>>> C, then C wins.
>>>> I'm interested in why you'd pick zero-based indexing as something people
>>>> would specifically have problems with in C.
>>>> I personally find zero-based indexing fairly natural, other languages which
>>>> 1 based indexing such as Fortran and Cobol used to bug me, especially after
>>>> I had started programming in C and "knew there was a better way" :)
>>> why do we have to be hand-cuffed to any particular base?
>>>> zero based indexing certainly makes more sense mathematically.
>>> I don't see why. It makes the index calculation slightly easier but
>>> shouldn't that be the compiler writers concern, not ours?
>> Because of real life considerations with real life
>> processors.
>
> I think real life considerations with er... real life should be taken
> into consideration. Some things map naturally onto non-zero based
> indexing. At least Pascal agrees with me...

Zero-based indexing is good. It's simple, easy to use, works nicely when
you need to do odd things such as manual index calculations, and it's
easy to implement efficiently. I found it intuitive and obvious when I
first used it.

One-based indexing is also good; it's easy to understand etc. I found it
intuitive and obvious when I first used it.

Indexing on whatever you want is good (when it's determined per array
and not by a global option): it means you can do whatever is intuitive
for the problem you are trying to solve. I've used enumerated types as
indices in another language when it made sense.
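
C gets part of the way there, since enumerators are plain ints starting
at 0 and index arrays naturally. A small sketch:

enum colour { RED, GREEN, BLUE, COLOUR_COUNT };

static unsigned histogram[COLOUR_COUNT];  /* one counter per enumerator */

static void count_colour(enum colour c)
{
    histogram[c]++;    /* RED == 0 makes the mapping direct */
}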

>> Processors weren't always as powerful and flexible as now
>> with arbitrary "for free" indexing.
>>
>> Starting at zero offset simply saves calculations/additions or even a
>> look up to calculate the delta value (index base) in order to correctly
>> address the correct memory range.
>
> I'd have thought the cost would be at compile time. Just add one (or
> whatever) to the base address.

Now consider a 2d array, or 3d or 4d, it starts getting more complex.

> [Incidently I don't expect to win a "zero-indexing is bad" argument in
> clc but it's fun trying]

C uses 0-based indexing because it maps simply onto pointer arithmetic,
and what you write is what you get. For higher-level abstraction other
things are better.
--
Flash Gordon

bartc

unread,
Dec 15, 2009, 5:35:11 PM12/15/09
to

"Flash Gordon" <sm...@spam.causeway.com> wrote in message
news:lf4lv6x...@news.flash-gordon.me.uk...
> Nick Keighley wrote:

[1-based array indexing]

>> I'd have thought the cost would be at compile time. Just add one (or
>> whatever) to the base address.
>
> Now consider a 2d array, or 3d or 4d, it starts getting more complex.

Yeah, I use 4D arrays every day...

If these are static (ie. flat) arrays, then the address offsets can still be
done at compile time without any runtime cost.

With dynamic arrays, it's a bit more fiddly but I believe the -1 adjustment
can again be eliminated, either by each array pointer pointing at a dummy
[0] element (this is in the implementation, so can be made to work), or
building the offset into the machine addressing mode.

But anyway a decrement operation or two, when all these dereferences are
happening to get at the array element, probably wouldn't be a huge cost.
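
At the programmer level, the well-defined version of the trick is simply
to allocate one spare element, as in this sketch; biasing a typed pointer
to one element before its allocation (the old Numerical Recipes trick) is
formally undefined behaviour in ISO C, even where it happens to work:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 5, i;
    int *v = malloc((n + 1) * sizeof *v);  /* v[0] wasted, v[1..n] used */
    if (v == NULL)
        return EXIT_FAILURE;

    for (i = 1; i <= n; i++)
        v[i] = (int)(i * 10);

    printf("%d %d\n", v[1], v[n]);         /* prints "10 50" */
    free(v);
    return EXIT_SUCCESS;
}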

--
Bartc

Flash Gordon

unread,
Dec 15, 2009, 6:20:49 PM12/15/09
to
bartc wrote:
>
> "Flash Gordon" <sm...@spam.causeway.com> wrote in message
> news:lf4lv6x...@news.flash-gordon.me.uk...
>> Nick Keighley wrote:
>
> [1-based array indexing]
>
>>> I'd have thought the cost would be at compile time. Just add one (or
>>> whatever) to the base address.
>>
>> Now consider a 2d array, or 3d or 4d, it starts getting more complex.
>
> Yeah, I use 4D arrays every day...

I've used far more complex structures in real production code which was
being developed for a paying customer.

> If these are static (ie. flat) arrays, then the address offsets can
> still be done at compile time without any runtime cost.

a, b, c and d are all calculated with complex expressions, possibly
including function calls...
arr[a][b][c][d] = something;
How are you going to calculate the address without extra runtime cost?
It may be small, but it does exist.

> With dynamic arrays, it's a bit more fiddly but I believe the -1
> adjustment can again be eliminated, either by each array pointer
> pointing at a dummy [0] element (this is in the implementation, so can
> be made to work),

There are all sorts of problems this can lead to, especially in C, which
would have to be dealt with. If it's an array of a large struct (yes,
I've done this for good reason) that dummy is pretty big, and you can't
have other things in that dummy[0] space or you might get pointers to
different objects which compare equal! I foresee all sorts of unforeseen
problems...

> or building the offset into the machine addressing mode.

Do you know of any processors which have such addressing modes? Ones
which can be set up without cost given that the size in bytes of the
offset is not the same for all arrays?

> But anyway a decrement operation or two, when all these dereferences are
> happening to get at the array element, probably wouldn't be a huge cost.

Ah well, sometimes you really don't want any extra costs because the
timings really are tight.
--
Flash Gordon

bartc

unread,
Dec 15, 2009, 7:43:49 PM12/15/09
to

"Flash Gordon" <sm...@spam.causeway.com> wrote in message
news:k7ilv6x...@news.flash-gordon.me.uk...

> bartc wrote:
>>
>> "Flash Gordon" <sm...@spam.causeway.com> wrote in message
>> news:lf4lv6x...@news.flash-gordon.me.uk...
>>> Nick Keighley wrote:
>>
>> [1-based array indexing]
>>
>>>> I'd have thought the cost would be at compile time. Just add one (or
>>>> whatever) to the base address.
>>>
>>> Now consider a 2d array, or 3d or 4d, it starts getting more complex.
>>
>> Yeah, I use 4D arrays every day...
>
> I've used far more complex structures in real production code which was
> being developed for a paying customer.
>
>> If these are static (ie. flat) arrays, then the address offsets can still
>> be done at compile time without any runtime cost.
>
> a, b, c and d are all calculated with complex expressions, possibly
> including function calls...
> arr[a][b][c][d] = something;
> How are you going to calculate the address without extra runtime cost? It
> may be small, but it does exist.

I've put that line (using int ****arr) into gcc 3.4.5 for x86, it gave me a
4 instruction sequence.

Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of thing
you might do to convert 1-based to zero-based); gcc gave me the same 4
instructions, but the middle two now had a -4 offset in their addressing
modes; two extra bytes I think.

If a,b,c,d are actually that complex, it's possible the -1 can be absorbed
into one or two of them (for example if a was i+1 then it reduces to just
i).

>> With dynamic arrays, it's a bit more fiddly but I believe the -1
>> adjustment can again be eliminated, either by each array pointer pointing
>> at a dummy [0] element (this is in the implementation, so can be made to
>> work),
>
> There are all sorts of problems this can lead to, especially in C, which
> would have to be dealt with. If it's an array of a large struct (yes, I've
> done this for good reason) that dummy is pretty big, and you can't have
> other things in that dummy[0] space or you might get pointers to different
> objects which compare equal! I foresee all sorts of unforeseen problems...

You're thinking of C as used at the programmer level. I was thinking of the
implementation side where you only have to worry about a specific
architecture.

>> or building the offset into the machine addressing mode.
>
> Do you know of any processors which have such addressing modes? Ones which
> can be set up without cost given that the size in bytes of the offset is
> not the same for all arrays?

Just about every processor I've ever used can apply an offset to an indirect
address mode. Smaller processors may involve a tangible cost, but then you
probably wouldn't be using 4-dimensional dynamic arrays on those, or you
just spend ten minutes converting your code to a zero-base.

--
Bartc

Bruce Cook

unread,
Dec 16, 2009, 6:02:06 AM12/16/09
to
Richard wrote:

[...]

>>> zero based indexing certainly makes more sense mathematically.
>>
>> I don't see why. It makes the index calculation slightly easier but
>> shouldn't that be the compiler writers concern, not ours?
>>
>
> Because of real life considerations with real life
> processors. Processors weren't always as powerful and flexible as now
> with arbitrary "for free" indexing.

Certainly that was the case when I started programming.

What I was thinking of, however, was that with zero-based indexing it all
"fits together" neatly; I find I don't have to deal with off-by-one
problems with arrays.

For example, with character maps (we used to do a lot of this in the
early days):

(Sketch for demonstration; assume the INFO* flags are #defined elsewhere.)

unsigned info[256] = { 0,   /* null */
                       INFO1 | INFO2 | INFO3,
                       INFO1,
                       INFO3,
                       INFO5,
                       /* ... pretend I've filled the character map
                          up to 255 */
};

unsigned get_characteristics(const char *c)
{
    return info[(unsigned char)*c];
}

If we used 1-based indexing that would be

    return info[(unsigned char)*c + 1];

No real big strain, but with zero-based indexing it just falls into place
neatly.


Bruce


Bruce Cook

unread,
Dec 16, 2009, 6:02:09 AM12/16/09
to
Keith Thompson wrote:

> Perl also has (had) a system variable, set to 0 by default,
> that determines the index of the first element in an array.
> It's hardly ever used, and its use is strongly discouraged.
> (Perl-arrays are 0-based.)
>
> This isn't an argument for or against 0-based or 1-based indexing;
> controlling it via a language-defined global variable was a bad idea.

Ouch !

That would be particularly fun with libraries

Bruce


bartc

unread,
Dec 16, 2009, 6:38:50 AM12/16/09
to

"Bruce Cook" <bruce-...@noreplybicycle.synonet.comnoreply> wrote in
message news:4b28bcfa$1...@news.mel.dft.com.au...
> Richard wrote:

Some things are better with 0-based, and some with 1-based.

It doesn't have to be a choice; you can just allow both **.

However, if you have to choose one form, 0-based might be better, but only
because you can use 1-based indexing by allocating an extra element and
ignoring element 0. It's more difficult the other way around.

(** the design of some languages seems to demand that one base is the
default or 'natural' base for indexing, even when both (or any) base is
possible.)

--
Bartc

Michael Foukarakis

unread,
Dec 16, 2009, 6:43:19 AM12/16/09
to
On Dec 12, 10:24 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> bartc wrote:
>
> > "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> >news:i4qdnVefnqKEdr_W...@bt.com...
>
> <snip>
>
> >> It is precisely because (well-written) C is so
> >> good at describing algorithms in all their nitty-gritty detail that it
> >> is the lingua franca of nearly all the best general programming books.
>
> > There might be more apt languages than C (not everyone is comfortable
> > with zero-based indexing for example), but if it's a choice between
> > 'MIX' and C, then C wins.
>
> Anyone who is not comfortable with 0-based indexing needs to buy a new
> mattress. The contortions you have to go through to use other bases just
> melt away when you switch to 0.
>
> For its power to communicate algorithmic ideas readably, I would choose
> C not just over MIX, but over Java, C++, Li*p... everything. Have you a
> more apt language in mind?
>
Although I am reluctant to use 'equally apt' or 'more apt', I'd say
Erlang or Haskell are very good at expressing algorithms in a neat
fashion. Other high level languages such as Ruby or Python spring to
mind as well. I think an advantage of C over those is that it's
adequately readable (at least when we're not trying to write an IOCCC
entry..) even as it's significantly "closer" to manipulating the
machine.

On topic, the book is actually very comprehensive, very concise, and
deals with some quite interesting data structures. I liked it. :)

gwowen

unread,
Dec 16, 2009, 7:36:01 AM12/16/09
to
On Dec 12, 12:55 am, jacob navia <ja...@spamsink.net> wrote:
> I have started reading

> "Advanced data structures" from Peter Brass.
> ...
> I find this book excellent. Everything is written in C, it
> is clear, and explains very well the (quite difficult) stuff.
>
> I see that I will have to read this before going further in the
> container library. This is exactly the book I was looking for.

On a not-entirely-unrelated note, I acquired a copy of Sedgewick's
"Algorithms in C" at the weekend (3 pounds in the Wooden Canal Boat
Society Charity Shop == result). He covers lists, arrays and various
balanced trees. Is there anything in Brass that's not in Sedgewick?

jacob navia

unread,
Dec 16, 2009, 7:48:34 AM12/16/09
to
gwowen wrote:

I will tell you in a few weeks. I ordered Sedgewick from Amazon but it
was out of stock; I have had to wait several weeks for it to ship, and
it is still not here yet.

jacob

Nick

unread,
Dec 16, 2009, 3:24:19 PM12/16/09
to
gwowen <gwo...@gmail.com> writes:

Despite what you see in my sig, it's not mine! Mine's just "Algorithms" -
the examples are in either Pascal or a Pascal-like pseudo-code. I
actually prefer implementing from pseudo-code myself to having the code
given to me.
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

Flash Gordon

unread,
Dec 16, 2009, 3:58:48 PM12/16/09
to
bartc wrote:
>
> "Flash Gordon" <sm...@spam.causeway.com> wrote in message
> news:k7ilv6x...@news.flash-gordon.me.uk...
>> bartc wrote:
>>>
>>> "Flash Gordon" <sm...@spam.causeway.com> wrote in message
>>> news:lf4lv6x...@news.flash-gordon.me.uk...
>>>> Nick Keighley wrote:
>>>
>>> [1-based array indexing]
>>>
>>>>> I'd have thought the cost would be at compile time. Just add one (or
>>>>> whatever) to the base address.
>>>>
>>>> Now consider a 2d array, or 3d or 4d, it starts getting more complex.
>>>
>>> Yeah, I use 4D arrays every day...
>>
>> I've used far more complex structures in real production code which
>> was being developed for a paying customer.
>>
>>> If these are static (ie. flat) arrays, then the address offsets can
>>> still be done at compile time without any runtime cost.
>>
>> a, b, c and d are all calculated with complex expressions, possibly
>> including function calls...
>> arr[a][b][c][d] = something;
>> How are you going to calculate the address without extra runtime cost?
>> It may be small, but it does exist.
>
> I've put that line (using int ****arr) into gcc 3.4.5 for x86, it gave
> me a 4 instruction sequence.

Not all of the world is x86.

> Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of
> thing you might do to convert 1-based to zero-based); gcc gave me the
> same 4 instructions, but the middle two now had a -4 offset in their
> addressing modes; two extra bytes I think.

Now try on a DSP or some other embedded processor. Not everything can do
that trick.

> If a,b,c,d are actually that complex, it's possible the -1 can be
> absorbed into one or two of them (for example if a was i+1 then it
> reduces to just i).

It might be a function call. If you build your software for 0-based
arrays you build the base into the calculation; if it is 1-based you
can't do that, and if a function is doing the calculation it might not
be possible for the compiler to optimise the adjustment away.

>>> With dynamic arrays, it's a bit more fiddly but I believe the -1
>>> adjustment can again be eliminated, either by each array pointer
>>> pointing at a dummy [0] element (this is in the implementation, so
>>> can be made to work),
>>
>> There are all sorts of problems this can lead to, especially in C,
>> which would have to be dealt with. If it's an array of a large struct
>> (yes, I've done this for good reason) that dummy is pretty big, and
>> you can't have other things in that dummy[0] space or you might get
>> pointers to different objects which compare equal! I foresee all sorts
>> of unforeseen problems...
>
> You're thinking of C as used at the programmer level. I was thinking of
> the implementation side where you only have to worry about a specific
> architecture.

I was referring to the problems you can have in the implementation for a
specific architecture.

>>> or building the offset into the machine addressing mode.
>>
>> Do you know of any processors which have such addressing modes? Ones
>> which can be set up without cost given that the size in bytes of the
>> offset is not the same for all arrays?
>
> Just about every processor I've ever used can apply an offset to an
> indirect address mode.

On some of the ones I've used you can't.

> Smaller processors may involve a tangible cost,
> but then you probably wouldn't be using 4-dimensional dynamic arrays on
> those,

You might be surprised at the complexity of the problems solved
using small processors, or at least processors which don't have such
fancy addressing modes.

> or you just spend ten minutes converting your code to a zero-base.

If the language uses one-based arrays you can't convert it to zero based!

As I said earlier, in a language which allows I'll use whatever fits
best. However, there are good reasons for using zero-base in the
language when performance is critical.
--
Flash Gordon

Richard

unread,
Dec 16, 2009, 4:08:31 PM12/16/09
to
Flash Gordon <sm...@spam.causeway.com> writes:

So what? He can with a 386 base. Maybe that's what he's aiming at? Using
C for the most common processor for PCs in the world. Just a guess.

bartc

unread,
Dec 16, 2009, 5:33:57 PM12/16/09
to

"Flash Gordon" <sm...@spam.causeway.com> wrote in message
news:c9unv6x...@news.flash-gordon.me.uk...

> bartc wrote:
>>
>> "Flash Gordon" <sm...@spam.causeway.com> wrote in message
>> news:k7ilv6x...@news.flash-gordon.me.uk...
>>> bartc wrote:

>>>> If these are static (ie. flat) arrays, then the address offsets can
>>>> still be done at compile time without any runtime cost.
>>>
>>> a, b, c and d are all calculated with complex expressions, possibly
>>> including function calls...
>>> arr[a][b][c][d] = something;
>>> How are you going to calculate the address without extra runtime cost?
>>> It may be small, but it does exist.
>>
>> I've put that line (using int ****arr) into gcc 3.4.5 for x86, it gave me
>> a 4 instruction sequence.
>
> Not all of the world is i x86.

>> Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of
>> thing you might do to convert 1-based to zero-based); gcc gave me the
>> same 4 instructions, but the middle two now had a -4 offset in their
>> addressing modes; two extra bytes I think.
>
> Now try on a DSP or some other embedded processor. Not everything can do
> that trick.

I didn't fully understand the gcc code, but it seemed to find a way of
doing it with just two extra offsets instead of the four one might
expect. If this is just clever logic then it can be applied to whatever
address calculations are needed for any processor.

On the DSP, do you have an example part-number? It might be interesting to
find out just what it is capable of.

>> If a,b,c,d are actually that complex, it's possible the -1 can be
>> absorbed into one or two of them (for example if a was i+1 then it
>> reduces to just i).
>
> It might be a function call. If you build your software for 0 based arrays
> you build it in to the calculation, if it is 1 based you can't do that and
> if it is a function doing the calculation it might not be possible for the
> compiler to optimise away.

With up to 4 function calls the relative cost of any offset operations falls
even further.

>>>> or building the offset into the machine addressing mode.
>>>
>>> Do you know of any processors which have such addressing modes? Ones
>>> which can be set up without cost given that the size in bytes of the
>>> offset is not the same for all arrays?
>>
>> Just about every processor I've ever used can apply an offset to an
>> indirect address mode.
>
> On some of the ones I've used you can't.

Then they might also have trouble with some ordinary array and struct
accesses.

>
>> Smaller processors may involve a tangible cost, but then you probably
>> wouldn't be using 4-dimensional dynamic arrays on those,
>
> You might be surprised at the complexity of problem which is solved using
> small processors, or at least processors which don't have such fancy
> addressing modes.
>
>> or you just spend ten minutes converting your code to a zero-base.
>
> If the language uses one-based arrays you can't convert it to zero based!

OK, I would have expected a compiled language targeted at basic hardware to
have a choice of array lower bounds.

Then it would be no different from many other decisions programmers have to
make regarding the efficiency of their code: dynamic arrays are slower than
static (flat) ones; 32-bit elements can be slower than 16-bit elements; and
a lower bound of 1 can be slower than one of 0, in some cases.

--
Bartc

superpollo

unread,
Dec 16, 2009, 6:03:18 PM12/16/09
to
gwowen wrote:

> On a not-entirely-unrelated note, I acquired a copy of Sedgewick's
> "Algorithms in C" at the weekend (3 pounds in the Wooden Canal Boat
> Society Charity Shop == result).

3 pounds??? lucky fella!

Flash Gordon

unread,
Dec 16, 2009, 6:42:53 PM12/16/09
to

If the array is not a static array then you can't use the fixed offset
trick without doing extra arithmetic at run time.

> On the DSP, do you have an example part-number? It might be interesting
> to find out just what it is capable of.

I spent a lot of time programming the TMS320C25 both in assembler and C
some years back, and the TMS320C80, both of which are Texas Instruments
DSPs. I've actually been out of the embedded world for a while though.

>>> If a,b,c,d are actually that complex, it's possible the -1 can be
>>> absorbed into one or two of them (for example if a was i+1 then it
>>> reduces to just i).
>>
>> It might be a function call. If you build your software for 0 based
>> arrays you build it in to the calculation, if it is 1 based you can't
>> do that and if it is a function doing the calculation it might not be
>> possible for the compiler to optimise away.
>
> With up to 4 function calls the relative cost of any offset operations
> falls even further.

On some processors a call is cheap but address arithmetic is expensive.

>>>>> or building the offset into the machine addressing mode.
>>>>
>>>> Do you know of any processors which have such addressing modes? Ones
>>>> which can be set up without cost given that the size in bytes of the
>>>> offset is not the same for all arrays?
>>>
>>> Just about every processor I've ever used can apply an offset to an
>>> indirect address mode.
>>
>> On some of the ones I've used you can't.
>
> Then they might also have trouble with some ordinary array and struct
> accesses.

One is careful about what one uses where.

>>> Smaller processors may involve a tangible cost, but then you probably
>>> wouldn't be using 4-dimensional dynamic arrays on those,
>>
>> You might be surprised at the complexity of problem which is solved
>> using small processors, or at least processors which don't have such
>> fancy addressing modes.
>>
>>> or you just spend ten minutes converting your code to a zero-base.
>>
>> If the language uses one-based arrays you can't convert it to zero based!
>
> OK, I would have expected a compiled language targeted at basic hardware
> to have a choice of array lower bounds.

I don't know of any where zero doesn't work well, but one does not
always work well.

> Then it would be no different from many other decisions programmers have
> to make regarding the efficiency of their code: dynamic arrays are
> slower than static (flat) ones; 32-bit elements can be slower than
> 16-bit elements; and a lower bound of 1 can be slower than one of 0, in
> some cases.

I've got no objection to a language allowing each array to be based
differently, or even allowing other things such as enumerated types as
array indices; it gives you the choice. I've got no major objection to
using one as a base either, I just don't think it would have been the
right choice for C.
--
Flash Gordon
