The first, perhaps more evident problem is that it assumes mutable
data structures - specifically, it provides (setf elt). This is not a
big deal per se - if you want an immutable sequence type, define (setf
elt) to signal an error for it. That's how Java lists work, too.
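For instance (a sketch, untested; SEQUENCE here is the package nickname
SBCL gives the protocol described in [1], and MY-IMMUTABLE-SEQ and its
representation are made up):

(defclass my-immutable-seq (standard-object sequence)
  ((items :initarg :items :reader items)))

(defmethod sequence:length ((s my-immutable-seq))
  (length (items s)))

(defmethod sequence:elt ((s my-immutable-seq) index)
  (elt (items s) index))

(defmethod (setf sequence:elt) (new-value (s my-immutable-seq) index)
  (declare (ignore new-value index))
  (error "~S is immutable" s))

;; (A complete implementation would also define SEQUENCE:ADJUST-SEQUENCE
;; and SEQUENCE:MAKE-SEQUENCE-LIKE, per the protocol.)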
However, digging deeper you find a more substantial problem: there's
no generic way of extending a sequence, except by creating a new one,
copying the source sequence, and adding the new elements - all using
(setf elt). I.e., there's no generic version of the CONS and APPEND
functions - what Clojure folks call "conj". This may make sense if all
existing sequences are lists and vectors; but it doesn't if you want
to implement multiple sequence types, some of which may be immutable.
The third design issue I found is less important and can be worked
around by stretching the protocol a bit: there's no account for
objects which can be iterated, but not accessed by index, like a hash-
based set. Those aren't strictly speaking sequences, but still they
would benefit from having some of the generic functions specialized on
sequences be specialized on them as well.
What do you think?
Cheers,
Alessio
[1] http://www.doc.gold.ac.uk/~mas01cr/papers/ilc2007/sequences-20070301.pdf
This is incorrect: There is 'concatenate.
> The third design issue I found is less important and can be worked
> around by stretching the protocol a bit: there's no account for
> objects which can be iterated, but not accessed by index, like a hash-
> based set. Those aren't strictly speaking sequences, but still they
> would benefit from having some of the generic functions specialized on
> sequences be specialized on them as well.
Could be interesting.
Pascal
--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
I am missing something similar to Java Collections in Lisp, too.
Therefore on May 5th I started a similar discussion here, titled
"A Collections Framework?",
where I just asked if such a framework existed in Lisp.
The result was essentially a verbose "No, it doesn't", but there seems to be
a library, FSet, which could be worth a closer look.
Some expressed interest in such a framework, so maybe it would be worthwhile
to start writing one.
Currently I restrict myself to what Lisp offers to make my program run
first and optimize later. That is not so bad because with that restriction
I can concentrate better on my actual problems. The Java Collections API is
a bit seductive :) .
Norbert
The Java Collections API is just bad. It's better to look at something
that is not designed for a toy language. :-P
Did anybody look at Dylan?
Concatenate requires all sequences to be copied. Also, it is
inconvenient to use as a replacement for APPEND since it requires you to
specify the target sequence type. What I'd like to have is something
like
;; SCONS adds ITEM at the head of SEQUENCE:
(defgeneric scons (item sequence))

;; SAPPEND appends SEQUENCES to SEQUENCE, returning an object of the
;; same type as SEQUENCE:
(defgeneric sappend (sequence &rest sequences))

they could be implemented by default as (untested):

(defmethod scons (item (s sequence))
  ;; Caveat: TYPE-OF may return a length-qualified specifier such as
  ;; (SIMPLE-VECTOR 3), which CONCATENATE would reject; a real
  ;; implementation would need to normalize the type first.
  (concatenate (type-of s) (list item) s))

(defmethod sappend ((s sequence) &rest sequences)
  (apply #'concatenate (type-of s) s sequences))

while they could be specialized to behave more efficiently; for
example, for lists they would call CONS and APPEND.
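For instance (again untested), the list specializations might look like
this, coercing any non-list arguments:

(defmethod scons (item (s list))
  (cons item s))

(defmethod sappend ((s list) &rest sequences)
  (apply #'append s (mapcar (lambda (seq) (coerce seq 'list)) sequences)))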
> > The third design issue I found is less important and can be worked
> > around by stretching the protocol a bit: there's no account for
> > objects which can be iterated, but not accessed by index, like a hash-
> > based set. Those aren't strictly speaking sequences, but still they
> > would benefit from having some of the generic functions specialized on
> > sequences be specialized on them as well.
>
> Could be interesting.
I think so too, especially given that SBCL's protocol adds support for
iterators. However, if this was being designed from scratch, it would
make sense to have a type/system class ITERABLE and have SEQUENCE as a
subtype of it. Since CL can't be retrofitted like that, iterable will
have to be a type disjoint from sequence.
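To sketch what I mean (every name here is hypothetical):

(defclass iterable () ())

;; The one operation every iterable must support:
(defgeneric map-elements (function iterable))

(defclass hash-set (iterable)
  ((table :initform (make-hash-table :test #'equal) :reader table)))

(defmethod map-elements (function (set hash-set))
  (maphash (lambda (key value)
             (declare (ignore value))
             (funcall function key))
           (table set)))

Generic analogues of COUNT-IF or REDUCE could then be given default
methods on ITERABLE, written purely in terms of MAP-ELEMENTS.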
Alessio
> The Java Collections API is just bad. It's better to look at something
> that is not designed for a toy language. :-P
What is bad about the Java Collections Framework (JCF)? I hope you don't think
of the first try, with Vector and Hashtable. Since Java 2 it is much
better:
http://download.oracle.com/javase/tutorial/collections/
Abstracting implementations with interfaces and separate algorithms is a
good idea and would be even easier when implementing in Common Lisp.
Maybe more natural for Lisp would be Smalltalk collections:
http://www.ifi.uzh.ch/richter/Classes/oose2/01_Collections/03_smalltalk/03_smalltalk.html
No wonder it is a bit similar to the JCF; the author knows it.
Another useful resource is the STL of C++. Without the need for templates and
the horrible template syntax, it would even be fun to use :-)
> Did anybody look at Dylan?
No, what are the advantages compared to other collection frameworks?
--
Frank Buss, f...@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
I don't think that scons is a good function for vectors. In general, I'm
quite skeptical about abstracted interfaces for different kinds of
collections: Each of them has their own advantages and disadvantages,
and you don't easily change from one to the other without considering
performance and other implications. So what's the real use of an
abstract interface, other than that you _can_ abstract it?
>>> The third design issue I found is less important and can be worked
>>> around by stretching the protocol a bit: there's no account for
>>> objects which can be iterated, but not accessed by index, like a hash-
>>> based set. Those aren't strictly speaking sequences, but still they
>>> would benefit from having some of the generic functions specialized on
>>> sequences be specialized on them as well.
>>
>> Could be interesting.
>
> I think so too, especially given that SBCL's protocol adds support for
> iterators. However, if this was being designed from scratch, it would
> make sense to have a type/system class ITERABLE and have SEQUENCE as a
> subtype of it. Since CL can't be retrofitted like that, iterable will
> have to be a type disjoint from sequence.
I don't want to discourage you, but again, what's the real use of such
abstractions?
When the JCF was introduced, there was already a widely used collection
framework for Java, the Java Generic Library. The JGL had a couple of
very cool features, but they were just ignored by the JCF. No idea why,
but it was probably a political issue, as most things in Java are.
Just because something is in Java doesn't make it good. I have never
encountered anything that was built in Java that didn't suffer from
artificial limitations imposed by the restrictions of the
Java/JVM infrastructure. So please don't make it the first reference
point, and at least not the only reference point.
If I understand correctly, a lot of thought was put into Dylan's
collections, and they would have the advantage that, because Dylan is
dynamically typed and based on a variation of CLOS (generic functions
instead of message sending), they are probably much closer to what we need
in Common Lisp.
>> This is incorrect: There is 'concatenate.
>
> Concatenate requires all sequences to be copied. Also, it is
> inconvenient to use as a replacement for APPEND since it requires to
> specify the target sequence type. What I'd like to have is something
> like
There's also ADJOIN. At least that would be the closest to Clojure's
conj. But there's nothing like that for java.util.List, and it wouldn't
be efficient. Sounds like you want something like PUSH.
[...]
>> > The third design issue I found is less important and can be worked
>> > around by stretching the protocol a bit: there's no account for
>> > objects which can be iterated, but not accessed by index, like a hash-
>> > based set. Those aren't strictly speaking sequences, but still they
>> > would benefit from having some of the generic functions specialized on
>> > sequences be specialized on them as well.
>>
>> Could be interesting.
>
> I think so too, especially given that SBCL's protocol adds support for
> iterators. However, if this was being designed from scratch, it would
> make sense to have a type/system class ITERABLE and have SEQUENCE as a
> subtype of it. Since CL can't be retrofitted like that, iterable will
> have to be a type disjoint from sequence.
I think the SERIES package could deal quite naturally with iterators.
In fact, SERIES's generators are almost iterators.
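For instance (a sketch from memory of the SERIES API; details may be off):

(let ((gen (series:generator (series:scan '(1 2 3)))))
  (loop (print (series:next-in gen (return)))))
;; prints 1, 2 and 3, then RETURNs when the series is exhausted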
Helmut
> Did anybody look at Dylan?
Well, I did, when I designed and implemented the sequence protocol for
sbcl and wrote up the paper about it. Another source of inspiration was
the sequence interface in Factor.
Christophe
In article <8bgijd...@mid.individual.net>,
Pascal Costanza <p...@p-cos.net> wrote:
> In general, I'm
> quite skeptical about abstracted interfaces for different kinds of
> collections: Each of them has their own advantages and disadvantages,
> and you don't easily change from one to the other without considering
> performance and other implications. So what's the real use of an
> abstract interface, other than that you _can_ abstract it?
...
> I don't want to discourage you, but again, what's the real use of such
> abstractions?
To make it easier to change implementations if the assumptions that led
you initially to choose a particular one turn out to be wrong. Which
happens a lot.
Why do you doubt the value of this?
rg
ADJOIN is specified to work on lists only, so it cannot be made
generic. As for java.util.List, you're right, probably I'd need to
have destructive counterparts to "scons" and "sappend" (whoah, I chose
really bad names!). However, the main use case I want to support is to
iterate over lists returned by some Java API, not constructing those
lists efficiently in Lisp. For that, I plan to make our Cons class
implement the java.util.List interface, or, if that doesn't work, to
create a tiny wrapper, say ConsList.
>
> [...]
>
> >> > The third design issue I found is less important and can be worked
> >> > around by stretching the protocol a bit: there's no account for
> >> > objects which can be iterated, but not accessed by index, like a hash-
> >> > based set. Those aren't strictly speaking sequences, but still they
> >> > would benefit from having some of the generic functions specialized on
> >> > sequences be specialized on them as well.
>
> >> Could be interesting.
>
> > I think so too, especially given that SBCL's protocol adds support for
> > iterators. However, if this was being designed from scratch, it would
> > make sense to have a type/system class ITERABLE and have SEQUENCE as a
> > subtype of it. Since CL can't be retrofitted like that, iterable will
> > have to be a type disjoint from sequence.
>
> I think the SERIES package could deal quite naturally with iterators.
> In fact, SERIES's generators are almost iterators.
Nice, I never gave SERIES too much attention, it sounds like it's time
to study it :)
Thanks for the advice,
Alessio
I think that as far as API goes, and by that I mean interfaces, the
JCF is nice. I admittedly don't know about Dylan's collections, or
Smalltalk's, or even the STL, but I use the JCF regularly and it works
nicely (in the context of a language without any support for functional
programming, that is). It lacks a Bag type, which is a shame, and due
to the Java type system quirks it doesn't allow efficient collections
of primitive types, which can be a huge limitation in certain
applications. But the latter is a problem with Java as a whole; the
poor collections can't do anything about it.
Anyway, what I'm interested in is just to make those collections be
first-class citizens in ABCL, certainly not to make ABCL's sequences
implementation rely on them.
Thanks for the input!
Cheers,
Alessio
I agree with rg. Sure, cons for vectors doesn't make much sense.
Still, having a common set of operators which is guaranteed to work
across all sequences makes it easier to use different sequence
implementations seamlessly. If you notice that you used a list where a
vector would have been more appropriate, you have the chance to use a
vector without changing the client code. Maybe efficiency concerns
will make you change it anyway, but that's not necessary in general.
Alessio
> rg
I find it hard to imagine that this matters beyond simple examples. But
that may just be a limitation of my imagination...
> On 30/07/2010 22:53, RG wrote:
> > This is a meme that needs to be nipped in the bud:
> >
> > In article<8bgijd...@mid.individual.net>,
> > Pascal Costanza<p...@p-cos.net> wrote:
> >
> >> In general, I'm
> >> quite skeptical about abstracted interfaces for different kinds of
> >> collections: Each of them has their own advantages and disadvantages,
> >> and you don't easily change from one to the other without considering
> >> performance and other implications. So what's the real use of an
> >> abstract interface, other than that you _can_ abstract it?
> >
> > ...
> >
> >> I don't want to discourage you, but again, what's the real use of such
> >> abstractions?
> >
> > To make it easier to change implementations if the assumptions that led
> > you initially to choose a particular one turn out to be wrong. Which
> > happens a lot.
> >
> > Why do you doubt the value of this?
>
> I find it hard to imagine that this matters beyond simple examples. But
> that may just be a limitation of my imagination...
Who are you and what have you done with Pascal Costanza?
Do you believe that latent typing has value? What exactly is that value
if not the ability to defer the choice of data type? What is the point
of deferring the choice of a data type if you cannot also defer the
choice of algorithm? (What does it even *mean* to defer the choice of
data type without deferring the choice of algorithm?) Or do you think
that latent typing has no benefits beyond "simple examples" and that
complex systems ought to be written in statically typed languages?
rg
These rhetorical games get on my nerves a bit.
> Do you believe that latent typing has value?
Yes.
> What exactly is that value
> if not the ability to defer the choice of data type?
That's one among many advantages of "latent" typing. There are others.
> What is the point
> of deferring the choice of a data type if you cannot also defer the
> choice of algorithm? (What does it even *mean* to defer the choice of
> data type without deferring the choice of algorithm?)
If you have to change the data type, you probably also have to change
the algorithm manually yourself, at least that seems to be the case
based on my experience.
A good example is 'elt: It's a good accessor for vectors, but a very bad
one for lists. It may be convenient that 'elt abstracts away from lists
and vectors, but eventually, it probably matters to remove 'elt and
rewrite your algorithm, at least when you arrive at using lists. To me,
such an abstraction seems like a minor advantage.
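To illustrate: this function is O(n) on a vector but O(n^2) on a list,
because each 'elt has to walk the list from the head:

(defun sum-elements (seq)
  (loop for i below (length seq)
        sum (elt seq i)))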
There are better examples, I know. But they don't really convince me
that much either. Maybe your experiences are just different from mine,
who knows.
> Or do you think
> that latent typing has no benefits beyond "simple examples" and that
> complex systems ought to be written in statically typed languages?
This is not an implication of what I said.
I'm not trying to argue against building a good abstract collection API,
if you believe that's going to buy you something. I'm just trying to
express my skepticism about their usefulness, which seems to be
considered higher than it actually is, IMHO. YMMV.
Don't let me stop you. Everybody should do what they think is right.
Oh come on, grow a sense of humor.
> > Do you believe that latent typing has value?
>
> Yes.
>
> > What exactly is that value
> > if not the ability to defer the choice of data type?
>
> That's one among many advantages of "latent" typing. There are others.
Really? Like what?
> > What is the point
> > of deferring the choice of a data type if you cannot also defer the
> > choice of algorithm? (What does it even *mean* to defer the choice of
> > data type without deferring the choice of algorithm?)
>
> If you have to change the data type, you probably also have to change
> the algorithm manually yourself, at least that seems to be the case
> based on my experience.
>
> A good example is 'elt: It's a good accessor for vectors, but a very bad
> one for lists.
Really? What would you recommend to use instead? NTH?
> It may be convenient that 'elt abstracts away from lists
> and vectors, but eventually, it probably matters to remove 'elt and
> rewrite your algorithm, at least when you arrive at using lists. To me,
> such an abstraction seems like a minor advantage.
You're missing the point. If you use ELT then it's easier to replace
lists with vectors than if you use NTH. So why not use ELT?
> There are better examples, I know.
Indeed. A much better example is GETF/ASSOC/GETHASH. There is no
equivalent in CL of ELT for these accessors, so the design of CL
*requires* you to make a premature optimization if you want an
associative map. (Worse, CL doesn't provide reader syntax for hash
tables, which implicitly disparages them in favor of ALists and PLists,
both of which are almost never the right choice for anything except the
simplest examples.)
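To sketch what's missing (every name here is hypothetical, not part of CL):

(defgeneric lookup (map key)
  (:documentation "Return the value for KEY in MAP and whether it was found."))

(defmethod lookup ((map hash-table) key)
  (gethash key map))

(defmethod lookup ((map list) key)   ; assumes MAP is an alist
  (let ((pair (assoc key map)))
    (values (cdr pair) (and pair t))))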
> But they don't really convince me
> that much either. Maybe your experiences are just different from mine,
> who knows.
Have you never had to change large sections of code because you found
out after you wrote it that it ran too slowly because you used a list
where you should have used a vector? Or an AList where you should have
used a hash table? Or a hash table where you should have used a
red-black tree?
> > Or do you think
> > that latent typing has no benefits beyond "simple examples" and that
> > complex systems ought to be written in statically typed languages?
>
> This is not an implication of what I said.
It wasn't intended to be. It was intended to be a possible explanation
of why you said what you said.
> I'm not trying to argue against building a good abstract collection API,
> if you believe that's going to buy you something. I'm just trying to
> express my skepticism about their usefulness, which seems to be
> considered higher than it actually is, IMHO. YMMV.
But that is exactly what puzzles me. You see the value in latent
typing, but not in abstraction. And yet the value in those two things
is exactly the same, namely, the ability to write code in a way that
lets you defer decisions. So I don't understand how you can value one
and not the other.
rg
I don't share your sense of humor. I find these parts of conversations
with you unpleasant, which is a pity, because I find most other parts of
conversations with you quite interesting.
>>> Do you believe that latent typing has value?
>>
>> Yes.
>>
>>> What exactly is that value
>>> if not the ability to defer the choice of data type?
>>
>> That's one among many advantages of "latent" typing. There are others.
>
> Really? Like what?
Static type systems have to reject programs that may nevertheless be
correct, which can get in the way.
Dynamically typed languages can support higher degrees of reflection at
runtime.
Just two more.
>>> What is the point
>>> of deferring the choice of a data type if you cannot also defer the
>>> choice of algorithm? (What does it even *mean* to defer the choice of
>>> data type without deferring the choice of algorithm?)
>>
>> If you have to change the data type, you probably also have to change
>> the algorithm manually yourself, at least that seems to be the case
>> based on my experience.
>>
>> A good example is 'elt: It's a good accessor for vectors, but a very bad
>> one for lists.
>
> Really? What would you recommend to use instead? NTH?
I would recommend not using random access into lists, but rather
mapcar and loops over list elements, and the like.
>> It may be convenient that 'elt abstracts away from lists
>> and vectors, but eventually, it probably matters to remove 'elt and
>> rewrite your algorithm, at least when you arrive at using lists. To me,
>> such an abstraction seems like a minor advantage.
>
> You're missing the point. If you use ELT then it's easier to replace
> lists with vectors than if you use NTH. So why not use ELT?
If you use lists with ELT, this is already fishy anyway.
>> There are better examples, I know.
>
> Indeed. A much better example is GETF/ASSOC/GETHASH. There is no
> equivalent in CL of ELT for these accessors, so the design of CL
> *requires* you to make a premature optimization if you want an
> associative map. (Worse, CL doesn't provide reader syntax for hash
> tables, which implicitly disparages them in favor of ALists and PLists,
> both of which are almost never the right choice for anything except the
> simplest examples.)
Alists have the easiest way of determining whether an association was
actually in a list or not. Plists mesh best with keyword arguments, and
similar mappings, and I somewhat prefer them over alists for subjective
reasons. Hash tables are only more efficient if there are more than 50,
or so, mappings in a table (unless the hash table does automatic
adaptation of its internal representation, but I'm not aware of a good
implementation of such an adaptive hash table). To me, this is another
good example of data structures that provide slightly interesting
differences that I nevertheless find worthwhile to be able to take
advantage of. I would not like these differences to be removed because
of an attempt to make everything look the same.
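For illustration:

(assoc 'b '((a . 1) (b . nil)))   ; => (B), the association is present
                                  ;    even though its value is NIL

(let ((table (make-hash-table)))
  (setf (gethash 'b table) nil)
  (gethash 'b table))             ; => NIL, T -- presence is only visible
                                  ;    in the second value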
Fortunately, there is no contradiction there. You could use a collection
library, while I could continue to use the low-level substrates...
>> But they don't really convince me
>> that much either. Maybe your experiences are just different from mine,
>> who knows.
>
> Have you never had to change large sections of code because you found
> out after you wrote it that it ran too slowly because you used a list
> where you should have used a vector? Or an AList where you should have
> used a hash table? Or a hash table where you should have used a
> red-black tree?
Sure. It's a trade off. I still prefer being able to make the
fine-grained distinctions.
>>> Or do you think
>>> that latent typing has no benefits beyond "simple examples" and that
>>> complex systems ought to be written in statically typed languages?
>>
>> This is not an implication of what I said.
>
> It wasn't intended to be. It was intended to be a possible explanation
> of why you said what you said.
I don't know what to make of this. I'm pretty sure you are aware of my
position against static typing. So it seems to me that you're asking
this rhetorical question to provoke a certain reaction in me. Why do you
do that?
>> I'm not trying to argue against building a good abstract collection API,
>> if you believe that's going to buy you something. I'm just trying to
>> express my skepticism about their usefulness, which seems to be
>> considered higher than it actually is, IMHO. YMMV.
>
> But that is exactly what puzzles me. You see the value in latent
> typing, but not in abstraction. And yet the value in those two things
> is exactly the same, namely, the ability to write code in a way that
> lets you defer decisions. So I don't understand how you can value one
> and not the other.
I see value in abstraction, but not as much as seems to be the common
case. It's not an either-or thing, there are different degrees possible.
For example, static typing allows for stronger abstractions than dynamic
typing, yet I still prefer dynamic typing, with a reasonable amount of
abstractions here and there.
Funny. Looking there at "5 Hashed Collections" gives me the impression
that JCF was inspired by Smalltalk's collections.
> I don't share your sense of humor.
Apparently. FWIW, there was a serious point implied in that comment,
namely, that your position on abstraction seems contradictory to your
position on dynamic typing. See below.
> >>> Do you believe that latent typing has value?
> >>
> >> Yes.
> >>
> >>> What exactly is that value
> >>> if not the ability to defer the choice of data type?
> >>
> >> That's one among many advantages of "latent" typing. There are others.
> >
> > Really? Like what?
>
> Static type systems have to reject programs that may nevertheless be
> correct, which can get in the way.
No, they don't *have* to reject them. They just *do* as a matter of
common practice. There is nothing to prevent a compiler for a
statically typed language from compiling the program in such a way that
it produces errors at run time rather than compile time.
> Dynamically typed languages can support higher degrees of reflection at
> runtime.
Also not true. Statically typed languages *can* have just as much
run-time reflection as dynamically typed languages. But as a matter of
practice they don't because it is *possible* (but not necessary) to
compile these languages in such a way that there is nothing left at run
time to reflect upon.
> >>> What is the point
> >>> of deferring the choice of a data type if you cannot also defer the
> >>> choice of algorithm? (What does it even *mean* to defer the choice of
> >>> data type without deferring the choice of algorithm?)
> >>
> >> If you have to change the data type, you probably also have to change
> >> the algorithm manually yourself, at least that seems to be the case
> >> based on my experience.
> >>
> >> A good example is 'elt: It's a good accessor for vectors, but a very bad
> >> one for lists.
> >
> > Really? What would you recommend to use instead? NTH?
>
> I would recommend not using random access into lists, but rather
> mapcar and loops over list elements, and the like.
You're making a tacit assumption that I'm iterating over the elements of
the list in sequence. What if that's not what I'm doing? What if I
just want to (repeatedly) extract a single element located at a position
that isn't known until run time?
> >> There are better examples, I know.
> >
> > Indeed. A much better example is GETF/ASSOC/GETHASH. There is no
> > equivalent in CL of ELT for these accessors, so the design of CL
> > *requires* you to make a premature optimization if you want an
> > associative map. (Worse, CL doesn't provide reader syntax for hash
> > tables, which implicitly disparages them in favor of ALists and PLists,
> > both of which are almost never the right choice for anything except the
> > simplest examples.)
>
> Alists have the easiest way of determining whether an association was
> actually in a list or not.
That's not an inherent property of ALists, that's a consequence of the
way ASSOC is written, so that you have to make an additional call to
separate the key and value. You could do exactly the same thing with
PLists with an accessor whose return convention was analogous to MEMBER.
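For instance (PLIST-MEMBER being a made-up name):

(defun plist-member (key plist)
  ;; Return the tail of PLIST starting at KEY, or NIL if KEY is absent,
  ;; so a missing key is distinguishable from a NIL value.
  (loop for tail on plist by #'cddr
        when (eq (car tail) key) return tail))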
But this is beside the point.
> Hash tables are only more efficient if there are more than 50,
> or so, mappings in a table (unless the hash table does automatic
> adaptation of its internal representation, but I'm not aware of a good
> implementation of such an adaptive hash table). To me, this is another
> good example of data structures that provide slightly interesting
> differences that I nevertheless find worthwhile to be able to take
> advantage of.
You find the difference between O(1) access and O(n) access "slightly
interesting"?!? My, we really do have a difference in perspective here.
This is exactly my point: if you believe you will have <50 elements that
will lead you to make a particular choice. If you later learn that you
in fact have more than 50 elements and you need to change the
representation in order to achieve adequate performance and you have not
used an abstraction layer then you will have a lot more work to do to
fix the problem than if you had used an abstraction layer. Ergo one
ought to use abstraction layers except in cases where there is no doubt
that all of the factors that go into such decisions are reliably known
and will not change. I submit that in real-world situations, such cases
are rare. Furthermore, the fact that they are rare is the main reason
that latent typing is useful.
> I would not like these differences to be removed because
> of an attempt to make everything look the same.
Neither would I. Just because you use an abstraction doesn't mean you
must relinquish all control of the underlying implementation.
> Fortunately, there is no contradiction there. You could use a collection
> library, while I could continue to use the low-level substrates...
If you were just writing code for your own use I could accept that. But
you are an academic. You pass your ideas on to your students and people
who read your papers. You are also a well known and highly respected
member of this community and people listen to what you have to say. So
the damage you do by advancing bad advice is not limited to your own
code.
> >> But they don't really convince me
> >> that much either. Maybe your experiences are just different from mine,
> >> who knows.
> >
> > Have you never had to change large sections of code because you found
> > out after you wrote it that it ran too slowly because you used a list
> > where you should have used a vector? Or an AList where you should have
> > used a hash table? Or a hash table where you should have used a
> > red-black tree?
>
> Sure. It's a trade off. I still prefer being able to make the
> fine-grained distinctions.
So do I. Maybe this is the source of your confusion so I'll reiterate:
using an abstraction layer does NOT require you to relinquish control
over the underlying implementation.
> >>> Or do you think
> >>> that latent typing has no benefits beyond "simple examples" and that
> >>> complex systems ought to be written in statically typed languages?
> >>
> >> This is not an implication of what I said.
> >
> > It wasn't intended to be. It was intended to be a possible explanation
> > of why you said what you said.
>
> I don't know what to make of this. I'm pretty sure you are aware of my
> position against static typing.
Of course.
> So it seems to me that you're asking
> this rhetorical question to provoke a certain reaction in me. Why do you
> do that?
In order to get at the root of our disagreement. We agree about the
desirability of dynamic typing but not about abstraction. But this is
odd because IMO the reason dynamic typing is desirable is *because* it
*is* an abstraction. There is no essential difference in my mind
between forcing me to choose at compile time whether a variable is an
int or a float, and forcing me to choose whether it is an AList or a
PList. So by asking you to explain your position one of two things will
happen. Either you will come to this realization, or you will be able
to explain to me why AList/PList *is* a distinction that one ought to be
forced to make when one writes code even though int/float isn't, and I
will learn something.
> >> I'm not trying to argue against building a good abstract collection API,
> >> if you believe that's going to buy you something. I'm just trying to
> >> express my skepticism about their usefulness, which seems to be
> >> considered higher than it actually is, IMHO. YMMV.
> >
> > But that is exactly what puzzles me. You see the value in latent
> > typing, but not in abstraction. And yet the value in those two things
> > is exactly the same, namely, the ability to write code in a way that
> > lets you defer decisions. So I don't understand how you can value one
> > and not the other.
>
> I see value in abstraction, but not as much as seems to be the common
> case. It's not an either-or thing, there are different degrees possible.
> For example, static typing allows for stronger abstractions than dynamic
> typing, yet I still prefer dynamic typing, with a reasonable amount of
> abstractions here and there.
(What is a "stronger" abstraction?)
But this is just what I'm trying to get at. *Why* do you see more value
in the particular abstraction of dynamic typing than you do in
abstraction in general?
rg
>>>>> Do you believe that latent typing has value?
>>>>
>>>> Yes.
>>>>
>>>>> What exactly is that value
>>>>> if not the ability to defer the choice of data type?
>>>>
>>>> That's one among many advantages of "latent" typing. There are others.
>>>
>>> Really? Like what?
>>
>> Static type systems have to reject programs that may nevertheless be
>> correct, which can get in the way.
>
> No, they don't *have* to reject them. They just *do* as a matter of
> common practice. There is nothing to prevent a compiler for a
> statically typed language from compiling the program in such a way that
> it produces errors at run time rather than compile time.
Ok, I rephrase: static type systems have to produce errors for certain
programs that may otherwise run correctly.
>> Dynamically typed languages can support higher degrees of reflection at
>> runtime.
>
> Also not true. Statically typed languages *can* have just as much
> run-time reflection as dynamically typed languages.
This is demonstrably false. Dynamically typed languages allow for
degrees of reflection that violate assumptions that static type systems
usually make - for example that slots or functions are not removed at
run time. This can cause run-time errors where a static type system
would predict the absence of such errors. Supporters of static type
systems wouldn't be very happy with a type system that "proves" the
absence of certain errors, but allows the run-time system to reintroduce
these errors.
>>>>> What is the point
>>>>> of deferring the choice of a data type if you cannot also defer the
>>>>> choice of algorithm? (What does it even *mean* to defer the choice of
>>>>> data type without deferring the choice of algorithm?)
>>>>
>>>> If you have to change the data type, you probably also have to change
>>>> the algorithm manually yourself, at least that seems to be the case
>>>> based on my experience.
>>>>
>>>> A good example is 'elt: It's a good accessor for vectors, but a very bad
>>>> one for lists.
>>>
>>> Really? What would you recommend to use instead? NTH?
>>
>> I would recommend not using random access into lists, but rather
>> mapcar and loops over list elements, and the like.
>
> You're making a tacit assumption that I'm iterating over the elements of
> the list in sequence. What if that's not what I'm doing? What if I
> just want to (repeatedly) extract a single element located at a position
> that isn't known until run time?
Then a list is not a good choice of data structure in such a case.
>>>> There are better examples, I know.
>>>
>>> Indeed. A much better example is GETF/ASSOC/GETHASH. There is no
>>> equivalent in CL of ELT for these accessors, so the design of CL
>>> *requires* you to make a premature optimization if you want an
>>> associative map. (Worse, CL doesn't provide reader syntax for hash
>>> tables, which implicitly disparages them in favor of ALists and PLists,
>>> both of which are almost never the right choice for anything except the
>>> simplest examples.)
>>
>> Alists have the easiest way of determining whether an association was
>> actually in a list or not.
>
> That's not an inherent property of ALists, that's a consequence of the
> way ASSOC is written, so that you have to make an additional call to
> separate the key and value. You could do exactly the same thing with
> PLists with an accessor whose return convention was analogous to MEMBER.
Maybe. It just happens not to be the case.
> But this is beside the point.
Indeed.
>> Hash tables are only more efficient if there are more than 50,
>> or so, mappings in a table (unless the hash table does automatic
>> adaptation of its internal representation, but I'm not aware of a good
>> implementation of such an adaptive hash table). To me, this is another
>> good example of data structures that provide slightly interesting
>> differences that I nevertheless find worthwhile to be able to take
>> advantage of.
>
> You find the difference between O(1) access and O(n) access "slightly
> interesting"?!? My, we really do have a difference in perspective here.
Your discussion style is degrading again. A difference between O(1)
and O(n) is obviously not a slight difference, so instead of assuming
that I'm stupid, you could also assume that I meant something else.
> This is exactly my point: if you believe you will have <50 elements that
> will lead you to make a particular choice. If you later learn that you
> in fact have more than 50 elements and you need to change the
> representation in order to achieve adequate performance and you have not
> used an abstraction layer then you will have a lot more work to do to
> fix the problem than if you had used an abstraction layer. Ergo one
> ought to use abstraction layers except in cases where there is no doubt
> that all of the factors that go into such decisions are reliably known
> and will not change. I submit that in real-world situations, such cases
> are rare.
Well, that's your opinion. It's good that we can create a situation
where we are both happy.
> Furthermore, the fact that they are rare is the main reason
> that latent typing is useful.
The existence of collection frameworks in statically typed languages
proves that this issue is independent of whether you have latent or
static typing.
>> I would not like these differences to be removed because
>> of an attempt to make everything look the same.
>
> Neither would I. Just because you use an abstraction doesn't mean you
> must relinquish all control of the underlying implementation.
>
>> Fortunately, there is no contradiction there. You could use a collection
>> library, while I could continue to use the low-level substrates...
>
> If you were just writing code for your own use I could accept that. But
> you are an academic. You pass your ideas on to your students and people
> who read your papers. You are also a well known and highly respected
> member of this community and people listen to what you have to say. So
> the damage you do by advancing bad advice is not limited to your own
> code.
I hope that people don't just blindly follow other people, but instead
form their own opinion. What we discuss here is a matter of style, not a
matter of right or wrong, or bad or good. I encourage my students to use
their own programming styles, and this works quite well.
I disagree with your assessment that my style creates 'damage'.
>>>> But they don't really convince me
>>>> that much either. Maybe your experiences are just different from mine,
>>>> who knows.
>>>
>>> Have you never had to change large sections of code because you found
>>> out after you wrote it that it ran too slowly because you used a list
>>> where you should have used a vector? Or an AList where you should have
>>> used a hash table? Or a hash table where you should have used a
>>> red-black tree?
>>
>> Sure. It's a trade off. I still prefer being able to make the
>> fine-grained distinctions.
>
> So do I. Maybe this is the source of your confusion so I'll reiterate:
> using an abstraction layer does NOT require you to relinquish control
> over the underlying implementation.
I'm not stopping anybody from creating a good collection framework. If
the result convinces me, I'm happy to use it. What I have seen so far
doesn't convince me, but who am I to predict there can't be anything better?
>>>>> Or do you think
>>>>> that latent typing has no benefits beyond "simple examples" and that
>>>>> complex systems ought to be written in statically typed languages?
>>>>
>>>> This is not an implication of what I said.
>>>
>>> It wasn't intended to be. It was intended to be a possible explanation
>>> of why you said what you said.
>>
>> I don't know what to make of this. I'm pretty sure you are aware of my
>> position against static typing.
>
> Of course.
>
>> So it seems to me that you're asking
>> this rhetorical question to provoke a certain reaction in me. Why do you
>> do that?
>
> In order to get at the root of our disagreement. We agree about the
> desirability of dynamic typing but not about abstraction. But this is
> odd because IMO the reason dynamic typing is desirable is *because* it
> *is* an abstraction. There is no essential difference in my mind
> between forcing me to choose at compile time whether a variable is an
> int or a float, and forcing me to choose whether it is an AList or a
> PList. So by asking you to explain your position one of two things will
> happen. Either you will come to this realization, or you will be able
> to explain to me why AList/PList *is* a distinction that one ought to be
> forced to make when one writes code even though int/float isn't, and I
> will learn something.
I think here lies your confusion: I don't want to force anybody to do
anything. I just wanted to express my skepticism about collection
frameworks, that's all.
Maybe you're right, maybe the decision between alist, plists, etc., that
we have to make in Common Lisp is suboptimal. But I think the kind of
abstraction provided by collection frameworks is also overrated.
"Abstraction" is not a goal in its own right, it's only a goal to the
extent that it serves a purpose and solves an actual problem. I don't
see what problem a collection framework solves. (Yes, I understand your
examples of wanting to change between alists, plists and hash tables,
but I don't think they are as harsh as some people, including you,
believe them to be).
>>>> I'm not trying to argue against building a good abstract collection API,
>>>> if you believe that's going to buy you something. I'm just trying to
>>>> express my skepticism about their usefulness, which seems to be
>>>> considered higher than it actually is, IMHO. YMMV.
>>>
>>> But that is exactly what puzzles me. You see the value in latent
>>> typing, but not in abstraction. And yet the value in those two things
>>> is exactly the same, namely, the ability to write code in a way that
>>> lets you defer decisions. So I don't understand how you can value one
>>> and not the other.
>>
>> I see value in abstraction, but not as much as seems to be the common
>> case. It's not an either-or thing, there are different degrees possible.
>> For example, static typing allows for stronger abstractions than dynamic
>> typing, yet I still prefer dynamic typing, with a reasonable amount of
>> abstractions here and there.
>
> (What is a "stronger" abstraction?)
As an example, a static type like Collection<T> is such a stronger
abstraction, because it constrains the collection to have elements all
of the same type, without specifying what that particular type is.
> But this is just what I'm trying to get at. *Why* do you see more value
> in the particular abstraction of dynamic typing than you do in
> abstraction in general?
I'm only interested in dynamic typing to the extent that it allows me to
do things that I cannot do in statically typed languages. The connection
between "abstraction" and "dynamic typing" exists in your mind, not in mine.
Could be :)
> Currently I restrict myself to what Lisp offers to make my program run
> first and optimize later.
If you are using what Lisp offers, you are already optimizing
prematurely. Use FSet and you can pessimize prematurely instead :)
Ha ha, just kidding :)
Seriously -- the design of FSet is such that every operation is within a
log n factor of the best possible time complexity for that operation.
So you can use the FSet types without worrying that there will be some
operation whose time complexity will be problematic (e.g., linear-time
list indexing) and so will force you to a different data structure.
Also, FSet is quite space-efficient for small collections (unlike hash
tables).
Plus, you get the stylistic benefits of functional collections.
So what I'm suggesting is to use the FSet collections by default. They
are easy to use and packed with featureful goodness.
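For a taste (from memory of the FSet API; see its documentation for the
details of return conventions):

(let* ((s  (fset:with (fset:empty-set) 1))
       (s2 (fset:with s 2)))
  (list (fset:contains? s 2)        ; => NIL -- S itself is unchanged
        (fset:contains? s2 2)))     ; => (NIL T)

(let ((m (fset:with (fset:empty-map) :a 1)))
  (fset:lookup m :a))               ; => 1 (and a second found? value)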
Admittedly, it's still possible when using FSet that there will be some
inner loop that updates a collection such that the time and garbage
overhead of a functional collection will prove to be excessive. In that
case one will eventually want to change that loop to use a different
data structure. But since this is just a linear-factor problem rather
than a time complexity problem, the change won't be so urgent. E.g.,
the loop might run half as fast as you want for any size input, but it
won't blow up for large inputs.
-- Scott
> >> Hash tables are only more efficient if there are more than 50,
> >> or so, mappings in a table (unless the hash table does automatic
> >> adaptation of its internal representation, but I'm not aware of a good
> >> implementation of such an adaptive hash table). To me, this is another
> >> good example of data structures that provide slightly interesting
> >> differences that I nevertheless find worthwhile to be able to take
> >> advantage of.
> >
> > You find the difference between O(1) access and O(n) access "slightly
> > interesting"?!? My, we really do have a difference in perspective here.
>
> Your discussion style is degrading again. A difference between O(1)
> and O(n) is obviously not a slight difference, so instead of assuming
> that I'm stupid, you could also assume that I meant something else.
I'm not assuming that you're stupid, I am hypothesizing that we are
employing very different quality metrics.
> > Furthermore, the fact that they are rare is the main reason
> > that latent typing is useful.
>
> The existence of collection frameworks in statically typed languages
> proves that this issue is independent of whether you have latent or
> static typing.
We've gotten too hung up on latent typing. Let me try this another way:
the whole point of dynamicism in Lisp, whether it's latent typing, or
being able to redefine functions and methods at run time, or being able
to do things like change-class, is to be able to easily make changes at
run time. It is puzzling to me that someone who is so enthusiastic
about Lisp in particular and dynamicism in general is so unenthusiastic
about collection abstractions, which also serve to make it easy to make
changes at run time.
> > If you were just writing code for your own use I could accept that. But
> > you are an academic. You pass your ideas on to your students and people
> > who read your papers. You are also a well known and highly respected
> > member of this community and people listen to what you have to say. So
> > the damage you do by advancing bad advice is not limited to your own
> > code.
>
> I hope that people don't just blindly follow other people, but instead
> form their own opinion. What we discuss here is a matter of style, not a
> matter of right or wrong, or bad or good. I encourage my students to use
> their own programming styles, and this works quite well.
>
> I disagree with your assessment that my style creates 'damage'.
It's not your style that's the problem. It's the substance of what you
say.
> >>>> But they don't really convince me
> >>>> that much either. Maybe your experiences are just different from mine,
> >>>> who knows.
> >>>
> >>> Have you never had to change large sections of code because you found
> >>> out after you wrote it that it ran too slowly because you used a list
> >>> where you should have used a vector? Or an AList where you should have
> >>> used a hash table? Or a hash table where you should have used a
> >>> red-black tree?
> >>
> >> Sure. It's a trade off. I still prefer being able to make the
> >> fine-grained distinctions.
> >
> > So do I. Maybe this is the source of your confusion so I'll reiterate:
> > using an abstraction layer does NOT require you to relinquish control
> > over the underlying implementation.
>
> I'm not stopping anybody from creating a good collection framework. If
> the result convinces me, I'm happy to use it. What I have seen so far
> doesn't convince me, but who am I to predict there can't be anything better?
What have you looked at? Where do the things you've looked at fall
short?
No, I get that. What I don't get is the source of your skepticism. It
doesn't jibe with your lack of skepticism about Lisp in general.
> Maybe you're right, maybe the decision between alist, plists, etc., that
> we have to make in Common Lisp is suboptimal. But I think the kind of
> abstraction provided by collection frameworks is also overrated.
Overrated according to whom? I don't hear anyone saying that collection
frameworks are going to save the world. I'm certainly not saying that.
All I'm saying is that all else being equal, hiding different
implementations of the same interface behind a common abstraction is a
win. Maybe not a huge win, but a win nonetheless.
> "Abstraction" is not a goal in its own right, it's only a goal to the
> extent that it serves a purpose and solves an actual problem. I don't
> see what problem a collection framework solves. (Yes, I understand your
> examples of wanting to change between alists, plists and hash tables,
> but I don't think they are as harsh as some people, including you,
> believe them to be).
OK, I guess we'll just have to agree to disagree about that. Maybe you
never experienced the pain of having to make a change in a large system
that was not properly abstracted.
rg
Scott L. Burson wrote:
> If you are using what Lisp offers, you are already optimizing
> prematurely. Use FSet and you can pessimize prematurely instead :)
>
> Ha ha, just kidding :)
I don't get the joke
(though I do know the difference between "optimize" and "pessimize").
> So what I'm suggesting is to use the FSet collections by default. They
> are easy to use and packed with featureful goodness.
OKAY, OKAY!
I'll give them a try next week.
:)
> [...] But since this is just a linear-factor problem rather
> than a time complexity problem, the change won't be so urgent. E.g., the
> loop might run half as fast as you want for any size input, but it won't
> blow up for large inputs.
Good point. I don't care much for linear factors, either.
Norbert
I think he's saying "give up O(n) now for O(n log n), and never worry
about O(n^2)".
- Daniel
I'm poking fun at myself on behalf of all those people who automatically
think that functional collections must be too slow.
>> So what I'm suggesting is to use the FSet collections by default. They
>> are easy to use and packed with featureful goodness.
> OKAY, OKAY!
> I'll give them a try next week.
> :)
Cool :)
>> [...] But since this is just a linear-factor problem rather
>> than a time complexity problem, the change won't be so urgent. E.g., the
>> loop might run half as fast as you want for any size input, but it won't
>> blow up for large inputs.
> Good point. I don't care much for linear factors, either.
In the kind of code I work on, I hardly worry about them at all (within
reason, of course). If the algorithm is subquadratic, it's fast enough.
-- Scott
Well, your hypothesis is wrong, I meant something else.
>>> If you were just writing code for your own use I could accept that. But
>>> you are an academic. You pass your ideas on to your students and people
>>> who read your papers. You are also a well known and highly respected
>>> member of this community and people listen to what you have to say. So
>>> the damage you do by advancing bad advice is not limited to your own
>>> code.
>>
>> I hope that people don't just blindly follow other people, but instead
>> form their own opinion. What we discuss here is a matter of style, not a
>> matter of right or wrong, or bad or good. I encourage my students to use
>> their own programming styles, and this works quite well.
>>
>> I disagree with your assessment that my style creates 'damage'.
>
> It's not your style that's the problem. It's the substance of what you
> say.
I also disagree that the substance of what I say creates 'damage'.
>> I'm not stopping anybody from creating a good collection framework. If
>> the result convinces me, I'm happy to use it. What I have seen so far
>> doesn't convince me, but who am I to predict there can't be anything better?
>
> What have you looked at? Where do the things you've looked at fall
> short?
Ah, picked up from another part of this thread: I'm always forgetting
FSet in such discussions. FSet looks really promising. I should really
take a closer look at it, maybe that one will convince me...
>> Maybe you're right, maybe the decision between alist, plists, etc., that
>> we have to make in Common Lisp is suboptimal. But I think the kind of
>> abstraction provided by collection frameworks is also overrated.
>
> Overrated according to whom? I don't hear anyone saying that collection
> frameworks are going to save the world. I'm certainly not saying that.
> All I'm saying is that all else being equal, hiding different
> implementations of the same interface behind a common abstraction is a
> win. Maybe not a huge win, but a win nonetheless.
Well, what I'm trying to get across is that not all else is equal, IMHO.
>> "Abstraction" is not a goal in its own right, it's only a goal to the
>> extent that it serves a purpose and solves an actual problem. I don't
>> see what problem a collection framework solves. (Yes, I understand your
>> examples of wanting to change between alists, plists and hash tables,
>> but I don't think they are as harsh as some people, including you,
>> believe them to be).
>
> OK, I guess we'll just have to agree to disagree about that. Maybe you
> never experienced the pain of having to make a change in a large system
> that was not properly abstracted.
Yes, I did, but not because of collections.
> OK, I guess we'll just have to agree to disagree about that. Maybe
> you never experienced the pain of having to make a change in a large
> system that was not properly abstracted.
"Properly abstracted" is the issue. Good software uses a tree of
abstractions. If you have to make changes throughout your code because
of a change in usage of a collection abstraction, it's because the higher
level abstractions weren't done properly, not because the collection
abstraction wasn't done properly.
If you change a list to a vector, why should you have to change access to
it in dozens of different places? Why not only in the abstraction where
the list or vector is used?
That's the fallacy of the value of collection frameworks. They're
inherently low level. Programmers use them incorrectly as substitutes
for proper abstraction.
> But that is exactly what puzzles me. You see the value in latent
> typing, but not in abstraction. And yet the value in those two things
> is exactly the same, namely, the ability to write code in a way that
> lets you defer decisions. So I don't understand how you can value one
> and not the other.
If I may interject, I do see value in abstraction, but it will usually
take a different form than what you propose.
Let's say that in our program we have objects of kind foo, indexed
by bar, stored in collections bork. You do not want to prematurely
select the implementation of bork to be either hash tables, alists or
something else. You now write accessors like ADD-FOO, FIND-FOO, and the
like, essentially creating an ADT specific to the program. This ADT
encapsulates whatever implementation is chosen for bork, making change
later easy. The advantages of this approach are many: the accessors
can be written to fit the semantics of foo, bar and bork, rather than
shoe-horning the behaviour of those into some generic collections
framework. It makes the code much more descriptive of what it is doing
than if it is sprinkled with "generic" REF calls, which from a
documentation perspective is even worse than having it filled with calls
to GETHASH.
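A minimal sketch of the idea (foo, bar and bork being the placeholders
above; the hash table is just one arbitrary initial choice):

(defstruct foo bar payload)

(defvar *bork* (make-hash-table :test #'equal))

(defun add-foo (foo)
  (setf (gethash (foo-bar foo) *bork*) foo))

(defun find-foo (bar)
  (gethash bar *bork*))

Client code only ever calls ADD-FOO and FIND-FOO; replacing the hash
table with an alist or a balanced tree touches these definitions alone.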
In other words, instead of using a generic collections framework which
by necessity will have the semantics of a lowest common denominator and
not the ones of your program, construct abstractions specific to your
problem.
What happened to the idea of building up the language to your problem?
Björn Lindberg
One abstraction does not exclude the other. Abstractions can be
layered. In your example, ADD-FOO and FIND-FOO might call, say, CONCAT
and FIND rather than CONS/APPEND and... the equivalent of FIND that
only works on lists, if it existed.
It all boils down to some common sense. Just like we use + rather than
fixnum+, bignum+, float+, double+, it's often convenient to use ELT
rather than AREF or NTH. In fact, of the CL operators that deal with
lists, most actually abstract over lists and vectors (and other
sequence types if the implementation allows them).
Alessio
That is true, but if you create custom ADTs and interfaces as needed,
then what Ron advocates as the big advantage of generic collection
frameworks, namely ease of refactoring, no longer speaks for them.
> It all boils down to some common sense. Just like we use + rather than
> fixnum+, bignum+, float+, double+, it's often convenient to use ELT
> rather than AREF or NTH.
No, these are two completely different topics. We do not use generic
arithmetic operators to delay an implementation choice of which number
type to use.
Björn Lindberg
ah! I'd been wondering how to put that.
I'd been thinking "but SICP uses abstractions all over the place"
We use them because we don't care about the low-level details of how
arithmetic operations are implemented on the CPU; we only care about
summing numbers together. Similarly, at a higher level, there are
cases when we don't care whether a sequence is a list or a vector; we
just want to append an element to it, or change the value of the nth
element, and so on.
If there were no abstractions - and by that I mean no generic
operations that abstract over several data types - then all the
benefits of dynamic typing would be gone. If I have to write list-nth,
vector-nth, int+, float+, ..., type-operation for each (type,
operation) pair, I'm basically writing a statically-typed program that
is just not checked for correctness at compile-time - i.e. the worst
case possible :)
Alessio
> Björn Lindberg
I can agree with that.
Nonetheless, depending on the operation and the types of collections,
it may make sense or not to have a generic API.
aref, nth -> elt -- ok, it's provided by CL.
push, vector-push -> ? -- here you already have a problem, that
vector-push doesn't work on all vectors, but only on vectors with a
fill-pointer. And you'd really want vector-push-extend, and therefore
adjustable vectors.
Or you would have to provide a vector pushing function (that would
copy the vector into a 1-bigger one to insert the new element, which
would be awfully slow).
So, my point is that CL data types are too low level to provide a
consistent generic collection API. If you want collections with
different classes of time or space complexities, including some
looking like lists and some looking like vectors, you will have to
implement them, using CL data types, but more sophisticated.
Said otherwise, use FSet, or some similar library.
--
__Pascal Bourguignon__ http://www.informatimago.com/
> On Aug 3, 2:02 pm, bj...@runa.se (Björn Lindberg) wrote:
>> No, these are two completely different topics. We do not use generic
>> arithmetic operators to delay an implementation choice of which number
>> type to use.
>
> We use them because we don't care about the low-level details of how
> arithmetic operations are implemented on the CPU; we only care about
> summing numbers together. Similarly, at a higher level, there are
> cases when we don't care whether a sequence is a list or a vector; we
> just want to append an element to it, or change the value of the nth
> element, and so on.
I would claim that it is comparatively rare that you don't care if
something is a list or vector. In particular, this discussion is about
how to factor code so that you can change the underlying implementation
after the fact, i.e. we are _very_ concerned with whether it is a list
or a vector.
> If there were no abstractions - and by that I mean no generic
> operations that abstract over several data types - then all the
> benefits of dynamic typing would be gone.
First of all, no one has disputed the usefulness of abstractions. In
fact, if you re-read my posts you will see that I advocate application-
specific abstractions in favour of generic framework abstractions.
Secondly, the kind of abstraction we are discussing here, namely one to
allow changing the implementation of some data type in a later version
of the program, _has_ no dynamic properties. In version X of the
program, a particular call site (ref ...) will mean ASSOC, in version Y
GETHASH. No dynamicity involved. In contrast, your straw man the
arithmetic operators are very likely to accept different number types at
run-time.
> If I have to write list-nth, vector-nth, int+, float+, ...,
> type-operation for each (type, operation) pair, I'm basically writing
> a statically-typed program that is just not checked for correctness at
> compile-time - i.e. the worst case possible :)
Yeah, I wouldn't do that.
Björn Lindberg
The vector pushing function could be more intelligent than that. For
example, if the vector has a fill pointer, use it, otherwise create a
new vector with a fill pointer, copy the original into it, and return
it. And even if the function was naive it would still do fine as long
as performance doesn't matter much. If it turns out it does, you have
the option to change data structure (without changing the code) or to
use more specialized operators (losing the advantages of abstraction
in return of better performance) or to use a different algorithm. The
same argument could be constructed for ELT and LENGTH too, which are
O(1) on vectors but O(n) on lists. A higher-level list data type could
make LENGTH be O(1) by incrementing a counter for every element pushed
onto the list and decrementing it for every element popped off it.
Still, ELT and LENGTH are useful even on the low-level CL sequences.
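A sketch of that more intelligent pushing function;
GENERIC-VECTOR-PUSH is a made-up name, and the copying policy is one
choice among several:

(defun generic-vector-push (item vector)
  (if (and (array-has-fill-pointer-p vector)
           (adjustable-array-p vector))
      ;; use the fill pointer, growing the vector if needed
      (progn (vector-push-extend item vector)
             vector)
      ;; otherwise copy into a fresh adjustable vector and return it
      (let ((new (make-array (1+ (length vector))
                             :element-type (array-element-type vector)
                             :adjustable t
                             :fill-pointer (length vector))))
        (replace new vector)
        (vector-push item new)
        new)))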
> So, my point is that CL data types are too low level to provide a
> consistent generic collection API. If you want collections with
> different classes of time or space complexities, including some
> looking like lists and some looking like vectors, you will have to
> implement them, using CL data types, but more sophisticated.
>
> Said otherwise, use FSet, or some similar library.
I believe FSet would be used much more if there were a de facto
standard API for sequences that worked reasonably well with both CL
sequences and FSet. One of my original points was that the sequence
API proposed by Christophe Rhodes for SBCL, while great in many
aspects, doesn't take into account functional data structures (nor
does the CL sequence API, of course).
Alessio
What I want to say is that it's not application-specific vs generic
abstractions. You should build application-specific abstractions on
top of generic ones, except when serious considerations force you to
do otherwise.
> Secondly, the kind of abstraction we are discussing here, namely one to
> allow changing the implementation of some data type in a later version
> of the program, _has_ no dynamic properties. In version X of the
> program, a particular call site (ref ...) will mean ASSOC, in version Y
> GETHASH. No dynamicity involved. In contrast, your straw man the
> arithmetic operators are very likely to accept different number types at
> run-time.
Is this kind of abstraction forbidden to have dynamic properties?
Clearly not. Given
(defun add-foo-to-bar (foo bar)
  (setf (foos-of bar)
        (sequence-push foo (foos-of bar))))
then I will be able to deal with bars which store foos in sequences of
any type. Otherwise I would have either to constrain bars to only
store foos in, say, lists, or to manually dispatch on the type of
(foos-of bar) to handle at least lists and vectors.
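SEQUENCE-PUSH is not a standard operator; a minimal sketch of what
the example assumes, dispatching on the concrete sequence class:

(defgeneric sequence-push (item sequence)
  (:documentation "Return a sequence like SEQUENCE with ITEM added at the front."))

(defmethod sequence-push (item (sequence list))
  (cons item sequence))

(defmethod sequence-push (item (sequence vector))
  (concatenate 'vector (vector item) sequence))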
There are also possible uses of custom collection types whose reason
to exist is not in the realm of data types & algorithms. For example,
a CLOS persistence library might provide a persistent-sequence type
and require the user to use it for storing persistent collections of
objects.
Alessio
> Alessio Stalla <alessi...@gmail.com> writes:
>
> > On Aug 3, 2:02 pm, bj...@runa.se (Björn Lindberg) wrote:
>
> >> No, these are two completely different topics. We do not use generic
> >> arithmetic operators to delay an implementation choice of which number
> >> type to use.
> >
> > We use them because we don't care about the low-level details of how
> > arithmetic operations are implemented on the CPU; we only care about
> > summing numbers together. Similarly, at a higher level, there are
> > cases when we don't care whether a sequence is a list or a vector; we
> > just want to append an element to it, or change the value of the nth
> > element, and so on.
>
> I would claim that it is comparatively rare that you don't care if
> something is a list or vector.
That's true. But it's relatively common not to realize that you care
until long after you've already made the choice.
> Secondly, the kind of abstraction we are discussing here, namely one to
> allow changing the implementation of some data type in a later version
> of the program, _has_ no dynamic properties.
It could. My version of dictionaries, for example, allows the
underlying implementation of a dictionary (a.k.a. an abstract
associative map) to be dynamically changed at run time.
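A hypothetical sketch of the idea -- not the actual dictionary.lisp
code -- where the implementation lives in a slot and can be swapped
at run time:

(defclass dictionary ()
  ((impl :initarg :impl :accessor dictionary-impl)))

(defgeneric dict-ref (impl key))

(defmethod dict-ref ((impl hash-table) key)
  (gethash key impl))

(defmethod dict-ref ((impl list) key)   ; alist-backed implementation
  (cdr (assoc key impl)))

(defun lookup (dictionary key)
  (dict-ref (dictionary-impl dictionary) key))

(defun change-implementation (dictionary new-impl)
  ;; building NEW-IMPL from the old entries is elided here
  (setf (dictionary-impl dictionary) new-impl))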
rg
> > It all boils down to some common sense. Just like we use + rather than
> > fixnum+, bignum+, float+, double+, it's often convenient to use ELT
> > rather than AREF or NTH.
>
> No, these are two completely different topics. We do not use generic
> arithmetic operators to delay an implementation choice of which number
> type to use.
Huh?!? This seems to me to be the *only* reason to use generic
arithmetic. What else is it good for?
rg
Then, since pushing on a simple vector would imply copying it, while
pushing on an adjustable vector would not, you would have to wrap it
in your own high level data structure, therefore what you want to do
is not possible. QED.
This is my point, that CL data types are too low level. You cannot
have a homogeneous API over them or in conjunction with other higher
level data structure.
>Secondly, the kind of abstraction we are discussing here, namely one to
>allow changing the implementation of some data type in a later version
>of the program, _has_ no dynamic properties. In version X of the
>program, a particular call site (ref ...) will mean ASSOC, in version Y
>GETHASH. No dynamicity involved. In contrast, your straw man the
>arithmetic operators are very likely to accept different number types at
>run-time.
Interface abstraction frequently hides dynamic changes. It's not at
all uncommon, for example, for collections to change data structures
depending on the number of elements involved.
Similarly, functions may need to choose different algorithms and data
structures depending on how much or what kind of data they need to
operate on. I'm not talking about compiler overloading here but
rather functions that use intelligent dispatch to optimize some aspect
of their performance.
George
[slightly exaggerating to make a point]
The disadvantages are also plenty: Without access to the underlying,
standard API, I can't use the set of tools already developed for
iterating over lists, etc. I am at your mercy to implement the whole
API, I have to learn a new API, and your data structures are opaque to
MAP, LOOP, etc.
I really wish CL had the tools to use a generic REF for indexing into
arrays, retrieving hashtable values, etc. and then using a few
DECLAREs to tell the compiler how to optimize these numerous REFs into
efficient code (possibly even changing the underlying choice of data
structure). This ties into a bigger framework I'd like to have...
- Daniel
To write code that can accept differing types of numbers at run-time. As
you know, + takes a variable number of arguments, which among themselves
can have different types. Alessio would need an awfully high number of
operators with his approach (admittedly a straw man).
Björn Lindberg
> On Tue, 03 Aug 2010 16:42:59 +0200, bj...@runa.se (Björn Lindberg)
> wrote:
>
>>Secondly, the kind of abstraction we are discussing here, namely one to
>>allow changing the implementation of some data type in a later version
>>of the program, _has_ no dynamic properties. In version X of the
>>program, a particular call site (ref ...) will mean ASSOC, in version Y
>>GETHASH. No dynamicity involved. In contrast, your straw man the
>>arithmetic operators are very likely to accept different number types at
>>run-time.
>
> Interface abstraction frequently hides dynamic changes. It's not at
> all uncommon, for example, for collections to change data structures
> depending on the number of elements involved.
Of course. I explicitly limited my discussion to the case at hand, which
was about being able to change implementations in later versions of the
code, and the potential refactoring resulting thereof. In my mind this
and the dynamic change you are speaking of are two different topics,
albeit with overlapping solution spaces.
Björn Lindberg
> RG <rNOS...@flownet.com> writes:
>
> > In article <9mpwrs7...@runa.se>, bj...@runa.se (Björn Lindberg)
> > wrote:
> >
> >> > It all boils down to some common sense. Just like we use + rather than
> >> > fixnum+, bignum+, float+, double+, it's often convenient to use ELT
> >> > rather than AREF or NTH.
> >>
> >> No, these are two completely different topics. We do not use generic
> >> arithmetic operators to delay an implementation choice of which number
> >> type to use.
> >
> > Huh?!? This seems to me to be the *only* reason to use generic
> > arithmetic. What else is it good for?
>
> To write code that can accept differing types of numbers at run-time.
And how is that not "delaying an implementation choice" (to run-time) on
which number type to use?
> As
> you know, + takes a variable number of arguments, which among themselves
> can have different types. Alessio would need an awfully high number of
> operators with his approach (admittedly a straw man).
Actually having a different operator for each numeric type is not a
straw man. Some languages (like OCAML) actually do math this way.
rg
Ugh! I did intend it as a straw man :) I didn't suspect someone would
actually do it!
Cheers,
Alessio
> In article <9mplj8m...@runa.se>, bj...@runa.se (Björn Lindberg)
> wrote:
>
>> RG <rNOS...@flownet.com> writes:
>>
>> > In article <9mpwrs7...@runa.se>, bj...@runa.se (Björn Lindberg)
>> > wrote:
>> >
>> >> > It all boils down to some common sense. Just like we use + rather than
>> >> > fixnum+, bignum+, float+, double+, it's often convenient to use ELT
>> >> > rather than AREF or NTH.
>> >>
>> >> No, these are two completely different topics. We do not use generic
>> >> arithmetic operators to delay an implementation choice of which number
>> >> type to use.
>> >
>> > Huh?!? This seems to me to be the *only* reason to use generic
>> > arithmetic. What else is it good for?
>>
>> To write code that can accept differing types of numbers at run-time.
>
> And how is that not "delaying an implementation choice" (to run-time) on
> which number type to use?
Let us be careful not to have this discussion stray from the topic, and
consider the context in which that comment was made. If you think that
Alessio's 'typed arithmetics' are similar in nature to what I was
outlining in the post he replied to, can you please explain that
further?
>> As
>> you know, + takes a variable number of arguments, which among themselves
>> can have different types. Alessio would need an awfully high number of
>> operators with his approach (admittedly a straw man).
>
> Actually having a different operator for each numeric type is not a
> straw man. Some languages (like OCAML) actually do math this way.
A straw man argument is based on misrepresenting an opponent's
position. That someone, somewhere, might hold the opinion of the straw
man argument is irrelevant.
Björn Lindberg
> RG <rNOS...@flownet.com> writes:
>
> > In article <9mplj8m...@runa.se>, bj...@runa.se (Björn Lindberg)
> > wrote:
> >
> >> RG <rNOS...@flownet.com> writes:
> >>
> >> > In article <9mpwrs7...@runa.se>, bj...@runa.se (Björn Lindberg)
> >> > wrote:
> >> >
> >> >> > It all boils down to some common sense. Just like we use + rather than
> >> >> > fixnum+, bignum+, float+, double+, it's often convenient to use ELT
> >> >> > rather than AREF or NTH.
> >> >>
> >> >> No, these are two completely different topics. We do not use generic
> >> >> arithmetic operators to delay an implementation choice of which number
> >> >> type to use.
> >> >
> >> > Huh?!? This seems to me to be the *only* reason to use generic
> >> > arithmetic. What else is it good for?
> >>
> >> To write code that can accept differing types of numbers at run-time.
> >
> > And how is that not "delaying an implementation choice" (to run-time) on
> > which number type to use?
>
> Let us be careful not to have this discussion stray from the topic, and
> consider the context in which that comment was made. If you think that
> Alessio's 'typed arithmetics' are similar in nature to what I was
> outlining in the post he replied to, can you please explain that
> further?
The original context was snipped out. I presume you're referring to
this:
> If I may interject, I do see value in abstraction, but it will usually
> take a different form than what you propose.
>
> Let's say that we in our program have objects of the kind foo, indexed
> by bar, stored in collections bork. You do not want to prematurely
> select the implementation of bork to be either hash tables, alists or
> something else. You now write accessors like ADD-FOO, FIND-FOO, and the
> like, essentially creating an ADT specific to the program. This ADT
> encapsulates whatever implementation is chosen for bork, making change
> later easy. The advantages of this approach are plenty: The accessors
> can be written to fit the semantics of foo, bar and bork, rather than
> shoe-horning the behaviour of those into some generic collections
> framework. It makes the code much more descriptive of what it is doing
> than if it is sprinkled with "generic" REF calls, which from a
> documentation perspective is even worse than having it filled with calls
> to GETHASH.
Alessio's typed arithmetic operators seem to me to be exactly analogous
to your typed collections operators. ADD-FOO (presumably) only adds
objects of type FOO. In fact, just change FOO to FIXNUM and you've got
Alessio's example nearly verbatim.
> >> As
> >> you know, + takes a variable number of arguments, which among themselves
> >> can have different types. Alessio would need an awfully high number of
> >> operators with his approach (admittedly a straw man).
> >
> > Actually having a different operator for each numeric type is not a
> > straw man. Some languages (like OCAML) actually do math this way.
>
> A straw man argument is based on misrepresenting an opponent's
> position. That someone, somewhere, might hold the opinion of the straw
> man argument is irrelevant.
That's true, but Alessio's example doesn't look like a straw man to me
from that perspective either. It seems to me to be a perfectly valid
critique of your approach. If you think it's a straw man then you
haven't communicated your point very well, at least not to me or Alessio.
rg
I think Björn actually meant something like register-observer and
deregister-observer. [1] Something like this:
(defclass observable-mixin ()
  ((observers :initform '())))

;; specializing on standard-class here is not standard-compliant
;; but good enough for illustration purposes
(defmethod (setf slot-value-using-class) :after
    (new-value (class standard-class) (object observable-mixin) slot)
  (declare (ignore new-value slot))
  ;; the following should pass more information
  ;; but you get the idea; NOTIFY is assumed to be defined elsewhere
  (mapc #'notify (slot-value object 'observers)))

(defmethod register-observer ((object observable-mixin) observer)
  (pushnew observer (slot-value object 'observers)))

(defmethod deregister-observer ((object observable-mixin) observer)
  (setf (slot-value object 'observers)
        (remove observer (slot-value object 'observers))))
Replacing the list with a vector here is downright trivial, because it's
nicely abstracted away behind the register/deregister protocol. Adding
an abstract collection API here doesn't buy you a lot in this and
similar cases.
Pascal
[1] Björn, please correct me if I'm wrong.
By you.
> I presume you're referring to
> this:
>
>> If I may interject, I do see value in abstraction, but it will usually
>> take a different form than what you propose.
>>
>> Let's say that we in our program have objects of the kind foo, indexed
>> by bar, stored in collections bork. You do not want to prematurely
>> select the implementation of bork to be either hash tables, alists or
>> something else. You now write accessors like ADD-FOO, FIND-FOO, and the
>> like, essentially creating an ADT specific to the program. This ADT
>> encapsulates whatever implementation is chosen for bork, making change
>> later easy. The advantages of this approach are plenty: The accessors
>> can be written to fit the semantics of foo, bar and bork, rather than
>> shoe-horning the behaviour of those into some generic collections
>> framework. It makes the code much more descriptive of what it is doing
>> than if it is sprinkled with "generic" REF calls, which from a
>> documentation perspective is even worse than having it filled with calls
>> to GETHASH.
>
> Alessio's typed arithmetic operators seem to me to be exactly analogous
> to your typed collections operators. ADD-FOO (presumably) only adds
> objects of type FOO. In fact, just change FOO to FIXNUM and you've got
> Alessio's example nearly verbatim.
I will explain what I mean using a trivial example. Let us say that you
are writing an interpreter for a language. In the language, it is
possible to define functions, and later reference them in various ways, for
instance by calling them. In the interpreter, you initially decide to
store the association between the name of a function and the function
itself in an association list. To add to the list you are going to use
PUSH, and to look up a function by name ASSOC. Now, the concern raised
in this thread was that unless you have a generic collections framework
that turns PUSH and ASSOC into (setf (ref ...) ...) and (ref ...)
respectively, it will be very difficult to change the implementation
decision to use an association list to, say, a hash table, because you
will have PUSHs and ASSOCs all over the code, making refactoring
hard. My suggestion is to create an abstraction appropriate to the
problem:
(defvar *functions* '())

(defun add-function (name function)
  (push (cons name function) *functions*))

(defun find-function (name)
  (or (cdr (assoc name *functions*))
      (error "Undefined function: ~A" name)))
Refactoring this interpreter to use a hash table for the functions will
require change in two places. Now, this is a trivial example, but the
approach scales. You do not need a generic collections framework to be
able to delay data structure implementation decisions without risking
large code refactorings.
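To make the "two places" concrete, a sketch of the refactored
definitions; nothing else in the interpreter needs to change:

(defvar *functions* (make-hash-table :test #'eq))

(defun add-function (name function)
  (setf (gethash name *functions*) function))

(defun find-function (name)
  (or (gethash name *functions*)
      (error "Undefined function: ~A" name)))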
Björn Lindberg
> I think Björn actually meant something like register-observer and
> deregister-observer. [1] Something like this:
Not specifically in terms of register/deregister, but the concept of
application specific abstraction, yes, sure. This to me is building the
language to meet the problem you are solving.
Björn Lindberg
Yes. Because it didn't seem relevant. It still doesn't, even after
reading everything you just wrote. Alessio's critique still seems
spot-on to me, and I still don't see how your kind of abstraction is in
any salient way different from type-specific arithmetic.
I would put that slightly differently. It's not that you want a generic
framework that turns PUSH and ASSOC into REFs. What you want to do
(IMHO) is to write your code in terms of REF to begin with, and then
have that (potentially) translated into PUSH/ASSOC or whatever.
> My suggestion is to create an abstraction appropriate to the
> problem:
>
> (defun add-function (name function)
>   (push (cons name function) *functions*))
>
> (defun find-function (name)
>   (or (cdr (assoc name *functions*))
>       (error "Undefined function: ~A" name)))
>
> Refactoring this interpreter to use a hash table for the functions will
> require change in two places. Now, this is a trivial example, but the
> approach scales. You do not need a generic collections framework to be
> able to delay data structure implementation decisions without risking
> large code refactorings.
This is the abstraction approach advocated in SICP. It has two problems:
1) It's more work than REF. You have to build one of these every single
time you address a new problem. This is the essence of Alessio's
critique. If you want to add/find a FOO you need to write
ADD-FOO/FIND-FOO. If you want to ADD/FIND a BAZ you need to write
ADD-BAZ/FIND-BAZ. If you want to ADD/FIND/SUBTRACT/MULTIPLY a fixnum
you need to... [completing this sentence is left as an exercise]
2) Abstractions built this way are leaky. ADD-FUNCTION implemented your
way doesn't return an abstract opaque structure, it returns a cons cell.
So nothing prevents you from, say, taking the CDR of the result of
ADD-FUNCTION. If you do that you lose all the benefit of the
abstraction. So reaping the benefit of this kind of abstraction is
entirely dependent on programmer discipline.
REF, by way of contrast, can be implemented opaquely so that the
abstraction barrier cannot be accidentally breached. And it confers the
benefits of abstraction without requiring the programmer to write
endless boilerplate. And it doesn't prevent you from writing SICP-style
application-specific abstractions on top of REF if that's what floats
your canoe. And the interface is dead-simple: (REF COLLECTION INDEX).
REF seems to me to strictly dominate SICP-style abstraction in all
respects.
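A minimal sketch of such a REF; treating bare lists as alists is an
assumption here, not a given:

(defgeneric ref (collection index))

(defmethod ref ((collection hash-table) index)
  (gethash index collection))

(defmethod ref ((collection vector) index)
  (aref collection index))

(defmethod ref ((collection list) index)
  (cdr (assoc index collection)))   ; lists treated as alists

(defgeneric (setf ref) (value collection index))

(defmethod (setf ref) (value (collection hash-table) index)
  (setf (gethash index collection) value))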
But of course all of this is tangential to the main point, which is that
Alessio's critique still seems spot-on to me.
rg
No, replacing a list with a vector is trivial because you only have ten
lines of code. In fact, to replace lists with vectors you would need to
change four of your ten lines of code. If your code base were 10,000
LOC instead of 10, changing 40% of it to change an implementation would
feel like more of a chore.
If you had used a real abstraction you could have replaced lists with
vectors by changing zero lines of code. In a large code base that can
be a big win.
> Adding
> an abstract collection API here doesn't buy you a lot in this and
> similar cases.
Very few software engineering techniques pay big dividends on small code.
rg
In my 10 lines of code I have four definitions referring to a single
list container. So are you saying that in 10000 lines of code, I would
have 4000 definitions referring to that single list container? I'm
wondering what kind of code that would be... ;)
Pascal
Why do they all have to refer to that one list container? Why could you
not have 1000 different kinds of containers, each of which requires the
same 10-line boilerplate, and all of which might have to be changed
under the right circumstances? I'm working with a system right now that
pretty much fits that description. (It's a financial system.)
rg
Why would they all require the same change at the same time?
If they all require the same boilerplate, why didn't you write a macro
to generate it? Then you would have greatly reduced the workload again...
> In article <m2y6cmg...@nex.nex>, bj...@runa.se (Björn Lindberg)
> wrote:
>> My suggestion is to create an abstraction appropriate to the
>> problem:
>>
>> (defun add-function (name function)
>>   (push (cons name function) *functions*))
>>
>> (defun find-function (name)
>>   (or (cdr (assoc name *functions*))
>>       (error "Undefined function: ~A" name)))
>>
>> Refactoring this interpreter to use a hash table for the functions will
>> require change in two places. Now, this is a trivial example, but the
>> approach scales. You do not need a generic collections framework to be
>> able to delay data structure implementation decisions without risking
>> large code refactorings.
>
> This is the abstraction approach advocated in SICP. It has two problems:
>
> 1) It's more work than REF. You have to build one of these every single
> time you address a new problem. This is the essence of Alessio's
> critique.
Yes, but each such abstraction will be different, because it will be
tailored to fit the semantics of the program, rather than abstracted in
the direction of generic collections.
There are two reasons why Alessio's critique is not applicable. The
first one is that whereas numbers as a consequence of mathematical
definitions support the same set of operations and behave the same under
these operations, collections do not. A generic collections framework
will by necessity have a lowest common denominator API, hiding relevant
details. You have to give something up to use it, which you do not give
up in the case of the numbers. Just as an example, for lists a FOREACH
operator quite naturally will process the elements in order, whereas for
a tree you would have to choose which traversal order should be the
canonical one. Other orders would be unavailable without breaking the
abstraction.
The second reason why Alessio's argument misses its mark is that almost
always functions using the arithmetic operators will _at_run-time_
accept different number types. This is the common case. In contrast,
although a generic collections framework can be used for this purpose
too, the topic of discussion here is how to be able to change an
underlying implementation in a later version of the code without
excessive refactoring. At run-time in our example, the collection type
will be the same. This does not make Alessio's critique invalid as a
counter argument in some other discussion, but makes it not applicable
in this case.
> 2) Abstractions built this way are leaky. ADD-FUNCTION implemented your
> way doesn't return an abstract opaque structure, it returns a cons cell.
> So nothing prevents you from, say, taking the CDR of the result of
> ADD-FUNCTION. If you do that you lose all the benefit of the
> abstraction. So reaping the benefit of this kind of abstraction is
> entirely dependent on programmer discipline.
Programming is very dependent on discipline, indeed. This case is hardly
unique. And transparent structures have the advantage of simplifying
debugging. That aside, this is not a problem with the approach
itself. Nothing prevents you from having add-function returning nil, the
name of the function added or something else entirely if you do not
trust yourself or your colleagues. And even then, the generic approach
suffers the same dilemma. SUBSEQ for instance, being generic over lists
and vectors, will for some invocations return a list, which one of those
undisciplined programmers surely would take the CDR of.
Björn Lindberg
No, Pascal just showed how he encapsulated the choice of using lists
such that even when the code base grows to 10,000 lines, changing it to
use vectors still only involves a few lines. Your 40% would come from
littering the code with pushnews and removes. The difference in approach
is creating an application specific abstraction -- in this case
register/deregister -- versus littering the code with generic REF
accesses. (Although they are not mutually exclusive, of course.)
Björn Lindberg
They might not. But as the data volume grows, the number of objects
that fill each different kind of container grows at more or less the
same rate. Also...
> If they all require the same boilerplate, why didn't you write a macro
> to generate it? Then you would have greatly reduced the workload again...
It's not my code. But such a macro would not have solved the problem in
this particular case. This is a system that manages financial data
feeds from around the world. It is live 24x7. Down time is somewhere
between extremely expensive and disastrous depending on when it occurs.
When you change representation you not only have to change the code, you
also have to convert the currently-live data structures. So a macro
that generated accessors would not by itself be enough. You would
either have to reboot the system after every change, or write additional
code to convert the currently-live data structures in situ. And this is
not an uncommon situation in high-availability applications.
rg
Are you seriously saying that in such an application, a random access
operator (like elt) is used on lists, or that it would ever make sense
to use elt on lists in the future? An O(n) operation?
I find that very hard to believe...
Do you also find it hard to believe that anyone would ever use an
AList or a Plist in production code? ASSOC and GETF are also O(n)
operations, no? Or that it might make sense to use an O(n) operation if
it is in a part of the system that is rarely used and is not
mission-critical?
In this particular case I have no idea what's really going on under the
hood. Like I said, it's not my code. (In this case I'm a user, not a
developer.) But what difference does it make what the particular issue
happens to be? Do you seriously doubt that representations in
production applications often need to be changed after deployment to
meet unanticipated demands or to support new features? And that down
time can be expensive, and in some cases even catastrophic?
rg
That's a bad comparison. An alternative to alists and plists is hash
tables, but hash tables are slower than alists or plists if there are
less than 50 (or so) mappings. In contrast, arrays are always faster for
random access than lists.
> Or that it might make sense to use an O(n) operation if
> it is in a part of the system that is rarely used and is not
> mission-critical?
That can happen, sure.
> In this particular case I have no idea what's really going on under the
> hood. Like I said, it's not my code. (In this case I'm a user, not a
> developer.) But what difference does it make what the particular issue
> happens to be? Do you seriously doubt that representations in
> production applications often need to be changed after deployment to
> meet unanticipated demands or to support new features? And that down
> time can be expensive, and in some cases even catastrophic?
I don't doubt that run-time changes are important, to the contrary. I
still believe that collections are the wrong level of abstraction to
make such changes.
You're right in that you may not have any other opportunity in badly
designed code.
> On 05/08/2010 19:01, RG wrote:
> > Do you also find it hard to believe than that anyone would ever use an
> > AList or a Plist in production code? ASSOC and GETF are also O(n)
> > operations, no?
>
> That's a bad comparison. An alternative to alists and plists is hash
> tables, but hash tables are slower than alists or plists if there are
> less than 50 (or so) mappings. In contrast, arrays are always faster for
> random access than lists.
And in fact in some of our code where performance was an issue we ended
up writing an auto-switching data structure. We had particular lookup
lists (dictionaries) where many were short but occasionally longer ones
would pop up. This depended on some runtime actions so it wasn't
predictable.
So our solution was to build an abstract data structure that would use
ALISTs until the (empirically determined) performance threshold was
reached and then substituting a hash table internally to hold the data.
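A sketch of such an auto-switching table; the names and the threshold
value are made up, since the actual code isn't shown:

(defconstant +switch-threshold+ 50)   ; stand-in for the empirical value

(defstruct lookup-table
  (impl '())   ; starts life as an alist
  (count 0))

(defun table-get (table key)
  (let ((impl (lookup-table-impl table)))
    (if (hash-table-p impl)
        (gethash key impl)
        (cdr (assoc key impl :test #'equal)))))

(defun table-put (table key value)
  (let ((impl (lookup-table-impl table)))
    (if (hash-table-p impl)
        (setf (gethash key impl) value)
        (progn
          (push (cons key value) (lookup-table-impl table))
          (when (> (incf (lookup-table-count table)) +switch-threshold+)
            ;; threshold crossed: move the entries into a hash table,
            ;; oldest first so newer shadowing entries win
            (let ((ht (make-hash-table :test #'equal)))
              (loop for (k . v) in (reverse (lookup-table-impl table))
                    do (setf (gethash k ht) v))
              (setf (lookup-table-impl table) ht))))))
  value)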
I suppose that makes this tale an endorsement of the ADD-FOO and GET-FOO
technique.
--
Thomas A. Russ, USC/Information Sciences Institute
No. Both approaches are equivalent in this regard. See
http://www.flownet.com/ron/lisp/dictionary.lisp for an existence proof.
rg
That's a straw man. Hash tables are not the only alternative to ALists
and PLists.
> > Or that it might make sense to use an O(n) operation if
> > it is in a part of the system that is rarely used and is not
> > mission-critical?
>
> That can happen, sure.
>
> > In this particular case I have no idea what's really going on under the
> > hood. Like I said, it's not my code. (In this case I'm a user, not a
> > developer.) But what difference does it make what the particular issue
> > happens to be? Do you seriously doubt that representations in
> > production applications often need to be changed after deployment to
> > meet unanticipated demands or to support new features? And that down
> > time can be expensive, and in some cases even catastrophic?
>
> I don't doubt that run-time changes are important, to the contrary. I
> still believe that collections are the wrong level of abstraction to
> make such changes.
Yes, I get that. What I don't get is *why* you think that. But at this
point maybe we just ought to agree to disagree.
rg
Lua "tables" do something like that, but are instead always *both*
hash tables and arrays. Elements of a table whose keys are a
"dense-enough" set of natural numbers[1] are stored in the array
part, and all other elements are stored in the hash table part
[including those with numeric keys that are outside the "dense enough"
range]:
http://www.lua.org/doc/hopl.pdf
The Evolution of Lua
...
Until Lua 4.0, tables were implemented as pure hash tables,
with all pairs stored explicitly. In Lua 5.0 we introduced a
hybrid representation for tables: every table contains a hash
part and an array part, and both parts can be empty. Lua detects
whether a table is being used as an array and automatically
stores the values associated to integer indices in the array
part, instead of adding them to the hash part[31]. This division
occurs only at a low implementation level; access to table fields
is transparent, even to the virtual machine. Tables automatically
adapt their two parts according to their contents.
Reference [31] explains further:
http://www.lua.org/doc/jucs05.pdf
The implementation of Lua 5.0
...
When a table needs to grow, Lua recomputes the sizes for its hash
part and its array part. Either part may be empty. The computed
size of the array part is the largest N such that at least half
the slots between 1 and N are in use (to avoid wasting space with
sparse arrays) and there is at least one used slot between N/2+1
and N (to avoid a size N when N/2 would do). After computing the
new sizes, Lua creates the new parts and re-inserts the elements
from the old parts into the new ones.
...
As a consequence, if a table is being used as an array, it performs
as an array, as long as its integer keys are dense. Moreover, no
memory or time penalty is paid for the hash part, because it does
not even exist. The converse holds: if the table is being used as
an associative array, and not as an array, then the array part is
likely to be empty.
-Rob
[1] Yes, Lua arrays are indexed starting at *1*!! (*sigh*)
To be precise, the number 0 may certainly be used as a key,
but its associated value will go into the hash-table part
of the table, not the array part.
-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
Nice interesting abstraction.
I only looked at this briefly and saw that it supported changing the
underlying implementation of dictionaries dynamically. I didn't see any
place where this happened automatically, although I might have missed
it.
Unless it also supports automatic changes to the implementation, it
seems that one would still need to have the ADD-FOO, etc. interface on
top of the dictionary in order to trigger the change-over.
Although I guess it wouldn't be too hard to extend your dictionary code
to have an automatic switch over by defining an
auto-switching-dictionary class.
> RG <rNOS...@flownet.com> writes:
>
> > In article <ymid3tv...@blackcat.isi.edu>,
> > t...@sevak.isi.edu (Thomas A. Russ) wrote:
> > > So our solution was to build an abstract data structure that would use
> > > ALISTs until the (empirically determined) performance threshold was
> > > reached and then substituting a hash table internally to hold the data.
> > >
> > > I suppose that makes this tale an endorsement of the ADD-FOO and GET-FOO
> > > technique.
> >
> > No. Both approaches are equivalent in this regard. See
> > http://www.flownet.com/ron/lisp/dictionary.lisp for an existence proof.
>
> Nice interesting abstraction.
>
> I only looked at this briefly and saw that it supported changing the
> underlying implementation of dictionaries dynamically. I didn't see any
> place where this happened automatically, although I might have missed
> it.
No, but that's a pretty straightforward extension, at least conceptually.
> Unless it also supports automatic changes to the implementation, it
> seems that one would still need to have the ADD-FOO, etc. interface on
> top of the dictionary in order to trigger the change-over.
Huh? Why?
> Although I guess it wouldn't be too hard to extend your dictionary code
> to have an automatic switch over by defining an
> auto-switching-dictionary class.
You don't even have to do that. All you have to do is have some
profiling code somewhere that triggers appropriate calls to
change-implementation. This is not a trivial thing to do, especially if
you want it to be thread-safe without bringing everything to a
screeching halt, but it's not rocket science either.
rg
I agree :)
I can see both sides of this. It seems to stand to reason that a
generic collection interface can be useful, but in my experience, it
isn't useful -- at least, not critically so -- as often as one might think.
-- Scott
I don't think anyone ever claimed that collection abstractions were
going to solve every software engineering problem or bring about world
peace. But it still seems to me that the benefits are, at least
potentially, substantial even if they do not accrue "as often as one
might think" (however often that might be) while the cost is essentially
zero. So I don't understand the vehemence with which people resist
using them.
I am particularly mystified by this because one could (and people do)
make exactly the same argument about, say, macros: yes, they can
sometimes be useful, but not "as often as one might think" (again,
however often that might be). I would have thought that if any group of
people would see the flaw in this reasoning without having to have it
explained to them it would be the people who hang out here on c.l.l.
Alas.
rg
> I am particularly mystified by this because one could (and people do)
> make exactly the same argument about, say, macros: yes, they can
> sometimes be useful, but not "as often as one might think" (again,
In theory, generic collections are often useful, and macros are sometimes
useful. But, in practice, macros are useful a lot more and generic
collections a lot less.
This is because CL provides a good set of built-in collections that cover
most cases very well, and are more efficient than generic collections, and
because macros make the interfaces cleaner when building layers of domain
specific languages, which is the main activity of developing good software.
The vast majority of the software engineering world would disagree with
you that "building layers of domain specific languages ... is the main
activity of developing good software." The vast majority of software is
built in languages that don't have macros, and where as a result
building domain specific languages is very hard, and as a result is very
rarely done.
It is only in the Lisp world where one finds this enlightened attitude
towards things like macros, and more to the point, things like being
able to redefine functions, methods, and classes, resume a computation
after an error, and even (gasp!) change the class of an existing
instance in a running system without having to restart. The value of
being able to make changes at run time is implicitly acknowledged in
nearly every aspect of CL's design.
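For example, changing the class of a live instance is standard CL:

(defclass point () ((x :initarg :x) (y :initarg :y)))
(defclass point-3d (point) ((z :initform 0)))

(let ((p (make-instance 'point :x 1 :y 2)))
  (change-class p 'point-3d)    ; P is now a POINT-3D
  (list (slot-value p 'x)       ; => 1, existing slots preserved
        (slot-value p 'z)))     ; => 0, new slot gets its initform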
I would also disagree that CL "provides a good set of built-in
collections that cover most cases very well, and are more efficient than
generic collections." CL provides linked lists, arrays (with vectors as
a special case), and hash tables. Then it provides a hodgepodge of puns
on top of linked lists (alists, plists, sets) and a half-assed
non-extensible unification of linked-lists and vectors as "sequences".
That's it. No heaps. No tries. No trees. And that's even before you
get to contemporary issues like distributed collections, or collections
with a non-volatile backing store. As for being "more efficient", the
only efficiency difference between a generic and a non-generic
collection would the cost of a method dispatch, and even that could be
compiled away with the proper declarations.
So I'm sorry, but your explanation does nothing to dispel my
bewilderment. At the risk of repeating myself, the value of being able
to make changes at run time is implicitly acknowledged in nearly every
other aspect of CL's design. So I still don't understand why adding
that ability to this aspect of the language is resisted so vehemently.
rg
>The vast majority of the software engineering world would disagree with
>you that "building layers of domain specific languages ... is the main
>activity of developing good software." The vast majority of software is
>built in languages that don't have macros, and where as a result
>building domain specific languages is very hard, and as a result is very
>rarely done.
The vast majority of the software engineering world doesn't understand
what programming is about. Any time you construct a set of functions
designed to let you address a problem in its own terms, you are
creating a domain specific language.
This is done all the time in every flavor of programming language.
Macros are not required.
George
"Resisted so vehemently"?
Pascal Costanza, 2010-07-30:
> I find it hard to imagine that this matters beyond simple examples.
> But that may just be a limitation of my imagination...
Pascal Costanza, 2010-07-30:
> Don't let me stop you. Everybody should do what they think is right.
Pascal Costanza, 2010-07-31:
> You could use a collection library, while I could continue to use the
> low-level substrates...
Pascal Costanza, 2010-07-31:
> I'm not stopping anybody from creating a good collection framework.
> If the result convinces me, I'm happy to use it. What I have seen
> so far doesn't convince me, but who am I to predict there can't
> be anything better?
I think you're confusing a disagreement about the importance of the
undertaking with outright resistance. Also, glancing through the
thread, I see arguments made by various people on both sides. Björn
Lindberg is the only other person I notice consistently taking the view
that this kind of abstraction is not important (maybe I missed some; I
didn't reread the whole thread). So it appears that even the position
that collection abstractions are (relatively) unimportant is not held by
a majority of the thread participants... maybe it's about 50-50.
It seems that you are frustrated because you can't convince everyone.
I would further point out that even if you did convince everyone, it
would not necessarily accomplish anything. The real work to be done
here is the design of the abstraction. The only people who need to be
convinced are those who are actually going to do that work.
-- Scott
> RG wrote:
> > At the risk of repeating myself, the value of being able
> > to make changes at run time is implicitly acknowledged in nearly every
> > other aspect of CL's design. So I still don't understand why adding
> > that ability to this aspect of the language is resisted so vehemently.
>
> "Resisted so vehemently"?
"Resisted as vehemently as it is"?
> Pascal Costanza, 2010-07-30:
> > I find it hard to imagine that this matters beyond simple examples.
> > But that may just be a limitation of my imagination...
>
> Pascal Costanza, 2010-07-30:
> > Don't let me stop you. Everybody should do what they think is right.
>
> Pascal Costanza, 2010-07-31:
> > You could use a collection library, while I could continue to use the
> > low-level substrates...
>
> Pascal Costanza, 2010-07-31:
> > I'm not stopping anybody from creating a good collection framework.
> > If the result convinces me, I'm happy to use it. What I have seen
> > so far doesn't convince me, but who am I to predict there can't
> > be anything better?
Pascal is not the only one expressing an opinion about this. And this
C.L.L. thread is not the only data I have on this issue.
> I think you're confusing a disagreement about the importance of the
> undertaking with outright resistance. Also, glancing through the
> thread, I see arguments made by various people on both sides. Björn
> Lindberg is the only other person I notice consistently taking the view
> that this kind of abstraction is not important (maybe I missed some; I
> didn't reread the whole thread). So it appears that even the position
> that collection abstractions are (relatively) unimportant is not held by
> a majority of the thread participants... maybe it's about 50-50.
That's enough to puzzle me. I don't understand the quality metric that
allows one to like CL on the one hand, with all of its focus on dynamism
and allowing one to make seamless changes at run-time, and yet not like
abstract collections.
> It seems that you are frustrated because you can't convince everyone.
I'm not frustrated. I'm puzzled.
> I would further point out that even if you did convince everyone, it
> would not necessarily accomplish anything. The real work to be done
> here is the design of the abstraction. The only people who need to be
> convinced are those who are actually going to do that work.
But people have designed such interfaces. I know of at least two. And
yet the objections are not directed at the shortcomings of these designs
but at the idea of abstract collections in general. That's what I don't
get.
Believe me, I know I'm not going to convince anyone. I'm just trying to
understand the thought process that leads people to these conclusions
because it doesn't make any sense to me.
rg
I hope I don't have to defend here on C.L.L. the proposition that macros
are useful, and that a language with macros lets you do useful things
that a language without macros doesn't let you do, at least not as
easily. Whether you want to call those things "designing domain
specific languages" or something else doesn't change the substance of my
point, which is this: Lisp has two characteristics that distinguish it
from other languages. One is a uniquely powerful ability to create new
abstractions (with macros, but that's really beside the point). The
other is the ability to change things at run time (with a long list and
varied list of features). Abstract collections are a win along both of
these dimensions, so I don't understand how someone can like Lisp and
not like abstract collections. It just seems incongruous to me.
rg
Although I support the effort, I'm a little skeptical that API
incompatibility with CL sequences is the major barrier to the adoption
of FSet. FSet already goes well out of its way to support as much of
the existing CL sequence API as possible. Sure, it would be nice if
users didn't have to shadowing-import my versions of the symbols in that
API, but I don't see that as so awfully onerous. Perhaps I'm mistaken.
> One of my original points was that the sequence
> API proposed by Christophe Rhodes for SBCL, while great in many
> aspects, doesn't take into account functional data structures (nor
> does the CL sequence API, of course).
That is one of the two major philosophical divergences between the CL
API (and Christophe's) and FSet. And it's a severe one. It's not ever
going to be possible to just plug an imperative collection in in place
of a functional one; the code that uses them necessarily has to be
written differently. The same can be true for the converse, if the
collection is intentionally being aliased; my opinion, as a matter of
software design, is that such aliasing is rare anyway and should be
vanishingly rare, but it does occur.
The other philosophical divergence between the CL approach and the FSet
approach is that the FSet collections are semantics-heavy: a collection
knows what kind of thing it is, what operations are applicable to it,
and what equivalence relation it should use for its elements. By
contrast, a CL list can be used in many different ways -- as a set, a
sequence, a bag, an alist, a plist, etc.; the semantics are not supplied
by the data structure, but rather by the operations that are used on it
and the arguments (such as :TEST) that are supplied to those operations.
I suppose this second divergence is less difficult to overcome; one
simply says that if you're calling UNION on two FSet sets, it's an error
to supply a :TEST argument. Indeed, FSet signals an error in that case.
But it's something a unified API spec will have to deal with.
Presumably, other collection implementations, be they imperative or
functional, will also mostly be semantics-heavy in this sense, so this
isn't just for benefit of FSet.
I did once start out to study Christophe's proposal, with the intention
of preparing another that would support FSet, but I'm stretched very
thin these days and didn't get far. I certainly applaud you or anyone
else trying to do that.
(While you're at it, better integration of CL hash tables wouldn't be a
bad thing. Granted, hash tables can't meaningfully be concatenated or
even unioned(*), but at least ELT and (SETF ELT) could work on them.
(*) Come to think of it, maybe there is a possible meaning for the union
of two hash tables, if they're assumed to represent sets, i.e. only the
keys are significant.)
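Under that assumption, a sketch:

(defun hash-set-union (a b)
  ;; union of two hash tables viewed as sets of keys; values ignored
  (let ((result (make-hash-table :test (hash-table-test a))))
    (loop for k being the hash-keys of a
          do (setf (gethash k result) t))
    (loop for k being the hash-keys of b
          do (setf (gethash k result) t))
    result))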
As for what is the largest barrier to adoption of FSet, I don't really
know but I would be surprised if it's an ease-of-use problem. It is a
bit unfamiliar, I know, but I put a lot of effort into the design to
make it as easy to use as I could, and wrote a tutorial besides.
I think that the largest barrier is simply that people don't see the
need for it; they're happy with what they have. There are lots of
different kinds of programs and of programming in this world, and I
freely admit FSet is not appropriate for all of them. I think it would
be of some value for most kinds of programs, but perhaps it really
becomes massively useful only for the kind of work I do: static analysis
and automated reasoning.
And the second largest barrier, I suspect, is simply that people expect
it to perform poorly. A lot of people on this list talk about
minimizing garbage generation in their code -- an art that I think
belongs in the same category as writing in assembler: experts
occasionally need to do it, and it's useful to know how to do it, but
most people most of the time don't need to worry about it. And yet it
gets discussed here with some regularity. I'm not sure people really
realize how good modern generational GCs are.
That said, it has occurred to me that it wouldn't be so awfully
difficult, at least in some cases, to integrate imperative collections
with FSet's functional collections in such a way as to give most of the
benefits of both. For instance, if you have an imperative set
implemented as a red-black tree, I could easily write a high-performance
converter from that form into an FSet wb-set. Only slightly harder
would be a general high-performance set constructor interface that would
make it easy for you to write the converter. (It would simply need to
know the size of the set at the beginning, and then be supplied elements
in sorted order without duplicates.)
Given such a converter, you can construct the set imperatively, then
convert it to FSet form for further use. This will benefit the case,
which I believe to be very common, in which a set is not modified very
much after its initial construction.
There is also the possibility that FSet itself can provide a set of such
imperative collection types, which are designed to use a sufficiently
similar internal structure that the conversion is even faster -- and in
certain circumstances can be elided entirely. I am looking into this.
-- Scott
> I don't think anyone ever claimed that collection abstractions were
> going to solve every software engineering problem or bring about world
> peace. But it still seems to me that the benefits are, at least
> potentially, substantial even if they do not accrue "as often as one
> might think" (however often that might be) while the cost is essentially
> zero.
But the cost is far from zero. In ascending order of importance, there are
at least three costs associated with the generic collection abstraction:
the cost of developing the abstraction, the cost of increased code
complexity and the cost of the abstraction itself.
The first cost can be avoided if someone else already developed the
collection abstraction and is offering it for free; otherwise it is a
very tangible investment.
The second one has to do with the fact that anything that adds to the
complexity of your code base or adds dependencies to it incurs a cost
related to debugging and maintenance.
Finally, any generic collection abstraction will by necessity have to
conform to some least common denominator between the collections
involved, thus reducing expressivity and preventing the use of specifics
of any one collection. For example, the typical access patterns are
different for vectors and lists, and have different performance
characteristics. A tree can be traversed in several different orders. An
alist or a plist can cheaply and non-destructively be extended or have
entries shadowed by rebinding, something which is not possible to do
with a hash table. All these are examples of properties which cannot be
represented in a generic manner, and would break the abstraction if they
were.
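To spell out just the alist case, the idiom is a couple of lines that
no least-common-denominator dictionary interface can express:

(let* ((env '((x . 1) (y . 2)))
       (env2 (acons 'x 10 env)))    ; shadow X without touching ENV
  (values (cdr (assoc 'x env))      ; => 1, the original binding
          (cdr (assoc 'x env2))))   ; => 10, the shadowing one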
Generic collection abstractions, like most programming devices, do incur
costs which have to be outweighed by benefits. Finding the particular
trade-off is always a matter local to the problem at hand, not a general
one.
What I initially took issue with was the idea that one should use
generic collections everywhere preventively, so that a later decision to
change a data type would not result in massive refactoring. I think this
is mistaken as general advice, since most data types won't change, yet the
costs mentioned above will arise. Instead, I outlined a different
approach which solves the refactoring problem in the cases where it
appears, yet incurs much smaller costs in general.
> I am particularly mystified by this because one could (and people do)
> make exactly the same argument about, say, macros: yes, they can
> sometimes be useful, but not "as often as one might think" (again,
> however often that might be).
If you think that "exactly the same" argument I am making in this
discussion could be made about macros, you have not understood it. In
fact, the dynamic nature of Common Lisp combined with macros makes it
very easy to create application-specific abstractions. Much easier than
in Java or C++, where perhaps the needs for and usefulness of collection
abstractions are higher.
Björn Lindberg
I think, from talking to people over the years and from reading/hearing
lots of conversations like this one, that it's exactly the things some
people like about CL that lead them to dislike the idea of abstract
collections. (And that lead at least some of the same people to resist
the idea of changes to CL in general.)
You've got a language with a huge range of built-in data-arrangement
options, combined with really remarkable tools for rolling your own
specific tools. So when someone comes in and says (in effect), "Stop
using those things, just use the system that I've built for you",
they're striking at one of the core competences of being a CL programmer.
Add to that the fact that every single abstract-collection API in the
universe was/is/will be designed wrong, and you have an ironclad case
for opposing them. (How can I say that so confidently? Because every
design involves decisions about how to implement things and what to
implement, and for almost every decision there's someone who can make a
sensible argument it should have been done differently to address the
specific stuff they want to do. QED.)
Of course, almost every facility in CL was designed wrong as well by
this criterion, but it's the devil we know.
paul
>In article <6vju5692s8jli6uki...@4ax.com>,
> George Neuner <gneu...@comcast.net> wrote:
>
>> On Sun, 08 Aug 2010 15:23:57 -0700, RG <rNOS...@flownet.com> wrote:
>>
>> >The vast majority of the software engineering world would disagree with
>> >you that "building layers of domain specific languages ... is the main
>> >activity of developing good software." The vast majority of software is
>> >built in languages that don't have macros, and where as a result
>> >building domain specific languages is very hard, and as a result is very
>> >rarely done.
>>
>> The vast majority of the software engineering world doesn't understand
>> what programming is about. Any time you construct a set of functions
>> designed to let you address a problem in its own terms, you are
>> creating a domain specific language.
>>
>> This is done all the time in every flavor of programming language.
>> Macros are not required.
>
>I hope I don't have to defend here on C.L.L. the proposition that macros
>are useful, and that a language with macros lets you do useful things
>that a language without macros doesn't let you do, at least not as
>easily.
Absolutely not.
My first point was simply that those software "engineers" who would
disagree that building DSLs is their main activity do so because they
are lacking in formal CS education and really don't understand what it
is that they are doing.
My second point is that the main development activity of building DSLs
takes place in all programming languages, regardless of macros.
George
Perhaps it's me who's mistaken. I've never actually used FSet, but I
think such a library would be a prime candidate for inclusion in a
"standard library" for CL. And in that case, the easier the
integration with CL is, the better. But as things stand now, I agree
that if one finds the need to use FSet he won't be stopped by the non-
sequence API. What I'm dubious about is how often someone thinks "I
really need FSet to solve this problem!". I suspect that doesn't
happen often and that it would happen more often if FSet were part of
some "batteries included" library for CL and if it integrated well
with sequences.
> > One of my original points was that the sequence
> > API proposed by Christophe Rhodes for SBCL, while great in many
> > aspects, doesn't take into account functional data structures (nor
> > does the CL sequence API, of course).
>
> That is one of the two major philosophical divergences between the CL
> API (and Christophe's) and FSet. And it's a severe one. It's not ever
> going to be possible to just plug an imperative collection in in place
> of a functional one; the code that uses them necessarily has to be
> written differently. The same can be true for the converse, if the
> collection is intentionally being aliased; my opinion, as a matter of
> software design, is that such aliasing is rare anyway and should be
> vanishingly rare, but it does occur.
I agree up to a point: seamlessly swapping an imperative collection
for a functional one (or vice versa, perhaps to a lesser extent) is not
going to work. However, having an API which allows one to use sequences
functionally, even with terrible performance for some imperative
collections (e.g. vectors), imho would be an improvement.
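To make that concrete, the kind of operation I mean would be something
like the following; WITH-ELT is a made-up name, and for a vector it
really does cost a full copy per update:

(defun with-elt (sequence index new-value)
  "Return a copy of SEQUENCE with INDEX set to NEW-VALUE, leaving the
original untouched."
  (let ((copy (copy-seq sequence)))
    (setf (elt copy index) new-value)
    copy))

;; (with-elt #(a b c) 1 'x) => #(A X C), and #(a b c) is unchanged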
With all the hype for concurrency that's around, having natively
supported functional collections and parallel map and reduce
operations on them could attract some users ;)
> The other philosophical divergence between the CL approach and the FSet
> approach is that the FSet collections are semantics-heavy: a collection
> knows what kind of thing it is, what operations are applicable to it,
> and what equivalence relation it should use for its elements. By
> contrast, a CL list can be used in many different ways -- as a set, a
> sequence, a bag, an alist, a plist, etc.; the semantics are not supplied
> by the data structure, but rather by the operations that are used on it
> and the arguments (such as :TEST) that are supplied to those operations.
>
> I suppose this second divergence is less difficult to overcome; one
> simply says that if you're calling UNION on two FSet sets, it's an error
> to supply a :TEST argument. Indeed, FSet signals an error in that case.
Why must it be an error? I'd prefer to have it work with an
arbitrary :test, albeit more slowly, presumably - or isn't it
possible?
> But it's something a unified API spec will have to deal with.
> Presumably, other collection implementations, be they imperative or
> functional, will also mostly be semantics-heavy in this sense, so this
> isn't just for benefit of FSet.
I don't quite get the point of semantics-heavy collections in
general, as a design principle. I understand that some collections
have to be semantics-heavy - for example hash-based data structures
inherently depend on a given hashing function and a given equality
relation - but in general, what's the benefit of encoding in the
collection how it's going to be used and prematurely forbidding other
uses?
> I did once start out to study Christophe's proposal, with the intention
> of preparing another that would support FSet, but I'm stretched very
> thin these days and didn't get far. I certainly applaud you or anyone
> else trying to do that.
>
> (While you're at it, better integration of CL hash tables wouldn't be a
> bad thing. Granted, hash tables can't meaningfully be concatenated or
> even unioned(*), but at least ELT and (SETF ELT) could work on them.
> (*) Come to think of it, maybe there is a possible meaning for the union
> of two hash tables, if they're assumed to represent sets, i.e. only the
> keys are significant.)
For hashes, and dictionaries in general, one would need another API.
ELT & friends are specified to work on objects of type SEQUENCE. In
any case, I wouldn't use ELT for accessing dictionaries.
> As for what is the largest barrier to adoption of FSet, I don't really
> know but I would be surprised if it's an ease-of-use problem. It is a
> bit unfamiliar, I know, but I put a lot of effort into the design to
> make it as easy to use as I could, and wrote a tutorial besides.
>
> I think that the largest barrier is simply that people don't see the
> need for it; they're happy with what they have. There are lots of
> different kinds of programs and of programming in this world, and I
> freely admit FSet is not appropriate for all of them. I think it would
> be of some value for most kinds of programs, but perhaps it really
> becomes massively useful only for the kind of work I do: static analysis
> and automated reasoning.
That may be true. However, a significant part of the hype around
Clojure is precisely about its functional collections.
> And the second largest barrier, I suspect, is simply that people expect
> it to perform poorly. A lot of people on this list talk about
> minimizing garbage generation in their code -- an art that I think
> belongs in the same category as writing in assembler: experts
> occasionally need to do it, and it's useful to know how to do it, but
> most people most of the time don't need to worry about it. And yet it
> gets discussed here with some regularity. I'm not sure people really
> realize how good modern generational GCs are.
I don't think so; at least, I don't think like that. I understand that
functional collections are bound to perform worse than imperative ones,
but I don't think that's going to matter in general.
> That said, it has occurred to me that it wouldn't be so awfully
> difficult, at least in some cases, to integrate imperative collections
> with FSet's functional collections in such a way as to give most of the
> benefits of both. For instance, if you have an imperative set
> implemented as a red-black tree, I could easily write a high-performance
> converter from that form into an FSet wb-set. Only slightly harder
> would be a general high-performance set constructor interface that would
> make it easy for you to write the converter. (It would simply need to
> know the size of the set at the beginning, and then be supplied elements
> in sorted order without duplicates.)
>
> Given such a converter, you can construct the set imperatively, then
> convert it to FSet form for further use. This will benefit the case,
> which I believe to be very common, in which a set is not modified very
> much after its initial construction.
>
> There is also the possibility that FSet itself can provide a set of such
> imperative collection types, which are designed to use a sufficiently
> similar internal structure that the conversion is even faster -- and in
> certain circumstances can be elided entirely. I am looking into this.
afaik, the Clojure folks have implemented something like that - the
possibility of building a collection imperatively in performance-
critical code and subsequently using it functionally.
Alessio
> RG <rNOS...@flownet.com> writes:
Thanks for that explanation. I still don't agree with you, but at least
now I can see your side of it.
> What I initially took issue with was the idea that one should use
> generic collections everywhere preventively, so that a later decision to
> change a data type would not result in massive refactoring. I think this
> is mistaken as general advice, since most data types won't change, yet the
> costs mentioned above will arise. Instead, I outlined a different
> approach which solves the refactoring problem in the cases where it
> appears, yet incurs much smaller costs in general.
But your approach has costs too: the cost of designing, implementing,
and maintaining the domain-specific abstractions, to say nothing of the
risk of serious problems down the road if you have to do a large
refactor on a live system. (Do you use Reddit? Have you noticed how
much down-time they've been having lately? Do you think that they may
be wishing now that they had a design that scaled more easily than
theirs does?)
In any particular instance those costs are almost certainly smaller than
the cost of designing and maintaining a generic abstraction, but the
cost of building and maintaining domain-specific abstractions is an
ongoing cost whereas the cost of developing and maintaining a generic
abstraction is a one-time cost (and at this point in history, arguably a
sunk cost) which then at least potentially pays dividends across a large
number of projects.
(I note in passing that it was in the dim and distant past not uncommon
to hear the same form of argument used to oppose high level languages in
general: developing and maintaining a HLL is expensive. The design of
the HLL must cater to some sort of least common denominator. Etc.)
rg
> In article <9mptyn4...@runa.se>, bj...@runa.se (Björn Lindberg)
> wrote:
>> What I initially took issue with was the idea that one should use
>> generic collections everywhere preventively, so that a later decision to
>> change a data type would not result in massive refactoring. I think this
>> is mistaken as general advice, since most data types won't change, yet the
>> costs mentioned above will arise. Instead, I outlined a different
>> approach which solves the refactoring problem in the cases where it
>> appears, yet incurs much smaller costs in general.
>
> But your approach has costs too: the cost of designing, implementing,
> and maintaining the domain-specific abstractions,
No matter what generic collection abstractions I had access to, I still
would encapsulate important data structures in domain-specific
abstractions as a matter of software engineering. This cost, if it can
be counted as one, would be incurred in either case.
> to say nothing of the risk of serious problems down the road if you
> have to do a large refactor on a live system.
I have repeatedly explained how to address this just as well as with a
generic framework. Avoiding large refactorings in this sense is about
encapsulating relevant aspects so as not to proliferate a particular
design choice throughout the code. Deciding on what is relevant may be
difficult, but can be done at least as well in the application itself as
in a generic library that knows nothing about your program.
> In any particular instance those costs are almost certainly smaller
> than the cost of designing and maintaining a generic abstraction, but
> the cost of building and maintaining domain-specific abstractions is
> an ongoing cost whereas the cost of developing and maintaining a
> generic abstraction is a one-time cost (and at this point in history,
> arguably a sunk cost) which then at least potentially pays dividends
> across a large number of projects.
In the abstract(!) this is a good idea. The potential problem
specifically with collections is the enforced least common denominator
interface that will make a lot of important aspects of them unavailable
through the generic interface. Especially those aspects that cause you
to prefer one collection over another.
As far as code reuse goes, it is certainly interesting to be able to reuse
implementations of algorithms and data structures. Reusing code just to
get to have to conform to some particular interface, which may or may
not be well suited to the application at hand, is less appealing. A
generic collection library that actually implemented collections and
operations upon them could be interesting for that reason. A library
whose only purpose were to serve as an abstraction layer for the built
in collections much less so. Such a layer is comparatively cheap to
construct in the application itself, with the benefit that it can be
adapted to the task at hand.
> (I note in passing that it was in the dim and distant past not uncommon
> to hear the same form of argument used to oppose high level languages in
> general: developing and maintaining a HLL is expensive. The design of
> the HLL must cater to some sort of least common denominator. Etc.)
I disagree that these are the same. I would rather note that one common
fallacy in computer science is the one of "false generalization", of
considering things to be the same which are not.
Björn Lindberg
I think another large barrier is, as with any other CL library, that it
adds another dependency to your system, which needs to be followed and
maintained.
You keep mentioning this LCD interface. Can you give an example of the
sort of functionality that you fear you would lose?
> As far as code reuse goes, it is certainly interesting to be able to reuse
> implementations of algorithms and data structures. Reusing code just to
> get to have to conform to some particular interface, which may or may
> not be well suited to the application at hand, is less appealing. A
> generic collection library that actually implemented collections and
> operations upon them could be interesting for that reason. A library
> whose only purpose were to serve as an abstraction layer for the built
> in collections much less so. Such a layer is comparatively cheap to
> construct in the application itself, with the benefit that it can be
> adapted to the task at hand.
>
> > (I note in passing that it was in the dim and distant past not uncommon
> > to hear the same form of argument used to oppose high level languages in
> > general: developing and maintaining a HLL is expensive. The design of
> > the HLL must cater to some sort of least common denominator. Etc.)
>
> I disagree that these are the same. I would rather note that one common
> fallacy in computer science is the one of "false generalization", of
> considering things to be the same which are not.
Like for example?
rg
I would be interested in seeing a _single_ person here opposing a
generic collection framework which is well integrated into the language,
i.e. one supporting the sequences which CL already has (including typed
arrays) and integrating with LOOP.
[And, yes, I also use my own generic "dictionary" interface given mostly
by DIC-REF, DIC-FOR-EACH, DIC-REMOVE and DIC-EMPTY-P.]
Nicolas
IMHO the biggest single win would be some standard way to extend LOOP
for new datatypes. Ideally, this could be as simple as meaning that
'(loop for item in items...)' always Does What I Mean. It's not a bad
idea to have type-specific LOOP syntax for built-in types, but it's also
a good idea to have generic functionality.
And in my ideal world that same syntax would support hash tables and
hash-table-like objects.
Also, I'd have a pony *grin*
--
Robert A. Uhl
That's a tall order, particularly since LOOP is non-extensible. (Also,
there is a pretty strong anti-LOOP faction in the CL community.)
> [And, yes, I also use my own generic "dictionary" interface given mostly
> by DIC-REF, DIC-FOR-EACH, DIC-REMOVE and DIC-EMPTY-P.]
That's just my point: everyone who wants a generic collection rolls
their own. The discussion does not seem to proceed from the premise
that such things are desirable and focus on aspects of the design;
instead, it gets hung up on whether or not such things are even worth
doing in the first place.
So let me try to change that. *My* generic collection library (yes, I
rolled my own as well) is even more generic than yours. It has only a
single REF form that can be applied to almost everything dereferencable
in CL:
? (ref '(a b c d e f) 3)
D
? (ref "abcdef" 3)
#\d
? (ref #(a b c d e f) 3)
D
It supports negative indices which count from the end of a sequence:
? (ref "abcdef" -1)
#\f
It replaces slot-value:
? (defclass c1 () ((a :initform 'one)))
#<STANDARD-CLASS C1>
? (ref (make-instance 'c1) 'a)
ONE
It supports arrays:
? (ref #2a((a b c) (d e f)) '(1 2))
F
And all this is even before we get to dictionaries, which work like this:
? (dict 'a 1 'b 2 'c 3)
#<DICTIONARY HASH-TABLE { A 1 C 3 B 2 }>
? (setf d1 *)
#<DICTIONARY HASH-TABLE { A 1 C 3 B 2 }>
? (ref d1 'a)
1
T
? (ref d1 'q)
NIL
? (ref d1 'q 'default)
DEFAULT
And of course it lets you change implementations on the fly:
? (change-implementation d1 'plist)
#<DICTIONARY PLIST { B 2 C 3 A 1 }>
It is not integrated into LOOP, but it is integrated into my iterator
library which expands into LOOP (and thus allows many of LOOP's features
to be used):
? (for (k v) in d1 collect (cons k v))
((B . 2) (C . 3) (A . 1))
FOR...IN is based on an iterator protocol, and it *is* extensible, so
using FOR you can loop over anything for which you can define an
iterator:
? (for c in "abc" collect c)
(#\a #\b #\c)
? (with-input-from-string (s "abc") (for c in s collect c))
(#\a #\b #\c)
Iterators are composable so you can do things like this:
? (setf s "line1
line2")
"line1
line2"
? (for c in (lines s) collect c)
("line1" "line2")
? (with-input-from-string (s s) (for c in (lines s) collect c))
("line1" "line2")
And just in case Björn is reading this (and no doubt recoiling in
horror) I also have my own class-defining form that automatically
defines appropriate accessors, so I don't *have* to go generic when I
know what kind of object I'm dealing with:
? (define-class frob slot1 slot2)
#<STANDARD-CLASS FROB>
? (frob-slot1 (make-frob :slot1 'foo))
FOO
So, does my design meet your criteria?
rg
http://rondam.blogspot.com/2008/02/joy-of-iterators.html
Lets you iterate over anything for which you care to define an iterator
method. Iterators can be composed. Expands into LOOP so you can do
loopy things. Some quick examples:
? h
#<HASH-TABLE :TEST EQUAL size 3/60 #x302000FD5E3D>
? (for (k v) in h collect (cons k v))
((A . 1) (C . 3) (B . 2))
? (for (a b c) in (n-at-a-time 3 "abcdefghi") collect (list a b c))
((#\a #\b #\c) (#\d #\e #\f) (#\g #\h #\i))
? (setf s "line1
line2")
"line1
line2"
? (for l in (lines s) collect l)
("line1" "line2")
Code is in here somewhere:
http://flownet.com/ron/lisp/rg-utils.lisp
> Also, I'd have a pony *grin*
Sorry, can't help you there.
rg
> That's a tall order, particularly since LOOP is non-extensible. (Also,
> there is a pretty strong anti-LOOP faction in the CL community.)
But I'm not part of it:-)
As far as I can see from the above, I like it. Some questions:
- Can you iterate with FOR over keys/values separately?
- Does FOR expand in some generic "FOR-EACH"?
- Does it work (or would it be possible to make it work) on FSet
collections?
Anyway, you probably should publish that code. I would be interested in
trying it, especially if it comes under a reasonably liberal license
like BSD or LLGPL.
Nicolas
Woohoo! :-)
> Some questions:
>
> - Can you iterate with FOR over keys/values separately?
Yes. The protocol for iterators that naturally produce multiple values
per iteration is to use multiple-value-bind (imagine that!) So by
default if you just provide one variable you'll get the keys:
? (dict 'a 1 'b 2 'c 3)
#<DICTIONARY HASH-TABLE { A 1 C 3 B 2 }>
? (setf d *)
#<DICTIONARY HASH-TABLE { A 1 C 3 B 2 }>
? (for k in d collect k)
(A C B)
If you want just the values you'd have to define a new iterator for
that, e.g.:
(defmethod values-of ((h hash-table))
  (loop for v being the hash-values of h collect v))
Now you can do:
? (for v in (values-of (dictionary-implementation d)) collect v)
(1 3 2)
Of course, this is a Horrible Hack. To really do this the Right Way
you'd want to write a real iterator that actually iterated over the keys
and values using WITH-HASH-TABLE-ITERATOR rather than just collecting
them all into a list and then iterating over that. But unfortunately
there's a bug in CCL that prevents that from working at the moment.
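For the record, the Right Way version would look roughly like this
(untested sketch; note that the standard only defines the local NEXT
macro within the dynamic extent of WITH-HASH-TABLE-ITERATOR, so
returning a closure over it, as here, is exactly the sort of thing an
implementation bug can break):

(defmethod iterator ((h hash-table))
  (with-hash-table-iterator (next h)
    (lambda ()
      (multiple-value-bind (more? k v) (next)
        (if more? (values k v) +iterend+)))))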
> - Does FOR expand in some generic "FOR-EACH"?
FOR expands into LOOP:
? (pprint (macroexpand '(for k in d collect k)))
(LET ((#:ITERVAR2287 (ITERATOR D)))
(LOOP FOR K = (FUNCALL #:ITERVAR2287) UNTIL (EQ K +ITEREND+) COLLECT
K))
The real heart of the system is the ITERATOR generic function, which
returns a closure that yields successive values of the iteration on each
call, and terminates by returning the privileged value +iterend+. (It
is not clear that this is really the Right Thing. The Right Thing is
probably to call THROW or raise a condition. The current implementation
should be considered more of a proof-of-concept than a final design.)
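For lists, for instance, the whole protocol fits in a few lines (a
minimal sketch; the actual definition in rg-utils may differ):

(defmethod iterator ((l list))
  (lambda ()
    (if l (pop l) +iterend+)))   ; yields +ITEREND+ forever once exhausted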
> - Does it work (or would it be possible to make it work) on FSet
> collections?
Of course.
> Anyway, you probably should publish that code.
It's been out there for a long time.
http://rondam.blogspot.com/2008/02/joy-of-iterators.html
Code is here:
http://flownet.com/ron/lisp/rg-utils.lisp
> I would be interested in
> trying it, especially if it comes under a reasonably liberal license
> like BSD or LLGPL.
I released it to the public domain last year.
rg
>> I would be interested in
>> trying it, especially if it comes under a reasonably liberal license
>> like BSD or LLGPL.
>
> I released it to the public domain last year.
Which means that you remain the only one entitled to make a copy of
it, you and your heirs until 70 years after your death.
Since I'll probably be dead before then, you won't reproach me if I
don't thank you.
Could you please try the GPL?
--
__Pascal Bourguignon__ http://www.informatimago.com/
> RG <rNOS...@flownet.com> writes:
>
> >> I would be interested in
> >> trying it, especially if it comes under a reasonably liberal license
> >> like BSD or LLGPL.
> >
> > I released it to the public domain last year.
>
> Which means that you remain the only one entitled to make a copy of
> it, you and your heirs until 70 years after your death.
I don't know where you got that idea, but you are absolutely wrong.
Releasing the work to the public domain means I have forfeited all my
intellectual property rights and anyone can do anything they want with
it, including use it in commercial software without any constraints
whatsoever.
http://en.wikipedia.org/wiki/Public_domain
> Could you please try the GPL?
Actually, I can't. The GPL requires that the work be copyrighted. Once
a work enters the public domain then by definition it is no longer (and
can no longer be) copyrighted.
rg
AFAIK, not all legal systems have the concept of public domain. In some
countries, you cannot give up your copyright on your work. So in
general, if you really want your work to be accessible to anyone, the
best option is to use a liberal open source license (such as MIT or
BSD).
> On 11 Aug, 23:05, RG <rNOSPA...@flownet.com> wrote:
> [...]
> AFAIK, not all legal systems have the concept of public domain.
I am quite certain you are quite mistaken.
> In some countries, you cannot give up your copyright on your work.
Which countries are those exactly? What country do you live in?
> So in
> general, if you really want your work to be accessible to anyone, the
> best option is to use a liberal open source license (such as MIT or
> BSD).
OK, fine. I'll get right on that.
rg
=== I am not a lawyer ===
In countries operating under WIPO (184 now) technically you never can
surrender or transfer an *implicit* copyright - that's the one granted
by fact of authorship. In theory, implicit copyright is passed to the
estate and can persist as long as descendants claim it - but implicit
copyrights are worthless in practice.
Explicit copyrights (marked with a copyright notice) and registered
copyrights (explicit AND filed with the government copyright office)
can be transferred in whole, or individual rights can be transferred or
conferred via contract. This is how the various open-source licenses
operate: the author retains "authorship" but confers some derivative
and distribution rights via the license agreement.
However, by deliberately placing an explicit or registered work in
public domain, the copyright holder explicitly confers to the public
derivation and distribution rights to the work.
Enforcement varies quite a bit - particularly with respect to open
source software. AFAIK, most WIPO countries do recognize public
domain, but not all of them routinely recognize other OSS licenses ...
and some of them get downright nasty if they catch people with
unsanctioned non-PD software.
=== I am not a lawyer ===
George
It's in LibCL! :) :)
Alas, it's not clear LibCL itself is seeing much use. That's too bad --
Daniel has done a lot of work on it and I think he's done a good job.
If there's another "batteries included" library you think FSet should be
in, let me know. (It's also ASDF-Installable, or should be -- I haven't
tested that in some time.)
>> The other philosophical divergence between the CL approach and the FSet
>> approach is that the FSet collections are semantics-heavy: a collection
>> knows what kind of thing it is, what operations are applicable to it,
>> and what equivalence relation it should use for its elements. By
>> contrast, a CL list can be used in many different ways -- as a set, a
>> sequence, a bag, an alist, a plist, etc.; the semantics are not supplied
>> by the data structure, but rather by the operations that are used on it
>> and the arguments (such as :TEST) that are supplied to those operations.
>>
>> I suppose this second divergence is less difficult to overcome; one
>> simply says that if you're calling UNION on two FSet sets, it's an error
>> to supply a :TEST argument. Indeed, FSet signals an error in that case.
>
> Why must it be an error? I'd prefer to have it work with an
> arbitrary :test, albeit more slowly, presumably - or isn't it
> possible?
It doesn't make any sense. Each set passed to UNION already has its own
idea of what equivalence relation it uses. If you try to treat it as a
set under a coarser equivalence relation, in general it will contain
duplicate elements, which means it's not really a set anymore.
I suppose one could define this to mean, convert both sets to use the
new equivalence relation (possibly thus making them smaller), and then
union them. But once you get used to the way FSet is designed, I think
you will agree this is pointless, because it would be vanishingly rare
to need it, and therefore it would be too obscure a behavior to have
built in. If you actually ever come across a case where you want to do
this, it would be better, I think, to write the conversions explicitly
so the reader of the code knows what you really mean.
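If you ever did need it, the explicit version is only a few lines
anyway. An untested sketch, using FSet's IMAGE to map a canonicalizing
function (any function implementing the coarser relation) over each set
first:

(defun union-under (canonicalize set1 set2)
  ;; Collapse each set under the coarser relation, then union normally.
  (fset:union (fset:image canonicalize set1)
              (fset:image canonicalize set2)))

;; (union-under #'string-downcase (fset:set "A" "b") (fset:set "a" "C"))
;; => a set containing "a", "b", and "c"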
> I don't quite get the point of semantics-heavy collections in
> general, as a design principle. I understand that some collections
> have to be semantics-heavy - for example hash-based data structures
> inherently depend on a given hashing function and a given equality
> relation - but in general, what's the benefit of encoding in the
> collection how it's going to be used and prematurely forbidding other
> uses?
I designed FSet to have semantics-heavy (maybe I should say
"high-level") collections because I know, from using other languages,
how convenient they are to use and how clear the resulting code is.
When I tell you that you can't supply :TEST to UNION of two FSet sets,
your reaction, naturally, is to think that something is being lost.
What you don't yet know is what you get in return. Having the
equivalence relation be implicit, rather than explicit in every
operation, is actually a great convenience.
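The trade, in two lines: with lists-as-sets the caller has to restate
the equivalence relation at every call site, while an FSet set already
carries it.

(union '("x" "y") '("y" "z") :test #'string=)       ; relation restated
(fset:union (fset:set "x" "y") (fset:set "y" "z"))  ; relation implicit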
In my experience, the vast majority of types have a single natural
equivalence relation that is part of the definition of the type. If one
finds oneself thinking in terms of two different equivalence relations
for the same type, I would argue those are really two different types.
FSet is a little odd at the moment because it doesn't even give you a
way to specify a non-default ordering/equivalence relation when you
create a collection. (When you create a type, you can specify the
ordering for that type, which is then used for all collections
containing the type.) I concede that collection-specific orderings
would be a desirable feature, but I haven't bothered to implement it yet
because I don't think it's very important. If you really want to
enumerate the elements of a set in some specific order that's different
from the default order, it's just not that hard to convert the set to a
list and sort it. If you want to continuously maintain the set in some
non-default order, you'll have to create a new type (at the very least,
a one-member struct that wraps each element of the set) and tell FSet to
use your desired ordering for that type. This would be a bit onerous,
but I've never had to do it.
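Roughly, the wrapper approach would look like this. This is a sketch
against FSet's COMPARE generic function (which returns :LESS, :GREATER,
:EQUAL, or :UNEQUAL); the struct and the descending order are invented
for illustration:

(defstruct wrapper (n 0 :type real))

(defmethod fset:compare ((a wrapper) (b wrapper))
  (let ((x (wrapper-n a)) (y (wrapper-n b)))
    (cond ((> x y) :less)     ; reversed comparison: larger sorts first
          ((< x y) :greater)
          (t :equal))))

;; The one-off alternative mentioned above:
;; (sort (fset:convert 'list my-set) #'my-order)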
>> (While you're at it, better integration of CL hash tables wouldn't be a
>> bad thing. Granted, hash tables can't meaningfully be concatenated or
>> even unioned(*), but at least ELT and (SETF ELT) could work on them.
>> (*) Come to think of it, maybe there is a possible meaning for the union
>> of two hash tables, if they're assumed to represent sets, i.e. only the
>> keys are significant.)
>
> For hashes, and dictionaries in general, one would need another API.
> ELT & friends are specified to work on objects of type SEQUENCE. In
> any case, I wouldn't use ELT for accessing dictionaries.
Why not??? A sequence is just a map whose domain is a subrange of the
integers. A hash table is a different kind of map. There are languages
where these two are accessed with identical syntax. (They're not all
obscure languages, either; JavaScript is one.) FSet uses LOOKUP (or @)
for both. Ron proposes using REF for both (see elsewhere in this
thread). And the problem with GETHASH is that it uses the "wrong"
argument order (at least, it's inconsistent with AREF and ELT, though
consistent with ASSOC; my own preference is that the collection come
before the index/key, and FSet uses this order consistently).
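Side by side (assuming FSet is loaded; the data is throwaway):

(let ((vec #(a b c))
      (alist '((k . 1)))
      (table (make-hash-table)))
  (setf (gethash 'k table) 1)
  (list (aref vec 0)                          ; collection first
        (elt vec 0)                           ; collection first
        (gethash 'k table)                    ; key first
        (assoc 'k alist)                      ; key first
        (fset:lookup (fset:map ('k 1)) 'k)))  ; collection first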
>> As for what is the largest barrier to adoption of FSet, I don't really
>> know but I would be surprised if it's an ease-of-use problem. It is a
>> bit unfamiliar, I know, but I put a lot of effort into the design to
>> make it as easy to use as I could, and wrote a tutorial besides.
>>
>> I think that the largest barrier is simply that people don't see the
>> need for it; they're happy with what they have. There are lots of
>> different kinds of programs and of programming in this world, and I
>> freely admit FSet is not appropriate for all of them. I think it would
>> be of some value for most kinds of programs, but perhaps it really
>> becomes massively useful only for the kind of work I do: static analysis
>> and automated reasoning.
>
> That may be true. However, a significant part of the hype around
> Clojure is precisely about its functional collections.
Ah yes, right -- when I wrote that I wasn't thinking about concurrency.
I think concurrency is the key motivator for functional data
structures (including collections) in Clojure.
I think somebody started a CL-STM project, but I don't know whether it
went anywhere. I haven't done much with concurrency myself.
-- Scott
> In article <9mpaaou...@runa.se>, bj...@runa.se (Björn Lindberg)
> wrote:
>> In the abstract(!) this is a good idea. The potential problem
>> specifically with collections is the enforced least common denominator
>> interface that will make a lot of important aspects of them unavailable
>> through the generic interface. Especially those aspects that cause you
>> to prefer one collection over another.
>
> You keep mentioning this LCD interface. Can you give an example of the
> sort of functionality that you fear you would lose?
I have given several examples already.
>> As far as code reuse goes, it is certainly interesting to be able to reuse
>> implementations of algorithms and data structures. Reusing code just to
>> get to have to conform to some particular interface, which may or may
>> not be well suited to the application at hand, is less appealing. A
>> generic collection library that actually implemented collections and
>> operations upon them could be interesting for that reason. A library
>> whose only purpose were to serve as an abstraction layer for the built
>> in collections much less so. Such a layer is comparatively cheap to
>> construct in the application itself, with the benefit that it can be
>> adapted to the task at hand.
>>
>> > (I note in passing that it was in the dim and distant past not uncommon
>> > to hear the same form of argument used to oppose high level languages in
>> > general: developing and maintaining a HLL is expensive. The design of
>> > the HLL must cater to some sort of least common denominator. Etc.)
>>
>> I disagree that these are the same. I would rather note that one common
>> fallacy in computer science is the one of "false generalization", of
>> considering things to be the same which are not.
>
> Like for example?
This one. Or the idea that generic collection abstractions are "the
same" as generic arithmetic.
Björn Lindberg
Ok, probably I'm mistaken about that. However, I'm quite confident
about the proposition below.
> > In some countries, you cannot give up your copyright on your work.
>
> Which countries are those exactly? What country do you live in?
I live in Italy. I'm not a lawyer; from some research it's clear that
public domain status is automatically granted after copyright expires,
but I found no source that says you can voluntarily place your work in
the public domain. And since in Italy (but I guess in all of Europe,
more or less) any work is copyrighted by default, it's very unclear to
me whether the fact that it's public domain in the US, say, has any
relevance here.