I tend to replace every instance of creating classes with creating
structs which, if I understood correctly, are maps too. Good habit or
should structs not be abused?
I'm pretty sure structs are only appropriate for when you need to eke
the absolute last iota of performance out of a collection, in which case
they can provide greater speed than maps. But since the list of keys is
fixed, it means it's more effort to add or rename a key than it is with
a map.
You shouldn't trade that flexibility for speed until (0) you are pretty
sure the keys are not going to change soon and (1) you know you can't
get the speed you need from maps. Neither of these is true when you're
just starting out on a piece of code.
-Phil
Not really, I can assoc and dissoc as I wish and leave blank values if I
wish. Any function can treat it as a map.
> You shouldn't trade that flexibility for speed until (0) you are pretty
> sure the keys are not going to change soon and (1) you know you can't
> get the speed you need from maps. Neither of these is true when you're
> just starting out on a piece of code.
I don't use it for performance reason but for semantic ones. For
instance, in my code, I have:
(defstruct polygon :points :color)
This line tells me when I reread that polygon is a significant concept
and that its attributes should be points and color. I'm relatively
confident this isn't going to change soon and if it does, I'll just
have to change the defstruct and the places that create polygons. Not
a significant burden.
Even if there were no performance implications, I'd use structs.
However, that might be the wrong thing to do so that's why I'm asking.
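
A minimal sketch of the usage described above, using the polygon defstruct from the post. The names triangle and square and the sample points are illustrative only. It shows that a struct instance can be built positionally or by key and then read with ordinary map functions:

```clojure
;; The struct from the post: the defstruct documents the expected keys.
(defstruct polygon :points :color)

;; struct fills the basis keys positionally; struct-map takes key/value pairs.
(def triangle (struct polygon [[0 0] [1 0] [0 1]] :red))
(def square   (struct-map polygon :points [[0 0] [1 0] [1 1] [0 1]]))

;; Ordinary map functions work on the result.
(:color triangle)              ; :red
(:color square)                ; nil, the basis key was left unset
(keys square)                  ; (:points :color)
(assoc triangle :label "t1")   ; extra, non-basis keys are allowed
(= triangle {:points [[0 0] [1 0] [0 1]] :color :red}) ; true
```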
> First up is contrib.sql, where insert-rows and insert-values both take
> a vector of column names followed by vectors of unlabeled values that
> must be in the same order as the corresponding columns. I would hope
> never to have such fragile things as those vectors in my programs.
For large data sets with a regular structure, insert-rows and insert-
values use the jdbc interface very efficiently. They are also
convenient building blocks for other functions to use.
> OTOH, the query and update APIs take and return maps.
clojure.contrib.sql now includes:
(defn insert-records
  "Inserts records into a table. records are maps from strings or
  keywords (identifying columns) to values."
  [table & records]
  (doseq [record records]
    (insert-values table (keys record) (vals record))))
Here's an example from clojure.contrib.sql.test:
(defn insert-records-fruit
  "Insert records, maps from keys specifying columns to values"
  []
  (sql/insert-records
   :fruit
   {:name "Pomegranate" :appearance "fresh" :cost 585}
   {:name "Kiwifruit" :grade 93}))
--Steve
>>> I'm pretty sure structs are only appropriate for when you need to eke
>>> the absolute last iota of performance out of a collection, in which case
>>> they can provide greater speed than maps. But since the list of keys is
>>> fixed, it means it's more effort to add or rename a key than it is with
>>> a map.
>> Not really, I can assoc and dissoc as I wish and leave blank values if I
>> wish. Any function can treat it as a map.
> Close... you can assoc new keys into a struct instance, but you
> can't dissoc any of the basis keys.
That's right.
Given:
user=> (defstruct foo :a :b)
#'user/foo
user=> (def t (struct foo 3))
#'user/t
dissoc of a basis key throws an exception:
user=> (dissoc t :a)
java.lang.Exception: Can't remove struct key (NO_SOURCE_FILE:0)
I wonder if it's important to throw in this case or if it would be
more in keeping with the description:
"struct maps act just like maps, except they store their basis keys
efficiently"
if dissoc would associate nil ("nothing") with the key instead:
user=> (dissoc t :a)
{:a nil, :b nil}
Doing so would make the value associated with :a become the same as if
it had never been initialized, just like :b in this case.
Perhaps the choice between an exception and assoc'ing "nothing" comes
down to the distinction between:
"dissoc means remove this key from this map"
(where throwing an exception is clearly correct), and
"dissoc means remove any value associated with this key from this map"
(where assoc'ing nil might be preferable).
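
The second reading can be sketched in user code without changing dissoc itself. The helper name clear-key is hypothetical and not part of Clojure; it assumes that dissoc on a basis key throws, as shown in the transcript above, and falls back to assoc'ing nil in that case:

```clojure
(defstruct foo :a :b)
(def t (struct foo 3))   ; {:a 3, :b nil}

(defn clear-key
  "Hypothetical helper: removes k from m when possible; for a struct
  basis key, where dissoc throws, associates nil with the key instead."
  [m k]
  (try
    (dissoc m k)
    (catch Exception _
      (assoc m k nil))))

(clear-key t :a)                 ; {:a nil, :b nil}
(clear-key (assoc t :c 1) :c)    ; {:a 3, :b nil}, non-basis key removed
(clear-key {:x 1 :y 2} :x)       ; {:y 2}, ordinary maps behave as usual
```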
--Steve
> Do you have a case where the map-unpacking dominates the I/O time? Or
> is this just a speculative optimization?
I was talking about the distinction between sending N value sets
across the JDBC interface in one call vs. in N calls. Unpacking maps
on the Clojure side and making the same one call is a good idea. Thanks.
> I want to be clear, just because things come in maps doesn't mean you
> can't have a higher-performance insert-uniform-records that takes maps
> with identical sets of keys.
Good point.
To offer the most efficiency in unpacking, the API could include:

  insert-records
      each record treated independently

  insert-uniform-records
      all subsequent records contain at least all the keys of the first
      unpack with select-keys

  insert-structs
      all records are structs with the same basis
      unpack with vals
My current thinking is that insert-structs doesn't offer enough
benefit over insert-uniform-records to be worth including.
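
A rough sketch of the unpacking step for the uniform case. The function uniform-rows is hypothetical and not part of clojure.contrib.sql. Instead of select-keys followed by vals, it looks up each column with get, so the value order is tied explicitly to the column vector:

```clojure
;; Hypothetical helper, not part of clojure.contrib.sql.
(defn uniform-rows
  "Returns [columns rows]: the keys of the first record, plus one vector
  of values per record in that same column order. Keys that appear only
  in later records are ignored."
  [records]
  (let [columns (vec (keys (first records)))]
    [columns
     (vec (for [record records]
            (mapv #(get record %) columns)))]))

(uniform-rows [{:name "Pomegranate" :cost 585}
               {:cost 120 :name "Kiwifruit" :grade 93}])
;; [[:name :cost] [["Pomegranate" 585] ["Kiwifruit" 120]]]
```

The resulting columns and rows could then be handed to insert-values in a single call, for example (apply insert-values table columns rows), assuming the argument order used by insert-records above.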
--Steve
Yeah, as much as I like maps, I feel like there are several common
use cases for maps that require more work in Clojure than other
languages. The most obvious example is tagged structs. In Clojure,
you need to do a defstruct, and then make your own custom constructor
that adds the tag, possibly another custom constructor that emulates
struct-map but adds the tag, and possibly a predicate that tests for
the tag.
I particularly like the way the Mozart language makes tagged structs
(they call them records) one of the core data structures that the
language is built around.
See section 3.6 at http://www.mozart-oz.org/home/doc/tutorial/node3.html
I can see why the existing system is more flexible (you can have more
than one tag, or no tag, or make the tag part of the metadata, or use
different labels for the tag other than :tag or :type), but I keep
feeling like 90% of the time I'd be happy to just use a standard
tagged struct. The good news is that it's easy to write a macro to do
all the boilerplate. The bad news is that everyone will write
different macros that tag these structures in different ways, and it's
not clear to me how well code written based on different tagging
standards will coexist.
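
For concreteness, the boilerplate in question might look like the sketch below. The :tag key, the ::circle tag and all function names are one hypothetical convention among the many the post worries about:

```clojure
;; Hypothetical convention: the tag lives under :tag in the map itself.
(defstruct circle-struct :tag :center :radius)

(defn make-circle
  "Positional constructor that fills in the tag."
  [center radius]
  (struct circle-struct ::circle center radius))

(defn circle-map
  "struct-map style constructor: key/value pairs, tag added automatically."
  [& kvs]
  (apply struct-map circle-struct :tag ::circle kvs))

(defn circle?
  "True when x is a map carrying the ::circle tag."
  [x]
  (and (map? x) (= ::circle (:tag x))))

(circle? (make-circle [0 0] 2))       ; true
(circle? (circle-map :radius 5))      ; true
(circle? {:center [0 0] :radius 2})   ; false, no tag
```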
This is probably a good example (ordered pairs) of when the logical
entity is in fact not a map. I don't think Rich is advocating that
everything is a nail, because we have this great hammer.
I think using maps gives you much more flexibility. I don't disagree,
but I have trouble imagining exactly how it works sometimes. I guess I
just have too much of a Ruby/Java mindset. I keep thinking of a type
hierarchy and multimethods. How should you write the dispatch
function? I guess you could add a tag to the maps and use that to
dispatch. Or should you use a set of keys to dispatch (i.e. if the map
has :center, and :radius it is a circle, if it has :length and :width
it is a rectangle)? That can get messy for something slightly more
complicated. I guess you could just write a predicate for each type,
or a get-type function that does the check, but it still seems more
complicated than just declaring some classes. It's just a different
way of thinking for me that I have to get used to.
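
The tag-based option can be sketched as follows, assuming every shape map carries its kind under a :tag key. The key name and the ::circle and ::rectangle tags are illustrative choices. A keyword is itself a function of a map, so it can serve directly as the dispatch function:

```clojure
;; Assumption: every shape map carries its kind under :tag.
(defmulti area :tag)

(defmethod area ::circle [{:keys [radius]}]
  (* Math/PI radius radius))

(defmethod area ::rectangle [{:keys [length width]}]
  (* length width))

(area {:tag ::rectangle :length 3 :width 4})   ; 12
(area {:tag ::circle :center [0 0] :radius 1}) ; the value of Math/PI
```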
When reading Stu's book I found it interesting that you could declare
arbitrary type hierarchies using 'derive, so I know there are corners
of Clojure that I have not explored.
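
A small sketch of derive combined with a multimethod. The tags and the describe multimethod are hypothetical. Keyword tags must be namespace-qualified to take part in the global hierarchy, hence the :: prefix:

```clojure
;; Build an ad hoc hierarchy on namespaced keywords.
(derive ::rectangle ::shape)
(derive ::square ::rectangle)
(derive ::circle ::shape)

(defmulti describe :tag)
(defmethod describe ::shape [_] "some shape")
(defmethod describe ::rectangle [_] "a rectangle")

(isa? ::square ::shape)        ; true, the relation is transitive
(describe {:tag ::square})     ; "a rectangle", the most specific match wins
(describe {:tag ::circle})     ; "some shape"
```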
Paul
> I know people usually think of collections when they see vector/map/
> set, and they think classes and types define something else. However,
> the vast majority of class and type instances in various languages are
> actually maps, and what the class/type defines is a specification of
> what should be in the map. Many of the languages don't expose the
> instances as maps as such and in failing to do so greatly deprive the
> users of the language from writing generic interoperable code.
My own experience is mostly with Python. Python objects are indeed
essentially maps (Python calls them dictionaries). But even though it
is easy to obtain the map equivalent of any object (object.__dict__),
I hardly see this being done. Python programmers tend to use maps and
objects in very different ways, and that includes experienced
programmers who are very well aware that objects are just maps plus a
type tag plus a set of methods.
One reason why generic everything-is-a-map code is not very common is
that the majority of object definitions include specific constraints
on the maps, most of them of the form "the map must have a key :x
whose value is an integer". The object's methods don't make sense for
maps that don't satisfy the constraints, and most generic map
operations don't make sense on most objects because they are unaware
of the constraints and in particular don't satisfy them in their
return values.
The one area where I have seen uses of object.__dict__ is low-level
data massaging protocols, like serialization or storage in databases.
And that is indeed a good reason to use just a few fundamental data
structures to represent everything.
> Finally we have contrib.types, an algebraic data type system where
> constructors generate vectors of unnamed components as instances.
> Again, unnamed positional components are an onion of some algebraic
> data type systems, there's no need to repeat that.
Positional arguments do have their uses. In particular when there is
only one argument, it would be an unnecessary pain to have to name
it. On the other hand, I do see your point of having a uniform
internal representation in the form of maps.
> I'd very much like to see these libraries be interoperable, e.g. to
> store ADTs in a database or query them with Datalog, and I know that
> would be possible if they were all using maps consistently.
One problem I see with storing ADTs (or anything with a type tag) in
a database is that the metadata and thus the type tag would be lost
after a storage-retrieval cycle.
Konrad.
> On Mar 8, 2009, at 18:53, Rich Hickey wrote:
>> I know people usually think of collections when they see vector/map/
>> set, and they think classes and types define something else. However,
>> the vast majority of class and type instances in various languages are
>> actually maps, and what the class/type defines is a specification of
>> what should be in the map. Many of the languages don't expose the
>> instances as maps as such and in failing to do so greatly deprive the
>> users of the language from writing generic interoperable code.
> My own experience is mostly with Python. Python objects are indeed
> essentially maps (Python calls them dictionaries). But even though it
> is easy to obtain the map equivalent of any object (object.__dict__),
> I hardly see this being done. Python programmers tend to use maps and
> objects in very different ways, and that includes experienced
> programmers who are very well aware that objects are just maps plus a
> type tag plus a set of methods.
It's interesting to compare a Python class with a dict inside to a
Clojure map with metadata "outside".
Interacting directly with a class dict feels a little dirty, because
you could be circumventing the API provided by the class methods,
making it easy to get the object into a bad state. Clojure's maps
being immutable reduces the amount of trouble you can cause by dealing
directly with the map.
Defining an instance method for a Python class allows you to connect
some code to your data, which internally uses a type pointer from the
instance to the class. In Clojure you can put functions directly in
the metadata (as clojure.zip does), or put a type tag in the map or in
the metadata, and use a multimethod dispatching on that to connect
code to your data.
Similarly, any inheritance in Clojure would normally be defined on a
keyword (or symbol or collection of either) that is in the map or the
map's metadata. In Python, the object knows its class, and the class
knows about the hierarchy.
I don't know if that leads to any particular conclusion. I suppose it
does suggest a trivial program (or a trivial part of a program) in
Clojure will likely have less code for setting up classes than the
Python equivalent -- you start with the data you actually need, and
can add "methods", polymorphism, etc. if needed later.
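
A sketch of that progression, with the type tag kept in metadata. The ::circle tag and the shape-name multimethod are illustrative. clojure.core/type returns the :type entry of the metadata when present and the class otherwise:

```clojure
;; Start with plain data: no type setup required.
(def c {:center [0 0] :radius 2})

;; Later, attach a type tag in metadata; the map's contents are unchanged.
(def tagged-c (with-meta c {:type ::circle}))

(= c tagged-c)            ; true, metadata does not affect equality
(= ::circle (type tagged-c)) ; true

;; Dispatch on the tag, with a fallback for untagged values.
(defmulti shape-name type)
(defmethod shape-name ::circle [_] "circle")
(defmethod shape-name :default [_] "unknown")

(shape-name tagged-c)     ; "circle"
(shape-name c)            ; "unknown"
```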
--Chouser
> Interacting directly with a class dict feels a little dirty, because
> you could be circumventing the API provided by the class methods,
> making it easy to get the object into a bad state. Clojure's maps
> being immutable reduces the amount of trouble you can cause by dealing
> directly with the map.
Not really. Most map operations, such as assoc and dissoc, return a
map with the same metadata as the input map. The result thus still
appears to be of a specific type, even if dissoc just removed a key that
is important for that type's semantics.
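
A short illustration of this point, using a hypothetical ::circle type tag stored in metadata:

```clojure
(def circle (with-meta {:center [0 0] :radius 2} {:type ::circle}))

;; dissoc neither knows nor cares that :radius matters to a circle.
(def broken (dissoc circle :radius))

broken                              ; {:center [0 0]}
(= ::circle (type broken))          ; true, the tag survives the dissoc
(meta (assoc circle :color :red))   ; still contains :type ::circle
```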
> I don't know if that leads to any particular conclusion.
My main conclusion is that Clojure's system is a lot more flexible
but also a lot more fragile. Any function can modify data of any
"type" (as defined by metadata), even without being aware of this.
Any function can at any time modify the global inheritance hierarchy
in any way it wants. Any module can add an implementation for any
type to any multimethod. That opens the way to many interesting
strategies for data handling, but also to errors that will probably
be hard to track down.
Konrad.
Modifying type tags without being aware of it? That sounds like FUD to
me. Using metadata is relatively atypical in the first place, and
modifying the :type tag without being aware of it sounds like an
extremely minimal risk.
> Any function can at any time modify the global inheritance hierarchy
> in any way it wants. Any module can add an implementation for any
> type to any multimethod. That opens the way to many interesting
> strategies for data handling, but also to errors that will probably
> be hard to track down.
>
> Konrad.
We've heard this line of reasoning before when moving from static to
dynamic languages. If having the power to do what you want with the
language scares you, then maybe Java is a better choice. All these
"hard to track down" bugs people worry about when having more
flexibility in the language don't seem to crop up often enough to drive
people away though.
Adding an implementation for a new type to a multimethod is equivalent
to adding an interface implementing method to a class you defined. So
for example you could add to-string or to-xml or to-bytes or whatever to
your own objects to make them interoperate with some existing library.
Having libraries built on top of abstract interfaces like this is
exactly what makes them interesting.
In Ruby you can open any built-in class you want, like String, and add
or modify any methods you want. In practice it happens rarely and
almost never causes problems.
-Jeff
>> My main conclusion is that Clojure's system is a lot more flexible
>> but also a lot more fragile. Any function can modify data of any
>> "type" (as defined by metadata), even without being aware of this.
>
> Modifying type tags without being aware of it?
Not modifying type tags, but modifying data that has a type tag
without being aware of the fact that the data has a type tag, and
thus perhaps specific constraints on its contents. The most basic
example is calling dissoc on a map to remove a key that is required
by the semantics of the type implemented as a map. dissoc is agnostic
about type tags, so it won't complain.
In this specific case, struct maps can be used to prevent a key from
being removed, but that's a solution only for this specific case, and
not necessarily a simple one to implement.
> We've heard this line of reasoning before when moving from static to
> dynamic languages. If having the power to do what you want with the
> language scares you, then maybe Java is a better choice. All these
It doesn't scare me, otherwise I wouldn't be using Clojure. And I
wouldn't be using Python as my main language either. However, I think
it is important to be aware of the risks in order to watch out for them.
> Adding an implementation for a new type to a multimethod is equivalent
> to adding an interface implementing method to a class you defined. So
> for example you could add to-string or to-xml or to-bytes or whatever to
> your own objects to make them interoperate with some existing library.
> Having libraries built on top of abstract interfaces like this is
> exactly what makes them interesting.
I agree, of course. And yet, it is important to be aware of the
consequences. For example, don't ever try to memoize the dispatching
function of a multimethod - its result may well change after
importing another library module.
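
One reading of this warning, sketched with a hypothetical render multimethod: a method looked up and cached before another namespace is loaded goes stale once that namespace adds a more specific implementation.

```clojure
(defmulti render :tag)
(defmethod render :default [_] "generic")

;; Cache the method chosen for ::square right now: the :default one.
(def cached-method (get-method render ::square))

;; Later, some other library is loaded and registers its own method.
(defmethod render ::square [_] "square")

(render {:tag ::square})          ; "square", fresh dispatch sees the new method
(cached-method {:tag ::square})   ; "generic", the cached lookup is stale
```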
Konrad.
Ahh, I see what you were getting at, and it is a more interesting point
than I originally realized. I guess to achieve this level of safety
while still being map-compatible you would need to implement the
Associative interface and maintain the constraints using getter/setters
or something of the sort.
>> We've heard this line of reasoning before when moving from static to
>> dynamic languages. If having the power to do what you want with the
>> language scares you, then maybe Java is a better choice. All these
>
> It doesn't scare me, otherwise I wouldn't be using Clojure. And I
> wouldn't be using Python as my main language either. However, I think
> it is important to be aware of the risks in order to watch out for them.
>
>> Adding an implementation for a new type to a multimethod is equivalent
>> to adding an interface implementing method to a class you defined. So
>> for example you could add to-string or to-xml or to-bytes or whatever to
>> your own objects to make them interoperate with some existing library.
>> Having libraries built on top of abstract interfaces like this is
>> exactly what makes them interesting.
>
> I agree, of course. And yet, it is important to be aware of the
> consequences. For example, don't ever try to memoize the dispatching
> function of a multimethod - its result may well change after
> importing another library module.
>
> Konrad.
True. Sorry if I came out swinging in the last message. I've gotten
sick of static typers spreading FUD about languages like Clojure not
being usable for "real" or large pieces of software. Your points are
well taken. Although I don't find them to be great risks, it is
worthwhile to understand them.
-Jeff