Why no anonymous record types, or (defstruct, create-struct) vs (defrecord, ???)

John McDonald

Nov 14, 2011, 5:09:47 PM
to Clojure
Structmaps can be defined either named, thru the defstruct macro, or
anonymously, thru the create-struct function call. Record types must
be named and defined thru a call to defrecord.
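
To make the asymmetry concrete, here is a minimal sketch (the names point, anon-point and Point are made up for illustration, and ->Point assumes Clojure 1.3 or later):

```clojure
;; Named struct definition, via the defstruct macro.
(defstruct point :x :y)

;; Anonymous struct definition, via the create-struct function.
;; The basis is an ordinary value that can be built at runtime.
(def anon-point (create-struct :x :y))

;; Both kinds of basis are instantiated the same way.
(def p1 (struct point 1 2))
(def p2 (struct anon-point 1 2))

;; Records only have the macro form; the name is required.
(defrecord Point [x y])
(def p3 (->Point 1 2))
```

Both struct instances behave as ordinary maps, while the record is an instance of a generated class named Point.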

This seems to contradict one of the Clojure library coding standards
(http://dev.clojure.org/display/design/Library+Coding+Standards):
"Don't use a macro when a function can do the job. If a macro is
important for ease-of-use, expose the function version as well."

My particular problem is the following: I read a set of data, say from
a CSV file with column headers. I produce a representation of the
data as a sequence of cases where each case is a collection of
annotated key-value pairs. The annotation describes the possible
values that can be associated with a given key, and is usually
determined by introspection on the values that I just read. The
annotation is used to determine details of how certain machine
learning methods can be applied to that data.

I want the case representation to be as compact and speedy as
possible. The number of cases x number of keys may exceed 10^9, so I
don't want to repeat the actual keys and annotation with each case,
and so using bare maps seems unlikely to be a good solution.

So it looks like the choice is between struct-maps and records. Many
of the values are likely to be byte, short, etc., and it will be
important to save space with an un-boxed representation.

It seems that a record plus type hints would be the best choice. I can
work around the missing function by writing my own macro that calls
eval, but that doesn't smell nice.
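
Roughly, the workaround I have in mind looks like the sketch below. It is written as a function rather than a macro, and define-record is a hypothetical name, not anything in clojure.core. It assumes the field names arrive as symbols (for example, derived from the CSV header) and that Clojure 1.3+ positional factories are available.

```clojure
(defn define-record
  "Hypothetical helper: defines a record type at runtime by eval'ing a
  defrecord form, then returns the generated positional factory fn.
  type-name is a symbol; fields is a sequence of symbols."
  [type-name fields]
  ;; Build (clojure.core/defrecord <type-name> [<fields>]) and evaluate
  ;; it in the current namespace. This compiles a new Java class.
  (eval `(defrecord ~type-name ~(vec fields)))
  ;; defrecord also defines a factory var named ->TypeName; look it up
  ;; in the same namespace and return the function it holds.
  @(resolve (symbol (str "->" (name type-name)))))

;; Usage: the column names are known only after reading the file.
(def make-case (define-record 'CsvCase '[id weight]))
(def a-case (make-case 1 2.5))
```

Type hints could be attached to the field symbols as :tag metadata before the form is built; the sketch leaves that out.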

(PS: 'create-struct' is an unfortunately confusing name. It sounds
more like instantiation of an existing structure definition than a new
definition. I'd suggest 'define-record' or 'create-record-definition'
for the function equivalent of 'defrecord'.)

Stephen Compall

Nov 19, 2011, 5:15:40 PM
to John McDonald, Clojure
On Mon, 2011-11-14 at 14:09 -0800, John McDonald wrote:
> Structmaps can be defined either named, thru the defstruct macro, or
> anonymously, thru the create-struct function call. Record types must
> be named and defined thru a call to defrecord.

1.2-style records and types are very different from structs, so don't
assume that what applies to one reasonably applies to the other.
Namely:

> This seems to contradict one of the Clojure library coding standards
> (http://dev.clojure.org/display/design/Library+Coding+Standards):
> "Don't use a macro when a function can do the job. If a macro is
> important for ease-of-use, expose the function version as well."

There is no function version of defrecord to expose, so this standard
doesn't apply.

defrecord needs a great deal of compiler support. It doesn't provide an
ease-of-use wrapper for a simple Java method or function call, like
defstruct does; it wraps deftype*, a compiler special form (primitive)
whose contents must be present at compile-time for it to work.
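
One way to see this at the REPL is to expand a defrecord form and look for the special form in the result. This is only a sketch: Foo is a made-up record name, and the exact shape of the expansion differs between Clojure versions, so the code only checks that the symbol deftype* appears somewhere in it.

```clojure
;; Expand the macro once without evaluating the result, so no class
;; is actually defined here.
(def expansion (macroexpand-1 '(defrecord Foo [a b])))

;; Walk every nested form in the expansion and look for deftype*.
(def uses-deftype*
  (boolean (some #{'deftype*} (tree-seq coll? seq expansion))))
```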

> So it looks like the choice is between struct-maps and records. Many
> of the values are likely to be byte, short, etc., and it will be
> important to save space with an un-boxed representation.
>
> It seem record plus type hints would be the best choice. I can work
> around the missing function, by writing my own macro that calls eval,
> but that doesn't smell nice.

Each defrecord creates a Java class, which normally stays in memory
forever unless you mess with the GC settings. I can't remember the
details, but each eval might make a class, too, depending on its
complexity. Weigh these costs against the savings you get from unboxed
representations when evaluating records as your solution.


--
Stephen Compall
^aCollection allSatisfy: [:each|aCondition]: less is better

Stuart Sierra

Nov 28, 2011, 5:16:28 PM
to Clojure
There are other possibilities:

* using interned Strings as keys will prevent duplicate storage of the
keys
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#intern%28%29

* you could make a custom data structure that stores the keys / rows
as vectors and generates a sequence of maps when you want to iterate
over the rows
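
A minimal sketch of the second idea, assuming string column headers and rows that fit in ordinary vectors (make-table and row-maps are hypothetical names):

```clojure
(defn make-table
  "Stores the column keys once, as keywords, and each row as a plain
  vector of values in the same column order."
  [headers rows]
  {:ks   (vec (map keyword headers))
   :rows (vec (map vec rows))})

(defn row-maps
  "Returns a lazy sequence of maps, one per row. Each map is built
  only when that row is consumed."
  [table]
  (let [ks (:ks table)]
    (map (fn [row] (zipmap ks row)) (:rows table))))

(def table (make-table ["id" "weight"] [[1 2.5] [2 3.0]]))
(def first-row (first (row-maps table)))
```

The keys are held in one shared vector, so the per-row cost is just the row vector itself. Swapping the row vectors for primitive arrays would be a further step that this sketch does not attempt.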

Type hinted record fields only matter for memory usage when you're
hinting to primitive types. Everything else in Java, including
Strings, is an object, with the same memory overhead.
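
As an illustration, the sketch below defines one record with primitive hints and one without, then uses Java reflection to show the declared type of the generated fields. The record and helper names are made up, and ^long and ^double stand in for whichever primitive types your data needs.

```clojure
;; Primitive hints: the generated class declares long and double fields.
(defrecord HintedCase [^long id ^double weight])

;; No hints: every field is declared as java.lang.Object.
(defrecord BoxedCase [id weight])

(defn field-type
  "Returns the declared Java type of a record field, via reflection."
  [^Class cls ^String field-name]
  (.getType (.getField cls field-name)))

(def hinted-weight-type (field-type HintedCase "weight"))
(def boxed-weight-type  (field-type BoxedCase "weight"))
```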

-Stuart Sierra
clojure.com
