Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

structures vs hash tables

232 views
Skip to first unread message

Frank Zalkow

unread,
Apr 9, 2014, 5:39:40 AM4/9/14
to
Dear Lispers,

both structures and hash tables map keys to values. I would like to
know what are the advantages and disadvantages of those and what are
more idiomatic to use in what cases.

As far as I see main differences are
- hash-tables can grow, structure slots cannot
- structures have a distinct type, hash tables haven't
- hash-tables do have arbitrary keys, structures do have symbols as
slot names
- structures have some extras like read-only slots etc., that hash
tables do not have

A thing I am not sure about: Hash-Tables are hold in memory similar to
arrays. Structures can be of type VECTOR or LIST - but I think it is
rather uncommon to specify the type. It is more common to leave it
unspecified and than they are neither of type VECTOR nor of type LIST,
they are of their own type and they are stored in memory in a
implementation-dependent way. What it is a common choice for this
implementation-dependent way? It is like a hash table with fixed size?

I did this very simple and unscientific time test in LispWorks and CMUCL:
(defun test-hash ()
(let ((foo (make-hash-table)))
(setf (gethash 'a foo) 1)
(setf (gethash 'b foo) 2)
(setf (gethash 'c foo) 3)
foo))

(defstruct my-struct a b c)
(defun test-struct ()
(let ((foo (make-my-struct)))
(setf (my-struct-a foo) 1)
(setf (my-struct-b foo) 2)
(setf (my-struct-c foo) 3)
foo))

Uncompiled, test-struct is faster in LispWorks whereas test-hash is
faster in CMUCL. Compiled, test-hash is faster in both implementations.

Thanks for your ideas!

Best wishes,

Frank

Pascal J. Bourguignon

unread,
Apr 9, 2014, 7:25:55 AM4/9/14
to
Frank Zalkow <fza...@gmail.com> writes:

> both structures and hash tables map keys to values.

Wrong.

Structures don't map keys. Actually, the ANSI Common Lisp does all it
can to dissuade the reader believing that structure may ever map keys:

"defstruct defines readers for the slots and arranges for setf to
work properly on such reader functions."

Those are very strong words, in the mouth of the standard…


Try to define the functions:

(defgeneric get-value (container key))
(defgeneric set-value (container key value))


with hash-tables, it's trivial:

(defmethod get-value ((container hash-table) key)
(multiple-value-bind (value presentp) (gethash key container)
(if presentp
value
(error "key ~S not present" key))))

(defmethod set-value ((container hash-table) key value)
(multiple-value-bind (old-value presentp) (gethash key container)
(if presentp
(setf (gethash key container) value)
(error "key ~S not present" key))))

with structures, it's a little more complex:

(defstruct point x y z)

(defmethod get-value ((container point) key)
(ecase key
((x) (point-x container))
((y) (point-y container))
((z) (point-z container))))

(defmethod set-value ((container point) key value)
(ecase key
((x) (setf (point-x container) value))
((y) (setf (point-y container) value))
((z) (setf (point-z container) value))))



> I would like to know what are the advantages and disadvantages of
> those and what are more idiomatic to use in what cases.
>
> As far as I see main differences are
> - hash-tables can grow, structure slots cannot
> - structures have a distinct type, hash tables haven't
> - hash-tables do have arbitrary keys, structures do have symbols as
> slot names
> - structures have some extras like read-only slots etc., that hash
> tables do not have


So you know it.


> A thing I am not sure about: Hash-Tables are hold in memory similar to
> arrays.

It is not specified.



> Structures can be of type VECTOR or LIST - but I think it is
> rather uncommon to specify the type. It is more common to leave it
> unspecified and than they are neither of type VECTOR nor of type LIST,
> they are of their own type and they are stored in memory in a
> implementation-dependent way. What it is a common choice for this
> implementation-dependent way?

Look at the sources of the implementations.
You have the FREEDOM to do that!
A FREEDOM that has been won by hard work and a long sociological
combat. Use it! GO READ THE FINE SOURCES!!!




> It is like a hash table with fixed size?

No, it's like a red sound.



> I did this very simple and unscientific time test in LispWorks and CMUCL:
> (defun test-hash ()
> (let ((foo (make-hash-table)))
> (setf (gethash 'a foo) 1)
> (setf (gethash 'b foo) 2)
> (setf (gethash 'c foo) 3)
> foo))
>
> (defstruct my-struct a b c)
> (defun test-struct ()
> (let ((foo (make-my-struct)))
> (setf (my-struct-a foo) 1)
> (setf (my-struct-b foo) 2)
> (setf (my-struct-c foo) 3)
> foo))
>
> Uncompiled, test-struct is faster in LispWorks whereas test-hash is
> faster in CMUCL. Compiled, test-hash is faster in both implementations.

There you are…

--
__Pascal Bourguignon__
http://www.informatimago.com/
"Le mercure monte ? C'est le moment d'acheter !"

Frank Zalkow

unread,
Apr 9, 2014, 9:18:53 AM4/9/14
to
Thanks for your reply.
It is a good hint that keys/values are very different from slots.

To be honest it would be a very hard and highly time consuming
work to go through the source of a couple implementations and
comprehend how it is exactly done. Undoubtedly it would be
fruitful for my skills but I think a cannot do this effort
soon. That's why I asked you more experienced Lisp programmers
for explanation. If that is seen as undue laziness, I apologise.

So I am right if I assume that for a simple task that could be
solved with structures as good as with hash tables, in principle
the latter is the batter choice due to performance?

Am Mittwoch, 9. April 2014 13:25:55 UTC+2 schrieb informatimago:
> Frank Zalkow <fza...@gmail.com> writes:
>
>
>
> > both structures and hash tables map keys to values.
>
>
>
> Wrong.
>
>
>
> Structures don't map keys. Actually, the ANSI Common Lisp does all it
>
> can to dissuade the reader believing that structure may ever map keys:
>
>
>
> "defstruct defines readers for the slots and arranges for setf to
>
> work properly on such reader functions."
>
>
>
> Those are very strong words, in the mouth of the standard...
> There you are...

Dimitri Fontaine

unread,
Apr 9, 2014, 9:29:05 AM4/9/14
to
Frank Zalkow <fza...@gmail.com> writes:
> So I am right if I assume that for a simple task that could be
> solved with structures as good as with hash tables, in principle
> the latter is the batter choice due to performance?

This might brings more confusion or help you decide:

CL-USER> (defstruct foo a b c)
FOO

CL-USER> (let ((h (make-hash-table))
(s (make-foo :a 1 :b 2 :c 3)))
(setf (gethash 's h) s)
(gethash 's h))
#S(FOO :A 1 :B 2 :C 3)
T

CL-USER> (let ((h (make-hash-table))
(s (make-foo :a 1 :b 2 :c 3)))
(setf (gethash s h) "some value")
(gethash s h))
"some value"
T

Really, I don't know that I would compare structs and hash tables.

It seems to me that structs are all about declaring your data types,
making it easier to use them, and hash data all about data access
patterns.

Regards,
--
dim

Pascal J. Bourguignon

unread,
Apr 9, 2014, 10:23:03 AM4/9/14
to
Frank Zalkow <fza...@gmail.com> writes:

> Thanks for your reply.
> It is a good hint that keys/values are very different from slots.
>
> To be honest it would be a very hard and highly time consuming
> work to go through the source of a couple implementations and
> comprehend how it is exactly done. Undoubtedly it would be
> fruitful for my skills but I think a cannot do this effort
> soon. That's why I asked you more experienced Lisp programmers
> for explanation. If that is seen as undue laziness, I apologise.
>
> So I am right if I assume that for a simple task that could be
> solved with structures as good as with hash tables, in principle
> the latter is the batter choice due to performance?


Mathematically, the two data structures are different.

Structures are product sets.

If you want to represent a colored 2D point, you can use the set:

Colored-Point = Color × ℝ × ℝ

with the functions: (colored-point-color k) = c
(colored-point-x k) = x
(colored-point-y k) = y

for a Colored-Point k=(c,x,y).

On the other hand, hash-tables are mathematically, functions (maps).
For example, the hash-table H is a function (H key) = value.


Of course, for each element k of the Colored-Point structure, you can
define a function

K: {C,X,Y} ---> Color ∪ ℝ ∪ ℝ
C |--> c
X |--> x
Y |--> y


But the difference is that hash-table are dynamic data structures, where
the set of keys (the domain of the function) is variable at run-time:
you just write (setf (gethash new-key H) new-value) to add a point to
he function/hash-table.

While changing the structure can be done only at compilation time, and
the functions K are not defined automatically by the language, you have
to write them yourself (as I did in my previous post), using a CASE
instruction.


As you've noticed, performance depends on the implementation and other
factor. If you are developping an application where performance will be
important, my advice would be to abstract away this implementation (for
your program) choice.
See for example com.informatimago.common-lisp.cesarum.dictionary

Raymond Wiker

unread,
Apr 9, 2014, 1:24:12 PM4/9/14
to
Frank Zalkow <fza...@gmail.com> writes:

> Thanks for your reply.
> It is a good hint that keys/values are very different from slots.
>
> To be honest it would be a very hard and highly time consuming
> work to go through the source of a couple implementations and
> comprehend how it is exactly done. Undoubtedly it would be
> fruitful for my skills but I think a cannot do this effort
> soon. That's why I asked you more experienced Lisp programmers
> for explanation. If that is seen as undue laziness, I apologise.
>
> So I am right if I assume that for a simple task that could be
> solved with structures as good as with hash tables, in principle
> the latter is the batter choice due to performance?

Only if by "batter" you mean "badder", which should really be "worse" :-)

If you have a known set of slots (key names), structures or classes are
most likely going to give you much better performance than hash-tables.

Kaz Kylheku

unread,
Apr 9, 2014, 3:33:16 PM4/9/14
to
On 2014-04-09, Frank Zalkow <fza...@gmail.com> wrote:
> Thanks for your reply.
> It is a good hint that keys/values are very different from slots.

In a nutshell, Common Lisp isn't Javascript.

Some languages have a quick and dirty implementation of structures or objects
which are just hashes in disguise.

They will have to rethink that strategy when it comes time to develop a
compiler, and the users expect structure member access to be scarcely
a machine cycle more expensive than accessing a variable on the stack.

D Herring

unread,
Apr 9, 2014, 10:21:18 PM4/9/14
to
On 04/09/2014 05:39 AM, Frank Zalkow wrote:
> both structures and hash tables map keys to values. I would like to
> know what are the advantages and disadvantages of those and what are
> more idiomatic to use in what cases.

In CL, a hash table is a data structure containing a set of key/value
pairs. It is unspecified whether this is a list, array, or other data
structure. The gethash function does some computation with the key to
find the associated value.

A CL structure gives names to memory locations. Whether this memory
is stored as a list, in an array, or otherwise is mostly unspecified.
There are ways to specify that a list should be used, but alas the
standard does not provide support for other specifications. The
structure accessors are similar to gethash, in that they find a value
given the name. However, the list of accessors is static, and thus
more compile-time optimizations are available. For example, an
accessor function can be replaced by a static offset.

As to idiom: OOP code usually uses defstruct or defclass. The
defstruct objects generally optimize the best, and the defclass
objects provide more features at the cost of some performance. Hash
tables are usually only used when the set of keys is dynamically
changing, or statically known but sparsely used, or in other
situations that do not fit the defstruct or defclass APIs. When the
number of keys and values is small (under a hundred or so), an alist
or plist is generally faster than a hash table. (Linear lookup is
faster than the hash function and tree traversal for small N.)

Later,
Daniel

Pascal J. Bourguignon

unread,
Apr 10, 2014, 4:10:29 AM4/10/14
to
D Herring <dher...@at.tentpost.dot.com> writes:

> On 04/09/2014 05:39 AM, Frank Zalkow wrote:
>> both structures and hash tables map keys to values. I would like to
>> know what are the advantages and disadvantages of those and what are
>> more idiomatic to use in what cases.
>
> In CL, a hash table is a data structure containing a set of key/value
> pairs. It is unspecified whether this is a list, array, or other data
> structure. The gethash function does some computation with the key to
> find the associated value.
>
> A CL structure gives names to memory locations. Whether this memory
> is stored as a list, in an array, or otherwise is mostly
> unspecified. There are ways to specify that a list should be used, but
> alas the standard does not provide support for other specifications.

Nothing more wrong.

The standard specifies there different types of structures:

- list,
- vector, and
- structure.


Actually, there's even an infinite number of different type structures
if you consider that teh standard allows to specify a specialized vector
type!



What about reading CLHS?

Marco Antoniotti

unread,
Apr 10, 2014, 5:00:05 AM4/10/14
to
On Wednesday, April 9, 2014 3:18:53 PM UTC+2, Frank Zalkow wrote:
> Thanks for your reply.
>
> It is a good hint that keys/values are very different from slots.
>
> To be honest it would be a very hard and highly time consuming
> work to go through the source of a couple implementations and
> comprehend how it is exactly done. Undoubtedly it would be
> fruitful for my skills but I think a cannot do this effort
> soon. That's why I asked you more experienced Lisp programmers
> for explanation. If that is seen as undue laziness, I apologise.
>
> So I am right if I assume that for a simple task that could be
> solved with structures as good as with hash tables, in principle
> the latter is the batter choice due to performance?

First get it right, then get it fast. This is particularly true for "simple tasks", which usually ballon out of proportion in usually unpredictable ways.

If you use structures (or classes) you get "infrastructure" in Common Lisp that you don't get (out of the box) with hash tables.

Hash tables are good when you really need a "Dictionary" data type that allows for dynamic insertions, search and deletion. But with structures (which, BTW may be faster in implementation X), you can do things like the following...

(defstruct student name id)

(defvar *the-students* #| build here your list of students |#)

(defvar *students-ids*
(and (every #'student-p *the-students*) ; Redundant, but just to make the point.
(mapcar #'student-id *the-students*)))


You do not get the predicate STUDENT-P and the function STUDENT-ID if you want to use hash tables. You have to build them by yourself, with all the caveats of the case.

Therefore the advice is: performance is evil until you program is correct and you should use whatever makes your program easier to read and understand. Therefore use structures unless you really need a dictionary. Besides, association lists are even faster than hash tables on many implementations for small dictionary sizes (YMMV).

Having said that, DEFSTRUCT base a :TYPE option that is limited to LIST, VECTOR and (VECTOR <element-type>) by the ANSI. It key be able to accommodate a HASH-TABLE too.

Any taker to write a CDR on this? :) :)

Cheers
--
MA

D Herring

unread,
Apr 10, 2014, 8:20:11 PM4/10/14
to
On 04/10/2014 04:10 AM, Pascal J. Bourguignon wrote:
> D Herring <dher...@at.tentpost.dot.com> writes:
>
>> On 04/09/2014 05:39 AM, Frank Zalkow wrote:
>>> both structures and hash tables map keys to values. I would like to
>>> know what are the advantages and disadvantages of those and what are
>>> more idiomatic to use in what cases.
>>
>> In CL, a hash table is a data structure containing a set of key/value
>> pairs. It is unspecified whether this is a list, array, or other data
>> structure. The gethash function does some computation with the key to
>> find the associated value.
>>
>> A CL structure gives names to memory locations. Whether this memory
>> is stored as a list, in an array, or otherwise is mostly
>> unspecified. There are ways to specify that a list should be used, but
>> alas the standard does not provide support for other specifications.
>
> Nothing more wrong.
>
> The standard specifies there different types of structures:
>
> - list,
> - vector, and
> - structure.

http://www.lispworks.com/documentation/HyperSpec/Body/m_defstr.htm

So I forgot some of the type-options...

Epic fail!

> What about reading CLHS?

...

Steve Gonedes

unread,
Apr 17, 2014, 10:39:35 PM4/17/14
to
Frank Zalkow <fza...@gmail.com> writes:

< To be honest it would be a very hard and highly time consuming
< work to go through the source of a couple implementations and
< comprehend how it is exactly done. Undoubtedly it would be
< fruitful for my skills but I think a cannot do this effort
< soon. That's why I asked you more experienced Lisp programmers
< for explanation. If that is seen as undue laziness, I apologise.

I agree...

>> > both structures and hash tables map keys to values.

Yes. Structures require an accessor function such ast elt <seq> index.
Hash tables would most likely hash the key to find a slot with a few
chains of values. How full does the table get? Also weak hashtables
keys, or weak references comes to mind ...

< So I am right if I assume that for a simple task that could be
< solved with structures as good as with hash tables, in principle
< the latter is the batter choice due to performance?

I think that when using lisp many would recommend that performance comes
second to flexibility and understanding your problem.

Frank Zalkow

unread,
May 8, 2014, 1:14:02 PM5/8/14
to
Just wanted to say thank you to you all for your ideas! It helped me getting things clear to me. I saw hashes and structs way closer than they actually are.

Best

Frank
0 new messages