Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?

82 views
Skip to first unread message

dpapathanasiou

unread,
Nov 3, 2006, 1:43:25 PM11/3/06
to
I have a hash table I want to serialize as an S-expression using PRIN1.

Since PRIN1 does not preserve the hash contents to the output file (it
writes just the hash handle, e.g. "#<EQUALP hash table, 11723 entries
{581579ED}>"), I've written a simple routine to convert the hash to a
two dimensional array instead:

(defun hash-to-2D-array (hash)
(let ((elems (hash-table-count hash)))
(when (> elems 0)
(let ((index (make-array (list elems 2)))
(counter 0))
(maphash #'(lambda (k v)
(setf (aref index counter 0) k)
(setf (aref index counter 1) v)
(incf counter))
hash)
index))))

Because the hash table is simple (each key is a string and each value
is a list of strings), using PRIN1 on the resulting array does
serialize all the content.

Although it's functional, this conversion method is slow: for example,
even for hash tables containing around 25,000 entries, it takes several
seconds to produce the array.

Is there a better way of writing (hash-to-2D-array) or is there a
better way to serialize a simple hash table without having to convert
to an array first?

Pascal Bourguignon

unread,
Nov 3, 2006, 2:58:12 PM11/3/06
to
"dpapathanasiou" <denis.pap...@gmail.com> writes:

I don't understand. Do you need to print a hash table, or do you need
to convert it to a 2D array?

(defpackage "MINE" (:use "CL"))

(defun mine::hash-table (&key size test rehash-size rehash-threshold elements)
(let ((table (make-hash-table :size size :test test
:rehash-size rehash-size
:rehash-threshold rehash-threshold)))
(dolist (item elements table)
(setf (gethash (first item) table) (second item)))))

(defmethod print-object ((self hash-table) stream)
(format stream "#.(MINE::HASH-TABLE :SIZE ~D :TEST (FUNCTION ~S) ~%~
~& :REHASH-SIZE ~A :REHASH-THRESHOLD ~A~%~
~& :ELEMENTS '("
(hash-table-count self) (hash-table-test self)
(hash-table-rehash-size self) (hash-table-rehash-threshold self))
(maphash (lambda (k v) (format stream "~%(~S ~S)" k v)) self)
(format stream "))")
self)


(print-object (hash-table :size 3 :test (function equal)
:rehash-size 1.5 :rehash-threshold 0.75
:elements '((a 1) (b 2) (c 3))) t)
prints:
#.(MINE::HASH-TABLE :SIZE 3 :TEST (FUNCTION EXT:FASTHASH-EQUAL)
:REHASH-SIZE 1.5S0 :REHASH-THRESHOLD 0.75s0
:ELEMENTS '(
(C 3)
(B 2)
(A 1)))
returns:
#S(HASH-TABLE :TEST EXT:FASTHASH-EQUAL (C . 3) (B . 2) (A . 1))

Of course, the easiest way would be to use the right implementation...


[53]> (let ((h (mine::hash-table :size 25000 :test (function EXT:FASTHASH-EQUAL)
:rehash-size 1.5s0 :rehash-threshold 0.75s0
:elements (loop :for i :from 0 :below 25000 :collect (list (format nil "~R" i) i)))))
(time (prog1 nil (with-output-to-string (out) (print-object h out)))))
Real time: 2.427929 sec.
Run time: 2.416151 sec.
Space: 44662496 Bytes
GC: 21, GC time: 0.832054 sec.
NIL
[54]>


2.5 seconds not even compiled!

--
__Pascal Bourguignon__ http://www.informatimago.com/
Un chat errant
se soulage
dans le jardin d'hiver
Shiki

dpapathanasiou

unread,
Nov 3, 2006, 3:17:57 PM11/3/06
to
Pascal Bourguignon wrote:
> I don't understand. Do you need to print a hash table, or do you need
> to convert it to a 2D array?

The goal is to be able to serialize the hash table to a file, so that
it can be read back into memory as the same hash table object at a
later time, either by the original process or a completely different
process.

The nice thing about converting the hash to an array is that when I
read the file containing the array (i.e. the file created by passing
the array to PRIN1), it is loaded and recongized immediately as an
array object -- of course, I then have to convert the array to a hash
table to get back to the point I want.

So while I recognize that your suggestion will indeed print the hash
contents, what happens when I try to read that output back into memory?


I'll still need some kind of conversion function to get me back to the
original hash table object, correct?

Zach Beane

unread,
Nov 3, 2006, 3:40:42 PM11/3/06
to
"dpapathanasiou" <denis.pap...@gmail.com> writes:

> Although it's functional, this conversion method is slow: for example,
> even for hash tables containing around 25,000 entries, it takes several
> seconds to produce the array.

It seems likely to me that producing the array is very fast, but
printing it (especially if *PRINT-PRETTY* is T) will take some time.

What happens if you do something like:

(progn (hash-to-2D-array hash) nil)

Is it faster than just (hash-to-2D-array hash)?

Zach

dpapathanasiou

unread,
Nov 3, 2006, 4:15:16 PM11/3/06
to
Zach Beane wrote:
> What happens if you do something like:
>
> (progn (hash-to-2D-array hash) nil)
>
> Is it faster than just (hash-to-2D-array hash)?

Zach, that was brilliant!

You're right, the array conversion is fast, but calling PRIN1 with a
large array object causes the delay.

Although *PRINT-PRETTY* was set to T, I didn't think it mattered in
prod because I run all my complied code in -quiet mode.

So even without changing the default *PRINT-PRETTY* value (though I
need to research that a bit more), wrapping the PRIN1 call around
(progn ... nil) made a *huge* difference in performance.

Many thanks for pointing it out.

Otto Diesenbacher

unread,
Nov 3, 2006, 5:32:14 PM11/3/06
to
"dpapathanasiou" <denis.pap...@gmail.com> writes:

> The goal is to be able to serialize the hash table to a file, so that
> it can be read back into memory as the same hash table object at a
> later time, either by the original process or a completely different
> process.

[...]


> I'll still need some kind of conversion function to get me back to the
> original hash table object, correct?

just for serialization and deserialization - do you know:
http://common-lisp.net/project/cl-store/

okflo

--
Otto Karl Florian Diesenbacher

Pascal Bourguignon

unread,
Nov 3, 2006, 6:18:27 PM11/3/06
to
"dpapathanasiou" <denis.pap...@gmail.com> writes:

Perhaps you should read my previous post better...

--
__Pascal Bourguignon__ http://www.informatimago.com/

Nobody can fix the economy. Nobody can be trusted with their finger
on the button. Nobody's perfect. VOTE FOR NOBODY.

Szymon

unread,
Nov 4, 2006, 8:57:10 AM11/4/06
to
dpapathanasiou wrote:

> I'll still need some kind of conversion function to get me back to the
> original hash table object, correct?

It's already in Pascal's code.

dpapathanasiou

unread,
Nov 4, 2006, 12:24:59 PM11/4/06
to
Pascal Bourguignon wrote:
> Perhaps you should read my previous post better...

Well, ever since I saw this post of yours (and I'm sure you were
probably joking):
http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/88a36fb2c239a44e/4c95766a8bdda347?lnk=gst&q=&rnum=63#4c95766a8bdda347
I'm less likely to read your replies very thoroughly.

dpapathanasiou

unread,
Nov 4, 2006, 12:52:12 PM11/4/06
to

Otto Diesenbacher wrote:
> just for serialization and deserialization - do you know:
> http://common-lisp.net/project/cl-store/

Thanks for mentioning this -- I'll definitely try it.

Pascal Bourguignon

unread,
Nov 4, 2006, 1:10:45 PM11/4/06
to
"dpapathanasiou" <denis.pap...@gmail.com> writes:

This post was written to motivate, on the contrary, more VERY throughful reading.

--
__Pascal Bourguignon__ http://www.informatimago.com/

THIS IS A 100% MATTER PRODUCT: In the unlikely event that this
merchandise should contact antimatter in any form, a catastrophic
explosion will result.

Rob Thorpe

unread,
Nov 6, 2006, 7:50:19 AM11/6/06
to

Pascal's code provides this. It serializes the hash table to a list
which becomes a hash-table again when read.

I agree with you about treating Pascal's suggestions with some care,
occasionally stuff he says is crazy, but mostly he knows what he's
talking about.

0 new messages