(defvar *game-hash* (make-hash-table :test 'equal))
(defstruct gamedata
  foo bar) ; ... more slots
(dolist (game games)
  (let ((game (make-gamedata)))
    ;; ... plus the parsing variables
    ;; ... parse 3 XML files, stuff data into the structure GAME
    ;; ... including identifying the hash key, which happens to be a string,
    ;;     e.g. "2005/09/12/sdnmlb-sfnmlb-1"
    ;; ... store the structure GAME into *game-hash* under that key
    )) ; end of DOLIST
That picks up the first layer of indexing; the gathered info is then
used for a second layer of dependent XML files. I'll stuff more
data into 'gamedata'. I can use the hash key to pull out the game of
interest, and since it is a hash table, I can do that
quickly. Is that the fastest way?
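For reference, here is a minimal sketch of that keyed lookup. The two-slot GAME-SKETCH structure, the slot values, and the helper names are made up for illustration; only the key format comes from the post above.

```lisp
;; Hypothetical two-slot structure; the real GAMEDATA has more slots.
(defstruct game-sketch home away)

(defvar *sketch-hash* (make-hash-table :test 'equal))

(defun store-game (key game)
  "Store GAME under the string KEY and return GAME."
  (setf (gethash key *sketch-hash*) game))

(defun find-game (key)
  "Return the game stored under KEY, plus a flag saying whether it was found."
  (gethash key *sketch-hash*))

;; The slot values here are invented placeholders.
(store-game "2005/09/12/sdnmlb-sfnmlb-1"
            (make-game-sketch :home "sfn" :away "sdn"))

;; A freshly built string with the same characters still finds the entry,
;; because EQUAL compares strings by content. Under the default EQL test
;; this lookup would miss.
(find-game (concatenate 'string "2005/09/12/" "sdnmlb-sfnmlb-1"))
```

The :test 'equal argument is what makes string keys usable here; lookup cost should stay roughly constant as the table grows.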
After all that work, I think saving the result is a good idea. Saving
it in the filesystem at the head of the local tree of XML files seems
best. That way, I can check that it exists, load it, read it, modify
it and put it back into the fs when done. Since this is daily data,
and various incremental searches and combinations might need to be
constructed, something simple that handles a single hash-table seems
appropriate.[1]
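The check/load/modify/save cycle described above might look something like this sketch. LOAD-OR-BUILD and its three function arguments are hypothetical names; the actual save and restore functions depend on which serialization approach ends up being used.

```lisp
(defun load-or-build (path load-fn build-fn save-fn)
  "Return the table cached at PATH, building and saving it first if needed.
LOAD-FN takes a path, BUILD-FN takes no arguments, and SAVE-FN takes the
table and a path. All three are supplied by the caller."
  (if (probe-file path)
      ;; A saved copy exists: skip the expensive rebuild.
      (funcall load-fn path)
      ;; No saved copy yet: rebuild (e.g. reparse the XML) and cache it.
      (let ((table (funcall build-fn)))
        (funcall save-fn table path)
        table)))
```

With one such file at the head of each day's directory tree, each store stays local to its own data.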
Looking at the Persistence and Serialization packages leaves me a
little lost. I strongly prefer something that runs inside CMUCL and
doesn't need a bunch of support packages too. Something very reliable
that clueless noobs can have running in seconds. Speed really isn't
of any concern; it just has to be faster than a full reparse of the XML.
Recommendations? Extra credit for packages with sample code.
TIA
[1] the Wikipedia entry for Serialization has a subsection for Common
Lisp that sorta sucks compared to the others.
--
"Most programmers use this on-line documentation nearly all of the
time, and thereby avoid the need to handle bulky manuals and perform
the translation from barbarous tongues." CMU CL User Manual
I looked at CL-Store, it seems that I would need to maintain any
structures in three places, once at definition, once at storage (which
seems to be slot by slot) and again at restore.
PLOB - "The inability to licence POSTORE essentially orphans PLOB!"
Elephant didn't work on cmucl last time I tried.
Perec was SBCL specific and I think only worked with a database.
---
Most of the packages I look at sound like global stores only, I
want/need multiple local stores (about 180 per year of interest).
Well there are about 410 XML files involved in a single day. There
are a lot of overlapping data-keys in all those files, and of course
they don't have identical values. So I need to only pick out some
data for each file, i.e. I have custom readers for them. Since speed
once the data is collected becomes an issue, I'm avoiding objects for
now, since I don't need anything more than the features structures
offer. If XMLisp will create structures instead of objects, then it
would probably be useful to employ it for reading the files. I can
always copy slots if overall it is simpler code... I suppose I could
do that for objects -> structures too.
I see that XMLisp isn't run on CMUCL however...
Doesn't that require the object to be printable? I would have a
Structure that will have structures in some of the slots. Doesn't
sound printable. One of the other simple schemes I recall was to mmap
the 'object' (in this case a hash-table) and just do a copy/load from
memory to a file. I'll track that stuff down again, maybe I will
understand it this time.
|> Re-reading your original message, I understand you do not want go the
|> XML route. Perhaps you don't need a serialization package and just
|> something like
|>
|> <URL:http://paste.lisp.org/display/898>
|
| Doesn't that require the object to be printable?
No. The hashtable/structure is never printed. The fasl file is created
from the read-time value of a variable bound to the object. That's the
trick.
Now there are other issues you'd have to take care of, such as ensuring
that the environment in which the fasl file is read back in can
reconstruct the object. But this is not an issue here.
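To make the read-time trick concrete, here is a small sketch of what #. does on its own, without any fasl file involved. The variable and function names are made up for illustration.

```lisp
(defvar *read-time-payload* (list 1 2 3))

(defun read-quoted-payload ()
  "Read a form whose #. refers to *READ-TIME-PAYLOAD*."
  (with-standard-io-syntax            ; binds *READ-EVAL* to T, among others
    (let ((text (format nil "(quote #.~S)" '*read-time-payload*)))
      (read-from-string text))))

;; The reader evaluates the symbol after #. and splices the resulting
;; object into the form. The second element of the form is therefore the
;; very same list object, not a printed-and-reparsed copy.
(eq (second (read-quoted-payload)) *read-time-payload*)
```

When the compiler reads such a form from a file, the object ends up as a literal constant that has to be dumped into the fasl, which is where the rest of the work happens.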
| I would have a Structure that will have structures in some of the
| slots. Doesn't sound printable.
I'd recommend you try it before dissing it :) -- You don't have
portability requirements and you can recreate the dump on demand. It is
unlikely to get simpler than this.
| One of the other simple schemes I recall was to mmap the 'object' (in
| this case a hash-table) and just do a copy/load from memory to a file.
| I'll track that stuff down again, maybe I will understand it this
| time.
--
Madhu
see http://www.lispworks.com/documentation/HyperSpec/Body/v_pr_rda.htm#STprint-readablyST
and http://www.lispworks.com/documentation/HyperSpec/Body/v_pr_cir.htm
I never checked this in any implementation though.
> Doesn't that require the object to be printable?
Structures are printable/readable: http://www.lispworks.com/documentation/HyperSpec/Body/02_dhm.htm
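As a rough illustration of the print/read route for nested structures, here is a sketch with two made-up structure types. Note that there is no standard readable syntax for hash tables, so what happens when a slot holds one is implementation-dependent; this sketch only covers structures inside structures.

```lisp
;; Hypothetical structure types, for illustration only.
(defstruct point-sketch x y)
(defstruct segment-sketch start end)

(defun print-read-round-trip (object)
  "Print OBJECT readably to a string and read a fresh copy back."
  (with-standard-io-syntax            ; binds *PRINT-READABLY* to T
    (read-from-string (prin1-to-string object))))

(let* ((segment (make-segment-sketch
                 :start (make-point-sketch :x 0 :y 0)
                 :end (make-point-sketch :x 3 :y 4)))
       (copy (print-read-round-trip segment)))
  (list (eq segment copy)        ; a new object comes back ...
        (equalp segment copy)))  ; ... with the same slot values
```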
Wrong. Please look more carefully at what you're advising :)
Hmmm, I'm confused, I don't understand what would require you to do
manual structure maintenance.
cl-store should `give back` the hash-table with structures which are
roughly equivalent, somewhat like Python's 'pickle'.
If it's behaving in a different manner then it's a bug.
I used, almost, this exact same approach when storing data about a
parsed spam corpus for my spam filter.
sean.
Oops, my fault :)
(with-open-file (f tmp :direction :output :if-exists :supersede)
  (format f "(cl:setf binstore::*hook* #.binstore::*hook*)"))
?
The Cl-Store examples actually call out the slot names when storing.
cl-user(1): (defclass foo ()
((bar :accessor bar :initarg :bar)))
cl-user(2): (cl-store:store (list (find-class 'foo)
#'bar
(make-instance 'foo :bar "bar"))
"test.out")
I looked at that and thought, "oh no, I need to list the slot names?"
I realized I could store anything if I disassembled it into components
first, saw that CL-Store saves structure definitions, and decided that
a workable approach to serialization is divide-and-conquer, as long as
you store the details of re-assembly too. I keep thinking that the
BerkeleyDB code is what I need, and that is distracting.
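The disassemble-and-reassemble idea can be sketched by hand for a single hash table. This is not how CL-Store works internally; it is just a minimal illustration, with made-up function names, of storing the re-assembly details (here, the table's test) alongside the entries.

```lisp
(defun hash-table->form (table)
  "Flatten TABLE into a plain list recording its test and its entries."
  (list :test (hash-table-test table)
        :entries (loop for key being the hash-keys of table
                         using (hash-value value)
                       collect (cons key value))))

(defun form->hash-table (form)
  "Rebuild a hash table from a list produced by HASH-TABLE->FORM."
  (let ((table (make-hash-table :test (getf form :test))))
    (loop for (key . value) in (getf form :entries)
          do (setf (gethash key table) value))
    table))

(defun round-trip-through-text (table)
  "Print the flattened TABLE to a string, read it back, and rebuild it."
  (with-standard-io-syntax
    (form->hash-table
     (read-from-string (prin1-to-string (hash-table->form table))))))
```

Values that are themselves hash tables would need the same flattening applied recursively, which is the bookkeeping a serialization library takes over.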
> cl-store should `give back` the hash-table with structures which are
> roughly equivalent, somewhat like pythons 'pickle'.
> If it's behaving in a different manner then it's a bug.
Well that brings up something I was just thinking about. In order to
bring back a hash-table, did you declare a hash-table type? How else
would you re-obtain a hash-table?
> I used, almost, this exact same approach when storing data about a
> parsed spam corpus for my spam filter.
Well that is very encouraging.
Yes, I will write up a sample case with a Structure, some of whose slots
are hash-tables of other structures (the game players) and try that
out. At least that will provide some code to discuss further and
point out errors.
;;; modified from http://paste.lisp.org/display/898/
;;; not tested at all! might even be unparseable!!!
(defpackage :binstore
  (:use :cl)
  (:export
   #:bindump
   #:binload))
(in-package :binstore)
(defvar *hook*)
(defun bindump (object pathname)
  "Dumps OBJECT as a FASL file designated by PATHNAME."
  (let ((tmp (make-pathname :type "lisp" :defaults pathname)))
    (unwind-protect
         (let ((*hook* object))
           ;; Write a one-form source file; #. splices in the object itself.
           (with-open-file (f tmp :direction :output :if-exists :supersede)
             (format f "(cl:setf binstore::*hook* '#.binstore::*hook*)~%"))
           (compile-file tmp :output-file pathname))
      (delete-file tmp)))
  pathname)
(defun binload (pathname)
  "Loads an object dumped by BINDUMP from PATHNAME."
  (let ((*hook* nil))
    (load pathname)
    *hook*))
Well, not exactly. This example stores a list of 3 objects: the class
named FOO, the generic function named BAR (although you can't really
consider this to be serialized, since all that is saved is the name), and
an instance of FOO with the slot BAR set to "bar".
An equally valid example would be:
> (defclass foo () ((foo :initform nil :initarg :foo)))
#<STANDARD-CLASS FOO 20097813>
;; serialize an instance into a file.
> (cl-store:store (make-instance 'foo :foo 3) "/tmp/my-instance")
#<FOO 200E0867>
;; then you can restore the instance using.
> (describe (cl-store:restore "/tmp/my-instance"))
#<FOO 200D9963> is a FOO
FOO 3
Of course you won't get back an eq/eql/equal object but rather an
instance that can be considered similar.
sean.
Well, I wrote up a sample and tried a suggestion.
I had suspected that the suggestion wouldn't work, and so far it has not worked.
Any pointers?
Sample Code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; code posted at http://paste.lisp.org/display/74033
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Sample Code GOAL:
; put some non-simple data into a hash table
; store the hash table in the filesystem
;---
; recover the hash table from the filesystem
; verify data
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defpackage :dribble
(:use :cl))
(in-package :dribble)
(defvar *games-hash* (make-hash-table :test 'equal))
(defstruct (big-struct (:conc-name bs-))
  gametag
  hometeam
  (players (make-hash-table)))
(defstruct player
  pid
  name
  performance)
; these are parsed from the multitude of XML files
(defvar *players* '((4532 "Curtis" 30.5) (11283 "Jackson" 15.3)
(10605 "Celek" 8.3) (3619 "Westbrook" 13) (9695 "Avant" 11.5)
(9859 "Baskett" 14) (2670 "Buckhalter" 12) (4519 "Smith" 5)
(5412 "Lewis" -2) (5528 "Fitzgerald" 16.9) (4512 "Boldin" 8.5)
(9658 "Pope" 10.5) (5286 "Urban" 18) (8458 "Arrington" 16)
(1755 "James" 16) (10585 "Breaston" 10) (11383 "Hightower" 8)
(1682 "Warner" 4)))
(defvar *teamnames* '("Buffalo" "Chicago" "Cincinnati" "Cleveland"
"Dallas" "Denver" "Detroit" "Houston" "Jacksonville" "Oakland"
"Seattle" "Washington"))
; make up some games and players
(defun daze-games ()
  "a game for every team"
  (let ((count 0))
    (dolist (team *teamnames*)
      (incf count)
      (let* ((tag (format nil "~@R" count))
             (gameday (make-big-struct :gametag tag :hometeam team)))
        (game-players gameday)
        (setf (gethash tag *games-hash*) gameday)))))
(defun game-players (game-struct)
  "make a random collection of players for the game"
  (dolist (jock *players*)
    (when (> 0.5 (random 1.0))
      (let ((rookie (make-player))
            (player-ht (bs-players game-struct)))
        (destructuring-bind (key name outcome) jock
          (setf (player-pid rookie) key)
          (setf (player-name rookie) name)
          (setf (player-performance rookie) outcome)
          (setf (gethash key player-ht) rookie))))))
(daze-games)
; code such as
; (gethash 4512 (bs-players (gethash "II" *games-hash*)))
; extracts the data
(binstore:bindump *games-hash* "/tmp/dribble.ht")
; oh-oh
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; from http://paste.lisp.org/display/898
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; Untested in this form, but the idea is sound FSVO sound.
;;;; Slightly hacked from almost identical code by Mario Mommer on cll.
(defpackage :binstore
(:use :cl)
(:export
#:bindump
#:binload))
(in-package :binstore)
(defvar *hook*)
(defun bindump (object pathname)
  "Dumps OBJECT as a FASL file designated by PATHNAME."
  (let ((tmp (make-pathname :type "lisp" :defaults pathname)))
    (unwind-protect
         (let ((*hook* object))
           (with-open-file (f tmp :direction :output :if-exists :supersede)
             (with-standard-io-syntax
               (format f "(~S ~S '#.~S)~%"
                       'cl:setf 'binstore::*hook* 'binstore::*hook*)))
           (compile-file tmp :output-file pathname))
      (delete-file tmp)))
  pathname)
(defun binload (pathname)
  "Loads an object dumped by BINDUMP from PATHNAME."
  (let ((*hook* nil))
    (load pathname)
    *hook*))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
==================================================================
;;;; (binstore:bindump *games-hash* "/tmp/dribble.ht") ...
; Python version 1.1, VM version Intel x86 on 22 JAN 09 09:40:14 am.
; Compiling: /tmp/dribble.lisp 22 JAN 09 09:40:14 am
; Compiling Creation Form for #<HASH-TABLE :TEST EQUAL :WEAK-P NIL :COUNT 12 {58505775}>:
;
;
; File: /tmp/dribble.lisp
; In: SETF BINSTORE::*HOOK*
; '#<HASH-TABLE :TEST EQUAL :WEAK-P NIL :COUNT 12 {58505775}>
; --> PROGN LISP::STUFF-HASH-TABLE
; ==>
; '(("XII" . #) ("XI" . #) ("X" . #) ("IX" . #) ("VIII" . #) ...)
; Error: (while making load form for #S(BIG-STRUCT :GAMETAG "XII" :HOMETEAM "Washington" :PLAYERS #<HASH-TABLE :TEST EQL :WEAK-P NIL :COUNT 10 {587C75FD}>))
;
; Error in function KERNEL:MAKE-STRUCTURE-LOAD-FORM: Structures of type BIG-STRUCT cannot be dumped as constants.
;
; --> PROGN
; ==>
; (LISP::STUFF-HASH-TABLE #<HASH-TABLE :TEST EQUAL :WEAK-P NIL :COUNT 12 {58505775}> '(# # # # # ...))
; Note: The second argument never returns a value.
;
; Compiling Init Form for #<HASH-TABLE :TEST EQUAL :WEAK-P NIL :COUNT 12 {58505775}>:
; Byte Compiling Top-Level Form:
; Compilation unit finished.
; 1 error
; 1 note
; /tmp/dribble.ht written.
; Compilation finished in 0:00:01.
| Well, I wrote up a sample and tried a suggestion.
| I had suspected that that suggestion wouldn't work and so far, it has
| not worked
Your code works as expected in Lispworks. In CMUCL the error
| ; Error in function KERNEL:MAKE-STRUCTURE-LOAD-FORM: Structures of
| type BIG-STRUCT cannot be dumped as constants. ;
Should have given a clue: you can read up on MAKE-LOAD-FORM in the ANS.
It works here after defining
(defmethod make-load-form ((obj big-struct) &optional environment)
  (make-load-form-saving-slots obj :environment environment))
(defmethod make-load-form ((obj player) &optional environment)
  (make-load-form-saving-slots obj :environment environment))
before dumping.
--
Madhu
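Putting the two pieces together, a self-contained sketch of the whole round trip could look like the following. It follows the same pattern as the binstore code in this thread, but the names PITCHER-SKETCH, *DUMP-HOOK*, DUMP-OBJECT and RESTORE-OBJECT are made up here, and the file type used for the fasl is left to the caller.

```lisp
;; Hypothetical structure standing in for BIG-STRUCT / PLAYER.
(defstruct pitcher-sketch name era)

;; Without this method the compiler has no way to dump the structure.
(defmethod make-load-form ((obj pitcher-sketch) &optional environment)
  (make-load-form-saving-slots obj :environment environment))

(defvar *dump-hook* nil
  "Holds the object while it is being dumped, and again after loading.")

(defun dump-object (object fasl-path)
  "Compile a one-form source file that re-creates OBJECT when loaded."
  (let ((source (make-pathname :type "lisp" :defaults fasl-path))
        (*dump-hook* object))
    (unwind-protect
         (progn
           (with-open-file (out source :direction :output
                                       :if-exists :supersede)
             (with-standard-io-syntax
               ;; #. makes the reader splice in the object itself.
               (format out "(~S ~S '#.~S)~%"
                       'cl:setf '*dump-hook* '*dump-hook*)))
           (compile-file source :output-file fasl-path))
      (when (probe-file source)
        (delete-file source)))
    fasl-path))

(defun restore-object (fasl-path)
  "Load a file written by DUMP-OBJECT and return the re-created object."
  (let ((*dump-hook* nil))
    (load fasl-path)
    *dump-hook*))
```

The structure definitions have to be loaded before RESTORE-OBJECT is called, since the fasl only records how to fill the slots, not the DEFSTRUCT itself.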
| Well, I wrote up a sample and tried a suggestion.
| I had suspected that that suggestion wouldn't work and so far, it has
| not worked
[...]
| ; Error in function KERNEL:MAKE-STRUCTURE-LOAD-FORM: Structures of
| type BIG-STRUCT cannot be dumped as constants. ;
> Should have given a clue: You can read up on MAKE-LOAD-FORM in the ANS
Not without your pointer, I'm too much of a CLOS newbie. The full
version of this code is a learning experience for CLOS (one of many
I'm sure).
> It works here after defining
>
> (defmethod make-load-form ((obj big-struct) &optional environment)
> (make-load-form-saving-slots obj :environment environment))
>
> (defmethod make-load-form ((obj player) &optional environment)
> (make-load-form-saving-slots obj :environment environment))
>
> before dumping.
OK, that means I need one for every structure in the full code.
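If there are many structures, the repetition could be folded into a small macro. This is only a sketch; DEFINE-SLOT-SAVING-LOAD-FORMS and the two example structure names are made up, and the macro simply expands into the same DEFMETHOD shown above, once per type.

```lisp
(defmacro define-slot-saving-load-forms (&rest type-names)
  "Define a MAKE-LOAD-FORM method that saves all slots for each of TYPE-NAMES."
  `(progn
     ,@(loop for type-name in type-names
             collect `(defmethod make-load-form ((object ,type-name)
                                                 &optional environment)
                        (make-load-form-saving-slots
                         object :environment environment)))))

;; Hypothetical structures, for illustration only.
(defstruct team-sketch name)
(defstruct coach-sketch name)

;; One line covers every structure that needs to be dumpable.
(define-slot-saving-load-forms team-sketch coach-sketch)
```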
> --
> Madhu
Thank you!
r
I needed a tag to indicate what the file was, I'll just move it into
the filename.
> ;; It would have helped if you gave a fixed test case in addition to
> ;; random data
> (defun get-some-player (games-hash)
> (loop for x being each hash-key of (bs-players (gethash "II" games-hash))
> using (hash-value v)
> return v))
>
> * (daze-games)
If you run (daze-games) a few times, the player hash-table is then
full, since there isn't any hash-clearing going on. I needed the
random effects for other testing; the overall data structure design is
new to me.
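One way to keep the random effects and still get a repeatable test case is to replay a saved random state. A minimal sketch, with made-up names:

```lisp
(defvar *trial-state* (make-random-state t)
  "A saved random state; every trial starts from a copy of it.")

(defun random-picks (count limit)
  "Return COUNT random integers below LIMIT, replaying the saved state."
  ;; MAKE-RANDOM-STATE with a state argument returns a fresh copy, so the
  ;; saved state itself is never advanced.
  (let ((*random-state* (make-random-state *trial-state*)))
    (loop repeat count collect (random limit))))

;; Two calls replay the same sequence.
(equal (random-picks 5 100) (random-picks 5 100))
```

Binding *RANDOM-STATE* the same way around a call to (daze-games) should give the same player selection on every run within a session.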
> * (binstore:bindump *games-hash* "/tmp/dribble") ; dumps /tmp/dribble.x86f
> * (setq $pid (player-pid (get-some-player *games-hash*))) ;=> 9658
> * (gethash $pid (bs-players (gethash "II" *games-hash*)))
> ;;=> #S(PLAYER :PID 4519 :NAME "Smith" :PERFORMANCE 5), T
>
> ;; maybe in another lisp
>
> * (setq $reconstructed-hash (binstore:binload "/tmp/dribble"))
> ;; loads the fasl file
> * (gethash $pid (bs-players (gethash "II" $reconstructed-hash)))
> ;;=> #S(PLAYER :PID 4519 :NAME "Smith" :PERFORMANCE 5), T
> --
> Madhu
Thanks again for your help.
cl-store does exactly what you appear to be looking for, provided you
are not wed to dumping the file in fasl format.
* (cl-store:store *games-hash* "/tmp/dribble.ht")
#<HASH-TABLE :TEST EQUAL :COUNT 12 {1236B871}>
* (gethash 4519 (bs-players (gethash "I" (cl-store:restore "/tmp/game"))))
#S(PLAYER :PID 4519 :NAME "Smith" :PERFORMANCE 5)
T
....................................................................^
>
> #S(PLAYER :PID 4519 :NAME "Smith" :PERFORMANCE 5)
> T
I have two versions of the sample code, one for CL-STORE. The file
format isn't important, and I will be trying the CL-STORE solution
later when the XML parsing gets to me. The test code randomization is
part of the 'speed trials', so I can run both versions against the
clock (I want to know what the additional flexibility of CL-STORE
costs). CL-STORE looks to be an old and well-tested set of code, since
I found references to it dating back about a decade. I wouldn't
ignore that.
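For running both versions against the clock, something like this small helper would do. It is a sketch with a made-up name, using only standard timing functions; the workload in the example is a stand-in.

```lisp
(defun elapsed-seconds (thunk)
  "Call THUNK and return the wall-clock seconds it took, as a rational."
  (let ((start (get-internal-real-time)))
    (funcall thunk)
    (/ (- (get-internal-real-time) start)
       internal-time-units-per-second)))

;; Example: time a stand-in workload. In the real trial the thunk would
;; call whichever save or restore function is being measured.
(elapsed-seconds (lambda () (loop repeat 100000 sum 1)))
```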
Well, the CL-STORE solution is much better in storage.
-rw-r--r-- users 191536 Jan 24 10:06 dribble-ht.x86f
-rw-r--r-- users 5080 Jan 26 09:58 dribble.ht
Looking into the bindump version at 191k shows a lot of repetition; it
appears that every element is wrapped in a descriptive environment.
I'd expect that once a recovered hash-table is reloaded, any CL-STORE
or bindump differences would disappear. Since save/reload are supposed
to be one-shot operations, the space savings is the important factor.
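The size comparison can also be done from inside Lisp. A small sketch, with made-up function names, using the two sizes from the listing above:

```lisp
(defun file-size-in-bytes (path)
  "Return the length of the file at PATH in bytes."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (file-length in)))

(defun size-ratio (larger smaller)
  "Return how many times bigger LARGER is than SMALLER, as a float."
  (float (/ larger smaller)))

;; Using the two sizes from the directory listing above:
(size-ratio 191536 5080)   ; roughly 37.7
```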
This is certainly looking a lot better than the current SQL versions.
Thanks again to both of you for your help.
--
Lisp : a multi-fetish language