In any case, here's a function to read the indexes and values of a
sparse array into a hash table. Is there a more efficient way of going
about this?
(defun read-nonzero-lower-diagonal (fname)
  (print "Loading similarity matrix...")
  (let ((row-index 0)
        (column-index 0)
        (current-sim-value 0)
        (sim-table (make-hash-table :test #'equal)))
    (with-open-file (istream fname :direction :input)
      (loop
        (when (null (peek-char t istream nil nil)) (return))
        (setf row-index (read istream nil nil))
        (setf column-index (read istream nil nil))
        (setf current-sim-value (read istream nil nil))
        (setf (gethash (cons row-index column-index) sim-table)
              current-sim-value)))
    (print "Done loading similarity matrix")
    sim-table))
This next one is supposed to read each line from a text file and return
it as a list of symbols (not as a string). It seems almost unbelievably
convoluted to me. Is there a better way?
(defun read-collection (filename)
  (with-open-file (istream filename :direction :input)
    (let ((collection (make-array 0 :adjustable t :fill-pointer t))
          (current-clause '()))
      (loop
        (cond ((null (peek-char nil istream nil nil)) (return))
              ;;when next char is end of file return
              ((char= (peek-char nil istream) #\newline)
               ;;when next char is newline
               (read-char istream)
               ;;pluck char off of istream
               (vector-push-extend current-clause collection)
               ;;put the current clause on the end of the collection
               (setf current-clause '()))
               ;;clear the contents of current clause
              (t (loop
                   (if (or (null (peek-char nil istream nil nil))
                           ;;if next char is end of file
                           (char= (peek-char nil istream) #\newline))
                           ;;or newline
                       (return)
                       ;;return
                       ;;else push the next word onto the end of the current clause
                       (push-end (read-preserving-whitespace istream)
                                 current-clause))))))
      collection)))
The previous function uses a macro called push-end that a friend of mine
wrote. It's mighty convenient.
(defmacro push-end (object list-variable)
  `(setf ,list-variable
         (cond ((consp ,list-variable)
                (nconc ,list-variable (list ,object)))
               ((null ,list-variable) (list ,object))
               (t (format t "ERROR -- PUSH-END was given invalid arguments")))))
This should make some difference on the first one:
(defun read-nonzero-lower-diagonal (fname)
  (print "Loading similarity matrix...")
  (let ((sim-table (make-hash-table :test #'equal)))
    (with-open-file (istream fname :direction :input)
      (loop
        (when (null (peek-char t istream nil nil)) (return))
        (setf (gethash (cons (read istream nil nil) (read istream nil nil))
                       sim-table)
              (read istream nil nil))))
    (print "Done loading similarity matrix")
    sim-table))
On the second one, I would probably use read-line to get each line as a
string and then call read-from-string repeatedly to pull the symbols out of it.
I'm not sure that would be faster, but it would be less convoluted.
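A minimal sketch of that read-line approach, assuming every line holds only
tokens the Lisp reader accepts. The names read-line-as-list and
read-collection-by-line are made up for illustration, and binding *read-eval*
to nil is an added precaution that is not in the original function.

```lisp
;; Hypothetical helper: turn one line of text into a list of Lisp objects.
(defun read-line-as-list (line)
  "Return the objects the reader finds in the string LINE, in order."
  (let ((*read-eval* nil)   ; refuse #. forms coming from a data file
        (objects '())
        (start 0))
    (loop
      (multiple-value-bind (object next)
          ;; LINE itself serves as the end-of-string marker: the reader
          ;; always builds fresh objects, so it can never return LINE.
          (read-from-string line nil line :start start)
        (when (eq object line)
          (return (nreverse objects)))
        (push object objects)
        (setf start next)))))

;; Hypothetical replacement for read-collection: one list per line.
(defun read-collection-by-line (filename)
  "Return an adjustable vector holding one list of symbols per line."
  (with-open-file (istream filename :direction :input)
    (let ((collection (make-array 0 :adjustable t :fill-pointer t)))
      (loop for line = (read-line istream nil nil)
            while line
            do (vector-push-extend (read-line-as-list line) collection))
      collection)))
```

Collecting with push and a final nreverse avoids walking the list on every
insertion the way push-end does. Unlike the original, this version also keeps
a final line that has no trailing newline.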
It has seemed to me that Lisp's file reading functions are slow. This
probably depends on the Lisp implementation. If speed is important, you
might try another implementation.
--
Benjamin Shults Email: bsh...@math.utexas.edu
Department of Mathematics Phone: (512) 471-7711 ext. 208
University of Texas at Austin WWW: http://www.ma.utexas.edu/users/bshults
Austin, TX 78712 USA FAX: (512) 471-9038 (attn: Benjamin Shults)
| In the course of my information retrieval experiments, I often need to
| read large data files in from disk. I've got a couple of functions
| that do this successfully, but they seem awful damn slow to me. They
| also seem really convoluted for what should be a more straightforward
| task.
(this is not an answer to your question, but it could become one.)
the ANSI CL function `read-sequence' (`write-sequence') will replace
successive objects in a sequence (file) with objects in the file
(sequence). the efficiency gains hinge on equality of the element-type of
the stream and the sequence. `open' is not able to portably open a file
with anything but element-types that are subtypes of character, finite
subtypes of integer, signed-byte or unsigned-byte, barring an unusual
interpretation of :default.
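For comparison, a minimal sketch of the portable case, using an element-type
that `open' is required to accept. The name read-octets is made up for
illustration, and the sketch assumes the file fits in memory.

```lisp
;; Hypothetical helper: slurp a file of octets with one READ-SEQUENCE call.
(defun read-octets (filespec)
  "Return the contents of FILESPEC as a vector of (unsigned-byte 8)."
  (with-open-file (stream filespec :element-type '(unsigned-byte 8))
    ;; The buffer has the same element-type as the stream, which is the
    ;; condition the efficiency argument above depends on.
    (let* ((buffer (make-array (file-length stream)
                               :element-type '(unsigned-byte 8)))
           (end (read-sequence buffer stream)))
      ;; READ-SEQUENCE returns the index of the first element it did not
      ;; update, so trim in case fewer elements arrived than expected.
      (subseq buffer 0 end))))
```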
would it be a non-conforming extension to pass floating-point types as the
element-type to `open' so one could do the following?
(let* ((type 'double-float)
       (vector (make-array <dimensions>
                           :element-type type
                           :initial-element (coerce 0 type))))
  (with-open-file (stream <filespec>
                   :element-type type)
    (read-sequence vector stream))
  vector)
IMHO, if this is a non-conforming extension, we have a problem. if it is a
conforming extension, vendors should be encouraged to supply ways to allow
files to handle user-specified extended element-types. it would perhaps
make sense to achieve consensus on the interface to such extensions.
(should this be posted to comp.std.lisp? or is the group defunct?)
#<Erik>
--
errare umanum hest