On Apr 19, 1996 17:40:14 in article <How to read lots of data from disk
quickly?>, 'Bruce L. Lambert, Ph.D. <lambe...@uic.edu>' wrote:
The answer depends on how much control you have over matters,
and how hard you are willing to work to speed things up.
For example, a significant amount of time is spent
extracting numerical data from ASCII character strings. If
you could store the numbers in binary form, much time could
be saved, both by skipping the conversion and by avoiding
the inherent inefficiencies of general-purpose routines.
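To make the binary suggestion concrete, here is a minimal sketch. It assumes the data are (row, column, value) triples of non-negative integers that each fit in 32 bits. The function names and the file layout are made up for illustration. Byte order and the supported element types are implementation-dependent, so the file should be read by the same Lisp that wrote it.

```lisp
(defun write-triples (filename triples)
  ;; TRIPLES is a list of (row col val) lists of integers in [0, 2^32).
  (with-open-file (out filename
                       :direction :output
                       :element-type '(unsigned-byte 32)
                       :if-exists :supersede)
    (dolist (triple triples)
      (dolist (n triple)
        (write-byte n out)))))

(defun read-triples (filename)
  ;; Returns the triples in file order; no text-to-number
  ;; conversion takes place.
  (with-open-file (in filename :element-type '(unsigned-byte 32))
    (loop for row = (read-byte in nil nil)
          while row
          collect (list row (read-byte in) (read-byte in)))))
```

Whether this beats text input by a useful margin depends on the implementation, so it is worth timing on a real file first.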
If external constraints dictate text-mode data, could you
at least write it in a formatted, i.e., column-sensitive, form?
If so, you could save some time by using READ-LINE and your
own custom integer extractors that convert text into numeric
from specified columns. I have written some that I could share.
An example of using one such would be:
(loop as data-line = (read-line infile nil nil)
      while data-line
      do (let ((row (extract-integer data-line 0 6))    ;; cols 1-6
               (col (extract-integer data-line 7 13))   ;; cols 8-13
               (val (extract-integer data-line 14 20))) ;; cols 15-20
           ;; cons keys need a table made with :test #'equal
           (setf (gethash (cons row col) sim-table) val)))
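The extractor itself is not shown in this post. As a rough idea of what one could look like, here is a hypothetical stand-in. It is not the author's version; the name and argument order are simply copied from the call above, with START inclusive and END exclusive.

```lisp
(defun extract-integer (line start end)
  ;; Hypothetical stand-in, not the author's code. Accumulates the
  ;; decimal digits found in LINE between START (inclusive) and END
  ;; (exclusive). A minus sign seen before the first digit negates
  ;; the result; every other character is skipped without checking.
  ;; Returns NIL if the field holds no digits.
  (let ((value 0)
        (sign 1)
        (seen-digit nil))
    (loop for i from start below (min end (length line))
          do (let* ((ch (char line i))
                    (digit (digit-char-p ch)))
               (cond (digit
                      (setf value (+ (* value 10) digit)
                            seen-digit t))
                     ((and (char= ch #\-) (not seen-digit))
                      (setf sign -1)))))
    (and seen-digit (* sign value))))
```

The standard PARSE-INTEGER also accepts :START and :END and ignores surrounding whitespace, so (parse-integer data-line :start 0 :end 6) is worth timing against any hand-rolled version.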
If you have no control over any of this, there's not much
you can do. (Somebody else's suggestion of eliminating
the intermediate variables will not make a noticeable
difference). You could easily eliminate the PEEK-CHAR call
but its effect is implementation-dependent and probably
will not make a difference worth beans.
- - - - -
Now here, a bunch of improvements can be obtained, and
the 'convolution' can definitely be diminished.
First, if you have some idea of how big the array is
going to be, make it that size and save the overhead of
multiple GROW-ARRAY calls that are being made 'behind the
scenes'. If the range of values is great, waste some
space and initialize it to some reasonable size; e.g.,
1000. Alternatively, and this is my favorite technique,
gather a list first and then make an array out of
it -- see below for an example.
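In isolation, the two options might be sketched as follows. The function names and the default size of 1000 are only illustrative.

```lisp
;; Option 1: preallocate with a fill pointer. The vector only has
;; to grow if more than INITIAL-SIZE items arrive.
(defun collect-into-vector (items &optional (initial-size 1000))
  (let ((vec (make-array initial-size :adjustable t :fill-pointer 0)))
    (dolist (item items vec)
      (vector-push-extend item vec))))

;; Option 2: gather a list first, then build an exactly sized array.
(defun list-to-array (items)
  (make-array (length items) :initial-contents items))
```

Option 1 leaves unused slots behind the fill pointer; option 2 conses a temporary list but ends up with an array of exactly the right length.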
Second, the PEEK-CHAR issue surfaces again. This
time, however, it's much more significant in that it
potentially gets called many, many times. The best (?
best is in the eyes of the beholder) method is to read
the entire line and process it in a more efficient manner.
This eliminates all of the peeking.
Third, although PUSH-END works, it's potentially
somewhat inefficient. Each time you push an item to
the end of the list, you must traverse the list. If
your lists are lengthy, this adds up to a fair amount of
time. Better to build the list in reverse order, then
reverse it once just before storing it into the array.
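A small sketch of the difference. PUSH-END is not defined in this post, so the appending version below only stands in for any operation that has to walk to the end of the list; both function names are made up.

```lisp
;; Adding at the tail: APPEND copies the accumulated list on every
;; step, so the total work grows with the square of the length.
(defun collect-by-appending (items)
  (let ((result nil))
    (dolist (item items result)
      (setf result (append result (list item))))))

;; Pushing on the front and reversing once at the end: each PUSH
;; takes constant time, and NREVERSE is a single pass.
(defun collect-by-pushing (items)
  (let ((result nil))
    (dolist (item items)
      (push item result))
    (nreverse result)))
```

Both return the items in their original order; only the cost of getting there differs.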
Here's how I would do it. The idea, as partially
outlined above, is to read a line at a time, creating
the list from the text line, then collecting a list
of lists. When done, build an array from the list.
It wastes a bit of cons space, but is potentially
a lot faster. Further improvements are also
possible, but at diminishing returns for the effort.
For example, I have a special read-line-into-reusable-
string that is noticeably faster, but only with large
(defun build-array (&optional (filename "e:\\data\\foo.bar"))
  (let ((list-of-lists nil)
        (len 0)
        (arr nil))
    (with-open-file (f filename)
      (loop as line = (read-line f nil nil)
            while line
            do (push (make-wordlist line) list-of-lists)
               (incf len)))
    ;; you could eliminate the len var and just
    ;; take the length of the list instead.
    (setf arr (make-array len))
    ;; the list is in reverse file order, so fill from the end
    (loop for wlist in list-of-lists
          for i downfrom (- len 1)
          do (setf (aref arr i) wlist))
    arr)) ;; return the array to caller
(defun make-wordlist (text)
  (loop with pos = 0
        and list = nil
        do (multiple-value-bind (word npos)
               (read-from-string text nil nil :start pos)
             ;; NIL means end of line (a literal NIL token also stops here)
             (when (null word)
               (return-from make-wordlist (nreverse list)))
             (push word list)
             (setf pos npos))))
Note: Tested on Allegro CL 3.0 on WinNT 3.51.
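One variation, not part of the tested code above: READ-FROM-STRING runs each word through the Lisp reader, which interns symbols and makes a literal NIL token look like end of line. If the words should simply stay strings, a hand-rolled splitter is one alternative. This is a sketch only; the name SPLIT-WORDS and the choice of space and tab as separators are assumptions.

```lisp
(defun split-words (text)
  ;; Hypothetical alternative to MAKE-WORDLIST: returns the
  ;; blank-separated fields of TEXT as fresh strings, in order.
  (let ((words nil)
        (start nil))                 ; index where the current word began
    (dotimes (i (length text))
      (if (member (char text i) '(#\Space #\Tab))
          (when start
            (push (subseq text start i) words)
            (setf start nil))
          (unless start
            (setf start i))))
    (when start                      ; a word running up to end of line
      (push (subseq text start) words))
    (nreverse words)))
```

It drops into BUILD-ARRAY in place of MAKE-WORDLIST, at the price of getting strings back rather than symbols and numbers.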
Software Engineering & development