
Good I/O performance


Tim Bradshaw

Apr 15, 1998

Does anyone have a comparison of I/O performance for any reasonably
recent CL implementations? (Schemes would also be interesting).

This is largely to save me work, I admit, as I could write some test
programs.

One of the things I've found is that for many programs the fact that
good CL implementations are perhaps a bit slower than good C
implementations is dwarfed by the fact that the I/O in the Lisps is
spectacularly slow compared with C, or in fact something like perl.
If you want to write programs which deal with large amounts (Gb) of
data, this can be absolutely crippling.

The kind of problem I have is that we have some scheme (scsh) scripts
which run our backups. These write log files which are simply files
full of forms saying what was done when. I have another program (also
scheme) which will go through these files and tell me useful things
like when the last level x dump of partition y on machine z was, or
what the label of the last zero dump was. This program takes many
minutes to run, which is really embarrassing, as some perl script
reading a line-based format would do it in seconds. Of course,
this is scsh, not a CL, and we're actually going to stop using these
scripts anyway, but this kind of problem is really symptomatic of the
kind of things that have caused me problems in the past.

I guess there are three areas which are interesting (to me):

How slow is READ, is there any trick to make it faster (snarf
the file into a string, read from the string?). Similar for PRINT.

How slow is READ-LINE, similar for line-based printing.

How slow is READ-SEQUENCE/WRITE-SEQUENCE, especially when
intermingled with jumping around the file.

(OK four areas)

Does the implementation provide any additional I/O stuff --
for instance a version of READ-LINE which would stuff the
results into an existing string (indicating overflow suitably)
might be useful (though it is implementable of course). What
about mmap?
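(For the first area, the snarf-into-a-string trick looks something like
the sketch below -- my own code, not tested against any particular
implementation's performance, and the function name READ-ALL-FORMS is
just something I made up:)

```lisp
;; Sketch: read every form in a file by first slurping the whole file
;; into one string, then READing from the string.  This trades one big
;; allocation for many small per-character stream calls.
(defun read-all-forms (pathname)
  (with-open-file (in pathname :direction :input)
    (let* ((buffer (make-string (file-length in)))
           ;; READ-SEQUENCE may return less than FILE-LENGTH
           ;; (e.g. CRLF translation), so remember the real end.
           (end (read-sequence buffer in)))
      (with-input-from-string (s buffer :end end)
        (loop for form = (read s nil s)   ; use the string itself as EOF marker
              until (eq form s)
              collect form)))))
```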

If anyone has any figures I'd be very interested...

--tim

Rainer Joswig

Apr 15, 1998

In article <ey3yax7...@haystack.aiai.ed.ac.uk>, Tim Bradshaw
<t...@aiai.ed.ac.uk> wrote:

> I guess there are three areas which are interesting (to me):
>
> How slow is READ, is there any trick to make it faster (snarf
> the file into a string, read from the string?). Similar for PRINT.
>
> How slow is READ-LINE, similar for line-based printing.
>
> How slow is READ-SEQUENCE/WRITE-SEQUENCE, especially when
> intermingled with jumping around the file.
>
> (OK four areas)
>
> Does the implementation provide any additional I/O stuff --
> for instance a version of READ-LINE which would stuff the
> results into an existing string (indicating overflow suitably)
> might be useful (though it is implementable of course). What
> about mmap?
>
> If anyone has any figures I'd be very interested...

See what CL-HTTP does. There might be some code you can steal
(like buffered IO, tuned for platforms, ...).

--
http://www.lavielle.com/~joswig/


Erik Naggum

Apr 15, 1998

* Tim Bradshaw
| What about mmap?

yeah, I _so_ want to map a string onto a file. what I do now is this:

(defun map-file-to-string (pathname)
  "Create a string that contains all the characters of a file."
  ;; this should have used a memory mapping function
  (with-open-file (file pathname :direction :input)
    (let ((string (make-array (file-length file)
                              :element-type (stream-element-type file)
                              #+allegro :allocation #+allegro :old)))
      (if (= (length string) (read-sequence string file))
          string
          (error 'file-error
                 :pathname pathname
                 :format-control "~@<~S could not be mapped to a string.~:@>"
                 :format-arguments (list pathname))))))

the conditionalization on Allegro CL allocates the string in such a way
that it is not copied by the copying garbage collector, and that little
optimization sped up my application quite noticeably. it would be so
cool if there was a subtype of string that was really a file on disk
through the available operating system mechanisms. this _may_ be
obtained with various smart ways to return strings from some foreign
function to Lisp, but I'm not that brave, yet. in my dreams, one could,
say, open the file the normal way, and then a function FILE-CONTENTS
would return a (vector (unsigned-byte 8)) or a string that would be
mapped onto the file.
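(Until such a FILE-CONTENTS exists, the string returned by a function
like MAP-FILE-TO-STRING can at least be walked without further stream
overhead via READ-FROM-STRING's start/end interface -- a sketch of my
own, not Erik's code, with a name I invented:)

```lisp
;; Sketch: extract every form from a string that holds a whole file,
;; using READ-FROM-STRING's second return value (the next position)
;; to step through it without building any stream.
(defun read-forms-from-string (string)
  (let ((position 0)
        (forms '()))
    (loop
      (multiple-value-bind (form next)
          (read-from-string string nil string :start position)
        (when (eq form string)          ; the string itself marks EOF
          (return (nreverse forms)))
        (push form forms)
        (setf position next)))))
```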

#:Erik
--
religious cult update in light of new scientific discoveries:
"when we cannot go to the comet, the comet must come to us."

Bruce Tobin

Apr 17, 1998, to Tim Bradshaw

Tim Bradshaw wrote:

> One of the things I've found is that for many programs the fact that
> good CL implementations are perhaps a bit slower than good C
> implementations is dwarfed by the fact that the I/O in the Lisps is
> spectacularly slow compared with C, or in fact something like perl.
> If you want to write programs which deal with large amounts (Gb) of
> data, this can be absolutely crippling.

There is a text file on how to do fast Lisp I/O in the CMU repository:


http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/io/fast_io/fast_io.txt

The file is very short; a copy of an old post to this newsgroup. It includes
code that looks like it should work with ACL, CMU, Lucid, and Symbolics Lisps.
The author claims to be within a few percentage points of C programs using
getchar().

Does anyone know why the standard Lisp I/O functions tend to be so slow? Do
they need to be? Lisp would be a wonderful language for writing data
transformation programs if it weren't for this I/O problem.

re...@ai.mit.edu

Apr 17, 1998

In article <3536B704...@infinet.com>,
Bruce Tobin <bto...@infinet.com> wrote:

> Does anyone know why the standard Lisp I/O functions tend to be so slow? Do
> they need to be? Lisp would be a wonderful language for writing data
> transformation programs if it weren't for this I/O problem.

For per-byte or per-character I/O, the answer is usually that there are more
layers of abstraction in the implementation, i.e. you are getting a character
from a stream rather than just a buffer of 8-bit bytes. The latter can be
open-coded routinely, but the former requires function calls, possibly
multiple levels of them and possibly with generic dispatches thrown in, so
the overhead per byte/character goes way up. Apple Dylan addressed
this by sealing the stream-related classes so that even the generic
case could be open-coded, and got C-equivalent performance.
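(To make the layering concrete, here is a hedged sketch of the two
styles in portable CL -- my own illustration, not anyone's benchmark.
The per-character version pays a full stream call per character; the
buffered version makes one READ-SEQUENCE call per 8 KB and runs its
inner loop over a plain vector:)

```lisp
;; Per-character: one stream call (possibly through several layers of
;; abstraction, possibly a generic dispatch) for every character.
(defun count-chars-slow (pathname)
  (with-open-file (in pathname)
    (loop for c = (read-char in nil)
          while c
          count c)))

;; Buffered: one READ-SEQUENCE call per chunk; the per-character work
;; is just indexing into a simple string.
(defun count-chars-buffered (pathname)
  (with-open-file (in pathname)
    (let ((buffer (make-string 8192)))
      (loop for n = (read-sequence buffer in)
            until (zerop n)
            sum n))))
```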

For I/O performed in bigger chunks, the answer is usually that the data needs
to be copied (often more than once). Symbolics's lisp has an in-place I/O
call (:read-input-buffer, used in the CMU example file), which returns the
operating system's buffer along with start and limit indices, so that user
code could operate directly on it without copying, but Common Lisp didn't
adopt anything like it. I do remember implementing a buffered read under MCL
using the MacOS calls directly and a non-lisp buffer, and it was as fast as
I/O in C as well.

Kalman Reti


-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/ Now offering spam-free web-based newsreading

Rainer Joswig

Apr 17, 1998

In article <6h71rk$pej$1...@nnrp1.dejanews.com>, re...@ai.mit.edu wrote:

> > Does anyone know why the standard Lisp I/O functions tend to be so slow? Do
> > they need to be? Lisp would be a wonderful language for writing data
> > transformation programs if it weren't for this I/O problem.
>
> For per-byte or per-character I/O, the answer is usually that there are more
> layers of abstraction in the implementation, i.e. you are getting a character
> from a stream rather than just a buffer of 8-bit bytes. The latter can be
> open-coded routinely, but the former requires function calls, possibly
> multiple levels of them and possibly with generic dispatches thrown in, so
> the overhead per byte/character goes way up.

For this case, MCL provides a way to get at a stream's STREAM-READER
and STREAM-WRITER functions.

> For I/O performed in bigger chunks, the answer is usually that the data needs
> to be copied (often more than once). Symbolics's lisp has an in-place I/O
> call (:read-input-buffer, used in the CMU example file), which returns the
> operating system's buffer along with start and limit indices, so that user
> code could operate directly on it without copying, but Common Lisp didn't
> adopt anything like it. I do remember implementing a buffered read under MCL
> using the MacOS calls directly and a non-lisp buffer, and it was as fast as
> I/O in C as well.

Well, one would first need to agree on a CLOS-based streams
implementation, and then on a common way to do fast I/O within
this model.

Improving the out-of-the-box I/O performance of plain Common Lisp
is needed (IMHO).

--
http://www.lavielle.com/~joswig/


Erik Naggum

Apr 17, 1998

* Kalman Reti

| For I/O performed in bigger chunks, the answer is usually that the data
| needs to be copied (often more than once). Symbolics's lisp has an
| in-place I/O call (:read-input-buffer, used in the CMU example file),
| which returns the operating system's buffer along with start and limit
| indices, so that user code could operate directly on it without copying,
| but Common Lisp didn't adopt anything like it.

what about READ-SEQUENCE and WRITE-SEQUENCE?

I have found READ-SEQUENCE and WRITE-SEQUENCE to give reasonably good
performance, and they can indeed overwrite existing buffers. however, I
did notice a factor of four decrease in system CPU time when I tuned the
internal buffer size to equal a disk block. in Allegro CL:

(setf excl::stream-buffer-size 8192)

this also helped a little with the speed of ordinary I/O, but it was
obviously drowned by more expensive operations.

also important: in Allegro CL, one can allocate large buffers in "old
space" and avoid the copying garbage collector overhead, like I did in
the just posted MAP-FILE-TO-STRING, so the GC overhead is eliminated.
this makes READ-SEQUENCE an even better choice.
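(A minimal sketch of the kind of READ-SEQUENCE loop this buffer tuning
helps -- my own portable code, not Erik's, copying a file through one
reused disk-block-sized buffer:)

```lisp
;; Sketch: copy a file through a single reused byte buffer, one
;; READ-SEQUENCE/WRITE-SEQUENCE pair per chunk.  The buffer is
;; allocated once, so GC pressure stays constant regardless of
;; file size.
(defun copy-file-buffered (from to &key (buffer-size 8192))
  (with-open-file (in from :element-type '(unsigned-byte 8))
    (with-open-file (out to :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
      (let ((buffer (make-array buffer-size
                                :element-type '(unsigned-byte 8))))
        (loop for n = (read-sequence buffer in)
              until (zerop n)
              do (write-sequence buffer out :end n))))))
```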

#:Erik
--
on the Net, _somebody_ always knows you're a dog.

Steve Gonedes

Apr 18, 1998

Sam Steingold <s...@usa.net> writes:


< >>>> In a very interesting message <31017936...@naggum.no>
< >>>> Sent on 17 Apr 1998 09:21:34 +0000
< >>>> Honorable Erik Naggum <cle...@naggum.no> writes
< >>>> on the subject of "Re: Good I/O performance":


< >>
< >> what about READ-SEQUENCE and WRITE-SEQUENCE?
<

< yeah, what about them?
< Suppose I have to save and restore large lists.
< Currently I just WRITE them into files and then READ them back.
< Will I win/lose anything by switching to READ-SEQUENCE/WRITE-SEQUENCE?

I don't know about large lists; either way you look at it, you're still
dealing with large lists. I usually just use pprint and read for that.
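(For the plain save-and-restore round trip Sam describes, the code is
short -- a sketch of mine; WITH-STANDARD-IO-SYNTAX pins the printer and
reader variables so the output is guaranteed to be READable even if
something has rebound *PRINT-LEVEL* or friends:)

```lisp
;; Sketch: save a list readably and restore it later.  The standard
;; I/O syntax guards against printer settings that would truncate or
;; abbreviate the output.
(defun save-list (list pathname)
  (with-open-file (out pathname :direction :output :if-exists :supersede)
    (with-standard-io-syntax
      (print list out))))

(defun restore-list (pathname)
  (with-open-file (in pathname)
    (with-standard-io-syntax
      (read in))))
```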

A while ago I wanted to save some structs (that contained
#<hashtables>) to a file. I came up with a really gross hack which I
hardly understand, but seems to work (not positive it's correct
though).

Here is a quick example of something close to what I did.

(defstruct data
  description   ; string - a description
  stuff)        ; hash of some useless stuff

(defmethod make-load-form ((s data) &optional environment)
  (declare (ignore environment))
  ;; Setup a method to load DATA structs?
  ;; This seems like it could be more complicated than
  ;; define-setf-method, but I really don't know anything about
  ;; CLOS so I can't really say for sure.  I only seem to get a
  ;; headache from reading up on it.
  (make-load-form-saving-slots s))

(defun fill-hash (hash)
  "Fill hashtable HASH with useless data returning HASH"
  (loop for i from 1 upto 100
        do (setf (gethash (format nil "~R" i) hash) i))
  hash)

(let ((*stuff* (make-data
                :description "Useless stuff"
                :stuff (fill-hash (make-hash-table :test #'equal)))))
  ;; from what I can tell *stuff* must be special
  ;; because the compiler will not evaluate it? even with a #.?
  ;; It seems like the compiler tosses our lexical little world,
  ;; how else to tell her *stuff* is here?
  (declare (special *stuff*))
  (with-open-file
      (output "out.data" :direction :output :if-exists :supersede)
    ;; save a parameter - ensure no compiler/package warnings
    ;; This is really gross, now that I'm thinking of it
    (format output "(in-package #.(package-name *package*))~%~
                    (defparameter *table* #.*stuff*)"))
  ;; save the current value of *stuff*
  (compile-file "out.data"))

I am almost certain there is a better way to do this, I just haven't
needed to do anything like this very often. Definitely a little bit
different than dumping structs in C.


But if you're using lists, this kludge probably isn't necessary...

Erik Naggum

Apr 18, 1998

* Steve Gonedes

| I don't know about large lists; either way you look at it, you're still
| dealing with large lists. I usually just use pprint and read for that.

I don't know how wide-spread such functions are, but Allegro CL gives
access to the FASL reader and writer, as well. the exported symbols from
the EXCL package are FASL-OPEN, FASL-READ and FASL-WRITE. I have not
used them much, but they can save a bunch of otherwise hard-to-save
structures, including hash tables, numbers as machine readable objects,
etc, and read them back in again a lot faster than anything else you can
write yourself. don't have any speed statistics, though.

Rob Warnock

Apr 18, 1998

Erik Naggum <cle...@naggum.no> wrote:
+---------------
| ... it would be so

| cool if there was a subtype of string that was really a file on disk
| through the available operating system mechanisms. this _may_ be
| obtained with various smart ways to return strings from some foreign
| function to Lisp, but I'm not that brave, yet. in my dreams, one could,
| say, open the file the normal way, and then a function FILE-CONTENTS
| would return a (vector (unsigned-byte 8)) or a string that would be
| mapped onto the file.
+---------------

What an amazing coincidence! I was just mulling over similar ideas today as
possible solutions to some abysmal performance problems in a Scheme-based
mail filter program I'm hacking on. Clearly the "obvious" approach is to
mmap the file (in Unix, at least) and to create [unfortunately, by hacking
into the internals of the implementation] an "indirect" flavor of string
that when garbage collected would un-mmap the file.

And then once you have such "indirect" strings, it should be nearly trivial
to then add shared-structure substrings, yes? (...provided that the base
string is read-only, to prevent side-effects.)
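(Standard CL can already express the shared-structure part with
displaced arrays, though without the read-only protection Rob wants --
a sketch of mine, with an invented function name:)

```lisp
;; Sketch: a zero-copy "substring" as an array displaced into the base
;; string.  Mutating the base shows through the displaced substring,
;; which is exactly why the base would need to be read-only.
(defun shared-substring (string start end)
  (make-array (- end start)
              :element-type (array-element-type string)
              :displaced-to string
              :displaced-index-offset start))
```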


-Rob

-----
Rob Warnock, 7L-551 rp...@sgi.com http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673 [New area code!]
2011 N. Shoreline Blvd. FAX: 650-933-4392
Mountain View, CA 94043 PP-ASEL-IA

Kelly Murray

Apr 20, 1998

> The kind of problem I have is that we have some scheme (scsh) scripts
> which run our backups. These write log files which are simply files
> full of forms saying what was done when. I have another program (also

Another case calling for Persistent Objects.
The byte-level or Lisp READ I/O speed issue simply disappears
into a memory of the "good old days"...

-Kelly Murray k...@franz.com
http://charlotte.franz.com/silkos -- Lisp for the 00's
