consider reading the whole file into a string with READ-SEQUENCE, then
extract strings from it with SUBSEQ or with displaced arrays if they are
longish and would only create garbage.
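
A minimal sketch of that approach, assuming the file fits comfortably in memory. The names FILE-TO-STRING, TOKEN-COPY and TOKEN-VIEW are made up for illustration and are not part of any library. FILE-LENGTH is used only as an upper bound on the number of characters, since with a multi-byte external format the file may decode to fewer characters than it has bytes.

```lisp
(defun file-to-string (pathname)
  "Read the whole file at PATHNAME into one string (hypothetical helper)."
  (with-open-file (in pathname :direction :input)
    (let* ((buffer (make-string (file-length in)))
           ;; READ-SEQUENCE returns the index of the first element
           ;; it did not fill, i.e. the number of characters read.
           (end (read-sequence buffer in)))
      (subseq buffer 0 end))))

(defun token-copy (text start end)
  "Return a fresh string holding the characters of TEXT from START below END."
  (subseq text start end))

(defun token-view (text start end)
  "Return a string displaced onto TEXT from START below END.
No characters are copied; the result shares storage with TEXT."
  (make-array (- end start)
              :element-type (array-element-type text)
              :displaced-to text
              :displaced-index-offset start))
```

TOKEN-VIEW avoids the copy, but the displaced string keeps the whole file string alive and will reflect any later modification of it, so it is probably only worth it for the longish extracts.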
| to collect the characters into a string current-token.
well, one option is to collect the characters into a list, and finally
(coerce <list> 'string), but that may seem wasteful to many.
another option is to create an adjustable string with a fill pointer (use
MAKE-ARRAY) and use VECTOR-PUSH-EXTEND to deposit characters into the
string as you read them. for extra optimization, you can reuse the
buffer and reset the fill pointer after you have copied out and returned
the string you're interested in.
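
A rough sketch of the fill-pointer variant, assuming tokens are delimited by whitespace. *TOKEN-BUFFER*, WHITESPACE-P and READ-TOKEN are hypothetical names chosen for this example, and the whitespace set is an assumption.

```lisp
(defvar *token-buffer*
  (make-array 64 :element-type 'character :adjustable t :fill-pointer 0)
  "Reusable buffer; the fill pointer marks the end of the current token.")

(defun whitespace-p (char)
  "True if CHAR is one of the delimiters assumed for this sketch."
  (member char '(#\Space #\Tab #\Newline #\Return)))

(defun read-token (stream)
  "Return the next whitespace-delimited token from STREAM as a fresh
string, or NIL if the stream ends before any token character is seen."
  ;; Reset the buffer instead of allocating a new one.
  (setf (fill-pointer *token-buffer*) 0)
  (loop for char = (read-char stream nil nil)
        while char
        do (cond ((not (whitespace-p char))
                  ;; Grows the buffer automatically when it is full.
                  (vector-push-extend char *token-buffer*))
                 ((plusp (fill-pointer *token-buffer*))
                  ;; Whitespace after at least one character ends the token.
                  (return))))
  ;; Leading whitespace falls through both clauses and is skipped.
  (when (plusp (fill-pointer *token-buffer*))
    ;; COPY-SEQ copies only the active part, up to the fill pointer.
    (copy-seq *token-buffer*)))
```

The only allocation per token is the final COPY-SEQ; the buffer itself grows at most a few times over the life of the program. Sharing one global buffer like this is not safe across threads or reentrant calls.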
incidentally, I think a buffering protocol would be very useful in Common
Lisp streams, such that one could put a "mark" in a buffer and extract
the string from the mark to the current read point. this would have
saved me a lot of hassle in copying strings from input files, considering
that a significant cost of some file read operations is in the copying
of the characters, and optimizing for that cost can lead to weird code.
#:Erik
I'd also like to see such a thing. From my point of view, it would be
an extension to "CLOS streams", and also provide a MOP hook into "the
reader algorithm" (i.e. the code that all Lisp implementations have
that implements the reader algorithm, and is called by such functions as
READ).
In my view, one measure of the quality of the design of such a protocol
would be that the protocol itself would not assume extended characters
(i.e. could be implemented efficiently in base-char-only Lisps), but
could be easily extended by the implementation or users to support BOTH
one-to-one Lisp <-> external-format mappings such as extended-char <->
UCS-2/UCS-4 AND one-to-many mappings such as extended-char <->
multi-byte/UTF-8. (The latter probably incorporating some
double-buffering protocol.)
Any ideas?