string-trim creates a copy of the string, as does subseq. So, you generate
at least 92 bytes for each line in copies of the string, which should be
about 7MB.
Then there are the princ/prin1 calls, which are going to generate some amount
of garbage that is implementation-specific. I suspect this is
the source of more garbage than the string copies. You might try using
write-char, which I think is the most efficient portable method of writing,
and from looking at your code it should be relatively easy to do.
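
As a rough sketch of the write-char suggestion (the helper name
write-substring-chars is made up for illustration, not part of the
original code), a field can be written straight from the line without
creating a substring first:

```lisp
;; Hypothetical helper: writes the characters of STRING between START
;; (inclusive) and END (exclusive) to STREAM one at a time, so no
;; intermediate substring is allocated.
(defun write-substring-chars (string stream &key (start 0) (end (length string)))
  (loop for i from start below end
        do (write-char (char string i) stream))
  nil)

;; Usage: write the first five characters of a line to a string stream.
(with-output-to-string (out)
  (write-substring-chars "hello world" out :start 0 :end 5))
```

Whether this actually conses less than princ/prin1 depends on the
implementation, so it is worth measuring.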
--
Christopher J. Vogt - Computer Consultant - Lisp, AI, Graphics, etc.
http://members.home.com/vogt/
He would need to tune the GC of CMU CL to some sane settings,
I guess. This would make GC not so painful.
How about the generational GC of a new CMU CL. Does it work
already? On which platforms?
Rainer> How about the generational GC of a new CMU CL. Does it work
Rainer> already? On which platforms?
I believe the generational GC is the default for Linux and FreeBSD for
the last few "releases". I don't think it's in the 18a release.
Generational GC exists only for the x86 platform. I'd like to try to
get it working on Solaris, but not yet.
Ray
> How about the generational GC of a new CMU CL. Does it work
> already? On which platforms?
In the cmucl-imp mailing list today, it was stated that the gengc is
available only on the x86 platforms and it has been the default for a year or
so.
Mike McDonald
mik...@mikemac.com
As outlined in an earlier response, subseq creates a new sequence.
You could perhaps use a fixed-size "buffer" and "fill" it from line using
replace.
  (replace buffer line :start1 0 :end1 23 :start2 0 :end2 23)
  (replace buffer line :start1 0 :end1 23 :start2 46 :end2 69)
which avoids the copies that subseq would produce. Since my-trim
makes a copy (in most cases) ... you can use the same buffer for each.
To be safe, use a separate buffer for each case.
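
A minimal sketch of the separate-buffer variant might look like the
following. The variable and function names are hypothetical, and it
assumes every line is at least 69 characters long, as the column
numbers in the original code suggest:

```lisp
;; Two preallocated 23-character buffers, one per field.
;; These names are illustrative only.
(defvar *spelling-buffer* (make-string 23))
(defvar *class-buffer* (make-string 23))

(defun fill-field-buffers (line)
  "Copy columns 0-22 and 46-68 of LINE into the preallocated buffers.
Assumes LINE has at least 69 characters. No new strings are created."
  (replace *spelling-buffer* line :start1 0 :end1 23 :start2 0 :end2 23)
  (replace *class-buffer* line :start1 0 :end1 23 :start2 46 :end2 69)
  (values *spelling-buffer* *class-buffer*))
```

The same two strings are returned on every call, so anything that needs
to keep a field past the next line must copy it.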
--
Lyman S. Taylor "Computers are too reliable to replace
(ly...@cc.gatech.edu) humans effectively."
Commander Nathan Spring, "Starcops"
presuming it is only consing that causes the waste, let me try and
synthesize some of the other responses and add a little of my own:
| (do
| ((line (read-line input-file nil *eof*)(read-line input-file nil *eof*))
READ-LINE frequently conses much more than one might think. if you
allocate a string buffer (henceforth called <buffer>) to write into with
    (make-array <some guess>
                :element-type 'character
                :adjustable t
                :fill-pointer 0)
and fill it with some loop like
    (loop initially (setf (fill-pointer <buffer>) 0)
          for character = (read-char <stream> nil nil)
          when (null character)
            do <some non-local exit>
          until (char= character #\newline)
          do (vector-push-extend character <buffer>))
you will keep the same buffer all the time, and it will grow to
accommodate the longest string in a space-economical fashion.
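
one way to package the loop above as a function, as a sketch only: the
names *line-buffer* and read-line-into are invented here, and the
non-local exit is replaced by returning NIL at end of file.

```lisp
;; Reusable line buffer; 128 is an arbitrary initial guess.
(defvar *line-buffer*
  (make-array 128 :element-type 'character :adjustable t :fill-pointer 0))

(defun read-line-into (buffer stream)
  "Fill BUFFER with the next line of STREAM, reusing its storage.
Returns BUFFER, or NIL if end of file is hit before any character is read."
  (setf (fill-pointer buffer) 0)
  (loop for character = (read-char stream nil nil)
        do (cond ((null character)
                  ;; End of file: hand back a final unterminated line, if any.
                  (return (if (plusp (fill-pointer buffer)) buffer nil)))
                 ((char= character #\Newline)
                  (return buffer))
                 (t
                  (vector-push-extend character buffer)))))
```

the buffer only allocates when a line is longer than any line seen so far.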
| (spelling "")
| (pronounciation "")
| (word-class-code "")
| (verb-pattern ""))
these appear to be mere substrings into the line you just read, but you
don't use them for anything other than to cons up new strings. I'd do
away with these altogether and instead use indices into the buffer.
| (setf spelling (mytrim (subseq line 0 23))) ;; doc: 1-25
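    (position-if-not #'whitespace-char-p <buffer> :start 0 :end 23)
    (position-if-not #'whitespace-char-p <buffer> :start 0 :end 23 :from-end t)
will produce the positions of the first and last non-whitespace characters,
i.e., the start and (after adding 1) the end of a substring of buffer that
is the same under EQUAL as your expensive solution. (WHITESPACE-CHAR-P is
not a standard function; define it or use your implementation's equivalent.)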
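
a small sketch of the index-based trim, under the assumption that
"whitespace" means space, tab and newline. both blank-char-p and
trimmed-bounds are names made up for this example:

```lisp
(defun blank-char-p (char)
  "Assumed definition of whitespace for this sketch: space, tab, newline."
  (or (char= char #\Space)
      (char= char #\Tab)
      (char= char #\Newline)))

(defun trimmed-bounds (buffer start end)
  "Return the start and exclusive end of the non-blank part of BUFFER
between START and END, or NIL if that range is entirely blank."
  (let ((left (position-if-not #'blank-char-p buffer :start start :end end)))
    (when left
      (let ((right (position-if-not #'blank-char-p buffer
                                    :start start :end end :from-end t)))
        ;; RIGHT is the index of the last non-blank character,
        ;; so add 1 to get a conventional exclusive end.
        (values left (1+ right))))))
```

the two values can be passed directly as :START and :END to other
sequence and stream functions, so no trimmed copy is needed.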
| (nsubstitute
| #\Space
| #\,
| (subseq line 46 69)))) ;; doc: 51-70
SUBSTITUTE and friends take :START and :END arguments -- use them.
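
for instance, a sketch of replacing commas only within one field of the
buffer (the function name commas-to-spaces is hypothetical):

```lisp
(defun commas-to-spaces (buffer start end)
  "Destructively replace commas with spaces in BUFFER between START
(inclusive) and END (exclusive). No substring is created."
  (nsubstitute #\Space #\, buffer :start start :end end))
```

with the column numbers from the original code this would be called as
(commas-to-spaces <buffer> 46 69).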
| (princ "(setf (gethash " output-file)
| (prin1 spelling output-file)
| (princ " " output-file)
| (princ 'dict output-file)
| (princ ") '(" output-file)
| (prin1 word-class-code output-file)
| (princ "))" output-file)
if you are merely going to write out substrings, you can use WRITE-STRING
and use its :START and :END arguments to delimit the substring written.
this will require that you check for backslashes and quotation marks in
the string written, and that might not be cost-effective. if you can
statically determine that they don't appear in the material, you can save
a lot this way, however. you can write (write-string " dict) '(" ...)
directly, since PRINC will print the symbol-name of a symbol.
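
a sketch of what that could look like, assuming the fields have been
checked to contain no backslashes or quotation marks. the function name
write-entry and its parameter names are invented for illustration:

```lisp
(defun write-entry (buffer key-start key-end code-start code-end stream)
  "Write a (setf (gethash ...)) form to STREAM using two substrings of
BUFFER, given as start/end index pairs. Assumes neither substring
contains a backslash or a double quote, so no escaping is done."
  (write-string "(setf (gethash \"" stream)
  (write-string buffer stream :start key-start :end key-end)
  (write-string "\" dict) '(\"" stream)
  (write-string buffer stream :start code-start :end code-end)
  (write-string "\"))" stream)
  (terpri stream))
```

the literal strings are constants, so the only per-line work is copying
characters to the stream.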
if you want to use strings, I would instead go for displaced arrays into
the pre-allocated buffer, and then you can do something even nicer.
preallocate the variables with this general form:
    (make-array 0
                :adjustable t
                :element-type 'character
                :displaced-to <buffer>
                :displaced-index-offset 0)
and construct the list to be printed like this once and for all, i.e.,
outside of the loop:
`(setf (gethash ,spelling dict) '(,word-class-code))
now _move_ the displaced array instead of consing a new array header for
`spelling' and `word-class-code'.
    (adjust-array <array> (- <end> <start>)
                  :displaced-to <buffer>
                  :displaced-index-offset <start>)
this should not cons a new array header and the value _should_ be EQ to
the <array> argument. (I don't find any guarantees to this effect right
now, but it appears to have been the intention. you might want to
check.) now you can print this whole list with PRINT, and it will take
care of itself. I don't see any reason why printing this list should
cons.
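
a minimal sketch of the moving-window idea. make-window and move-window
are hypothetical names, and the buffer is assumed to have element type
CHARACTER so that it matches the displaced array:

```lisp
(defun make-window (buffer)
  "Create an empty, adjustable string displaced to BUFFER."
  (make-array 0
              :adjustable t
              :element-type 'character
              :displaced-to buffer
              :displaced-index-offset 0))

(defun move-window (window buffer start end)
  "Re-point WINDOW at BUFFER[START, END) and return it."
  (adjust-array window (- end start)
                :displaced-to buffer
                :displaced-index-offset start))

;; Usage: the list is built once; only the window moves.
(let* ((buffer (make-array 11 :element-type 'character
                              :initial-contents "hello world"))
       (window (make-window buffer))
       (form `(greet ,window)))
  (move-window window buffer 6 11)
  (prin1-to-string (second form)))
```

whether adjust-array itself conses anything internally is up to the
implementation, so this is worth profiling before relying on it.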
glue this stuff together and let me know how it works...
#:Erik