Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

beginner question on input from large files

6 views
Skip to first unread message

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
I'm trying to write a function that takes the name of a dictionary file
containing one word per line and returns the contents of that file as a
list of strings. The latest of several versions is this one (I know it's
not very elegant):

(defun get-words (filename)
(let ((*words* '()))
(defun read-loop (stream)
(setf *words* (cons (read-line stream nil 0) *words*))
(if (numberp (first *words*))
(rest *words*)
(read-loop stream)))
(with-open-file (stream filename :direction :input)
(read-loop stream))))

The function works with files containing less than around 10,000 words,
but no matter what I do, whenever I call this function with a file
containing more words, I get a stack overflow or a core dump. How can I
do this with larger files? I feel like there must be a better way.

Thank you for your time.

Joe

Marco Antoniotti

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to

Joseph Dale <jd...@uclink4.berkeley.edu> writes:

It looks like your CL system cannot do teil recursion optimization.

Anyway, a non-recursive version is here

(defun get-words (filename)
(with-open-file (f filename :direction :input)
(loop for line = (read-line f nil '$$get-word-eof$$)
when (eq line '$$get-word-eof$$)
do (return words)
collect line into words)))

Cheers


--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa

Marco Antoniotti

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to

Marco Antoniotti <mar...@parades.rm.cnr.it> writes:

> Joseph Dale <jd...@uclink4.berkeley.edu> writes:
>

> It looks like your CL system cannot do teil recursion optimization.
>
> Anyway, a non-recursive version is here
>
> (defun get-words (filename)
> (with-open-file (f filename :direction :input)
> (loop for line = (read-line f nil '$$get-word-eof$$)
> when (eq line '$$get-word-eof$$)
> do (return words)
> collect line into words)))
>

Since we are at it....

(defun get-words (filename)
(with-open-file (f filename :direction :input)

(labels ((read-loop (f lines)
(let ((line (read-line f nil '$$get-word-eof$$)))
(if (eq line '$$get-word-eof$$)
lines
(read-loop f (cons line lines))))))
(nreverse (read-loop f ())))))

Of course, if your CL compiler does not do tail recursion elimination,
this second function will stack overflow on large files.

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
Marco Antoniotti wrote:
>
> Joseph Dale <jd...@uclink4.berkeley.edu> writes:
>
> > I'm trying to write a function that takes the name of a dictionary file
> > containing one word per line and returns the contents of that file as a
> > list of strings. The latest of several versions is this one (I know it's
> > not very elegant):
> >
> > (defun get-words (filename)
> > (let ((*words* '()))
> > (defun read-loop (stream)
> > (setf *words* (cons (read-line stream nil 0) *words*))
> > (if (numberp (first *words*))
> > (rest *words*)
> > (read-loop stream)))
> > (with-open-file (stream filename :direction :input)
> > (read-loop stream))))
> >
> > The function works with files containing less than around 10,000 words,
> > but no matter what I do, whenever I call this function with a file
> > containing more words, I get a stack overflow or a core dump. How can I
> > do this with larger files? I feel like there must be a better way.
> >
> > Thank you for your time.
>
> It looks like your CL system cannot do teil recursion optimization.

This is why I'm am in such shock, because neither this, nor any other
tail-recursive version of my function, seems to work in either CLISP or
ACL (on Red Hat 6.1). I was under the impression that these systems
would optimize tail-recursion. The non-recursive version is nice, but I
was really hoping to do it recursively.

>
> Anyway, a non-recursive version is here
>
> (defun get-words (filename)
> (with-open-file (f filename :direction :input)
> (loop for line = (read-line f nil '$$get-word-eof$$)
> when (eq line '$$get-word-eof$$)
> do (return words)
> collect line into words)))
>

Jason Trenouth

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
On Thu, 10 Feb 2000 14:13:41 GMT, Joseph Dale <jd...@uclink4.berkeley.edu>
wrote:

> This is why I'm am in such shock, because neither this, nor any other
> tail-recursive version of my function, seems to work in either CLISP or
> ACL (on Red Hat 6.1). I was under the impression that these systems
> would optimize tail-recursion. The non-recursive version is nice, but I
> was really hoping to do it recursively.

Try

(PROCLAIM (OPTIMIZE (DEBUG 0)))

as Common Lisp systems sometimes have a 'factory setting' that turns off TRO to
assist debugging.

__Jason

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to


??????

In ACL (trial, by the way):

USER(1): (PROCLAIM (OPTIMIZE (DEBUG 0)))
Error: attempt to call `OPTIMIZE' which is an undefined function.
[condition type: UNDEFINED-FUNCTION]

Restart actions (select using :continue):
0: Try calling OPTIMIZE again.
1: Return a value instead of calling OPTIMIZE.
2: Try calling a function other than OPTIMIZE.
3: Setf the symbol-function of OPTIMIZE and call it again.
4: Return to Top Level (an "abort" restart)
5: Abort #<PROCESS Initial Lisp Listener>


In CLISP:

[1]> (PROCLAIM (OPTIMIZE (DEBUG 0)))

*** - EVAL: the function OPTIMIZE is undefined

Michael Kappert

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to

Joseph Dale wrote:
>
> Marco Antoniotti wrote:
> >
> > Joseph Dale <jd...@uclink4.berkeley.edu> writes:
> >
> > > I'm trying to write a function that takes the name of a dictionary file
> > > containing one word per line and returns the contents of that file as a
> > > list of strings. The latest of several versions is this one (I know it's
> > > not very elegant):
> > >
> > > (defun get-words (filename)
> > > (let ((*words* '()))
> > > (defun read-loop (stream)

As a side note: Although this looks like a local function definition, it isn't.
read-loop will only be defined when get-words ist first _executed_.
After that, read-loop will be _globally_ defined.

You may want to take a look at FLET and LABELS.

> > > (setf *words* (cons (read-line stream nil 0) *words*))
> > > (if (numberp (first *words*))
> > > (rest *words*)
> > > (read-loop stream)))
> > > (with-open-file (stream filename :direction :input)
> > > (read-loop stream))))
> > >
> > > The function works with files containing less than around 10,000 words,
> > > but no matter what I do, whenever I call this function with a file
> > > containing more words, I get a stack overflow or a core dump. How can I
> > > do this with larger files? I feel like there must be a better way.
> > >
> > > Thank you for your time.
> >
> > It looks like your CL system cannot do teil recursion optimization.
>

> This is why I'm am in such shock, because neither this, nor any other
> tail-recursive version of my function, seems to work in either CLISP or
> ACL (on Red Hat 6.1). I was under the impression that these systems
> would optimize tail-recursion. The non-recursive version is nice, but I
> was really hoping to do it recursively.

Hmmm... quite sure that ACL does optimize tail recursion.
Maybe read-loop was not compiled?


Regards,
Michael.

--
Michael Kappert
Fraunhofer IITB
Fraunhoferstr. 1 Phone: +49(0)721/6091-477
D-76131 Karlsruhe, Germany EMail: k...@iitb.fhg.de

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
Joseph Dale wrote:
>
> Jason Trenouth wrote:
> >
> > On Thu, 10 Feb 2000 14:13:41 GMT, Joseph Dale <jd...@uclink4.berkeley.edu>
> > wrote:
> >
> > > This is why I'm am in such shock, because neither this, nor any other
> > > tail-recursive version of my function, seems to work in either CLISP or
> > > ACL (on Red Hat 6.1). I was under the impression that these systems
> > > would optimize tail-recursion. The non-recursive version is nice, but I
> > > was really hoping to do it recursively.
> >
> > Try
> >
> > (PROCLAIM (OPTIMIZE (DEBUG 0)))
> >
> > as Common Lisp systems sometimes have a 'factory setting' that turns off TRO to
> > assist debugging.
> >
> > __Jason
>
> ??????
>
> In ACL (trial, by the way):
>
> USER(1): (PROCLAIM (OPTIMIZE (DEBUG 0)))
> Error: attempt to call `OPTIMIZE' which is an undefined function.
> [condition type: UNDEFINED-FUNCTION]
>
> Restart actions (select using :continue):
> 0: Try calling OPTIMIZE again.
> 1: Return a value instead of calling OPTIMIZE.
> 2: Try calling a function other than OPTIMIZE.
> 3: Setf the symbol-function of OPTIMIZE and call it again.
> 4: Return to Top Level (an "abort" restart)
> 5: Abort #<PROCESS Initial Lisp Listener>
>
> In CLISP:
>
> [1]> (PROCLAIM (OPTIMIZE (DEBUG 0)))
>
> *** - EVAL: the function OPTIMIZE is undefined


(proclaim '(optimize (debug 0))) is accepted, but the function still
doesn't work.

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
Marco Antoniotti wrote:
>
> Since we are at it....
>
> (defun get-words (filename)
> (with-open-file (f filename :direction :input)
> (labels ((read-loop (f lines)
> (let ((line (read-line f nil '$$get-word-eof$$)))
> (if (eq line '$$get-word-eof$$)
> lines
> (read-loop f (cons line lines))))))
> (nreverse (read-loop f ())))))
>
> Of course, if your CL compiler does not do tail recursion elimination,
> this second function will stack overflow on large files.

Interestingly, even after (proclaim '(optimize (debug 0))), this one
does even worse than my original function.

Marco Antoniotti

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to

Joseph Dale <jd...@uclink4.berkeley.edu> writes:

Minor mistake

(proclaim '(optimize (debug 0))) or
(declaim (optimize (debug 0)))

should work.

Joe Marshall

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
According to ACL documentation:

The `tail-call-self-merge-switch' is true if SPEED is greater than 0.
The `tail-call-non-self-merge-switch' is true if SPEED is greater
than 1 and DEBUG is less than 0.

There was a bug where the wrong switch is examined for LABELS
functions, though.

You can also set the switches directly rather than rely on the SPEED
and DEBUG settings:

(setq compiler:tail-call-self-merge-switch t)
(setq compiler:tail-call-non-self-merge-switch t)

Joseph Dale <jd...@uclink4.berkeley.edu> wrote in message
news:38A2D3D8...@uclink4.berkeley.edu...

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
Marco Antoniotti wrote:
>
>
> Minor mistake
>
> (proclaim '(optimize (debug 0))) or
> (declaim (optimize (debug 0)))
>
> should work.
>

Neither does. More stack overflows.

Robert Monfera

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to

Are you sure you have compiled your function? I don't think
tail-recursion is optimized in the interpreter. It's easy _not_ to
compile with the trial version, because loading the file won't do it. I
guess you do file loading, because you used PROCLAIM instead of
DECLARE. You could use EXCL:LOAD-COMPILED on the file or COMPILE the
function, and you can check your function with COMPILED-FUNCTION-P (?)
if it is actually compiled.

Others wrote about declarations already.

Robert

Joseph Dale

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
Robert Monfera wrote:
>
> Are you sure you have compiled your function? I don't think
> tail-recursion is optimized in the interpreter. It's easy _not_ to
> compile with the trial version, because loading the file won't do it. I
> guess you do file loading, because you used PROCLAIM instead of
> DECLARE. You could use EXCL:LOAD-COMPILED on the file or COMPILE the
> function, and you can check your function with COMPILED-FUNCTION-P (?)
> if it is actually compiled.
>

Thanks!! That did it.

Joe

Pierre R. Mai

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
Joseph Dale <jd...@uclink4.berkeley.edu> writes:

> This is why I'm am in such shock, because neither this, nor any other
> tail-recursive version of my function, seems to work in either CLISP or
> ACL (on Red Hat 6.1). I was under the impression that these systems
> would optimize tail-recursion. The non-recursive version is nice, but I
> was really hoping to do it recursively.

Why recursively? Why lists? These seem like very bad choices for
large file sizes...

If indeed you need a list, what's wrong with

(defun get-words-from-file (filename)
(with-open-file (stream filename)
(loop for line = (read-line stream nil nil)
while line
collect line)))

or for an array

(defun get-words-from-line (filename)
(with-open-file (stream filename)
(let ((line-array (make-array 1000 :adjustable t :fill-pointer 0)))
(do ((line (read-line stream nil nil) (read-line stream nil nil)))
((null line) line-array)
(vector-push-extend line line-array)))))

or any other iterative variant. Or if it indeed has to be recursive
(again why? Reading a list/array of lines from a file is not an
intuitively recursive operation, so why make it one?), at least use
labels instead of such ugly Scheme-isms like nested defuns...

> Marco Antoniotti wrote:
>
> > Anyway, a non-recursive version is here
> >

> > (defun get-words (filename)
> > (with-open-file (f filename :direction :input)

> > (loop for line = (read-line f nil '$$get-word-eof$$)
> > when (eq line '$$get-word-eof$$)
> > do (return words)
> > collect line into words)))

See above, the normal read eof-value hack is not needed for read-line,
since read-line will always return strings, which are disjunct from
any symbols (like nil). Also why the collect into with a when/do
return? Use unless/while, and let the loop return the collected list
automatically. If you've got to use LOOP, use it wisely, some might
say ;)

BTW: With an extensible LOOP, you can do things like this (code on
request):

(with-open-file (stream filename)
(loop for line being the lines of stream
collect line))

Or indeed

(with-open-file (stream filename)
(loop for (name age salary . info) being the lines of stream split-by #\Tab
collect (list name info (/ salary age))))

Regs, Pierre.

--
Pierre Mai <pm...@acm.org> PGP and GPG keys at your nearest Keyserver
"One smaller motivation which, in part, stems from altruism is Microsoft-
bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]

Erik Naggum

unread,
Feb 10, 2000, 3:00:00 AM2/10/00
to
* Joseph Dale <jd...@uclink4.berkeley.edu>

| I'm trying to write a function that takes the name of a dictionary file
| containing one word per line and returns the contents of that file as a
| list of strings.

first off, it's a reasonable thing to do.

| The latest of several versions is this one (I know it's not very elegant):

au contraire, it's _too_ elegant.

| (defun get-words (filename)
| (let ((*words* '()))
| (defun read-loop (stream)

| (setf *words* (cons (read-line stream nil 0) *words*))
| (if (numberp (first *words*))
| (rest *words*)
| (read-loop stream)))
| (with-open-file (stream filename :direction :input)
| (read-loop stream))))

whether this is compiled or not, this is needlessly complex and "elegant"
only in the Scheme sense, which is not elegant at all in Common Lisp.

(defun stream-lines (input-stream)
(do ((lines nil)
(line #1=(read-line input-stream nil :eof) #1#))
((eq line :eof) (nreverse lines))
(push line lines)))

if you're not as silly as to have an allergic reaction to LOOP, this is
even better:

(defun stream-lines (input-stream)
(loop for line = (read-line input-stream nil :eof)
until (eq line :eof)
collect line))

using a number for magic purposes and then testing for any number is
_really_ bad style. where did you pick this up? a Perl class?

| The function works with files containing less than around 10,000 words,
| but no matter what I do, whenever I call this function with a file

| containing more words, I get a stack overflow or a core dump. How can I
| do this with larger files? I feel like there must be a better way.

recursion has its uses. using recursion for iteration is not one of
them. Scheme educators got that one _seriously_ confused, and your code
is simply such Scheme code in Common Lisp. this is _not_ a reasonable
thing to do, not even in Scheme.

my suggestions:

1 get rid of the internal define.
2 use iteration forms when that's what you do.
3 use recursion for inherently recursive tasks, only.
4 learn Common Lisp if you want to think in Common Lisp.
5 use Scheme if you think in Scheme.

writing Scheme code in Common Lisp is also really bad for your karma.

#:Erik

Marco Antoniotti

unread,
Feb 11, 2000, 3:00:00 AM2/11/00
to

pm...@acm.org (Pierre R. Mai) writes:

> > Marco Antoniotti wrote:
> >
> > > Anyway, a non-recursive version is here
> > >
> > > (defun get-words (filename)
> > > (with-open-file (f filename :direction :input)
> > > (loop for line = (read-line f nil '$$get-word-eof$$)
> > > when (eq line '$$get-word-eof$$)
> > > do (return words)
> > > collect line into words)))
>
> See above, the normal read eof-value hack is not needed for read-line,
> since read-line will always return strings, which are disjunct from
> any symbols (like nil). Also why the collect into with a when/do
> return? Use unless/while, and let the loop return the collected list
> automatically. If you've got to use LOOP, use it wisely, some might
> say ;)

Yep, I stand corrected. I was on automatic... :{

Cheers

0 new messages