how to call read from a non-character stream

Jim Newton

Feb 7, 2019, 4:34:38 PM

I have a file containing characters that read-char cannot read, apparently because their char codes are larger than 127.

The content of the file is not under my control. It is a standard format,
https://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html
and apparently non-ASCII characters are allowed in the comments.

So I've opened the file with :element-type 'unsigned-byte.
With this I can read the file byte by byte, explicitly ignoring these funny characters.
At some point I need to read an expression which I know does not contain a funny character.
I want to call (read stream) to read what I know is an integer,
but I cannot call read on a non-character stream:

#<SB-SYS:FD-STREAM for "file /Volumes/Disk2/jimka/sw/common-lisp/regular-type-expression/cl-robdd/data/dtba-sat.cnf" {10040DDF13}> is not a character input stream.
[Condition of type SIMPLE-TYPE-ERROR]

Is there a way for me to set the stream-element-type back to character before calling read?

What is the correct way to do this?
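Standard Common Lisp has no operator for changing the element type of an already-open stream (STREAM-ELEMENT-TYPE is only a reader), so one workaround that stays on the binary stream is to collect the bytes of the integer token by hand and pass them to PARSE-INTEGER instead of READ. A minimal sketch, assuming the token is plain ASCII and delimited by ASCII whitespace; the function name read-integer-from-byte-stream is hypothetical:

```lisp
;; Hypothetical helper: read one integer token from an (unsigned-byte 8)
;; input stream without needing a character stream.
(defun read-integer-from-byte-stream (stream &optional (eof-error-p t) eof-value)
  "Skip ASCII whitespace bytes on STREAM, collect the bytes of the next
token, and parse them with PARSE-INTEGER. The whitespace byte that ends
the token is consumed."
  (flet ((whitespace-byte-p (b)
           ;; ASCII space, tab, newline, carriage return, form feed
           (member b '(32 9 10 13 12))))
    ;; Skip leading whitespace; B is NIL at end of file.
    (let ((b (read-byte stream nil nil)))
      (loop while (and b (whitespace-byte-p b))
            do (setf b (read-byte stream nil nil)))
      (cond ((null b)
             (if eof-error-p
                 (error 'end-of-file :stream stream)
                 eof-value))
            (t
             ;; Collect bytes up to the next whitespace byte or EOF.
             ;; Each byte is assumed to be an ASCII digit or sign, so
             ;; CODE-CHAR maps it to the matching character.
             (let ((token (make-string-output-stream)))
               (loop while (and b (not (whitespace-byte-p b)))
                     do (write-char (code-char b) token)
                        (setf b (read-byte stream nil nil)))
               (parse-integer (get-output-stream-string token))))))))
```

PARSE-INTEGER signals an error if the token contains anything other than an optional sign and digits, which is arguably the right behavior for a field that must be an integer.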


Kaz Kylheku

Feb 7, 2019, 4:43:58 PM

On 2019-02-07, Jim Newton <jimka...@gmail.com> wrote:
> I have a file which contains characters which read-char is not able to read because their char code is apparently larger than 127.
>
> The content of the file is not in my control. It is a standard format
> https://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html
> and apparently in the comments, non-ascii characters are allowed.
> So I've opened the file with :element-type 'unsigned-byte.

The thing to do is find out how, in your Lisp implementation, to set up
the ISO-8859-1 encoding on the stream. Or perhaps UTF-8, if the comments
are actually valid UTF-8.

ISO-8859-1 is the "8 bit byte = character" encoding that is useful for
situations when you don't actually care about the true encoding of the
data, but need to pass it through cleanly and/or not choke on it.

(It's also rumored to be useful for situations when the data
*is* actually using ISO-8859-1.)
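The "8 bit byte = character" property can be checked directly. A sketch assuming SBCL, which accepts :latin-1 as an :external-format designator (other implementations may spell external formats differently); the function name latin-1-round-trip-p is hypothetical:

```lisp
;; Sketch assuming SBCL's :LATIN-1 external format.
(defun latin-1-round-trip-p (path)
  "Write the 256 byte values 0..255 to PATH as binary data, then read
PATH back as characters with the :LATIN-1 external format and check that
byte N came back as the character whose code is N."
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (dotimes (n 256)
      (write-byte n out)))
  ;; Character input: no byte value can cause a decoding error here.
  (with-open-file (in path :external-format :latin-1)
    (loop for n from 0 below 256
          for ch = (read-char in nil nil)
          always (and ch (= (char-code ch) n)))))
```

Under that assumption, (latin-1-round-trip-p "/tmp/latin1-test.bin") is expected to return T, which is why opening the CNF file with :external-format :latin-1 lets read-char, read-line and read pass over arbitrary comment bytes.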

Jim Newton

Feb 7, 2019, 4:58:35 PM

Thanks, Kaz. Do you know how to do this in SBCL?

Jim Newton

Feb 7, 2019, 5:24:44 PM

The following seems to work: if :external-format :utf-8 is used, then
(read-line stream nil EOF) reads the line without complaining.

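(defun read-sat-file (fname)
  "Read a DIMACS CNF file, as described by https://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html
The CNF file format is an ASCII file format."
  (with-open-file (stream fname :direction :input :if-does-not-exist :error
                                :external-format :utf-8)
    (let ((EOF (list nil))
          clauses)
      (labels ((read-to-eol ()
                 ;; Skip the rest of the line (comments, the "p cnf" header).
                 (read-line stream nil EOF))
               (read-clause ()
                 ;; Read integers up to the terminating 0; a clause may span lines.
                 ;; *READ-EVAL* is bound to NIL so READ never evaluates #. forms.
                 (let (clause)
                   (loop :for num = (let ((*read-eval* nil)) (read stream))
                         :when (eql 0 num)
                           :do (loop-finish)
                         :do (push num clause)
                         :finally (push clause clauses)))))
        ;; PEEK-CHAR with peek-type T skips whitespace, so clause lines
        ;; that begin with spaces are not mistaken for comments.
        (loop :for ch = (peek-char t stream nil EOF)
              :do (cond
                    ((eql ch EOF)
                     (loop-finish))
                    ((or (digit-char-p ch)
                         (eql ch #\-))
                     (read-clause))
                    (t
                     (read-to-eol)))))
      clauses)))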
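For comparison, here is a sketch that avoids the Lisp reader altogether: it splits each line into whitespace-separated tokens and converts them with PARSE-INTEGER, and it decodes with :latin-1 (assumed SBCL spelling) as Kaz suggested, so comment bytes that are not valid UTF-8 cannot cause a decoding error. The names split-on-whitespace and read-cnf-clauses are hypothetical, and lines whose first token starts with c, p or % are assumed to be comments or headers:

```lisp
;; Sketch: parse DIMACS CNF clauses with PARSE-INTEGER instead of READ.
(defun split-on-whitespace (line)
  "Return the list of space/tab/CR-separated tokens in LINE."
  (let ((tokens '())
        (start nil))
    ;; Treat the position just past the end as a space so the last token is flushed.
    (loop for i from 0 to (length line)
          for ch = (if (< i (length line)) (char line i) #\Space)
          do (if (member ch '(#\Space #\Tab #\Return))
                 (when start
                   (push (subseq line start i) tokens)
                   (setf start nil))
                 (unless start (setf start i))))
    (nreverse tokens)))

(defun read-cnf-clauses (path)
  "Return the clauses of the DIMACS CNF file at PATH as lists of integers,
in file order. Lines whose first token starts with c, p or % are skipped."
  (with-open-file (in path :external-format :latin-1)
    (let ((clauses '())
          (current '()))
      (loop for line = (read-line in nil nil)
            while line
            do (let ((tokens (split-on-whitespace line)))
                 (unless (or (null tokens)
                             (member (char (first tokens) 0) '(#\c #\p #\%)))
                   (dolist (tok tokens)
                     (let ((n (parse-integer tok)))
                       ;; 0 terminates the current clause; clauses may span lines.
                       (if (zerop n)
                           (progn (push (nreverse current) clauses)
                                  (setf current '()))
                           (push n current)))))))
      (nreverse clauses))))
```

Unlike read-sat-file, which pushes as it goes and so returns clauses (and the literals within each clause) in reverse order, this version preserves file order.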