With the simple file text IO as follows:
(with-open-file (stream "/some/file/name.txt")
  (format t "~a~%" (read-line stream)))
I tried two text files, both in Traditional Chinese:
one is Big5 (code page 950), the other is UTF-8.
[1]> (with-open-file (stream "/temp/Big5_Chinese.txt")
       (format t "~a~%" (read-line stream)))
中文
NIL
This works in plain CLISP 2.39, but in LispBox (SLIME with CLISP upgraded
to 2.39) it shows:
Character #\u4E2D cannot be represented in the character set
CHARSET:ISO-8859-1
[Condition of type EXT:SIMPLE-CHARSET-TYPE-ERROR]
[2]> (with-open-file (stream "/temp/UTF8_Chinese.txt")
       (format t "~a~%" (read-line stream)))
*** - POSIX library error 42 (EILSEQ): Invalid multibyte or wide character
The following restarts are available:
ABORT :R1 ABORT
Break 1 [3]> :R1
The Common Lisp standard specifies the standard character set to be exactly
Newline plus these 95 characters:
SP ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~
Nothing less, nothing more.
So why are you expecting to be able to read a file containing any
characters other than these, with only the standard API?
Now, if you read the error message more closely, you might notice
something. Try reading it again:
Character #\u4E2D cannot be represented in the character set
CHARSET:ISO-8859-1
What does this error message tell us?
You may also want to reread the CLHS page about OPEN:
http://www.lispworks.com/documentation/HyperSpec/Body/f_open.htm
and the CLISP Implementation Notes:
http://clisp.cons.org/impnotes/stream-dict.html#open
(only for a start; don't hesitate to follow further links, such as
http://clisp.cons.org/impnotes/encoding.html#def-file-enc).
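The error message names CHARSET:ISO-8859-1, which suggests that some stream in the chain is using that encoding. A minimal diagnostic sketch follows, assuming CLISP for the CUSTOM variables; the function name report-encodings is made up for illustration. It prints the external format OPEN picked for a file and, on CLISP, the default file and terminal encodings. Under SLIME the output goes through Swank's own stream, so the terminal encoding may not be the one that matters there.

```lisp
(defun report-encodings (path &optional (out *standard-output*))
  "Print the external format OPEN chose for PATH, plus CLISP's defaults.
REPORT-ENCODINGS is a hypothetical helper, not part of CLISP or SLIME."
  ;; STREAM-EXTERNAL-FORMAT is standard Common Lisp; what it returns
  ;; is implementation-dependent.
  (with-open-file (stream path)
    (format out "file external format: ~a~%"
            (stream-external-format stream)))
  ;; These two variables are CLISP-specific, so they are only read there.
  #+clisp
  (format out "default file encoding: ~a~%terminal encoding: ~a~%"
          custom:*default-file-encoding*
          custom:*terminal-encoding*)
  (values))

;; Usage (path taken from the original post):
;; (report-encodings "/temp/Big5_Chinese.txt")
```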
--
__Pascal Bourguignon__ http://www.informatimago.com/
> With the simple file text IO as follows:
>
> (with-open-file (stream "/some/file/name.txt")
> (format t "~a~%" (read-line stream)))
>
> I tried two text files, both are Traditional Chinese,
> one is Big-5(Codepage 950), the other is UTF-8
Maybe you need the :external-format keyword option to with-open-file?
http://clisp.cons.org/impnotes/faq.html#faq-enc-err
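As a rough sketch of that suggestion: the :external-format argument is standard, but the values it accepts are implementation-specific. In CLISP they are encoding objects from the CHARSET package, such as charset:utf-8; charset:big5 is assumed to be available here, which depends on how CLISP was built. The helper name first-line is made up for illustration. It returns the decoded line instead of printing it, because printing still depends on the encoding of the output stream.

```lisp
(defun first-line (path encoding)
  "Return the first line of PATH, decoded according to ENCODING.
FIRST-LINE is a hypothetical helper; ENCODING is passed to OPEN unchanged."
  (with-open-file (stream path :external-format encoding)
    (read-line stream)))

;; Usage in CLISP (file names taken from the original post; whether
;; charset:big5 exists depends on the CLISP build):
;; (first-line "/temp/UTF8_Chinese.txt" charset:utf-8)
;; (first-line "/temp/Big5_Chinese.txt" charset:big5)
```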
--
Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 5 (Bordeaux)
It works only for Unicode files.
What about ISO-8859-1 files? What about ISO-2022-JP files? What
about BIG5 files? What about US-ASCII files?
--
__Pascal Bourguignon__ http://www.informatimago.com/
> http://en.wikipedia.org/wiki/Byte_Order_Mark
You read it?
"... Quite a lot of Windows software (including Windows Notepad) adds
one to UTF-8 files. However in Unix-like systems (which make heavy use
of text files for configuration) this practice is not recommended, as
it will interfere with correct processing of important codes such as
the hash-bang at the start of an interpreted script."
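To see whether a given file actually carries a UTF-8 byte order mark, one can read its first three bytes and compare them with EF BB BF. This is a minimal sketch in portable Common Lisp; the function name utf-8-bom-p is made up for illustration. A missing BOM says nothing about the encoding, since UTF-8 files commonly have none.

```lisp
(defun utf-8-bom-p (path)
  "Return true if the file at PATH starts with the UTF-8 BOM (EF BB BF).
UTF-8-BOM-P is a hypothetical helper name."
  ;; Open as a byte stream so that no character decoding takes place.
  (with-open-file (stream path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array 3 :element-type '(unsigned-byte 8)
                                :initial-element 0)))
      ;; READ-SEQUENCE returns the index of the first element it did
      ;; not fill, so 3 means all three bytes were read.
      (and (= 3 (read-sequence buffer stream))
           (equalp buffer #(#xEF #xBB #xBF))))))

;; Usage (file name taken from the original post):
;; (utf-8-bom-p "/temp/UTF8_Chinese.txt")
```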
WBR, Yaroslav Kavenchuk.
I would guess that this relates to the coding system for communication
between Emacs and CLISP, if you are saying this works with plain CLISP
but not when connecting with SLIME.
In CLISP, after loading Swank but before starting the server, do:
(setq swank::*coding-system* :utf-8-unix)
In Emacs, after loading SLIME but before connecting to CLISP, do:
(setq slime-net-coding-system 'utf-8-unix)
I forget how to fix the inferior-lisp buffer to do this right, maybe
something about C-x <RET> f?
--
Stephen Compall
http://scompall.nocandysw.com/blog
>sailor...@gmail.com wrote:
>> [1]> (with-open-file (stream "/temp/Big5_Chinese.txt")
>> (format t "~a~%" (read-line stream)))
>> 中文
>> NIL
>>
>> This works in CLISP 2.39, but in LispBox (SLIME/ with CLISP upgraded to
>> 2.39),
>> it shows
>>
>> Character #\u4E2D cannot be represented in the character set
>> CHARSET:ISO-8859-1
>> [Condition of type EXT:SIMPLE-CHARSET-TYPE-ERROR]
>
>I would guess that this relates to the coding system for communication
>between Emacs and CLISP, if you are saying this works with plain CLISP
> but not when connecting with SLIME.
>
>In CLISP, after loading Swank but before starting the server, do:
>
>(setq swank::*coding-system* :utf-8-unix)
I don't think it is necessary, because the next step sets it up already:
>In Emacs, after loading SLIME but before connecting to CLISP, do:
>
>(setq slime-net-coding-system 'utf-8-unix)
Put this line into .emacs
Cheers,
Chris