I've got a bit of a problem, so could anyone out there help me? As a computer officer in the Dept of Language Engineering at UMIST I am asked to supply software for various projects.
We tend to have to work in several languages with non-Latin scripts, e.g. Greek, Cyrillic and even Arabic. Does anyone have a suggestion as to a Unicode-compatible Lisp that we can use? We have Allegro CL ver 5.0; has anyone any experience in using non-Latin scripts with this, either under NT or Solaris 7? Many thanks
--
Paul Johnston
System Admin
Language Engineering
UMIST
Tel 0161 200 3111
paul johnston wrote:
> We tend to have to work in several languages with non-latin scripts,
> i.e. Greek Cyrillic and even Arabic. Does anyone have a suggestion as to
> a unicode compatible lisp that we can use.
> We have Allegro CL ver 5.0 has anyone any experience in using non-latin
> scripts with this, either under NT or Solaris7?
Although I have very limited experience with unicode (I wrote a couple of strings in Italian once), there was an interesting discussion on c.l.l. a short while ago about issues related to this but I am not sure if it is exactly what you wanted[1].
I would also contact Franz directly. Finally, having run a quick search on the Franz website, I found references to International Allegro CL, which has support for Japanese (kanji, ganji &c), so I would have thought Greek, Cyrillic, Arabic or whatever must be tractable.
Best Regards,
:) will
[1] This was the thread `strings and characters'; see www.deja.com/getdoc.xp?AN=598005460. Deja is also a good starting point for researching historic postings from c.l.l.
> I've got a bit of a problem so could anyone out there help me.
> As a computer officer in the Dept of Language Engineering at UMIST I am
> asked to supply software for various projects.
>
> We tend to have to work in several languages with non-latin scripts,
> i.e. Greek Cyrillic and even Arabic. Does anyone have a suggestion as to
> a unicode compatible lisp that we can use.
LispWorks has reasonably good support for Unicode. I've used it to edit and process some Unicode files that contained characters from the ASCII, Latin-1 and Cyrillic character blocks.
It was pretty easy to configure the editor so it could switch between two sets of keyboard bindings. (I think it would be easy to support Greek in the same way, but configuring the editor for working with Arabic would be a lot more difficult, as you probably know.)
I had a few small problems with LispWorks' Unicode support, but no major gotchas.
> Does anyone have a suggestion as to a unicode compatible lisp that
> we can use.
The Linux Unicode HOWTO [1], section 5.3, answers your question:
The Common Lisp standard specifies two character types: `base-char' and `character'. It's up to the implementation to support Unicode or not. The language also specifies a keyword argument `:external-format' to `open', as the natural place to specify a character set or encoding.
Among the free Common Lisp implementations, only CLISP http://clisp.cons.org/ supports Unicode. You need a CLISP version from March 2000 or newer. ftp://clisp.cons.org/pub/lisp/clisp/source/clispsrc.tar.gz. The types `base-char' and `character' are both equivalent to 16-bit Unicode. The functions char-width and string-width provide an API comparable to wcwidth() and wcswidth(). The encoding used for file or socket/pipe I/O can be specified through the `:external-format' argument. The encodings used for tty I/O and the default encoding for file/socket/pipe I/O are locale dependent.
Among the commercial Common Lisp implementations, only Eclipse http://www.elwood.com/eclipse/eclipse.htm supports Unicode. See http://www.elwood.com/eclipse/char.htm. The type `base-char' is equivalent to ISO-8859-1, and the type `character' contains all Unicode characters. The encoding used for file I/O can be specified through a combination of the `:element-type' and `:external-format' arguments to `open'. Limitations: Character attribute functions are locale dependent. Source and compiled source files cannot contain Unicode string literals.
The commercial Common Lisp implementation Allegro CL does not support Unicode yet, but Erik Naggum is working on it.
* Bruno Haible
| The Linux Unicode HOWTO [1], section 5.3, answers your question:
:
| The commercial Common Lisp implementation Allegro CL does not support
| Unicode yet, but Erik Naggum is working on it.
Franz Inc has had Unicode support in Allegro CL for Windows for quite some time, now, thanks to the efforts of Charles Cox. He has also been working on Unicode support for Allegro CL for Unix for quite some time, now. Allegro CL 6.0 supports Unicode natively.
#:Erik -- If this is not what you expected, please alter your expectations.
Erik Naggum <e...@naggum.no> writes:
> * Bruno Haible
> | The Linux Unicode HOWTO [1], section 5.3, answers your question:
> :
> | The commercial Common Lisp implementation Allegro CL does not support
> | Unicode yet, but Erik Naggum is working on it.
>
> Franz Inc has had Unicode support in Allegro CL for Windows for
> quite some time, now, thanks to the efforts of Charles Cox. He has
> also been working on Unicode support for Allegro CL for Unix for
> quite some time, now. Allegro CL 6.0 supports Unicode natively.
Now the question is: are CLisp, ECLipse and ACL compatible in their treatment of Unicode?
Cheers
-- Marco Antoniotti ===========================================
* Marco Antoniotti <marc...@parades.rm.cnr.it>
| Now the question is: are CLisp, ECLipse and ACL compatible in their
| treatment of Unicode?
Since basically the only useful thing to do with Unicode (data) is to have _real_ wide strings, with characters at least 16 bits wide _each_ and real character types that reflect real Unicoditude, and since Unicode (the standard) defines pretty much what you can do in the outside world, the question of what it means to be compatible appears to be a question of how each Common Lisp treats _streams_ of Unicode characters.
#:Erik -- If this is not what you expected, please alter your expectations.
> * Marco Antoniotti <marc...@parades.rm.cnr.it>
> | Now the question is: are CLisp, ECLipse and ACL compatible in their
> | treatment of Unicode?
>
> Since basically the only useful thing to do with Unicode (data) is
> to have _real_ wide strings, with characters at least 16 bits wide
> _each_ and real character types that reflect real Unicoditude, and
> since Unicode (the standard) defines pretty much what you can do in
> the outside world, the question of what it means to be compatible
> appears to be a question of how each Common Lisp treats _streams_ of
> Unicode characters.
Not having played with it but just thinking about it for a second, I'd think there'd also be the issue of what #\xxx you write to refer to such a character, and whether a unicode A is char= to a non-unicode A (intended to be constrained by the CL spec, but...), and probably many other little details. It would certainly be interesting to hear about differences people uncover.
Erik Naggum <e...@naggum.no> writes:
> * Marco Antoniotti <marc...@parades.rm.cnr.it>
> | Now the question is: are CLisp, ECLipse and ACL compatible in their
> | treatment of Unicode?
>
> Since basically the only useful thing to do with Unicode (data) is
> to have _real_ wide strings, with characters at least 16 bits wide
> _each_ and real character types that reflect real Unicoditude, and
> since Unicode (the standard) defines pretty much what you can do in
> the outside world, the question of what it means to be compatible
> appears to be a question of how each Common Lisp treats _streams_ of
> Unicode characters.
Well, Bruno mentioned CHAR-WIDTH and STRING-WIDTH. He also mentioned the treatment of :EXTERNAL-FORMAT.
Does ACL have these functions?
Cheers
-- Marco Antoniotti ===========================================
* Marco Antoniotti <marc...@parades.rm.cnr.it>
| Well, Bruno mentioned CHAR-WIDTH and STRING-WIDTH.
From the names, I guess these are relics of coding systems. If you think you need to work with coding systems, you are mistaken. If you still think you need to work with coding systems, measuring the width of characters in bytes is wrong.
| He also mentioned the treatment of :EXTERNAL-FORMAT.
The various external-formats you will need in the complex world of universal character sets are not covered by the standard. Nor should they. There are, however, several conflicting attempts to enumerate them outside of the Lisp world, and it is not necessarily useful to standardize on one of those.
| Does ACL have these functions?
I hope to <deity> that there won't be any char-width or similar cruftitude in Allegro 6.0. If anything, we should have learned from the Great Emacs Experience that exposing coding systems internals to users is Just Plain Wrong.
#:Erik -- If this is not what you expected, please alter your expectations.
Marco Antoniotti <marc...@parades.rm.cnr.it> asked:
> Now the question is: are CLisp, ECLipse and ACL compatible in their
> treatment of Unicode?
Let me try to give a summary of the features in
- CLISP 2000-03-06
- Eclipse
- ACL 6.0 (not yet released)
- LispWorks 4.0.1
* Character and string types
In Eclipse, ACL and LispWorks, the type BASE-CHAR includes only Latin-1 characters, whereas the CHARACTER type includes all of Unicode (16 bit).
In CLISP, BASE-CHAR and CHARACTER are equivalent and include all of Unicode (16 bit). The memory representation of read-only strings (e.g. symbol print names and program literals) is optimized to 1 byte/character if possible.
* Supported external formats of streams
- CLISP 2000-03-06: Around 80 external formats, including all of the ones supported by browsers and Linux locales.
- Eclipse: Only :ASCII (1 byte/character), :UCS (2 bytes/character), and :MULTI-BYTE (locale dependent multibyte representation, works only on OSes for which wchar_t is Unicode).
- ACL 6.0: Lots of external formats, mostly table-driven.
- LispWorks: Around 10 external formats, including Latin-1, Unicode (2 bytes/character), UTF-8, and the most important Japanese encodings (but not ISO-2022-JP).
Different end-of-line conventions are indicated to OPEN through the :external-format argument in CLISP and LispWorks, and through an extra argument to OPEN in Eclipse.
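To make the bytes-per-character figures above concrete, here is a minimal sketch in Python (used purely as a neutral illustration; the codec names are Python's, not the external-format names of any Lisp mentioned in this thread, and `encoded_sizes` is a hypothetical helper). It shows that the same string takes a different number of bytes per external format, and that a single-byte Greek code page simply cannot hold Cyrillic or Arabic.

```python
def encoded_sizes(text, codecs):
    """Return {codec: byte length, or None if the text is not encodable}."""
    sizes = {}
    for codec in codecs:
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            # The codec has no representation for some character in text.
            sizes[codec] = None
    return sizes

# Three Greek, three Cyrillic and three Arabic letters, separated by spaces.
sample = "\u03b1\u03b2\u03b3 \u0430\u0431\u0432 \u0627\u0628\u062a"

sizes = encoded_sizes(sample, ["utf-8", "utf-16-be", "iso-8859-7"])
for codec, size in sizes.items():
    print(codec, "->", size if size is not None else "not encodable")
```

The fixed two bytes per character of the 16-bit format and the variable length of UTF-8 are exactly the kind of difference an external-format argument is meant to hide from the program.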
* Additional API
- CLISP: STRING-WIDTH returns the display width of a string, used by FORMAT ~T.
- Eclipse: none.
- ACL: unknown.
- LispWorks: functions for guessing the encoding of a file (important for Japanese environments)
* FFI support
- CLISP: FFI can pass strings only with single-byte encodings.
- Eclipse, ACL: unknown
- LispWorks: a few specialized macros for passing strings from/to C.
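On "guessing the encoding of a file" from the Additional API list: the following is only a rough sketch of the simplest possible approach, written in Python for illustration. It is not how LispWorks does it, and `guess_encoding` is a hypothetical name. It checks for a byte-order mark, then tries UTF-8, then falls back to Latin-1. Telling the Japanese encodings apart needs real heuristics and is well beyond this.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: guess a text encoding from the raw bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"          # UTF-8 with a leading byte-order mark
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"             # this codec reads the BOM to pick the byte order
    try:
        data.decode("utf-8")        # strict decoding rejects most non-UTF-8 input
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"            # every byte sequence is valid Latin-1

cyrillic = "\u0430\u0431\u0432"
for raw in (cyrillic.encode("utf-16"), cyrillic.encode("utf-8"), b"caf\xe9"):
    print(guess_encoding(raw), repr(raw.decode(guess_encoding(raw))))
```

The Latin-1 fallback never fails, which is precisely why it is only a guess: any other single-byte code page would be misread without complaint.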
> there'd also the issue of what #\xxx you write to refer to such
> a character
Eclipse uses the syntax #\U+203E, CLISP uses #\U203E, and ACL and LispWorks have no such notation.
> whether a unicode A is char= to a non-unicode A
The entire idea of Unicode is that there are no characters outside Unicode. The only non-Unicode characters I have ever seen in use in Web pages are Inuktitut (some Eskimo people in Canada).
The bits and font attributes in CL are a different issue, of course.
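The "unicode A versus non-unicode A" point can be illustrated at the level of the character set itself. This is a small Python sketch, not a statement about what CHAR= does in any particular Lisp: the bytes that represent "A" in ASCII, Latin-1 and 16-bit Unicode all decode to the same code point, U+0041. The second part looks up U+203E, the code point used in the reader-syntax example above.

```python
import unicodedata

# The byte(s) for "A" in three encodings all decode to one and the same character.
decoded = [
    b"\x41".decode("ascii"),
    b"\x41".decode("latin-1"),
    b"\x00\x41".decode("utf-16-be"),
]
same_character = all(ch == "A" and ord(ch) == 0x41 for ch in decoded)
print("all three are U+0041:", same_character)

# A character named by its code point, as in the U+203E notation.
overline = chr(0x203E)
print("U+%04X" % ord(overline), unicodedata.name(overline))
```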
These functions are needed for anyone wanting to perform tabular output or word wrapping, assuming an output device with a fixed size font and a double size font, like kterm or xterm.
> If you still think you need to work with coding systems, measuring the
> width of characters in bytes is wrong.
Measuring the width of characters in bytes is *only* useful when you deal with memory allocation, which you don't normally do in Lisp.
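The two meanings of "width" being argued over here, columns on a fixed-width terminal versus bytes in some encoding, can be separated with a short sketch. It is in Python and is only an approximation of what wcwidth()-style functions do: `display_width` is a hypothetical helper, and it treats East Asian wide and fullwidth characters as two columns, combining marks as zero, and everything else as one.

```python
import unicodedata

def display_width(text: str) -> int:
    """Rough column count on a fixed-width terminal (a simplification)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        # "W" (wide) and "F" (fullwidth) characters take two terminal columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

# Latin, Japanese (three kanji) and Greek samples.
for s in ("Lisp", "\u65e5\u672c\u8a9e", "\u039b\u03af\u03c3\u03c0"):
    print(repr(s), "columns:", display_width(s), "utf-8 bytes:", len(s.encode("utf-8")))
```

Neither number can be derived from the other, which is why tabular output and word wrapping need the column count and not the byte count.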
> If anything, we should have learned from the Great Emacs Experience
> that exposing coding systems internals to users is Just Plain Wrong.
* Bruno Haible
| Naggum, you are guessing wrong, because you neglected to look up the
| documentation of the things you are talking about.
I love you, too.
| Measuring the width of characters in bytes is *only* useful when you deal
| with memory allocation, which you don't normally do in Lisp.
Well, gee, _I_ think you may not have paid attention to the Great Emacs Experiment, but you wouldn't do something that would even make it _possible_ to claim you haven't looked up the documentation of the things you're talking about, would you? Nah, of course not.
| I agree with you.
I don't generally consider that comforting. This is no exception.
#:Erik -- If this is not what you expected, please alter your expectations.
hai...@clisp.cons.org writes:
> Let me try to give a summary of the features in
> - CLISP 2000-03-06
> - Eclipse
> - ACL 6.0 (not yet released)
> - LispWorks 4.0.1
Current version is LW 4.1, but the differences should be small. Liquid 5.0 has many of the same interfaces.
> * Supported external formats of streams
> - LispWorks: Around 10 external formats, including Latin-1, Unicode
>   (2 bytes/character), UTF-8, and the most important Japanese encodings
>   (but not ISO-2022-JP).
LW support would probably help you, if you needed another external format. There's a relatively painless way of adding one.
Also, on Windows, all the installed codepages are available as external formats.
> * Additional API
> - LispWorks: functions for guessing the encoding of a file (important
>   for Japanese environments)
Also functions for code conversions (see package EXTERNAL-FORMAT in the Reference Manual), lots of string and character types and predicates (package LISPWORKS), and *DEFAULT-CHARACTER-ELEMENT-TYPE* for controlling the default size of strings &c.
> * FFI support
> - LispWorks: a few specialized macros for passing strings from/to C.
That's true, but lest people think that describes a limitation, the FLI is pretty C-oriented, and the macros provided together with the foreign types :EF-MB-STRING and :EF-WC-STRING allow passing of strings in any encoding. The types take an external-format parameter that defaults to the encoding used by the current C locale (assuming you tell LW what that is, see SET-LOCALE in the FLI manual).
--
Pekka P. Pirinen, Adaptive Memory Management Team, Harlequin Limited
Controlling complexity is the essence of computer programming. - Kernighan
> Marco Antoniotti <marc...@parades.rm.cnr.it> asked:
> Let me try to give a summary of the features in
> - CLISP 2000-03-06
> - Eclipse
> - ACL 6.0 (not yet released)
> - LispWorks 4.0.1
> ...
I'm sorry to say that Bruno's information about ACL (both 5.0 and 6.0) is incorrect in a number of regards. Anyone wanting accurate information should get it directly from Franz, not from Usenet.