[cl-ppcre-devel] New CL-UNICODE release 0.1.1


Edi Weitz

Jul 24, 2008, 11:01:32 AM
to cl-ppcr...@common-lisp.net, cl-ppcre...@common-lisp.net
ChangeLog:

Version 0.1.1
2008-07-24
Make ADD-HANGUL-NAMES faster for ClozureCL

Download:

http://weitz.de/files/cl-unicode.tar.gz

Edi.
_______________________________________________
cl-ppcre-devel site list
cl-ppcr...@common-lisp.net
http://common-lisp.net/mailman/listinfo/cl-ppcre-devel

Dave Pawson

Jul 24, 2008, 12:18:21 PM
to General interest list about cl-ppcre and cl-unicode, cl-ppcre...@common-lisp.net
So when will utf-8 be the natural encoding of ppcre?
(Yes, I know I'm greedy)

regards


--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

Edi Weitz

Jul 24, 2008, 12:30:40 PM
to General interest list about cl-ppcre and cl-unicode
On Thu, 24 Jul 2008 17:18:21 +0100, "Dave Pawson" <dave....@gmail.com> wrote:

> So when will utf-8 be the natural encoding of ppcre?

I don't understand the question. CL-PPCRE deals with strings and not
with arrays of octets.

Dave Pawson

Jul 24, 2008, 12:39:40 PM
to General interest list about cl-ppcre and cl-unicode
2008/7/24 Edi Weitz <e...@agharta.de>:

> On Thu, 24 Jul 2008 17:18:21 +0100, "Dave Pawson" <dave....@gmail.com> wrote:
>
>> So when will utf-8 be the natural encoding of ppcre?

XML has long dealt with 'strings of characters' encoded in UTF-8.
That way I can include an umlaut, an Arabic glyph or a Chinese symbol.

Any reason Lisp should not enjoy that level of internationalisation?


regards



Edi Weitz

Jul 24, 2008, 12:59:17 PM
to General interest list about cl-ppcre and cl-unicode
On Thu, 24 Jul 2008 17:39:40 +0100, "Dave Pawson" <dave....@gmail.com> wrote:

> xml has long dealt with 'strings of characters' encoded in utf-8.

I think you are confused. In Lisp, characters and strings are really
characters and strings.

CL-USER 4 > #\ä
#\ä

CL-USER 5 > (type-of *)
CHARACTER

CL-USER 6 > (char-name **)
"Latin-Small-Letter-A-With-Diaeresis"
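To connect the REPL session above with the U+00E4 notation that comes up later in the thread: in a Unicode-based implementation, CHAR-CODE returns a character's code point, so a Lisp character can be shown in U+XXXX form directly. This is a minimal sketch, assuming such an implementation (for example SBCL, Clozure CL or LispWorks); the helper CODE-POINT-STRING is hypothetical and not part of any library.

```lisp
;; Assumes a Unicode-based implementation in which CHAR-CODE returns
;; the Unicode code point. CODE-POINT-STRING is a hypothetical helper.
(defun code-point-string (char)
  "Return the code point of CHAR in the usual U+XXXX notation."
  (format nil "U+~4,'0X" (char-code char)))

;; Build the characters from their code points so the example does not
;; depend on how this source file itself is encoded.
(defparameter *a-umlaut* (code-char #xE4)) ; LATIN SMALL LETTER A WITH DIAERESIS
(defparameter *mixed*
  (coerce (list *a-umlaut* (code-char #x0639) (code-char #x4E2D)) 'string))

(code-point-string *a-umlaut*)          ; => "U+00E4"
(map 'list #'code-point-string *mixed*) ; => ("U+00E4" "U+0639" "U+4E2D")
(length *mixed*)                        ; => 3, one per character
```

The string holds three characters regardless of how many octets any particular encoding would need for them.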

If you want to convert between octets and characters (that's where
encodings like UTF-8 make sense), most CL implementations have
facilities for this out of the box. For portable solutions see for
example here:

http://weitz.de/flexi-streams/
http://common-lisp.net/project/babel/
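A minimal sketch of such a conversion using BABEL, one of the libraries linked above, assuming Quicklisp is available to load it. The UTF-8 octets exist only at the boundary; the Lisp string itself simply holds two characters.

```lisp
;; Assumes Quicklisp is installed so that BABEL can be loaded.
(ql:quickload "babel")

;; a-umlaut (U+00E4) and Arabic ain (U+0639), built from code points.
(defparameter *text*
  (coerce (list (code-char #xE4) (code-char #x0639)) 'string))

;; Characters -> UTF-8 octets: each of these code points needs 2 octets.
(defparameter *octets* (babel:string-to-octets *text* :encoding :utf-8))

;; UTF-8 octets -> characters again.
(defparameter *round-trip* (babel:octets-to-string *octets* :encoding :utf-8))
```

Here *OCTETS* is a vector of four (UNSIGNED-BYTE 8) values (#xC3 #xA4 #xD8 #xB9), while *TEXT* and *ROUND-TRIP* are equal two-character strings.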

> That way I can include an umlaut, an arabic glyph or a chinese
> symbol

See above.

> Any reason lisp should not enjoy that level of internationalisation?

It does already.

HTH,
Edi.

Dave Pawson

Jul 24, 2008, 1:09:51 PM
to General interest list about cl-ppcre and cl-unicode
2008/7/24 Edi Weitz <e...@agharta.de>:

> I think you are confused. In Lisp, characters and strings are really
> characters and strings.

> CL-USER 6 > (char-name **)
> "Latin-Small-Letter-A-With-Diaeresis"

Sorry, ** doesn't look like U+00E4.

>
> If you want to convert between octets and characters (that's where
> encodings like UTF-8 make sense), most CL implementations have
> facilities for this out of the box. For portable solutions see for
> example here:
>
> http://weitz.de/flexi-streams/
> http://common-lisp.net/project/babel/

I don't want to convert; I want to read UTF-8 from a file,
work in 'characters', build them into strings
and write them back to a file in UTF-8.


>> Any reason lisp should not enjoy that level of internationalisation?
>
> It does already.


seems we have a different definition of 'working'.

regards


Daniel Gackle

Jul 24, 2008, 1:19:43 PM
to General interest list about cl-ppcre and cl-unicode
< Sorry ** doesn't look like u00e4 >

Hans Hübner

Jul 24, 2008, 3:25:03 PM
to General interest list about cl-ppcre and cl-unicode
Dave,

sorry to be harsh, but the problem here is that you don't understand
external formats and how they relate to characters. Most modern Lisps
use Unicode as their character set, and most of them represent
characters as 16- or 32-bit integers internally. UTF-8, by contrast,
is an external encoding scheme for Unicode characters, and, again,
most Lisps support reading and writing characters in UTF-8
encoding.

The external format of files read and written is usually specified
using the :external-format keyword argument to functions like OPEN,
WITH-OPEN-FILE etc. Also, there are portability libraries like BABEL
that can help convert Lisp strings to arbitrary external
formats, for example when calling foreign functions or reading and
writing binary files.
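For the workflow Dave describes (read UTF-8 from a file, work with characters, write UTF-8 back), a minimal sketch along the lines Hans suggests might look like this. It assumes an implementation that accepts :UTF-8 as an external-format designator (SBCL, Clozure CL and LispWorks do; others may spell it differently) and a Unix-like system with a writable /tmp; the file name is hypothetical.

```lisp
;; Assumes :UTF-8 is accepted as an external-format designator and that
;; /tmp is writable. The path below is hypothetical.
(defparameter *path* "/tmp/utf8-demo.txt")

;; a-umlaut (U+00E4), Arabic ain (U+0639) and a CJK ideograph (U+4E2D),
;; built from code points so the source file's own encoding is irrelevant.
(defparameter *original*
  (coerce (list (code-char #xE4) (code-char #x0639) (code-char #x4E2D))
          'string))

;; On output, the stream encodes the characters as UTF-8 octets...
(with-open-file (out *path* :direction :output
                            :if-exists :supersede
                            :external-format :utf-8)
  (write-line *original* out))

;; ...and on input it decodes UTF-8 octets back into characters.
(defparameter *read-back*
  (with-open-file (in *path* :external-format :utf-8)
    (read-line in)))

;; On disk: 2 + 2 + 3 octets plus one newline octet. In Lisp: 3 characters.
(defparameter *octet-count*
  (with-open-file (in *path* :element-type '(unsigned-byte 8))
    (file-length in)))
```

No explicit conversion step appears in user code: the encoding is a property of the stream, and everything in between works on ordinary Lisp strings.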

CL-PPCRE uses Lisp characters and strings and works with Unicode
characters just fine. The CL-UNICODE library is a portability library
for working with Unicode directly, but most users never really need to
do that.
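A minimal sketch of CL-PPCRE operating on such a string, assuming Quicklisp is available to load CL-PPCRE. Match positions are character indices, so the a-umlaut counts as one position even though UTF-8 would store it in two octets.

```lisp
;; Assumes Quicklisp is installed so that CL-PPCRE can be loaded.
(ql:quickload "cl-ppcre")

;; "Baeume" with a real a-umlaut (German for "trees"), built from a code
;; point so this example does not depend on the source file's encoding.
(defparameter *word* (format nil "B~Cume" (code-char #xE4)))

;; The regex is an ordinary Lisp string, so it may contain non-ASCII
;; characters directly; CL-PPCRE matches characters, not octets.
(defparameter *ascii-word*
  (cl-ppcre:regex-replace-all (string (code-char #xE4)) *word* "ae"))

;; "um" starts at character index 2: B=0, a-umlaut=1, u=2.
(cl-ppcre:scan "um" *word*)
```

*ASCII-WORD* is "Baeume", and SCAN returns 2 as its first value.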

Please read up on external formats in your Lisp implementation's manual.

-Hans

Chris Dean

Jul 24, 2008, 3:41:14 PM
to General interest list about cl-ppcre and cl-unicode

"Dave Pawson" <dave....@gmail.com> writes:
> I don't want to convert, I want to read utf-8 from a file, work in
> 'characters', build them into strings and write them back to file,
> in utf-8

This just works. You probably need to use :external-format with OPEN
(or more likely WITH-OPEN-FILE) to indicate the encoding you are
using. This will read one line of a file in LispWorks:

  (with-open-file (in file-name
                      :external-format :utf-8
                      :element-type 'character)
    (read-line in))

> seems we have a different definition of 'working'.

Please explain what doesn't work. Maybe a code sample would help.

Cheers,
Chris Dean

Edi Weitz

Jul 24, 2008, 3:44:33 PM
to General interest list about cl-ppcre and cl-unicode
On Thu, 24 Jul 2008 18:09:51 +0100, "Dave Pawson" <dave....@gmail.com> wrote:

> 2008/7/24 Edi Weitz <e...@agharta.de>:
>
>> I think you are confused. In Lisp, characters and strings are really
>> characters and strings.
>
>> CL-USER 6 > (char-name **)
>> "Latin-Small-Letter-A-With-Diaeresis"
>
> Sorry ** doesn't look like u00e4

Get a good book about Common Lisp and come back once you've understood
the basic issues.

http://www.lispworks.com/documentation/HyperSpec/Body/v__stst_.htm

> I don't want to convert, I want to read utf-8 from a file, work in
> 'characters', build them into strings and write them back to file,
> in utf-8

Sigh...

> seems we have a different definition of 'working'.

Humor me: please give me a short description of what I need to change to
make UTF-8 "the natural encoding" of CL-PPCRE. I'm really looking
forward to that.
