Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?

166 views
Skip to first unread message

Emre Sevinc

unread,
Dec 25, 2008, 3:04:26 AM12/25/08
to
Dear Lispers,

I'm trying to connect to a MySQL and retrieve some data that are
encoded in utf-8 and contain some Turkish characters. First let me
show data correctly using commandline mysql:

cas@cas-laptop:~$ mysql --host=10.15.4.100 -D "flac_deneme" -u emres -
p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.0.51a-3ubuntu5.4 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM Album WHERE ID=3;
+----+-------------------+--------+------+-------+--------+
| ID | Name | Artist | Date | Label | Studio |
+----+-------------------+--------+------+-------+--------+
| 3 | Agannaga Rüşvet | 362 | 1992 | 8 | |
+----+-------------------+--------+------+-------+--------+
1 row in set (0.00 sec)

mysql> exit

As you may see the critical part I'm interested in is the Name field
which contains the Turkish string "Rüşvet".

Now, let's see what kind of problem happens when I try to run the same
query within Allegro CL 8.1 (Free Express Edition) with the help of cl-
sql (some irrelevant message lines are ommited and shown as dots):

cas@cas-laptop:~/Documents/applications/acl81_express$ ./alisp
;;; Installing locale patch, version 1.
International Allegro CL Free Express Edition
8.1 [Linux (x86)] (Nov 27, 2008 11:52)
Copyright (C) 1985-2007, Franz Inc., Oakland, CA, USA. All Rights
Reserved.

This development copy of Allegro CL is licensed to:
Trial User

CL-USER(1): (push #P"/usr/share/common-lisp/systems/" asdf:*central-
registry*)

; Autoloading for package "ASDF":
; Fast loading
; /home/cas/Documents/applications/acl81_express/code/asdf.fasl
(#P"/usr/share/common-lisp/systems/" *DEFAULT-PATHNAME-DEFAULTS*)

CL-USER(2): (asdf:operate 'asdf:load-op 'clsql)
; loading system definition from
; /usr/share/common-lisp/systems/clsql.asd into #<The ASDF0 package>
; Loading /usr/share/common-lisp/systems/clsql.asd
; loading system definition from
; /usr/share/common-lisp/systems/uffi.asd into #<The ASDF1 package>
; Loading /usr/share/common-lisp/systems/uffi.asd
; registering #<SYSTEM UFFI @ #x71619482> as UFFI
; Loading /etc/clsql-init.lisp
.
.
.

NIL


Now I connect to the database on another machine:

CL-USER(3): (clsql:connect '("10.15.4.100"
"flac_deneme"
"emres"
"emres")
:database-type :mysql)
; loading system definition from
; /usr/share/common-lisp/systems/clsql-mysql.asd into
; #<The ASDF0 package>
; Loading /usr/share/common-lisp/systems/clsql-mysql.asd
; registering #<SYSTEM :CLSQL-MYSQL @ #x71781db2> as CLSQL-MYSQL
; loading system definition from
; /usr/share/common-lisp/systems/clsql-uffi.asd into
.
.
.
#<CLSQL-MYSQL:MYSQL-DATABASE 10.15.4.100/flac_deneme/emres OPEN @
#x717a7aa2>


And I run the same query:

CL-USER(4): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga Rü?vet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")

Here is the first problematic version above, it returns "Rü?vet" not
the correct "Rüşvet".

Let's try harder:

CL-USER(5): (clsql-sys:execute-command "SET NAMES 'utf8'")
CL-USER(6): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga RüÅvet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")

Another problematic version above, it returns "RüÅvet" instead of the
correct version "Rüşvet".

What is the *locale* of my Allegro Common Lisp?

CL-USER(7): *locale*
#<locale "en_US" [:UTF8-BASE] @ #x7130205a>

I give it another try even though it'll switch to iso-8859-9 encoding:

CL-USER(8): (setf *locale* (find-locale "tr_TR"))
;; Autoloading locale from #P"/home/cas/Documents/applications/
acl81_express/locales/tr_TR"
; Fast loading from bundle code/ef-iso8859-9.fasl.
; Fast loading from bundle code/efft-iso8859-9-base.fasl.
#<locale "tr_TR" [:ISO8859-9-BASE] @ #x716a72f2>

Here we go:

CL-USER(9): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga RüÅvet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")

No hope and it seems like en_EN.utf-8 returns the same result as
tr_TR.iso8859-9

My system is Ubuntu GNU/Linux Hardy. I'm using Allegro CL Express
Edition 8.1 (free version).

$ uname -a
Linux cas-laptop 2.6.24-22-generic #1 SMP Mon Nov 24 18:32:42 UTC 2008
i686 GNU/Linux

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Any ideas how I can get my utf-8 encoded Turkish characters from cl-
sql using Allegro CL?

Note: There are no problems reading and writing utf-8 encoded Turkish
characters to/from files on my local hard disk, Allegro CL can handle
that without problems:

$ cat /home/cas/Documents/source/Lisp/ilc/name.txt
Agannaga Rüşvet 362 1992 8

CL-USER(1): (with-open-file (stream "/home/cas/Documents/source/Lisp/
ilc/name.txt")
(format t "~a~%" (read-line stream)))
Agannaga Rüşvet 362 1992 8
NIL

Cheers,

--
Emre sevinc

Emre Sevinc

unread,
Dec 25, 2008, 3:43:56 AM12/25/08
to
On Dec 25, 10:04 am, Emre Sevinc <emre.sev...@gmail.com> wrote:

> mysql> SELECT * FROM Album WHERE ID=3;
> +----+-------------------+--------+------+-------+--------+
> | ID | Name              | Artist | Date | Label | Studio |
> +----+-------------------+--------+------+-------+--------+

> |  3 | Agannaga Rüþvet |    362 | 1992 |     8 |        |


> +----+-------------------+--------+------+-------+--------+
> 1 row in set (0.00 sec)
>
> mysql> exit
>
> As you may see the critical part I'm interested in is the Name field

> which contains the Turkish string "Rüþvet".

It seems like Google Groups messed up too!

I'm talkin about "Latin small letter s with cedilla" which I type as
"ş" (don't know if this will be rendered correctly when this message
shows up even though my Firefox's character encoding seems to be
utf-8).


--
Emre

Pascal J. Bourguignon

unread,
Dec 25, 2008, 6:41:59 AM12/25/08
to
Emre Sevinc <emre....@gmail.com> writes:
>> As you may see the critical part I'm interested in is the Name field
>> which contains the Turkish string "Rüþvet".
>
> It seems like Google Groups messed up too!

Yes.

> I'm talkin about "Latin small letter s with cedilla" which I type as
> "ş" (don't know if this will be rendered correctly when this message
> shows up even though my Firefox's character encoding seems to be
> utf-8).

This one shows ok.

AFAIK, clsql doesn't manage encodings. You have to do it yourself
explicitely. Here is what I did:

(defparameter *weubcwk*
#+sbcl :iso-8859-1
#+clisp charset:iso-8859-1
#-(or sbcl clisp)
(error "What encoding is used by clsql in ~A?" (lisp-implementation-type))
"Whatever Encoding Used By Clsql Who Knows")

(defun utf-8-bytes (data)
#-(or sbcl clisp)
(error "utf-8-bytes not implemented in ~A?" (lisp-implementation-type))
(let ((bytes
#+sbcl (sb-ext:string-to-octets data :external-format *weubcwk*)
#+clisp (ext:convert-string-to-bytes data *weubcwk*)))
#+sbcl (sb-ext:octets-to-string bytes :external-format :utf-8)
#+clisp (ext:convert-string-from-bytes bytes charset:utf-8)))

(defun utf-8-string (string)
#-(or sbcl clisp)
(error "utf-8-string not implemented in ~A?" (lisp-implementation-type))
(let ((bytes
#+sbcl (sb-ext:string-to-octets string :external-format :utf-8)
#+clisp (ext:convert-string-to-bytes string charset:utf-8)))
#+sbcl (sb-ext:octets-to-string bytes :external-format *weubcwk*)
#+clisp (ext:convert-string-from-bytes bytes *weubcwk*)))


Then, when you define your views in clsql, use the :db-reader and
:db-writer to convert the strings:

(clsql:def-view-class example ()
((text :db-kind :base
:db-constraints :not-null
:db-reader utf-8-bytes
:db-writer utf-8-string
:type (varchar 75)
:reader example-text))
(:base-table "EXAMPLE"))

--
__Pascal Bourguignon__

Jens Teich

unread,
Dec 25, 2008, 8:27:17 AM12/25/08
to
For these reasons I switched to PostgreSQL and postmodern
(http://common-lisp.net/project/postmodern/) and am fine
with utf8 on LispWorks and SBCL.

Jens

Alex Mizrahi

unread,
Dec 25, 2008, 11:15:55 AM12/25/08
to
ES> Another problematic version above, it returns "RüÅvet" instead of the
ES> correct version "Rüþvet".

CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
it just puts them one by one into a lisp string. what you see here is UTF-8
interpreted
as if it was latin1.

it does so using uffi:convert-from-cstring, which in its turn uses Allegro
CL function
excl:native-to-string. according to documentation:
http://www.franz.com/support/documentation/7.0/doc/operators/excl/native-to-string.htm
you can customize this convesion setting *locale*'s locale-external-format.
HTH

(alternatively you can decode strings you get from CLSQL into octets (using
latin1 format
that is used by default) and concert this octets back to string with UTF-8
external format.
that's what Pascal have wrote, but he have got format order reversed.)


Kenny

unread,
Dec 25, 2008, 1:23:36 PM12/25/08
to

I was not involved in the effort to get AllegroCL to read something
similar so I can offer little more than we too ended up with a
Postgres/Postmodern stack.

kt

Emre Sevinc

unread,
Dec 26, 2008, 6:56:56 AM12/26/08
to
On Dec 25, 6:15 pm, "Alex Mizrahi" <udode...@users.sourceforge.net>
wrote:

> ES> Another problematic version above, it returns "RüÅvet" instead of the
> ES> correct version "Rüþvet".
>
> CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
> it just puts them one by one into a lisp string. what you see here is UTF-8
> interpreted as if it was latin1.
>
> it does so using uffi:convert-from-cstring, which in its turn uses Allegro
> CL function excl:native-to-string. according to documentation:http://www.franz.com/support/documentation/7.0/doc/operators/excl/nat...

> you can customize this convesion setting *locale*'s locale-external-format.
> HTH
>
> (alternatively you can decode strings you get from CLSQL into octets (using
> latin1 format
> that is used by default) and concert this octets back to string with UTF-8
> external format.
> that's what Pascal have wrote, but he have got format order reversed.)

I gave it a try without success, here's what I did:

CG-USER(2): *locale*
#<locale "en_US" [:UTF8-BASE] @ #x71e2d7ea>

CG-USER(3): (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN"))
#<EXTERNAL-FORMAT :LATIN1 [(CRLF-BASE-EF :LATIN1)] @ #x71112242>

CG-USER(4): (find-external-format :utf-8)
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>

CG-USER(5): (SETF (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN")) (find-
external-format :utf-8))
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>

CG-USER(8): (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN"))
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>

So now it seems like I have utf-8 enabled, now I try:

CG-USER(11): (clsql:execute-command "SET NAMES 'utf8'")

CG-USER(12): (CLSQL:QUERY "SELECT * FROM gecici")
(("rüğşvetçi"))
("metin")

CG-USER(13): (string-to-native (CAR (CAR (CLSQL-SYS:QUERY "SELECT *
FROM gecici"))))
2685176784
22

CG-USER(14): (NATIVE-TO-STRING 2685176784 :external-format :utf-8)
"rüğşvetçi"
13
21

CG-USER(15): (NATIVE-TO-STRING 2685176784)
"rüğşvetçi"
13
21

Seems to have no effect. Now I cannot understand because you said "it


does so using uffi:convert-from-cstring, which in its turn uses

Allegro CL function excl:native-to-string", I'm trying to use native-
to-string directly but it still seems problematic.

--
Emre

Emre Sevinc

unread,
Dec 26, 2008, 7:08:21 AM12/26/08
to
> Then, when you define your views in clsql, use the :db-reader and
> :db-writer to convert the strings:
>
> (clsql:def-view-class example ()
>   ((text         :db-kind :base
>                  :db-constraints :not-null
>                  :db-reader utf-8-bytes
>                  :db-writer utf-8-string
>                  :type (varchar 75)
>                  :reader example-text))
>   (:base-table "EXAMPLE"))
>

Thank you very much!

I tried it and it seems to work:

CG-USER(19): (defun utf-8-bytes (data)
(let ((bytes
(string-to-octets data :external-format :iso-8859-1)))
(octets-to-string bytes :external-format :utf-8)))
UTF-8-BYTES

CG-USER(26): (clsql:def-view-class gecici ()
((metin
:db-kind :base
:type (varchar 75)
:db-reader utf-8-bytes
:accessor text
:initarg :text))
(:base-table "gecici"))
#<CLSQL-SYS::STANDARD-DB-CLASS GECICI>

CG-USER(27): (let ((new-text (car
(clsql:select 'gecici
:flatp t))))
(format t "~A" (text new-text)))
rüğşvetçi
NIL

--
Emre

Emre Sevinc

unread,
Dec 26, 2008, 7:13:33 AM12/26/08
to
On Dec 25, 6:15 pm, "Alex Mizrahi" <udode...@users.sourceforge.net>
wrote:
>  ES> Another problematic version above, it returns "RüÅvet" instead of the
>  ES> correct version "Rüþvet".
>
> CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
> it just puts them one by one into a lisp string. what you see here is UTF-8
> interpreted
> as if it was latin1.
>
> it does so using uffi:convert-from-cstring, which in its turn uses Allegro
> CL function
> excl:native-to-string. according to documentation:http://www.franz.com/support/documentation/7.0/doc/operators/excl/nat...

> you can customize this convesion setting *locale*'s locale-external-format.
> HTH

Hmm, after playing with Pascal's solution I guess I understood what
you meant, too. This time a correct example:

CG-USER(29): (string-to-native (CAR (CAR (CLSQL-SYS:QUERY "SELECT *
FROM gecici"))) :external-format :iso-8859-1)
2685292688
27

CG-USER(30): (NATIVE-TO-STRING 2685292688 :external-format :utf-8)
"rüğşvetçi ĞŞİıçö"
16
26

(Google Groups may mess up the characters but I had what I wanted in
my Allegro CL REPL happily :)

Cheers,

--
Emre

0 new messages