[AOLSERVER] ns_db and multibyte support

2 views
Skip to first unread message

Rajesh nair

unread,
Apr 2, 2008, 2:22:48 PM4/2/08
to AOLS...@listserv.aol.com
Hi all,

We have an existing Tcl service which provides the data from mysql to
clients as an HTML table.
We have some records in mysql with multibye characters which are not
being rendered correctly.

Simple ns_db getrow does not return me the correctly encoded data form
database.
I have the sample code snippet to replicate this

set db [ns_db gethandle]

set sql1 "use mydb";
ns_db exec $db $sql1;

set row [ns_db select $db "select title from channel"]
set numcols [ns_set size $row]

while {[ns_db getrow $db $row]} {
for {set i 0} {$i < $numcols} {incr i} {
ns_puts " :[ns_set value $row $i]"
}
ns_puts "<br>"
}

I understand that I need to specify the encoding to get the correct tcl
strings in ns_set but cannot find the way to do so. Any pointers?

Thanks in advance--
-- Rajesh Nair


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <list...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.

Bas Scheffers

unread,
Apr 2, 2008, 6:38:55 PM4/2/08
to AOLS...@listserv.aol.com
Some questions to help us, and maybe give you some hints as to what
might be wrong.

How did you determine the fact that the value isn't a correct Tcl
string?

What encoding is the database in? (UTF-8?) Are you 100% sure the data
in the database is actually correct?

I wouldn't think you would need to convert anything. Tcl uses UTF-8
internally. A string from the database should be nothing more than an
array of bytes and when a Tcl string is created, it is assumed these
bytes are UTF-8.

Did you specify the correct encoding in either the server headers
(preferred) or html tags? Unless you do that, the browser won't know
what character set it should use.

Regards,
Bas.

Tom Jackson

unread,
Apr 2, 2008, 6:55:54 PM4/2/08
to AOLS...@listserv.aol.com
Just to back up what Bas said, AOLserver has been, and probably still is,
light years ahead of most systems when handling encoding issues.

Tcl is essentially UTF-8, which is multi-byte. But there are so many issues
involved, you have to become something of an expert. The good news is that
AOLserver/Tcl has the tools to handle the issues.

tom jackson

Bas Scheffers

unread,
Apr 2, 2008, 8:42:02 PM4/2/08
to AOLS...@listserv.aol.com
My own special recipe is to not worry about it! :)

If you have a green-fields project with no existing database, all you
do is:

- Make sure the database is UTF-8
- Set the encoding to UTF-8 for any page returned to the client. (if
you have a form on a page and the page was set to UTF-8, the data is
submitted as UTF-8 by the browser, so no conversion needed by you)
- Make sure any files (ADP, resource files with messages, etc) are
saved as UTF-8 if they contain such data.

It then basically takes care of itself.

The only issues I ever faced was (CSV) file uploads, where the data
needed to be extracted and put into the database. This could contain
any encoding without me knowing. In practice it only ever contained
stupid Windows encoding, so I assumed that to be the case and used
Tcl's convert functions.

Bas.

Cynthia Kiser

unread,
Apr 3, 2008, 6:14:54 PM4/3/08
to AOLS...@listserv.aol.com
On Apr 2, 2008, at 5:42 PM, Bas Scheffers wrote:
>
> The only issues I ever faced was (CSV) file uploads, where the data
> needed to be extracted and put into the database. This could
> contain any encoding without me knowing. In practice it only ever
> contained stupid Windows encoding, so I assumed that to be the case
> and used Tcl's convert functions.

Hmmmm CSV + "stupid Windows encoding". Bas perhaps you have just what
I need for a character set issue. I have a data file - actually
delimited by upsidedown exclamation points, not commas. It comes from
a Windows box - apparently with the Windows 1252 character set. I am
trying to load that data into Oracle. I was trying to use SQLLDR to
do that but am having debugging issues. I *think* I have the correct
character set info and octal representation for ¡. But something is
funky.

It never occurred to me to try parsing this with Tcl instead. Is
there an AOLserver or straight Tcl module I should be using to parse
pseudo-CSV? Or is the answer keep it simple and just read lines and
split on ¡ with 'split'?

Michael A. Cleverly

unread,
Apr 3, 2008, 7:10:07 PM4/3/08
to AOLS...@listserv.aol.com
On Thu, Apr 3, 2008 at 4:14 PM, Cynthia Kiser <cki...@its.caltech.edu> wrote:
> Hmmmm CSV + "stupid Windows encoding". Bas perhaps you have just what I
> need for a character set issue. I have a data file - actually delimited by
> upsidedown exclamation points, not commas.

> It never occurred to me to try parsing this with Tcl instead. Is there an


> AOLserver or straight Tcl module I should be using to parse pseudo-CSV? Or
> is the answer keep it simple and just read lines and split on ، with
> 'split'?

tcllib (http://tcllib.sf.net) has a fairly robust csv package
(http://tcllib.sourceforge.net/doc/csv.html) if a straight [split
$line ،] proves lacking.

Michael

Bas Scheffers

unread,
Apr 3, 2008, 7:28:15 PM4/3/08
to AOLS...@listserv.aol.com
On 04/04/2008, at 8:44 AM, Cynthia Kiser wrote:
> It never occurred to me to try parsing this with Tcl instead. Is
> there an AOLserver or straight Tcl module I should be using to parse
> pseudo-CSV? Or is the answer keep it simple and just read lines and
> split on ¡ with 'split'?
Tcl lib has a CSV parser: http://tcllib.sourceforge.net/doc/csv.html.
But if the text doesn't contain that character elsewhere and no
quoting for values is used, split might work just fine. I was dealing
with something exported out of Excel, so if a value contained commas,
it would wrap the thing in quotes. Hence a complete CSV implementation
was easiest.

To make absolutely, 100% sure you have the correct character to
separate on, you could edit the file to just contain that one and read
it in.

Bas.

Jade Rubick

unread,
Apr 3, 2008, 7:31:12 PM4/3/08
to AOLS...@listserv.aol.com
I echo Bas here. The only issue I've ever had is when writing to or reading from files. You have to specify the encoding.

Jade
--
Jade Rubick
Acting Chief Technology Officer
United eWay
jade....@uwa.unitedway.org
tel (503)285-4963
fax (707)671-1333

www.UNITEDeWAY.org

Rajesh nair

unread,
Apr 9, 2008, 8:54:20 AM4/9/08
to AOLS...@listserv.aol.com
Bas,

Apologies for the delayed response!

Our setup is a complex set of components with a java component inserting records
and tcl based RESTful service fetching the records back.
I have isolated this issue to a tcl script and am sending this out to replicate the issue

1. My MySQL version is 5.1.22
2. create a table
          CREATE TABLE multibytetest (value VARCHAR(255)  CHARACTER SET utf8)
3. Copy the adp in your aolserver installation
   and run it. You will need to make a modification to change the database name

The adp present you with a HTML form to let you set a string which is inserted into the multibytetest table
using SQL
        INSERT INTO multibytetest (value) VALUES (_utf8'$value')
The idea to quote it in _utf8 is from this blog from dossy
I am using the MySQL query browser to see if the  character inserted into db is correct or not. The chinese characters gets inserted fine.

The adp also retrieves the same value back and renders it on the HTML page along with the tcl string which was inserted.
The Content-Type is set to "text/html; charset=utf-8" as well as adp mimetype.

Yet the record fetched back does not get rendered on browser correctly.

Here is a screenshot of a testrun

Screenshot of test run

Any ideas why I am not able to get the multibyte record correctly?

Couple of points :-

1. The records gets rendered fine if the column type is BLOB instead of VARCHAR. But I dont think we should need to convert the datatype of table in order to store multibyte characters .Java is able to insert and fetch the records correctly
2. If the INSERT sql is modifed to simple
     INSERT INTO multibytetest (value) VALUES ('$value')
both 'Inserted value' and 'Value retrieved from database' is rendered correctly but the record inserted in mysql is not correct. This is verified by viewing the record in mysql query browser as well as fetching it from a java program

Hoping to get any pointers here--
-- Rajesh Nair

Darren Ferguson

unread,
Apr 9, 2008, 9:29:47 AM4/9/08
to AOLS...@listserv.aol.com
I had similar issues and in the head component of my adp being generated
i had to add the following:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

Once that was added the pages were rendered correctly.

Respectfully,
Darren Ferguson

Rajesh nair wrote:
> Oops , forgot to attach the adp
>
> -- Rajesh Nair


>
> Rajesh Nair wrote:
>> Bas,
>>
>> Apologies for the delayed response!
>>
>> Our setup is a complex set of components with a java component
>> inserting records
>> and tcl based RESTful service fetching the records back.
>> I have isolated this issue to a tcl script and am sending this out to
>> replicate the issue
>>
>> 1. My MySQL version is 5.1.22
>> 2. create a table

>> /*CREATE TABLE multibytetest (value VARCHAR(255) CHARACTER
>> SET utf8)*/


>> 3. Copy the adp in your aolserver installation
>> and run it. You will need to make a modification to change the
>> database name
>>
>> The adp present you with a HTML form to let you set a string which is
>> inserted into the multibytetest table
>> using SQL

>> /* INSERT INTO multibytetest (value) VALUES (_utf8'$value')*/


>> The idea to quote it in _utf8 is from this blog from dossy

>> <http://dossy.org/archives/000218.html>


>> I am using the MySQL query browser to see if the character inserted
>> into db is correct or not. The chinese characters gets inserted fine.
>>
>> The adp also retrieves the same value back and renders it on the HTML
>> page along with the tcl string which was inserted.
>> The Content-Type is set to "text/html; charset=utf-8" as well as adp
>> mimetype.
>>
>> Yet the record fetched back does not get rendered on browser correctly.
>>
>> Here is a screenshot of a testrun
>>

>> Screenshot of test run
>>
>> Any ideas why I am not able to get the multibyte record correctly?
>>
>> Couple of points :-
>>
>> 1. The records gets rendered fine if the column type is BLOB instead
>> of VARCHAR. But I dont think we should need to convert the datatype
>> of table in order to store multibyte characters .Java is able to
>> insert and fetch the records correctly
>> 2. If the INSERT sql is modifed to simple

>> /* INSERT INTO multibytetest (value) VALUES ('$value')*/

Rajesh nair

unread,
Apr 9, 2008, 11:22:00 AM4/9/08
to AOLS...@listserv.aol.com
Darren,

The Content-Type is set to text/html; charset=utf-8.
Though the way I have done is

ns_set put [ns_conn outputheaders] "Content-Type" "text/html; charset=utf-8"

I tried the meta tag equivalent too but no luck!
 
-- Rajesh Nair

Rajesh nair

unread,
Apr 9, 2008, 2:07:45 PM4/9/08
to AOLS...@listserv.aol.com
Found the problem cause and the solution

Firing the following mysql command lets me get the right result from mysql

  set character_set_results = NULL

That is after firing "use fresh"

   set sql1 "set character_set_results = NULL";
   ns_db exec $db $sql1

This is what mysql JDBC driver was doing to get the correct results.

This lead me to believe there is scope for enhancing nsmysql to set such variables (much like Mysql JDBC driver) to ease the life of developers.

Hope this helps others--
-- Rajesh Nair
Reply all
Reply to author
Forward
0 new messages