We have an existing Tcl service that serves data from MySQL to
clients as an HTML table.
Some records in MySQL contain multibyte characters, and these are not
being rendered correctly.
A simple ns_db getrow does not return correctly encoded data from the
database.
Here is a sample code snippet that reproduces the problem:
    set db [ns_db gethandle]
    # Select the database to query.
    set sql1 "use mydb"
    ns_db exec $db $sql1
    set row [ns_db select $db "select title from channel"]
    set numcols [ns_set size $row]
    # Print every column of every row, one row per line.
    while {[ns_db getrow $db $row]} {
        for {set i 0} {$i < $numcols} {incr i} {
            ns_puts " :[ns_set value $row $i]"
        }
        ns_puts "<br>"
    }
I understand that I need to specify the encoding to get correct Tcl
strings in the ns_set, but I cannot find a way to do so. Any pointers?
Thanks in advance--
-- Rajesh Nair
--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to <list...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.
How did you determine that the value isn't a correct Tcl
string?
What encoding is the database in? (UTF-8?) Are you 100% sure the data
in the database is actually correct?
I wouldn't think you would need to convert anything. Tcl uses UTF-8
internally. A string from the database should be nothing more than an
array of bytes and when a Tcl string is created, it is assumed these
bytes are UTF-8.
Did you specify the correct encoding in either the server headers
(preferred) or the HTML tags? Unless you do that, the browser won't know
which character set it should use.
Regards,
Bas.
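
A minimal sketch of declaring the character set both ways, in the response header and in the HTML itself. The helper name utf8_page and the sample text are made up for illustration; ns_return is the usual AOLserver call for sending a response, and the guard only exists so that the snippet also runs in a plain tclsh.

```tcl
# Hypothetical helper: wrap a body in a minimal HTML page that declares UTF-8.
proc utf8_page {title body} {
    set html "<html><head>\n"
    append html "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">\n"
    append html "<title>$title</title>\n</head><body>\n$body\n</body></html>\n"
    return $html
}

# Sample content with two Chinese characters (illustrative data only).
set page [utf8_page "Channels" "\u4e2d\u6587"]

# Inside AOLserver, send the page with the charset in the Content-Type header.
# Outside AOLserver ns_return does not exist, so the call is skipped.
if {[llength [info commands ns_return]] > 0} {
    ns_return 200 "text/html; charset=utf-8" $page
}
```

The header is the stronger signal; the meta tag mainly helps when the page is saved to disk or served by something that drops the charset.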
Tcl is essentially UTF-8, which is multi-byte. But there are so many issues
involved, you have to become something of an expert. The good news is that
AOLserver/Tcl has the tools to handle the issues.
tom jackson
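
One of those tools is Tcl's encoding command. The sketch below assumes one possible failure mode, not a confirmed diagnosis of this thread's problem: the UTF-8 bytes of a value reach Tcl as separate one-byte characters. The variable names and sample text are illustrative only.

```tcl
# Two Chinese characters, written with escapes so the script file's own
# encoding does not matter.
set original "\u4e2d\u6587"

# Simulate the assumed failure: the UTF-8 bytes arrive as six separate
# one-byte characters instead of two real characters.
set mangled [encoding convertto utf-8 $original]

# Re-interpret those bytes as UTF-8 to recover the intended characters.
set repaired [encoding convertfrom utf-8 $mangled]

puts "mangled length:  [string length $mangled]"
puts "repaired length: [string length $repaired]"
```

If a value fetched from the database has the longer length, applying encoding convertfrom utf-8 to it before output is worth trying; if it already has the shorter length, the problem is more likely on the output side.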
If you have a green-fields project with no existing database, all you
do is:
- Make sure the database is UTF-8
- Set the encoding to UTF-8 for any page returned to the client. (if
you have a form on a page and the page was set to UTF-8, the data is
submitted as UTF-8 by the browser, so no conversion needed by you)
- Make sure any files (ADP, resource files with messages, etc) are
saved as UTF-8 if they contain such data.
It then basically takes care of itself.
The only issue I ever faced was with (CSV) file uploads, where the data
needed to be extracted and put into the database. These could be in
any encoding without me knowing. In practice they only ever contained
stupid Windows encoding, so I assumed that to be the case and used
Tcl's convert functions.
Bas.
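
A rough sketch of that kind of conversion, assuming the "Windows encoding" is Windows-1252 (cp1252 in Tcl's encoding names). The helper name read_cp1252 and the sample bytes are hypothetical.

```tcl
# Hypothetical helper: read a whole file, decoding it as Windows-1252.
proc read_cp1252 {path} {
    set fh [open $path r]
    fconfigure $fh -encoding cp1252
    set text [read $fh]
    close $fh
    return $text
}

# The same conversion for bytes already in memory, such as an upload body.
# In Windows-1252, 0xE9 is e-acute and 0x80 is the euro sign.
set bytes "caf\xe9 \x80"
set text [encoding convertfrom cp1252 $bytes]

puts "characters after decoding: [string length $text]"
```

Once decoded, the text is an ordinary Tcl string and can be passed to the database driver or written to a UTF-8 page without further conversion.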
Hmmmm CSV + "stupid Windows encoding". Bas, perhaps you have just what
I need for a character set issue. I have a data file - actually
delimited by upside-down exclamation points, not commas. It comes from
a Windows box - apparently with the Windows 1252 character set. I am
trying to load that data into Oracle. I was trying to use SQLLDR to
do that but am having debugging issues. I *think* I have the correct
character set info and octal representation for ¡. But something is
funky.
It never occurred to me to try parsing this with Tcl instead. Is
there an AOLserver or straight Tcl module I should be using to parse
pseudo-CSV? Or is the answer keep it simple and just read lines and
split on ¡ with 'split'?
> It never occurred to me to try parsing this with Tcl instead. Is there an
> AOLserver or straight Tcl module I should be using to parse pseudo-CSV? Or
> is the answer keep it simple and just read lines and split on ¡ with
> 'split'?
tcllib (http://tcllib.sf.net) has a fairly robust csv package
(http://tcllib.sourceforge.net/doc/csv.html) if a straight [split
$line ¡] proves lacking.
Michael
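
A minimal sketch of the plain split approach, assuming the line has already been decoded into a Tcl string and that no field ever contains the delimiter itself. The sample record is invented; \u00a1 is the inverted exclamation mark, written as an escape so the script does not depend on its own file encoding.

```tcl
# Hypothetical record with four fields, the third one empty.
set line "1001\u00a1Caf\u00e9 Noir\u00a1\u00a112.50"

# split keeps empty fields, so the column positions stay aligned.
set fields [split $line \u00a1]

puts "field count: [llength $fields]"
foreach field $fields {
    puts "<$field>"
}
```

If fields can be quoted or can contain the delimiter, a plain split is not enough and the tcllib csv package is the safer route.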
To make absolutely, 100% sure you have the correct character to
separate on, you could edit the file to just contain that one and read
it in.
Bas.
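
One way to do that check from Tcl is to dump the raw bytes as hex. This is only a sketch; the helper name hex_bytes is hypothetical. For reference, the inverted exclamation mark is the single byte A1 (octal 241) in Windows-1252 and the two bytes C2 A1 in UTF-8.

```tcl
# Hypothetical helper: return the raw bytes of a file as a hex string.
proc hex_bytes {path} {
    set fh [open $path r]
    fconfigure $fh -translation binary
    set data [read $fh]
    close $fh
    binary scan $data H* hex
    return $hex
}

# Show what the delimiter looks like in each candidate encoding,
# for comparison with the output of hex_bytes on the one-character file.
foreach enc {cp1252 utf-8} {
    binary scan [encoding convertto $enc \u00a1] H* hex
    puts "$enc: $hex"
}
```

Comparing the dump of the one-character file against these two values shows which encoding the file really uses, and therefore which byte value a loader has to be told about.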
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Once that was added the pages were rendered correctly.
Respectfully,
Darren Ferguson
Rajesh Nair wrote:
> Oops, forgot to attach the ADP
>
> -- Rajesh Nair
>
> Rajesh Nair wrote:
>> Bas,
>>
>> Apologies for the delayed response!
>>
>> Our setup is a complex set of components, with a Java component
>> inserting records
>> and a Tcl-based RESTful service fetching the records back.
>> I have isolated this issue to a Tcl script and am sending it out so
>> that others can replicate the issue.
>>
>> 1. My MySQL version is 5.1.22
>> 2. create a table
>> /*CREATE TABLE multibytetest (value VARCHAR(255) CHARACTER
>> SET utf8)*/
>> 3. Copy the ADP into your AOLserver installation
>> and run it. You will need to modify it to change the
>> database name
>>
>> The ADP presents you with an HTML form that lets you set a string, which is
>> inserted into the multibytetest table
>> using the SQL
>> /* INSERT INTO multibytetest (value) VALUES (_utf8'$value')*/
>> The idea of prefixing the value with _utf8 comes from this blog post by Dossy:
>> <http://dossy.org/archives/000218.html>
>> I am using the MySQL Query Browser to check whether the characters inserted
>> into the database are correct. The Chinese characters get inserted fine.
>>
>> The ADP also retrieves the same value and renders it on the HTML
>> page along with the Tcl string that was inserted.
>> The Content-Type is set to "text/html; charset=utf-8", as is the ADP
>> mimetype.
>>
>> Yet the record fetched back is not rendered correctly in the browser.
>>
>> Here is a screenshot of a test run
>>
>> Any ideas why I am not able to get the multibyte record back correctly?
>>
>> A couple of points:
>>
>> 1. The records are rendered fine if the column type is BLOB instead
>> of VARCHAR. But I don't think we should need to change the datatype
>> of the table in order to store multibyte characters. Java is able to
>> insert and fetch the records correctly.
>> 2. If the INSERT SQL is modified to simply
>> /* INSERT INTO multibytetest (value) VALUES ('$value')*/