cometd and utf-8 characters

Prutkar

Nov 4, 2009, 12:51:50 PM
to cometd-users
Hello,

I have run into a problem I did not find a solution to yet, so I would
like to turn to this group, maybe somebody knows how to proceed.

My page is utf-8 charset:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-transitional.dtd">
<html>
<head>
<title>Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
...

I use the default chat demo under Jetty. When I submit the message
"Hääppöne", everything works just fine, i.e.

1. ===============
var message = dojo.byId('phrase').value;
dojox.cometd.publish("/chat/room", {
    text: message
});

The variable 'message' is filled from the <input id="phrase" name="phrase">
field, which contains the value "Hääppöne".


Observing the transport in Firebug, I can see "Hääppöne" being sent. My
Jetty chat server code receives this string as "Hääppöne" when I
print it out using Log.info("Text: " + data.get("text").toString());
and if I write it to a MySQL database (utf-8 charset), I can see it in
the database table as "Hääppöne".

2. ===============
But if I have a user with the username "Hääppöne" and he wants to
subscribe/publish to a channel using

// dynamically generated by PHP code -> echo $username;
// $username is filled from another form where the user enters the
// username (all utf-8)
var username = "Hääppöne";
dojox.cometd.subscribe("/chat/room", chat, "_chat", {
    user: username
});
dojox.cometd.publish("/chat/room", {
    join: true,
    user: username
});

Already in the transport in Firebug I can see some strange characters in
place of ä and ö. My server-side printout prints "H��ppďż˝ne", and the
write to the database produces "H??pp?ne".

3. ===============
I tried to adjust number 2 like this, with no success:
user: encodeURI(username)


The difference is
var message = dojo.byId('phrase').value;
versus
var username = "Hääppöne";


Is anybody able to advise on this? Thank you.

William la Forge

Nov 5, 2009, 3:42:23 AM
to cometd...@googlegroups.com
I've got UTF8 working great, but it was an effort.

Cometd was easy--it just works. Forms are a lot trickier. The meta tag is only one of several places that need updating, and in fact it may not even be helpful--we haven't seen a need for it yet.

Now we're using jsp, so this line you will need to translate into html:
<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"%>

In our servlet we have this:
req.setCharacterEncoding("UTF-8");
resp.setCharacterEncoding("UTF-8");
which for us was the final addition which made it all work.

Our test data is Arabic. We are doing some impossible things with internationalization (utf-8 property files), but the format method works fine and we only needed a small fix (which can never work--but does) when using the localize method. :-)

We've also managed rtl. Haven't gotten to date/time yet but looking at this: http://joda-time.sourceforge.net/

You'll find all our code (LGPL) here: http://agilewiki.svn.sourceforge.net/viewvc/agilewiki/

Bill

Simone Bordet

Nov 5, 2009, 3:54:49 AM
to cometd...@googlegroups.com
Hi,

On Thu, Nov 5, 2009 at 09:42, William la Forge <lafo...@gmail.com> wrote:
> I've got UTF8 working great, but it was an effort.
> Cometd was easy--it just works. Forms are a lot trickier. meta is one of
> only several places that need updating, and in fact may not even be
> helpful--we haven't seen a need for it yet.
> Now we're using jsp, so this line you will need to translate into html:
> <%@ page language="java" contentType="text/html; charset=UTF-8"
>     pageEncoding="UTF-8"%>

Just as an addendum, see here:
http://bordet.blogspot.com/2007/09/jsp-page-encoding.html

> In our servlet we have this:
> req.setCharacterEncoding("UTF-8");
> resp.setCharacterEncoding("UTF-8");
> which for us was the final addition which made it all work.
> Our test data is Arabic. We are doing some impossible things with
> internationalization, (utf-8 property files)

Again for the record:
http://bordet.blogspot.com/2007/01/utf-8-handling-for-resourcebundle-and.html

Simon
--
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Simone Bordet

Nov 5, 2009, 4:00:50 AM
to cometd...@googlegroups.com
Hi,

On Wed, Nov 4, 2009 at 18:51, Prutkar <peter...@gmail.com> wrote:
> 2. ===============
> But if I have a user with the username "Hääppöne" and he wants to
> subscribe/publish to a channel using
>
> // dynamically generated by PHP code -> echo $username;
> // $username is filled from another form where the user enters the
> // username (all utf-8)
> var username = "Hääppöne";
> dojox.cometd.subscribe("/chat/room", chat, "_chat", {
>     user: username
> });
> dojox.cometd.publish("/chat/room", {
>     join: true,
>     user: username
> });
> I can see already in transport in FireBug some strange chars in places
> of ä and ö, my server side printout prints "H��ppďż˝ne" and write
> to the database creates "H??pp?ne"

So if I understand correctly, you generate the username on the server side,
send it to the client, and then send it back via Cometd, right?
If that's correct, then I think that the page is sent to the client
with the wrong encoding.

Even if you put in the page the meta tag with charset=utf-8, it does
not mean that the page is sent using utf-8 encoding.
I'd check that first.
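
To illustrate the kind of mismatch meant here, below is a minimal Node.js sketch. It assumes (this is a guess, not something confirmed so far in the thread) that the server writes the username as ISO-8859-1 bytes into a page the browser decodes as UTF-8. The variable names are made up for the example.

```javascript
// Sketch only: simulates Latin-1 bytes being decoded as UTF-8.
// Assumption: the username reaches the page as ISO-8859-1 bytes.
const original = "H\u00e4\u00e4pp\u00f6ne"; // "Hääppöne"

// What a Latin-1 source puts on the wire: one byte per character.
const latin1Bytes = Buffer.from(original, "latin1");

// What a browser does with those bytes when it treats the page as UTF-8.
const seenByBrowser = new TextDecoder("utf-8").decode(latin1Bytes);

// For comparison: the same text encoded as real UTF-8 survives the trip.
const utf8Bytes = Buffer.from(original, "utf8");
const roundTripped = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(latin1Bytes.length, utf8Bytes.length); // 8 11
console.log(JSON.stringify(seenByBrowser)); // "H" + two U+FFFD + "pp" + one U+FFFD + "ne"
console.log(roundTripped === original); // true
```

Each Latin-1 byte for ä and ö is an invalid UTF-8 sequence, so the decoder replaces it with U+FFFD. That gives three damaged characters, the same pattern as in "H??pp?ne", and the damage happens before Cometd ever sees the string.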

> 3. ===============
> I tried to adjust number 2 like this, with no success:
> user: encodeURI(username)

encodeURI does a different thing, it is of no use here.
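
A small sketch of why, runnable in Node.js or a browser console. The "damaged" string is an assumed stand-in for a username whose ä and ö have already been replaced by U+FFFD before the script runs.

```javascript
// Sketch only: encodeURI percent-encodes the UTF-8 bytes of whatever
// string it is given; it cannot restore characters that are already lost.
const intactName = "H\u00e4\u00e4pp\u00f6ne";   // "Hääppöne"
const damagedName = "H\uFFFD\uFFFDpp\uFFFDne";  // assumed: already mis-decoded

const encodedIntact = encodeURI(intactName);
const encodedDamaged = encodeURI(damagedName);

console.log(encodedIntact);  // H%C3%A4%C3%A4pp%C3%B6ne
console.log(encodedDamaged); // H%EF%BF%BD%EF%BF%BDpp%EF%BF%BDne

// Decoding just gives the damaged string back.
console.log(decodeURI(encodedDamaged) === damagedName); // true
```

The percent-escaping is lossless in both directions, so if the string is already wrong when the script starts, it is still wrong after encodeURI.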

Prutkar

Nov 5, 2009, 11:43:51 AM
to cometd-users
I found a solution: the PHP-generated page needs to call the utf8_encode
function, like this:

var username = "<?php echo utf8_encode($username); ?>";
dojox.cometd.subscribe("/chat/room", chat, "_chat", {
    user: username
});

Now it works. Thank you guys for your comments and time.
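
For anyone finding this later: PHP's utf8_encode converts ISO-8859-1 to UTF-8, so the fact that it helped suggests $username was stored as ISO-8859-1 (an inference, I have not checked the source of the data). Here is a rough Node.js sketch of the same conversion. latin1ToUtf8 is a hypothetical helper written for this example, not part of any library.

```javascript
// Sketch only: mirrors the ISO-8859-1 -> UTF-8 conversion that PHP's
// utf8_encode performs. latin1ToUtf8 is a hypothetical helper.
function latin1ToUtf8(bytes) {
  // Read each byte as one Latin-1 character, then re-encode as UTF-8.
  const text = Buffer.from(bytes).toString("latin1");
  return Buffer.from(text, "utf8");
}

// "Hääppöne" as ISO-8859-1 bytes (assumed form of the stored username).
const latin1Name = Buffer.from([0x48, 0xe4, 0xe4, 0x70, 0x70, 0xf6, 0x6e, 0x65]);

const fixedOnce = latin1ToUtf8(latin1Name);
console.log(new TextDecoder("utf-8").decode(fixedOnce)); // Hääppöne

// Converting data that is already UTF-8 double-encodes it.
const fixedTwice = latin1ToUtf8(fixedOnce);
console.log(new TextDecoder("utf-8").decode(fixedTwice)); // HÃ¤Ã¤ppÃ¶ne
```

The second call shows the catch: this conversion is only right when the input really is ISO-8859-1. Applied to a value that is already UTF-8, it produces a different kind of garbage.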

Peter