non-ASCII is NOT supported?

pantechds806

Sep 14, 2010, 10:59:26 PM
to android-c2dm
Hi,

I'm testing sending C2DM push messages for a messaging service app.

When I send ASCII characters, it works well: my phone receives the
string that I sent from my server.

But when I send non-ASCII characters (e.g. Korean, Chinese),
it does NOT work well: my phone receives an incomplete string.

For example:
sent ... abc("korean")def("chinese")ghi
received ... abcdefghi

I don't know which component drops the non-ASCII characters.

If you know anything about this, please give me a hint. ;)

Costin Manolache

Sep 15, 2010, 12:19:36 AM
to androi...@googlegroups.com
Are you sending UTF-8? Can you paste the string?

Costin

pantechds806

Sep 15, 2010, 1:54:55 AM
to android-c2dm
Thanks for your attention.

I saved the message to a file on my server at the time the C2DM message was built.

000001f0h: 72 00 65 00 67 00 69 00 73 00 ; r.e.g.i.s.
00000200h: 74 00 72 00 61 00 74 00 69 00 6F 00 6E 00 5F 00 ; t.r.a.t.i.o.n._.
00000210h: 69 00 64 00 3D 00 41 00 50 00 41 00 39 00 31 00 ; i.d.=.A.P.A.9.1.
00000220h: 62 00 45 00 4B 00 6E 00 62 00 46 00 49 00 47 00 ; b.E.K.n.b.F.I.G.
00000230h: 6C 00 6E 00 58 00 38 00 44 00 48 00 78 00 41 00 ; l.n.X.8.D.H.x.A.
00000240h: 2D 00 6E 00 79 00 42 00 74 00 62 00 39 00 69 00 ; -.n.y.B.t.b.9.i.
00000250h: 59 00 6B 00 63 00 78 00 51 00 6C 00 5A 00 34 00 ; Y.k.c.x.Q.l.Z.4.
00000260h: 6B 00 36 00 79 00 76 00 38 00 74 00 39 00 4A 00 ; k.6.y.v.8.t.9.J.
00000270h: 75 00 45 00 48 00 4D 00 45 00 45 00 33 00 51 00 ; u.E.H.M.E.E.3.Q.
00000280h: 39 00 6B 00 4C 00 6F 00 76 00 58 00 67 00 37 00 ; 9.k.L.o.v.X.g.7.
00000290h: 5F 00 4D 00 5F 00 38 00 47 00 65 00 4F 00 44 00 ; _.M._.8.G.e.O.D.
000002a0h: 70 00 32 00 4C 00 48 00 57 00 66 00 73 00 68 00 ; p.2.L.H.W.f.s.h.
000002b0h: 78 00 35 00 45 00 51 00 67 00 59 00 62 00 67 00 ; x.5.E.Q.g.Y.b.g.
000002c0h: 79 00 5A 00 44 00 41 00 63 00 41 00 4A 00 4F 00 ; y.Z.D.A.c.A.J.O.
000002d0h: 30 00 56 00 43 00 31 00 6B 00 56 00 75 00 63 00 ; 0.V.C.1.k.V.u.c.
000002e0h: 73 00 5F 00 38 00 54 00 68 00 69 00 4D 00 54 00 ; s._.8.T.h.i.M.T.
000002f0h: 42 00 36 00 72 00 2D 00 47 00 51 00 59 00 48 00 ; B.6.r.-.G.Q.Y.H.
00000300h: 70 00 6B 00 26 00 63 00 6F 00 6C 00 6C 00 61 00 ; p.k.&.c.o.l.l.a.
00000310h: 70 00 73 00 65 00 5F 00 6B 00 65 00 79 00 3D 00 ; p.s.e._.k.e.y.=.
00000320h: 31 00 32 00 38 00 34 00 35 00 32 00 38 00 37 00 ; 1.2.8.4.5.2.8.7.
00000330h: 30 00 36 00 2E 00 39 00 38 00 31 00 36 00 26 00 ; 0.6...9.8.1.6.&.
00000340h: 64 00 61 00 74 00 61 00 2E 00 70 00 6B 00 67 00 ; d.a.t.a...p.k.g.
00000350h: 3D 00 6D 00 75 00 6C 00 74 00 69 00 6D 00 73 00 ; =.m.u.l.t.i.m.s.
00000360h: 67 00 26 00 64 00 61 00 74 00 61 00 2E 00 73 00 ; g.&.d.a.t.a...s.
00000370h: 74 00 72 00 3D 00 61 00 62 00 63 00 5C D5 00 AE ; t.r.=.a.b.c.\??
00000380h: 64 00 65 00 66 00 26 00 64 00 61 00 74 00 61 00 ; d.e.f.&.d.a.t.a.
00000390h: 2E 00 64 00 62 00 6E 00 61 00 6D 00 65 00 3D 00 ; ..d.b.n.a.m.e.=.
000003a0h: 6D 00 79 00 64 00 62 00 26 00 64 00 61 00 74 00 ; m.y.d.b.&.d.a.t.
000003b0h: 61 00 2E 00 6D 00 64 00 64 00 3D 00 6D 00 79 00 ; a...m.d.d.=.m.y.
000003c0h: 68 00 6F 00 6D 00 65 00 26 00 64 00 61 00 74 00 ; h.o.m.e.&.d.a.t.
000003d0h: 61 00 2E 00 6D 00 64 00 64 00 6F 00 70 00 74 00 ; a...m.d.d.o.p.t.
000003e0h: 69 00 6F 00 6E 00 3D 00 79 00 65 00 73 00 ; i.o.n.=.y.e.s.

Take a look at these lines:
00000370h: 74 00 72 00 3D 00 61 00 62 00 63 00 5C D5 00 AE ; t.r.=.a.b.c.\??
00000380h: 64 00 65 00 66 00 26 00 64 00 61 00 74 00 61 00 ; d.e.f.&.d.a.t.a.

The message I sent is "abc(Korean word)def".
Its byte representation is "61 00 62 00 63 00 5C D5 00 AE 64 00 65 00 66 00".

You asked whether it was UTF-8.
'5C D5 00 AE' is NOT UTF-8. It is Unicode (UTF-16, little-endian).
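
For reference, the distinction can be checked with a few lines of PHP. This is only a minimal sketch: it assumes the iconv extension is available, and it assumes the Korean word is the pair of code points U+D55C U+AE00, which is inferred from reading the dumped bytes "5C D5 00 AE" as little-endian 16-bit units rather than stated anywhere in the thread.

```php
<?php
// Assumption: the Korean word in the dump is U+D55C U+AE00, inferred from
// the bytes "5C D5 00 AE" read as little-endian 16-bit units.
$utf16le = "\x5C\xD5\x00\xAE";

// iconv() converts between character encodings (needs the iconv extension).
$utf8 = iconv('UTF-16LE', 'UTF-8', $utf16le);

echo strtoupper(bin2hex($utf16le)), "\n"; // 5CD500AE     (4 bytes, UTF-16LE)
echo strtoupper(bin2hex($utf8)), "\n";    // ED959CEAB880 (6 bytes, UTF-8)
```

The same two syllables therefore occupy two bytes each in UTF-16 and three bytes each in UTF-8.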


So, I changed my server. Take a look below.

00000000h: FF FE 72 00 65 00 67 00 69 00 73 00 74 00 72 00 ; ?.e.g.i.s.t.r.
00000010h: 61 00 74 00 69 00 6F 00 6E 00 5F 00 69 00 64 00 ; a.t.i.o.n._.i.d.
00000020h: 3D 00 41 00 50 00 41 00 39 00 31 00 62 00 45 00 ; =.A.P.A.9.1.b.E.
00000030h: 4B 00 6E 00 62 00 46 00 49 00 47 00 6C 00 6E 00 ; K.n.b.F.I.G.l.n.
00000040h: 58 00 38 00 44 00 48 00 78 00 41 00 2D 00 6E 00 ; X.8.D.H.x.A.-.n.
00000050h: 79 00 42 00 74 00 62 00 39 00 69 00 59 00 6B 00 ; y.B.t.b.9.i.Y.k.
00000060h: 63 00 78 00 51 00 6C 00 5A 00 34 00 6B 00 36 00 ; c.x.Q.l.Z.4.k.6.
00000070h: 79 00 76 00 38 00 74 00 39 00 4A 00 75 00 45 00 ; y.v.8.t.9.J.u.E.
00000080h: 48 00 4D 00 45 00 45 00 33 00 51 00 39 00 6B 00 ; H.M.E.E.3.Q.9.k.
00000090h: 4C 00 6F 00 76 00 58 00 67 00 37 00 5F 00 4D 00 ; L.o.v.X.g.7._.M.
000000a0h: 5F 00 38 00 47 00 65 00 4F 00 44 00 70 00 32 00 ; _.8.G.e.O.D.p.2.
000000b0h: 4C 00 48 00 57 00 66 00 73 00 68 00 78 00 35 00 ; L.H.W.f.s.h.x.5.
000000c0h: 45 00 51 00 67 00 59 00 62 00 67 00 79 00 5A 00 ; E.Q.g.Y.b.g.y.Z.
000000d0h: 44 00 41 00 63 00 41 00 4A 00 4F 00 30 00 56 00 ; D.A.c.A.J.O.0.V.
000000e0h: 43 00 31 00 6B 00 56 00 75 00 63 00 73 00 5F 00 ; C.1.k.V.u.c.s._.
000000f0h: 38 00 54 00 68 00 69 00 4D 00 54 00 42 00 36 00 ; 8.T.h.i.M.T.B.6.
00000100h: 72 00 2D 00 47 00 51 00 59 00 48 00 70 00 6B 00 ; r.-.G.Q.Y.H.p.k.
00000110h: 26 00 63 00 6F 00 6C 00 6C 00 61 00 70 00 73 00 ; &.c.o.l.l.a.p.s.
00000120h: 65 00 5F 00 6B 00 65 00 79 00 3D 00 31 00 32 00 ; e._.k.e.y.=.1.2.
00000130h: 38 00 34 00 35 00 32 00 38 00 32 00 36 00 36 00 ; 8.4.5.2.8.2.6.6.
00000140h: 2E 00 32 00 39 00 31 00 38 00 26 00 64 00 61 00 ; ..2.9.1.8.&.d.a.
00000150h: 74 00 61 00 2E 00 70 00 6B 00 67 00 3D 00 6D 00 ; t.a...p.k.g.=.m.
00000160h: 75 00 6C 00 74 00 69 00 6D 00 73 00 67 00 26 00 ; u.l.t.i.m.s.g.&.
00000170h: 64 00 61 00 74 00 61 00 2E 00 73 00 74 00 72 00 ; d.a.t.a...s.t.r.
00000180h: 3D 00 61 62 63 ED 95 9C EA B8 80 64 65 66 26 00 ; =.abc?쒓?def&.
00000190h: 64 00 61 00 74 00 61 00 2E 00 64 00 62 00 6E 00 ; d.a.t.a...d.b.n.
000001a0h: 61 00 6D 00 65 00 3D 00 6D 00 79 00 64 00 62 00 ; a.m.e.=.m.y.d.b.
000001b0h: 26 00 64 00 61 00 74 00 61 00 2E 00 6D 00 64 00 ; &.d.a.t.a...m.d.
000001c0h: 64 00 3D 00 6D 00 79 00 68 00 6F 00 6D 00 65 00 ; d.=.m.y.h.o.m.e.
000001d0h: 26 00 64 00 61 00 74 00 61 00 2E 00 6D 00 64 00 ; &.d.a.t.a...m.d.
000001e0h: 64 00 6F 00 70 00 74 00 69 00 6F 00 6E 00 3D 00 ; d.o.p.t.i.o.n.=.
000001f0h: 79 00 65 00 73 00 ; y.e.s.

Take a look at this line:
00000180h: 3D 00 61 62 63 ED 95 9C EA B8 80 64 65 66 26 00 ; =.abc?쒓?def&.

The message I sent is "abc(Korean word)def" again.
Its byte representation is "61 62 63 ED 95 9C EA B8 80 64 65 66",
and 'ED 95 9C EA B8 80' IS UTF-8.

But after changing my server, I can NOT receive even 'abcdef'. ;(
(In both cases, i.e. before and after changing the server,
I can extract the 'pkg' value CORRECTLY with getExtras/get("pkg").)

What should I do?

pantechds806

Sep 15, 2010, 2:12:18 AM
to android-c2dm
Sorry, I quoted the log file INCORRECTLY.

Please wait a moment; I will post a corrected reply.

pantechds806

Sep 15, 2010, 3:02:37 AM
to android-c2dm
I captured the log file again. Take a look below:


Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000 72 65 67 69 73 74 72 61 74 69 6F 6E 5F 69 64 3D registration_id=
00000010 41 50 41 39 31 62 45 4B 6E 62 46 49 47 6C 6E 58 APA91bEKnbFIGlnX
00000020 38 44 48 78 41 2D 6E 79 42 74 62 39 69 59 6B 63 8DHxA-nyBtb9iYkc
00000030 78 51 6C 5A 34 6B 36 79 76 38 74 39 4A 75 45 48 xQlZ4k6yv8t9JuEH
00000040 4D 45 45 33 51 39 6B 4C 6F 76 58 67 37 5F 4D 5F MEE3Q9kLovXg7_M_
00000050 38 47 65 4F 44 70 32 4C 48 57 66 73 68 78 35 45 8GeODp2LHWfshx5E
00000060 51 67 59 62 67 79 5A 44 41 63 41 4A 4F 30 56 43 QgYbgyZDAcAJO0VC
00000070 31 6B 56 75 63 73 5F 38 54 68 69 4D 54 42 36 72 1kVucs_8ThiMTB6r
00000080 2D 47 51 59 48 70 6B 26 63 6F 6C 6C 61 70 73 65 -GQYHpk&collapse
00000090 5F 6B 65 79 3D 31 32 38 34 35 33 32 37 31 36 2E _key=1284532716.
000000A0 33 35 39 38 26 64 61 74 61 2E 70 6B 67 3D 6D 75 3598&data.pkg=mu
000000B0 6C 74 69 71 71 71 26 64 61 74 61 2E 73 74 72 3D ltiqqq&data.str=
000000C0 61 62 63 ED 95 9C EA B8 80 64 65 66 26 64 61 74 abci?œe¸€def&dat
000000D0 61 2E 64 62 6E 61 6D 65 3D 61 61 26 64 61 74 61 a.dbname=aa&data
000000E0 2E 6D 64 64 3D 61 61 26 64 61 74 61 2E 6D 64 64 .mdd=aa&data.mdd
000000F0 6F 70 74 69 6F 6E 3D 79 65 73 option=yes


Take a look at this line:
000000C0 61 62 63 ED 95 9C EA B8 80 64 65 66 26 64 61 74 abci?œe¸€def&dat

The message I sent is "abc(Korean word)def".
Its byte representation is "61 62 63 ED 95 9C EA B8 80 64 65 66".
You asked whether it was UTF-8.
'ED 95 9C EA B8 80' is in UTF-8 format.

I can extract the 'pkg' value CORRECTLY with getExtras/get("pkg"),
and the 'dbname' value CORRECTLY with getExtras/get("dbname"), too
(the same goes for 'mdd' and 'mddoption').
Only the 'str' value is extracted incorrectly.


I hope to get a hint from you. ;)

Costin Manolache

Sep 15, 2010, 3:51:56 PM
to androi...@googlegroups.com
Can you try again with Content-Type set to "application/x-www-form-urlencoded;charset=UTF8"?

I'll file a bug for tracking.

Costin


pantechds806

Sep 15, 2010, 8:31:13 PM
to android-c2dm
Thanks for your reply.

Following your suggestion, I tried again,
but the result is the same.
(i.e.,
sent ... abc("korean")def
received ... abcdef)


Here is my server code:

......
$push_post_data = "registration_id=$got_regid&collapse_key=$push_collapse"
    . "&data.pkg=$got_pkg&data.str=$got_str&data.dbname=$got_dbname"
    . "&data.mdd=$got_mdd&data.mddoption=$got_mddoption";
$push_setheader = array(
    "Content-type: application/x-www-form-urlencoded;charset=UTF-8",
    "Content-Length: $push_post_length",
    "Authorization:GoogleLogin auth=$got_auth",
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $push_url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $push_setheader);
curl_setopt($ch, CURLOPT_POSTFIELDS, $push_post_data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);
...


I hope to find a solution as soon as possible. ;)

pantechds806

Sep 16, 2010, 1:56:39 AM
to android-c2dm
Thanks, now it works correctly.

In the previous attempt, I SIMPLY added the HTTP header (~x-www-form-urlencoded~).
I did NOT ENCODE the message that I wanted to send to the C2DM server. ;)

After applying PHP's 'urlencode()' function,
the push message body(?) looks like this:

Offset(d) 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
...
00000176 6E 64 79 6D 61 6E 26 64 61 74 61 2E 73 74 72 3D ndyman&data.str=
00000192 61 62 63 25 45 42 25 41 44 25 39 30 2B 25 45 43 abc%EB%AD%90+%EC
00000208 25 39 44 25 42 34 25 45 42 25 39 45 25 39 38 25 %9D%B4%EB%9E%98%
00000224 33 46 64 65 66 26 64 61 74 61 2E 64 62 6E 61 6D 3Fdef&data.dbnam
...

And this works correctly.
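
The fix described above can be sketched roughly as follows. This is an illustration rather than the poster's actual code: the helper name `build_c2dm_body` and the placeholder field values are hypothetical, and only a subset of the fields from the earlier snippet is shown.

```php
<?php
// Hypothetical helper (not from the thread): percent-encode every key and
// value so the body is valid application/x-www-form-urlencoded data.
function build_c2dm_body(array $fields): string
{
    $pairs = array();
    foreach ($fields as $name => $value) {
        $pairs[] = urlencode($name) . '=' . urlencode($value);
    }
    return implode('&', $pairs);
}

// Placeholder values; the real registration ID and collapse key are elided.
$body = build_c2dm_body(array(
    'registration_id' => 'REGISTRATION_ID',
    'collapse_key'    => 'COLLAPSE_KEY',
    'data.str'        => "abc\xED\x95\x9C\xEA\xB8\x80def", // raw UTF-8 bytes
));

echo $body, "\n";
// registration_id=REGISTRATION_ID&collapse_key=COLLAPSE_KEY&data.str=abc%ED%95%9C%EA%B8%80def
```

The resulting string is what would be passed to CURLOPT_POSTFIELDS, with strlen($body) as the Content-Length. PHP's built-in http_build_query() produces the same kind of encoding for an array of fields and may be a simpler choice.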

Now I have a new question.
Encoded this way, the push message body is used very WASTEFULLY.
For example, one Korean syllable takes 3 bytes in UTF-8.
After urlencode, it takes 9 bytes. In other words, a Korean word of
just 3 syllables (6 bytes in UTF-16) will occupy 27 bytes in the
C2DM push message body.

As is known, the C2DM message body length limit is 1024 bytes, so
this is a serious waste.

Is there a more efficient way?
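
The arithmetic in this question can be reproduced with a short PHP sketch. It assumes the iconv extension is available, uses one syllable (U+D55C) repeated three times purely for illustration, and assumes, as the post does, that the 1024-byte limit applies to the encoded body.

```php
<?php
// Three Korean syllables as raw UTF-8 bytes (the syllable U+D55C repeated
// three times, purely for illustration).
$utf8 = str_repeat("\xED\x95\x9C", 3);

$utf16   = iconv('UTF-8', 'UTF-16LE', $utf8); // needs the iconv extension
$encoded = urlencode($utf8);

echo strlen($utf16), "\n";   // 6  bytes in UTF-16
echo strlen($utf8), "\n";    // 9  bytes in UTF-8
echo strlen($encoded), "\n"; // 27 bytes after urlencode()

// Syllables that fit in 1024 bytes at 9 encoded bytes each, ignoring the
// space taken by every other field in the body.
echo intdiv(1024, 9), "\n";  // 113
```

Each UTF-8 byte outside the unreserved set becomes a three-character "%XX" sequence, which is where the factor of three comes from.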