UTF encodings: 16 or 8?

bugz...@gmail.com

May 21, 2009, 3:35:56 PM
to TextMagic SMS Gateway API
Hello

I am unsure about the following: which encoding should be used when
the text is written in a non-English alphabet?

http://api.textmagic.com/https-api states that UTF-8 should be used,
and the examples there use UTF-8.

http://api.textmagic.com/https-api/supported-character-sets mentions
UTF-16 instead.
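The choice matters because the same letters produce different bytes in each encoding. Below is a minimal Python sketch that percent-encodes the first three Cyrillic letters of the test message used in this thread under both encodings. The `percent_encode` helper is hypothetical, not part of any TextMagic library. The sketch does not settle which encoding the gateway expects. It only shows that the %04%xx bytes in the requests below are UTF-16 (big-endian), not UTF-8.

```python
def percent_encode(data: bytes) -> str:
    """Percent-encode every byte as %XX (uppercase hex), like the requests in this thread."""
    return "".join(f"%{b:02X}" for b in data)


word = "Сво"  # first three Cyrillic letters of the test message
utf8 = percent_encode(word.encode("utf-8"))          # %D0%A1%D0%B2%D0%BE
utf16be = percent_encode(word.encode("utf-16-be"))   # %04%21%04%32%04%3E (no BOM)
print("UTF-8:   ", utf8)
print("UTF-16BE:", utf16be)
```

Each letter takes two bytes in both encodings here, but the bytes differ, so the receiver has to be told which encoding was used.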

I tried UTF-16:
username=xxxxx&password=yyyy&cmd=send&text=%FE%FF%04%21%04%32%04%3E
%04%31%04%3E%04%34%04%3D%04%30%04%4F+%FE%FF%04%4D%04%3D
%04%46%04%38%04%3A%04%3B%04%3E%04%3F
%04%35%04%34%04%38%04%4F&phone=9991234567&max_length=3&unicode=1

and got the following response:
{"message_id":{"8658465":"9991234567"},"sent_text":"","parts_count":1}

This is the wrong response, isn't it? The sent_text field is empty.

Dawie Strauss

May 22, 2009, 5:17:56 AM
to textma...@googlegroups.com
Hi Rafael,

I found your name in the oDesk "Team Room". I am working on the Python API wrapper. Pleased to meet you :-)

The following request works:
https://www.textmagic.com/app/api?cmd=send&username=xxx&password=yyy&text=%04%21%04%32%04%3E+%04%31%04%3E%04%34%04%3D%04%30%04%4F+%04%4D%04%3D+%04%46%04%38%04%3A%04%3B%04%3E%04%3F+%04%35%04%34%04%38%04%4F&phone=9991357903&unicode=1

It returns:
{"message_id":{"8660384":"9991357903"},"sent_text":"\u0004!\u00042\u0004> \u00041\u0004>\u00044\u0004=\u00040\u0004O \u0004M\u0004= \u0004F\u00048\u0004:\u0004;\u0004>\u0004? \u00045\u00044\u00048\u0004O","parts_count":1}
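The text parameter of the working request can be reproduced with a short Python sketch. Assumptions: `encode_word` and `encode_text` are hypothetical helpers, not part of the Python API wrapper mentioned above. Each word is encoded with Python's `utf-16-be` codec, which writes big-endian code units and, unlike the plain `utf-16` codec, emits no byte-order mark. Words are joined with "+" exactly as in the request above. Whether the gateway would also accept the space encoded as UTF-16 (%00%20) is not established here.

```python
def encode_word(word: str) -> str:
    """Percent-encode the UTF-16BE bytes of one word; utf-16-be adds no BOM."""
    return "".join(f"%{b:02X}" for b in word.encode("utf-16-be"))


def encode_text(text: str) -> str:
    """Build the text= value: words encoded separately and joined with '+'.

    Joining with '+' mirrors the working request above; it is an assumption
    that the gateway expects spaces this way rather than as %00%20.
    """
    return "+".join(encode_word(w) for w in text.split(" "))


message = "Сво бодная эн циклоп едия"  # same word breaks as the working request
param = encode_text(message)
assert "%FE%FF" not in param  # no byte-order mark anywhere in the value
print(param)
```

The printed value matches the text= field of the working request character for character, including the absence of any %FE%FF.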

The problem is that the %FE%FF sequence must not be submitted as part of the Unicode string. This Unicode character (U+FEFF) is called the byte-order mark, or BOM. You can read more about it at http://en.wikipedia.org/wiki/Byte-order_mark

In short, the BOM is placed at the start of a Unicode text file to (a) indicate that the file is Unicode and (b) indicate its byte order.
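Point (b) can be made concrete with a minimal Python sketch that reads a leading BOM and strips it. It uses the `BOM_UTF16_BE` (FE FF) and `BOM_UTF16_LE` (FF FE) constants from Python's standard `codecs` module; `split_utf16_bom` is a hypothetical helper for illustration, not something the gateway is known to do.

```python
import codecs
from typing import Optional, Tuple


def split_utf16_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (byte order, payload) for UTF-16 data.

    A leading BOM is consumed and reported as "big" (FE FF) or
    "little" (FF FE); without a BOM the order is None and data is unchanged.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big", data[2:]
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little", data[2:]
    return None, data


# First bytes of the original request: a BOM, then UTF-16BE "Сво".
order, payload = split_utf16_bom(bytes.fromhex("FEFF04210432043E"))
print(order, payload.decode("utf-16-be"))  # big Сво
```

A UTF-16 encoder that writes a BOM, such as Python's plain `utf-16` codec (which uses the machine's native byte order), is one way such bytes can end up in a request; the `utf-16-be` and `utf-16-le` codecs do not write one.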

Dawie Strauss