1. The API docs talk about a maximum of 160 (recommended 140)
*characters*, but as far as my tests go, twit seems to count 140
*bytes* - which of course are not the same thing in UTF-8; when I
use accented, typographical or Japanese characters, the "recent" page
truncates updates at far fewer than 140 characters.
If this is intended, I think the documentation should be clearer.
2. A pair of characters is converted to HTML entities: '<' and
'>'. What's weird is that the counting algorithm counts the entity
size, not the character size, so that '<>' is an update with length
8 ("&lt;&gt;"). Curiously, '&' is not converted to "&amp;".
So to count characters in twit.el I'm doing this: 1) expand '<' and
'>' into their HTML entities and 2) count the byte size of the
resulting string, encoded as UTF-8. Is that correct? Sounds hackish to
me, but as far as I tested, it gives me exact results on Twitter.com.
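
For what it's worth, the two steps above can be sketched like this. It is only an illustration in Python, not the actual twit.el code; `twitter_length` is a made-up name, and it assumes the server counts exactly the way my tests suggest.

```python
def twitter_length(text: str) -> int:
    """Length of an update as the server seems to count it (assumption).

    Step 1: expand '<' and '>' into their HTML entities.
            '&' is left alone, since it did not appear to be converted.
    Step 2: count the bytes of the result encoded as UTF-8.
    """
    escaped = text.replace("<", "&lt;").replace(">", "&gt;")
    return len(escaped.encode("utf-8"))


print(twitter_length("<>"))  # two entities of 4 bytes each
print(twitter_length("&"))   # not expanded, so a single byte
```

With this, '<>' counts as 8 and a lone '&' counts as 1, which matches what I observed.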
I'm not familiar with the SMS standard, but it's *very* important to
clarify whether that's 140 characters, or 140 bytes in a given
encoding. Characters in UTF-8 may take up to 4 bytes each. 140
characters of UTF-8 Japanese take far more than 140 bytes.
There's also the issue of HTML entities - '<' and '>' taking 4 bytes
each - which seems like a bug to me.
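
To make the character/byte gap concrete, here is a small Python sketch that reports how many bytes a single character takes in UTF-8. The helper name `utf8_width` is just for illustration.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII letter, accented Latin letter, kanji, emoji (outside the BMP)
for ch in ["a", "\u00e9", "\u65e5", "\U0001F600"]:
    print(repr(ch), utf8_width(ch))

# 140 kanji: 140 characters, but three bytes apiece
print(len(("\u65e5" * 140).encode("utf-8")))
```

The four samples take 1, 2, 3 and 4 bytes respectively, and 140 kanji come to 420 bytes, so a byte-based limit cuts a Japanese update off at about a third of the documented length.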
Compare e.g. http://twitter.com/eru/statuses/312496202 to
http://twitter.com/eru/statuses/312496292 ; both have exactly 140
characters, but only the latter was truncated at the Twitter "recent"
page (and only the latter gave a "message too long" warning in the IM
interface). For entities, see
http://twitter.com/eru/statuses/320385382 ; only 47 characters long,
but was truncated.
This is an excellent point to clarify. GSM SMS is 7-bit only, so as far
as I'm aware, it should be 140 bytes, not characters.
I agree that HTML escaped entities should not count as any more than 1 byte.
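
As a rough sanity check on the 7-bit point: assuming the limit comes from the 140-octet payload of a single SMS (nobody in this thread has confirmed that for Twitter), the number of characters that fit depends on how many bits each character takes. A sketch in Python, with made-up names:

```python
SMS_PAYLOAD_BYTES = 140  # user-data size of a single SMS, in octets


def max_chars(bits_per_char: int) -> int:
    """How many fixed-width characters fit in one SMS payload."""
    return SMS_PAYLOAD_BYTES * 8 // bits_per_char


print(max_chars(7))   # GSM 7-bit default alphabet
print(max_chars(8))   # 8-bit data
print(max_chars(16))  # UCS-2
```

That gives 160, 140 and 70 characters, which would at least be consistent with the "maximum of 160 (recommended 140)" wording in the API docs.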
--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * cka...@floodgap.com
-- there's a dance or two in the old dame yet. -- mehitabel -------------------
No, it doesn't. It tells me that when I try to send fewer than 140
characters but over 140 *bytes* - contrary to what's plainly
stated in the API documentation. Which means that either the
documentation is wrong, or the implementation is defective. I just
want to know which one.
And also there's the issue with HTML entities. Is it a bug, or
undocumented behavior?
We have to encode HTML entities to prevent XSS attacks. Sorry about
the lost characters.
On 10/8/07, leoboiko <leob...@gmail.com> wrote:
>
--
Alex Payne
http://twitter.com/al3x
You're right, anyone twittering in any language other than English
(Spanish in my case) bumps into this behaviour fast.
--
Manuel, who
thinks you are an excellent person and thrives around
http://simplelogica.net and/or http://simplelogica.net/logicola/
Remember to eat plenty of fruit and vegetables.