Update API (with OAuth) failed on Unicode tweet

Cmdr J0hn

Apr 12, 2009, 8:08:33 AM
to Twitter Development Talk
Hello, fellow Twitter users,

My app gets an error from the Twitter update API
when I try to send a Unicode string.

My app is written in Python, using a slightly modified
version of a library found at http://oauth.net/

My code is not ready to be made public,
but I can say that it works when I send an ASCII string.

It goes like this:

When I post "%=" (a percent sign and an equals sign), my app signs
a string like this:

POST&http%3A%2F%2Ftwitter.com%2Fstatuses%2Fupdate.xml&
oauth_consumer_key%3D...(omit)...%26status%3D%2525%253D

The request body looks like this:

status=%25%3D

And it works, like this: http://twitter.com/khopkun/status/1502555481

Now I send a Unicode character, "あ"
(I'm not sure it displays properly on your screen; it's Japanese).

The string that gets signed:

POST&http%3A%2F%2F...(omit)...%26status%3D%25E3%2581%2582

And the body is:

status=%E3%81%82

(It's UTF-8, I guess: 3 bytes are needed for one Japanese character.)

Then I get an error, "Failed to validate oauth signature or token",
with status 401.

I am wondering why ASCII characters are okay and Unicode ones are not.

Does anyone have a suggestion?
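
The double encoding described above can be reproduced with a short Python 3 sketch. It is not the poster's code (which was not published); the helper names `pct` and `encode_status` are made up for illustration, and only the `status` parameter is shown, so a real signed string would also contain the `oauth_*` parameters.

```python
from urllib.parse import quote


def pct(text):
    """Percent-encode the UTF-8 bytes of text, leaving only unreserved characters bare."""
    # safe="~" keeps "~" unescaped on older Python 3 versions as well.
    return quote(text, safe="~")


def encode_status(status):
    """Return (request body, the same parameter as it appears inside the signed string)."""
    body = "status=" + pct(status)  # encoded once for the POST body
    signed_part = pct(body)         # encoded a second time for the signature base string
    return body, signed_part


for text in ["%=", "\u3042"]:  # "\u3042" is the hiragana letter "あ"
    body, signed_part = encode_status(text)
    print(body, "->", signed_part)
```

Both inputs go through the same code path and yield the strings quoted in the post, which suggests the client-side encoding is at least consistent between the ASCII and the non-ASCII case.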

Julio Biason

Apr 12, 2009, 9:35:56 AM
to twitter-deve...@googlegroups.com
2009/4/12 Cmdr J0hn <kazuhir...@gmail.com>:

> Now I send a Unicode character, "あ"
> (I'm not sure it displays properly on your screen; it's Japanese).
[...]

> status=%E3%81%82
>
> (It's UTF-8, I guess: 3 bytes are needed for one Japanese character.)

I think you're not encoding this properly. You're sending one
character, so you should send just one code, not three. Sure, Twitter
should not break if you do this but, at the same time, your encoding
is not right.

Looking at your example, it seems you're converting your UTF-8 to a
string of bytes and sending each byte separately, which should not be
the case.

(I have the slight impression that it should be something like
"status=%4054" or some other very right value, but, again, just one
character, not three.)

--
Julio Biason <julio....@gmail.com>
Twitter: http://twitter.com/juliobiason

Julio Biason

Apr 12, 2009, 9:37:52 AM
to twitter-deve...@googlegroups.com
On Sun, Apr 12, 2009 at 11:35 PM, Julio Biason <julio....@gmail.com> wrote:
> (I have the slight impression that it should be something like
> "status=%4054" or some other very right value, but, again, just one
> character, not three.)

Correcting myself:

status=&#12354;

http://www.danshort.com/HTMLentities/index.php?w=hirag

Also... what happens when you don't urlencode the text at all (i.e.,
send it as raw UTF-8)?

Cameron Kaiser

Apr 12, 2009, 9:39:55 AM
to twitter-deve...@googlegroups.com
> > (I have the slight impression that it should be something like
> > "status=%4054" or some other very right value, but, again, just one
> > character, not three.)
>
> Correcting myself:
>
> status=&#12354;
>
> http://www.danshort.com/HTMLentities/index.php?w=hirag

NO! The original poster is correct -- you encode the Unicode code point as UTF-8,
then send the percent-encoded bytes. From RFC 3986, section 2.5:

When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set [UCS],
the data should first be encoded as octets according to the UTF-8
character encoding [STD63]; then only those octets that do not
correspond to characters in the unreserved set should be percent-
encoded. For example, the character A would be represented as "A",
the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
as "%C3%80", and the character KATAKANA LETTER A would be represented
as "%E3%82%A2".
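
The RFC's three examples, plus the hiragana "あ" from the original post, can be checked with Python 3's standard library. This is only a sketch: `percent_encode` is a name chosen here for illustration, and `safe="~"` is passed so that older Python 3 versions also leave "~" unescaped.

```python
from urllib.parse import quote


def percent_encode(text):
    """Encode text as UTF-8, then percent-encode every octet outside the unreserved set."""
    return quote(text, safe="~")


examples = {
    "A": "LATIN CAPITAL LETTER A",
    "\u00c0": "LATIN CAPITAL LETTER A WITH GRAVE",
    "\u30a2": "KATAKANA LETTER A",
    "\u3042": "HIRAGANA LETTER A",
}

for char, name in examples.items():
    # Show the raw UTF-8 octets next to the percent-encoded form.
    octets = " ".join(f"{b:02X}" for b in char.encode("utf-8"))
    print(f"{name}: octets {octets} -> {percent_encode(char)}")
```

Each multibyte character produces one %XX triplet per UTF-8 octet, so a three-byte character such as "あ" becomes three triplets.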

--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * cka...@floodgap.com
-- He is rising from affluence to poverty. -- Mark Twain ----------------------

Guan

Apr 12, 2009, 1:56:26 PM
to Twitter Development Talk
On Apr 12, 8:08 am, Cmdr J0hn <kazuhiro.is...@gmail.com> wrote:
> Now I send a Unicode character, "あ"
> (I'm not sure it displays properly on your screen; it's Japanese).
>
> Signed on a string:
>
> POST&http%3A%2F%2F...(omit)...%26status%3D%25E3%2581%2582
>
> And a body is:
>
> status=%E3%81%82

> Any suggestion anyone?

I have exactly the same problem. I have checked with the OAuth signing
guide at http://www.hueniverse.com/hueniverse/2008/10/beginners-gui-1.html,
which even considers the case of non-English parameters that lead to
multibyte characters, and their signature matches mine. I think this
is a bug in the way Twitter verifies signatures when multibyte
characters are present, and I've filed a bug report with them.

Guan

Chen Jie

Apr 13, 2009, 1:23:03 AM
to Twitter Development Talk
I have the same problem too; I can't post updates with Chinese characters.

On Apr 13, 1:56 am, Guan <g...@yang.dk> wrote:
> On Apr 12, 8:08 am, Cmdr J0hn <kazuhiro.is...@gmail.com> wrote:
>
> > Now I send a Unicode character, "あ"
> > (I'm not sure it displays properly on your screen; it's Japanese).
>
> > Signed on a string:
>
> > POST&http%3A%2F%2F...(omit)...%26status%3D%25E3%2581%2582
>
> > And a body is:
>
> > status=%E3%81%82
> > Any suggestion anyone?
>
> I have exactly the same problem. I have checked with the OAuth signing
> guide at http://www.hueniverse.com/hueniverse/2008/10/beginners-gui-1.html,

Swaroop

Apr 13, 2009, 2:11:29 AM
to Twitter Development Talk

minim...@gmail.com

Apr 13, 2009, 9:05:36 AM
to Twitter Development Talk
Same problem here. I can't post updates with Cyrillic characters either.

Matt Sanford

Apr 13, 2009, 11:31:32 AM
to twitter-deve...@googlegroups.com
Hi all,

    Anyone having the problem please add a comment to the Google Code issue [1]. Please include the following if possible:

1. What language, library and version are you using?
  » For Example: Ruby oauth gem v0.2.7, or PHP oauth-php r50

2. What application is this for?

3. This is the hardest one but hopefully a few people can provide it: What was the string passed into the signature method, and what was the resulting signature?
  » For Example: Input was 'POST&http…status=%E3%81%82' (please don't abbreviate it, this is what I'll use to compare) and the signature was '123454tfsdfY346rdfvs'
  » Side note: %E3%81%82 is the correct URL encoding of あ [2], Julio was thinking of HTML encoding.

    We updated our OAuth gem because it incorrectly handled non-ASCII characters. Either this new version has a bug (possible), or the bug in the old version also exists in other libraries (also possible, since many of these are based on the same example code). At this point I'm trying to figure out which one matches the spec, and then we can make it work from there.

Thanks;
  — Matt Sanford

Mario Menti

Apr 15, 2009, 7:35:31 AM
to twitter-deve...@googlegroups.com
This issue [1] is marked fixed, but for some reason I still have problems with some characters:

I have a status update that contains "\xc2\xa0" (which I believe is the UTF-8 representation of &nbsp;), and trying to update the status with this always results in error 401. If I remove the "\xc2\xa0", the update works fine.

I'm using the Perl Net::OAuth CPAN module.

The status is "laconi.ca - a decentralised twitter: I&#8217;ve just come across identi.ca,\xc2\xa0which on a first look appears to be jus.. http://menti.net/?p=33"

... which turns into:



Guan Yang

Apr 15, 2009, 10:47:58 AM
to twitter-deve...@googlegroups.com
On Wed, Apr 15, 2009 at 07:35, Mario Menti <mme...@gmail.com> wrote:
> This issue [1] is marked fixed, but for some reason I still have problems
> with some characters:
> I have a status update that contains "\xc2\xa0" (which I believe is the UTF-8
> representation of &nbsp;), and trying to update the status with this always
> results in error 401. If I remove the "\xc2\xa0", the update works fine.
> I'm using the Perl Net::OAuth CPAN module.
> The status is "laconi.ca - a decentralised twitter: I&#8217;ve just come
> across identi.ca,\xc2\xa0which on a first look appears to be jus..
> http://menti.net/?p=33"

I was able to post this here:

http://twitter.com/guan/status/1525625497

The non-breaking space is right after the colon; try to save the HTML
and check in a hexdump ;-)

Normalized query string:

oauth_consumer_key=rNc2JuVC6NxELft2jXUQ&oauth_nonce=5614691C245EE261FB06ED7C1370974497&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1239806575&oauth_token=6631-AHu8rT9oznR3uUwHF7J99yU14s17D0vxR0OyKdRX54&oauth_version=1.0&status=a%20non-breaking%20space%3A%C2%A0wohoo

Signature base string:

POST&http%3A%2F%2Ftwitter.com%2Fstatuses%2Fupdate.json&oauth_consumer_key%3DrNc2JuVC6NxELft2jXUQ%26oauth_nonce%3D5614691C245EE261FB06ED7C1370974497%26oauth_signature_method%3DHMAC-SHA1%26oauth_timestamp%3D1239806575%26oauth_token%3D6631-AHu8rT9oznR3uUwHF7J99yU14s17D0vxR0OyKdRX54%26oauth_version%3D1.0%26status%3Da%2520non-breaking%2520space%253A%25C2%25A0wohoo
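
For anyone comparing their own output against the two strings above, here is a minimal Python 3 sketch that rebuilds them from the raw parameter values and then computes an HMAC-SHA1 signature. The function names are invented for the example, and the two secrets are placeholders (the real ones are of course not in this thread), so the printed signature will not match any real request.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(text):
    """Percent-encode the UTF-8 bytes of text, leaving only unreserved characters bare."""
    return quote(text, safe="~")


def normalized_parameters(params):
    """Encode names and values, sort the encoded pairs, and join them with '&'."""
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    return "&".join(f"{k}={v}" for k, v in pairs)


def signature_base_string(method, url, params):
    """METHOD & encoded URL & encoded normalized parameters (second round of encoding)."""
    return "&".join([method.upper(), pct(url), pct(normalized_parameters(params))])


def hmac_sha1_signature(base_string, consumer_secret, token_secret):
    """Sign the base string with the key 'consumer_secret&token_secret'."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


params = {
    "oauth_consumer_key": "rNc2JuVC6NxELft2jXUQ",
    "oauth_nonce": "5614691C245EE261FB06ED7C1370974497",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1239806575",
    "oauth_token": "6631-AHu8rT9oznR3uUwHF7J99yU14s17D0vxR0OyKdRX54",
    "oauth_version": "1.0",
    "status": "a non-breaking space:\u00a0wohoo",  # \u00a0 is the non-breaking space
}

base_string = signature_base_string(
    "POST", "http://twitter.com/statuses/update.json", params
)
print(normalized_parameters(params))
print(base_string)

# Placeholder secrets, assumed only for this illustration.
signature = hmac_sha1_signature(base_string, "consumer-secret", "token-secret")
print(signature)
```

If a library's logged base string differs from this one for the same input, the difference is usually easiest to spot in the `status` portion at the end.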

Guan

Mario Menti

Apr 15, 2009, 1:11:24 PM
to twitter-deve...@googlegroups.com
On Wed, Apr 15, 2009 at 3:47 PM, Guan Yang <gu...@yang.dk> wrote:

> I was able to post this here:
>
> http://twitter.com/guan/status/1525625497
>
> The non-breaking space is right after the colon; try to save the HTML
> and check in a hexdump ;-)
>
> Normalized query string:
>
> oauth_consumer_key=rNc2JuVC6NxELft2jXUQ&oauth_nonce=5614691C245EE261FB06ED7C1370974497&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1239806575&oauth_token=6631-AHu8rT9oznR3uUwHF7J99yU14s17D0vxR0OyKdRX54&oauth_version=1.0&status=a%20non-breaking%20space%3A%C2%A0wohoo
>
> Signature base string:
>
> POST&http%3A%2F%2Ftwitter.com%2Fstatuses%2Fupdate.json&oauth_consumer_key%3DrNc2JuVC6NxELft2jXUQ%26oauth_nonce%3D5614691C245EE261FB06ED7C1370974497%26oauth_signature_method%3DHMAC-SHA1%26oauth_timestamp%3D1239806575%26oauth_token%3D6631-AHu8rT9oznR3uUwHF7J99yU14s17D0vxR0OyKdRX54%26oauth_version%3D1.0%26status%3Da%2520non-breaking%2520space%253A%25C2%25A0wohoo
>
> Guan


Thanks Guan - perhaps it's an issue with the signature base string not being encoded correctly at my end... let me dig into Net::OAuth a little more and see what I find.

Mario.

Mario Menti

Apr 16, 2009, 11:18:56 AM
to twitter-deve...@googlegroups.com
On Wed, Apr 15, 2009 at 6:11 PM, Mario Menti <mme...@gmail.com> wrote:

> Thanks Guan - perhaps it's an issue with the signature base string not being encoded correctly at my end... let me dig into Net::OAuth a little more and see what I find.


Quick update: yes, the issue in Net::OAuth was actually identical to the issue in the oauth gem reported in the original bug report [1]. I've changed the regexp used with uri_decode in the Perl Net::OAuth module, and now Unicode status updates appear to work fine.
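
The exact regexp change is not shown in this thread. As a rough illustration only (Python 3, not the actual Net::OAuth or oauth gem code), the sketch below contrasts an encoder that escapes one code point at a time with one that escapes each UTF-8 octet; both function names are invented for the example.

```python
import string
from urllib.parse import quote

# Characters that are never percent-encoded (the "unreserved" set).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def encode_per_code_point(text):
    """Deliberately wrong: emits one %XX escape per character, ignoring UTF-8."""
    return "".join(
        ch if ch in UNRESERVED else f"%{ord(ch):02X}" for ch in text
    )


def encode_per_utf8_octet(text):
    """Correct: converts to UTF-8 first, then escapes each octet."""
    return quote(text, safe="~")


nbsp = "\u00a0"  # the non-breaking space from the failing status update
print(encode_per_code_point(nbsp))   # %A0
print(encode_per_utf8_octet(nbsp))   # %C2%A0

# ASCII input gives the same result either way, so only non-ASCII text fails.
print(encode_per_code_point("%=") == encode_per_utf8_octet("%="))  # True
```

A mismatch of this kind between client and server would change the signature base string only when non-ASCII text is present, which would be consistent with a 401 that ASCII-only updates never trigger.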


alon

Apr 24, 2009, 5:49:31 AM
to Twitter Development Talk
Can someone assist with the PHP library? What should I do?

Matt Sanford

Apr 24, 2009, 1:32:19 PM
to twitter-deve...@googlegroups.com
Hi Alon,

    The main issue we've seen with extended UTF-8 is incorrect URL encoding of the values. We discussed this in depth in issue 433 [1], which I see you commented on. Without a little more information I can't really help. The information that would be most helpful is:

1. You mentioned using PHP, which PHP library are you using and what version?
  » Version is important here so I can check out the code.

2. The signature base string (see issue 433 [1] for examples) is a great indicator. I don't know the PHP libraries but I'm guessing there will be a signature method that takes a string like this. Add some log statements and capture that value.
  » Check out issue 433 [1] for examples of what they look like.

Thanks;
  – Matt Sanford / @mzsanford
      Twitter API Developer
