Japanese Characters in Image URLs

Kyle Mulka

Oct 12, 2009, 3:00:06 AM
to Twitter Development Talk
This might be more of a PHP and/or curl question than a Twitter API
question, but I figured I would ask here first because Twitter API
developers have to deal with non-ASCII characters in image URLs
because Twitter doesn't change the name the user gave their image file
to something cleaner.

The PHP code below is giving me the standard Amazon S3 access denied
error message, but if I copy the URL of the image and paste it into my
browser, that doesn't happen. What do I need to do to get this to
work?

$ch = curl_init('http://twitter.com/users/show.json?screen_name=rennri');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$json = curl_exec($ch);
curl_close($ch);

$data = json_decode($json, true);

$ch2 = curl_init($data['profile_image_url']);
curl_exec($ch2);
curl_close($ch2);


--
Kyle Mulka
http://twilk.com - put your friends' faces on your Twitter background

Cameron Kaiser

Oct 12, 2009, 3:09:17 AM
to twitter-deve...@googlegroups.com

I don't see anything here where you're trying to deal with multi-byte
encodings. PHP doesn't do that by default.

--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * cka...@floodgap.com
-- I've been in the van fifteen years, Harry! -- "True Lies" ------------------

Jim DeLaHunt

Oct 12, 2009, 1:24:13 PM
to Twitter Development Talk
Happy Canadian Thanksgiving, Kyle:

On Oct 12, 12:00 am, Kyle Mulka <repalvigla...@yahoo.com> wrote:
>... Twitter API
> developers have to deal with non-ASCII characters in image URLs
> because Twitter doesn't change the name the user gave their image file
> to something cleaner.
>
> The PHP code below is giving me the standard Amazon S3 access denied
> error message, but if I copy the URL of the image and paste it into my
> browser, that doesn't happen. What do I need to do to get this to
> work?

I expect the URLs you are getting back are UTF-8 encoded strings. I
believe the authoritative spec is RFC 3986
<http://tools.ietf.org/html/rfc3986>. My understanding of its contents is:
a) the path part of the URL is an octet stream, and
b) the web server may interpret that octet stream as it pleases, so it
may be in any encoding, but
c) it's good practice for agents to present and interpret path parts
of URLs as UTF-8 encoded text, and
d) any octets in the path part of the URL which aren't in the subset of
ASCII permitted in URLs should be percent-encoded, but
e) it would be nice for agents to accept unpermitted byte values in
the path part of the URL, and
f) it would be nice for agents to interpret path parts of URLs as
being encoded in UTF-8 unless they know otherwise.
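
To make points a) and d) concrete, here is a minimal PHP sketch. The
file name is a made-up example, not one returned by the Twitter API;
rawurlencode() is PHP's built-in RFC 3986 encoder and is meant for a
single path segment, not a whole URL.

```php
<?php
// Hypothetical file name for illustration: the UTF-8 bytes of "日本.png".
$segment = "\xE6\x97\xA5\xE6\x9C\xAC.png";

// Each non-ASCII octet becomes %XX; unreserved ASCII characters
// (letters, digits, "-", ".", "_", "~") are left as they are.
$encoded = rawurlencode($segment);

echo $encoded, "\n"; // %E6%97%A5%E6%9C%AC.png
```

The six octets of the two Japanese characters each turn into one %XX
escape, which is the form a server should be able to accept regardless
of what encoding it assumes.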

As usual, Wikipedia also has a nice writeup. See
http://en.wikipedia.org/wiki/Percent-encoding and linked articles.

I did an experiment with Firefox 3 which showed it was respecting the
above spec. I pasted a URL with non-ASCII UTF-8 characters in it, and
it blithely accepted them, perhaps percent-encoded them, and
successfully requested the page. Then I visited a URL with percent-
encoded characters (non-English versions of Wikipedia are a bounty of
such URLs), pasted one of those URLs in to the Firefox location field,
and Firefox removed the percent-encoding and displayed the URL as a
UTF-8 string.

Thus you might want to experiment with revising the code which handles
URLs and other strings from the Twitter API so that it treats them as
UTF-8 encoded strings, or as byte strings with no encoding
interpretation. Be ready to apply your own percent-encoding of received
URLs per d) above.
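
One way that might look in PHP, as an untested sketch rather than a
confirmed fix for the S3 error: the helper name
percent_encode_url_bytes() and the example.com URL are my own
inventions for illustration. It treats the URL as a plain byte string
and escapes only the bytes that are not permitted ASCII, so the scheme,
slashes, query delimiters and any existing %XX escapes pass through
unchanged. It assumes PHP 5.3 or later for the anonymous function.

```php
<?php
// Hypothetical helper (not part of PHP or the Twitter API): percent-encode
// every byte that is not an ASCII character permitted in a URL by RFC 3986.
// The pattern has no /u flag, so it matches one byte at a time.
function percent_encode_url_bytes($url)
{
    return preg_replace_callback(
        '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
        function ($m) {
            return rawurlencode($m[0]); // single byte -> %XX
        },
        $url
    );
}

// Made-up URL containing the UTF-8 bytes of "日本".
$url = "http://example.com/images/\xE6\x97\xA5\xE6\x9C\xAC.png";
echo percent_encode_url_bytes($url), "\n";
// http://example.com/images/%E6%97%A5%E6%9C%AC.png
```

The result could then be passed to curl_init() in place of the raw
profile_image_url value. Note that a literal "%" in a file name is left
alone by this sketch, so it cannot tell an existing escape from a stray
percent sign.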

I don't know the PHP incantations for string encoding, sorry. I do
know it differs between PHP 4, PHP 5, and PHP 6. (I just brushed up on
the corresponding Python incantations last night, as it happens.)

> $ch = curl_init('http://twitter.com/users/show.json?screen_name=rennri');
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
> $json = curl_exec($ch);
> curl_close($ch);
>
> $data = json_decode($json, true);
>
> $ch2 = curl_init($data['profile_image_url']);
> curl_exec($ch2);
> curl_close($ch2);
>
> --
> Kyle Mulka
> http://twilk.com - put your friends' faces on your Twitter background

Hope this helps!

—Jim DeLaHunt, Vancouver, Canada, http://jdlh.com/ multilingual
websites consultant.