Trouble with Unicode in URLs


Gaius

Jan 15, 2010, 11:26:28 AM
to Rack Development
I have a Rails app in which I'd like to use some Unicode URLs:

# in routes.rb:
map.resources 'proteges', :as => 'protégés', :only => [:index]

When I go to http://localhost:3000/protégés, I get

No route matches "/prot%C3%A9g%C3%A9s" with {:method=>:get}

That was on Mongrel, though I also tried Passenger. The fix was to
rewrite the REQUEST_URI environment variable in a Rack middleware:

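require 'cgi'

# Rack middleware that percent-decodes the request path variables
# before the app (and the Rails router) sees them.
class FixUnicodeUrlsMiddleware

  ENVIRONMENT_VARIABLES_TO_FIX = [
    'PATH_INFO', 'REQUEST_PATH', 'REQUEST_URI'
  ]

  def initialize(app)
    @app = app
  end

  def call(env)
    ENVIRONMENT_VARIABLES_TO_FIX.each do |var|
      # Only touch values containing a percent-escape (% plus two hex digits).
      env[var] = CGI.unescape(env[var]) if env[var] =~ /%[0-9A-Fa-f]{2}/
    end
    @app.call(env)
  end

end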

I'm sure that implementation could cause some problems, though.
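
Two likely trouble spots are that CGI.unescape also turns "+" into a space, and that REQUEST_URI carries the query string, where decoding can change its meaning. A more conservative variant might look like the sketch below. It is only an illustration under assumptions: the class name DecodePathInfo is hypothetical, it touches PATH_INFO alone, and it keeps the original value whenever the decoded bytes are not valid UTF-8.

```ruby
# Sketch of a narrower middleware (hypothetical name, not part of Rack).
# Decodes %XX escapes in PATH_INFO only; "+" and the query string are untouched.
class DecodePathInfo
  ESCAPE = /%([0-9A-Fa-f]{2})/

  def initialize(app)
    @app = app
  end

  def call(env)
    raw = env['PATH_INFO']
    if raw && raw =~ ESCAPE
      # Work on raw bytes, then reinterpret the result as UTF-8.
      decoded = raw.b.gsub(ESCAPE) { [$1.hex].pack('C') }
      decoded.force_encoding(Encoding::UTF_8)
      # Keep the original value if the bytes do not form valid UTF-8.
      env['PATH_INFO'] = decoded if decoded.valid_encoding?
    end
    @app.call(env)
  end
end
```

Note that this still decodes %2F into "/", which changes how the path splits into segments, so it is not safe for every application.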

See also my question on Stackoverflow:
http://stackoverflow.com/questions/2051553/how-do-i-use-utf-in-a-rails-url

Does anyone have any thoughts on how to add Unicode support to Rack?

Iñaki Baz Castillo

Jan 15, 2010, 11:43:12 AM
to rack-...@googlegroups.com
On Friday, January 15, 2010, Gaius wrote:
> I have a Rails app in which I'd like to use some Unicode URLs:
>
> # in routes.rb:
> map.resources 'proteges', :as => 'protégés', :only => [:index]
>
> When I go to http://localhost:3000/protégés, I get
>
> No route matches "/prot%C3%A9g%C3%A9s" with {:method=>:get}
>
> That was on Mongrel,

Non-ASCII characters are not allowed in a URL according to its BNF grammar, so
the client (the web browser in your case) hex-escapes them.

That is, the client sends a request like:

GET /prot%C3%A9g%C3%A9s HTTP/1.1

which is correct.

Then the server must hex-unescape it, and this is what you do with your Rack
middleware :)
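
To make the round trip concrete, here is a minimal sketch of what happens to the bytes. The helper names percent_encode and percent_decode are made up for illustration, and the encoder is simplified: it only escapes non-ASCII bytes, whereas a real client also escapes reserved ASCII characters.

```ruby
# Escape every non-ASCII byte of a UTF-8 path as %XX (simplified client side).
def percent_encode(path)
  path.bytes.map { |b| b < 128 ? b.chr : format('%%%02X', b) }.join
end

# Server side: turn each %XX back into a raw byte, then read the bytes as UTF-8.
def percent_decode(escaped)
  bytes = escaped.b.gsub(/%([0-9A-Fa-f]{2})/) { [$1.hex].pack('C') }
  bytes.force_encoding(Encoding::UTF_8)
end

escaped = percent_encode("/protégés")
puts escaped                                  # prints /prot%C3%A9g%C3%A9s
puts percent_decode(escaped) == "/protégés"   # prints true
```

Each "é" is two bytes in UTF-8 (0xC3 0xA9), which is why it shows up as %C3%A9 in the request line.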

Rack by itself doesn't require the URL to be hex-unescaped before it is
passed to the application, so it is a task for your application to do it.

> though I also tried Passenger.

And the same happened? I don't think so, as Apache unescapes the URL before
passing the request to the backend (in this case mod_rack). I've checked it
before: when a request with a hex-escaped URL arrives at Apache, it unescapes
the URL before passing the data to mod_rack, so you get the Rack variables
hex-unescaped (you should already see the Unicode symbols).

I wonder how it is possible for your Apache not to unescape the URL before
passing it to Rack. Could you please re-check it? Which Apache version do you use?


--
Iñaki Baz Castillo <i...@aliax.net>

Gaius

Jan 15, 2010, 11:46:20 AM
to Rack Development
I agree with your analysis of _why_ the server is getting the hex-
escaped version. (That's why I used CGI.unescape to fix the problem.)
I'm also quite sure that Apache isn't unescaping before passing the
request on to Rack.

My setup:

$ apachectl -v
Server version: Apache/2.2.13 (Unix)
Server built: Sep 28 2009 16:04:37

$ gem list passenger
*** LOCAL GEMS ***
passenger (2.2.4)



Iñaki Baz Castillo

Jan 15, 2010, 12:03:27 PM
to rack-...@googlegroups.com
On Friday, January 15, 2010, Gaius wrote:
> I agree with your analysis of _why_ the server is getting the hex-
> escaped version. (That's why I used CGI.unescape to fix the problem.)

Then what is the problem now? :)


> I'm also quite sure that Apache isn't unescaping before passing the
> request on to Rack.
>
> My setup:
>
> $ apachectl -v
> Server version: Apache/2.2.13 (Unix)
> Server built: Sep 28 2009 16:04:37
>
> $ gem list passenger
> *** LOCAL GEMS ***
> passenger (2.2.4)

Really interesting. Have you configured something in Apache2?
I have passenger 2.2.4 and apache2:

$ apache2ctl -v
Server version: Apache/2.2.11 (Ubuntu)
Server built: Nov 13 2009 22:06:57

In my case the URL is unescaped by Apache2, so I'm puzzled.

Gaius

Jan 15, 2010, 12:13:00 PM
to Rack Development
Well, I don't have a problem anymore. My point is that I had to build a
middleware to solve it. To me, that indicates that some part of Rack
(either core or contrib) might want to provide this decoding so others
don't run into the same problem. Of course, if it's really an httpd
problem, I'd rather solve it there.

I haven't done anything to httpd.conf other than add Passenger, as
evidenced by the following diff:

$ diff /etc/apache2/httpd.conf{,.original}
486d485
< SSLSessionCache dbm:/var/log/apache2/ssl_gcache_data
490,507d488
<
< LoadModule passenger_module /Library/Ruby/Gems/1.8/gems/passenger-2.2.4/ext/apache2/mod_passenger.so
< <IfModule passenger_module>
< PassengerRoot /Library/Ruby/Gems/1.8/gems/passenger-2.2.4
< PassengerRuby /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
< </IfModule>
<
<
< # Added by the Passenger preference pane
< # Make sure to include the Passenger configuration (the LoadModule,
< # PassengerRoot, and PassengerRuby directives) before this section.
< <IfModule passenger_module>
< NameVirtualHost *:80
< <VirtualHost *:80>
< ServerName _default_
< </VirtualHost>
< Include /private/etc/apache2/passenger_pane_vhosts/*.conf
< </IfModule>
\ No newline at end of file

Gaius

Jan 15, 2010, 12:16:09 PM
to Rack Development
I just found this older thread by you. If only we could just switch
computers!

http://groups.google.com/group/rack-devel/browse_thread/thread/d16abdccdb9026e8

Gaius

Jan 15, 2010, 12:21:54 PM
to Rack Development
One other interesting tidbit: it works great on my production Ubuntu
server:

PROD> apache2 -v
Server version: Apache/2.2.11 (Ubuntu)
Server built: Aug 18 2009 14:28:29

It's only on my Mac that it fails. Hmm.


Iñaki Baz Castillo

Jan 15, 2010, 12:48:36 PM
to rack-...@googlegroups.com
On Friday, January 15, 2010, Gaius wrote:
> I just found this older thread by you. If only we could just switch
> computers!
>
> http://groups.google.com/group/rack-devel/browse_thread/thread/d16abdccdb9026e8

Yes. I also asked in the Apache IRC channel and nobody could tell me how to
achieve it (stopping Apache from hex-unescaping the request URI). In fact,
based on the comments I received in that IRC session, I would have thought it
was not possible to disable it (until you said that in your case it doesn't do it).

Iñaki Baz Castillo

Jan 15, 2010, 12:49:04 PM
to rack-...@googlegroups.com
On Friday, January 15, 2010, Gaius wrote:
> One other interesting tidbit: it works great on my production Ubuntu
> server:
>
> PROD> apache2 -v
> Server version: Apache/2.2.11 (Ubuntu)
> Server built: Aug 18 2009 14:28:29
>
> It's only on my Mac that it fails. Hmm.

By "fails", do you mean that the URI arrives from Apache still hex-escaped?

Gaius

Jan 15, 2010, 2:23:55 PM
to Rack Development
Correct. On Prod (Apache 2.2.11, Ubuntu), I get proper unescaping by
the time the request hits Rack. On Dev (Apache 2.2.13, OSX), I get hex-
escaped URLs in Rack.
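
For anyone who wants to compare two machines the same way, a tiny diagnostic Rack app along these lines could show exactly what the server hands over. This is only a sketch: the class name EnvPathDump is hypothetical, and it assumes the server sets some or all of the three variables listed.

```ruby
# Diagnostic Rack app (hypothetical helper): echoes the path-related env
# values exactly as the web server passed them to Rack.
class EnvPathDump
  KEYS = %w[PATH_INFO REQUEST_PATH REQUEST_URI].freeze

  def call(env)
    # inspect makes escapes, missing values (nil) and raw bytes visible.
    lines = KEYS.map { |key| "#{key}=#{env[key].inspect}" }
    [200, { 'content-type' => 'text/plain; charset=utf-8' }, [lines.join("\n") + "\n"]]
  end
end

# In a config.ru this would be mounted with: run EnvPathDump.new
```

Requesting /protégés on each machine and comparing the output would show whether the escapes survive up to Rack.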

Iñaki Baz Castillo

Jan 15, 2010, 4:08:37 PM
to rack-...@googlegroups.com
On Friday, January 15, 2010, Gaius wrote:
> Correct. On Prod (Apache 2.2.11, Ubuntu), I get proper unescaping by
> the time the request hits Rack. On Dev (Apache 2.2.13, OSX), I get hex-
> escaped URLs in Rack.

Annoying... ¿?
