urlencoding a hash sign doesn't work for routing?

Keith Irwin

Apr 13, 2011, 7:16:27 PM
to scalatra-user
Folks--

I have a use case where I want a route like this:

  delete /foo/:id/bar/:thing

If I call delete from JavaScript (via jQuery) on something like the
following:

  delete /foo/1/bar/value

it works great. The handler I've assigned to that method and route is
invoked.

However, if I do the following:

  delete /foo/1/bar/#value

it does not work, because everything from the '#' onward is a fragment
and is never transmitted. It comes through to Scalatra as /foo/1/bar/.
This is not surprising.

So!

I tried

  delete /foo/1/bar/%23value

which I think means that my handler is going to get (for params
("thing")) the value "%23value" at worst, and at best, "#value".

However, the notFound handler is invoked instead, and when I print out
the URL that's not handled, I see:

  /foo/1/bar/%23value

Is this a bug? Any suggestions?

Keith

PS. I'm solving the problem by converting the "#" symbol to ".hash."
on one side, and converting it back on the other, but this is kinda
ugly.
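A minimal sketch of that round trip, assuming the placeholder is exactly ".hash." as in the PS. In the thread the encoding side runs in JavaScript; it is written in Scala here only to keep one language, and the object and method names are hypothetical:

```scala
// Hypothetical helpers illustrating the ".hash." workaround described above.
// In the thread the encoding side runs in JavaScript; Scala is used for illustration only.
object HashToken {
  // Placeholder text from the PS; it must never occur in a real company code.
  val Placeholder: String = ".hash."

  // Client side: swap every '#' for the placeholder before building the URL.
  // String.replace(CharSequence, CharSequence) is a literal (non-regex) replacement.
  def encode(code: String): String = code.replace("#", Placeholder)

  // Server side: restore the '#' after the route parameter has been bound.
  def decode(param: String): String = param.replace(Placeholder, "#")
}
```

One caveat of this design: a code that legitimately contains the text ".hash." would be turned into "#" on the way back, so the placeholder has to be something users can never type.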

Ross A. Baker

Apr 13, 2011, 11:34:29 PM
to scalat...@googlegroups.com
You're almost there:

  delete("/foo/1/bar/#:value") { "value="+params("value")+"\n" }

Result:

  % curl -X DELETE 'http://localhost:8080/foo/1/bar/%23baz'
  value=baz

What's going on:
- The fragment is not submitted to the server, so you need to
percent-encode the '#' in the request.
- The URL is percent-decoded before matching, so the '#' is correct in
the route.
- [the missing piece] Parameters always need a leading colon.
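The three points above can be sketched end to end in plain Scala, assuming the route "/foo/1/bar/#:value" compiles to the regex ^/foo/1/bar/#([^/?#]+)$ (the parameter translation Ross describes later in the thread). The object name is hypothetical, and java.net.URLEncoder/URLDecoder only stand in for what the browser and the servlet container actually do:

```scala
import java.net.{URLDecoder, URLEncoder}

// Hypothetical walk-through of the three points (names are illustrative only).
object FragmentDemo {
  // 1. Client side: percent-encode the value so '#' travels as %23 instead of
  //    starting a fragment. URLEncoder does form encoding (a space becomes '+'),
  //    which is fine for this value but is not a general-purpose path encoder.
  def encodeSegment(s: String): String = URLEncoder.encode(s, "UTF-8")

  // 2. Server side: the path is percent-decoded before route matching.
  def decodePath(raw: String): String = URLDecoder.decode(raw, "UTF-8")

  // 3. Assumed regex for the route "/foo/1/bar/#:value": a literal '#'
  //    followed by the ":value" parameter, which becomes ([^/?#]+).
  private val HashRoute = "^/foo/1/bar/#([^/?#]+)$".r

  def matchValue(decodedPath: String): Option[String] = decodedPath match {
    case HashRoute(value) => Some(value)
    case _                => None
  }
}
```

Here encodeSegment("#baz") yields "%23baz", decodePath turns "/foo/1/bar/%23baz" back into "/foo/1/bar/#baz", and matchValue on that decoded path captures "baz", mirroring the curl result above.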

--
Ross A. Baker
ba...@alumni.indiana.edu
Indianapolis, IN, USA

Ivan Porto Carrero

Apr 14, 2011, 5:43:54 AM
to scalat...@googlegroups.com
That is against the HTTP spec, though.

A fragment is a client-side thing only; I'm surprised they even get to the server:
http://stackoverflow.com/questions/2286402/url-fragment-and-302-redirects

-- 
Met vriendelijke groeten - Best regards - Salutations

Ivan Porto Carrero
Co-founder & Developer at Mojolly
Mojolly
Visit us at www.mojolly.com, flanders.co.nz
We tweet at @mojolly, @casualjim

Ross A. Baker

Apr 14, 2011, 10:12:05 AM
to scalat...@googlegroups.com
That's why the '#' has to be percent encoded here. %23 is a literal
'#', not a fragment delimiter.

On Thu, Apr 14, 2011 at 5:43 AM, Ivan Porto Carrero <iv...@mojolly.com> wrote:
> That is against the HTTP spec though
> a fragment is a client thing only, I'm surprised they even get to the server
> http://stackoverflow.com/questions/2286402/url-fragment-and-302-redirects
>

Keith Irwin

Apr 14, 2011, 12:09:02 PM
to scalatra-user


On Apr 14, 2:43 am, Ivan Porto Carrero <i...@mojolly.com> wrote:
> That is against the HTTP spec though
>
> a fragment is a client thing only, I'm surprised they even get to the server
>
> http://stackoverflow.com/questions/2286402/url-fragment-and-302-redir...

They don't get to the server. My point is that when you URL-encode the
value (turn # into %23), the full URL gets to the Scalatra server, but
the URL doesn't match a route.

In other words:

  /foo/bar/:thing

won't match to

  /foo/bar/%23value

which seems really odd.

Keith

Keith Irwin

Apr 14, 2011, 12:20:21 PM
to scalat...@googlegroups.com
If I prefix a # to the route definition, doesn't that mean I won't match against words that do NOT have a # prefix? And what about when the # is embedded in the middle of a word?

Maybe using some sort of splat thing would work.

I'll have to crack open the code and check it out.

Keith

Keith Irwin

Apr 14, 2011, 12:22:02 PM
to scalat...@googlegroups.com

Ross A. Baker

Apr 14, 2011, 12:25:39 PM
to scalat...@googlegroups.com
On Thu, Apr 14, 2011 at 12:09 PM, Keith Irwin <keith...@gmail.com> wrote:
> In other words:
>
>  /foo/bar/:thing
>
> won't match to
>
>  /foo/bar/%23value
>
> which seems really odd.
>
> Keith

Parameters get translated into the regular expression ([^/?#]+). That
is, they match one or more characters other than '/', '?', and '#'.
That's why /foo/bar/#:thing works and /foo/bar/:thing doesn't. This
behavior is copied from Sinatra, but I'm not sure why they made '#' special.

For use cases where the string routing fails, you can use a regular
expression instead.
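A small, self-contained illustration of that character class, using plain scala.util.matching.Regex rather than Scalatra's own parser. The first two patterns correspond to "/foo/bar/:thing" and "/foo/bar/#:thing"; the third is an assumption about what a hand-written regex route for this case might look like, and the object name is hypothetical:

```scala
import scala.util.matching.Regex

// Plain-regex illustration of why ":thing" cannot capture a '#'.
object ParamRegexDemo {
  // "/foo/bar/:thing": the parameter becomes ([^/?#]+), per the thread.
  val PlainRoute: Regex = "^/foo/bar/([^/?#]+)$".r
  // "/foo/bar/#:thing": a literal '#' followed by the same parameter class.
  val PrefixRoute: Regex = "^/foo/bar/#([^/?#]+)$".r
  // Assumed hand-written regex route: anything except '/' in the last segment,
  // so a '#' may appear at the start, in the middle, or not at all.
  val LooseRoute: Regex = "^/foo/bar/([^/]+)$".r

  // Returns the first capture group if the decoded path matches the route.
  def capture(route: Regex, decodedPath: String): Option[String] =
    route.findFirstMatchIn(decodedPath).map(_.group(1))
}
```

With LooseRoute, one pattern accepts a leading '#', an embedded '#', or no '#' at all, which also covers Keith's question about a '#' in the middle of a word; the trade-off is that '?' and '#' are no longer treated as special inside that segment.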

Keith Irwin

Apr 14, 2011, 12:27:52 PM
to scalat...@googlegroups.com

Ross A. Baker

Apr 14, 2011, 1:56:59 PM
to scalat...@googlegroups.com
On Thu, Apr 14, 2011 at 12:27 PM, Keith Irwin <keith...@gmail.com> wrote:
> If I prefix a # to the route definition, doesn't that mean I won't match
> against words that do NOT have a # prefix? And what about when the # is
> embedded in the middle of a word?

Yes, if you wanted to match with and without the # prefix, you'd need
two routes. Parameters won't ever match a #, so a # embedded in the
middle of a word is tough. All of this behavior should be 100%
compatible with Sinatra, but it's sure ugly in this particular use
case.

> Maybe using some sort of splat thing would work.
> I'll have to crack open the code and check it out.
> Keith

You can also interactively test from the console:

  scala> SinatraPathPatternParser("/foo/bar/:thing")
  res3: org.scalatra.PathPattern = PathPattern(^/foo/bar/([^/?#]+)$,List(thing))

  scala> SinatraPathPatternParser("/foo/bar/:thing")("/foo/bar/#thing")
  res4: Option[org.scalatra.ScalatraKernel.MultiParams] = None

  scala> SinatraPathPatternParser("/foo/bar/:thing")("/foo/bar/thing")
  res5: Option[org.scalatra.ScalatraKernel.MultiParams] = Some(Map(thing -> ListBuffer(thing)))

Ivan Porto Carrero

Apr 15, 2011, 6:35:52 AM
to scalat...@googlegroups.com

Ross A. Baker

Apr 15, 2011, 9:30:26 AM
to scalat...@googlegroups.com
On Fri, Apr 15, 2011 at 6:35 AM, Ivan Porto Carrero <iv...@mojolly.com> wrote:
> yes it is a bug...
> https://github.com/scalatra/scalatra/issues/52

There are cases where we want to decode, and cases where we don't.
See this Sinatra test:

  it "allows using unicode" do
    mock_app do
      get('/föö') { }
    end
    get '/f%C3%B6%C3%B6'
    assert_equal 200, status
  end

There's nothing ambiguous about 'ö'. The route is made more readable
because of the decoding.
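The decoding step that makes that route readable can be checked with java.net.URLDecoder, assuming the container decodes request URIs as UTF-8 (servlet containers let the URI charset be configured, so this is not guaranteed). The object name is hypothetical:

```scala
import java.net.URLDecoder

// Hypothetical helper: decode a raw path the way a container configured for
// UTF-8 URIs would, before the route "/föö" is compared against it.
// Caveat: URLDecoder also turns '+' into a space, which path decoding should not do.
object UnicodeRouteDemo {
  def decoded(rawPath: String): String = URLDecoder.decode(rawPath, "UTF-8")
}
```

Under that assumption, decoded("/f%C3%B6%C3%B6") gives "/föö", which equals the route literal.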

The use case that Keith brings up is a pain. Another tough one is
where slash characters are encoded (%2F) so they are part of a path
segment instead of delimiting a path segment. Under the current
implementation, that distinction is lost.

One possible compromise is to leave reserved characters encoded, but
decode everything else. I don't know of a library function offhand
that does this. We'd also have to start trimming context paths and
servlet paths ourselves, because the path info as provided by the
Servlet API is already decoded.
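A rough sketch of that compromise, since no library function for it is named here: percent-escapes are decoded as UTF-8 unless they encode an RFC 3986 reserved character, in which case the escape is kept exactly as written. The object and method names are hypothetical, and the sketch does nothing about trimming context or servlet paths:

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

// Hypothetical sketch of "decode everything except reserved characters".
object SelectiveDecoder {
  // RFC 3986 reserved characters: gen-delims ":/?#[]@" plus sub-delims "!$&'()*+,;=".
  private val Reserved: Set[Char] = ":/?#[]@!$&'()*+,;=".toSet

  /** Percent-decodes `path` as UTF-8, but leaves escapes such as %23 ('#')
    * and %2F ('/') exactly as written so they stay part of their segment.
    * Malformed UTF-8 byte sequences become U+FFFD, as `new String` does. */
  def decodeUnreserved(path: String): String = {
    val out = new StringBuilder
    val pending = new ByteArrayOutputStream() // decoded bytes awaiting UTF-8 conversion

    def flush(): Unit =
      if (pending.size() > 0) {
        out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8))
        pending.reset()
      }

    var i = 0
    while (i < path.length) {
      val c = path.charAt(i)
      val hi = if (i + 2 < path.length) Character.digit(path.charAt(i + 1), 16) else -1
      val lo = if (i + 2 < path.length) Character.digit(path.charAt(i + 2), 16) else -1
      if (c == '%' && hi >= 0 && lo >= 0) {
        val b = hi * 16 + lo
        if (b < 0x80 && Reserved.contains(b.toChar)) {
          flush()
          out.append(path.substring(i, i + 3)) // keep the reserved escape verbatim
        } else {
          pending.write(b) // unreserved or non-ASCII byte: decode it
        }
        i += 3
      } else {
        flush()
        out.append(c) // ordinary character (or a stray '%'): copy as-is
        i += 1
      }
    }
    flush()
    out.toString
  }
}
```

With this, /foo/1/bar/%23value keeps its %23 and could be matched by a plain :thing route (since '%' is not excluded by ([^/?#]+)), while /f%C3%B6%C3%B6 still becomes /föö. The captured "%23value" would then need one final full decode before being bound to params.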

http://en.wikipedia.org/wiki/Percent_encoding#Percent-encoding_reserved_characters
-- the whole article is a good background on this subject.

Keith Irwin

Apr 15, 2011, 12:17:31 PM
to scalat...@googlegroups.com
What surprises me is that percent-encoding path elements between the / delimiters does NOT work with Scalatra. Isn't this what encoding is for? (By the principle of least surprise, I'd expect encoding path elements to solve the problem.)

I expected route matching to happen against the still-encoded path, with decoding happening just before params and multiParams are bound.

But I see it's a thorny problem for an API.

In my particular case, I want to do something like:

  delete /app/user/<id>/company/<code>

which expresses EXACTLY what it does, which is to delete that company code (which is provided by users). They seem to like those twitteresque hash marks.

I can work around this by doing my own encoding (s/#/.hash./g), but it feels like I'm working around a wart in the otherwise life-saving Scalatra, rather than just a painful design choice on my part. I can live with it, though.

Good luck!

Keith

Ivan Porto Carrero

Apr 15, 2011, 7:12:32 PM
to scalat...@googlegroups.com, Keith Irwin
I agree that paths shouldn't be decoded, but at the same time this might be a design choice by the javax.servlet.servlet-api people: AFAICT from the codebase, we don't do _any_ decoding on those paths. What's more, I tried to fix this a while ago, because we originally had cases where we wanted URLs in the path (in unencoded form, that is), e.g. /channels/twitter://casualjim (which translates to twitter%3A%2F%2Fcasualjim), but I didn't have the best of times trying to work that out.

And if it's not a servlet thing it might even be a container thing. 

To prove my point, here's where we handle global route matching; it all hinges on

  def requestPath: String

So, as you can see, it's nothing we do with respect to decoding, but rather something the servlet is doing for you... this is probably just another case of Java apathy about doing the right thing.

If the servlet gives us access to the raw, undecoded info, then we might be able to fix it. If not, and it's a container thing, I'd say you're on your own.
And if it's a servlet thing, I think we might be better off fixing it with SSGI than hacking around it in the Scalatra code base.

Ross A. Baker

Apr 18, 2011, 11:09:35 AM
to scalat...@googlegroups.com
The servlet does give us the raw, undecoded version: request.getRequestURI.
An app can override requestPath based on that, stripping the context
path (if not root). Also, whatever route parameters you capture would
still be encoded, so some extra work may be necessary there.
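A minimal sketch of that approach as two pure functions, assuming the app is mapped so that there is no servlet path to strip. The string arguments stand for what request.getRequestURI and request.getContextPath return (the servlet API documents that the container does not decode either). The object and method names are hypothetical, and the actual requestPath override is not shown because its exact form depends on the Scalatra version:

```scala
import java.net.URLDecoder

// Hypothetical helpers for building requestPath from the raw request URI.
object RawRequestPath {
  /** `requestUri` and `contextPath` are the values returned by
    * request.getRequestURI and request.getContextPath ("" for the root
    * context). Assumes no servlet path needs stripping and ignores
    * ";jsessionid=..." path parameters. */
  def fromRequestUri(requestUri: String, contextPath: String): String = {
    val path =
      if (contextPath.nonEmpty && requestUri.startsWith(contextPath))
        requestUri.substring(contextPath.length)
      else requestUri
    if (path.isEmpty) "/" else path
  }

  /** Route captures taken from the raw path are still percent-encoded, so
    * decode each one after matching. URLDecoder also maps '+' to a space,
    * which is a form-encoding rule, so this is only an approximation. */
  def decodeParam(raw: String): String = URLDecoder.decode(raw, "UTF-8")
}
```

For example, fromRequestUri("/myapp/foo/1/bar/%23value", "/myapp") keeps the %23, so the route can match on the encoded segment, and decodeParam("%23value") then recovers "#value" for the handler.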