ShiftSpace in the URL

Zohar Arad

May 1, 2011, 8:32:10 AM

to ShiftSpace

Hey All

After a couple of conversations with Mushon last week about possible
ways to create unique ShiftSpace URLs, I went home and did a bit of
hacking, and at this stage I need some advice from the group.

To get you all up to speed, I'll start by describing what I was trying
to achieve, followed by a description of my attempts to solve the
problem.

So, the problem can be described as follows:

We'd like to create a unique ShiftSpace URL that combines a page on the
Web with a shift. So, if you take www.nytimes.com as the page and, say,
ABCDEF as the shift (please ignore the content of the shift for now),
we'd like to see a unique URL that represents the web page plus the
shift. For example: www.nytimes.com#shift=ABCDEF
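A minimal Python sketch of that URL scheme, assuming the shift id travels in a `shift=` fragment exactly as in the www.nytimes.com#shift=ABCDEF example. The helper names `make_shift_url` and `read_shift_id` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def make_shift_url(page_url: str, shift_id: str) -> str:
    """Attach a shift id to a page URL as a '#shift=<id>' fragment.

    Any fragment the page URL already had is replaced.
    """
    parts = urlsplit(page_url)
    fragment = urlencode({"shift": shift_id})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, fragment))

def read_shift_id(shift_url: str):
    """Return the id from a '#shift=<id>' fragment, or None if there is none."""
    values = parse_qs(urlsplit(shift_url).fragment).get("shift")
    return values[0] if values else None
```

Browsers do not send the fragment to the server in the HTTP request, so whatever reads `#shift=` has to be script running inside the page. That is why the rest of the problem is about getting that script into the page.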

The real challenge here is to generate the shifted page without a
browser extension (or any other user-installed component), so that I
could, in theory, send this URL to my friends, etc.

To achieve that, I wanted to try injecting a bit of JS that handles
the shifting into the requested page. From now on, I will refer to
this bit of JS as the ShiftSpace client.

My first attempt was to write server-side software that would receive
a URL, fetch its content from the web, parse it, rewrite all URLs
inside the page as absolute URLs (so CSS, JS, and images download
properly), and then serve it back to the user.
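The URL-rewriting step of that first attempt can be sketched with `urllib.parse.urljoin`. This is a deliberately naive, regex-based sketch (an assumption, not how any ShiftSpace code did it). It only handles double-quoted `src`/`href` attributes, and it cannot see URLs that scripts build at runtime.

```python
import re
from urllib.parse import urljoin

# Naive pattern: double-quoted src="..." / href="..." attributes only.
_URL_ATTR = re.compile(r'\b(src|href)="([^"]*)"', re.IGNORECASE)

def absolutize_urls(page: str, page_url: str) -> str:
    """Rewrite src/href values so they resolve against the original page URL."""
    def to_absolute(match: re.Match) -> str:
        attr, value = match.group(1), match.group(2)
        # urljoin leaves already-absolute URLs untouched.
        return f'{attr}="{urljoin(page_url, value)}"'
    return _URL_ATTR.sub(to_absolute, page)
```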

To illustrate, let's suppose I visit
www.shiftspace.org/zohar?url=www.nytimes.com. I should get a parsed
version of the NYTimes home page with the ShiftSpace client added to
the HTML by the server.
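For the www.shiftspace.org/zohar?url=... example, the server's two jobs (work out which page to fetch, then add the client to the HTML) might look like this. The `/zohar` path comes from the example above; `CLIENT_SCRIPT` and its URL are hypothetical placeholders, and defaulting to `http://` when no scheme is given is an assumption.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical client script location; the thread does not specify one.
CLIENT_SCRIPT = '<script src="http://www.shiftspace.org/client.js"></script>'

def target_from_request(request_target: str) -> str:
    """Extract the page to proxy from a request such as '/zohar?url=www.nytimes.com'."""
    values = parse_qs(urlsplit(request_target).query).get("url")
    if not values:
        raise ValueError("request has no ?url= parameter")
    target = values[0]
    # Assumption: default to plain http when the caller omits the scheme.
    return target if "://" in target else "http://" + target

_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)

def inject_client(page: str) -> str:
    """Add the client <script> just before </head> (or at the very top if there is none)."""
    match = _HEAD_CLOSE.search(page)
    if match is None:
        return CLIENT_SCRIPT + page
    return page[:match.start()] + CLIENT_SCRIPT + page[match.start():]
```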

This method failed miserably because many sites inject content into
the page after load and rely on the page's domain being correct.
Since the page is not served from its original domain, things tend to
be very fragile. When testing, things like Flash and dynamic HTML
simply broke.

Next I tried using an iframe, but as you know, the same-origin policy
is a tight security sandbox that prevents scripts from interacting
with iframes loaded from a different origin (a different scheme, host,
or port; even a different subdomain counts). On top of that, some
sites refuse to work well when viewed inside iframes.
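The comparison the browser makes can be modeled in a few lines: two pages may script each other only if scheme, host, and port all match. A simplified Python sketch with hypothetical names; real browsers add wrinkles it ignores, such as `document.domain` relaxation.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # urlsplit lower-cases the hostname; fill in the scheme's default port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)

def same_origin(a: str, b: str) -> bool:
    """Simplified check: could a script on page a touch a frame showing page b?"""
    return origin(a) == origin(b)
```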

In that respect, I may have missed some very obscure hack that would
allow me to inject a bit of JS into a page that lives inside an
iframe... but as I mentioned above, the method is still a bit
problematic.

So, the question is: how can one inject a bit of JS into a page without
changing the page's domain (or original address, if you like)?

At the moment, my only viable options are:

1) Use an intercepting proxy. This will require the user to modify
their connection configuration
2) Use a bookmarklet
3) Use a browser extension
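Option 2 works by packing a small loader into a `javascript:` URL that the user saves as a bookmark. A minimal sketch of generating one; the client script URL used with it is a hypothetical placeholder.

```python
import json
from urllib.parse import quote

def make_bookmarklet(script_url: str) -> str:
    """Return a javascript: URL that loads script_url into the current page."""
    js = (
        "(function(){"
        "var s=document.createElement('script');"
        f"s.src={json.dumps(script_url)};"
        "document.body.appendChild(s);"
        "})();"
    )
    # Percent-encode the rest so the URL survives being stored as a bookmark.
    return "javascript:" + quote(js, safe="();=.,:/'")
```

Every user still has to save the bookmark and click it on each page, which runs against requirement 1) below.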

To emphasize the point, the real motivation here is to build a
mechanism that 1) does not require any installation or user
modification and 2) can be shared as-is between users.

Any insights you may have will be appreciated.

Thanks
Zohar

David Nolen

May 1, 2011, 12:35:39 PM

to shift...@googlegroups.com

On May 1, 2011, at 8:32 AM, Zohar Arad wrote:
> At the moment, my only viable options are:
>
> 1) Use an intercepting proxy. This will require the user to modify
> their connection configuration
> 2) Use a bookmarklet
> 3) Use a browser extension
>
> To further emphasize the issue, the real motivation here is to
> generate a mechanism that 1) does not require any installation / user
> modification and 2) can be shared as is between users.
>
> Any insights you may have will be appreciated.
>
> Thanks
> Zohar

We've explored the proxy path at length. There are even more issues around security. In my experience, attempting to do this without plugins is just not feasible. Modern browsers support extensions, and projects like CrossRider are bringing this functionality even to IE.

David

Mushon Zer-Aviv

May 1, 2011, 12:57:21 PM

to shift...@googlegroups.com

David,
I understand you have more faith in the #2 & #3 paths that Zohar mentioned and that you would like to explore, and I welcome that, but still there's room for more exploration of the proxy path, especially since it opens so many possibilities. Three questions:

1. What is the approach we took before? (the one used on http://shiftspace.org/api/sandbox/)
2. What was the latest conclusion of our attempt to take advantage of postMessage?

There's a project I found called Porthole (on Github), which is a small JS library that is trying to address this. Could be irrelevant, but I thought I'd pass it along…

Zohar Arad

May 1, 2011, 3:12:51 PM

to ShiftSpace

Mushon, I took a quick look at Porthole and it doesn't look relevant,
sorry.

Ideally, we won't use iframes as they're evil and do open up a whole
different can of worms.

At the moment I'm out of creative ideas on how to achieve this without
an extension or a bookmarklet at the very least.

Maybe we're going at this the wrong way... not sure :(

On May 1, 7:57 pm, Mushon Zer-Aviv <mus...@shual.com> wrote:
> David,
> I understand you have more faith in the #2 & #3 paths that Zohar
> mentioned and that you would like to explore, and I welcome that, but
> still there's room for more exploration of the proxy path, especially
> since it opens so many possibilities. Three questions:
>
> 1. What is the approach we took before? (the one used on
> http://shiftspace.org/api/sandbox/)
> 2. What was the latest conclusion of our attempt to take advantage of
> postMessage?
>
> There's a project I found called Porthole
> <http://ternarylabs.com/2011/03/27/secure-cross-domain-iframe-communic...>
> (on Github <https://github.com/ternarylabs/porthole/>) which is a small

David Nolen

May 1, 2011, 3:19:54 PM

to shift...@googlegroups.com

On Sun, May 1, 2011 at 3:12 PM, Zohar Arad <zo...@zohararad.com> wrote:

> Mushon, I took a quick look at Porthole and it doesn't look relevant,
> sorry.
>
> Ideally, we won't use iframes as they're evil and do open up a whole
> different can of worms.

They are relevant. I've been trying to convince someone to port the semantics of Chrome/Safari Extensions on top of postMessage. That's the only way to safely interact w/ untrusted domains (which are delivered by proxy w/ minimal modification) while delivering the ability to directly interact with the contents of the original page.

It is still a suboptimal experience because of proxying. By adopting a Chrome/Safari compatible API, we can easily deliver a plugin experience for those people that want it.
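Any postMessage bridge of this kind depends on the receiving side trusting only well-formed messages from known origins. The check is shown in Python to keep one language across these sketches; in the browser the same logic would sit in a `message` event listener that inspects `event.origin` and `event.data`. The trusted origin and the message types are hypothetical, and this is not the Chrome/Safari extension API itself.

```python
import json

# Hypothetical values for this sketch; a real protocol would define its own.
TRUSTED_ORIGINS = {"http://www.shiftspace.org"}
ALLOWED_TYPES = {"showShift", "hideShift", "getSelection"}

def accept_message(origin: str, data: str):
    """Return the decoded message if it passes every check, else None."""
    if origin not in TRUSTED_ORIGINS:  # event.origin in the browser
        return None
    try:
        msg = json.loads(data)  # event.data, assumed to be sent as a JSON string
    except ValueError:
        return None
    msg_type = msg.get("type") if isinstance(msg, dict) else None
    if not isinstance(msg_type, str) or msg_type not in ALLOWED_TYPES:
        return None
    return msg
```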

David

Zohar Arad

May 5, 2011, 2:16:00 AM

to ShiftSpace

So... I had a small breakthrough here.

Basically, I wrote a small EventMachine service that accepts a request
on my localhost and fetches a URL from a remote server (let's say
http://www.nytimes.com).

When my service receives the response from the remote server, it adds a
<base> tag to the response HTML's <head>, ensuring that any relative
URLs are resolved correctly.
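The <base> trick amounts to one string insertion. A minimal sketch, assuming the response has already been decoded to text; Python stands in for the Ruby/EventMachine code, which isn't shown in the thread, and `add_base_tag` is a hypothetical name.

```python
import html
import re

# \b keeps this from matching <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)

def add_base_tag(page: str, page_url: str) -> str:
    """Insert <base href=page_url> immediately after the opening <head> tag."""
    base = f'<base href="{html.escape(page_url, quote=True)}">'
    match = _HEAD_OPEN.search(page)
    if match is None:
        # No explicit <head>: prepend and let the HTML parser place it (assumption).
        return base + page
    return page[:match.end()] + base + page[match.end():]
```

The HTML spec requires <base> to come before any element with a URL attribute, hence the insertion right after <head>. Only the first <base> with an href is used, so if the page already ships its own <base>, this one takes precedence and may change how the page's relative URLs resolve.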

This method will work well in most cases.

It will fail where:

1. Site relies on Ajax to operate normally
2. Site HTML is so badly written that loading it outside the normal
domain will simply break (for example, the infamous ynet.co.il)

This is very much a POC, but we could rely on such a method to serve
content from the ShiftSpace domain and add any JS we need along the
way.

David Nolen

May 5, 2011, 12:56:02 PM

to shift...@googlegroups.com

On Thu, May 5, 2011 at 2:16 AM, Zohar Arad <zo...@zohararad.com> wrote:

> So... I had a small breakthrough here.
>
> Basically, I wrote a small EventMachine service that accepts a request
> on my localhost and fetches a URL from a remote server (let's say
> http://www.nytimes.com).
>
> When my service receives the response from the remote server it adds a
> <base> tag to the response HTML <head>, thus ensuring that any
> relative URLs are handled correctly.
>
> This method will work well in most cases.
>
> It will fail where:
>
> 1. Site relies on Ajax to operate normally
> 2. Site HTML is so badly written that loading it outside the normal
> domain will simply break (for example, the infamous ynet.co.il)
>
> This is very much a POC but we can rely on such a method to serve
> content from the ShiftSpace domain and add any JS we need along the
> way

Zohar,

We already have a proxy that does this and it works on many sites. Joe Moore worked on this a while back. The proxy even has basic postMessage support for safe communication with the iframe sandboxed content.

How will your proxy deal with malicious websites?

David

Zohar Arad

May 6, 2011, 3:39:16 AM

to ShiftSpace

What do you mean by malicious websites?

Btw, how does your proxy work? What technologies is it written in?

David Nolen

May 6, 2011, 11:39:25 AM

to shift...@googlegroups.com

On Fri, May 6, 2011 at 3:39 AM, Zohar Arad <zo...@zohararad.com> wrote:

> What do you mean by malicious websites?
>
> Btw, how does your proxy work? What technologies is it written in?

Consider a page carefully constructed to detect that it's being proxied. This page can steal any cookies served by your domain. It can also compromise your injected JS. And so forth.
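One partial answer to the cookie side of this is for the proxy to strip cookie headers in both directions, so cookies the browser holds for the proxy's domain are never relayed to the upstream site, and the upstream site cannot set cookies on the proxy's domain. A minimal sketch with hypothetical helper names; it does nothing about script in the proxied page reading `document.cookie` or tampering with injected JS on the proxy's origin.

```python
from typing import Iterable, List, Tuple

Header = Tuple[str, str]

# Cookie-carrying headers the proxy should never relay (compared case-insensitively).
REQUEST_COOKIE_HEADERS = {"cookie", "cookie2"}
RESPONSE_COOKIE_HEADERS = {"set-cookie", "set-cookie2"}

def _drop(headers: Iterable[Header], names: set) -> List[Header]:
    return [(name, value) for name, value in headers if name.lower() not in names]

def upstream_request_headers(headers: Iterable[Header]) -> List[Header]:
    """Headers to forward from the browser to the proxied site: no proxy-domain cookies."""
    return _drop(headers, REQUEST_COOKIE_HEADERS)

def downstream_response_headers(headers: Iterable[Header]) -> List[Header]:
    """Headers to send back to the browser: the proxied site may not set cookies."""
    return _drop(headers, RESPONSE_COOKIE_HEADERS)
```

A common mitigation for the script side is to serve proxied pages from a separate domain that never holds any cookies.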

Our proxy is 30 or so lines of Python. We use a Python lib that analyzes the DOM to figure out what the links are and rewrites them.
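The DOM library isn't named here, so this sketch uses the standard library's `html.parser` purely for illustration. It finds link-bearing attributes and resolves them against the page URL; actually rewriting them also requires re-serializing the document, which a full DOM library handles for you. `LINK_ATTRS` is an assumed, non-exhaustive list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs (not exhaustive; an assumption of this sketch).
LINK_ATTRS = {"href", "src", "action"}

class LinkCollector(HTMLParser):
    """Collect (tag, attribute, absolute_url) for every link-bearing attribute."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                self.links.append((tag, name, urljoin(self.base_url, value)))

def find_links(page: str, base_url: str):
    collector = LinkCollector(base_url)
    collector.feed(page)
    collector.close()
    return collector.links
```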

David

Zohar Arad

May 7, 2011, 1:25:28 AM

to ShiftSpace

Sounds very interesting.

Could you point me to the right place in the ShiftSpace GitHub repo?

Mushon | Shual

May 7, 2011, 12:08:56 AM

to shift...@googlegroups.com, shift...@googlegroups.com

Maybe if there's interest Joe can walk you (Zohar) through the way we built and used the proxy.

A few things re: security:

Do we have to use cookies in the proxy?

Can we sanitize the API that is sent to the server? (for example, using JSONP or some other similar XSS workaround method)
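For reference, JSONP serves data as a script call, `callback(<json>)`, so a page can load it cross-domain with a <script> tag. The usual server-side precaution is to accept only identifier-like callback names, since the name is echoed into executable script. A minimal sketch (function name hypothetical); note that JSONP is about fetching data across domains and does not sandbox the proxied page.

```python
import json
import re

# Plain JS identifiers, optionally dotted (e.g. "SS.callbacks.cb1"); ASCII only.
_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")

def jsonp_body(callback: str, payload) -> str:
    """Wrap payload as a JSONP response, rejecting unsafe callback names."""
    if not _CALLBACK.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    # json.dumps escapes non-ASCII by default, including U+2028/U+2029.
    return f"{callback}({json.dumps(payload)});"
```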

Sent from my bamboo hut on the Sinai beach,

Mushon Zer-Aviv

Mushon.com | Shual.com | @Mushon

--
You received this message because you are subscribed to the Google Groups "ShiftSpace" group.
To post to this group, send email to shift...@googlegroups.com.
To unsubscribe from this group, send email to shiftspace+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/shiftspace?hl=en.

David Nolen

May 7, 2011, 11:20:07 AM

to shift...@googlegroups.com

On May 7, 2011, at 12:08 AM, Mushon | Shual wrote:

> Maybe if there's interest Joe can walk you (Zohar) through the way we built and used the proxy.
>
> A few things re: security:
> Do we have to use cookies in the proxy?

It's an inherently insecure design. It is in fact exactly why Chrome/Safari Extensions work the way they do and GreaseMonkey before them.

> Can we sanitize the API that is sent to the server? (for example, using JSONP or some other similar XSS workaround method)

Not sure what you mean.

David

David Nolen

May 7, 2011, 11:23:14 AM

to shift...@googlegroups.com

On May 7, 2011, at 1:25 AM, Zohar Arad wrote:

> Sounds very interesting.
>
> Could you point me to the right place in the ShiftSpace GitHub repo?
>

https://github.com/ShiftSpace/shiftspace/blob/master/server/server.py#L267

David

Zohar Arad

May 10, 2011, 2:47:24 PM

to ShiftSpace

Thanks for that, David

So, it looks like the server itself is not supposed to handle
malicious sites. That's the JS's job... right?

Other than that, the two differences between this and the bit of code
I used for testing are:

1. I didn't do any DOM / URL manipulation but simply added a <base>
tag to handle the relative URLs. This means less processing time and
fewer resources. The server is just a pipeline, and the manipulation is
done at the response level (i.e. simple string manipulation).
2. I used EventMachine (Ruby), which can scale a bit better as far as
I know (and since the current solution is Python-based, Twisted is
probably the more suitable option).
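Doing the <base> insertion "at the response level" in an evented server such as EventMachine or Twisted means the body arrives in chunks, and the opening <head> tag can be split across two of them. A minimal Python sketch of a chunk-by-chunk injector, assuming already-decoded text chunks; the class name and the 64 KB give-up threshold are assumptions.

```python
import re

# \b keeps this from matching <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)

class BaseTagInjector:
    """Insert a <base> tag after <head> while streaming the body through.

    Buffers only until the opening <head> tag has been seen (or max_buffer
    characters have accumulated), then becomes a pure pass-through.
    """

    def __init__(self, base_tag: str, max_buffer: int = 64 * 1024):
        self.base_tag = base_tag
        self.max_buffer = max_buffer
        self.buffer = ""
        self.done = False

    def feed(self, chunk: str) -> str:
        """Accept one chunk and return whatever is ready to send downstream."""
        if self.done:
            return chunk
        self.buffer += chunk
        match = _HEAD_OPEN.search(self.buffer)
        if match is None:
            if len(self.buffer) > self.max_buffer:
                # No <head> near the top: give up and pass the page through untouched.
                self.done = True
                out, self.buffer = self.buffer, ""
                return out
            return ""  # keep waiting; the tag may be split across chunks
        self.done = True
        out = self.buffer[:match.end()] + self.base_tag + self.buffer[match.end():]
        self.buffer = ""
        return out

    def close(self) -> str:
        """Flush anything still buffered when the upstream response ends."""
        self.done = True
        out, self.buffer = self.buffer, ""
        return out
```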

So, the question is what now :)


David Nolen

May 10, 2011, 2:58:28 PM

to shift...@googlegroups.com

On Tue, May 10, 2011 at 2:47 PM, Zohar Arad <zo...@zohararad.com> wrote:

> Thanks for that, David
>
> So, it looks like the server itself is not supposed to handle
> malicious sites. That's the JS's job... right?

Yup.

> Other than that, the two differences between this and the bit of code
> I used for testing are:
>
> 1. I didn't do any DOM / URL manipulation but simply added a <base>
> tag to handle the relative URLs. This means less processing time and
> fewer resources. The server is just a pipeline, and the manipulation is
> done at the response level (i.e. simple string manipulation).

This strategy didn't work that well for us in the past.

> 2. I used EventMachine (Ruby), which can scale a bit better as far as
> I know (and since the current solution is Python-based, Twisted is
> probably the more suitable option).
>
> So, the question is what now :)

It would certainly be interesting to benchmark the various strategies (traditional, evented) for a centralized backend. Though I don't really consider this to be the hard part. ShiftSpace is a modest 4000 lines of server-side Python and 25,000+ lines of client-side JavaScript.

Building a communication protocol between an unsafe page and ShiftSpace that supports the current interactions will be challenging.

David
