Serving files directly from github

Edward Elliott

May 8, 2021, 8:44:25 AM
to brython
I did some digging into how serving files directly from github works.  Hope this info helps others.

This follows up on another recent thread, but is distinct enough for its own post.

Problem 

Let's say we want to render / run a file on github.  It could be html, or javascript, or another format.  How can we do that?

As an example, take www/tests/test_dict.py in the brython repo.  There are two urls to access it: the view url (https://github.com/brython-dev/brython/blob/master/www/tests/test_dict.py) and the raw url (https://raw.githubusercontent.com/brython-dev/brython/master/www/tests/test_dict.py).
The view url is for viewing the file source in your browser.  It converts the file contents to an html table with line numbers and syntax highlighting, surrounded by the github interface.  This is not suitable for serving js or brython code for execution: your browser sees an ordinary html page, not the code itself.

The raw url serves what you need to execute: just the contents of the file itself.  However, your browser won't render or execute it, because github serves the file as Content-type: text/plain.  Your browser will only display the file's text contents, not interpret them as html or run them as js.  This is billed as a "security" feature, though its actual security value is questionable.  (Not sure if brython follows the same rule as browsers and refuses to run a python script served as text/plain - my guess is it will, but I haven't tested it.)

So your browser won't render or run any content served from a github raw url.  The contents are suitable, but the HTTP Content-type header is not.  

Solution

There are several ways around this: configure your browser to ignore Content-type, run an http proxy to change the content type header, use a browser extension to do the same, write a script to ajax the raw contents and dynamically parse them, etc.  But most of these solutions only work for a single user.

It turns out there are several websites that solve this problem.  They simply proxy content from github and change the content-type header to the correct value (text/html, text/javascript, application/javascript, etc).  One such site is githack.com.
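
In essence, here's what such a proxy does, as a minimal Python sketch (the port and URL layout are illustrative, not githack's actual implementation):

import mimetypes
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RAW_BASE = "https://raw.githubusercontent.com"

class RawProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /brython-dev/brython/master/www/tests/index.html
        with urllib.request.urlopen(RAW_BASE + self.path) as upstream:
            body = upstream.read()
        # The whole trick: replace github's blanket text/plain with a
        # content type guessed from the file extension.
        ctype, _ = mimetypes.guess_type(self.path)
        self.send_response(200)
        self.send_header("Content-Type", ctype or "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8000), RawProxy).serve_forever()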

If your github files use relative paths that match your github directory structure, you can simply access the files through githack.com instead of github.com and they will run correctly.  I've tested this and it works.  

For instance, if you want to run https://github.com/brython-dev/brython/blob/master/www/tests/index.html then just go to this githack url: https://raw.githack.com/brython-dev/brython/master/www/tests/index.html

Voila, the page renders and the code runs.  Brython modules in /www/tests/ are imported correctly.
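
The url rewrite itself is mechanical.  A small sketch (pattern inferred from githack's front page - verify before relying on it):

def githack_url(github_blob_url: str) -> str:
    # https://github.com/{user}/{repo}/blob/{branch}/{path}
    #   -> https://raw.githack.com/{user}/{repo}/{branch}/{path}
    prefix = "https://github.com/"
    user, repo, blob, rest = github_blob_url[len(prefix):].split("/", 3)
    assert blob == "blob", "expected a github 'view' url"
    return f"https://raw.githack.com/{user}/{repo}/{rest}"

print(githack_url("https://github.com/brython-dev/brython/blob/master/www/tests/index.html"))
# -> https://raw.githack.com/brython-dev/brython/master/www/tests/index.html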

Limitations

Now there are some things that don't work, because the paths aren't set up correctly.  Importing the foobar module fails, because it lives in a path (www/src/Lib/site-packages/) that isn't known to the brython interpreter.  This could be fixed by passing that path in the pythonpath argument to the brython() interpreter when it's invoked in index.html.

I'm not suggesting you should run brython code this way.  Setting up your source files with all relative paths may be inconvenient.  Absolute paths like /css/header.css will be different in your repo than on your server, due to the extra project and branch name dirs that githack needs, not to mention your web server root probably being different from your git tree root.

Lessons

My takeaway points are:
  1. hosting from github directly can work, if you really want it to and set things up properly
  2. making it work requires a deeper understanding of how web applications work, which is helpful for any deployment, whether using github or not.

Ray Luo

May 9, 2021, 7:24:47 AM
to brython
Most git hosting services are not meant to serve a repo as a static website.  Not all repos are static websites, anyway.  But many popular git hosting services also provide a dedicated service to turn your repo into a real static website, without any of these limitations.  On GitHub, that is GitHub Pages.  Using that kind of service, you do not need any 3rd-party workarounds.

Ian and I worked out a Brython template, https://github.com/rayluo/brython-project-template

Regards,
Ray Luo

André

May 9, 2021, 4:27:34 PM
to brython
At least 4 years ago, I found that serving Brython code directly from github pages works.  This is code that imports other files found in the repo, using regular import statements from Brython.  I just checked, and it still works.  See the readme of https://github.com/aroberge/reeborg-staging for a link.

André Roberge

P.S. This is for information only. I am sorry, but I won't have time to reply to messages for the next couple of weeks at the very least.

Edward Elliott

May 14, 2021, 6:43:41 AM
to bry...@googlegroups.com
Thanks to you both for the additional info.  Very good to know.

I like to understand why things work / don't work, so the investigation was useful to me.  That said, I completely agree: your approaches are much more practical.  Github Pages is the proper way to host from a repo.

The Content-Type restriction I sort-of understand.  Chrome is trying to be more secure by preventing unintended uses.  Though I question whether it's a good policy, given that 1) many web servers are misconfigured and serve incorrect content types, 2) how a server uses content on its own site has no bearing on how other sites may use that content, and 3) it's fairly trivial to bypass anyway: just ajax the plain text document and create a new script element (for js at least), as in the sketch below.
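
Here's a minimal Brython sketch of that bypass; the URL is hypothetical, and I'm assuming the file is served same-origin as text/plain.  The new script element executes once it's attached to the document:

from browser import ajax, document, html

def on_complete(req):
    # The response arrived as plain text; wrap it in a script element
    # ourselves, which runs when inserted into the page.
    if req.status == 200:
        document <= html.SCRIPT(req.text, type="text/javascript")

req = ajax.ajax()
req.bind("complete", on_complete)
req.open("GET", "/raw/stringlib.js", True)  # hypothetical path
req.send()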

However CORS restrictions make no sense whatsoever to me.  I can't even retrieve a document from another site via ajax unless that site has an Access-Control-Allow-Origin header that gives the requesting domain permission to access it.  But the security threat is not to the hosting domain - it's to the requesting domain & the end user.  The requesting domain should be the one that determines what third party domains are allowed.  The controls are precisely backwards.

It's like this: I have an account (web page) with a bank (domain).  I want to make a payment to another account at another bank (ajax request to another domain).  Instead of me telling my bank to authorize the transfer, I have to tell the other bank to authorize it.  The other bank doesn't know me, they won't add me to their approved transfer list.  How does that make any sense?

In other words: say I have an account with Barclays.  I need to make payments to accounts at Santander, BNP Paribas, and HSBC.  I should tell Barclays which banks I'm making payments to.  I shouldn't go to Santander, BNP Paribas, and HSBC begging them to let me make payments to their accounts.  That doesn't improve my account security in any way.  Just creates unnecessary friction.

Back to domains.  Why should another domain tell me whether I can load their resources?  A normal HTTP request returns the resource just fine.  It's publicly available.  Only when the request is ajax does my browser put handcuffs on me and refuse to retrieve it, unless the host gives explicit permission to the requesting domain.  That's not improving security.  That's imposing arbitrary global limitations on who can access a public resource.  Makes no sense whatsoever.

Does anyone have an explanation for how CORS does anything remotely useful?  A default-deny rule for retrieving third party content is good, but it's the end user / requesting domain's job to define exceptions.  Not giving that control to third parties.

Thanks if anyone has insight on this.



Ian Sparks

May 14, 2021, 2:43:04 PM
to brython
I'm sure you'll get a better answer from someone more knowledgeable, but in short it's about access control and not really about security.

Any decent browser will enforce the SOP (Same Origin Policy).  For the uninitiated, this stops javascript on a page from making AJAX calls to hosts other than the one the page was served from.

But this is very annoying in some situations.  For example, if I want to have a chat widget on mysite.com that is hosted by chatwidget.com, then the SOP stops mysite.com pages from making AJAX calls to chatwidget.com.  Cross Origin Resource Sharing (CORS) relaxes this: it allows chatwidget.com to accept AJAX requests from pages served by mysite.com.

Now think of it from the chatwidget.com side.  Let's say they offer a cool ajax-enabled chat experience, but they do it from html pages that they serve from their own domain, plastered with ads - which is how they pay the bills.  In theory anyone can make an AJAX call to their chat server, but SOP means they can only do it from a page that chatwidget.com serves (and which is covered in ads).  Now the folks from mysite.com turn up and they'd like to use the chat widget on their own site, from their own pages, but they can't because of SOP.  Instead they ask chatwidget.com to add them (mysite.com) to chatwidget.com's CORS allow-list, and probably pay chatwidget.com for the privilege.

CORS gives YOU (the domain owner) the chance to share/sell access to your content/service via ajax requests from other domains. 
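
For concreteness, a minimal sketch of that opt-in on chatwidget.com's side (domains and port hypothetical):

from http.server import BaseHTTPRequestHandler, HTTPServer

class ChatAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"messages": []}'
        self.send_response(200)
        # The opt-in: without this header, a page served by mysite.com
        # could still send the request, but the browser would refuse to
        # hand the response back to the page's javascript.
        self.send_header("Access-Control-Allow-Origin", "https://mysite.com")
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8001), ChatAPI).serve_forever()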

Ray Luo

May 15, 2021, 2:15:53 AM
to brython
Besides Ian's educational answer, Edward, I want to also point out that your bank payment analogy has one flaw.  You were not making a deposit to that other bank, you were trying to withdraw from it.  Of course they would have a say in whether to reject you.  You may argue that the stuff you want to retrieve from that "other bank" is already "publicly available".  Then we can use a different analogy.  You are sitting inside your local library, but somehow trying to get a "publicly available" food sample from the restaurant next door.  The restaurant has a right to reject you, because you are not their patron.

Back to domains.  Depending on the nature of its service, chatwidget.com may choose to let its widget be publicly available even to AJAX requests.  But in general, a non-profit website's free tier would not want to waste resources on an indirect audience.  The Same Origin Policy protects them from being (ab)used by free-riders.

Edward Elliott

May 15, 2021, 9:55:43 AM
to bry...@googlegroups.com
Thanks Ian for taking the time to explain. You may be right about the motivation for CORS.

But if so, CORS is a very poor implementation of that concept.  Access control is a type of security.

Access control belongs at the resource owner.  Relying on a client to voluntarily respect proper access control is awful security.  The resource owner can't enforce what the client does; it can only decide not to serve the resource.

In your example, chatwidget.com should make those decisions.  It's not.  Even mychat.com is not doing it.  The browser of the user accessing mychat.com is where the decision to fetch from chatwidget.com is made.  chatwidget.com has to assume the client is trustworthy.  It's not.  I can code my own browser to ignore the Access-Control-Allow-Origin header and fetch from chatwidget.com anyway.

CORS is not authentication.  If chatwidget.com only wants to serve authenticated clients, it needs to have a login policy with unforgeable cryptographic tokens.  Anything else is trusting "good behavior" from the client.  Which is rule #1 not to do for security.

Thus CORS is not protection for chatwidget.com, nor for mychat.com.  It is protection for the user whose browser is going to mychat.com.  And it's the wrong implementation.

If your goal is voluntary access control at the client (which isn't secure, but let's just say that's what you want), then CORS is completely unnecessary.  The browser already sends a Referer header with the domain of origin.  Your browser goes to mychat.com and requests a url from chatwidget.com via ajax; the browser includes the header Referer: http://mychat.com/foo/bar.  chatwidget.com can see that header and decide whether or not to serve the request.  Boom, access control.  Still flawed, still voluntary by the client.  But the decision is made at chatwidget.com whether to grant access.

The client can still lie to get around Referer, just like CORS.  The client can omit the Referer header, or set it to chatwidget.com, etc. - just like the client deciding whether or not to respect the Access-Control-Allow-Origin header.  But in this case, the access control decision is made at chatwidget.com.

If chatwidget.com wants higher security, it can require not just a proper Referer, but a referer that includes a special cryptographic token that is very hard to forge.  Or require a cookie with such a token.  Real websites do this.  Ever tried downloading from a video site and the url doesn't work when you try it with wget?  Because the host is looking for a special token.
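
To make that concrete, here's a sketch of what chatwidget.com could do server-side.  The secret, the domain names, and the token format are all hypothetical:

import hashlib
import hmac

SHARED_SECRET = b"agreed-offline-with-mychat.com"
ALLOWED_REFERERS = ("https://mychat.com/",)

def token_valid(referer: str, token: str) -> bool:
    # A partner mints the token by HMAC-signing the page url with the
    # shared secret; forging it requires knowing the secret.
    expected = hmac.new(SHARED_SECRET, referer.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

def allow_request(referer: str, token: str) -> bool:
    # Low security: trust the Referer.  Higher security: also demand
    # a valid token accompanying the request.
    return referer.startswith(ALLOWED_REFERERS) and token_valid(referer, token)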

CORS has no such ability.  It's a binary yes/no, allowed / not allowed, based on the referring domain, not the user.  Anyone going to mychat.com can ajax from chatwidget.com in your scenario.  That's not access control.

The proper way to do CORS is this (sketched in code after the list):
  1. User goes to mychat.com and loads a page.
  2. mychat.com serves a header saying cross-origin access to chatwidget.com is ok / "safe".
  3. browser sends request to chatwidget.com with Referer and other headers.
  4. chatwidget.com decides whether it wants to allow the request for its resources based on Referer and possibly the presence of a secure token.
  5. browser gets the resource.
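
Sketched from the browser's side, with a made-up header name - no browser implements this, it's my proposal:

def allow_cross_origin(page_headers: dict, target_origin: str) -> bool:
    # Step 2 above: the header comes from the *requesting* page's own
    # server, listing the third-party origins its pages may touch.
    allowed = page_headers.get("X-Allowed-Cross-Origins", "").split()
    return target_origin in allowed

# e.g. mychat.com serves: X-Allowed-Cross-Origins: https://chatwidget.com
allow_cross_origin({"X-Allowed-Cross-Origins": "https://chatwidget.com"},
                   "https://chatwidget.com")  # True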

If chatwidget.com wants to make an agreement with mychat.com regarding serving ads, etc, they are free to do that.  Then mychat.com has to add a secure token to the request that chatwidget.com recognizes (e.g. they can establish trusted public keys between them).  mychat.com includes a signed token in each page request it serves.  Script running in the page includes the token in the request to chatwidget.com.  Voila, secure access control.  If chatwidget.com doesn't need that much security, they can just trust the Referer header to see if the request comes from a partner domain.

So again, I ask: what is the purpose of CORS?  It does nothing that isn't better handled in other ways.

Sorry for the lengthy post.  It's not directly about brython, but it is relevant to any web app.




Edward Elliott

May 15, 2021, 10:26:50 AM
to bry...@googlegroups.com
Hi Ray, thanks for sharing your thoughts.

My bank analogy does fit.  The threat model CORS addresses is the user running the browser.  If anything bad happens from serving the request, it happens to the user in their browser.  The domains mychat.com and chatwidget.com are unaffected.  Thus the payment analogy from my account is the correct one for my browser loading the page at mychat.com.  The fact that the browser is requesting something (a url) and the bank account is sending something (payment) is irrelevant; the consequences occur at the sender.  That's why it's an analogy - of course the facts don't match perfectly.

Consider this scenario:
  1. There are two domains: good.com and evil.com
  2. good.com serves nice, safe content for its users
  3. evil.com used to serve good content, but has been taken over and is now malicious.
  4. evil.com has a js string library which good.com uses.
  5. evil.com still provides the string library, but now it includes a hidden cryptocurrency miner in the code that sends all mined coins to evil.com.
  6. The page good.com/myapp pulls in the script evil.com/stringlib.js
  7. User visits good.com/myapp to run their favorite app.  Uh oh!  Now they're running a cryptominer and making money for evil.com.
CORS is meant to "protect" users by disallowing cross-origin script requests.  How does CORS fix this scenario?
  1. Default policy of user's browser is to deny cross-origin request for evil.com/stringlib.js
  2. But wait!  According to CORS, first the browser checks evil.com/stringlib.js for an Access-Control-Allow-Origin header.
  3. evil.com says "sure, you're allowed to run my cross-origin script!  I wouldn't do anything evil, would I? you can totally trust me. (wink, wink, evil grin)".
  4. evil.com returns Access-Control-Allow-Origin: *
  5. User runs evil.com/stringlib.js and makes money for evil.com

Well that didn't quite work, did it?  For CORS to function correctly, evil.com has to say "hey this script that used to be trustworthy, is now totally evil.  I won't let you use it."  And set their Access-Control-Allow-Origin header so the browser denies the request.  That sure was nice of evil.com, wasn't it?

Or instead, you have this scenario with my alternate requesting-domain CORS policy described in my reply to Ian:
  1. Default policy of user's browser is to deny cross-origin request for evil.com/stringlib.js
  2. First, browser checks good.com for an Access-Control-Allow-Origin header.
  3. good.com says "hmmm evil.com used to be nice, but now their scripts are taking advantage of users.  I want to protect my users.  I'm not going to allow access to evil.com anymore."
  4. good.com returns a header saying cross-origin access to evil.com should be denied.
  5. User's browser says "whoa, this ain't good!  i'm staying away from evil.com.  No evil.com/stringlib.js for you!"
  6. evil.com says "curses, foiled again!"  twirls mustache, exit stage right.

Now which scenario is more likely?  Is it better to take security advice from the owner of the domain you're visiting, good.com?  Or take it from totally-trustworthy, not-at-all-compromised random third parties?  Before answering, remember that you're already loading a page on good.com so it can serve you anything it wants to.  If good.com goes bad, they don't need cross-origin requests to take advantage of you.

If your reaction is "why would good.com pull in content from evil.com if they know it's evil?", the reason is that many domains serve user content.  Messageboard.com can be sure they don't include evil.com in their own content.  But they can't verify that their users' posts only include safe content.  By setting Access-Control-Allow-Origin only to a few known trusted domains, messageboard.com can protect their users.

Users (i.e. their browser) shouldn't blindly trust messageboard.com either.  Users should be able to set their own cross-origin-access-control policies in their browser as well, that override the domain owner's list.  But having defense in depth like this is much safer than simply trusting one party to make all the right (safe, secure) decisions.  Especially when that trust is placed in every random domain on the internet (suspect.com: "hey guys, you can totally trust me!  look, my access-control-allow-origin header says so!").



Ian Sparks

May 15, 2021, 12:10:51 PM
to brython
Not to argue with you, Edward, but just to explore this further:

> CORS is meant to "protect" users by disallowing cross-origin script requests.  

No, that's the Same Origin Policy that browsers self-impose. 

CORS was designed to allow servers to opt-in to interactions with other sites via AJAX APIs in contravention of the Same Origin Policy. 

That's all. 

I think your security arguments have more to do with the Same Origin Policy, and while you make good points about who should actually be in control, I believe the current setup has evolved to be the way it is to meet the expectation that Joe Sixpack can't have his online bank account drained just because he visits http://dodgysite.com, while still allowing Joe to exchange chats with his pals via messenger.com from a page served by facebook.com.  This "from a page served by" distinction is important, I think.  In your examples good.com allows resources to be loaded from evil.com because evil.com has a CORS policy that allows this.  That misses the point that good.com must be serving pages that try to access evil.com, and when good.com works out that evil.com is actually evil, it can stop serving those pages.

Edward Elliott

May 17, 2021, 2:33:04 PM
to bry...@googlegroups.com
Hi Ian, no need to caveat.  A healthy discussion is always welcome. 🙂

Just sharing my perspective as someone who worked in security and web applications long before CORS existed.

> CORS is meant to "protect" users by disallowing cross-origin script requests.  
> No, that's the Same Origin Policy that browsers self-impose. 
> CORS was designed to allow servers to opt-in to interactions with other sites via AJAX APIs in contravention of the Same Origin Policy. 

We're talking about the same thing.  Same Origin is the default policy.  CORS is a way to provide exceptions to that policy.  They both serve the same purpose: access control on third-party script execution in the user's browser.

Your comment made me dig into the history a bit...

FYI there is another draft standard for the domain owner to specify allowed origins of external content: Content Security Policy.  It's what CORS should be: domain A says what external sources domain A's pages can access.
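
For example, good.com's server would attach a header like this to its own pages (the allow-list is hypothetical; see the CSP spec for the full directive grammar):

# With this policy, good.com's pages may load scripts from good.com
# itself and one vetted partner; a script tag pointing at evil.com is
# refused by the browser, no cooperation from evil.com required.
CSP_HEADER = ("Content-Security-Policy",
              "script-src 'self' https://chatwidget.com")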

CORS seems to predate CSP.  CORS comes from the WHATWG, which split off from W3C when major tech companies didn't like the direction W3C was taking.  CORS dates from the time of the split. 

CSP is a W3C standard and much better thought out, with input from actual security professionals.  CORS smells like a hack that serves the interests of big domain owners rather than end users - who are the ones actually exposed.

> In your examples good.com allows resources to be loaded from evil.com because evil.com has a CORS policy that allows this. That misses the point that good.com must be serving pages that try to access evil.com and when good.com works out that evil.com is actually evil it can stop serving those pages. 

Those are big caveats.  We've seen many ways a domain can turn from good to evil: sale / change of ownership, change of owner's purpose ("instead of serving a public service, let's monetize the **** out of our users"), domain hijacking, trojans in npm scripts (this happens frighteningly regularly), etc etc.

In practice good.com doesn't regularly monitor all the 200 sites it includes scripts from - it's doubtful good.com even realizes how many third-party sites it relies on, between frameworks relying on npm modules that pull in other modules, etc.

That's not even getting into domains like ebay that host third-party html content.  Good luck finding all the domains user posts pull in.

Then you have to regularly review all the scripts you use from all those domains, to be sure they haven't been compromised.  Who has time for that?  Once you get beyond a small website, it's impossible to keep up with.

Like SSL, the whole edifice is a house of cards.

