python and html sandboxes

Andrew Harrington

Apr 23, 2010, 6:00:59 PM

to Athar Hameed, pyw...@googlegroups.com, Andre Roberge

Athar and Andre,

We have been discussing PyKata as a regular server application, not a Google App.

There are several untrusted data issues. Crunchy has dealt with them in various ways. Andre, I do not understand how server mode works. Does that address all our issues (except for taking too long in the App Engine)?

1. The Python code users run. Clearly the App Engine has tinkered to make this safe. Do we have a mechanism for a normal server? Andre, your documentation for foreign tutorials mentions something about embedded Python in pages. Could that feature be easily adapted to PyKata for the code students submit? Are you happy with both the safety and the ability to run a reasonable variety of code?

2. We want teachers/contributors to be able to upload HTML pages for lessons. We then want to serve them to site users. We need them to be safe. Crunchy has an elaborate safety mechanism for when it retrieves a foreign page. As I remember, Andre said that it took too long for the Google App constraints. On a regular server, where we control the time limits, this would not be such an issue.

Our use on the App Engine would also be different from Crunchy's: each uploaded page needs to be checked only once and stored locally, and then it can be served up repeatedly with no further checking. If worse came to worse, we could queue pages uploaded for lessons and analyze them, possibly in discrete steps, over time, but this would be a pain.
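The check-once, serve-many idea described above can be sketched as a small store keyed by a hash of the uploaded content. This is only an illustration under stated assumptions: `CheckedPageStore`, `toy_checker`, and the in-memory dictionary are hypothetical names, and the checker is a stand-in for whatever vetting pass is eventually chosen.

```python
import hashlib


class CheckedPageStore:
    """Keeps one vetted copy of each uploaded page, keyed by a content hash."""

    def __init__(self, checker):
        self._checker = checker   # hypothetical callable: raw HTML -> vetted HTML
        self._pages = {}          # content hash -> vetted HTML
        self.checks_run = 0       # how many times the (slow) checker actually ran

    def upload(self, raw_html):
        """Vet the page on first upload only; return the key used to serve it."""
        key = hashlib.sha256(raw_html.encode("utf-8")).hexdigest()
        if key not in self._pages:
            self._pages[key] = self._checker(raw_html)
            self.checks_run += 1
        return key

    def serve(self, key):
        """Return the stored, already-vetted page without re-checking it."""
        return self._pages[key]


def toy_checker(raw_html):
    # Stand-in for a real vetting pass; NOT an adequate safety check.
    return raw_html.replace("<script", "&lt;script")


store = CheckedPageStore(toy_checker)
key = store.upload("<p>Lesson 1</p>")
store.upload("<p>Lesson 1</p>")   # same content again: checker is not re-run
print(store.serve(key), store.checks_run)
```

On a real deployment the dictionary would be replaced by persistent storage, and a slow checker could be fed from a queue as suggested above.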

Andy

--
Andrew N. Harrington
Director of Academic Programs
Computer Science Department
Loyola University Chicago
512B Lewis Towers (office)
Snail mail to Lewis Towers 416
820 North Michigan Avenue
Chicago, Illinois 60611

http://www.cs.luc.edu/~anh
Phone: 312-915-7982
Fax: 312-915-7998
g...@cs.luc.edu for graduate administration
u...@cs.luc.edu for undergrad administration
aha...@luc.edu as professor

--
You received this message because you are subscribed to the Google Groups "PyWhip" group.
To post to this group, send email to pyw...@googlegroups.com.
To unsubscribe from this group, send email to pywhip+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pywhip?hl=en.

David MacQuigg

Apr 23, 2010, 7:02:26 PM

to PyWhip

Hi Andy,

I haven't thought this far ahead, but perhaps now is a good time to
start the discussion.  What I was thinking, but had not even written down
yet, was that we will have certain trusted authors, and only an author
can upload a page to be displayed. That should eliminate any
malicious HTML, so we only have to worry about accidents. Do we need
any other restrictions on authors?

If we need restrictions, we could just have a template that has
variables for problems to be displayed. The author just fills out a
form with the reference numbers of problems he wants displayed in an
array on the page. The size of the array, the names of the problems,
the green check marks, everything is keyed off the list of problems.
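A minimal sketch of the template idea above, in which the whole page is generated from the author's list of problem reference numbers. The catalogue `PROBLEMS`, the function `render_lesson`, and the page layout are all hypothetical and only illustrate the approach.

```python
from html import escape
from string import Template

# Fixed page template; the author never edits this directly.
PAGE = Template("<h1>$title</h1>\n<table>\n$rows\n</table>")

# Hypothetical problem catalogue: reference number -> problem name.
PROBLEMS = {101: "Reverse a string", 102: "Sum of digits", 103: "FizzBuzz <easy>"}


def render_lesson(title, problem_ids, solved=(), per_row=2):
    """Build the problem array from the list of reference numbers."""
    cells = []
    for pid in problem_ids:
        mark = " &#10003;" if pid in solved else ""   # check mark for solved problems
        cells.append(f"<td>{escape(PROBLEMS[pid])}{mark}</td>")
    # The number of rows follows from the length of the problem list.
    rows = [
        "<tr>" + "".join(cells[i:i + per_row]) + "</tr>"
        for i in range(0, len(cells), per_row)
    ]
    return PAGE.substitute(title=escape(title), rows="\n".join(rows))


html_page = render_lesson("Week 1", [101, 102, 103], solved={102})
print(html_page)
```

Because every author-supplied string passes through `escape`, the author controls which problems appear but cannot inject markup of their own.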

Without these restrictions, our task is actually simpler: just provide
a copy of the page template to the author, and let him do whatever he
wants - add his logo, change background colors, even start from
scratch and make a whole new page. Maybe as a matter of policy, we
should keep some standard stuff in the header and footer - navigation
links, etc., but we can decide that later.

Andrew Harrington

Apr 23, 2010, 7:10:21 PM

to pyw...@googlegroups.com

A list of trusted authors is one possibility, but we cannot always tell who is a good author, or an unknown malicious author could send a bunch of good stuff first before zinging us. Crunchy has the code for this. In the short run we can stick with trusted authors, but in a while we should look at what Andre has done. His security info page is impressive, showing the amount of thought his crew has put into this! Andre has reminded us for some time that we have an issue here.

We want generalized lesson content to be allowed, so I do not see a fixed template as the answer.

Andy


Andre Roberge

Apr 23, 2010, 8:35:29 PM

to Andrew Harrington, Athar Hameed, pyw...@googlegroups.com

On Fri, Apr 23, 2010 at 7:00 PM, Andrew Harrington <aha...@luc.edu> wrote:

Athar and Andre,

We have been discussing PyKata as a regular server application, not a Google App.

There are several untrusted data issues. Crunchy has dealt with them in various ways. Andre, I do not understand how server mode works. Does that address all our issues (except for taking too long in the App Engine)?

I've never actually tested Crunchy as a server, as I have only used it on my own. In theory, all that is required is to install it on a server and change the base address (127.0.0.1) so that it corresponds to the actual IP address of the server.

The main problem/question, which would not be unique to Crunchy, is the first one you wrote about (below).

1. The Python code users run. Clearly the App Engine has tinkered to make this safe. Do we have a mechanism for a normal server? Andre, your documentation for foreign tutorials mentions something about embedded Python in pages. Could that feature be easily adapted to PyKata for the code students submit? Are you happy with both the safety and the ability to run a reasonable variety of code?

This is the *main* problem: how to ensure that user-run code cannot do damage to the server. The App Engine has worked out this problem by creating a sandbox. I understand that it is possible to use PyPy to run an application within a sandbox so that it would not damage the server. I also understand that it is possible to set up servers (BSD jails?) so that they create a similar "safe" environment. However, I have no experience in this matter and would not really know where to start.
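One small piece of the problem, runaway code, can be handled with the standard library alone, and the sketch below shows that piece only. It is emphatically not a sandbox: the child process can still read files and open network connections, so confinement would still have to come from something like the PyPy sandbox or an OS-level jail mentioned above. The function name `run_with_timeout` and its return format are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(source, timeout_seconds=5.0):
    """Run `source` in a separate interpreter; return (status, stdout, stderr).

    This only limits wall-clock time. It does NOT restrict what the code
    can do to the machine while it runs.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", source],   # -I: isolated mode, ignores user site and env vars
            capture_output=True,
            text=True,
            timeout=timeout_seconds,                # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "", "")
    status = "ok" if result.returncode == 0 else "error"
    return (status, result.stdout, result.stderr)


print(run_with_timeout("print(2 + 2)"))
print(run_with_timeout("while True: pass", timeout_seconds=0.5)[0])
```

Running the submission in a separate process at least keeps an infinite loop or a crash from taking the web server down with it.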

I should point out that the basic idea used by PyKata is essentially the same as that of the doctest mode for Crunchy. They both have the same security problem.
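To make the shared security problem concrete, here is a rough sketch of a doctest-style check using Python's `doctest` module. It is not Crunchy's or PyKata's actual code; `check_submission` and the sample lesson are hypothetical. Note the `exec` call: the student's code runs inside the server's own process, which is exactly the exposure discussed above.

```python
import doctest


def check_submission(student_source, doctest_text, name="exercise"):
    """Run the lesson's doctest examples against the student's code.

    Returns (failed, attempted).
    """
    namespace = {}
    exec(student_source, namespace)   # UNSAFE on a server: runs arbitrary code
    test = doctest.DocTestParser().get_doctest(
        doctest_text, namespace, name, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test, out=lambda text: None)   # discard the failure report
    return result.failed, result.attempted


LESSON = """
>>> double(2)
4
>>> double("ab")
'abab'
"""

good = check_submission("def double(x):\n    return x * 2\n", LESSON)
bad = check_submission("def double(x):\n    return x + 2\n", LESSON)
print(good, bad)
```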

I believe that PyKata does have a feature currently that is not yet implemented in Crunchy: the ability to track users and their attempts at running code. This is something I tried to implement in an earlier version of Crunchy but which I dropped when I rewrote Crunchy (with some help from Johannes) to be easily extensible via plugins.

2. We want teachers/contributors to be able to upload HTML pages for lessons. We then want to serve them to site users. We need them to be safe. Crunchy has an elaborate safety mechanism for when it retrieves a foreign page. As I remember, Andre said that it took too long for the Google App constraints. On a regular server, where we control the time limits, this would not be such an issue.

If the application can run securely, then this is not an issue.

The reason why I implemented an elaborate safety mechanism in Crunchy is the following:
1. Crunchy is running on my computer (or yours ;-) with complete access to it.
2. The communication between the browser and the Python backend is done via javascript.
3. If some javascript (or java applet, or flash app, or ...) were left on the page, in theory it could contain code to communicate with the Python backend ... and execute Python code on my computer. So, a lot of care is taken to remove any "unsafe" code.
4. It is possible to embed some javascript code within style files (some browsers have such holes) ... so care has to be taken there.
5. When Crunchy loads an image from a website, that website could instead send some javascript code - again taking advantage of some browser holes.

Usually such problems (4 and 5) are inconsequential: the javascript code is executed in the browser sandbox. However, Crunchy breaks that sandbox to allow communication with the Python backend. PyKata would have the same potential problem ... but it would be much easier to control in my opinion.
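The kind of cleaning described in the list above can be illustrated with an allow-list filter built on the standard `html.parser` module. This is a simplified sketch and not Crunchy's actual mechanism: the tag sets, the `AllowListCleaner` class, and the `clean` function are assumptions chosen for the example, and a production filter would need far more care.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allow-list: only these tags survive, and almost no attributes.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "pre", "code",
                "h1", "h2", "ul", "ol", "li", "a"}
# Elements whose content is dropped along with the tag itself.
DROP_WITH_CONTENT = {"script", "style", "object", "applet", "iframe"}


class AllowListCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0   # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in ALLOWED_TAGS:
            kept = ""
            if tag == "a":
                # Keep only plain web links; this drops javascript: URLs.
                for name, value in attrs:
                    if name == "href" and value and value.startswith(("http://", "https://")):
                        kept = f' href="{escape(value, quote=True)}"'
            self.out.append(f"<{tag}{kept}>")   # all other attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.out.append(escape(data))   # re-escape text so it stays text


def clean(html_text):
    cleaner = AllowListCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)


dirty = ('<p onclick="evil()">Hi <b>there</b></p><script>steal()</script>'
         '<img src="x" onerror="bad()"><a href="javascript:bad()">x</a>')
print(clean(dirty))
```

An allow-list is used rather than a block-list so that anything unrecognised (new tags, event-handler attributes, inline styles, images) is dropped by default.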

Our use on the App Engine would also be different from Crunchy's: each uploaded page needs to be checked only once and stored locally, and then it can be served up repeatedly with no further checking.

This is certainly true. And, if I were smarter, I'm sure I could have used some caching mechanism to achieve the same thing when I experimented with the App Engine.

The main point I wanted to allude to when I made my quip about using Crunchy is that, to me, the main issue is to be able to run code securely. If that can be achieved, then the rest is comparatively trivial. I'm not a proficient programmer ... but, based on my previous experiment with GAE and Crunchy, I bet I could write a "clone" of PyKata as it exists in a couple of days of coding (assuming I could forget about my day job and completely focus on this). It is not a complicated app. The problems I see are 1) securing the server (which I don't know how to do) and 2) if using a safe server like GAE, dealing with its quirks when it comes to updating code samples within the database.

André

If worse came to worse we could queue pages uploaded for lessons, and analyze them, possibly in discrete steps, over time, but this would be a pain.

Andy


Andre Roberge

Apr 23, 2010, 8:37:47 PM

to pyw...@googlegroups.com

On Fri, Apr 23, 2010 at 8:02 PM, David MacQuigg <macq...@box67.com> wrote:

Hi Andy,

I haven't thought this far ahead, but perhaps now is a good time to
start the discussion.  What I was thinking, but had not even written down
yet, was that we will have certain trusted authors, and only an author
can upload a page to be displayed. That should eliminate any
malicious HTML, so we only have to worry about accidents. Do we need
any other restrictions on authors?

If we need restrictions, we could just have a template that has
variables for problems to be displayed. The author just fills out a
form with the reference numbers of problems he wants displayed in an
array on the page. The size of the array, the names of the problems,
the green check marks, everything is keyed off the list of problems.

Without these restrictions, our task is actually simpler: just provide
a copy of the page template to the author, and let him do whatever he
wants - add his logo, change background colors, even start from
scratch and make a whole new page. Maybe as a matter of policy, we
should keep some standard stuff in the header and footer - navigation
links, etc., but we can decide that later.

And, if one uses some simple markup language (like markdown or reStructuredText) and makes sure that no javascript code can be inserted, then the process should be completely safe - there would not be any need to put any restriction on authors.
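A toy sketch of that idea: escape everything the author wrote first, then turn a very small set of markup conventions into HTML. This implements neither markdown nor reStructuredText; the function `render_lesson_text` and its two conventions (`*emphasis*` and backquoted code) are hypothetical. The escaping step matters because standard Markdown passes raw HTML through unless the renderer is told otherwise.

```python
import re
from html import escape


def render_lesson_text(source):
    """Turn plain text with *emphasis* and `code` into HTML paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", source.strip()):
        text = escape(block)   # neutralise <, >, & and quotes before adding any tags
        text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
        text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
        paragraphs.append(f"<p>{text}</p>")
    return "\n".join(paragraphs)


rendered = render_lesson_text("Use `print` *now*.\n\n<script>alert(1)</script>")
print(rendered)
```

Since the only tags in the output are the ones the renderer itself emits, author text can never introduce a script element or an event-handler attribute.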

The main problem is: how do we ensure that the Python code run by users is safe...

André


Andre Roberge

Apr 23, 2010, 8:39:24 PM

to pyw...@googlegroups.com

On Fri, Apr 23, 2010 at 8:10 PM, Andrew Harrington <anharr...@gmail.com> wrote:

A list of trusted authors is one possibility, but we cannot always tell who is a good author, or an unknown malicious author could send a bunch of good stuff first before zinging us. Crunchy has the code for this. In the short run we can stick with trusted authors, but in a while we should look at what Andre has done. His security info page is impressive, showing the amount of thought his crew has put into this! Andre has reminded us for some time that we have an issue here.

We want generalized lesson content to be allowed, so I do not see a fixed template as the answer.

As I mentioned in another message, as long as we can keep javascript code off the template (and links to flash or java applets...), i.e. as long as what is taken is only text, then it should be very simple to do that securely - without following a set template.

André
