security and quality


Andrew Harrington

Mar 10, 2010, 2:00:38 PM
to pyw...@googlegroups.com

Issues of security and quality.

Security

Sean found that the Django code embedded untrusted text into the web page in safe mode, killing all tags, including <pre>. Prudent, but overkill for our purposes. It sounds like we need custom filters for the submitted problem description text. Here are some ideas.

  1. We would not expect all submitters to know RST, but is its translator automatically safe? If so it could be a particularly useful option.

  2. Since we do want <pre> and </pre> (in either upper or lower case), we could have our filter allow just those particular sequences and convert every other '<' and '>' (and any Unicode equivalents?) to the escaped embedded text versions &...; This would kill all other tags. The only other markup we would then need to add is converting double newlines to <p> or <br><br>. Sean is working on something like this now.
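
A minimal sketch of idea 2, assuming Python 3 and only the standard library (this is not Sean's filter, and `render_description` is a hypothetical name). It escapes everything except properly closed <pre>...</pre> blocks and turns blank-line-separated text into <p> paragraphs:

```python
import html
import re

# A complete <pre>...</pre> block, in either letter case (non-greedy match).
_PRE_BLOCK = re.compile(r"(<pre>.*?</pre>)", re.IGNORECASE | re.DOTALL)


def render_description(text):
    """Escape untrusted text, keeping only closed <pre> blocks as markup."""
    parts = []
    # Splitting on a capturing group puts the <pre> blocks at odd indices.
    for i, chunk in enumerate(_PRE_BLOCK.split(text)):
        if i % 2 == 1:
            inner = chunk[5:-6]  # drop the leading "<pre>" and trailing "</pre>"
            parts.append("<pre>%s</pre>" % html.escape(inner))
        else:
            # Outside <pre>, blank lines separate paragraphs.
            for para in re.split(r"\n\s*\n", chunk):
                if para.strip():
                    parts.append("<p>%s</p>" % html.escape(para.strip()))
    return "\n".join(parts)
```

An unclosed <pre> never matches the pattern, so it is escaped like any other tag. Whether Unicode look-alikes of '<' and '>' need separate handling is left open here, as in the question above.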

Quality

We have a small site with public contributions. Good public contributions are great for growing the site organically. On the other hand if we get a significant proportion of bad public contributions, we can turn people off from the site in a big way. In my judgment, we should consider this as an issue.

At present there are not a lot of submissions, so I could imagine taking responsibility to somehow, after the fact, remove public links to pretty useless examples. Removing links after the fact is likely to cause some PR problems, though. Another idea is to handle it immediately:

When a random person gets a login ID and submits a problem, it is linked only from pages shown when s/he is logged in. S/he can let others try the problem by giving the exact problem name/URL. We could also have people be identified as students of somebody, and have the problems appear under some teacher or course URL. Site administrators/editors could manually grant some people higher editor status, where they can set their own problems or others' to be publicly linked. Lots of wikis let you preview an addition before making it public. The extra step of choosing to make a submission public probably makes sense anyway.

Particularly as the user base grows and vetting problems gets onerous for a small group, another approach is to let every problem go public when its author chooses, but in an unvetted category. Editors could still recommend or highlight some problems manually, and, as on social networking sites, users could rate the problems. (I would hope we would not just give one raw number, but split responses by the responder's perceived level of expertise or the responder's perceived level of the problem.)
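
To illustrate splitting ratings rather than reporting one raw number, here is a small sketch in Python. The level labels and the `summarize_ratings` name are assumptions made up for the example, not part of any existing site code:

```python
from collections import defaultdict
from statistics import mean


def summarize_ratings(ratings):
    """Group (responder_level, score) pairs by level.

    Returns a dict mapping each level to (average score, number of ratings).
    """
    by_level = defaultdict(list)
    for level, score in ratings:
        by_level[level].append(score)
    return {level: (mean(scores), len(scores))
            for level, scores in by_level.items()}


# Hypothetical ratings: beginners like the problem, an expert less so.
summary = summarize_ratings([("beginner", 5), ("beginner", 3), ("expert", 2)])
```

The same grouping would work if the key were the responder's perceived level of the problem instead of the responder's own level.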

Another way to get good problem sequences together is to allow another layer of uploaded pages that point to an organized sequence of problems, maybe interspersed with tutorial chunks. Not sure who we would allow to do this in a public fashion, but such suggested sequences would certainly have raised visibility.

I'm not sure what is the best thing to do short or long term, but I would like some way to reduce the highlight on problems that do not aid the reputation of the site, without getting too much bad PR or causing too much work.

--
Andrew N. Harrington
 Director of Academic Programs
 Computer Science Department
 Loyola University Chicago
 512B Lewis Towers (office)
 Snail mail to Lewis Towers 416
 820 North Michigan Avenue
 Chicago, Illinois 60611

http://www.cs.luc.edu/~anh
Phone: 312-915-7982
Fax:    312-915-7998
g...@cs.luc.edu for graduate administration
u...@cs.luc.edu for undergrad administration
aha...@luc.edu as professor

David MacQuigg

Mar 10, 2010, 5:28:51 PM
to PyWhip
Good suggestions, Andy. I need to get busy and put together a list of
requirements for our ultimate website. I'll post that in our Files
section and get more comments.

On the Security issue, I think we need so little formatting that we
could just assume the entire block is pre-formatted plain text. These
are problem descriptions, like you might see in the docstring of a
function. I would make priority one just getting the current form to
show plain text. Down the road we might want to add syntax coloring
(I see that is already in the TextArea widget.) or maybe highlighting
of links. We should never need to deal with arbitrary HTML.
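
A sketch of the plain-text approach, assuming Python 3 (`as_preformatted` is a hypothetical helper name). The whole description is escaped and wrapped in a single <pre> block, so no submitted markup is ever interpreted:

```python
import html


def as_preformatted(description):
    """Escape the entire description and wrap it in one <pre> block."""
    return "<pre>%s</pre>" % html.escape(description)


# Hypothetical submission containing characters that must not reach the page raw.
page_fragment = as_preformatted('if a < b:\n    print("<b>")')
```

In a Django template with autoescaping left on, writing the variable between literal <pre> tags should give the same effect without a helper.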

On the Quality issue, you make some very good points. I've been
ignoring the garbage I see posted, assuming that anyone visiting the
site will not take it as our best effort. But you are right, it
leaves a bad impression. My suggestion, fairly high priority, will be
to put all submitted problems not on the home page, but in a special
category "new", that is visible only if you click a small link (or
perhaps not even that, if we start getting vandalism).

Editors in each area can review the new submissions, select whatever
they think is good, add appropriate categories, and even do some
editing. I see this working a lot like Citizendium (not so much like
Wikipedia). Editors take an active role. The content is not
determined by who is the most aggressive.

I propose we set up three editorial "areas" in addition to the
individual teacher pages, where teachers can do whatever they want.
1) High School math & science
2) College math & science
3) Professional self study
The main difference is in how much skill and maturity we assume in the
students. My target audience (non-CS professionals) will probably
already know how to write a program, but have never seen Python. The
most important thing here is to get them going quickly on interesting
problems, and avoid the cruft of Java, C++ or whatever language they
may have studied in college. My help files, as you can see from what
I have posted so far, are very brief.

I assume the setup for high school students will be just the
opposite. Students here need a very gentle introduction, like what
Jeff has done in his "How to Think... " book. Also, in a school
setting (both college and high-school) we can assume more of a set
order to the topics, whereas my topics need to be much more
stand-alone - a few introductory modules, then branch off into
examples from physics, engineering, or whatever subject someone
wants to specialize in.

-- Dave.

Andrew Harrington

Mar 10, 2010, 8:56:51 PM
to pyw...@googlegroups.com
David, responses intermingled:

On Wed, Mar 10, 2010 at 4:28 PM, David MacQuigg <macq...@box67.com> wrote:
> Good suggestions, Andy. I need to get busy and put together a list of
> requirements for our ultimate website. I'll post that in our Files
> section and get more comments.
>
> On the Security issue, I think we need so little formatting that we
> could just assume the entire block is pre-formatted plain text. These
> are problem descriptions, like you might see in the docstring of a
> function. I would make priority one just getting the current form to
> show plain text. Down the road we might want to add syntax coloring
> (I see that is already in the TextArea widget.) or maybe highlighting
> of links. We should never need to deal with arbitrary HTML.

I guess you are right. We can ask people to put in newlines (it is easy to format a paragraph into lines in Idle, for instance).

The part about making it live initially only to you will help there: if you give a full paragraph in an auto-wrapping editor, you will see that lots of it goes off the screen. We might go so far as to encourage people to format to 70 columns, or maybe 79.
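
For what it is worth, the standard library's textwrap module can do this kind of formatting. The sketch below (function names are hypothetical) re-wraps prose paragraphs to a width and reports lines that are still too long. Note that `textwrap.fill` reflows everything inside a paragraph, so it suits prose only, not code examples whose line breaks and indentation matter:

```python
import textwrap


def wrap_description(text, width=70):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


def overlong_lines(text, width=79):
    """Return the 1-based numbers of lines longer than `width` characters."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > width]


long_paragraph = "word " * 40          # one 200-character line
wrapped = wrap_description(long_paragraph)
```

A check like `overlong_lines` could also be run on the submitter's preview to warn about lines that will run off the screen inside <pre>.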

 

> On the Quality issue, you make some very good points. I've been
> ignoring the garbage I see posted, assuming that anyone visiting the
> site will not take it as our best effort. But you are right, it
> leaves a bad impression. My suggestion, fairly high priority, will be
> to put all submitted problems not on the home page, but in a special
> category "new", that is visible only if you click a small link (or
> perhaps not even that, if we start getting vandalism).
>
> Editors in each area can review the new submissions, select whatever
> they think is good, add appropriate categories, and even do some
> editing. I see this working a lot like Citizendium (not so much like
> Wikipedia). Editors take an active role. The content is not
> determined by who is the most aggressive.

Teachers, too, can have several direct relations to the site: 1) as someone with permission to see solution status for students who agree; 2) as someone who oversees submission of new problems that get immediately exposed to the teacher's students; 3) as one who can present sequencing or other information pages under their directory.


 
> I propose we set up three editorial "areas" in addition to the
> individual teacher pages, where teachers can do whatever they want.
> 1) High School math & science
> 2) College math & science
> 3) Professional self study
> The main difference is in how much skill and maturity we assume in the
> students. My target audience (non-CS professionals) will probably
> already know how to write a program, but have never seen Python. The
> most important thing here is to get them going quickly on interesting
> problems, and avoid the cruft of Java, C++ or whatever language they
> may have studied in college. My help files, as you can see from what
> I have posted so far, are very brief.
>
> I assume the setup for high school students will be just the
> opposite. Students here need a very gentle introduction, like what
> Jeff has done in his "How to Think... " book. Also, in a school
> setting (both college and high-school) we can assume more of a set
> order to the topics, whereas my topics need to be much more
> stand-alone - a few introductory modules, then branch off into
> examples from physics, engineering, or whatever subject someone
> wants to specialize in.


Clearly, if we are tying into specific math or science, there is a sequence of standard courses we can refer to that typically runs from HS into college. On the programming front, there is the issue of being a total newbie versus having different levels of experience with programming or computer science but not Python, and there are general differences in how fast people catch on, in reading level, and so on. I'm not sure that it is best to frame those differences in terms of age. Maybe:

  • Gentle introduction for total newbies to programming
  • Accelerated Python introduction for programmers
  • Very concise presentation of syntax

If you want to talk more about sequencing and pacing, I already have another document, some of which I discussed at PyCon. I was holding it back, thinking my brainstorming was getting way ahead of where we are, but I'll post it.

Anyhow for the short term I agree:
  • Assume plaintext descriptions, bracketed by added <pre> tags, for now
  • Let a submitter see the result on the site before agreeing to release it with a public link.
  • Let a user have a page linking to all problems authored and all solution attempts
  • Allow a user to expose the solution summary page to a teacher.
  • New or unvetted area for newly authored problems?
  • Permission levels for editors to expose, and correct(?) other authors' problems
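
One way the short-term list could hang together, as a rough Python sketch. All class, field, and function names here are made up for illustration, and the three states are just one reading of the bullets above:

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PRIVATE = "private"    # linked only from the author's own pages
    UNVETTED = "unvetted"  # released by the author; listed in the new/unvetted area
    PUBLIC = "public"      # exposed by an editor in the main listings


@dataclass
class User:
    name: str
    is_editor: bool = False


@dataclass
class Problem:
    title: str
    author: User
    status: Status = Status.PRIVATE


def listed_for(problem, viewer):
    """True if the problem is linked on pages shown to viewer (None = anonymous)."""
    if problem.status is not Status.PRIVATE:
        return True
    return viewer is not None and viewer.name == problem.author.name


def release(problem, user):
    """The author, after previewing the result, agrees to a public link."""
    if user.name != problem.author.name:
        raise PermissionError("only the author can release a problem")
    if problem.status is Status.PRIVATE:
        problem.status = Status.UNVETTED


def promote(problem, user):
    """An editor exposes a problem in the main listings."""
    if not user.is_editor:
        raise PermissionError("only editors can promote a problem")
    problem.status = Status.PUBLIC
```

This only governs which pages link to a problem. A private problem could still be reachable by its exact URL, as suggested earlier in the thread.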

Shortly:

  • Sequencing suggestion pages (I'll send more shortly), under a teacher or in general
  • Teacher domains, with 1) sequencing pages or other information placed by the teacher, and 2) a way to organize student-authored problems, so that these problems avoid landing immediately in the general new/unvetted problem area